
Python CouchDB can't save dict created from feedparser entry? (no attribute 'read')

I have a script that should read the entries in an RSS feed and store each entry as a JSON document in a CouchDB database.

The interesting part of my code looks something like this:

from collections import namedtuple
import csv

import couchdb
import feedparser

Feed = namedtuple('Feed', ['name', 'url'])

couch = couchdb.Server(COUCH_HOST)
couch.resource.credentials = (COUCH_USER, COUCH_PASS)

db = couch['raw_entries']

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        db.save(item)

When I run that code, db.save(item) raises the following error:

AttributeError: object has no attribute 'read'

OK, so I then did a little debugging...

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        print(type(item))

This prints <class 'feedparser.FeedParserDict'>. Ahh, so feedparser uses its own dict type... well, what if I explicitly convert it to a plain dict?

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        db.save(dict(item))

Traceback (most recent call last):
  File "./feedchomper.py", line 32, in <module>
    db.save(dict(item))
  File "/home/dealpref/lib/python2.7/couchdb/client.py", line 407, in save
    _, _, data = func(body=doc, **options)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 399, in post_json
    status, headers, data = self.post(*a, **k)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 381, in post
    **params)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 419, in _request
    credentials=self.credentials)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 239, in request
    resp = _try_request_with_retries(iter(self.retry_delays))
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 196, in _try_request_with_retries
    return _try_request()
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 222, in _try_request
    chunk = body.read(CHUNK_SIZE)
AttributeError: 'dict' object has no attribute 'read'

W-what? That doesn't make sense, because the following works just fine, and its type is dict too:

some_dict = dict({'foo': 'bar'})
print(type(some_dict))
db.save(some_dict)

What am I missing here?
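The outer type check can't tell these two cases apart, because dict() makes only a shallow copy: the container becomes a plain dict, but every value inside it is carried over unchanged. What actually differs between {'foo': 'bar'} and a feed entry is whether every nested value is JSON-serializable. Judging from the traceback, couchdb-python appears to fall back to treating a body it could not JSON-encode as a file-like object, which would explain why the error mentions body.read() rather than JSON. Below is a minimal stdlib-only sketch for narrowing this down; EntryDict and find_unserializable are hypothetical names, and the datetime value is only a stand-in for a non-JSON value (the answer below works around time.struct_time values with a custom default= hook).

```python
import datetime
import json


class EntryDict(dict):
    """Hypothetical stand-in for feedparser.FeedParserDict, which subclasses dict."""


def find_unserializable(doc):
    """Return the sorted top-level keys whose values json.dumps rejects (hypothetical helper)."""
    bad = []
    for key, value in doc.items():
        try:
            json.dumps(value)
        except TypeError:
            bad.append(key)
    return sorted(bad)


# dict() only copies the outer mapping; nested values are the very same objects.
entry = EntryDict(title='Hello', updated=datetime.datetime(2011, 3, 6))
plain = dict(entry)
print(type(plain).__name__)                  # dict
print(plain['updated'] is entry['updated'])  # True

print(find_unserializable({'foo': 'bar'}))   # []
print(find_unserializable(plain))            # ['updated']
```

Running find_unserializable over a real entry (for example dict(d.entries[0])) should point at the keys that need converting before calling db.save.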

asked Mar 06 '26 22:03 by ashgromnies


1 Answer

I found a workaround: serialize the entry to JSON, then load that JSON back into a plain Python dict and pass it to CouchDB, which serializes it to JSON again when saving. (Yeah, weird and wasteful, but it works.)

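I had to write a custom serializer function (the default= hook of json.dumps) because json.dumps couldn't handle the time.struct_time values in the entries, and their repr can't simply be eval'd back.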

Source: http://diveintopython3.org/serializing.html

Code:

#!/usr/bin/env python2.7

from collections import namedtuple
import csv
import json
import time

import feedparser
import couchdb

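def to_json(python_object):
    # json.dumps calls this "default" hook for objects it can't serialize natively.
    # time.struct_time values (feedparser's parsed dates) become a tagged dict.
    if isinstance(python_object, time.struct_time):
        return {'__class__': 'time.asctime',
                '__value__': time.asctime(python_object)}

    # Anything else is still unsupported: raise TypeError, as json.dumps expects.
    raise TypeError(repr(python_object) + ' is not JSON serializable')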

Feed = namedtuple('Feed', ['name', 'url'])

COUCH_HOST = 'http://mycouch.com'
COUCH_USER = 'user'
COUCH_PASS = 'pass'

couch = couchdb.Server(COUCH_HOST)
couch.resource.credentials = (COUCH_USER, COUCH_PASS)

db = couch['raw_entries']

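with open("feeds.csv", "rb") as feeds_file:  # "rb" is what the Python 2 csv module expects
    for feed in map(Feed._make, csv.reader(feeds_file)):
        d = feedparser.parse(feed.url)
        for item in d.entries:
            # Round-trip through JSON to get plain, serializable Python objects
            # (struct_time values are converted by to_json along the way).
            j = json.dumps(item, default=to_json)
            db.save(json.loads(j))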
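An alternative to the dumps/loads round trip (a different technique from the one above, not the answer author's code) is to convert the entry into plain JSON types directly, so the client receives a document it can encode. This is a minimal sketch assuming the date values are the only non-JSON values in an entry and that they are in UTC; to_couch_doc is a hypothetical name. There is one more reason to handle dates explicitly: in Python 3, time.struct_time is a tuple subclass, so json.dumps encodes it as a list of nine integers and a default= hook like to_json above is never called for it.

```python
import json
import time


def to_couch_doc(value):
    """Recursively copy value into plain JSON-friendly types (hypothetical helper).

    - time.struct_time values become ISO-8601 strings (assumed to be UTC)
    - dicts, including dict subclasses such as FeedParserDict, become plain dicts
    - lists and tuples become lists
    Anything else is returned unchanged.
    """
    if isinstance(value, time.struct_time):
        # Checked before the tuple branch: in Python 3 struct_time subclasses tuple.
        return time.strftime('%Y-%m-%dT%H:%M:%SZ', value)
    if isinstance(value, dict):
        return {key: to_couch_doc(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_couch_doc(item) for item in value]
    return value


entry = {'title': 'Hello', 'published_parsed': time.gmtime(0),
         'tags': [{'term': 'news'}]}
doc = to_couch_doc(entry)
print(doc['published_parsed'])  # 1970-01-01T00:00:00Z
print(json.dumps(doc, sort_keys=True))
```

In the loop above, the two JSON lines would then become db.save(to_couch_doc(item)). Storing ISO-8601 strings also keeps the dates sortable inside CouchDB views, whereas the tagged time.asctime strings are not.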

answered Mar 08 '26 12:03 by ashgromnies