
Parse Multi-Part Email with Sub-parts using Python

Tags: python

I am using this function to parse an email. It handles "simple" multi-part emails, but it raises an error (UnboundLocalError: local variable 'html' referenced before assignment) when the email defines multiple boundaries (sub-parts). I would like the script to separate the text and html portions and return only the html portion (or the text, if there is no html portion).

def get_text(msg):
    text = ""
    if msg.is_multipart():
        for part in msg.get_payload():
            if part.get_content_charset() is None:
                charset = chardet.detect(str(part))['encoding']
            else:
                charset = part.get_content_charset()
            if part.get_content_type() == 'text/plain':
                text = unicode(part.get_payload(decode=True),str(charset),"ignore").encode('utf8','replace')
            if part.get_content_type() == 'text/html':
                html = unicode(part.get_payload(decode=True),str(charset),"ignore").encode('utf8','replace')
        if html is None:
            return text.strip()
        else:
            return html.strip()
    else:
        text = unicode(msg.get_payload(decode=True),msg.get_content_charset(),'ignore').encode('utf8','replace')
        return text.strip()
Ryan asked Mar 22 '26 20:03



1 Answer

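As the comment said, you always check html, but you only assign it in one specific case: when a text/html part is found. With nested sub-parts, the top-level parts are typically multipart containers themselves, so that branch never runs and html is never assigned. That's what the error is telling you: you reference html before assigning it. In Python you cannot test whether a name is None if it has never been assigned to anything. For example, open the Python interactive prompt: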

>>> if y is None:
...   print 'none'
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'y' is not defined
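
Inside a function, the same mistake shows up as UnboundLocalError (a subclass of NameError), which is the error from the question. A minimal sketch of the failure and the fix; the function names broken and fixed are made up for illustration:

```python
def broken(found):
    if found:
        html = "<p>hi</p>"
    # html is only bound when found is True, so this line can fail
    return html


def fixed(found):
    html = None  # bind the name up front so the later check is always valid
    if found:
        html = "<p>hi</p>"
    return html


try:
    broken(False)
except UnboundLocalError as exc:
    error = exc  # html was referenced before any assignment

print(type(error).__name__)
print(fixed(False))
```

Here broken(True) works, which is why the original function only fails on some emails: it depends on whether the assigning branch happened to run.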

As you can see, you cannot merely check for None to see whether a variable exists. Back to your specific case.

You need to set html to None up front, and then check later whether it is still None. That is, edit your code like this:

def get_text(msg):
    text = ""
    if msg.is_multipart():
        html = None
        for part in msg.get_payload():
            if part.get_content_charset() is None:
                charset = chardet.detect(str(part))['encoding']
            else:
                charset = part.get_content_charset()
            if part.get_content_type() == 'text/plain':
                text = unicode(part.get_payload(decode=True),str(charset),"ignore").encode('utf8','replace')
            if part.get_content_type() == 'text/html':
                html = unicode(part.get_payload(decode=True),str(charset),"ignore").encode('utf8','replace')
        if html is None:
            return text.strip()
        else:
            return html.strip()
    else:
        text = unicode(msg.get_payload(decode=True),msg.get_content_charset(),'ignore').encode('utf8','replace')
        return text.strip()

This explains a little more: http://code.activestate.com/recipes/59892-testing-if-a-variable-is-defined/
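
Note that initializing html only removes the exception. msg.get_payload() yields just the top-level parts, so an html part nested inside a sub-part is still not seen. One way to reach it is Message.walk(), which visits every nested part. The following is a rough sketch, not a drop-in replacement: it assumes Python 3 (where get_payload(decode=True) returns bytes), the name get_body and the sample message are made up for illustration, and the utf-8 fallback stands in for the chardet guess used in the question.

```python
import email


def get_body(msg):
    """Return the first HTML body if there is one, else the first plain-text body."""
    text = None
    html = None
    # walk() yields the message itself and every nested sub-part, depth first
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        if part.get_filename():
            continue  # skip attachments
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to utf-8 when no charset is declared
        charset = part.get_content_charset() or "utf-8"
        try:
            decoded = payload.decode(charset, "ignore")
        except LookupError:  # unknown charset name
            decoded = payload.decode("utf-8", "ignore")
        ctype = part.get_content_type()
        if ctype == "text/plain" and text is None:
            text = decoded
        elif ctype == "text/html" and html is None:
            html = decoded
    body = html if html is not None else (text or "")
    return body.strip()


# Hypothetical sample: multipart/alternative nested inside multipart/mixed
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="utf-8"

plain body
--inner
Content-Type: text/html; charset="utf-8"

<p>html body</p>
--inner--
--outer--
"""

msg = email.message_from_string(RAW)
print(get_body(msg))
```

For the sample above this should pick the nested html part rather than the plain-text one. Because walk() also yields the message itself, the same function covers non-multipart emails without a separate else branch.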

joshcartme answered Mar 25 '26 10:03
