I am using this function to parse an email. It handles "simple" multi-part emails, but it raises an error (UnboundLocalError: local variable 'html' referenced before assignment) when the email defines multiple boundaries (sub-parts). I would like the script to separate the text and html portions and return only the html portion, or the text if there is no html portion.
def get_text(msg):
    text = ""
    if msg.is_multipart():
        for part in msg.get_payload():
            if part.get_content_charset() is None:
                charset = chardet.detect(str(part))['encoding']
            else:
                charset = part.get_content_charset()
            if part.get_content_type() == 'text/plain':
                text = unicode(part.get_payload(decode=True),str(charset),"ignore").encode('utf8','replace')
            if part.get_content_type() == 'text/html':
                html = unicode(part.get_payload(decode=True),str(charset),"ignore").encode('utf8','replace')
        if html is None:
            return text.strip()
        else:
            return html.strip()
    else:
        text = unicode(msg.get_payload(decode=True),msg.get_content_charset(),'ignore').encode('utf8','replace')
        return text.strip()
As the comment said, you always check html, but you only assign it in one specific case (when a part is text/html). That's what the error is telling you: you reference html before assigning it. In Python you cannot test whether a name is None if that name has never been assigned. For example, open the Python interactive prompt:
>>> if y is None:
...     print 'none'
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'y' is not defined
As you can see, checking for None does not tell you whether a variable exists. Back to your specific case.
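Inside a function the same mistake surfaces as UnboundLocalError, which is a subclass of NameError and is the error in the question. Here is a minimal Python 3 sketch of the failure and the fix. The function names and the (content_type, body) tuples are made up for illustration and are not part of the email API:

```python
def broken(parts):
    # 'html' is only bound when a text/html part is seen.
    for content_type, body in parts:
        if content_type == "text/html":
            html = body
    # If no part was text/html, reading 'html' raises UnboundLocalError.
    return html if html is not None else ""


def fixed(parts):
    html = None  # bind the name before the loop so the check is always valid
    for content_type, body in parts:
        if content_type == "text/html":
            html = body
    return html if html is not None else ""


plain_only = [("text/plain", "hello")]

try:
    broken(plain_only)
except UnboundLocalError as exc:
    print("broken raised:", type(exc).__name__)

print("fixed returned:", repr(fixed(plain_only)))
```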
You need to set html to None up front, and then check later whether it is still None. That is, edit your code like this:
def get_text(msg):
    text = ""
    if msg.is_multipart():
        html = None  # assign up front so the check after the loop is always valid
        for part in msg.get_payload():
            if part.get_content_charset() is None:
                charset = chardet.detect(str(part))['encoding']
            else:
                charset = part.get_content_charset()
            if part.get_content_type() == 'text/plain':
                text = unicode(part.get_payload(decode=True),str(charset),"ignore").encode('utf8','replace')
            if part.get_content_type() == 'text/html':
                html = unicode(part.get_payload(decode=True),str(charset),"ignore").encode('utf8','replace')
        if html is None:
            return text.strip()
        else:
            return html.strip()
    else:
        text = unicode(msg.get_payload(decode=True),msg.get_content_charset(),'ignore').encode('utf8','replace')
        return text.strip()
This explains a little more: http://code.activestate.com/recipes/59892-testing-if-a-variable-is-defined/
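Note that initialising html only removes the exception. Parts nested inside a sub-part are still not inspected, because msg.get_payload() on a multipart message returns only its direct sub-parts. One way to reach the nested ones is Message.walk() from the standard library, which visits every part in the tree. Below is a rough Python 3 sketch; get_body, the utf-8 fallback, and the sample message are assumptions for illustration, not the asker's code, and it makes no attempt to skip text attachments:

```python
from email import message_from_string


def get_body(msg):
    """Return the first HTML body found, else the first plain-text body."""
    text = ""
    html = None
    # walk() yields the message itself and every nested sub-part.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        content_type = part.get_content_type()
        if content_type not in ("text/plain", "text/html"):
            continue
        payload = part.get_payload(decode=True)  # bytes after transfer decoding
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        try:
            body = payload.decode(charset, "ignore")
        except LookupError:  # unknown charset name in the header
            body = payload.decode("utf-8", "ignore")
        if content_type == "text/html":
            if html is None:
                html = body
        elif not text:
            text = body
    return (html if html is not None else text).strip()


# Hypothetical message with a multipart/alternative nested in multipart/mixed.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    'Content-Type: multipart/alternative; boundary="inner"\n'
    "\n"
    "--inner\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "plain body\n"
    "--inner\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>html body</p>\n"
    "--inner--\n"
    "\n"
    "--outer--\n"
)

print(get_body(message_from_string(raw)))
```

With the sample above, the html part sits two levels down, which is the case the loop over msg.get_payload() never reaches.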