Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse excel attachment from .eml file in python

I'm trying to parse a .eml file. The .eml has an excel attachment that's currently base 64 encoded. I'm trying to figure out how to decode it into XML so that I can later turn it into a CSV I can do stuff with.

This is my code right now:

import email

data = file('Openworkorders.eml').read()
msg = email.message_from_string(data)

for part in msg.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content Disposition')


    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        excelContents = part.get_payload(decode = True)

        print excelContents

The problem is

When I try to decode it, it spits back something looking like this.

enter image description here

I've used this post to help me write the code above.

How can I get an email message's text content using Python?

Update:

This is exactly following the post's solution with my file, but part.get_payload() returns everything still encoded. I haven't figured out how to access the decoded content this way.

import email


data = file('Openworkorders.eml').read()
msg = email.message_from_string(data)
for part in msg.walk():
    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        name = part.get_param('name') or 'MyDoc.doc'
        f = open(name, 'wb')
        f.write(part.get_payload(None, True)) 
        f.close()

        print part.get("content-transfer-encoding")
like image 818
Melody Anoni Avatar asked Oct 22 '25 01:10

Melody Anoni


1 Answers

As is clear from this table (and as you have already concluded), this file is an .xlsx. You can't just decode it with unicode or base64: you need a special package. Excel files specifically are a bit tricker (for e.g. this one does PowerPoint and Word, but not Excel). There are a few online, see here - xlrd might be the best.

like image 76
Josh Friedlander Avatar answered Oct 23 '25 17:10

Josh Friedlander