
Saving IMAP messages with Python mailbox module

I'm downloading messages over IMAP with imaplib into an mbox file (using the mailbox module):

import imaplib, mailbox
svr = imaplib.IMAP4_SSL('imap.gmail.com')
svr.login('[email protected]', 'mypassword')
resp, [countstr] = svr.select("[Gmail]/All Mail", True)

mbox = mailbox.mbox('mails.mbox')

for n in range(...):
  resp, lst1 = svr.fetch(n, 'UID')    # the UID of the message
  resp, lst2 = svr.fetch(n, '(RFC822)')   # the message itself
  mbox.add(lst2[0][1])      # add the downloaded message to the mbox
  #
  # how to store the UID of this current mail inside mbox? 
  #

Suppose I download the mails with UID = 1 .. 1000. Next time, I would like to begin at the 1001st message and not at the 1st. However, mailbox.mbox does not store the UID anywhere, so the next time I open the mbox file, it will be impossible to know where I stopped.

Is there a natural way with the module mailbox to store the UID of the emails?

Or am I perhaps not using mailbox + imaplib the way they are meant to be used?

Basj asked Nov 28 '25 01:11


2 Answers

I hope this will be useful:

1) Libraries and environment: Win7, Anaconda3-4.3.1-Windows-x86_64.exe (a newer version is available, but that is what I used).

2) To list all your mailboxes:

import imaplib

def main():
   hostname = 'my.mail.server'
   username = 'my_user_name'
   m = imaplib.IMAP4_SSL(hostname)
   m.login(username, 'password')

   try:
      print('Capabilities:', m.capabilities)
      print('Listing mailboxes ')
      # list() returns a status string and one bytes entry per mailbox
      status, data = m.list()
      print('Status:', repr(status))
      print('Data:')
      for datum in data:
         print(repr(datum))

   finally:
      # always end the session, even if listing fails
      m.logout()

if __name__ == '__main__':
   main()
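
The names printed by that script are what the next step passes to select(). As a minimal sketch, assuming the server quotes the hierarchy delimiter and sends each entry as a single bytes line (Gmail does; other servers may send NIL or a literal), the mailbox name can be pulled out of each LIST line like this. `parse_list_line` is a hypothetical helper, not part of imaplib, and it does not decode names that use IMAP's modified UTF-7.

```python
import re

# A typical LIST line looks like: b'(\\HasNoChildren) "/" "INBOX"'
# Assumption: quoted delimiter, name either quoted or a bare atom.
LIST_LINE = re.compile(
    rb'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" "?(?P<name>[^"]*)"?'
)

def parse_list_line(line):
    """Return (flags, delimiter, name) for one LIST entry, or None if it does not match."""
    if not isinstance(line, bytes):
        return None  # e.g. a tuple carrying an IMAP literal; not handled here
    match = LIST_LINE.match(line)
    if match is None:
        return None
    flags = match.group('flags').decode().split()
    return flags, match.group('delim').decode(), match.group('name').decode()
```

With the `data` list from `m.list()`, something like `[parse_list_line(d) for d in data]` would give the names to feed into the dump script.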

3) Using the information generated above, we can dump all email messages from the mail server into directories:

import getpass, imaplib, sys, email, os

BASE_NAME = 'msg_no_'
BASE_DIR = 'D:/my_email/'

def writeTofile(mailDir, partOfName, msg):

   # Forward slashes work on Windows too, so no backslash conversion is needed.
   newDir = BASE_DIR + mailDir

   # Create the target directory (and any missing parents) on first use.
   os.makedirs(newDir, exist_ok=True)

   file_name = BASE_NAME + partOfName + '.eml'

   # The context manager closes the file even if the write fails.
   with open(newDir + '/' + file_name, 'w', encoding="utf-8") as fw:
      fw.write(msg)


def processMailDir(m, mailDir):

   print('MailDIR:' + mailDir)

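   # Open read-only so that fetching does not mark messages as seen.
   m.select(mailbox=mailDir, readonly=True)
   typ, data = m.search(None, 'ALL')

   # data[0] is a space-separated list of message sequence numbers.
   for num in data[0].split():
      typ, msg_data = m.fetch(num, '(RFC822)')
      msg = email.message_from_bytes(msg_data[0][1])

      # ISO-8859-1 maps every byte to a character, so this decode cannot fail.
      smsg = msg.as_bytes().decode(encoding='ISO-8859-1')

      writeTofile(mailDir, num.decode(), smsg)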

   m.close()

   return


def main():

   if len(sys.argv) != 3:
      hostname = 'my.mail.server'
      username = 'my_username'
      m = imaplib.IMAP4_SSL(hostname)
      m.login(username, 'password')

   else:
      hostname, username = sys.argv[1:]
      m = imaplib.IMAP4_SSL(hostname)
      m.login(username, getpass.getpass())

   try:
      print('Start...')

      processMailDir(m, 'INBOX')
      processMailDir(m, 'Sent')
      processMailDir(m, 'archive/2013/201301')
      processMailDir(m, 'archive/2013/201302')
      # etc.: one call per mailbox; as simple as it can be, but not simpler
      print('Done...')

   finally:
      m.logout()

if __name__ == '__main__':
   main()

The above will dump your emails to: D:\my_email\INBOX\msg_no_1.eml ... msg_no_203.eml
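
If a byte-for-byte copy of each message matters more than having a text file, one alternative (a sketch, not what the script above does) is to skip the parse-and-decode step and write the bytes returned by fetch() straight to disk in binary mode. `write_raw_eml` and its parameters are hypothetical names chosen for illustration.

```python
import os

def write_raw_eml(base_dir, mail_dir, name, raw_bytes):
    """Write raw RFC822 bytes to <base_dir>/<mail_dir>/<name>.eml and return the path."""
    target_dir = os.path.join(base_dir, mail_dir)
    os.makedirs(target_dir, exist_ok=True)  # create nested mailbox folders as needed
    path = os.path.join(target_dir, name + '.eml')
    # Binary mode: no decoding, no re-encoding, no newline translation.
    with open(path, 'wb') as fh:
        fh.write(raw_bytes)
    return path
```

Inside the fetch loop this would be called with the second element of the first response item, e.g. `write_raw_eml(BASE_DIR, mailDir, BASE_NAME + num.decode(), data[0][1])`.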

Then you need this trick to open .eml files on Windows. In a command prompt (cmd.exe) run as Administrator:

assoc .eml=Outlook.File.eml
ftype Outlook.File.eml="C:\Program Files (x86)\Microsoft Office\Office12\OUTLOOK.EXE" /eml "%1"

Dear Stack Overflow moderators, please be merciful: I would have found the above useful myself. For example, this line took a long time to figure out: smsg = msg.as_bytes().decode(encoding='ISO-8859-1')
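
To illustrate why that decode call is safe, here is a small self-contained check. ISO-8859-1 assigns a character to every byte value, so decoding arbitrary message bytes with it cannot raise, whereas UTF-8 rejects many byte sequences. One caveat worth knowing: if the resulting text is then written with encoding="utf-8", every byte above 0x7F is stored as two bytes, so the file is no longer identical to the original message.

```python
raw = bytes(range(256))           # every possible byte value once

text = raw.decode('ISO-8859-1')   # never raises: byte n becomes code point n
round_trip = text.encode('ISO-8859-1')

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:        # 0x80 is not a valid UTF-8 start byte
    utf8_ok = False

# Re-encoding the decoded text as UTF-8 changes bytes 0x80..0xFF.
reencoded = text.encode('utf-8')
```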

kris2k answered Nov 30 '25 14:11


To answer your question: after staring at the docs for a long time, I didn't see any clean way to do what you are looking for. If it is an absolute requirement that the UIDs be stored in the mbox file, then I'd suggest adding a custom UID header to the emails that you are storing:

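import email, re

uid = re.search(rb'UID (\d+)', lst1[0]).group(1).decode()   # lst1[0] looks like b'1 (UID 123)'
message = email.message_from_bytes(lst2[0][1])   # imaplib returns bytes on Python 3
message.add_header("my_internal_uid_header", uid)
mbox.add(message)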

Of course, it is then a HUGE pain to find the largest saved UID, because you have to iterate through all the messages in the mbox. I imagine this would be very slow for a large mailbox. If at all possible, it would be better to store such information elsewhere.
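
Both options can be sketched roughly as follows. `largest_uid_in_mbox` is the slow scan over the custom header; `load_last_uid` and `save_last_uid` keep the last UID in a small JSON file next to the mbox instead. All function names and the state file layout are assumptions made for illustration, not part of the mailbox module. `new_uids` is meant for the result of something like `svr.uid('SEARCH', None, 'UID 1001:*')`; it filters because, per the IMAP spec, a range ending in `*` always includes the last message even when its UID is below the start.

```python
import json
import mailbox
import os

UID_HEADER = 'my_internal_uid_header'  # header name from the snippet above

def largest_uid_in_mbox(mbox_path, header=UID_HEADER):
    """Slow path: scan every message, return the largest stored UID (0 if none)."""
    largest = 0
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            value = message.get(header)
            if value is not None and value.strip().isdigit():
                largest = max(largest, int(value))
    finally:
        box.close()
    return largest

def load_last_uid(state_path):
    """Fast path: read the last downloaded UID from a JSON side file (0 if missing)."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path, encoding='utf-8') as fh:
        return int(json.load(fh).get('last_uid', 0))

def save_last_uid(state_path, last_uid):
    """Record the last downloaded UID so the next run can resume after it."""
    with open(state_path, 'w', encoding='utf-8') as fh:
        json.dump({'last_uid': last_uid}, fh)

def new_uids(search_data, last_uid):
    """Keep only UIDs greater than last_uid from a UID SEARCH result list."""
    return [int(u) for u in search_data[0].split() if int(u) > last_uid]
```

UIDs are only stable while the mailbox's UIDVALIDITY value stays the same, so a more careful version would store that value in the side file too and start over if it changes.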

Best of luck!

JosiahDaniels answered Nov 30 '25 14:11


