 

Why does imaplib return a lone parenthesis after each email?

Overview: When fetching an email via IMAP with imaplib in Python, the returned payload is a list of email tuples, with a lone b')' in between each tuple.

I use the standard imaplib fetch call, passing a comma-separated byte string of UIDs:

resp, data = mailbox.fetch(b'1,2,3', 'RFC822')

However, data looks like:

[ 
  (b'1 (BODY[HEADER.FIELDS (DATE TO CC FROM SUBJECT)] {181}',
    b'Date: Thu, 18 Jul 2013 16:08:07 -0700 From: Blah Blah\r\n\r\n'
  ),
  b')',
  (b'1 (BODY[HEADER.FIELDS (DATE TO CC FROM SUBJECT)] {181}',
    b'Date: Thu, 18 Jul 2013 16:08:07 -0700 From: Blah Blah\r\n\r\n'
  ),
  b')',
  ...
]

So now, when I iterate over that list, I have to skip every other element to avoid the b')'. Obviously, that's not hard, but it feels like I'm doing something wrong, or that imaplib should be parsing this closing parenthesis better.
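Rather than skipping every other element, a sturdier approach is to keep only the tuples, since those are the entries that carry a literal (the message data). A minimal sketch, assuming `data` has the shape shown above; the two messages here are abbreviated placeholders, not real server output:

```python
# Sample shaped like the fetch result above (abbreviated placeholder values).
data = [
    (b'1 (RFC822 {12}', b'Subject: a\r\n'),
    b')',
    (b'2 (RFC822 {12}', b'Subject: b\r\n'),
    b')',
]

# Each literal arrives as a (prefix, literal) tuple and the text after it as
# plain bytes, so filter on type instead of assuming every other item is b')'.
messages = [item for item in data if isinstance(item, tuple)]

for prefix, body in messages:
    print(prefix, body)
```

Filtering on type keeps working even if the list layout is not strictly alternating.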

Why is that parenthesis there, and is there a way to use IMAP more correctly to eliminate it?


Thoughts:

It looks like the closing parenthesis is a built-in part of RFC822, but from what I understand of that spec (which isn't much, honestly), the parenthesis isn't supposed to come until the END of the payload, which to my mind would be after all the messages are read.

Edit: By the way, this parenthesis shows up whether or not you're fetching multiple messages. Even if you follow imaplib's own example, you get back data that looks like [(headers, payload), b')'].

asked Oct 15 '25 15:10 by tyleha


1 Answer

The simple answer is that imaplib is not a parser, but a simple low-level library. When you make a fetch request, the response for each message looks something like this:

* 27 FETCH (A A-DATA B B-DATA C C-DATA)\r\n

That is, the response to fetch is a parenthesized list of pairs: a data item name followed by its value. The parentheses tell a parser that this is a (possibly) variable-length list of data items. Specifically, when you request a body, it looks like this, conceptually:
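To make the "list of pairs" idea concrete, here is a minimal sketch that handles only the simplest case, where every data item and value is a plain atom. The `UID 42 RFC822.SIZE 3012` response is a made-up example, and the function name `parse_simple_fetch` is hypothetical:

```python
import re


def parse_simple_fetch(line: bytes) -> tuple[int, dict[bytes, bytes]]:
    """Split a FETCH response whose data items are all simple atoms.

    Assumes no nested lists, quoted strings, or literals, which real
    responses (e.g. FLAGS or BODY data) often contain.
    """
    m = re.fullmatch(rb'\* (\d+) FETCH \((.*)\)\r\n', line)
    if m is None:
        raise ValueError('not a simple FETCH response')
    tokens = m.group(2).split()
    # Items alternate: name, value, name, value, ...
    items = dict(zip(tokens[0::2], tokens[1::2]))
    return int(m.group(1)), items


num, items = parse_simple_fetch(b'* 27 FETCH (UID 42 RFC822.SIZE 3012)\r\n')
print(num, items)  # 27 {b'UID': b'42', b'RFC822.SIZE': b'3012'}
```

A real parser has to handle nested parentheses and literals too, which is exactly the work imaplib leaves to you.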

* 27 FETCH (RFC822 BODYDATA)\r\n

When an IMAP server wishes to send a blob of data that might contain newlines or other unusual characters, it uses an escaping mechanism that the standard calls literals. It sends a byte count in curly braces, a CRLF line ending, then a raw dump of exactly that many bytes. Then the server resumes the response it was sending. It looks like this:
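Because the byte count comes first, a reader can take the literal verbatim even when it contains CRLFs of its own. A minimal sketch of reading one literal from a buffer; the function name `read_literal` and the 10-byte sample body are hypothetical:

```python
import re

# Matches a literal marker such as b'{457}' at the end of a line.
LITERAL_MARKER = re.compile(rb'\{(\d+)\}$')


def read_literal(buf: bytes) -> tuple[bytes, bytes, bytes]:
    """Split buf into (text before the literal, literal bytes, remainder)."""
    line_end = buf.index(b'\r\n')
    head = buf[:line_end]
    m = LITERAL_MARKER.search(head)
    if m is None:
        raise ValueError('line does not end with a literal marker')
    size = int(m.group(1))
    start = line_end + 2              # the literal begins right after the CRLF
    literal = buf[start:start + size]
    if len(literal) != size:
        raise ValueError('buffer ends before the literal is complete')
    return head, literal, buf[start + size:]


raw = b'* 27 FETCH (RFC822 {10}\r\nA: b\r\n\r\nhi)\r\n'
head, literal, rest = read_literal(raw)
print(head)     # b'* 27 FETCH (RFC822 {10}'
print(literal)  # b'A: b\r\n\r\nhi'
print(rest)     # b')\r\n'
```

The CRLFs inside the literal are not treated as line endings; only the byte count decides where the literal stops.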

* 27 FETCH (RFC822 {457}\r\n___457 bytes of body data here___)\r\n

imaplib only knows about line endings and literals; beyond that, it doesn't parse responses except to tell you what kind of response each one is. It breaks the response up into pieces like this:

* 27 FETCH (RFC822 {457}\r\n
___457 bytes of body data here___
)\r\n

These three lines are the three parts of the response you see: the stuff before the literal, the literal itself, and the stuff after the literal. In this case, the stuff after is a single parenthesis that closes the list opened in the first part. If you requested several data items that each needed a literal, you'd see even more parts.
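The splitting rule described above can be imitated in a few lines. This is a simplified sketch of the behaviour, not imaplib's source: it assumes one complete untagged FETCH response in a bytes buffer and ignores tagged responses and error handling. The function name `split_like_imaplib` is hypothetical, and the two-literal `BODY[HEADER]`/`BODY[TEXT]` response is a made-up example:

```python
import io
import re

UNTAGGED_FETCH = re.compile(rb'\* (\d+) FETCH (.*)')
LITERAL = re.compile(rb'.*\{(\d+)\}')   # a line that ends in {n}


def split_like_imaplib(raw: bytes) -> list:
    """Roughly reproduce the list shape built for one FETCH response."""
    stream = io.BytesIO(raw)
    line = stream.readline().rstrip(b'\r\n')
    m = UNTAGGED_FETCH.fullmatch(line)
    if m is None:
        raise ValueError('not an untagged FETCH response')
    # The FETCH keyword is dropped, leaving e.g. b'27 (RFC822 {10}'
    dat = m.group(1) + b' ' + m.group(2)
    parts = []
    while (lit := LITERAL.fullmatch(dat)) is not None:
        literal = stream.read(int(lit.group(1)))   # exactly n raw bytes
        parts.append((dat, literal))               # (text before, literal)
        dat = stream.readline().rstrip(b'\r\n')    # text after the literal
    parts.append(dat)                              # the trailer, e.g. b')'
    return parts


# One literal: the lone b')' is the text that followed it.
print(split_like_imaplib(b'* 27 FETCH (RFC822 {10}\r\nA: b\r\n\r\nhi)\r\n'))
# [(b'27 (RFC822 {10}', b'A: b\r\n\r\nhi'), b')']

# Two literals in one response: an extra tuple appears before the b')'.
print(split_like_imaplib(
    b'* 27 FETCH (BODY[HEADER] {6}\r\nA: b\r\n BODY[TEXT] {2}\r\nhi)\r\n'))
# [(b'27 (BODY[HEADER] {6}', b'A: b\r\n'), (b' BODY[TEXT] {2}', b'hi'), b')']
```

The first output has the same `[(prefix, literal), b')']` shape as the data in the question.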

I for one would love it if imaplib were more of a parser, but it really is just a low-level access mechanism. For more complex work, a parser needs to be built on top of it.
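As a small example of building on top of imaplib, here is a sketch that turns a fetch result into parsed messages keyed by message number, using the standard `email` package. It assumes one literal per message (as with `RFC822`) and that each tuple's prefix starts with the message number, as in the question's output. The function name `messages_by_number` and the sample messages are hypothetical:

```python
import email
import re
from email.message import Message

# Sample in the shape returned for fetch(b'1,2', 'RFC822') (placeholder messages).
data = [
    (b'1 (RFC822 {26}', b'Subject: first\r\n\r\nbody one'),
    b')',
    (b'2 (RFC822 {27}', b'Subject: second\r\n\r\nbody two'),
    b')',
]

PREFIX = re.compile(rb'(\d+) \(')   # message number, then the opening paren


def messages_by_number(data: list) -> dict[int, Message]:
    """Map message numbers to parsed email.message.Message objects."""
    result = {}
    for item in data:
        if not isinstance(item, tuple):
            continue                 # skip trailers such as b')'
        prefix, literal = item
        m = PREFIX.match(prefix)
        if m is None:
            continue                 # extra literals in one response are ignored here
        result[int(m.group(1))] = email.message_from_bytes(literal)
    return result


for num, msg in messages_by_number(data).items():
    print(num, msg['Subject'])
```

Requesting several literal-bearing items per message would need a real parser that walks the pairs rather than this prefix match.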

answered Oct 18 '25 03:10 by Max


