Basically, I want to read all new emails from an inbox and put them in a database. I'm using Python because it has imaplib, but I know nothing about it.
Currently, I have something like this:
def primitive_get_text_blocks(email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        return_parts = ""
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return_parts += " " + part.get_payload()
        return return_parts
    elif maintype == 'text':
        return email_message_instance.get_payload()
    return ""

fromField = con.escape(email_message["From"])
contentField = con.escape(primitive_get_text_blocks(email_message))
primitive_get_text_blocks is copy-pasted from somewhere.
The result is that I get database entries like this:
From what I understand, that has something to do with being encoded in UTF-7. So I changed to get_payload(decode=True), but that gives me byte arrays. If I append another decode('utf-8'), it sometimes crashes with errors like
'codec error can't decode to ...'.
I don't know how encodings work; I only want a unicode string with the body of my email.
Why is there no simple convert(charset from, charset to)? How do I get a readable email body (and address)? I found the question "IMAP Fetch Encoding" and tried decode_header, but that got me no further.
--
I assume an encoding is the way bytes represent characters, so with that in mind, shouldn't decode take a byte array and spit out a string? Here on Stack Overflow I also came across somebody claiming it had something to do with being encoded with UTF-8 and UTF-7. What does that even mean?
I did google, and there appear to be tons of duplicates, but the answers they got didn't really help me out (I've tried most of them).
Solution
Turns out it's quite easy. Even though all the documentation points to the glorious past when the unicode function was still a real thing, in Python 3 str does the same job: str(some_bytes, 'utf-8') turns bytes into a string.
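To make the bytes-versus-string point concrete, here is a minimal sketch (the sample word is just an illustration, not something from the original mails). It shows that str(..., 'utf-8') and .decode('utf-8') give the same result, and that a "convert from one charset to another" is simply a decode followed by an encode.

```python
# A str holds characters; bytes hold one particular encoding of them.
original = "Gr\u00fc\u00dfe"           # the German word "Gruesse" with umlaut and sharp s

utf8_bytes = original.encode("utf-8")  # characters -> bytes

# Two equivalent ways to get from bytes back to a str in Python 3.
text_a = str(utf8_bytes, "utf-8")
text_b = utf8_bytes.decode("utf-8")

# "convert(charset from, charset to)" is decode with one codec, encode with another.
latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1")

print(text_a == text_b)                # the two spellings agree
print(utf8_bytes, latin1_bytes)        # same text, different byte sequences
```

The last line also hints at why a blind decode('utf-8') can crash: the ISO-8859-1 bytes for the same text are not valid UTF-8.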
So to recap: pass decode=True to get_payload and wrap the result in str(..., 'utf-8').
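One caveat: hard-coding 'utf-8' is probably what causes the occasional "can't decode" crash mentioned in the question, because not every mail is actually UTF-8. A rough sketch of a more forgiving version follows. It asks each part for its declared charset via get_content_charset() and only falls back to UTF-8 when none is given. The helper names get_text_body and get_header_text and the sample message are made up for illustration, and replacing undecodable bytes (errors='replace') is an assumption about what you want in the database.

```python
import email
from email.header import decode_header, make_header


def get_text_body(msg):
    """Join all text/* parts of msg into one str (hypothetical helper)."""
    chunks = []
    for part in msg.walk():                      # also works for non-multipart mails
        if part.get_content_maintype() != 'text':
            continue
        raw = part.get_payload(decode=True)      # bytes, base64/quoted-printable undone
        if raw is None:
            continue
        charset = part.get_content_charset() or 'utf-8'   # assumption: default to UTF-8
        try:
            chunks.append(raw.decode(charset, errors='replace'))
        except LookupError:                      # the mail names a codec Python doesn't know
            chunks.append(raw.decode('utf-8', errors='replace'))
    return ' '.join(chunks)


def get_header_text(msg, name):
    """Decode an RFC 2047 encoded header such as From (hypothetical helper)."""
    value = msg[name]
    if value is None:
        return ''
    return str(make_header(decode_header(value)))


# Made-up sample: a Latin-1, quoted-printable body and an encoded From header.
raw_message = (
    b"From: =?utf-8?q?J=C3=BCrgen?= <j@example.com>\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Gr=FC=DFe\r\n"
)
msg = email.message_from_bytes(raw_message)
body = get_text_body(msg)
sender = get_header_text(msg, "From")
print(repr(body), repr(sender))
```

With a message like the sample, str(msg.get_payload(decode=True), 'utf-8') would raise, since the body bytes are ISO-8859-1, while looking up the charset per part handles it.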