Comments (12)
I solved this, or at least, found a workaround.
Surround your call with a try and except like this:
message_data = b'\r\n'.join(lines)
try:
    # First attempt: let mail-parser handle the raw bytes directly.
    mail = mailparser.parse_from_bytes(message_data[b"RFC822"])
except Exception as e:
    print('This mail has Cyrillic characters. Trying to parse from string...')
    try:
        # Fallback: decode the bytes ourselves, then use the string parser.
        mail = mailparser.parse_from_string(message_data[b"RFC822"].decode('ISO-8859-1'))
    except Exception as e:
        print('This mail is corrupted and cannot be parsed: %s' % str(e))
This way, if the bytes parser fails it will fall back to the string parser and you can change the encoding.
I've been able to parse every single mail thrown at my server this way.
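For reuse, the same fallback can be wrapped in a small helper. This is only a sketch: parse_with_fallback is a hypothetical name, not part of mail-parser, and the two parsers are passed in as arguments so the helper itself does not depend on the library. The default encoding is the ISO-8859-1 used above.

```python
def parse_with_fallback(raw, parse_bytes, parse_string, encoding="ISO-8859-1"):
    """Try the bytes parser first; on failure, decode and try the string parser.

    Hypothetical helper, not part of mail-parser. Returns None when both
    parsers fail, so the caller can skip the message.
    """
    try:
        return parse_bytes(raw)
    except Exception:
        # Bytes parsing failed; fall through to the string parser.
        pass
    try:
        return parse_string(raw.decode(encoding))
    except Exception as exc:
        print('This mail is corrupted and cannot be parsed: %s' % exc)
        return None


# Intended use with mail-parser (assumes message_data comes from your IMAP fetch):
# mail = parse_with_fallback(
#     message_data[b"RFC822"],
#     mailparser.parse_from_bytes,
#     mailparser.parse_from_string,
# )
```

Returning None instead of leaving mail unassigned avoids a NameError later when both attempts fail.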
from mail-parser.
Hi. Thank you!
Where should I paste this code?
from mail-parser.
Maybe this snippet works only for Python 3. Can you do a PR here?
from mail-parser.
Sorry, but what is a PR?
My script works for Python 3.
from mail-parser.
It's a Pull Request: https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests
from mail-parser.
Sorry, it seems I wasn't receiving notifications for this issue correctly.
In your main.py, line 54, you have the following code:
mail = mailparser.parse_from_bytes(message_data[b"RFC822"])
Replace that line with the snippet I wrote, omitting the message_data = b'\r\n'.join(lines) line.
I did not modify mail-parser, just coded a workaround that goes in my app code. I've never done a PR before so I'm not sure I can help, but if it may solve this issue for everyone I could try.
from mail-parser.
The develop branch doesn't have any issue.
I will release the new version soon.
$ python3.9 -m mailparser -f ~/Downloads/mail_raw -sa -ap ~/Downloads/test
from mail-parser.
This issue still seems to occur with mail-parser==3.15.0 and German umlauts like ä, ü, or ö, or wrongly decoded strings like ü.
@fedelemantuano was this issue fixed with version 3.15.0?
How to reproduce
Raw email data:
Subject: foobar
To: foobar@example
From: [email protected]
Content-Type: multipart/mixed; boundary=somecontent

--somecontent
Content-Disposition: attachment; filename="Liste übersprungener 1.txt"
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset=utf-8; name="Liste übersprungener 1.txt"

c3R1ZmY=
--somecontent--
Ready to use snippet:
import mailparser
_header = b'Subject: foobar\nTo: foobar@example\nFrom: [email protected]\nContent-Type: multipart/mixed; boundary=somecontent'
_body = b'--somecontent\nContent-Disposition: attachment; filename="Liste \xc3\xbcbersprungener 1.txt"\nContent-Transfer-Encoding: base64\nContent-Type: text/plain; charset=utf-8; name="Liste \xc3\xbcbersprungener 1.txt"\n\nc3R1ZmY=\n--somecontent--\n'
mailparser.parse_from_bytes(_header + b'\n\n' + _body)
Output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../mailparser/mailparser.py", line 118, in parse_from_bytes
    return MailParser.from_bytes(bt)
  File ".../mailparser/mailparser.py", line 241, in from_bytes
    return cls(message)
  File ".../mailparser/mailparser.py", line 138, in __init__
    self.parse()
  File ".../mailparser/mailparser.py", line 357, in parse
    content_disposition = ported_string(
  File ".../mailparser/utils.py", line 80, in wrapper
    return normalize('NFC', func(*args, **kwargs))
  File ".../mailparser/utils.py", line 114, in ported_string
    return six.text_type(raw_data, encoding)
TypeError: decoding to str: need a bytes-like object, Header found
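The "Header found" part can be reproduced with the standard library alone, which may help narrow this down. The sketch below assumes mail-parser reads the header through the default (compat32) email policy, which the traceback suggests but which I have not verified in the library source. Under that policy, a header containing raw non-ASCII bytes comes back as an email.header.Header object instead of a str, and str(value, encoding) then raises the TypeError. The last two lines show one possible way to recover the text from the unprocessed header value; they are an illustration, not the library's fix.

```python
import email
from email.header import Header

# Same message as above, minus the headers that do not matter here.
raw = (
    b'Content-Type: multipart/mixed; boundary=somecontent\n'
    b'\n'
    b'--somecontent\n'
    b'Content-Disposition: attachment; filename="Liste \xc3\xbcbersprungener 1.txt"\n'
    b'Content-Transfer-Encoding: base64\n'
    b'Content-Type: text/plain; charset=utf-8; name="Liste \xc3\xbcbersprungener 1.txt"\n'
    b'\n'
    b'c3R1ZmY=\n'
    b'--somecontent--\n'
)

message = email.message_from_bytes(raw)
part = message.get_payload()[0]

# With raw 8-bit bytes in the header, the default policy returns a Header object.
value = part.get("Content-Disposition")
is_header = isinstance(value, Header)

# This mirrors six.text_type(raw_data, encoding) on Python 3.
try:
    str(value, "utf-8")
    error = None
except TypeError as exc:
    error = str(exc)

# raw_items() yields unprocessed header values; the undecodable bytes are kept
# as surrogate escapes, so they can be turned back into bytes and decoded.
raw_value = dict(part.raw_items())["Content-Disposition"]
decoded = raw_value.encode("ascii", "surrogateescape").decode("utf-8")

print(is_header, error, decoded)
```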
from mail-parser.
Please send me the raw mail, I can't test it from your snippet.
from mail-parser.
GitHub won't let me upload *.eml files, so I simply renamed it to .txt: mail.txt
import mailparser
with open('mail.txt', 'rb') as infile:
    text = infile.read()
mailparser.parse_from_bytes(text)
Returns the same issue as mentioned above.
from mail-parser.
Any progress on this?
from mail-parser.
I'm working on it. I will answer soon.
from mail-parser.