wahlflo / eml_analyzer
A CLI script to analyze an e-mail in the EML format for viewing the header, extracting attachments, etc.
License: MIT License
It would be nice if the script could read the EML from standard input, similar to how many other CLI tools work in Unix (and to some extent PowerShell).
$ cat my_cat_photos.eml | emlAnalyzer
# a more practical example:
$ curl -X GET 'https://graph.microsoft.com/v1.0/me/messages/.../attachments/.../$value' | emlAnalyzer
This would eliminate the need to have the EML saved on disk.
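As a rough illustration of the request (not the tool's actual implementation), reading from standard input could look like the sketch below. The function names `read_eml` and `parse_eml`, and the convention that `-` means standard input, are assumptions made for this example; only the standard-library `email` package is used.

```python
import sys
from email import message_from_bytes, policy


def read_eml(path=None, stdin=None):
    """Return raw EML bytes from `path`, or from standard input if no path is given."""
    if path is not None and path != "-":
        with open(path, "rb") as handle:
            return handle.read()
    # Read bytes, not text: the message declares its own charsets per part.
    stream = stdin if stdin is not None else sys.stdin.buffer
    return stream.read()


def parse_eml(raw):
    """Parse raw bytes into an EmailMessage using the modern policy."""
    return message_from_bytes(raw, policy=policy.default)
```

Reading the binary stream (`sys.stdin.buffer`) avoids decoding the mail with the terminal's encoding before the parser has seen the MIME headers.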
This tool is excellent for interactive usage. It would be really nice to be able to produce structured output (e.g. JSON, YAML, XML).
Example:
$ emlAnalyzer --input my_cat_photos.eml --header --tracking --attachments --text --html --url --format=json
Output:
{
"headers": {
"Received": ["...", "...", "..."],
"From": ["Bob <[email protected]>"],
"To": ["Alice <[email protected]>"]
},
"attachments": [
{ "name": "meow.jpg", "disposition": "inline", "type": "image/jpg" },
{ "name": "cat.png", "disposition": "inline", "type": "image/png" }
],
"text": "hi Alice, attached are some cat photos. some more can be found at https://http.cat/200",
"html": "<html><body><p>hi Alice,</p><p>attached are some cat photos</p><p>some more can be found at <a href='https://http.cat/200'>https://http.cat/200</a></p></body></html>",
"urls": [
"https://http.cat/200"
],
"reloaded_content": []
}
Haven't really thought about the --extract-all flag, but perhaps that can be supported by adding a content property in each attachments object containing its base64-encoded representation. Or not support extracting attachments when returning structured output.
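A minimal sketch of the idea, assuming the JSON layout proposed above: the helper name `email_to_dict` and the `include_content` switch are hypothetical, and the code relies only on the standard-library `email`, `base64` and `json` modules rather than on eml_analyzer's own classes.

```python
import base64
import json
from email import message_from_bytes, policy


def email_to_dict(raw, include_content=False):
    """Build a JSON-serializable dict with headers and attachment metadata."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Headers can repeat (e.g. Received), so collect each name into a list.
    headers = {}
    for name, value in msg.items():
        headers.setdefault(name, []).append(str(value))

    attachments = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename is None:
            continue  # treat only parts that carry a file name as attachments
        entry = {
            "name": filename,
            "disposition": part.get_content_disposition(),
            "type": part.get_content_type(),
        }
        if include_content:
            payload = part.get_payload(decode=True) or b""
            entry["content"] = base64.b64encode(payload).decode("ascii")
        attachments.append(entry)

    return {"headers": headers, "attachments": attachments}


def email_to_json(raw, include_content=False):
    return json.dumps(email_to_dict(raw, include_content), indent=2)
```

Embedding the content makes the output self-contained but can make it very large, which is one argument for leaving extraction out of the structured mode.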
Hello,
I am running the REMnux distribution (an Ubuntu-based distribution for malware analysis) in a virtual machine.
The problem is that during installation I get this error: https://pastebin.com/raw/RigS9EuA
I have tried the following installation methods:
Please help me install it.
Hi,
I face an issue when parsing outgoing mails:
Extract from the eml-File (save-as via Thunderbird):
"...
This is a cryptographically signed message in MIME format.
--------------ms090908070501060903060609
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
ä
..."
If the plain text contains, for example, an "ä", the output character becomes the Cyrillic "д", independent of the chosen output format ("--text" or "--text --format json").
For comparison, an extract from an EML file that gets parsed correctly:
"...
--eTqZtiOboXMORarM2jeks2PNUJpOw=_O7X
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
...
"
Any idea?
I have attached a test file.
Regards,
MrChang.
test.zip
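One possible explanation, offered only as an assumption since the attached file was not inspected here: the 8bit body may store "ä" as the single byte 0xE4 (Latin-1 / Windows-1252) although the header declares UTF-8, so the UTF-8 decode fails and a fallback guess of Windows-1251 turns 0xE4 into "д". The sketch below shows that byte-level behaviour and a hypothetical helper, `decode_text_part`, that tries the declared charset first and then a fixed fallback.

```python
def decode_text_part(payload: bytes, declared_charset: str = "utf-8",
                     fallback: str = "latin-1") -> str:
    """Decode with the declared charset; on failure use a fixed fallback charset."""
    try:
        return payload.decode(declared_charset)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers unknown charset names in the header.
        return payload.decode(fallback, errors="replace")


single_byte = b"\xe4"                    # "ä" in Latin-1 / Windows-1252
as_cyrillic = single_byte.decode("cp1251")   # the same byte read as Windows-1251
as_latin = decode_text_part(single_byte)     # declared UTF-8 fails, fallback applies
utf8_bytes = "ä".encode("utf-8")             # two bytes: 0xC3 0xA4
```

Whether Latin-1 is the right fallback depends on the mail corpus; it is chosen here only because it matches the reported character.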
This is a very convenient and powerful tool; however, when I use it to parse emails (Chinese or English) in batches, the following error occasionally occurs:
File "D:\anaconda3\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\anaconda3\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\anaconda3\Scripts\emlAnalyzer.exe\__main__.py", line 7, in <module>
File "D:\anaconda3\lib\site-packages\eml_analyzer\cli_script.py", line 66, in main
output_format.process_option_show_html(parsed_email=parsed_email)
File "D:\anaconda3\lib\site-packages\eml_analyzer\library\outputs\standard_output.py", line 62, in process_option_show_html
print(html)
UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 45613: illegal multibyte sequence
I don't have a solution for this encoding problem. My execution environment is Windows.
Because the email content is confidential, I am sorry that I cannot provide a sample.
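The traceback shows `print(html)` failing because the console encoding (gbk) cannot represent the character U+00A0. A possible workaround, sketched here under the assumption that lossy output is acceptable, is to relax the stream's error handler before printing. The helper name `safe_print` is hypothetical; `reconfigure` is a standard `io.TextIOWrapper` method available since Python 3.7.

```python
import sys


def safe_print(text, stream=None, errors="backslashreplace"):
    """Print text, escaping characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    # Only text wrappers offer reconfigure(); other stream types are left as-is.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(errors=errors)
    print(text, file=stream)
```

Setting the environment variable PYTHONIOENCODING=utf-8 before running the tool is another commonly used option that needs no code change.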
v2.0.0 no longer works on Python 3.8 due to this issue: https://stackoverflow.com/q/63460126/404623
I can confirm the issue by running it inside Docker:
$ docker run -it -v "$PWD"/emails:/srv -w /srv python:3.8 /bin/bash -c "python3 -m pip install eml-analyzer && cat foo2.eml | emlAnalyzer"
...<snip>...
Traceback (most recent call last):
File "/usr/local/bin/emlAnalyzer", line 5, in <module>
from eml_analyzer.cli_script import main
File "/usr/local/lib/python3.8/site-packages/eml_analyzer/cli_script.py", line 7, in <module>
from eml_analyzer.library.outputs import AbstractOutput, StandardOutput, JsonOutput
File "/usr/local/lib/python3.8/site-packages/eml_analyzer/library/outputs/__init__.py", line 1, in <module>
from .abstract_output import AbstractOutput
File "/usr/local/lib/python3.8/site-packages/eml_analyzer/library/outputs/abstract_output.py", line 3, in <module>
from eml_analyzer.library.parser.parsed_email import ParsedEmail
File "/usr/local/lib/python3.8/site-packages/eml_analyzer/library/parser/__init__.py", line 1, in <module>
from .parsed_email import ParsedEmail, EmlParsingException, PayloadDecodingException
File "/usr/local/lib/python3.8/site-packages/eml_analyzer/library/parser/parsed_email.py", line 17, in <module>
class ParsedEmail:
File "/usr/local/lib/python3.8/site-packages/eml_analyzer/library/parser/parsed_email.py", line 34, in ParsedEmail
def get_error_messages(self) -> list[str]:
TypeError: 'type' object is not subscriptable
It works fine on Python 3.11:
$ docker run -it -v "$PWD"/emails:/srv -w /srv python:3.11 /bin/bash -c "python3 -m pip install eml-analyzer && cat foo2.eml | emlAnalyzer"
...<snip>...
=================
|| Structure ||
=================
|- multipart/mixed
| |- multipart/related
| | |- multipart/alternative
| | | |- text/plain
| | | |- text/html
| | |- image/png [1f438.png]
| |- image/png [cover.png]
As of right now, the supported versions of Python are 3.7+. I discovered the issue as my WSL runs Ubuntu 20.04 LTS which comes with Python 3.8.
It might be worth documenting and enforcing the version requirement of >=3.9 if that's what's desired, or implementing the workaround in the SO question if supporting Python <3.9 is necessary.
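For reference, the failing annotation `list[str]` is only subscriptable at runtime from Python 3.9 onward. A version-compatible spelling uses `typing.List`, as in the sketch below. The class and method names come from the traceback; the attribute `_error_messages` and the method body are assumptions made for illustration.

```python
from typing import List


class ParsedEmail:
    def __init__(self) -> None:
        # Hypothetical storage; the real class may keep its errors elsewhere.
        self._error_messages: List[str] = []

    def get_error_messages(self) -> List[str]:
        # typing.List[str] is valid on Python 3.7 and 3.8, unlike list[str].
        return list(self._error_messages)
```

Alternatively, placing `from __future__ import annotations` at the top of the module postpones annotation evaluation, so `list[str]` in a signature no longer raises on 3.7 and 3.8.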
The changes to the URL extraction in v2.0.0 cause the results to have quotes or chunks of HTML at the end, and sometimes HTML-encoded entities in the middle rather than the actual characters (&amp; is pretty common in URLs). The outputs no longer include the actual URLs in some cases (see examples).
Text and HTML included for reference purposes. Notice the &amp; in some of the URLs:
==================================
|| URLs in HTML and text part ||
==================================
- https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcarleton.ca%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=49SuRO9jfbmE5QmGqq85RvUZgZnyJ6XgFjlD3V7duYw%3D&reserved=0"
- https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcarleton.ca%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=49SuRO9jfbmE5QmGqq85RvUZgZnyJ6XgFjlD3V7duYw%3D&reserved=0
- https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YF3Fdvo0FyHQkXoHuhv3WzGzfhmqGjCRMdxUPIZsYvA%3D&reserved=0"
- https://carleton.ca/</a></span></div>
- https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YF3Fdvo0FyHQkXoHuhv3WzGzfhmqGjCRMdxUPIZsYvA%3D&reserved=0
- https://twitter.com/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YF3Fdvo0FyHQkXoHuhv3WzGzfhmqGjCRMdxUPIZsYvA%3D&reserved=0>
- https://twitter.com/</a></span></span>
- https://carleton.ca/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcarleton.ca%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=49SuRO9jfbmE5QmGqq85RvUZgZnyJ6XgFjlD3V7duYw%3D&reserved=0>
=================
|| Plaintext ||
=================
https://twitter.com/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YF3Fdvo0FyHQkXoHuhv3WzGzfhmqGjCRMdxUPIZsYvA%3D&reserved=0>
https://carleton.ca/<https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcarleton.ca%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=49SuRO9jfbmE5QmGqq85RvUZgZnyJ6XgFjlD3V7duYw%3D&reserved=0>
[cid:d2eb1b04-21d5-4a18-bfdb-12c4c3e65b8a]
============
|| HTML ||
============
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);"><span class="x_elementToProof FluidPluginCopy" style="font-size:15px;font-family:"Segoe UI", "Segoe UI Web (West European)", "Segoe UI", -apple-system, BlinkMacSystemFont, Roboto, "Helvetica Neue", sans-serif;margin:0px;color:rgb(36, 36, 36);background-color:rgb(255, 255, 255)"><span style="font-size:12pt;font-family:Calibri, Arial, Helvetica, sans-serif;margin:0px;color:rgb(0, 0, 0);background-color:rgb(255, 255, 255)"><a href="https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YF3Fdvo0FyHQkXoHuhv3WzGzfhmqGjCRMdxUPIZsYvA%3D&reserved=0" originalsrc="https://twitter.com/" shash="qzkaBC2sLxgP1TgKDzZQ22fgeb1b7BsI5lFHyR43eYo7m7MI1zAsITkXPZdZ6n1KIL8l5zsW9ZYym8Zh696/mItL4iYvJ0Ubwz1de7W+ONeLI4b2ew0tg4HzBWjz70b4QTUpxwi7deoO2c/HOahf1M884A1dPLtJtc8s+ZBrhcA=" target="_blank" rel="noopener noreferrer" data-auth="NotApplicable" data-safelink="true" data-linkindex="0" style="margin:0px" class="ContentPasted0">https://twitter.com/</a></span></span>
<div class="x_elementToProof FluidPluginCopy" style="font-size:15px;font-family:"Segoe UI", "Segoe UI Web (West European)", "Segoe UI", -apple-system, BlinkMacSystemFont, Roboto, "Helvetica Neue", sans-serif;margin:0px;color:rgb(36, 36, 36);background-color:rgb(255, 255, 255)">
<span style="font-size:12pt;font-family:Calibri, Arial, Helvetica, sans-serif;margin:0px;color:rgb(0, 0, 0);background-color:rgb(255, 255, 255)"><br class="ContentPasted0">
</span></div>
<div class="x_elementToProof FluidPluginCopy" style="font-size:15px;font-family:"Segoe UI", "Segoe UI Web (West European)", "Segoe UI", -apple-system, BlinkMacSystemFont, Roboto, "Helvetica Neue", sans-serif;margin:0px;color:rgb(36, 36, 36);background-color:rgb(255, 255, 255)">
<span style="font-size:12pt;font-family:Calibri, Arial, Helvetica, sans-serif;margin:0px;color:rgb(0, 0, 0);background-color:rgb(255, 255, 255)"><a href="https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcarleton.ca%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=49SuRO9jfbmE5QmGqq85RvUZgZnyJ6XgFjlD3V7duYw%3D&reserved=0" originalsrc="https://carleton.ca/" shash="Dxc62cEP1Yg+/wKXMo/VoujSwhva4+Frdv2Yr8iQuG/kuzsq8b6WfRRSuA3H0L4B+GRsbWHMjTjX4Mg2/0vuwo9UW9HOglt0hd7TcsPjzwi8IgUT3bVowbeQfPFAoMMdOWkKIzbKe4Ax/2E4rJf8j/m4b+N+/72C5VvaPLBgd+8=" target="_blank" rel="noopener noreferrer" data-auth="NotApplicable" data-safelink="true" data-linkindex="2" style="margin:0px" class="ContentPasted0">https://carleton.ca/</a></span></div>
<br>
</span></div>
<div class="elementToProof" style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);"><img style="max-width: 100%;" class="w-240 h-240" size="6492" contenttype="image/png" data-outlook-trace="F:1|T:1" src="cid:d2eb1b04-21d5-4a18-bfdb-12c4c3e65b8a"><br>
</span></div>
</body>
</html>
Expected output:
- https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcarleton.ca%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=49SuRO9jfbmE5QmGqq85RvUZgZnyJ6XgFjlD3V7duYw%3D&reserved=0
- https://carleton.ca/
- https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.com%2F&data=05%7C01%7Cadmin%40tq2zr.onmicrosoft.com%7C49360182de73427cd8e908dae851e772%7C71759330a027406e9082f1f64f1007b3%7C0%7C0%7C638077735753161743%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YF3Fdvo0FyHQkXoHuhv3WzGzfhmqGjCRMdxUPIZsYvA%3D&reserved=0
- https://twitter.com/
There is a library called URLExtract which does exactly as the name says. I haven't used it, though it appears to have some dependencies and requires an Internet connection to download a list of TLDs. It has the risk of false positives (see known issues in the README), and adding it as a dependency to this library does increase the overall complexity by quite a bit IMHO.
Alternative options... perhaps the regex can be improved with some ideas here?
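One dependency-free direction, sketched here as an assumption and not as the project's method, is to parse the HTML part with the standard-library `html.parser` (which decodes entities such as `&amp;` in attribute values and text) and to use a regex that stops at whitespace, quotes and angle brackets. The names `URL_PATTERN`, `_LinkCollector` and `extract_urls` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Stop at whitespace, quotes and angle brackets so trailing markup is not captured.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


class _LinkCollector(HTMLParser):
    """Collect URLs from href/src attributes and from visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and URL_PATTERN.match(value):
                self.urls.append(value)

    def handle_data(self, data):
        self.urls.extend(URL_PATTERN.findall(data))


def extract_urls(text="", html=""):
    """Return unique URLs from the plain-text and HTML parts, in first-seen order."""
    found = URL_PATTERN.findall(text)
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    found.extend(collector.urls)
    return list(dict.fromkeys(found))
```

This is still a simplification: trailing punctuation such as a closing parenthesis or a full stop can be captured, and URLs without a scheme are ignored.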
Feature request:
Allow passing * to select all attachments, not just a single integer value.
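A small sketch of how such a selector could be interpreted. The function name `parse_attachment_selector` and the 1-based numbering are assumptions for illustration, not the tool's documented behaviour.

```python
def parse_attachment_selector(value, attachment_count):
    """Map '*' to every attachment index, or a number to a single index (1-based)."""
    if value.strip() == "*":
        return list(range(1, attachment_count + 1))
    index = int(value)  # raises ValueError for anything that is not an integer
    if not 1 <= index <= attachment_count:
        raise ValueError(f"attachment {index} does not exist")
    return [index]
```

In most shells the asterisk has to be quoted ('*'), otherwise it is expanded to the file names in the current directory before the script sees it.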
With the same input and the same parameters, the output varies randomly (with xxx) between different runs.
The input file is here: aaa_123.zip