Giter Club home page Giter Club logo

mail-parser's Introduction

mail-parser

crates.io build docs.rs crates.io

mail-parser is an e-mail parsing library written in Rust that fully conforms to the Internet Message Format standard (RFC 5322), the Multipurpose Internet Mail Extensions (MIME; RFC 2045 - 2049) as well as many other internet messaging RFCs.

It also supports decoding messages in 41 different character sets including obsolete formats such as UTF-7. All Unicode (UTF-*) and single-byte character sets are handled internally by the library while support for legacy multi-byte encodings of Chinese and Japanese languages such as BIG5 or ISO-2022-JP is provided by the optional dependency encoding_rs.

In general, this library abides by the Postel's law or Robustness Principle which states that an implementation must be conservative in its sending behavior and liberal in its receiving behavior. This means that mail-parser will make a best effort to parse non-conformant e-mail messages as long as these do not deviate too much from the standard.

Unlike other e-mail parsing libraries that return nested representations of the different MIME parts in a message, this library conforms to RFC 8621, Section 4.1.4 and provides a more human-friendly representation of the message contents consisting of just text body parts, html body parts and attachments. Additionally, conversion to/from HTML and plain text inline body parts is done automatically when the alternative version is missing.

Performance and memory safety were two important factors while designing mail-parser:

  • Zero-copy: Practically all strings returned by this library are Cow<str> references to the input raw message.
  • High performance Base64 decoding based on Chromium's decoder (the fastest non-SIMD decoder).
  • Fast parsing of message header fields, character set names and HTML entities using perfect hashing.
  • Written in 100% safe Rust with no external dependencies.
  • Every function in the library has been fuzzed and thoroughly tested with MIRI.
  • Battle-tested with millions of real-world e-mail messages dating from 1995 until today.

Usage Example

    let input = br#"From: Art Vandelay <[email protected]> (Vandelay Industries)
To: "Colleagues": "James Smythe" <[email protected]>; Friends:
    [email protected], =?UTF-8?Q?John_Sm=C3=AEth?= <[email protected]>;
Date: Sat, 20 Nov 2021 14:22:01 -0800
Subject: Why not both importing AND exporting? =?utf-8?b?4pi6?=
Content-Type: multipart/mixed; boundary="festivus";

--festivus
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: base64

PGh0bWw+PHA+SSB3YXMgdGhpbmtpbmcgYWJvdXQgcXVpdHRpbmcgdGhlICZsZHF1bztle
HBvcnRpbmcmcmRxdW87IHRvIGZvY3VzIGp1c3Qgb24gdGhlICZsZHF1bztpbXBvcnRpbm
cmcmRxdW87LDwvcD48cD5idXQgdGhlbiBJIHRob3VnaHQsIHdoeSBub3QgZG8gYm90aD8
gJiN4MjYzQTs8L3A+PC9odG1sPg==
--festivus
Content-Type: message/rfc822

From: "Cosmo Kramer" <[email protected]>
Subject: Exporting my book about coffee tables
Content-Type: multipart/mixed; boundary="giddyup";

--giddyup
Content-Type: text/plain; charset="utf-16"
Content-Transfer-Encoding: quoted-printable

=FF=FE=0C!5=D8"=DD5=D8)=DD5=D8-=DD =005=D8*=DD5=D8"=DD =005=D8"=
=DD5=D85=DD5=D8-=DD5=D8,=DD5=D8/=DD5=D81=DD =005=D8*=DD5=D86=DD =
=005=D8=1F=DD5=D8,=DD5=D8,=DD5=D8(=DD =005=D8-=DD5=D8)=DD5=D8"=
=DD5=D8=1E=DD5=D80=DD5=D8"=DD!=00
--giddyup
Content-Type: image/gif; name*1="about "; name*0="Book ";
              name*2*=utf-8''%e2%98%95 tables.gif
Content-Transfer-Encoding: Base64
Content-Disposition: attachment

R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
--giddyup--
--festivus--
"#;

    let message = Message::parse(input).unwrap();

    // Parses addresses (including comments), lists and groups
    assert_eq!(
        message.get_from(),
        &HeaderValue::Address(Addr::new(
            "Art Vandelay (Vandelay Industries)".into(),
            "[email protected]"
        ))
    );

    assert_eq!(
        message.get_to(),
        &HeaderValue::GroupList(vec![
            Group::new(
                "Colleagues",
                vec![Addr::new("James Smythe".into(), "[email protected]")]
            ),
            Group::new(
                "Friends",
                vec![
                    Addr::new(None, "[email protected]"),
                    Addr::new("John Smîth".into(), "[email protected]"),
                ]
            )
        ])
    );

    assert_eq!(
        message.get_date().unwrap().to_iso8601(),
        "2021-11-20T14:22:01-08:00"
    );

    // RFC2047 support for encoded text in message readers
    assert_eq!(
        message.get_subject().unwrap(),
        "Why not both importing AND exporting? ☺"
    );

    // HTML and text body parts are returned conforming to RFC8621, Section 4.1.4
    assert_eq!(
        message.get_html_body(0).unwrap(),
        concat!(
            "<html><p>I was thinking about quitting the &ldquo;exporting&rdquo; to ",
            "focus just on the &ldquo;importing&rdquo;,</p><p>but then I thought,",
            " why not do both? &#x263A;</p></html>"
        )
    );

    // HTML parts are converted to plain text (and viceversa) when missing
    assert_eq!(
        message.get_text_body(0).unwrap(),
        concat!(
            "I was thinking about quitting the “exporting” to focus just on the",
            " “importing”,\nbut then I thought, why not do both? ☺\n"
        )
    );

    // Supports nested messages as well as multipart/digest
    let nested_message = message
        .get_attachment(0)
        .unwrap()
        .unwrap_message();

    assert_eq!(
        nested_message.get_subject().unwrap(),
        "Exporting my book about coffee tables"
    );

    // Handles UTF-* as well as many legacy encodings
    assert_eq!(
        nested_message.get_text_body(0).unwrap(),
        "ℌ𝔢𝔩𝔭 𝔪𝔢 𝔢𝔵𝔭𝔬𝔯𝔱 𝔪𝔶 𝔟𝔬𝔬𝔨 𝔭𝔩𝔢𝔞𝔰𝔢!"
    );
    assert_eq!(
        nested_message.get_html_body(0).unwrap(),
        "<html><body>ℌ𝔢𝔩𝔭 𝔪𝔢 𝔢𝔵𝔭𝔬𝔯𝔱 𝔪𝔶 𝔟𝔬𝔬𝔨 𝔭𝔩𝔢𝔞𝔰𝔢!</body></html>"
    );

    let nested_attachment = nested_message.get_attachment(0).unwrap().unwrap_binary();

    assert_eq!(nested_attachment.len(), 42);

    // Full RFC2231 support for continuations and character sets
    assert_eq!(
        nested_attachment.get_attachment_name().unwrap(),
        "Book about ☕ tables.gif"
    );

    // Integrates with Serde
    println!("{}", serde_json::to_string_pretty(&message).unwrap());
    println!("{}", serde_yaml::to_string(&message).unwrap());

More examples available under the examples directory. Please note that this library does not support building e-mail messages as this functionality is provided separately by the mail-builder crate.

Testing, Fuzzing & Benchmarking

To run the testsuite:

 $ cargo test --all-features

or, to run the testsuite with MIRI:

 $ cargo +nightly miri test --all-features

To fuzz the library with cargo-fuzz:

 $ cargo +nightly fuzz run mail_parser

and, to run the benchmarks:

 $ cargo +nightly bench --all-features

Conformed RFCs

Supported Character Sets

  • UTF-8
  • UTF-16, UTF-16BE, UTF-16LE
  • UTF-7
  • US-ASCII
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-3
  • ISO-8859-4
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • ISO-8859-10
  • ISO-8859-13
  • ISO-8859-14
  • ISO-8859-15
  • ISO-8859-16
  • CP1250
  • CP1251
  • CP1252
  • CP1253
  • CP1254
  • CP1255
  • CP1256
  • CP1257
  • CP1258
  • KOI8-R
  • KOI8_U
  • MACINTOSH
  • IBM850
  • TIS-620

Supported character sets via the optional dependency encoding_rs:

  • SHIFT_JIS
  • BIG5
  • EUC-JP
  • EUC-KR
  • GB18030
  • GBK
  • ISO-2022-JP
  • WINDOWS-874
  • IBM-866

License

Licensed under either of

at your option.

Copyright

Copyright (C) 2020-2022, Stalwart Labs, Minter Ltd.

See COPYING for the license.

mail-parser's People

Contributors

barafael avatar mdecimus avatar redsos avatar stappersg avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.