msgreader's Introduction

What is MSGReader

MSGReader is a C# library for .NET 4.6.2, .NET Standard 2.0 and .NET 5.0 that reads Outlook MSG and EML (MIME 1.0) files. Almost all common Outlook objects are supported:

  • E-mail
  • Appointment
  • Task
  • Contact card
  • Sticky note

It supports all body types found in MSG files, including:

  • Text
  • HTML
  • HTML embedded into RTF
  • RTF

MSGReader offers very limited support for manipulating an MSG file: the only thing you can do is remove attachments and then save the result to a new file.

If you really want to write MSG files then see my MsgKit project on GitHub (https://github.com/Sicos1977/MsgKit).

Read properties from an Outlook (msg) message

using (var msg = new MsgReader.Outlook.Storage.Message("d:\\testfile.msg"))
{
        var from = msg.Sender;
        var sentOn = msg.SentOn;
        var recipientsTo = msg.GetEmailRecipients(MsgReader.Outlook.RecipientType.To, false, false);
        var recipientsCc = msg.GetEmailRecipients(MsgReader.Outlook.RecipientType.Cc, false, false);
        var subject = msg.Subject;
        var htmlBody = msg.BodyHtml;
        // etc...
}

Detecting charset encoding in MSG files with HTML encapsulated into RTF that use different font set encodings

Most of the time, when an MSG file has an HTML body, this body is encapsulated into RTF. For more information see https://learn.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/4f09a809-9910-43f3-a67c-3506b09ca5ac

When an HTML body contains characters outside the default extended ASCII range, these characters are encoded. This is normally not a problem when just one language is used. When multiple languages are used, the RTF is quite often not built in a way that lets MSGReader figure out which encoding is needed to decode the characters. Because of this, MSGReader uses the NuGet package UTF.Unknown (https://www.nuget.org/packages/UTF.Unknown/) to try to detect the encoding in which a character is stored. Most of the time this works correctly. The detection threshold is set to 0.90, so a detection result is accepted as valid only when its confidence level reaches this value.

If you still get bad results you can control this confidence level yourself through the property CharsetDetectionEncodingConfidenceLevel in the Reader or Message class:

/// <summary>
///     When an MSG file contains an RTF body with encapsulated HTML and the RTF
///     uses fonts with different encodings, this level sets the threshold that
///     the detection confidence of an encoded string needs to reach before it is
///     recognized as a valid string. When the detection level is lower than this
///     setting, the default RTF encoding is used to decode the encoded char
/// </summary>
/// <remarks>
///     By default this value is set to 0.90; any value lower than 0.70 will probably
///     give bad results
/// </remarks>
public float CharsetDetectionEncodingConfidenceLevel { get; set; } = 0.90f;
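
The threshold rule described in the summary above can be sketched in a few lines. This is only an illustration of the decision, not MSGReader's actual implementation: the class CharsetChoice and the method Choose are hypothetical names, and the detected encoding and its confidence are assumed to come from some charset detector.

```csharp
using System;
using System.Text;

public static class CharsetChoice
{
    // Hypothetical helper, not part of MSGReader: returns the detected
    // encoding only when the detector's confidence reaches the threshold,
    // otherwise falls back to the default (RTF) encoding.
    public static Encoding Choose(Encoding detected, float confidence,
                                  float threshold, Encoding fallback)
    {
        if (fallback == null) throw new ArgumentNullException(nameof(fallback));

        // A missing detection result or a confidence below the threshold
        // both lead to the fallback encoding.
        return detected != null && confidence >= threshold ? detected : fallback;
    }
}
```

With the default threshold of 0.90, a detection with confidence 0.95 is accepted while one with 0.85 falls back to the default encoding. Lowering the threshold accepts more detections at the cost of more wrong guesses.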

Read properties from an Outlook (eml) message

var fileInfo = new FileInfo("d:\\testfile.eml");
var eml = MsgReader.Mime.Message.Load(fileInfo);

if (eml.Headers != null)
{
        if (eml.Headers.To != null)
        {
            foreach (var recipient in eml.Headers.To)
            {
                var to = recipient.Address;            
            }
        }
}

var subject = eml.Headers.Subject;

if (eml.TextBody != null)
{
        var textBody = System.Text.Encoding.UTF8.GetString(eml.TextBody.Body);
}

if (eml.HtmlBody != null)
{
        var htmlBody = System.Text.Encoding.UTF8.GetString(eml.HtmlBody.Body);
}

// etc...

Delete attachment from an Outlook message

This example deletes the first attachment:

// Open the file in ReadWrite mode, otherwise attachments cannot be removed
using (var outlook = new Storage.Message(fileName, FileAccess.ReadWrite))
{
        outlook.DeleteAttachment(outlook.Attachments[0]);
        outlook.Save("d:\\deleted.msg");
}

Translations

  • Kees van Spelde

    • English (US)
    • Dutch
  • Ronald Kohl

    • German
  • Yan Grenier (@ygrenier on GitHub)

    • French
  • xupefei

    • Simplified Chinese

Installing via NuGet

The easiest way to install MSGReader is via NuGet.

In Visual Studio's Package Manager Console, simply enter the following command:

Install-Package MSGReader

Side note

This project can also be used from a COM-based language like VBScript or VB6. To use it, first compile the code and register the COM-visible assembly with the command:

Regasm.exe /codebase MsgReader.dll

After that you can call it like this:

dim msgreader

set msgreader = createobject("MsgReader.Reader")
msgreader.ExtractToFolderFromCom "the msg file to read", "the folder where to place the extracted files"

License Information

MsgReader is Copyright (C) 2013-2024 Magic-Sessions and is licensed under the MIT license:

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Core Team

Sicos1977 (Kees van Spelde)

Support

If you like my work then please consider a donation as a thank you by using the donate button at the top

msgreader's People

Contributors

adameverington, alexandrepickler, altso, calinjigmond, codetraverser, cyril12740, faustina01, fuchsiasoft, gumtreehill, iagograh, jamesblackadar, jh-shine, kenjiuno, matt1liu, maxime-paquatte, mvpkenlin, nagayamatoshiaki, nitrotm, r-win, rkuznetsov-dynamicsun, ronaldthemsgcatcher, seppu, sicos1977, sicos2002, simplicbe, ssmelov, tq-seidel, wikijm, xupefei, ygrenier

msgreader's Issues

msg attachments are broken from commit 77b81b5

In commit 77b81b5 there is some refactoring done that removes a large block of code from Reader.cs. In this block of code, there is a check for attachments, and then attachments are added into the header. Now instead this block was removed and replaced with a very simplified version, ExtractMsgEmailHeader(message, hyperlinks), which does nothing with attachmentList. This change needs to be reverted or fixed.

Codepage problems

Hello Kees,

many thanks for your work.
We are using your project to display e-mails exported from Outlook in an Outlook add-in.
Unfortunately we got an e-mail which is not displayed correctly. Can you please look at it?
My knowledge does not go far enough to fix it myself.

Best regards
Dennis

MSGReader issue.zip

Issue with forwarding a forwarded email

When forwarding an email which has been forwarded I see this error message returned:

ExceptionMessage":"The provided stream is not a valid IStorage\r\nParameter name: storageStream","ExceptionType":"System.ArgumentException","StackTrace":" at MsgReader.Outlook.Storage..ctor(Stream storageStream, FileAccess fileAccess) in C:\Users\Kees\Documents\GitHub\MsgReader\MsgReader\Outlook\Storage.cs:line 196\r\n

and the thread from the previous email is not attached.

Thanks!

James

MSG made by Outlook has empty Attachment, throwing exception

I have an MSG file (trying to see if I am allowed to upload it) that was created by Outlook (reproducible). This file causes PreProcessMsgFile to die at the following code:

fileInfo = new FileInfo(FileManager.FileExistsMakeNew(outputFolder + attachmentFileName));
File.WriteAllBytes(fileInfo.FullName, attach.Data);

The issue is attach.Data is null.
The file does have an attachment, and actually message.Attachments.length==2. The first attachment is good (pdf), but second attachment has filename of Nameless.

I've added the following if() continue;:

if (attachment is Storage.Attachment)
{
   var attach = (Storage.Attachment)attachment;
   if ( attach.Data == null )
      continue;
   attachmentFileName = attach.FileName;

What license is this project?

Can you please specify what license your project is using - public domain, apache, LGPL, etc. This project looks great and I would love to be able to use it, but I want to make sure that I legally can. Thanks. Keep up the good work.

Unable to open a .MSG file and delete attachments

Hi Kees,

A very good day to you.

Could I ask how one opens a .MSG file in order to delete the attachments? My attempts to do so have resulted in errors claiming that 'MsgReader.Exceptions.MRCannotRemoveAttachment: Cannot remove attachments when the file is not opened in Write or ReadWrite mode'

I have tried using the following code:

private static void TestMsgReader()
{
    System.IO.FileStream filetoread = new System.IO.FileStream(@"D:\wph\sources\ReadAnOutlookMsg\bin\Debug\testsomethings.msg", FileMode.Open, FileAccess.ReadWrite);

    Storage.Message mymessage = new Storage.Message(filetoread);

    List<object> mymessageattachments = mymessage.Attachments;

    object currattachment = null;

    for (int x = 0; x < mymessageattachments.Count; x++)
    {
        currattachment = mymessageattachments.IndexOf(x);
        mymessage.DeleteAttachment(currattachment);
    }

    mymessage.Dispose();
    filetoread.Close();
    filetoread.Dispose();
}

Many thanks and regards,
Gerald

ArgumentException in MsgReader.dll for MSG files with voice attachments

An unhandled exception of type 'System.ArgumentException' occurred in MsgReader.dll

Additional information: Unknown parameter in Content-Disposition. Ask developer to fix! Parameter: VOICE

We have a request to read MSG files and store the first x characters from the body into a SQL Server table. The MSGReader.dll works great except when an MSG file has a voicemail attachment. Are there any workarounds? We were using 1.9.6 but I tried 1.9.10 and received the same error.

Question about Mime Headers with UTF-8

Hi,

I am currently working on a French mail extractor that handles several sources. I process .msg and .eml files.

In France we often use accents in the sender name, and a lot of software saves this display name in the MIME headers with UTF-8 encoding, while the subject is word-encoded in the .eml file. Example:

From: "Xx santé xxxxxx" <[email protected]>
...
Subject: =?UTF-8?B?RG...

So my question: can I add some methods that use an encoding other than ASCII, although that does not conform to the RFC? Or should I use another approach to retrieve those e-mails in UTF-8?

Thanks,

Yanos

Unable to load msg file

Hi, I just tried to use this project to open some msg-files I was handed.
But, somehow this does not work as it should.

I am using the same code as in the readme:

using (var msg = new MsgReader.Outlook.Storage.Message("c:\\test.msg"))
{
        var from = msg.Sender;
        var sentOn = msg.SentOn;
        var recipientsTo = msg.GetEmailRecipients(Storage.Recipient.RecipientType.To, false, false);
        var recipientsCc = msg.GetEmailRecipients(Storage.Recipient.RecipientType.Cc, false, false);
        var subject = msg.Subject;
        var htmlBody = msg.BodyHtml;
        // etc...
}

It fails when trying to read with the following exception:
<System.ArgumentException: Object must be of type Int32. at System.Int32.CompareTo(Object value)

I am attaching the zipped msg-file for you here. It was created with Outlook 2016: Test.zip

Is there something I am doing wrong? Is Outlook 2016 supported? Is there anything special about the file I uploaded (encoding, language etc.) that makes it hard or impossible to read? As far as I know it was created by saving/dragging it from Outlook onto the desktop, where Outlook suggested the msg extension.

NullReference in ReferenceManager on program exit

I created a console app that loops through a directory with msg files. Every msg is analyzed and some data is extracted. It all works fine except when a msg file is opened. Of course it throws an exception, and I used a try/catch to catch the exception.

This all works fine until the program is done executing. When I place breakpoints at the end of the main() function, it executes everything as it should, but after that it throws a null reference exception:

An unhandled exception of type 'System.NullReferenceException' occurred in mscorlib.dll
Additional information: Object reference not set to an instance of an object.

mscorlib.dll!System.Runtime.InteropServices.Marshal.ReleaseComObject(object o)
MsgReader.dll!MsgReader.Outlook.Storage.ReferenceManager.Finalize()

Quoted text in html mail

Hi,
I am working with version 1.6.1.
If I send a message via Outlook 2013 with quoted text as HTML mail, MSGReader deletes the first quote and all text within the quotes and returns just the final quote.
Example to reproduce the issue:
Sent message:
This is a "quoted test" to reproduce the issue.
Returns:
This is a " to reproduce the issue
Regards
Ron

Restore last window size and position?

Nice tool!! I can now use it on my server to read msg files, as I do not have Outlook on the server.

Request: could you make the exe remember the last window size and position? Each time I open the app, I have to resize the window. It would be nice if the tool could store this info and restore the size each time it is opened.

Thanks again for this great tool!!

The method "MatchEvaluator" for the RFC 2822 must be changed to lower case.

Sorry, I forgot something after sending the regular expression. Section "4.3. Obsolete Date and Time" of the RFC 2822 specification states that the military zones are allowed in both upper and lower case.
The regular expression accepts these letters in both upper and lower case, but the switch/case in the method "MatchEvaluator" accepts only upper case.
Please add cases for the lowercase military zone letters, as in my sample:

    /// <summary>
    /// Convert timezones in older formats to [+-]dddd format.
    /// </summary>
    /// <param name="match">The match that was found</param>
    /// <returns>The string to replace the matched string with</returns>
    /// 
    /// <remarks>
    /// 
    ///
    /// .. Look here to check the RFC 2822: http://www.rfc-base.org/rfc-2822.html
    ///     
    /// ... here the snippet
    ///     
    ///     4.3. Obsolete Date and Time  
    /// 
    ///         The syntax for the obsolete date format allows a 2 digit year in the
    ///         date field and allows for a list of alphabetic time zone
    ///         specifications that were used in earlier versions of this standard.
    ///         It also permits comments and folding white space between many of the
    ///         tokens.
    ///     
    ///         obs-day-of-week =       [CFWS] day-name [CFWS]
    ///     
    ///         obs-year        =       [CFWS] 2*DIGIT [CFWS]
    ///     
    ///         obs-month       =       CFWS month-name CFWS
    ///     
    ///         obs-day         =       [CFWS] 1*2DIGIT [CFWS]
    ///     
    ///         obs-hour        =       [CFWS] 2DIGIT [CFWS]
    ///     
    ///         obs-minute      =       [CFWS] 2DIGIT [CFWS]
    ///     
    ///         obs-second      =       [CFWS] 2DIGIT [CFWS]
    ///     
    ///         obs-zone        =       "UT" / "GMT" /          ; Universal Time
    ///     
    ///                                                         ; North American UT
    ///                                                         ; offsets
    ///                                 "EST" / "EDT" /         ; Eastern:  - 5/ - 4
    ///                                 "CST" / "CDT" /         ; Central:  - 6/ - 5
    ///                                 "MST" / "MDT" /         ; Mountain: - 7/ - 6
    ///                                 "PST" / "PDT" /         ; Pacific:  - 8/ - 7
    ///     
    ///                                 %d65-73 /               ; Military zones - "A"
    ///                                 %d75-90 /               ; through "I" and "K"
    ///                                 %d97-105 /              ; through "Z", both
    ///                                 %d107-122               ; upper and lower case -- imported lower and upper
    ///
    /// .. end of snippet
    /// 
    /// 
    /// </remarks>
    private static string MatchEvaluator(Match match)
    {
        if (!match.Success)
        {
            throw new ArgumentException("Match success are always true");
        }

        switch (match.Value)
        {
                // "A" through "I" and "a" through "i"
                // are equivalent to "+0100" through "+0900" respectively
            case "A": case "a":
                return "+0100";
            case "B": case "b":
                return "+0200";
            case "C": case "c":
                return "+0300";
            case "D": case "d":
                return "+0400";
            case "E": case "e":
                return "+0500";
            case "F": case "f":
                return "+0600";
            case "G": case "g":
                return "+0700";
            case "H": case "h":
                return "+0800";
            case "I": case "i":
                return "+0900";

                // "K", "L", and "M" and "k", "l" and "m"
                // are equivalent to "+1000", "+1100", and "+1200" respectively
            case "K": case "k":
                return "+1000";
            case "L": case "l":
                return "+1100";
            case "M": case "m":
                return "+1200";

                // "N" through "Y" and "n" through "y"
                // are equivalent to "-0100" through "-1200" respectively
            case "N": case "n":
                return "-0100";
            case "O": case "o":
                return "-0200";
            case "P": case "p":
                return "-0300";
            case "Q": case "q":
                return "-0400";
            case "R": case "r":
                return "-0500";
            case "S": case "s":
                return "-0600";
            case "T": case "t":
                return "-0700";
            case "U": case "u":
                return "-0800";
            case "V": case "v":
                return "-0900";
            case "W": case "w":
                return "-1000";
            case "X": case "x":
                return "-1100";
            case "Y": case "y":
                return "-1200";

                // "Z", "z", "UT" and "GMT"
                // is equivalent to "+0000"
            case "Z": case "z":
            case "UT":
            case "GMT":
                return "+0000";

                // US time zones
            case "EDT":
                return "-0400"; // EDT is semantically equivalent to -0400
            case "EST":
                return "-0500"; // EST is semantically equivalent to -0500
            case "CDT":
                return "-0500"; // CDT is semantically equivalent to -0500
            case "CST":
                return "-0600"; // CST is semantically equivalent to -0600
            case "MDT":
                return "-0600"; // MDT is semantically equivalent to -0600
            case "MST":
                return "-0700"; // MST is semantically equivalent to -0700
            case "PDT":
                return "-0700"; // PDT is semantically equivalent to -0700
            case "PST":
                return "-0800"; // PST is semantically equivalent to -0800

                // EU time zones
            case "MSK":
                return "+0400"; // MSK is semantically equivalent to +0400

            default:
                throw new ArgumentException("Unexpected input");
        }
    }   

Sorry that I forgot this; I have already changed it in my branch.

Best regards
MB
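
As an alternative to listing every letter twice, the single-letter military zones could be normalized to upper case and mapped arithmetically. The following is only a sketch of that idea; MilitaryZones and ToOffset are hypothetical names and not part of MsgReader. The named zones (UT, GMT, EST and so on) would still need their own handling.

```csharp
using System;

public static class MilitaryZones
{
    // Hypothetical helper: converts a single-letter military zone,
    // in upper or lower case, to the [+-]dddd format.
    public static string ToOffset(string zone)
    {
        if (zone == null || zone.Length != 1)
            throw new ArgumentException("Expected a single letter", nameof(zone));

        // Normalizing the case makes "a" and "A" take the same path.
        var c = char.ToUpperInvariant(zone[0]);

        if (c == 'Z') return "+0000";
        if (c >= 'A' && c <= 'I') return Format('+', c - 'A' + 1);  // +0100 .. +0900
        if (c >= 'K' && c <= 'M') return Format('+', c - 'K' + 10); // +1000 .. +1200
        if (c >= 'N' && c <= 'Y') return Format('-', c - 'N' + 1);  // -0100 .. -1200

        // 'J' is not a military zone; anything else is not a letter zone.
        throw new ArgumentException("Unexpected input");
    }

    private static string Format(char sign, int hours)
    {
        return sign + hours.ToString("00") + "00";
    }
}
```

For example, ToOffset("a") and ToOffset("A") both give "+0100", and ToOffset("y") gives "-1200", which matches the values in the switch above.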

'Storage can not be null' exception when creating new Storage.Message by file path.

Hi Kees,

I ran into the following issue. When creating a new Storage.Message like this:

var msgFilePath = @"c:\temp\email.msg";
return new Storage.Message(msgFilePath);

I got a 'Storage can not be null' exception from Storage.cs.
When using the overloaded Storage.Message constructor which expects a Stream, no exception occurs:

var msgFilePath = @"c:\temp\email.msg";
return new Storage.Message(File.OpenRead(msgFilePath));

non-English msg file issues

I've parsed non-English msg files, and the message text is shown as odd marks or ??.

for example:
msg files in MsgReadzip\MsgRead\bin\Debug\Inbox shows:

Chinese part ¡
ÁŤ¤¦Ã

Japanese part ¡


fullwidth ¡
¢¢¢¡¢¢¢¢¢¡¢¢¡¢¢¢

I've tried MsgReader.Reader.ExtractMsgEmailBody(sr, false, "text/plain"); but the result is still the same.

Please help.
thanks

MsgRead.zip

Compressed-RTF CRC32 failed Error

Hello Kees,

We are having issues opening a number of MSG files in our system using your MSGViewer: we are getting the "compressed-RTF CRC32 failed" error. The body of these MSG files is RTF. The files open normally in Outlook. I have attached 2 sample files and would appreciate any assistance you could provide.

Example MSGs.zip

Problem with BodyHtml property when message format is RTF

Hello Kees,

I'm having a problem when accessing the BodyHtml property in Outlook.Storage.Message class.

If the email format is RTF, like this one written and sent with Outlook 2010: broken_rtf.zip I get the following exception:

NullReferenceException
   at MsgReader.Rtf.DomDocument.Load(Reader reader, DocumentFormatInfo parentFormat) in C:\Users\Kees\Documents\GitHub\MsgReader\MsgReader\Rtf\DomDocument.cs:line 1466
   at MsgReader.Rtf.DomDocument.Load(Reader reader, DocumentFormatInfo parentFormat) in C:\Users\Kees\Documents\GitHub\MsgReader\MsgReader\Rtf\DomDocument.cs:line 367
   at MsgReader.Rtf.DomDocument.LoadRtfText(String rtfText) in C:\Users\Kees\Documents\GitHub\MsgReader\MsgReader\Rtf\DomDocument.cs:line 293
   at MsgReader.Outlook.Storage.Message.get_BodyHtml() in C:\Users\Kees\Documents\GitHub\MsgReader\MsgReader\Outlook\Message.cs:line 1066

I tried to debug it by myself, but I don't have enough knowledge about RTF to know what is happening here.

Rfc822DateTime

Hello,

I have used the library for a test with EML mails generated by Lotus Notes.
The header contains a sent date with an EST zone tag. The regex in the file "Rfc2822DateTime.cs" interprets this tag as +0500-0600-0700 hours.
The regex interprets E = +0500, S = -0600, T = -0700.
In a private branch I have changed the regex in the file "Rfc2822DateTime.cs" from
private const string RegexOldTimezoneFormats = @"UT|GMT|EST|EDT|CST|CDT|MST|MDT|PST|MSK|PDT|[A-I]|[K-Y]|Z";
into
private const string RegexOldTimezoneFormats = @"\b((UT|GMT|EST|EDT|CST|CDT|MST|MDT|PST|MSK|PDT)|([A-IK-Za-ik-z]))\b";
Now the time interpreter works fine :-)

Best Regards
MB
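
The effect of the added word boundaries can be checked in isolation. The sketch below only compares the two patterns quoted above on a short input; it does not reproduce MsgReader's date parsing, and the class TimezoneRegexDemo and the method Tokens are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TimezoneRegexDemo
{
    // Pattern quoted in the report as the original one.
    public const string Unanchored =
        @"UT|GMT|EST|EDT|CST|CDT|MST|MDT|PST|MSK|PDT|[A-I]|[K-Y]|Z";

    // Pattern quoted in the report as the fix, with \b word boundaries.
    public const string Anchored =
        @"\b((UT|GMT|EST|EDT|CST|CDT|MST|MDT|PST|MSK|PDT)|([A-IK-Za-ik-z]))\b";

    // Returns every substring of the input that the pattern matches.
    public static string[] Tokens(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

On the input "Tue EST" the unanchored pattern matches "T" (from "Tue") and "EST", while the anchored pattern matches only "EST", because a single letter now counts as a zone only when it stands alone as a word.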

German translation

Hello Kees,
some lines in the 'German Translation' should be corrected:

public const string EmailSignedBy = "Ondertekend Tür";
To
public const string EmailSignedBy = "Signiert von";

public const string EmailSignedByOn = "op";
To
public const string EmailSignedByOn = "Signiert am";

Also the filter in 'ViewerForm.cs' should be extended to allow EML files:

Filter = "MSG Files (*.msg)|*.msg|EML Files (*.eml)|*.eml",

Best regards,
Ron
BTW: Thanks for the great work you have done.

Storage.Attachment.DataStream?

Is it possible to expose the underlying Stream of Storage.Attachment.Data?
For checking the dimensions of an image for example this would be useful as you don't need to load the whole attachment into memory...

Retrieve recipient/sender of the .msg

Hello, I've just installed your library via NuGet. First, thank you for your work.
I've successfully extracted the content of several .msg files and put them in separate folders.
I would like to access the metadata of each message (i.e. recipient and sender addresses, mail subject, timestamps etc.) but I can't figure out how to do it.
Thanks in advance,
Sebastiano

Unneeded span tag generated for first escaped character

Hi,

The first time the DomDocument.ReadHtmlContent method encounters an escaped character (e.g., \'c6) it generates a span tag with font styling, which is not required (and is also wrong, style-wise).

This RTF snippet (generated by Outlook 2013):

{\*\htmltag64 <p class=MsoNormal>}\htmlrtf {\htmlrtf0 X
{\*\htmltag244 <o:p>}
{\*\htmltag252 </o:p>}\htmlrtf\par}\htmlrtf0

{\*\htmltag72 </p>}
{\*\htmltag64 <p class=MsoNormal>}\htmlrtf {\htmlrtf0 \'c6
{\*\htmltag244 <o:p>}
{\*\htmltag252 </o:p>}\htmlrtf\par}\htmlrtf0

{\*\htmltag72 </p>}
{\*\htmltag64 <p class=MsoNormal>}\htmlrtf {\htmlrtf0 \'d8
{\*\htmltag244 <o:p>}
{\*\htmltag252 </o:p>}\htmlrtf\par}\htmlrtf0

becomes:

<p class=MsoNormal>
X<o:p>
</o:p>
</p>
<p class=MsoNormal>
<span style = 'font-family:"Arial";font-size:12pt'>Æ<o:p>
</o:p>
</p>
<p class=MsoNormal>
Ø<o:p>
</o:p>
</p>

Complete RTF from msg file: body.rtf.txt and HTML generated by MSGReader: body.html.txt.

The problem can be solved by removing the following code from DomDocument.ReadHtmlContent:

if (FontTable != null && fontIndex.HasValue && fontIndex <= FontTable.Count)
{
	// <span style = 'font-size:12.0pt;font-family:"Arial",sans-serif' >
	var font = FontTable[fontIndex.Value];
	if (!spanTagWritten)
	{
		stringBuilder.Append("<span style = 'font-family:\"" + font.Name + "\";");
		if (fontSize.HasValue)
			stringBuilder.Append("font-size:" + fontSize + "pt");
		stringBuilder.Append("'>");
		spanTagWritten = true;
		encoding = font.Encoding ?? _defaultEncoding;
	}
}

This has worked for my use cases so far, and there does not seem to be any reason to treat escaped characters differently, at least for files generated by Outlook 2013.

Thanks!
Henning

Array out of bound [MapiTagMapper]. rel. library 2.0.8.0

Hi and thanks for this great library.
I'm currently having problems with any msg-files.
There seems to be an issue in https://github.com/Sicos1977/MSGReader/blob/master/MsgReader/Outlook/MapiTagMapper.cs#L95

I do not understand exactly which property of the e-mail throws the exception,
but it seems that the array entryStreamBytes[] is not allocated correctly and is not big enough.

The exception returned is "Index was outside the bounds of the array."
Could you help me?

Thanks, Jljch
Best regards

GetHeaders() and SetEmailSenderAndRepresentingSender() not reached.

Hi Kees,

My bulk import went very well thanks to your great library. I encountered only two issues.

[1] In some cases the return statement in the following line

https://github.com/Sicos1977/MSGReader/blob/master/MsgReader/Outlook/Message.cs#L1279

prevents the code from reaching lines 1295 and 1296:

GetHeaders();
SetEmailSenderAndRepresentingSender();

That caused me to end up with empty Sender fields. I worked around this issue by moving the two method calls to lines 1225 and 1226, so before the if statement.
I can't send you an example msg because of customer confidentiality. I know that sucks.

[2] In some other rare case the e-mail addresses contained the Exchange Legacy Distinguished Name, so something like /o=SomeDomain/ou=SomeGroup/cn=Recipients/cn=Morgan Cheng instead of an e-mail address.

In those cases I could work around most of them by running the returned e-mail addresses through the following quick hack AD lookup method (found somewhere on Stack Overflow ...I think)...

using System.DirectoryServices;

public string ResolveEmailAddress(string email)
{
    // Check if the address is set and if it's an Exchange Distinguished Name (i.e. contains cn=..., in any casing)
    if (string.IsNullOrWhiteSpace(email) || email.IndexOf("CN=", System.StringComparison.OrdinalIgnoreCase) < 0) return email;

    using (var ds = new DirectorySearcher())
    {
        ds.Filter = $"(&(ObjectClass=User)(legacyExchangeDN={email}))";
        var src = ds.FindAll();

        // Return the Exchange Distinguished Name in case we can't get the e-mail address for some reason
        if (src.Count != 1 || src[0] == null || src[0].Properties == null || src[0].Properties.Count == 0 || 
            !src[0].Properties.Contains("Mail") || src[0].Properties["Mail"].Count == 0) return email;

        return string.IsNullOrWhiteSpace(src[0].Properties["Mail"][0].ToString())
            ? email
            : src[0].Properties["Mail"][0].ToString();
    }
}

Invalid Media Type On HeaderParser

Hi Kees,

Noticed a small issue on the HeaderParser for Mime Types.

When parsing the content-type in the public static ContentType ParseContentType(string headerValue) method I've had to add in the following line:

// Check for illegal content-type
var v = value.ToUpperInvariant();
if (v.Equals("TEXT") || v.Equals("TEXT/"))
    value = "text/plain";
if (v.EndsWith("\0"))
    value = v.Replace("\0", String.Empty); <<

Some content types are coming through with a "\0" suffix, e.g. "IMAGE/GIF\0", which causes an assignment error. Removing the "\0" resolves the issue.
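
The clean-up described here can also be written as a small stand-alone function. This is a sketch under the assumption that the raw media type is available as a string; MediaTypeCleanup and Sanitize are hypothetical names and not part of MsgReader's HeaderParser. It removes the null characters with an ordinal replace instead of testing EndsWith first.

```csharp
using System;

public static class MediaTypeCleanup
{
    // Hypothetical helper: strips embedded null characters and maps the
    // incomplete values "TEXT" and "TEXT/" to "text/plain".
    public static string Sanitize(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // String.Replace(string, string) compares ordinally, so "\0"
        // is removed wherever it occurs, not only at the end.
        var cleaned = value.Replace("\0", string.Empty).Trim();

        var upper = cleaned.ToUpperInvariant();
        if (upper == "TEXT" || upper == "TEXT/")
            return "text/plain";

        return cleaned;
    }
}
```

Sanitize("IMAGE/GIF\0") gives "IMAGE/GIF" and Sanitize("text/") gives "text/plain". Unlike the snippet above, the sketch keeps the original casing of the value.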

infinite Loop in FileExistsMakeNew

Hi,

The file name is derived from the mail's subject.
If the length of file name and path equals or exceeds MaxPath (248 characters) the first ExtractToFolder runs as expected.
The second call to ExtractToFolder with the same mail file loops infinitely in FileManager.FileExistsMakeNew because the shortened

I'd like to suggest replacing

        while (File.Exists(tempFileName))
        {
            tempFileName = path + fileNameWithoutExtension + "_" + i + extension;
            tempFileName = validateLongFileName ? ValidateLongFileName(tempFileName, extraTruncateSize) : tempFileName;
            i += 1;
        }

by

        while (File.Exists(tempFileName))
        {
            tempFileName = validateLongFileName ? ValidateLongFileName(tempFileName, extraTruncateSize) : tempFileName;
            fileNameWithoutExtension = GetFileNameWithoutExtension(tempFileName);
            tempFileName = path + fileNameWithoutExtension + "_" + i + extension;
            i += 1;
        }

Thanks!
Sincerely
Matthias
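
The idea behind the suggested change, shortening the base name before the counter is appended so that the counter is never cut off, can be sketched independently of FileManager. The names UniqueNames, MakeUnique and Build are hypothetical, the maximum length is passed in as a parameter, and the exists callback stands in for File.Exists so the logic can be tried without touching the disk.

```csharp
using System;
using System.IO;

public static class UniqueNames
{
    // Hypothetical helper: returns the first candidate path for which
    // 'exists' is false. 'directory' is expected to end with a separator.
    public static string MakeUnique(string directory, string baseName,
        string extension, int maxLength, Func<string, bool> exists)
    {
        var candidate = Build(directory, baseName, string.Empty, extension, maxLength);

        for (var i = 1; exists(candidate); i++)
            candidate = Build(directory, baseName, "_" + i, extension, maxLength);

        return candidate;
    }

    private static string Build(string directory, string baseName,
        string suffix, string extension, int maxLength)
    {
        // Reserve room for directory, counter suffix and extension first,
        // then truncate only the base name to what is left.
        var room = maxLength - directory.Length - suffix.Length - extension.Length;
        if (room < 1)
            throw new PathTooLongException("No room left for the file name");

        var trimmed = baseName.Length > room ? baseName.Substring(0, room) : baseName;
        return directory + trimmed + suffix + extension;
    }
}
```

Because the suffix always survives the truncation, every round produces a different candidate, so the loop either finds a free name or ends with a PathTooLongException once the suffix no longer fits.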

Is it possible to see if an e-mail was originally Received or Sent?

Hi Kees, thanks for this great and useful library! It came just in time!

I'm importing a large quantity of Outlook *.msg e-mail files into our CRM system. In CRM I can set a property on e-mails to distinguish between received mails and sent mails. Is there a way to determine this for the *.msg files? I couldn't find a property on the MsgReader.Outlook.Storage.Message type. My test file also returned filled SentOn and ReceivedOn properties, so those can't be used either.
Any help appreciated.

Include Localization files in nuget-package

I noticed that the MsgReader.resources.dll files are missing from the nuget package. Could you please provide de/MsgReader.resources.dll, fr/MsgReader.resources.dll and so on in the nuget package (or provide a way to localize the strings)?
Thanks a lot!

It's not able to read Chinese characters, it's displaying ? instead of characters

        using (var outlookMsg = new MsgReader.Outlook.Storage.Message(pathPSgFile))
        {
            Console.WriteLine("Subject: {0}", outlookMsg.Subject);
            Console.WriteLine("Body: {0}", outlookMsg.BodyText);

            Console.WriteLine("{0} Recipients", outlookMsg.Recipients.Count);
            foreach (MsgReader.Outlook.Storage.Recipient recip in outlookMsg.Recipients)
            {
                Console.WriteLine(" {0}:{1}", recip.Type, recip.Email);
            }

            Console.WriteLine("{0} Attachments", outlookMsg.Attachments.Count);
            foreach (MsgReader.Outlook.Storage.Attachment attach in outlookMsg.Attachments)
            {
                Console.WriteLine(" {0}, {1}b", attach.FileName, attach.Data.Length);
            }
        }

Attached is the msg template.

IDENTIFIER HTML Test Mail w Att - Smoke-Test.zip

Nuget Restore Fails

I am experiencing an issue when using Nuget restore with MSGReader package 1.9.3. This issue is documented at https://nuget.codeplex.com/workitem/4489.

WARNING: Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it.

Using MSGReader to extract email recipients

Hello,

Regarding your code, this is roughly what I need for a command-line tool that lets me extract recipient addresses from all msg files in a directory. Such a utility would basically help me extract addresses from abuse RFC822 complaint mails in order to blacklist them.
I'm not good at reading C#. Would it be hard to add such a command-line tool?

Regards,
Ozy.
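A tool like the one requested here could look roughly as follows. This is a minimal sketch that assumes the MsgReader NuGet package is referenced and uses only the Message, Recipients and Email members that appear in other snippets on this page. The program name ExtractRecipients and its single directory argument are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical command-line tool, for illustration only.
static class ExtractRecipients
{
    static int Main(string[] args)
    {
        if (args.Length != 1 || !Directory.Exists(args[0]))
        {
            Console.Error.WriteLine("Usage: ExtractRecipients <directory with .msg files>");
            return 1;
        }

        // Collect each address once, ignoring differences in case.
        var addresses = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (var file in Directory.EnumerateFiles(args[0], "*.msg"))
        {
            try
            {
                using (var msg = new MsgReader.Outlook.Storage.Message(file))
                {
                    foreach (MsgReader.Outlook.Storage.Recipient recipient in msg.Recipients)
                    {
                        if (!string.IsNullOrWhiteSpace(recipient.Email))
                            addresses.Add(recipient.Email.Trim());
                    }
                }
            }
            catch (Exception ex)
            {
                // A damaged file should not stop the whole run.
                Console.Error.WriteLine("Skipping {0}: {1}", file, ex.Message);
            }
        }

        foreach (var address in addresses)
            Console.WriteLine(address);

        return 0;
    }
}
```

The output is one address per line, which makes it easy to redirect into a blacklist file.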

Stack overflow MsgReader.RtfDomDocument.Load

I have someone with an RTF msg with an inline PDF. We hit a stack overflow with it.

There are a couple hundred levels of stack all at the same spot of code, around line 368 of Rtf.DomDocument.cs:

if (reader.TokenType == RtfTokenType.GroupStart)
  Load(reader,format); // this is where recursion is at

Unfortunately I am not able to reproduce this in a msg file that I can send to you (it is a sensitive email). Any idea as to what could cause the issue? Is the iterator not changing positions?
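When the message itself cannot be shared, one way to narrow this down is to measure how deeply the groups in the RTF body are nested, since each GroupStart adds one level of recursion. The following diagnostic sketch is not part of MSGReader: the class RtfDepthProbe and its method are hypothetical. It assumes the raw RTF text is available as a string and ignores binary payloads (\bin), which could contain unbalanced braces.

```csharp
using System;

// Hypothetical diagnostic helper, for illustration only.
static class RtfDepthProbe
{
    // Returns the deepest level of nested { } groups in raw RTF text.
    public static int MaxGroupDepth(string rtf)
    {
        int depth = 0, max = 0;

        for (var i = 0; i < rtf.Length; i++)
        {
            var c = rtf[i];

            if (c == '\\')
            {
                // Skip the character after a backslash so \{ , \} and \\ are not counted.
                i++;
                continue;
            }

            if (c == '{')
            {
                depth++;
                if (depth > max) max = depth;
            }
            else if (c == '}' && depth > 0)
            {
                depth--;
            }
        }

        return max;
    }
}
```

A very large result would point to deep nesting in the document. A small result would suggest that the reader is not advancing between recursive calls.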

Add Reload or Back button to form...

I noticed that if I open a msg file that has web links, and then click on one of the links, it will actually navigate to the target contents, much like a web browser does. This is a nice feature indeed, but I found that I had no way to navigate back to the original msg/email content.

So, my request is to add a "Reload message" button to the form so we could easily get back to the original message.

Removing attachments

I’m working on an internal project for storing e-mails and documents and we are trying to extract the attachments of e-mails and store the attachments separately.

But we would like to remove the extracted attachments from the e-mail before we save the e-mail to our storage.

This solution works great for extracting attachments, but I have not been able to remove attachments from the e-mail.

Is that something you can add to this project?

Question: adding support for extracting into streams or byte arrays

Hello,
I found msgreader to be able to cope with a lot of problematic msg format mails where other libraries fail. The only problem I have is that I need to use this assembly so that I can extract attachments, mail body text and meta data (from, to, cc, subject, etc.) into memory. I found that msgreader only supports saving into files first which is very inefficient in my scenario.
Could you add support for extracting everything (files and metadata) into streams/byte arrays and properties? I guess you do that internally, so you just need to expose these as public properties.
Thanks.
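For reference, the members used in other snippets on this page (Subject, BodyText, Attachments, FileName and Data) already return their values in memory. The sketch below collects them without writing files. It assumes a MsgReader version that exposes these members and that the MsgReader NuGet package is referenced. The types InMemoryMail and MailLoader are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for the extracted data, for illustration only.
sealed class InMemoryMail
{
    public string Subject;
    public string BodyText;
    public List<KeyValuePair<string, byte[]>> Files = new List<KeyValuePair<string, byte[]>>();
}

static class MailLoader
{
    public static InMemoryMail Load(string path)
    {
        using (var msg = new MsgReader.Outlook.Storage.Message(path))
        {
            var mail = new InMemoryMail { Subject = msg.Subject, BodyText = msg.BodyText };

            // OfType skips entries that are not file attachments, such as embedded messages.
            foreach (var attachment in msg.Attachments.OfType<MsgReader.Outlook.Storage.Attachment>())
                mail.Files.Add(new KeyValuePair<string, byte[]>(attachment.FileName, attachment.Data));

            return mail;
        }
    }
}
```

The attachment bytes are read while the message is still open, so the returned object does not depend on the disposed Message instance.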

Can you make this pkg compatible with xamarin.ios!!!

I was able to get this to run and successfully convert .msg to .eml, but it does not compile with xamarin.ios. I would really love to use this feature but am stopped by the xamarin.ios issue. It would be great if this could work on xamarin as well. Thanks in advance.

Encodings with Umlauts generate FormatException in DomDocument

Hi and thanks for this great library. I'm currently having problems with German msg files. There seems to be an issue in https://github.com/Sicos1977/MSGReader/blob/master/MsgReader/Rtf/DomDocument.cs#L3345

(Rtf/DomDocument.cs, line 3345). The hexBuffer is empty at this point, and therefore byte.Parse throws an exception. reader.CurrentToken.Hex, however, contains 0xFC (a German umlaut in 1252 encoding: 'ü'). If I add
        if (hexBuffer == string.Empty) hexBuffer = reader.CurrentToken.Hex;
the code works, but the font size changes at the umlaut's position.

Please see this testmail: link removed
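The failure mode in this report can be reproduced outside the library: byte.Parse throws a FormatException for an empty string, while the hex value FC decodes to 'ü'. The sketch below is not the DomDocument code. The class HexTokenDecoder and its parameters are hypothetical, it assumes the hex text consists of two hex digits without a 0x prefix, and it uses ISO-8859-1 in place of Windows-1252 because both map 0xFC to 'ü' and ISO-8859-1 is available without registering an extra encoding provider.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, for illustration only.
static class HexTokenDecoder
{
    // Decodes one hex-encoded byte, falling back to a second source when the buffer is empty.
    public static string Decode(string hexBuffer, string fallbackHex)
    {
        var hex = string.IsNullOrEmpty(hexBuffer) ? fallbackHex : hexBuffer;

        // TryParse avoids the FormatException that byte.Parse raises on empty input.
        if (!byte.TryParse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture, out var value))
            return string.Empty;

        return Encoding.GetEncoding("iso-8859-1").GetString(new[] { value });
    }
}
```

With an empty buffer and a fallback of "fc", Decode returns the umlaut instead of throwing.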
