Comments (10)

Dijji avatar Dijji commented on May 29, 2024

I've been looking at your code, and I'm afraid I'm rather puzzled. So, the path is XstReader to CSV to eml.

  1. I understand how this will work for text content, but I am puzzling over HTML and Rich text format. eml, if it means anything, means RFC 822, but that only deals with text. How should full fidelity be maintained (and I do think that should include in-line images)?

  2. Why do I want to go to the eml format in the first place? Outlook no longer supports importing it, so why is it a good vehicle for recovering OST files? If you need a script to run in Outlook anyway, why go via eml when you then have to deal with rich content by appending core body material as attachments?

  3. Would not the most direct way of attacking the load into Outlook problem be to export properties to a CSV, the body to a text, HTML or RTF file, and the attachments to their own files, add pointers to those files to the CSV, and then write an Outlook script that consumed as much of the data as possible?
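A minimal sketch of the export half of point 3, assuming messages are already available in memory. The `messages` structure, the column names, and the file naming scheme are all hypothetical; they are not XstReader's actual export format.

```python
import csv
from pathlib import Path

# Hypothetical in-memory messages; a real exporter would read these from the OST/PST file.
messages = [
    {
        "subject": "Hello",
        "body_format": "html",  # one of: txt, html, rtf
        "body": b"<p>Hi there</p>",
        "attachments": {"logo.png": b"\x89PNG..."},
    },
]


def export_messages(messages, out_dir):
    """Write each body and attachment to its own file, plus a CSV pointing at them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / "messages.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Subject", "BodyFile", "AttachmentFiles"])
        for i, msg in enumerate(messages):
            # Body goes to a file whose extension records its format.
            body_file = out_dir / f"msg{i}.{msg['body_format']}"
            body_file.write_bytes(msg["body"])
            # Each attachment goes to its own file, prefixed with the message index.
            att_names = []
            for name, data in msg["attachments"].items():
                att_file = out_dir / f"msg{i}_{name}"
                att_file.write_bytes(data)
                att_names.append(att_file.name)
            # The CSV row holds only properties and pointers, never binary content.
            writer.writerow([msg["subject"], body_file.name, ";".join(att_names)])
    return csv_path
```

Calling `export_messages(messages, "export")` would leave `messages.csv`, `msg0.html` and `msg0_logo.png` in the `export` folder, ready for an Outlook-side script to consume.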

It may just be that I'm missing your point. Please enlighten me if so.

Dijji

from xstreader.

Dijji avatar Dijji commented on May 29, 2024

Generally, I agree that this would be useful. The problem is the format. Bodies come in three main formats: plain text; HTML; and RTF (Word document format). What should I export them as? I had a look around, but couldn't come up with anything compelling in the way of a common format.

Another consideration in the HTML case - what should I do with embedded graphics, which Outlook transports as hidden attachments?

Dijji

halueda avatar halueda commented on May 29, 2024

I think these three formats could go in separate columns of the CSV file.
For the formats, I'm thinking:

  • plain text: encoded as UTF-8 text
  • HTML: this is harder because it is originally binary data (byte[]). I propose exporting two columns:
    - the HTML decoded as UTF-8, which is mostly readable but sometimes has incorrect encoding.
    - the HTML in base64, which is exact for a mail reader but hard for a human to read.
  • RTF: base64 text

IMHO, embedded graphics in HTML should be ignored, because no attachments are exported.
I think that is acceptable for users, because Outlook often shows HTML messages without graphics anyway, especially when the sender is untrusted.
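To illustrate the two-column idea for HTML, here is a small sketch in Python. The function name `html_columns` is made up for this example; it only shows the difference between a lossy readable column and an exact base64 column.

```python
import base64


def html_columns(raw_html: bytes) -> tuple[str, str]:
    """Return (readable, exact) column values for one HTML body."""
    # Readable column: decode as UTF-8, replacing bytes that are not valid UTF-8.
    readable = raw_html.decode("utf-8", errors="replace")
    # Exact column: base64 preserves every byte, whatever the original encoding.
    exact = base64.b64encode(raw_html).decode("ascii")
    return readable, exact


# A body that was stored as Latin-1, so it is not valid UTF-8.
raw = "<p>caf\u00e9</p>".encode("latin-1")
readable, exact = html_columns(raw)

# The readable column has lost the accented character,
# but the base64 column round-trips to the original bytes.
assert "\ufffd" in readable
assert base64.b64decode(exact) == raw
```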

Dijji avatar Dijji commented on May 29, 2024

I'm not a big fan of adding a mix of columns to the CSV file, where some are meant for humans to read, and some for programs. I think we should stick to making the CSV file human readable, and if we want program readable formats, export them separately.

I also think that if we have a plain text column in the CSV file, then the program should make a decent effort at providing text whatever the email format. For plain text, this is trivial. For HTML, most browsers support a means of extracting text from a web page, although this might make extraction run rather slowly. For RTF, Microsoft Word would give a text conversion, but this would be an annoying dependency to take. Maybe there is an open source alternative for this.
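As a rough illustration of text extraction from HTML without a browser, the sketch below uses Python's standard `html.parser`. It is not what XstReader does; the class and function names are invented here, and real-world email HTML would need more care (line breaks, tables, hidden elements).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse all runs of whitespace.
        # This can split a word that is broken up by inline tags.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `html_to_text("<p>Hello &amp; <b>bye</b></p>")` gives `"Hello & bye"`.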

Having got this far, it all sounds pretty tricky. But if I consider the alternative that you have coded up, I'm not sure what on earth would consume compressed RTF, or the streamed form of HTML. What do you believe the use cases for these formats are?

Dijji

halueda avatar halueda commented on May 29, 2024

My concern is importing all the messages from an .ost file into Outlook. To do so, I can take the following steps:

  • Export the properties of the messages in a folder as a CSV file.
  • Convert each message in the CSV file, line by line, to an eml file.
  • Import the eml files into Outlook.

I have almost finished this work and published it in my repositories.
For the first two items (exporting for this purpose and converting to eml format in Python), I pushed to a working branch of a fork of your XstReader, https://github.com/halueda/XstReader/tree/working
For the final item, I pushed to another repo, https://github.com/halueda/EML-Import , which automates the import with a Visual Basic script.
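The conversion step can be sketched roughly as follows. This is not the code from the repositories above: the column names (`Subject`, `From`, `Body`) are hypothetical, and the real CSV export uses different headings.

```python
import csv
import io
from email import policy
from email.message import EmailMessage

# Hypothetical CSV export with one message; the body contains an embedded newline.
sample_csv = (
    'Subject,From,Body\r\n'
    '"Hello","a@example.com","Line one\nLine two"\r\n'
)


def row_to_eml(row: dict) -> bytes:
    """Build a plain-text RFC 5322 message from one CSV row."""
    msg = EmailMessage()
    msg["Subject"] = row["Subject"]
    msg["From"] = row["From"]
    msg.set_content(row["Body"])
    # policy.SMTP serializes with CRLF line endings.
    return msg.as_bytes(policy=policy.SMTP)


# newline="" lets the csv module handle newlines inside quoted fields itself.
rows = list(csv.DictReader(io.StringIO(sample_csv, newline="")))
eml_files = {f"message{i}.eml": row_to_eml(row) for i, row in enumerate(rows)}
```

Each value in `eml_files` could then be written to disk with `Path(name).write_bytes(data)` before the import step.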

In the end, I also agree that the CSV file is just for humans. Exporting eml files directly would be a more direct way. However, I'm not very familiar with C#, so I need more time to contribute that.

I also noticed that exporting properties is not the best fit for my purpose, because the properties do not include To and Cc information. I realized that I need to export 'contents' instead of 'properties', if exporting eml is too specific a requirement.


About RTF content: I also noticed that issue and worked around it. In the end, I export it as an attached file with the extension rt. When I click the attached file, the MS Word preview plugin automatically shows the content, which is enough to salvage the message.

Dijji avatar Dijji commented on May 29, 2024

Very interesting. I'll have a look at your code and see if I can figure it out, although your C# might be better than my Python!

Dijji

halueda avatar halueda commented on May 29, 2024

I chose eml simply because I know the format very well: I am a very old man and know UNIX /usr/bin/mail and the like better than the Outlook format. If I knew more about the Outlook format, I would not stick with eml.
As for fidelity, I implemented my extension by referring to other RFCs about attachments and so on, as far as I could.
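For reference, the MIME RFCs do allow an eml file to carry an HTML body with inline images, plus an RTF body as an ordinary attachment. The sketch below shows one way to build such a message with Python's standard `email` package; it is an illustration, not the implementation in the fork, and `build_html_message` and the `example.invalid` domain are made up for the example.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_html_message(subject, plain, html_template, image_bytes, rtf_bytes=None):
    """Build a message with plain and HTML bodies, one inline PNG, and optional RTF."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(plain)

    # Content-ID for the inline image; the HTML refers to it without the <> brackets.
    cid = make_msgid(domain="example.invalid")
    msg.add_alternative(html_template.format(cid=cid[1:-1]), subtype="html")

    # Turn the HTML part into multipart/related and attach the image to it.
    html_part = msg.get_payload()[1]
    html_part.add_related(image_bytes, maintype="image", subtype="png", cid=cid)

    if rtf_bytes is not None:
        # The RTF body travels as a regular attachment (message becomes multipart/mixed).
        msg.add_attachment(
            rtf_bytes, maintype="application", subtype="rtf", filename="body.rtf"
        )
    return msg
```

A call such as `build_html_message("Hi", "text", '<img src="cid:{cid}">', png_bytes)` yields a `multipart/alternative` message whose HTML part is wrapped in `multipart/related` together with the image. Note that `str.format` is used only for brevity; HTML containing literal braces would need a different substitution method.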

You mentioned in another issue that V1.7 has an extension to save a message together with its attached files, and I feel that would work. Anyway, I will check whether it fits my needs. (But I'm so busy right now that it will be a few weeks or months from now.)

esantose avatar esantose commented on May 29, 2024

Hi, I need to get some MAPI properties, but I couldn't find a function to do it.
Could someone help me with how to get the 'EntryId' value as a string? (It's a byte[].)
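Independently of XstReader's own API, a binary property such as an entry ID is usually shown as a hexadecimal string. A minimal sketch in Python, with a made-up sample value:

```python
def entry_id_to_string(entry_id: bytes) -> str:
    """Render a binary entry ID as an uppercase hexadecimal string."""
    return entry_id.hex().upper()


# Made-up bytes for illustration; a real EntryId is considerably longer.
sample = bytes([0x00, 0x00, 0x00, 0x00, 0x1A, 0x44, 0x73, 0x90])
print(entry_id_to_string(sample))  # 000000001A447390
```

In C#, `BitConverter.ToString(bytes).Replace("-", "")` should give the same kind of string from a `byte[]`.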

Dijji avatar Dijji commented on May 29, 2024

Are you by any chance looking at the CSV file using Excel? There appears to be a problem with properties that contain a new line character messing up Excel CSV import in UTF-8 encoded files.

To confirm, export the properties for a message and look at the CSV file in a text editor. It is a bit hard to read, but if you count the headings to the first EntryId property, and then count to the corresponding data field, you should see the value in hexadecimal, just as you do if you display message properties in the UI.
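Rather than counting fields by hand, a CSV parser that understands quoted newlines can do the lookup. A small sketch with Python's `csv` module, using invented sample data (the real export has many more columns):

```python
import csv
import io

# Invented sample: the Body field contains a newline inside quotes.
sample = (
    'Subject,EntryId,Body\r\n'
    '"Hi","00000000ABCD","first line\nsecond line"\r\n'
)


def column_values(csv_text: str, heading: str) -> list[str]:
    """Return the values of the first column whose heading matches."""
    # newline="" leaves line endings alone so quoted newlines stay inside their field.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    headings = next(reader)
    index = headings.index(heading)  # first match, as when counting headings by hand
    return [row[index] for row in reader]


print(column_values(sample, "EntryId"))  # ['00000000ABCD']
```

To read an exported file instead of a string, `open(path, newline="", encoding="utf-8-sig")` can replace the `io.StringIO` call; `utf-8-sig` also tolerates a leading byte order mark if one is present.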

Dijji avatar Dijji commented on May 29, 2024

Release version 1.13 contains a fix that will cause Excel to open exported CSV files correctly.
