This is public data from atariemailarchive.org

This repository contains parsed and threaded email data from Jed Margolin's time as a hardware engineer at Atari.

This data backs atariemailarchive.org, which curates and showcases Jed's emails. The raw email data are in text files on Jed's website, where you can also read more about his time at Atari.

You can read about how I made atariemailarchive.org here.

There are 4,128 messages and 2,846 threads in the archive.

How is this data different from the text files on Jed's website?

  • It's parsed.
  • It has conversation threads.
    • I read all 4,128 emails in the archive and threaded them manually. No, really, I did. Yes, you're welcome.

From my correspondence with Jed:

Did Vaxmail have the notion of conversation threads? And is there any way I can glean them from the raw data? Did, perhaps, the UI default to Re: (subject) when replying to other mail (which would give me an approximation)?


No, I don't think there were threads. It probably hadn't been invented yet so it wouldn't be in the raw data. In fact, I don't think there was raw data, just ASCII text. And there wasn't much of a UI. It was just text.

I read every single message. If a message was clearly a reply to a prior message, I added it to the same thread. If you find obvious errors, let me know.

What's the schema?

There's one messages table with the following columns:

  • id: the primary key, an id for each message
  • thread_id: messages that belong to the same thread have the same thread_id
    • Messages in each thread are ordered by sent_at.
  • sender: the sender of the message in the original {VAX_NAME}:{USER} form (more on this below)
  • recipient: the recipient of the message in the original {VAX_NAME}:{USER} form (more on this below)
  • cc: any cc'ed recipients (more on this below)
  • sent_at: the time the message was sent
  • subject: the message subject
  • body: the message body
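
As a rough illustration of how the columns above fit together, here is a minimal sketch that loads a few made-up rows into an in-memory SQLite table and reads one thread back in order. The column types, the ISO-style text format for sent_at, the sample rows, and the `thread_messages` helper are all assumptions for the example; the README does not say how the table is actually stored, so adapt this to the real files.

```python
import sqlite3

# Hypothetical in-memory copy of the messages table. Column names follow the
# schema above; the column types and every sample row are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER,
        sender TEXT,
        recipient TEXT,
        cc TEXT,
        sent_at TEXT,
        subject TEXT,
        body TEXT
    )
    """
)
rows = [
    # Deliberately inserted out of chronological order.
    (2, 10, "ERNIE::EXAMPLE", "@SYS$MAIL:JUNK", None,
     "1986-01-29 11:00:00", "RE: Lunch", "Made-up reply"),
    (1, 10, "KIM::DROBNY", "@SYS$MAIL:JUNK", None,
     "1986-01-29 10:08:49", "Lunch", "Made-up first message"),
    (3, 11, "KIM::EXAMPLE", "KIM::DROBNY", None,
     "1986-01-30 09:00:00", "Other", "Made-up unrelated message"),
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)


def thread_messages(conn, thread_id):
    """Return (id, sender, subject) for one thread, oldest message first."""
    cur = conn.execute(
        "SELECT id, sender, subject FROM messages "
        "WHERE thread_id = ? ORDER BY sent_at, id",
        (thread_id,),
    )
    return cur.fetchall()


for message in thread_messages(conn, 10):
    print(message)
```

Sorting by sent_at as text only works here because the sample timestamps are zero-padded and ISO-like; if the real column uses another format, parse it into a datetime before ordering.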

How are senders, recipients, and CC formatted?

From my correspondence with Jed:

Most to: and from: fields are two words with a colon separating them. What do the first and second refer to? In cases where there is no colon delimiter, what does that mean?


An email like:

From: KIM::DROBNY "Buddy Flyback" 29-JAN-1986 10:08:49.22 To: @SYS$MAIL:JUNK

came from the VAX named Kim from user Drobny (Chris Drobny). Chris was a tech and a really good guy. “Buddy Flyback” was his chosen nickname.

We started with one Vax but when we got a second one they needed names. The new one was Kim (New Vax; from actress Kim Novak). The second one was Ernie (Slow Vax; from famous comedian Ernie Kovax.).

I’m not sure why some don’t have a colon. It might be where both sender and recipient are on the same Vax.

@SYS$MAIL:JUNK was a mail distribution list called JUNK, where you could post anything, like what flavor Jello was for lunch that day.

Here are three messages from the archive that may also help you understand what is going on.
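
If you want to split these values in code, something like the following sketch may help. It assumes three shapes based on the examples above: a VAX name and user separated by one or two colons (the sample header shows `KIM::DROBNY`), a bare user name with no colon, and a distribution list that starts with `@`. The `Address` type and `parse_address` function are hypothetical names for this example, and the archive may contain other shapes that this does not handle.

```python
from typing import NamedTuple, Optional


class Address(NamedTuple):
    node: Optional[str]  # VAX name such as "KIM"; None if absent or a list
    user: str            # user or list name such as "DROBNY" or "JUNK"
    is_list: bool        # True for distribution lists like "@SYS$MAIL:JUNK"


def parse_address(raw: str) -> Address:
    """Split a sender or recipient string into its parts (best effort)."""
    text = raw.strip()
    if text.startswith("@"):
        # Assumption: the list name is whatever follows the last colon.
        return Address(node=None, user=text.rpartition(":")[2], is_list=True)
    node, sep, user = text.partition(":")
    if not sep:
        # No colon at all; possibly sender and recipient on the same VAX.
        return Address(node=None, user=text, is_list=False)
    # Accept both "KIM:DROBNY" and "KIM::DROBNY".
    return Address(node=node, user=user.lstrip(":"), is_list=False)


print(parse_address("KIM::DROBNY"))
print(parse_address("@SYS$MAIL:JUNK"))
```

Nicknames such as "Buddy Flyback" and the timestamp appear in the raw header line rather than in these columns, so this sketch does not try to parse them.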

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Author

This dataset is published by Vikram Oberoi.
