## gvoiceParser

The gvoiceParser project aims to make the thousands of tiny files generated from Google Voice Takeout more useful. This effort consists of two parts:

  1. A Python library for interpreting the HTML files ("gvParserLib.py")
  2. A program to invoke the library and load its contents into an SQLite database ("gvproc.py")

Note that this is currently a Python 2.7 script, with dependencies on dateutil and html5lib.

Library Usage

A Google Voice Takeout dump gives you a boatload of useless HTML files.

gvoiceParser.Parser.process_file in "gvParserLib.py" processes one such file. If it is successful, it returns a record. Otherwise, it returns None.

You can use it in a loop, like so, to read all the Google Voice files:

import os

records = []
for fl in os.listdir(directory):
  if fl.endswith(".html"):
    # process_file returns a record on success, or None on failure
    record = gvoiceParser.Parser.process_file(os.path.join(directory, fl), mynumbers)
    # Do other bookkeeping stuff here
    if record:
      records.append(record)

What's that mynumbers business, you ask? That is a list of all the numbers the Google Voice account holder owns. Typically, this is the Google Voice number itself along with any phones the GV number aliases, such as the user's real cell number. This list allows the parser to associate these numbers with the caller named "###ME###", which is otherwise difficult.
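To make the idea concrete, here is a minimal sketch of how a list like mynumbers could be used to recognize the account holder. The helper names normalize_number and label_party are hypothetical and are not part of "gvParserLib.py"; the real parser may normalize numbers differently, and stripping a leading "1" assumes US-style numbers.

```python
import re

ME = "###ME###"  # the caller name the README says is used for the account holder


def normalize_number(number):
    """Reduce a phone number to digits; drop a leading US country code (assumption)."""
    digits = re.sub(r"\D", "", number)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def label_party(number, mynumbers, fallback=None):
    """Return ME if number belongs to the account holder, else fallback."""
    mine = set(normalize_number(n) for n in mynumbers)
    if normalize_number(number) in mine:
        return ME
    return fallback


# Hypothetical numbers: the Google Voice number plus a real cell number.
mynumbers = ["+1 555 010 0001", "(555) 010-0002"]
print(label_party("555-010-0001", mynumbers))
print(label_party("+15550109999", mynumbers, "Unknown"))
```

Comparing normalized digits rather than raw strings means the same number still matches when it is written in different formats across files.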

The records contain fields which are pretty self-explanatory. If you're ambitious, you can even send me a patch with a description of them, which I will place here.

Usage for the Rest of Us

What's that? You want your GV numbers in a nice database so you can back them up or do stats or something? Got you covered.

There's a handy file in the repo named "gvproc.py".

The command takes the following arguments:

  • --contacts: A file to load contacts from. The file should be a CSV with the header row Name,Number,Notes. Each number must be unique, but any number of names can be identical. When a person gets a new phone number, this allows you to continue to associate that person's name with each of their numbers in your DB.

  • path: Where the GV files are.

  • database: Name of the database you want to create or append to. Note that currently appending will create duplicate messages.

  • --contactcsv: The program uses some moderately intelligent logic to try to figure out which phone numbers belong to which names. The aforementioned contacts CSV ensures that contacts you already know are associated with the correct name. This argument specifies a CSV file to which all contacts, new and old, are written. You can then diff this against the original contacts CSV and revise it accordingly. I recommend then re-running the program so that your DB comes out right.

  • --clear: This destroys all messages, texts, and call records, but not contacts, in the DB.

  • --mynumbers: A comma-delimited list of the account owner's phone numbers. This is useful because you do not often have yourself in your contacts list.
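The contacts format described above (header row Name,Number,Notes, unique numbers, repeated names allowed) can be read with Python's standard csv module. The following is a sketch, not the loader "gvproc.py" actually uses: load_contacts and new_contacts are hypothetical names, and new_contacts only imitates the suggested diff between the --contactcsv output and your original contacts file. The sample names and numbers are made up.

```python
import csv


def load_contacts(lines):
    """Map each unique number to a name, reading Name,Number,Notes rows."""
    contacts = {}
    for row in csv.DictReader(lines):
        number = row["Number"].strip()
        if number in contacts:
            raise ValueError("duplicate number in contacts file: %s" % number)
        contacts[number] = row["Name"].strip()
    return contacts


def new_contacts(original, exported):
    """Return the entries in exported whose numbers are missing from original."""
    return dict((num, name) for num, name in exported.items()
                if num not in original)


# Two numbers for the same person are fine; a repeated number is not.
original = load_contacts([
    "Name,Number,Notes",
    "Alice Example,+15550100001,old phone",
    "Alice Example,+15550100002,new phone",
])
exported = load_contacts([
    "Name,Number,Notes",
    "Alice Example,+15550100001,old phone",
    "Alice Example,+15550100002,new phone",
    "Bob Example,+15550100003,",
])
print(new_contacts(original, exported))
```

With real files, pass an open file object instead of a list of lines, since csv.DictReader accepts any iterable of text lines.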

Notes

If you need to parse Google Voice messages stored in Hangouts, Steven Zhang has a repo which may help.

Credits

The "gvoiceParser.py" library was originally built by Avi Levin and released under the LGPL v2.1. They have since modified their repository to identify Richard's repo as the primary development effort.

Avi's "gvoiceParser.py" library was, in turn, a rewrite of their googlevoice-to-sqlite script.

gvoiceparser's Issues

Parallelism

Including I/O multiplexing and queuing the file reading, parsing, and database conversion separately
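One way to read this request is as a three-stage pipeline joined by queues. The sketch below is an assumption about what such a design could look like, not code from the project: parse_text is a stand-in for the real parser, the records table is invented for the demo, in-memory strings replace actual file reads, and plain threads are used for simplicity. It covers only the queuing part of the request, not I/O multiplexing.

```python
import sqlite3
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7

DONE = object()  # sentinel telling the next stage to stop


def parse_text(name, text):
    """Stand-in parser: return a (name, length) record, or None for empty input."""
    return (name, len(text)) if text else None


def reader(sources, raw_q):
    """Stage 1: hand each (name, text) pair to the parser stage."""
    for item in sources:
        raw_q.put(item)
    raw_q.put(DONE)


def parser(raw_q, rec_q):
    """Stage 2: parse items and forward only the successful records."""
    while True:
        item = raw_q.get()
        if item is DONE:
            rec_q.put(DONE)
            break
        record = parse_text(*item)
        if record is not None:
            rec_q.put(record)


def writer(rec_q, conn):
    """Stage 3: a single writer inserts records into SQLite."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT, size INTEGER)")
    while True:
        record = rec_q.get()
        if record is DONE:
            break
        conn.execute("INSERT INTO records VALUES (?, ?)", record)
    conn.commit()


def run_pipeline(sources, conn):
    raw_q, rec_q = queue.Queue(maxsize=100), queue.Queue(maxsize=100)
    threads = [
        threading.Thread(target=reader, args=(sources, raw_q)),
        threading.Thread(target=parser, args=(raw_q, rec_q)),
    ]
    for t in threads:
        t.start()
    writer(rec_q, conn)  # the writer runs in the thread that owns conn
    for t in threads:
        t.join()


conn = sqlite3.connect(":memory:")
run_pipeline([("a.html", "<html>hi</html>"), ("b.html", ""),
              ("c.html", "<p>x</p>")], conn)
print(conn.execute("SELECT COUNT(*) FROM records").fetchone()[0])
```

Keeping the database writes in one thread sidesteps SQLite's restriction on sharing a connection across threads, and the bounded queues stop a fast reader from piling up unparsed files in memory.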