Giter Club home page Giter Club logo

fb_to_mongo's Introduction

FB_to_mongo

This code takes data from FB's Graph API and writes that data to mongo.

It iterates through a specified directory, finding all files that are at least 30 minutes old. When it finishes processing all files at least 30 minutes old, it sleeps for 1 hour before resuming -- this means that the code can be run indefinitely, rather than being run each time there is new data to insert.
Using the filename, it determines whether the data contains a page, a post, comments, or replies.
It then parses the data according to a template made for each type of data.

It takes processed data and does the following:

  1. It writes the processed data to a new file, putting that processed file in a processed directory
  2. It inserts the processed data to Mongo
  3. It moves the raw file to a raw archive directory

Installation and setup

  1. Clone the code to your machine.
  2. Rename config_template.py to config.py.
  3. Modify the parameters in config.py:
    • Put the proper path to the directory containing data in base_dirc. FB_to_mongo assumes that this directory will contain a directory called "download" that will contain subdirectories with data.
    • Specify the name you want for the file that will contain info about candidates and FB page ids in candidate_info_json_file.
    • Enter the credentials for Mongo into mongo_auth. If your instance of mongo is not password-protected, change "AUTH" to False.
    • Enter the name you want for the Mongo DB into mongo_auth. FB_to_mongo will automatically create collection names that will live inside the DB you name here.
  4. Run create_candidate_info_json.py
  5. Check the file named in candidate_info_json_file in config.py. Make any changes as appropriate. The string that appears after the colon in each line will be the value for a field called "name" in the documents in each collection.
  6. Run the code using sudo python3 FB_data_parsing.py >> insert.log 2>&1 &. This code is meant to run perpetually. When it finishes processing data, it will sleep for one hour before looking for new data to process.

Requirements

fb_to_mongo's People

Contributors

sjacks26 avatar worlikarshweta avatar

Watchers

 avatar

Forkers

worlikarshweta

fb_to_mongo's Issues

Add post crawl history

Need to add some code that:

  1. Checks to see if a post is already in the main post collection
  2. If it is, move the post from the main post collection to the post history collection
  3. Insert the new post to the main post collection

Do this for comments too? Or just update number of likes if the comment is already in the main public collection?

No code to incorporate ML

This code doesn't currently have any functionality to incorporate ML work - either doing the labeling or reading the labeling from elsewhere.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.