This code takes data from FB's Graph API and writes it to Mongo.
It iterates through a specified directory, finding all files that are at least 30 minutes old. When it has processed every such file, it sleeps for one hour before resuming, which means the code can run indefinitely rather than being launched each time there is new data to insert.
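A minimal sketch of that scan-and-sleep loop, with a hypothetical download path and a stub in place of the real processing steps:

```python
import os
import time

DOWNLOAD_DIR = "/path/to/base_dirc/download"  # hypothetical; the real path comes from config.py
MIN_AGE_SECONDS = 30 * 60                     # only touch files at least 30 minutes old

def files_ready(root):
    """Yield paths of files last modified at least 30 minutes ago."""
    cutoff = time.time() - MIN_AGE_SECONDS
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) <= cutoff:
                yield path

def process(path):
    print("would parse, insert, and archive:", path)  # stand-in for the real steps

while True:  # run forever: scan, process everything ready, then sleep an hour
    for path in files_ready(DOWNLOAD_DIR):
        process(path)
    time.sleep(60 * 60)
```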
Using the filename, it determines whether the data contains a page, a post, comments, or replies.
It then parses the data according to a template made for each type of data.
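One plausible shape for that filename-based dispatch; the tokens and the `detect_type` name below are illustrative, not taken from the script:

```python
# Hypothetical naming scheme: check the more specific tokens first so that,
# e.g., a replies file whose name also mentions "comments" is classified correctly.
DATA_TYPES = ("replies", "comments", "post", "page")

def detect_type(filename):
    """Infer which kind of data a file holds from its name."""
    lowered = filename.lower()
    for data_type in DATA_TYPES:
        if data_type in lowered:
            return data_type
    raise ValueError(f"unrecognized data file: {filename}")
```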
It takes the processed data and does the following (sketched in code after this list):
- It writes the processed data to a new file in a processed directory
- It inserts the processed data into Mongo
- It moves the raw file to a raw archive directory
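A sketch of those three steps, assuming JSON-serializable documents and a pymongo collection; every name here is hypothetical:

```python
import json
import os
import shutil

def handle_processed(raw_path, documents, collection, processed_dir, archive_dir):
    """Hypothetical helper mirroring the three steps above.

    `collection` is a pymongo collection; the other names are illustrative.
    """
    os.makedirs(processed_dir, exist_ok=True)
    os.makedirs(archive_dir, exist_ok=True)

    # 1. Write the processed data to a new file in the processed directory.
    processed_path = os.path.join(processed_dir, os.path.basename(raw_path))
    with open(processed_path, "w") as f:
        json.dump(documents, f)

    # 2. Insert the processed documents into Mongo.
    collection.insert_many(documents)

    # 3. Move the raw file into the raw archive directory.
    shutil.move(raw_path, os.path.join(archive_dir, os.path.basename(raw_path)))
```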
- Clone the code to your machine.
- Rename `config_template.py` to `config.py`.
- Modify the parameters in `config.py` (an illustrative example appears after this list):
  - Put the proper path to the directory containing data in `base_dirc`. FB_to_mongo assumes that this directory will contain a directory called "download" that will contain subdirectories with data.
  - Specify the name you want for the file that will contain info about candidates and FB page ids in `candidate_info_json_file`.
  - Enter the credentials for Mongo into `mongo_auth`. If your instance of Mongo is not password-protected, change "AUTH" to False.
  - Enter the name you want for the Mongo DB into `mongo_auth`. FB_to_mongo will automatically create collection names that will live inside the DB you name here.
- Run `create_candidate_info_json.py`.
- Check the file named in `candidate_info_json_file` in `config.py` and make any changes as appropriate. The string that appears after the colon in each line will be the value of a field called "name" in the documents in each collection.
- Run the code using `sudo python3 FB_data_parsing.py >> insert.log 2>&1 &`. This code is meant to run perpetually: when it finishes processing data, it sleeps for one hour before looking for new data to process.
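For orientation, here is an illustrative set of `config.py` values; the exact structure of `mongo_auth` in `config_template.py` may differ, so treat this as a sketch rather than the template itself:

```python
# Illustrative config.py values -- adjust to your environment.
base_dirc = "/data/fb"  # must contain a "download" subdirectory

candidate_info_json_file = "candidate_info.json"

mongo_auth = {
    "AUTH": True,          # set to False if your Mongo instance has no password
    "user": "mongo_user",
    "password": "mongo_pass",
    "db_name": "fb_data",  # collections are created automatically inside this DB
}
```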