quorum's People
quorum's Issues
long term storage - AWS
Current state: only PostgreSQL storage is implemented.
Add functionality to store data on AWS.
startup script
Modify start.sh to make it more user friendly!
There are a ton of sudo calls, which is a problem because of how many times you have to enter your password.
add yelp scraper
Get a Yelp scraper going here:
Community detection using pre-implemented algorithms
See:
Incorporate this functionality into quorum.
Social media analytics
create a social media analytics dashboard to keep track of trends, mentions, shared urls, etc.
user friendliness - running quorum
Right now the only way to get this thing running is start.sh. Get rid of it and write a proper Makefile.
scheduling jobs
Jobs are scheduled by adding an entry in scheduler/run_tasks.py. It would be nice to make this more flexible (i.e., I only want the twitter scraper to run).
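One way this could look, as a rough sketch only: keep a registry of tasks and let a command-line flag pick which ones run. The task names, the placeholder functions, and the `--only` flag below are all hypothetical; they are not taken from scheduler/run_tasks.py.

```python
import argparse

# Hypothetical placeholder tasks; the real scrapers would be imported here.
def run_twitter():
    return "twitter scraper ran"

def run_facebook():
    return "facebook scraper ran"

# Hypothetical registry mapping a short name to the callable that runs the job.
TASKS = {"twitter": run_twitter, "facebook": run_facebook}

def select_tasks(names=None):
    """Return the requested tasks, or every registered task if none are named."""
    if not names:
        return dict(TASKS)
    unknown = [n for n in names if n not in TASKS]
    if unknown:
        raise ValueError("unknown task(s): " + ", ".join(unknown))
    return {n: TASKS[n] for n in names}

def main(argv=None):
    parser = argparse.ArgumentParser(description="Run selected scraper jobs.")
    parser.add_argument("--only", nargs="*", default=None,
                        help="names of the jobs to run (default: all)")
    args = parser.parse_args(argv)
    # Run each selected task and collect what it returned, keyed by name.
    return {name: task() for name, task in select_tasks(args.only).items()}
```

Calling `main(["--only", "twitter"])` would run just the twitter job, while `main([])` runs everything in the registry.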
docker-compose.yml to bind them all
Consider creating a docker-compose in the base directory that spins up all the docker containers instead of a shell script that calls docker-compose in each folder.
NLP - cleaning and parsing
centralize our nlp efforts:
Work on building models for all scrapers.
Documentation
Improve the documentation
- How to use this app
- How to plug in new modules using kafka-python
just-politics
Incorporate the work done for just-politics: https://github.com/justforwhimsy/d4d_scripts
requirements.txt in docker
Instead of individual pip install commands in the dockerfiles, have each dockerfile run pip install -r requirements.txt. In requirements.txt, make sure every package has a version specified.
Example:
facebook/dockerfile has the following lines:
RUN pip install facebook-sdk
RUN pip install facepy
Those packages should be in requirements.txt as:
facebook-sdk==<version>
facepy==<version>
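To keep the pinning rule from drifting, a small check could flag requirement lines without an exact version. This is only a sketch: the function name `unpinned_requirements` is made up for illustration, the regex covers plain `name==version` lines (optionally with extras) rather than the full requirements syntax, and the version numbers in the usage note are placeholders, not the versions this project needs.

```python
import re

# Matches "name==version" or "name[extra]==version"; anything else counts as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s]+$")

def unpinned_requirements(text):
    """Return the requirement lines in `text` that lack an exact == pin."""
    bad = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            bad.append(line)
    return bad
```

For example, `unpinned_requirements("facebook-sdk==1.0\nfacepy\n")` returns `["facepy"]`, since only the second line is missing a pin.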
Improve twitter scraper
There are a lot more things that can be done with twitter
removal of .gitignore_global
.gitignore_global, when set, applies across all projects/repos on a user's computer. This will probably cause issues for anyone who wants to contribute to this project. Suggest removing it and keeping the .gitignore.
word2vec models
Moving the issue "Construct word2vec model with tweets" here, in the hope of adding the Twitter functionality and expanding coverage to all other scrapers.