This project uses machine learning papers downloaded from arxiv.org through its API.
To run it, follow these steps in order:
- Run fetch_papers.py - Downloads paper metadata from the API
- Run download_papers.py - Downloads the actual papers (PDFs) for the corpus
- Run pdf_to_txt.py - Converts the PDFs into text files used to fit the LSA model
Note: The three scripts above are not my own work. They are taken from https://github.com/karpathy/arxiv-sanity-preserver
- Run analyze.py - Trains the LSA model on the text corpus
- export FLASK_APP=webapp (on Windows cmd, use: set FLASK_APP=webapp)
- cd flaskapp
- python -m flask run
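
The steps above could also be chained from a single Python driver. The following is only a minimal sketch, not part of the project: the names PIPELINE_SCRIPTS, build_commands and run_all are hypothetical, and it assumes the four scripts live in the directory you run it from, with flaskapp as a subdirectory.

```python
import os
import subprocess
import sys

# Script names come from the steps above; their location (the current
# working directory) is an assumption.
PIPELINE_SCRIPTS = [
    "fetch_papers.py",     # download paper metadata from the arXiv API
    "download_papers.py",  # download the PDFs for the corpus
    "pdf_to_txt.py",       # convert the PDFs to text files
    "analyze.py",          # train the LSA model on the text corpus
]


def build_commands(python=sys.executable):
    """Return one (argv, cwd, extra_env) tuple per step, in run order."""
    steps = [([python, script], ".", {}) for script in PIPELINE_SCRIPTS]
    # The web app is started from flaskapp/ with FLASK_APP=webapp set.
    steps.append(
        ([python, "-m", "flask", "run"], "flaskapp", {"FLASK_APP": "webapp"})
    )
    return steps


def run_all():
    """Run every step in order, stopping at the first one that fails."""
    for argv, cwd, extra_env in build_commands():
        env = {**os.environ, **extra_env}  # keep the existing environment
        subprocess.run(argv, cwd=cwd, env=env, check=True)
```

Calling run_all() from the project root runs the four scripts and then starts the Flask development server, which blocks until it is stopped. Because check=True is passed, a failing step raises subprocess.CalledProcessError and the later steps are skipped.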