This tool mines arXiv, calculates summary statistics, and classifies papers by topic.
- src includes the original code by @dormaayan to run the analysis on a small data sample.
- test contains the modified code, intended to run in an HPC environment, specifically the Sherlock cluster at Stanford. This first attempt was a test analysis and is incomplete because it was not run on all the data.
- analysis is intended to be a more complete run of the analysis, and it also takes an inventory of files.
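
The file inventory mentioned for the analysis run could look something like the sketch below. This is only an illustration, not the code in this repository: the function name `inventory` and the choice to group by file extension are assumptions made for the example.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Count files and total bytes under root, grouped by file extension.

    Hypothetical helper for illustration; not part of the repository.
    """
    counts = Counter()  # number of files per extension
    sizes = Counter()   # total size in bytes per extension
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Normalise case so ".TEX" and ".tex" are counted together;
            # files without an extension are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return counts, sizes
```

Calling `inventory` on a data directory returns two counters that can be compared between runs to check that no files were skipped.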