Below is a brief description of the files in this repository:
ADM-HW3-GP2.ipynb: The notebook with the outputs of the search engine. It consists mainly of function calls and visualizations of their outputs. At the end there are two technical appendices that briefly explain what happens in the background and cover some of the theory.
search_engine.py: The library containing all the classes and helper functions used to run the search engine. Private methods are for internal use only.
map/reduce_1.py: The scripts that build the first inverted index with the map-reduce framework. We preferred this approach over Spark because the results are stored directly on disk.
map.html: The folium map for the bonus point. We saved it as a separate file because GitHub has problems loading maps.
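
To give an idea of how an inverted index can be built in map-reduce style, here is a minimal in-memory sketch. It is not the code in map/reduce_1.py: the function names (map_phase, reduce_phase, build_inverted_index), the whitespace tokenization, and the sample documents are all assumptions made for illustration, and the real scripts write their results to disk instead of returning a dictionary.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(docs):
    """Map step: emit one (term, doc_id) pair per distinct term in each document."""
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real preprocessing.
        for term in set(text.lower().split()):
            yield term, doc_id


def reduce_phase(pairs):
    """Reduce step: sort the pairs (the 'shuffle'), then group them by term."""
    index = {}
    for term, group in groupby(sorted(pairs), key=itemgetter(0)):
        # Collect the ids of all documents containing this term.
        index[term] = sorted({doc_id for _, doc_id in group})
    return index


def build_inverted_index(docs):
    """Chain the two phases: documents -> (term, doc_id) pairs -> posting lists."""
    return reduce_phase(map_phase(docs))


if __name__ == "__main__":
    # Hypothetical sample documents, keyed by document id.
    sample = {1: "room near the lake", 2: "lake view room"}
    for term, postings in build_inverted_index(sample).items():
        print(term, postings)
```

Each term ends up mapped to the sorted list of document ids that contain it, which is the structure a search engine looks up at query time.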