paper-analytica

Content-Based Research Paper Recommendation and Analytics Engine

About

Research output is growing rapidly, and so many papers are published every day that it is hard for a researcher to find good papers relevant to their field. We plan to address this by analyzing research papers and returning the papers most relevant to the user's query. We also provide analytics for the query as a whole and for each individual paper.

Dataset

We plan to use arXiv metadata for 31,000+ papers, which is available on Kaggle. The data is mostly restricted to computer science: it contains metadata for papers in machine learning, computation and language, neural and evolutionary computing, artificial intelligence, and computer vision, published between 1992 and 2018.
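As a rough sketch of how the dataset could be inspected before running the pipeline, the snippet below loads the JSON file and counts papers per year. It assumes the file is a single JSON array of per-paper records and that each record carries a `year` field; both are assumptions about the Kaggle file, and `load_papers` and `papers_per_year` are hypothetical helper names, not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_papers(path="data/arxivData.json"):
    """Load the dataset, assumed to be a JSON array of per-paper records."""
    with Path(path).open(encoding="utf-8") as handle:
        papers = json.load(handle)
    if not isinstance(papers, list):
        raise ValueError("expected a JSON array of paper records")
    return papers


def papers_per_year(papers):
    """Count papers by the (assumed) 'year' field, skipping records without it."""
    return Counter(paper["year"] for paper in papers if "year" in paper)
```

Calling `papers_per_year(load_papers())` from the repository root would then give a quick view of how the papers are distributed over the years.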

Usage

To reproduce the results mentioned in the report, follow the steps below:

  1. First clone the repo to your local machine.

  2. Download the dataset mentioned above and place it in the data directory, which must sit in the directory from which you will run the program. The layout should look like this:

    .
    ├── data
    │   └── arxivData.json
    ├── EDA.ipynb
    ├── LICENSE
    ├── model.py
    ├── preprocess.py
    ├── README.md
    └── topicModel.py

  3. After completing the above step, run preprocess.py

    $ python preprocess.py
    Computed vector and saved!
    Saved TF-IDF vectorizer!

    You can also use python preprocess.py --help for additional options.

  4. After running preprocess.py, run topicModel.py

    $ python topicModel.py
    NMF model saved!
    Saved topic dictionary!
    Saved topic labels!

  5. After the topics have been computed, run model.py

    $ python model.py "clustering techniques"
    ['An Analysis of Gene Expression Data using Penalized Fuzzy C-Means\n'
     '  Approach',
     'A Comparative study Between Fuzzy Clustering Algorithm and Hard\n'
     '  Clustering Algorithm',
     'On comparing clusterings: an element-centric framework unifies overlaps\n'
     '  and hierarchy',
     'Sparse Convex Clustering',
     'Similarity-Driven Cluster Merging Method for Unsupervised Fuzzy\n'
     '  Clustering',
     'Functorial Hierarchical Clustering with Overlaps',
     'Adaptive Evolutionary Clustering',
     'An Analytical Study on Behavior of Clusters Using K Means, EM and K*\n'
     '  Means Algorithm',
     'Clustering Multidimensional Data with PSO based Algorithm',
     'Risk Bounds For Mode Clustering']

    The above command will also generate two graphs.

    You can also use python model.py --help for additional options.
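To give a feel for what a content-based lookup like the one in step 5 involves, here is a minimal pure-Python sketch of TF-IDF weighting followed by cosine-similarity ranking. It is an illustration under assumptions, not the code in preprocess.py or model.py: the repository's scripts may tokenize, weight, and rank differently, and every function name below (`tokenize`, `fit_idf`, `tfidf_vector`, `recommend`) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(documents):
    """Compute smoothed inverse document frequencies over tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Build an L2-normalized TF-IDF vector (as a dict) for one document."""
    counts = Counter(token for token in tokens if token in idf)
    weights = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


def recommend(query, titles, top_k=3):
    """Return up to top_k titles ranked by cosine similarity to the query."""
    docs = [tokenize(title) for title in titles]
    idf = fit_idf(docs)
    doc_vecs = [tfidf_vector(doc, idf) for doc in docs]
    query_vec = tfidf_vector(tokenize(query), idf)
    # Vectors are unit length, so the dot product equals the cosine similarity.
    scores = [sum(w * vec.get(term, 0.0) for term, w in query_vec.items())
              for vec in doc_vecs]
    ranked = sorted(range(len(titles)), key=lambda i: scores[i], reverse=True)
    return [titles[i] for i in ranked[:top_k] if scores[i] > 0]
```

Query terms that never occur in the corpus are dropped, and titles with zero similarity are left out of the result, so a query with no matching vocabulary returns an empty list.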

Contributors

pmk21, thejas-bhat
