questionpairing's Introduction

Quora Question Pairing system

MERN Stack Machine Learning Application

A full-stack, web-based machine learning application that predicts whether two input or selected questions have the same meaning/intent.

Features:

  • Given two input questions, the app predicts whether they have the same meaning/intent.
  • The two input questions are stored in the MongoDB database.
  • The questions in the database are rendered in the UI.
  • From the rendered list of questions, the user can select any two and ask the app to predict whether they match; the app responds accordingly.

Technologies used:

  • Frontend: React.js and Material UI
  • Backend: Node.js, Express.js, MongoDB
  • Machine Learning: Python, ensemble learning algorithms, data analysis

How this application works:

  • When the two input questions are submitted, they are stored in the database using the post() method.
  • At the same time, the questions are passed as parameters to the Python script.
  • The Python script on the server processes the input and returns the predicted result.
  • The predicted result is rendered in the UI.
  • The questions in the database are then fetched using the get() method and re-rendered in the UI.
  • If the user selects any two questions from the rendered list, the selected questions are passed to the server side for processing.
  • After processing, the result is displayed in the UI.

To run the Python script on the server side, I used Node.js' child_process module.
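
The Python side of this hand-off can be sketched as follows. This is a minimal sketch, not the project's actual script: the names predict_duplicate and main, the JSON output shape, and the placeholder word-overlap rule are all assumptions for illustration, and the real script would call the trained model instead. It assumes the Node.js server passes the two questions as command-line arguments and reads the result from standard output.

```python
import json
import sys
from typing import List


def predict_duplicate(question1: str, question2: str) -> bool:
    """Hypothetical stand-in for the real model call.

    Uses a naive word-overlap rule so the sketch runs on its own.
    """
    words1 = set(question1.lower().split())
    words2 = set(question2.lower().split())
    if not words1 or not words2:
        return False
    overlap = len(words1 & words2) / len(words1 | words2)
    return overlap >= 0.5


def main(argv: List[str]) -> str:
    """Build the JSON line that the parent process would read from stdout."""
    if len(argv) != 3:
        return json.dumps({"error": "expected exactly two questions"})
    _, question1, question2 = argv
    return json.dumps({"is_duplicate": predict_duplicate(question1, question2)})


if __name__ == "__main__":
    # argv[0] is the script name; argv[1] and argv[2] are the two questions.
    print(main(sys.argv))
```

Printing a single JSON line keeps parsing on the Node.js side simple, since the parent process only has to read standard output and parse it once.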

Machine Learning part:

  • The dataset is taken from Kaggle.
  • The final training data was prepared after cleaning (removing punctuation, stemming, lemmatisation, etc.) and preprocessing (cosine similarity, polarity, question length, etc.).
  • I used Python's nltk library for the preprocessing.
  • Feature engineering: features were generated from the cleaned and preprocessed data.
  • To train the model I used the LightGBM, Random Forest and XGBoost algorithms. Of the three, XGBoost performed best.
  • So the XGBoost model was selected as the final model for prediction.
  • Model evaluation metric: log loss. The log loss from XGBoost was 0.428.
  • The code for the ML part is in my other repo.
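
For reference, the log loss metric mentioned above can be computed as below. This is a minimal sketch of the standard binary log loss formula, not the project's evaluation code; the function name log_loss and the eps clipping value are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(y_true) == 0 or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        # Clip the probability so that log(0) never occurs.
        prob = min(max(prob, eps), 1 - eps)
        total += label * math.log(prob) + (1 - label) * math.log(1 - prob)
    return -total / len(y_true)
```

Lower is better. A model that always predicts a probability of 0.5 scores ln 2, roughly 0.693, which gives a baseline for reading the reported 0.428.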

How Cosine Similarity works:

Cosine similarity is a metric that measures how similar two documents are, irrespective of their size. Mathematically, it is the cosine of the angle between two vectors projected in a multi-dimensional space. This is an advantage because even if two similar documents are far apart by Euclidean distance (due to a difference in document size), they may still point in nearly the same direction. The smaller the angle, the higher the cosine similarity.
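
For two vectors A and B, the value is (A · B) / (|A| |B|). A minimal sketch in Python, assuming each question is turned into a simple word-count vector (the project's actual vectorisation is not described here, and the function name cosine_similarity is chosen for illustration):

```python
import math
from collections import Counter


def cosine_similarity(text1: str, text2: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    counts1 = Counter(text1.lower().split())
    counts2 = Counter(text2.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts1[w] * counts2[w] for w in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # An empty text has no direction; treat it as dissimilar.
    return dot / (norm1 * norm2)
```

Repeating a text twice doubles the length of its vector but leaves its direction unchanged, so the similarity to the original stays at about 1. That is the size independence described above.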

How Polarity works:

Polarity analysis takes into account the number of positive and negative terms that appear in a given sentence. It is useful to some extent as a feature, since it adds structure to the data set: if two questions have different polarity, they are more likely to differ in meaning, and if their polarity matches, they are more likely to be similar.
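
A minimal sketch of the idea follows. The tiny word lists, the function names polarity and polarity_gap, and the scoring rule are all hypothetical and only illustrate counting positive and negative terms; the scorer actually used in this project is not specified here.

```python
# Hypothetical mini-lexicons, for illustration only.
POSITIVE_WORDS = {"good", "best", "great", "love", "easy"}
NEGATIVE_WORDS = {"bad", "worst", "hate", "hard", "problem"}


def polarity(text: str) -> float:
    """Score in [-1, 1]: (positive - negative) / number of sentiment words found."""
    words = [w.strip("?.,!").lower() for w in text.split()]
    positive = sum(w in POSITIVE_WORDS for w in words)
    negative = sum(w in NEGATIVE_WORDS for w in words)
    if positive + negative == 0:
        return 0.0  # No sentiment words found: neutral.
    return (positive - negative) / (positive + negative)


def polarity_gap(question1: str, question2: str) -> float:
    """Absolute polarity difference between two questions, usable as a feature."""
    return abs(polarity(question1) - polarity(question2))
```

A large gap between two questions hints that they differ in meaning, while a small gap is only weak evidence of similarity, so a feature like this would be combined with others such as cosine similarity.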

To run this project on your machine:

  • Clone this repository to your local machine.
  • Make sure you have Node.js >= v14.15.3 installed.
  • To install the dependencies, run npm install.
  • Run the same command in the backend folder too.
  • To start the server, run nodemon server in the backend folder.
  • To start the app, run npm start in another terminal.
  • To run the Python script you must have Python >= 3.7 installed.
  • Make sure your server is running before asking the app to predict.
