sethips / sentence-inference

This project forked from blurred-machine/sentence-inference

For every given pair of sentences -- (sentence-1, sentence-2), we need to determine if sentence-2 can be logically inferred given sentence-1.

Home Page: https://jovian.ml/paras009/sentence-inference

License: MIT License

Jupyter Notebook 100.00%

Sentence-Inference

Dataset Description:

Sentence1: String column of human-entered text, the first sentence of the pair
Sentence2: String column of human-entered text, the second sentence of the pair
gold_label: Categorical column giving the logical relation between sentence1 and sentence2
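A minimal sketch of reading data with this schema. It assumes a CSV file whose header uses the lowercase names sentence1, sentence2 and gold_label; the README does not state the file format or the exact header, and the sample rows and label values (entailment, neutral, contradiction) are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the assumed CSV layout. For a real file, pass e.g.
# open("train.csv", newline="", encoding="utf-8") instead (file name is an assumption).
SAMPLE = """sentence1,sentence2,gold_label
A man is playing a guitar on stage.,A man is performing music.,entailment
A dog runs through the snow.,A cat sleeps on the couch.,contradiction
Two kids are reading a book.,The kids are at school.,neutral
"""

def load_pairs(fh):
    """Read (sentence1, sentence2, gold_label) rows, skipping rows with no label."""
    rows = []
    for record in csv.DictReader(fh):
        label = (record.get("gold_label") or "").strip()
        if not label:
            continue  # unlabeled rows cannot be used for supervised training
        rows.append((record["sentence1"], record["sentence2"], label))
    return rows

pairs = load_pairs(io.StringIO(SAMPLE))
label_counts = Counter(label for _, _, label in pairs)
print(len(pairs), dict(label_counts))
```

Checking the label counts early is worthwhile on a small dataset, since a skewed gold_label distribution changes which evaluation metric is meaningful.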

Implementation

[Figure: Length of document in sentence1]
[Figure: Length of document in sentence2]
[Figure: Heatmap of correlation between the features]
[Figure: Bidirectional LSTM model performance (poor, due to the small amount of data)]
[Figure: Selected model's performance in predicting gold_label on the test set]
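A minimal sketch of how length features and the correlation matrix behind a feature heatmap could be computed. It assumes length is measured in whitespace-separated words (the README does not say whether words or characters were counted), and the sentence pairs are hypothetical. The `pearson` helper is written out so the block needs only the standard library.

```python
from statistics import mean

def word_length(text):
    """Number of whitespace-separated tokens in a sentence."""
    return len(text.split())

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant feature; reported as 0 here
    return cov / (sx * sy)

# Hypothetical sentence pairs, used only to illustrate the computation.
pairs = [
    ("A man is playing a guitar on stage.", "A man is performing music."),
    ("A dog runs through the snow.", "A cat sleeps on the couch."),
    ("Two kids are reading a book together in the park.", "The kids are at school."),
    ("A woman cooks.", "A woman is cooking dinner for her family."),
]

features = {
    "len_s1": [word_length(s1) for s1, _ in pairs],
    "len_s2": [word_length(s2) for _, s2 in pairs],
}
features["len_diff"] = [a - b for a, b in zip(features["len_s1"], features["len_s2"])]

names = list(features)
# Symmetric correlation matrix: the values a correlation heatmap would colour.
matrix = [[pearson(features[a], features[b]) for b in names] for a in names]
for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.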

Inference

Since the dataset was very small, training a neural network was not a good idea, so I chose to move ahead with classical ML algorithms.
Working with a larger dataset would likely improve learning.
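The README does not name the ML algorithm that was finally selected. As a hedged illustration of the feature-based route, here is a nearest-centroid classifier over pair features; it is chosen only because it fits in a few lines of standard-library Python, and the feature values and labels are hypothetical.

```python
from collections import defaultdict

def fit_centroids(X, y):
    """Average feature vector per label (nearest-centroid training)."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], x)),
    )

# Hypothetical 2-feature vectors: (word-overlap ratio, length difference).
X_train = [[0.8, 1.0], [0.7, 2.0], [0.1, 0.0], [0.2, 1.0]]
y_train = ["entailment", "entailment", "contradiction", "contradiction"]

model = fit_centroids(X_train, y_train)
print(predict(model, [0.75, 1.5]))
```

Nearest centroid is sensitive to feature scale, so features such as a length difference would normally be standardized first. scikit-learn provides this classifier as `NearestCentroid` in `sklearn.neighbors`.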
More advanced NLP techniques could be used to capture the semantic relationship between the two sentences and improve the results.
Due to lack of time, I followed this approach; with further iterations of development, the model's performance could increase significantly.
Data cleaning was done fairly thoroughly, but other cleaning approaches could also be tried.
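The README does not describe the cleaning steps that were used. One common approach, sketched here as an assumption, is to lowercase the text, replace ASCII punctuation with spaces and collapse repeated whitespace.

```python
import re
import string

# Character class matching any ASCII punctuation character.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")
_SPACES = re.compile(r"\s+")

def clean_sentence(text):
    """Lowercase, replace ASCII punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = _PUNCT.sub(" ", text)
    return _SPACES.sub(" ", text).strip()

print(clean_sentence("  A man, wearing a HAT, isn't smiling!  "))
```

Note that this splits contractions: "isn't" becomes "isn t". For sentence inference that matters, because negation is often the signal that separates a contradiction from an entailment, so an alternative approach would expand contractions (for example "isn't" to "is not") before stripping punctuation.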
Feature engineering is an important part of the problem that requires good NLP knowledge and can be worked on in the future.
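As an illustration of the kind of feature engineering meant here, the sketch below computes a few pair-level features from cleaned, lowercased sentences: token-set Jaccard similarity, how much of sentence2 is covered by sentence1, the length difference, and a negation-mismatch flag. None of these features is taken from the original notebook, and the negation word list is a hypothetical, non-exhaustive choice.

```python
# Hypothetical, non-exhaustive negation vocabulary.
NEGATIONS = {"no", "not", "never", "nobody", "nothing", "none"}

def pair_features(s1, s2):
    """Hand-crafted features for a cleaned, lowercased (sentence1, sentence2) pair."""
    t1, t2 = set(s1.split()), set(s2.split())
    shared = t1 & t2
    union = t1 | t2
    return {
        # Jaccard similarity of the two token sets.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Share of sentence2's distinct tokens that already appear in sentence1.
        "s2_coverage": len(shared) / len(t2) if t2 else 0.0,
        # Difference in token counts (sentence1 minus sentence2).
        "len_diff": len(s1.split()) - len(s2.split()),
        # 1 if exactly one of the two sentences contains a negation word.
        "negation_mismatch": int(bool(t1 & NEGATIONS) != bool(t2 & NEGATIONS)),
    }

print(pair_features("a man is playing a guitar", "a man is not playing"))
```

High coverage suggests entailment, while a negation mismatch on otherwise overlapping sentences is a typical contradiction cue, which is why both are worth testing as separate features.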
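Dimensionality reduction, for example with PCA, could be explored experimentally to optimize model performance and remove uninformative features; t-SNE is better suited to visualizing the feature space, since it does not learn a mapping that can be applied to new data.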
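A minimal PCA sketch using NumPy's SVD, assuming NumPy is available. The feature matrix is hypothetical, with its third column deliberately close to a copy of the first, so two components should capture almost all of the variance. scikit-learn's `sklearn.decomposition.PCA` offers the same projection together with an `explained_variance_ratio_` attribute.

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components via SVD of centred data."""
    X = np.asarray(X, dtype=float)
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per component
    return X_centred @ Vt[:n_components].T, explained[:n_components]

# Hypothetical feature matrix: 6 pairs x 3 features; column 3 nearly duplicates column 1.
X = [[0.8, 1, 0.81], [0.7, 2, 0.69], [0.1, 0, 0.12],
     [0.2, 1, 0.18], [0.5, 3, 0.52], [0.4, 0, 0.41]]
Z, ratio = pca(X, n_components=2)
print(Z.shape, ratio.round(3))
```

PCA is scale-sensitive, so features measured in different units (such as an overlap ratio and a word count) would normally be standardized before this step; otherwise the feature with the largest numeric range dominates the first component.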
Hypothesis testing can help decide whether each feature actually contributes to predicting the correct gold_label.
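One way to test whether a numeric feature carries information about gold_label is a permutation test comparing the feature's mean between two labels. The sketch below assumes two label groups and uses hypothetical word-overlap values; with more than two labels, a chi-square test or one-way ANOVA would be the usual alternatives.

```python
import random

def _mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b.

    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the p-value is reproducible
    observed = _mean(a) - _mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = _mean(pooled[:len(a)]) - _mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed labelling itself.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical word-overlap values for pairs with two different gold labels.
entailment = [0.81, 0.74, 0.69, 0.88, 0.77, 0.72]
contradiction = [0.35, 0.41, 0.22, 0.48, 0.30, 0.39]
diff, p = permutation_test(entailment, contradiction)
print(round(diff, 3), p)
```

A small p-value suggests the feature's distribution differs between the two labels; a feature whose p-value stays large across label pairs is a candidate for removal.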
Word embeddings can be used to capture semantic relationships between words more effectively.
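A sketch of how word embeddings can relate two sentences: average the word vectors of each sentence and compare the averages with cosine similarity. The 3-dimensional vectors below are hypothetical stand-ins for pretrained embeddings such as GloVe or word2vec, which typically have 50 to 300 dimensions.

```python
import math

# Hypothetical toy embeddings; real ones would be loaded from a pretrained model.
EMBEDDINGS = {
    "man": [0.9, 0.1, 0.0],
    "person": [0.8, 0.2, 0.1],
    "guitar": [0.1, 0.9, 0.2],
    "music": [0.2, 0.8, 0.3],
    "snow": [0.0, 0.1, 0.9],
}

def sentence_vector(tokens, embeddings):
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words, so no sentence vector
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

s1 = sentence_vector("a man plays the guitar".split(), EMBEDDINGS)
s2 = sentence_vector("a person makes music".split(), EMBEDDINGS)
s3 = sentence_vector("snow falls".split(), EMBEDDINGS)
print(round(cosine(s1, s2), 3), round(cosine(s1, s3), 3))
```

Averaging ignores word order, so "the dog bit the man" and "the man bit the dog" receive the same vector; this is one reason sequence models such as the bidirectional LSTM are attractive for inference once enough data is available.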
More capable neural networks would be a better choice for this kind of problem, although the bidirectional LSTM should also perform well on a larger dataset.
Finally, once the model performs well on the data, hyperparameter tuning can be used to adjust the bidirectional LSTM's settings and extract its best performance.
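A minimal grid-search sketch. The hyperparameter names and values are assumptions (the README does not list which settings would be tuned), and `fake_evaluate` is a hypothetical stand-in for training the bidirectional LSTM and returning its validation accuracy.

```python
import itertools

# Hypothetical search space for a bidirectional LSTM.
SEARCH_SPACE = {
    "hidden_units": [64, 128],
    "dropout": [0.2, 0.5],
    "learning_rate": [1e-3, 1e-4],
}

def grid_search(space, evaluate):
    """Try every combination in `space` and return (best_params, best_score).

    `evaluate` should train a model with the given params and return a
    validation score where higher is better.
    """
    keys = list(space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def fake_evaluate(params):
    """Hypothetical stand-in for 'train the BiLSTM, return validation accuracy'."""
    return (0.6 + 0.001 * params["hidden_units"] / 64
            - abs(params["dropout"] - 0.3)
            - abs(params["learning_rate"] - 1e-3) * 10)

best, score = grid_search(SEARCH_SPACE, fake_evaluate)
print(best, round(score, 3))
```

The number of training runs is the product of the option counts (8 here), so grids grow quickly; random search is a common alternative for larger spaces. Scores should come from a validation split, keeping the test set untouched for the final gold_label evaluation.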
For any suggestions, contact me at [email protected]