
sohalibaisla / hindi-fake-news-fact-checker


The purpose of this study was to learn more about the problem of misinformation. We propose a machine-learning-based framework to automate data annotation for a Hindi fake news dataset. Our primary focus was data annotation and its automation: after collecting the data, we manually investigated each site to label news items as fake (1) or not fake (0).

License: MIT License

Python 22.25% Jupyter Notebook 77.75%


Hindi-Fake-News-Fact-Checker

This project carries out data annotation for fake news in the Hindi language. Data annotation is an essential task for any AI/ML/DL project: annotated data is used to train models, which learn from this training data and are then evaluated on test data. Manual annotation is tedious and time-consuming, yet it plays a large role in determining model accuracy, because that accuracy depends heavily on the quality of the annotated data. Even a slight inaccuracy in annotation can greatly affect the overall accuracy of the model.

[Figure: flowchart]

We propose a machine-learning and deep-learning based framework to automate the process of data annotation. Our main contributions are:

  • First, we collected data from various fact-checking websites.
  • After extraction, we pre-processed the data: we removed punctuation and stopwords, then applied stemming and lemmatization. Finally, we vectorized the entire dataset using the TF-IDF vectorizer.
  • We then applied baseline machine learning and deep learning models: Gaussian Naive Bayes, Linear Regression, K-Nearest Neighbors, Support Vector Machines, Random Forest, and Long Short-Term Memory (LSTM).
  • The proposed models were tested on 10%, 20%, 30% and 40% test splits of the prepared dataset. The best result was an accuracy of 81.44% for the Random Forest model on the 10% test split. The highest accuracy for the LSTM model, trained for 100 epochs with a batch size of 64, was 64.70%, also on the 10% test split.
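The pre-processing and vectorization step can be illustrated with a minimal, standard-library-only sketch. It is not the repository's code: the stopword list is a tiny hypothetical sample, stemming and lemmatization are omitted, and the weights use the textbook formula tf * log(N / df), which differs from scikit-learn's `TfidfVectorizer` defaults (smoothed IDF and L2 normalization). The names `preprocess` and `tfidf_vectorize` are assumptions made for illustration.

```python
import math
import string
from collections import Counter

# Hypothetical, tiny stopword sample for illustration only;
# a real run would use a complete Hindi stopword list.
HINDI_STOPWORDS = {"है", "के", "का", "की", "और", "में", "से", "को", "यह", "पर"}

# ASCII punctuation plus the Devanagari danda and double danda.
PUNCTUATION = string.punctuation + "।॥"


def preprocess(text):
    """Replace punctuation with spaces, split on whitespace, drop stopwords."""
    table = str.maketrans({ch: " " for ch in PUNCTUATION})
    tokens = text.translate(table).split()
    return [t for t in tokens if t not in HINDI_STOPWORDS]


def tfidf_vectorize(docs):
    """Return (vocabulary, matrix) where matrix[i][j] = tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        matrix.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, matrix
```

With this formula, a term that appears in every document gets weight zero, while terms that separate documents receive positive weights, which is the property the downstream classifiers rely on.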

    [Figure: Results of ML and DL Models]
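The evaluation over 10%, 20%, 30% and 40% test splits can be sketched as a small loop. This is an illustrative outline under stated assumptions, not the authors' pipeline: `train_test_split`, `evaluate_over_splits` and `fit_majority` are hypothetical helpers written here with the standard library, and the majority-class model is only a placeholder for the Random Forest, LSTM and other models named above.

```python
import random
from collections import Counter


def train_test_split(samples, labels, test_fraction, seed=0):
    """Shuffle indices with a fixed seed and hold out `test_fraction` for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def evaluate_over_splits(samples, labels, fit, test_fractions=(0.1, 0.2, 0.3, 0.4)):
    """Train with `fit` on each split and record test accuracy per fraction."""
    results = {}
    for frac in test_fractions:
        x_tr, x_te, y_tr, y_te = train_test_split(samples, labels, frac)
        predict = fit(x_tr, y_tr)  # `fit` returns a prediction function
        results[frac] = accuracy(y_te, [predict(x) for x in x_te])
    return results


def fit_majority(x_train, y_train):
    """Placeholder model: always predicts the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority
```

Any model can be plugged in by passing a `fit` function that takes training samples and labels (1 for fake, 0 for not fake) and returns a callable that predicts a label for one sample.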
