nikosmav / fakenews-classification

In this notebook we analyze and classify news articles using machine learning techniques, including Logistic Regression, Naive Bayes, Support Vector Machines, and Random Forests. Explore text vectorization and NLP for accurate news categorization.

License: MIT License

Jupyter Notebook 100.00%
fake-news-dataset fake-news-detection model-training neural-networks python-notebook logistic-regression naive-bayes natural-language-processing random-forest svm

News Classification Project

This project focuses on classifying news articles into two categories: fake news and true news, using various machine learning models and text vectorization techniques. The goal is to build a robust text classification model that can accurately distinguish between fake and true news.

Vectorization Techniques

Three different text vectorization techniques have been used:

  1. Count Vectorizer: This technique converts text data into a numerical format based on the frequency of words.

  2. TF-IDF Vectorizer: TF-IDF (Term Frequency-Inverse Document Frequency) is used to represent the importance of words in a document relative to the entire corpus.

  3. Word2Vec Vectorizer: This method creates word embeddings by learning word associations within the text data.
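The three techniques above can be sketched on a toy corpus as follows. This is a minimal illustration, not the notebook's actual code: the documents are invented, and the `embeddings` table and the `document_vector` helper are hypothetical stand-ins for the word vectors a trained gensim Word2Vec model would provide. Averaging word vectors into one document vector is a common choice and is assumed here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus (illustrative only, not the project's dataset).
docs = [
    "breaking news aliens land in city",
    "city council approves new budget",
    "aliens approve new city budget",
]

# 1. Count Vectorizer: one column per vocabulary word, each cell a raw count.
count_vec = CountVectorizer()
X_count = count_vec.fit_transform(docs)

# 2. TF-IDF Vectorizer: counts reweighted so that words common across
#    the whole corpus contribute less than rare, distinctive words.
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(docs)

# 3. Word2Vec-style features: average the embeddings of the words present.
#    `embeddings` is a hand-written stand-in (assumption) for trained vectors.
embeddings = {
    "aliens": np.array([1.0, 0.0]),
    "city": np.array([0.0, 1.0]),
    "budget": np.array([1.0, 1.0]),
}

def document_vector(text, table, dim=2):
    """Mean of the known word vectors in `text`; zeros if none are known."""
    vectors = [table[w] for w in text.split() if w in table]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

X_w2v = np.vstack([document_vector(d, embeddings) for d in docs])

print(X_count.shape, X_tfidf.shape, X_w2v.shape)
```

The count and TF-IDF matrices are sparse with one column per vocabulary word, while the averaged embeddings form a small dense matrix with one column per embedding dimension.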

Machine Learning Models

The following machine learning models have been implemented and evaluated for news classification:

  1. Logistic Regression: Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.

  2. Naive Bayes: Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.

  3. Support Vector Machine (SVM): Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.

  4. Random Forest: Initially implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer. An improved version of the Random Forest model with TF-IDF Vectorization is also presented.
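As a rough sketch of how the four models listed above might be trained on the same features, the snippet below pairs each scikit-learn classifier with a TF-IDF vectorizer in a pipeline. The texts, the label encoding (1 = fake, 0 = true), and all parameter values are assumptions made for illustration; the notebook's own settings may differ.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy data; labels assume 1 = fake, 0 = true.
train_texts = [
    "aliens secretly replaced the city council",
    "miracle pill cures every disease overnight",
    "celebrity clone spotted on the moon",
    "council approves budget for road repairs",
    "central bank keeps interest rates unchanged",
    "local school opens new science lab",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["aliens spotted on the moon", "council keeps budget unchanged"]

# One estimator per model family named in the list above.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

predictions = {}
for name, clf in models.items():
    # The pipeline fits the vectorizer on training text only, then the model.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(train_texts, train_labels)
    predictions[name] = pipeline.predict(test_texts)
    print(name, predictions[name])
```

Swapping `TfidfVectorizer()` for `CountVectorizer()` gives the count-based variant. One caveat for the Word2Vec variant: `MultinomialNB` rejects negative feature values, which dense embeddings usually contain, so a different Naive Bayes flavor such as `GaussianNB` would be needed there.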

Evaluation

The models have been evaluated on the test dataset using the following metrics:

  • Accuracy
  • F1 Score
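For reference, both metrics can be computed directly from predicted and true labels. The sketch below uses plain Python with invented labels and assumes the positive class is encoded as 1; in practice scikit-learn's `accuracy_score` and `f1_score` provide the same calculations.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented labels: 1 = fake, 0 = true (encoding is an assumption).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

print(round(accuracy(y_true, y_pred), 3))  # 4 of 6 correct
print(round(f1(y_true, y_pred), 3))        # precision = recall = 2/3
```

Accuracy treats every mistake equally, while F1 focuses on how well the positive class is recovered, which is why the two are often reported together.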

Benchmark Improvement

The Random Forest model with TF-IDF Vectorization has been improved over its initial benchmark by tuning its hyperparameters.
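The README does not say which parameters were adjusted or how. One common approach is a cross-validated grid search, sketched below under that assumption; the parameter grid, the toy texts, and the choice of F1 as the selection metric are all illustrative, not taken from the notebook.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Invented toy data; labels assume 1 = fake, 0 = true.
texts = [
    "aliens secretly replaced the city council",
    "miracle pill cures every disease overnight",
    "celebrity clone spotted on the moon",
    "secret potion makes people invisible",
    "council approves budget for road repairs",
    "central bank keeps interest rates unchanged",
    "local school opens new science lab",
    "city library extends weekend opening hours",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hypothetical search space: step name, double underscore, parameter name.
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
}

# cv=2 only because the toy dataset is tiny; real data would allow more folds.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=2)
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

Keeping the vectorizer inside the pipeline means it is refit on each training fold, so the cross-validation scores are not inflated by vocabulary learned from held-out text.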

Dependencies

Ensure you have the following Python libraries installed:

  • numpy
  • pandas
  • scikit-learn
  • gensim
  • matplotlib
  • seaborn

Usage

  1. Clone this repository to your local machine:
git clone https://github.com/NikosMav/AI-FakeNews-Classification.git
  2. Install the required dependencies:
pip install numpy pandas scikit-learn gensim matplotlib seaborn
  3. Run the Jupyter Notebook or Python script to execute the machine learning models and perform news classification.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Feel free to explore the Jupyter Notebook for a detailed step-by-step explanation of the project and its implementation.
