News Classification Project

This project focuses on classifying news articles into two categories: fake news and true news, using various machine learning models and text vectorization techniques. The goal is to build a robust text classification model that can accurately distinguish between fake and true news.

Vectorization Techniques

Three different text vectorization techniques have been used:

Count Vectorizer: Represents each document as a vector of raw word counts over the corpus vocabulary.
TF-IDF Vectorizer: TF-IDF (Term Frequency-Inverse Document Frequency) weights each word by how often it appears in a document, discounted by how many documents in the corpus contain it.
Word2Vec Vectorizer: Learns dense word embeddings from the contexts in which words appear in the text data.
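
To make the difference between the first two techniques concrete, here is a minimal sketch using scikit-learn's CountVectorizer and TfidfVectorizer with default settings. The three-sentence corpus is a hypothetical stand-in for the news articles, and the project's actual vectorizer parameters are not stated here, so treat the defaults as an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus; the real project vectorizes news article text.
docs = [
    "breaking news about the election",
    "shocking secret about the election revealed",
    "the market closed higher today",
]

# Count Vectorizer: one column per vocabulary word, values are raw counts.
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(docs)

# TF-IDF Vectorizer: same columns, but counts are reweighted so that words
# appearing in many documents contribute less.
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(docs)

vocab = count_vec.vocabulary_  # maps each word to its column index
print("vocabulary size:", len(vocab))
print("count matrix shape:", X_counts.shape)
print("tf-idf matrix shape:", X_tfidf.shape)

# In the first document, "breaking" (found in one document) should get a
# higher TF-IDF weight than "election" (found in two documents).
row0 = X_tfidf[0].toarray()[0]
print("breaking:", row0[tfidf_vec.vocabulary_["breaking"]])
print("election:", row0[tfidf_vec.vocabulary_["election"]])
```

Both vectorizers produce a sparse matrix with one row per document, so either output can be passed directly to the classifiers listed in the next section.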

Machine Learning Models

The following machine learning models have been implemented and evaluated for news classification:

Logistic Regression: Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.
Naive Bayes: Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.
Support Vector Machine (SVM): Implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer.
Random Forest: Initially implemented with variations for Count Vectorizer, TF-IDF Vectorizer, and Word2Vec Vectorizer. An improved version of the Random Forest model with TF-IDF Vectorization is also presented.
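
As an illustration of how one vectorizer can be paired with each of the four model families, the sketch below builds a scikit-learn pipeline per classifier on top of TF-IDF features. The texts, labels, the choice of LinearSVC and MultinomialNB, and all hyperparameters are assumptions made for the example; the notebook may configure these differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1 = fake news, 0 = true news.
texts = [
    "aliens secretly control the government, insiders say",
    "miracle pill cures every disease overnight",
    "celebrity clone spotted at secret moon base",
    "parliament passes the annual budget bill",
    "central bank keeps interest rates unchanged",
    "city council approves new public transport plan",
]
labels = [1, 1, 1, 0, 0, 0]

# One estimator per model family named above (settings are assumptions).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

fitted = {}
for name, clf in models.items():
    # Each pipeline vectorizes the raw text, then fits the classifier.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    fitted[name] = pipe

sample = ["secret pill lets government control the moon"]
for name, pipe in fitted.items():
    print(name, "->", pipe.predict(sample)[0])
```

Swapping TfidfVectorizer for CountVectorizer gives the count-based variant of each model. One caveat for the Word2Vec variants: MultinomialNB expects non-negative features, so embedding-based inputs, which contain negative values, would need a different Naive Bayes variant such as GaussianNB. Which variant the project uses is not stated here.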

Evaluation

The models have been evaluated on the test dataset using the following metrics:

Accuracy
F1 Score
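
For readers unfamiliar with the two metrics, the following sketch computes them from scratch in plain Python. The label vectors are made up for illustration, and treating fake news as the positive class (label 1) is an assumption. In practice scikit-learn's accuracy_score and f1_score from sklearn.metrics would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = fake news, 0 = true news.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

print("accuracy:", accuracy(y_true, y_pred))
print("f1:", f1(y_true, y_pred))
```

Accuracy counts every correct prediction equally, while F1 only rewards correct detection of the positive class, which makes it the more informative number when the two classes are imbalanced.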

Benchmark Improvement

The Random Forest model with TF-IDF Vectorization has been improved over its initial benchmark by tuning its hyperparameters.
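
One common way to carry out this kind of tuning is a cross-validated grid search. The sketch below shows the idea with scikit-learn's GridSearchCV. The parameter grid, the F1 scoring choice, the two-fold split, and the toy data are all assumptions for illustration, not the settings used in the project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data: 1 = fake news, 0 = true news.
texts = [
    "aliens secretly control the government, insiders say",
    "miracle pill cures every disease overnight",
    "celebrity clone spotted at secret moon base",
    "secret document proves the earth is hollow",
    "parliament passes the annual budget bill",
    "central bank keeps interest rates unchanged",
    "city council approves new public transport plan",
    "national team wins the championship final",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Candidate values are illustrative; the "rf__" prefix routes each
# parameter to the Random Forest step of the pipeline.
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
}

# cv=2 only because the toy dataset is tiny; a real run would use more folds.
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=2)
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print("best cross-validated F1:", search.best_score_)
```

Keeping the vectorizer inside the pipeline means it is refit on each training fold, so no vocabulary or IDF statistics leak from the validation fold into training.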

Dependencies

Ensure you have the following Python libraries installed:

numpy
pandas
scikit-learn
gensim
matplotlib
seaborn

Usage

Clone this repository to your local machine:

git clone https://github.com/NikosMav/AI-FakeNews-Classification.git

Install the required dependencies:

pip install numpy pandas scikit-learn gensim matplotlib seaborn

Run the Jupyter Notebook or Python script to execute the machine learning models and perform news classification.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Feel free to explore the Jupyter Notebook for a detailed step-by-step explanation of the project and its implementation.
