bbc-dataset-news-classification's Introduction

BBC-Dataset-News-Classification

The dataset consists of 2225 documents from the BBC News website, corresponding to stories in five topical areas from 2004-2005.

Class Labels: 5 (business, entertainment, politics, sport, tech)

Dataset Description:

BBC Datasets Description

Dataset

Files Description

  • dataset/data_files: Data folders each containing several news txt files

  • dataset/dataset.csv: CSV file with two columns, "news" and "type". The "news" column holds the news article text and the "type" column holds its category: business, entertainment, politics, sport, or tech.

  • model/get_data.py: Gathers all txt files into one CSV file containing two columns ("news", "type"). After successful execution it creates the dataset.csv file in the dataset folder.

  • model/model.py: Preprocessing, TF-IDF feature extraction, model building, and evaluation.

  • model/test.ipynb: jupyter notebook
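
The gathering step described for model/get_data.py can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function name `build_csv`, the one-folder-per-category layout, the latin-1 decoding, and the whitespace normalization are all assumptions made for the example.

```python
import csv
from pathlib import Path


def build_csv(data_dir, out_csv):
    """Collect every .txt file under data_dir/<category>/ into one CSV.

    Assumption: each category has its own sub-folder, and the folder
    name is used as the "type" value. Returns the number of rows written.
    """
    data_dir = Path(data_dir)
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["news", "type"])  # header row
        for category_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
            for txt_file in sorted(category_dir.glob("*.txt")):
                # Assumption: latin-1 is used because it decodes any byte
                # sequence; the real files may use a different encoding.
                text = txt_file.read_text(encoding="latin-1")
                # Collapse newlines and repeated spaces (an illustrative choice).
                writer.writerow([" ".join(text.split()), category_dir.name])
                count += 1
    return count
```

Under these assumptions, calling `build_csv("dataset/data_files", "dataset/dataset.csv")` would write one row per article and return the article count.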

Method

The feature-extracted dataset was divided into two parts, a train set and a test set. The train set contains 1780 examples and the test set contains 445 examples (an 80/20 split of the 2225 documents).
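
As a rough sketch of this step in plain Python, the snippet below computes basic TF-IDF weights and then holds out 20% of the examples. The helper names `tfidf` and `split_indices`, the whitespace tokenizer, the unsmoothed idf formula, and the random shuffle with a fixed seed are assumptions for illustration; the repository may use a library implementation and a different split procedure.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using plain tf * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle range(n) and return (train_indices, test_indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(2225)
print(len(train_idx), len(test_idx))  # 1780 445
```

With this unsmoothed formula a term that occurs in every document gets weight zero; library implementations often smooth the idf term to avoid that.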

Result

The table below shows the results on the test set:

Metric    Value
Kappa     0.9461
Accuracy  0.9573
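
For readers unfamiliar with the two metrics, here is a minimal sketch of how accuracy and Cohen's kappa can be computed from true and predicted labels. The function names and the toy labels are hypothetical, and this is not necessarily how model/model.py computes them.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected agreement if labels were assigned independently
    # with the same marginal frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Toy example with made-up labels
y_true = ["sport", "sport", "tech", "tech"]
y_pred = ["sport", "tech", "tech", "tech"]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))  # 0.75 0.5
```

An accuracy of 0.9573 on 445 test examples is consistent with 426 correct predictions (426/445 ≈ 0.9573).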

bbc-dataset-news-classification's People

Contributors

suraj-deshmukh

bbc-dataset-news-classification's Issues

Vectorized.pkl Not Loading

Dear Suraj,
I am trying to run the code you provided, but I get an error when loading the vectorized.pkl file. I am currently using Google Colab. I first ran the code in a Python 3 environment, where loading the pickle file raised an error. I then switched to Python 2, but the problem persists with the error shown in the attached image. Can you help me resolve this error?
[image: screenshot of the error]

Thanking you,
Kevin Patel

NLTK not available on Python2

NLTK no longer supports Python 2. Is there an alternative that you would suggest?
I can't get the code to work as of now.
