Positivum


Positivum consists of a web application and a backend service that categorizes news articles by sentiment.

NOTE: As of 30/06/2020 the public demo is no longer available. While the demo was running, around 34,000 articles were collected and classified. I will release the categorized articles under an open-source license at a later time.

How it works

Every few minutes, a background service written in Python fetches the RSS feeds listed in the database and classifies each article with the model. The web application, written in Flask, then displays the articles stored in the database to users.
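One polling cycle of such a service could look roughly like the sketch below. It is not the project's actual code: the table names (feeds, articles), the helper names and the keyword-based classify_title stand-in are all assumptions made for illustration, and SQLite is used only to keep the example self-contained.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET


def create_schema(conn):
    """Create the two hypothetical tables used by this sketch."""
    conn.execute("CREATE TABLE IF NOT EXISTS feeds (url TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "link TEXT PRIMARY KEY, title TEXT NOT NULL, sentiment TEXT NOT NULL)"
    )


def fetch_feed(url):
    """Download one feed and return its raw XML bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def parse_rss(xml_data):
    """Yield (title, link) pairs from the <item> elements of an RSS document."""
    root = ET.fromstring(xml_data)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:
            yield title, link


def classify_title(title):
    """Placeholder for the real model: a trivial keyword rule (assumption)."""
    negative_words = {"crash", "crisis", "war"}
    words = set(title.lower().split())
    return "negative" if words & negative_words else "positive/neutral"


def poll_once(conn, fetch=fetch_feed):
    """Fetch every feed stored in the database and store new, classified articles."""
    inserted = 0
    urls = [row[0] for row in conn.execute("SELECT url FROM feeds")]
    for url in urls:
        for title, link in parse_rss(fetch(url)):
            already_stored = conn.execute(
                "SELECT 1 FROM articles WHERE link = ?", (link,)
            ).fetchone()
            if already_stored:
                continue  # skip articles seen in an earlier cycle
            conn.execute(
                "INSERT INTO articles (link, title, sentiment) VALUES (?, ?, ?)",
                (link, title, classify_title(title)),
            )
            inserted += 1
    conn.commit()
    return inserted
```

A long-running service would call poll_once in a loop with a pause of a few minutes (for example time.sleep) between cycles. The fetch parameter exists so the cycle can be exercised offline with canned XML.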

Model

The model is based on BERT. I used the transformers library to train a classification model on BBC articles that I annotated myself. The dataset is currently quite small, which is why the sentiment analysis is not as accurate as I would like. This could be improved by completing some of the goals listed below.
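A BERT-style sequence classifier typically outputs one raw score (logit) per class, and a softmax turns those scores into probabilities. The sketch below shows that last step in plain Python. The label order in LABELS and the function names are assumptions for illustration, not taken from the project's training scripts.

```python
import math

# Assumed label order; the class indices used by the real model may differ.
LABELS = ("negative", "positive/neutral")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels=LABELS):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

The probability returned alongside the label is the kind of per-sentiment confidence value that a web page could display next to each article.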

For documentation purposes, all iterations of my training scripts were saved.

The most up-to-date model can be found below:

Dependencies

The required dependencies for each component of Positivum are listed in the requirements.txt file inside the corresponding directory.

Goals

  • Create a reasonable model that can classify news article titles as positive/neutral or negative.
  • Create a backend service that reads the list of RSS feeds from the database, queries each feed, and stores the articles it finds.
  • Create a web application which displays the articles stored in the database.
  • Improve the appearance of the web application.
  • Show a shorter page navigation when the number of pages is large.
  • Use feedback from users to train and improve the model.
  • Add a feature for sharing articles.
  • Show the model's confidence in each sentiment in the web application.
  • Release a document describing the progress of this project.
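For the shorter page navigation goal, one common approach is to show the first page, the last page, and a small window around the current page, collapsing everything else. The sketch below illustrates that idea. The function name page_window and the radius parameter are hypothetical, and the project does not necessarily implement its navigation this way.

```python
def page_window(current, total, radius=2):
    """Return the page numbers to show, with None marking a collapsed gap."""
    if total < 1:
        return []
    current = min(max(current, 1), total)  # clamp to the valid range
    keep = {1, total}
    keep.update(range(max(1, current - radius), min(total, current + radius) + 1))

    window = []
    previous = 0
    for page in sorted(keep):
        if page - previous > 1:
            window.append(None)  # at least one hidden page before this one
        window.append(page)
        previous = page
    return window
```

For example, page_window(10, 20) returns [1, None, 8, 9, 10, 11, 12, None, 20]. A template could render each None as an ellipsis and every number as a link.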

Disclaimer

This is a personal project developed for the Extended Project Qualification. You are welcome to use this project, but I will not be providing support for it.

Dataset Source

I annotated the current dataset myself, but it is based on the following publication:

D. Greene and P. Cunningham. "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006.

License

MIT License
