

Online-news-popularity - Complete study

🔗 Interactive Report

๐Ÿ‘จโ€๐Ÿ”ฌ Our study's context and contributors

For the final project of our fourth-year Python class, we studied the Online News Popularity dataset. Two people worked on this project: Samuel Pariente and Marius Ortega.

๐Ÿ“ Dataset presentation

This dataset was initially published in 2015 and contains data on 39797 articles published on mashable.com between 2013 and 2015. It is composed as follows:

  • 61 variables in total
  • 58 predictive variables
  • 2 non-predictive variables
  • 1 target variable (shares)

It is worth mentioning that the dataset has no missing values (NAs). However, several cleaning steps were still required before any predictive modeling.
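To make the column breakdown and the missing-value check concrete, here is a minimal sketch using only the Python standard library. The two-row excerpt is made up for illustration, and the non-predictive column names (`url`, `timedelta`) are an assumption based on the public UCI description of the dataset, not something stated in this README.

```python
import csv
import io

# Hypothetical excerpt: the real file has 61 columns and no missing values.
SAMPLE_CSV = """url,timedelta,n_tokens_title,num_imgs,shares
http://mashable.com/a,731,12,1,593
http://mashable.com/b,731,9,3,711
"""

# Assumed names, taken from the UCI description rather than from this README.
NON_PREDICTIVE = ("url", "timedelta")
TARGET = "shares"


def split_columns(header):
    """Group column names into non-predictive, predictive and target."""
    names = [name.strip() for name in header]
    non_predictive = [n for n in names if n in NON_PREDICTIVE]
    predictive = [n for n in names if n not in NON_PREDICTIVE and n != TARGET]
    return non_predictive, predictive, TARGET


def count_missing(rows):
    """Count empty or 'NA' cells per column."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            key = name.strip()
            is_missing = value is None or value.strip() in ("", "NA")
            counts[key] = counts.get(key, 0) + int(is_missing)
    return counts


reader = csv.DictReader(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
rows = list(reader)
non_predictive, predictive, target = split_columns(reader.fieldnames)
missing = count_missing(rows)
print(non_predictive, predictive, target)
print(missing)
```

On the full file, the same two helpers would be expected to report 2 non-predictive columns, 58 predictive columns and a zero missing count for every column, if the assumed column names hold.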

📚 Documents at your disposal

Our work produced several documents, all accessible from this GitHub repository:

  • A notebook: It contains all of our work on the dataset. All of our scientific procedures are detailed there.
  • Improved dataframes: Because we scraped additional data from Mashable and other online sources, our working dataset has more columns than the original one. New variables include the author's name, the article's title and the website's traffic.
  • PowerPoint presentation: This presentation serves as our report to our teachers. We presented it as the final step of the project.
  • Interactive web app: Deployed with Streamlit, this web app is a handy way to introduce people to our work. It contains the same information as our notebook, except the code. In addition, the web app has a "predict your success" section that lets you predict your article's future popularity in real time. To do so, it calls an API hosted on AWS that serves our most effective machine learning model.
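As a rough illustration of how a client such as the web app might talk to a prediction API, the sketch below builds a JSON POST request without sending it. The endpoint URL, the payload shape and the feature names are all hypothetical: this README does not document the real API.

```python
import json
import urllib.request

# Hypothetical endpoint: the README does not give the real API address.
PREDICT_URL = "https://example.com/predict"


def build_prediction_request(features, url=PREDICT_URL):
    """Build (but do not send) a JSON POST request for one article."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature names, for illustration only.
request = build_prediction_request({"n_tokens_title": 12, "num_imgs": 1})
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the response, whose format would depend on how the actual API is implemented.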

๐Ÿ” The structure of the project

The project is divided into four main sections:

  • Preprocessing
  • Data Discovery (Univariate and Bivariate analysis)
  • Optimizing an article's success (Data Insights)
  • Predicting the success of an article (Machine Learning and Deep Learning Models)

💡 If you would like more details about our project, feel free to look at the documents mentioned above.
