Interactive Report
For the final project of our 4th-year Python class, we studied the Online News Popularity dataset. Two people worked on this project: Samuel Pariente and Marius Ortega.
This dataset was initially published in 2015 and contains data on 39797 articles released on the website mashable.com between 2013 and 2015. It is composed as follows:
- 61 variables in total
- 58 predictive variables
- 2 non-predictive variables
- 1 target variable (shares)
It is worth mentioning that the dataset has no missing (NA) values. However, several cleaning steps were still necessary before starting any predictive work on it.
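The composition described above can be checked with a few lines of Python. The sketch below is only an illustration and not the code from our notebook: it assumes the column roles given in the UCI description of the dataset (`url` and `timedelta` as non-predictive, `shares` as target), uses only the standard library, and runs on a tiny made-up sample rather than the real file. The helper names `split_columns` and `count_missing` are hypothetical.

```python
import csv
import io

# Assumed column roles, following the UCI description of the dataset:
# `url` and `timedelta` are non-predictive, `shares` is the target.
NON_PREDICTIVE = ("url", "timedelta")
TARGET = "shares"


def split_columns(header):
    """Group column names into non-predictive, predictive and target."""
    # The raw CSV header may carry stray spaces around the names.
    names = [name.strip() for name in header]
    non_predictive = [n for n in names if n in NON_PREDICTIVE]
    predictive = [n for n in names if n not in NON_PREDICTIVE and n != TARGET]
    return non_predictive, predictive, TARGET


def count_missing(rows):
    """Count empty or NA-like cells in an iterable of row dicts."""
    na_tokens = {"", "na", "nan", "null"}
    return sum(
        1
        for row in rows
        for value in row.values()
        if value is None or value.strip().lower() in na_tokens
    )


# Made-up sample using a few of the real column names (values are illustrative only).
SAMPLE = """url, timedelta, n_tokens_title, n_tokens_content, shares
http://example.com/a, 731, 12, 219, 593
http://example.com/b, 730, 9, 255, 711
"""

reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
rows = list(reader)
non_predictive, predictive, target = split_columns(reader.fieldnames)
missing = count_missing(rows)
print(non_predictive, predictive, target, missing)
```

On the full dataset, the same two checks should report 2 non-predictive columns, 58 predictive columns and no missing cells, if the assumptions above hold.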
Our work produced several documents, all accessible from this GitHub repository:
- A notebook: It contains all of our work on the dataset. All of our scientific procedures are detailed there.
- Improved DataFrames: Because we scraped additional data from Mashable and the wider web, our working dataset has more columns than the original one. New variables include the author's name, the article's title, and the website traffic.
- PowerPoint presentation: This presentation serves as our report to our teachers. We presented it as the final step of the project.
- Interactive web app: Deployed with Streamlit, this web app is a handy way to introduce people to our work. It contains the same information as our notebook, except the code. In addition, the web app has a "Predict your success" section that lets you predict your article's future popularity in real time. To do so, it calls an API hosted on an AWS server that serves our most effective machine learning model.
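To give an idea of how a client such as the web app could talk to a prediction API, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption made for illustration: the endpoint URL, the `{"features": ...}` payload shape, and the function names `build_request` and `predict_shares` are hypothetical and do not describe the actual API deployed on AWS.

```python
import json
import urllib.request

# Hypothetical endpoint: the real API URL is not given in this README.
API_URL = "https://example.com/predict"


def build_request(features, url=API_URL):
    """Build a JSON POST request carrying the article features."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_shares(features, url=API_URL, timeout=10):
    """Send the features to the API and return the decoded JSON response."""
    request = build_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; only predict_shares does.
example = build_request({"n_tokens_title": 12, "n_tokens_content": 219})
print(example.get_method(), example.full_url)
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without a running server.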
The project is divided into 4 main sections:
- Preprocessing
- Data Discovery (Univariate and Bivariate analysis)
- Optimizing an article's success (Data Insights)
- Predicting the success of an article (Machine Learning and Deep Learning Models)
💡 If you want additional details about our project, feel free to take a look at the documents mentioned above.