covid-eu-analysis

Multivariate analysis of the impact of socio-economic factors on the spread of COVID-19 across European NUTS2 regions during the first and second waves of the epidemic.

Setup

To install the project's dependencies:

pip install -r requirements.txt

The dataset is already provided in the covid_at_lombardy.sqlite file. Alternatively, it can be rebuilt from the original data sources by running:

cd database
bash run_setup.sh
mv covid_at_lombardy.sqlite ../
cd ../
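To check what the SQLite file contains before opening the notebooks, the following minimal sketch uses Python's standard sqlite3 module to list its tables. The README does not document the table names, so none are assumed here. The helper name list_tables is hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of the tables stored in a SQLite database file."""
    # Open read-only through a URI so that a wrong path raises an error
    # instead of silently creating a new, empty database file.
    uri = f"file:{db_path}?mode=ro"
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back and would leave it open.
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

From the top level of the repository, list_tables("covid_at_lombardy.sqlite") returns the table names in alphabetical order.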

Note: the source Eurostat data may change in the future, so results can differ if data points are modified or added to the Eurostat repositories of the variables considered in this analysis. The date of the last update is shown at the top of each Eurostat dataset page (e.g. see the last update date of the following example Eurostat dataset).

Run the experiments

The experiments on both the first and the second wave can be run and analyzed through the provided Jupyter notebooks, located at the top level of this repository.

Dataset

The dataset was built to assess which socio-economic features of the analyzed NUTS2 European regions intrinsically put each region at greater risk of epidemic spread. Ideally, this experimental setting serves to determine whether the epidemic spread that occurred in Lombardy at the start of 2020 was due to randomness or to intrinsic factors possibly shared by one or more of the European regions most affected by the epidemic.

Hence, the dataset was built using the NUTS2 European regions as samples, each characterized by a set of socio-economic variables. The target variable was engineered from the raw number of cases recorded in the first and second waves of the epidemic.

The dataset consists of NUTS2 European regions, each represented by its socio-economic variables.

The dataset was processed into tabular form, with a separate table for the first and for the second wave of the epidemic, each containing both the predictors and the target variable.

The target variable is a binary risk class, obtained by splitting the regions into two clusters according to their COVID-19 case density.
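The README does not state which clustering algorithm produces the two density clusters. The following sketch assumes plain k-means with k = 2 on the one-dimensional density values and shows how the binary risk labels could be derived. The function name binary_risk_labels and the convention that 1 marks the higher-density cluster are hypothetical.

```python
from typing import List, Sequence


def binary_risk_labels(density: Sequence[float], max_iter: int = 100) -> List[int]:
    """Split regions into two case-density clusters with 1-D k-means (k=2).

    Returns 1 for regions in the higher-density (high-risk) cluster, 0 otherwise.
    """
    if not density:
        return []
    # Initialise the two centroids at the extremes of the distribution.
    low, high = min(density), max(density)
    labels = [0] * len(density)
    for _ in range(max_iter):
        # Assignment step: each region joins the nearest centroid
        # (ties go to the low-density cluster).
        new_labels = [1 if abs(x - high) < abs(x - low) else 0 for x in density]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        low_members = [x for x, lab in zip(density, labels) if lab == 0]
        high_members = [x for x, lab in zip(density, labels) if lab == 1]
        if low_members:
            low = sum(low_members) / len(low_members)
        if high_members:
            high = sum(high_members) / len(high_members)
    return labels
```

For example, binary_risk_labels([10, 12, 11, 95, 100]) returns [0, 0, 0, 1, 1]. In one dimension, k-means with k = 2 amounts to finding a single density threshold, which keeps the resulting risk class easy to interpret.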

Models

Interpretability was the main criterion in choosing the classification models used in the analysis. The following models were therefore considered:

  • Logistic regression
  • Random forest
  • Linear SVM

Hyper-parameter optimization was carried out for each of them.
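This README does not specify the search spaces, the cross-validation scheme or the scoring metric. The following sketch shows one way to tune the three models, assuming scikit-learn's GridSearchCV with stratified 5-fold cross-validation and balanced accuracy as the metric. The grids and the helper tune_models are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical search spaces; the grids used by the project are not documented here.
# make_pipeline names each step after its lowercased class name, hence the
# "logisticregression__" and "linearsvc__" prefixes.
CANDIDATES = {
    "logistic_regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 3, 5]},
    ),
    "linear_svm": (
        make_pipeline(StandardScaler(), LinearSVC(max_iter=10000)),
        {"linearsvc__C": [0.01, 0.1, 1.0, 10.0]},
    ),
}


def tune_models(X, y, n_splits=5):
    """Grid-search every candidate; return {name: (best_params, mean CV score)}."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="balanced_accuracy")
        search.fit(X, y)
        results[name] = (search.best_params_, search.best_score_)
    return results
```

Here X would be the predictor table of one wave and y its binary risk class. Scaling sits inside the pipelines so that, in each fold, the scaler is fit only on the training split. The random forest skips scaling because tree splits do not depend on feature scale.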

Contributors

chris1nexus, fgiobergia

covid-eu-analysis's Issues

Computing the % of missing values for each region

To compute the % of missing values for each region, the following code is used:

    plt.hist(data_manager.df.isna().sum(axis=1)/len(data_manager.df))

Since data_manager.df is a dataframe with one row per region and one column per feature, df.isna().sum(axis=1) correctly gives, for each region, the number of missing features. However, the result should then be normalized by the total number of available features (len(data_manager.df.columns)), not by the total number of regions (len(data_manager.df)).

Does that make sense to you, @Chris1nexus?
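The following minimal sketch implements the normalization proposed in the issue. It assumes data_manager.df is a pandas DataFrame with one row per region and one column per feature. The helper name and the toy frame (its region codes, feature names and values) are made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_fraction_per_region(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing features for each region (one row per region)."""
    # Count NaNs along each row, then divide by the number of feature
    # columns, not by the number of rows (regions).
    return df.isna().sum(axis=1) / len(df.columns)


# Hypothetical toy data: 3 regions x 4 features.
toy = pd.DataFrame(
    {
        "gdp": [1.0, np.nan, 3.0],
        "density": [np.nan, np.nan, 2.0],
        "beds": [5.0, 6.0, 7.0],
        "age": [40.0, np.nan, 41.0],
    },
    index=["ITC4", "FR10", "DE21"],
)
frac = missing_fraction_per_region(toy)  # ITC4: 0.25, FR10: 0.75, DE21: 0.0
```

An equivalent one-liner is df.isna().mean(axis=1). The corrected histogram would then be plt.hist(missing_fraction_per_region(data_manager.df)).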
