A common challenge when working with time series is missing data. Imputation is a widely used way to address it: missing values are filled in rather than dropped. The key difficulty is determining which values to fill in.
In this project, we propose to assess the effectiveness of applying deep learning-based models in time series imputation compared to statistical methods that do not require prior training.
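To make the comparison concrete, here is a minimal sketch of how training-free statistical baselines could be scored: hide some known values, impute them, and measure the error on the hidden positions only. The helper names (`impute`, `masked_mae`), the three methods chosen, the 20% masking rate and the synthetic sine series are all assumptions made for illustration; they are not taken from the project code.

```python
import numpy as np
import pandas as pd


def impute(series: pd.Series, method: str) -> pd.Series:
    """Fill NaNs with a simple statistical method that needs no training."""
    if method == "mean":
        return series.fillna(series.mean())
    if method == "ffill":
        # Forward fill, then backward fill in case the series starts with a gap.
        return series.ffill().bfill()
    if method == "linear":
        return series.interpolate(method="linear", limit_direction="both")
    raise ValueError(f"unknown method: {method}")


def masked_mae(truth: pd.Series, mask: pd.Series, method: str) -> float:
    """Hide the values where mask is True, impute them, and return the MAE there."""
    corrupted = truth.mask(mask)  # masked positions become NaN
    filled = impute(corrupted, method)
    return float((filled[mask] - truth[mask]).abs().mean())


# Synthetic, fully observed series (hypothetical stand-in for real measurements).
rng = np.random.default_rng(0)
truth = pd.Series(np.sin(np.linspace(0, 6, 200)))
mask = pd.Series(rng.random(200) < 0.2, index=truth.index)

scores = {m: masked_mae(truth, mask, m) for m in ("mean", "ffill", "linear")}
print(scores)
```

The same masking protocol can be reused for a trained model by swapping `impute` for the model's prediction step, so that all methods are scored on identical hidden positions.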
The data is accessible through the link: https://drive.google.com/drive/folders/10OYuhaT3nEaJmoGJLNMzOiSVPCtMJJtW?usp=sharing
!!! The data is a CSV file named 'household_power_consumption.csv'. It must be placed in the data folder before running any code.
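A quick way to confirm the file is in place before running anything is sketched below. The function name `find_dataset` is hypothetical, and the sketch assumes the code is run from the project root so that `data` resolves as a relative path.

```python
from pathlib import Path

# Folder and file name as stated in this README; the relative path is an assumption.
DATA_DIR = Path("data")
CSV_NAME = "household_power_consumption.csv"


def find_dataset(data_dir: Path = DATA_DIR) -> Path:
    """Return the path to the CSV file, or fail with an explicit message."""
    csv_path = data_dir / CSV_NAME
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Expected {csv_path}. Download the CSV from the drive link "
            "and place it in the data folder first."
        )
    return csv_path
```

Calling `find_dataset()` at the top of a script or notebook turns a missing file into an immediate, readable error instead of a failure later in preprocessing.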
The project is structured as follows:
data
: contains the data used in the project; it must be filled with the CSV file from the drive link provided in the report

trained_models
: contains the final trained models for each architecture

data_generation
: generates the split of the data into train, validation and test sets

data_analysis
: commented notebook with the analysis of the data

main
: contains the majority of the code (preprocessing, dataset and dataloader generation, statistical methods, PyTorch implementation of the deep learning-based models, training and evaluation)

TimeGanDataAugmentation
: contains the code for the TimeGAN data augmentation method

requirements.txt
: contains the list of packages required to run the code
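For readers unfamiliar with splitting time series, the sketch below shows one common approach: a chronological split that keeps the earliest rows for training and the latest for testing, so no future values leak into training. This is an assumption for illustration only; the function name `chronological_split` and the 70/15/15 fractions are hypothetical, and the actual logic in data_generation may differ.

```python
import pandas as pd


def chronological_split(frame: pd.DataFrame, train_frac: float = 0.7, val_frac: float = 0.15):
    """Split rows in their existing order into train, validation and test parts.

    Assumes the rows are already sorted by time. The fractions are illustrative.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test set")
    n_rows = len(frame)
    train_end = int(n_rows * train_frac)
    val_end = train_end + int(n_rows * val_frac)
    return frame.iloc[:train_end], frame.iloc[train_end:val_end], frame.iloc[val_end:]


# Toy frame standing in for the real data (values are synthetic).
toy = pd.DataFrame({"value": range(100)})
train, val, test = chronological_split(toy)
print(len(train), len(val), len(test))
```

Shuffling before splitting is usually avoided for imputation benchmarks on time series, because neighbouring points are strongly correlated and a shuffled test set would overstate performance.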