ermongroup / csdi

Codes for "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation"

License: MIT License
Thanks for sharing. I tried exe_physio.py in "pretrained" mode, and it produced the following error:
File "D:\Code\CSDI\CSDI-main\main_model.py", line 14, in __init__
self.emb_feature_dim = config["model"]["featureemb"]
KeyError: 'featureemb'
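One hedged possibility (an assumption, not a confirmed diagnosis) is that the saved pretrained config predates the `"featureemb"` key. A defensive lookup like the sketch below would avoid the crash; the default of 16 is purely illustrative, not the repository's actual value.

```python
# Hypothetical defensive lookup: fall back to a default embedding size when a
# pretrained config is missing the "featureemb" key. The default of 16 is an
# assumption for illustration, not the repository's actual value.
def get_feature_emb_dim(config, default=16):
    return config["model"].get("featureemb", default)

old_config = {"model": {"timeemb": 128}}  # config missing "featureemb"
print(get_feature_emb_dim(old_config))    # falls back to the default
```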
Hi, thank you for your nice work.
I have a question about the CRPS metric in eq. (17) of your paper: why does "i" range from 1 to 19?
Looking forward to your reply. Thanks!
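For context, a common way to approximate CRPS is to average quantile losses over the 19 levels q = i/20 for i = 1, ..., 19 (0.05 through 0.95), which would explain that range. The sketch below shows that discretization; it is an illustration of the common recipe, not necessarily the paper's exact implementation.

```python
import numpy as np

# Sketch: CRPS approximated by averaging quantile losses over 19 quantile
# levels q = i/20, i = 1..19, normalized by the total absolute target value.
def quantile_loss(target, forecast_q, q):
    return 2 * np.sum(np.abs((forecast_q - target) * ((target <= forecast_q) - q)))

def crps_approx(target, samples):
    # samples: (n_samples, ...) posterior samples; target: matching trailing shape
    quantiles = [i / 20 for i in range(1, 20)]
    denom = np.sum(np.abs(target))
    total = sum(quantile_loss(target, np.quantile(samples, q, axis=0), q)
                for q in quantiles)
    return total / (len(quantiles) * denom)
```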
Could you provide the code for your paper? Thank you.
Thank you for your great work. The link to the electricity dataset is broken; could you please re-upload it?
I am very interested in this line of research and want to explore it further. However, I cannot reproduce the time series forecasting results described in the paper: our CRPS on the electricity dataset is 0.1594, while yours is 0.017. Could you please help me figure out the differences between our experimental settings?
Our implementation and experimental settings are as follows. First, we use the code from diff_models.py and main_model.py as the forecasting model, the same as in your paper. For the dataset, we use all 370 dimensions (clients) of the electricity dataset and set the prediction length to 24, following the settings in the related work [1]. We also apply the test pattern strategy for target choice, implemented by ourselves. For each time series instance, we treat the first 24 steps as the observed part and the last step as the target, forecasting the target value autoregressively.
I cannot find the cause of this performance gap. Could you please share the time series forecasting code if it is available?
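For concreteness, the window construction described above (24 observed steps followed by one forecast step, for all 370 clients) might be sketched as a conditional mask like this; the shapes and the helper itself are my assumptions, not the paper's code.

```python
import numpy as np

# Sketch of the forecasting mask described above: for a window of length
# context_len + pred_len, the first context_len steps of every feature are
# observed (mask 1) and the remaining steps are forecasting targets (mask 0).
# Shape convention (features, time) is an assumption for illustration.
def forecast_mask(num_features, context_len=24, pred_len=1):
    cond_mask = np.zeros((num_features, context_len + pred_len))
    cond_mask[:, :context_len] = 1  # observed context
    return cond_mask
```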
Congratulations on your paper being accepted to NeurIPS, and thank you for sharing your code! I thought the task as described might be a good fit for a DEformer-like model (hereafter "DEformer-CSDI"), so I decided to run an experiment on the 10% missing healthcare dataset and I thought you might be interested in the results (code here). While my test set is identical to yours, I changed the training/validation split to 95%/5% and I used an online strategy to generate missing values for each training sample. Specifically, every time a training sample was encountered, I randomly selected 10% of the observed values to serve as the missing values.
Like the DEformer, the input for DEformer-CSDI consists of a mix of identity feature vectors and identity/value feature vectors. The difference in this case is that DEformer-CSDI is not learning the joint distribution, so only the identity feature vectors are included for the missing values and the attention mask is now full instead of lower triangular (i.e., every input can attend to every other input). Identity was encoded as f(t, k) = [t, embed(k)] where t and k are the time and feature indices, respectively, for a data point. One interesting difference between DEformer-CSDI and CSDI is that DEformer-CSDI simply ignores missing values that are not being predicted.
With no hyperparameter tuning, DEformer-CSDI achieves a mean absolute error of 0.219 on the 10% missing healthcare dataset. I thought it was notable that DEformer-CSDI outperformed the flattened Transformer baseline from Table 7 by a wide margin. With that being said, DEformer-CSDI is much larger than CSDI (19,250,493 parameters), so it would be interesting to see if CSDI's performance could be improved further using this online sampling strategy.
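The online strategy described above (re-drawing the artificial missing set every time a training sample is visited) might be sketched as follows; the 10% ratio and the observed/missing distinction come from the description, while the function name and shape convention are assumptions.

```python
import numpy as np

# Sketch of the online masking strategy: each time a sample is drawn, randomly
# hide 10% of its *observed* entries to serve as training targets.
# observed_mask is 1 where a value was actually measured.
def sample_training_mask(observed_mask, missing_ratio=0.1, rng=None):
    rng = rng or np.random.default_rng()
    obs_indices = np.flatnonzero(observed_mask)
    n_hide = int(round(len(obs_indices) * missing_ratio))
    hidden = rng.choice(obs_indices, size=n_hide, replace=False)
    cond_mask = observed_mask.flatten().copy()
    cond_mask[hidden] = 0  # hidden entries become prediction targets
    return cond_mask.reshape(observed_mask.shape)
```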
Hello, thank you for your excellent work.
I would like to try time series forecasting using the test pattern strategy described in the paper.
Is there any example code on GitHub?
Dear author,
I have been reproducing the forecasting part of your paper recently, and I noticed that the paper says relatively little about using CSDI for forecasting. Could you explain specifically how the forecasting is done? Also, you only provided the electricity dataset; would it be convenient to provide the other datasets from gluonts?
I would greatly appreciate it.
I ran the code you posted on GitHub and found that training takes about half a minute per epoch, but testing takes about half an hour (the command is "python exe_physio.py --testmissingratio 0.1 --nsample 100"). Is this the expected behavior?
Hi,
When I run exe_forecasting.py, it gives me a type error:
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not NoneType
It is from this part of the code:
main_model.py", line 356, in get_side_info
feature_embed = self.embed_layer(feature_id).unsqueeze(1).expand(-1,L,-1,-1)
It would be great if you could fix this.
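The traceback suggests `feature_id` arrives as `None` at the embedding call. A hedged sketch of one possible guard is below; the fallback to a 0..K-1 index tensor is my assumption about the intended behavior, not a confirmed fix for the repository.

```python
import torch

# Hypothetical guard: if feature_id was never constructed upstream, fall back
# to indexing every feature 0..num_features-1 before calling the embedding.
def embed_features(embed_layer, feature_id, num_features):
    if feature_id is None:
        feature_id = torch.arange(num_features)  # assumed default indexing
    return embed_layer(feature_id)
```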
Hi, thanks for your great work!
I ran the experiment on the physio dataset with the pretrained model file you provided, but I still can't reproduce the results in your paper.
For example, at a 10% missing ratio, the paper reports 0.498/0.217/0.238 (RMSE/MAE/CRPS), but I get 0.552/0.25/0.28.
Isn't this gap too big?
I'm very interested in your work, but I have a question about the difference between deterministic imputation and interpolation.
Hi,
In this line, when generating noisy data, why does the second coefficient have a square root? The paper does not have a square root for the second coefficient.
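For reference (this is the standard DDPM algebra, not a statement about the authors' intent): the forward process is a Gaussian whose variance is 1 − ᾱ_t, so sampling from it requires the square root of that variance as the noise coefficient:

```latex
q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right)
\;\Longrightarrow\;
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I).
```

If a paper equation shows 1 − ᾱ_t without the square root on the noise term, that is usually a typo, since the resulting variance must come out to 1 − ᾱ_t.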