thuml / nonstationary_transformers

Code release for "Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting" (NeurIPS 2022), https://arxiv.org/abs/2205.14415

License: MIT License

Python 68.46% Shell 31.54%

deep-learning time-series forecasting non-stationary

nonstationary_transformers's Introduction

Non-stationary Transformers

This is the codebase for the paper: Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting, NeurIPS 2022. [Slides], [Poster].

🚩 News (2023.02) Non-stationary Transformer has been included in [Time-Series-Library], which covers long- and short-term forecasting, imputation, anomaly detection, and classification.

Discussions

There are already several discussions about our paper. We greatly appreciate their valuable comments and efforts: [Official], [OpenReview], [Zhihu].

Architecture

Series Stationarization

Series Stationarization unifies the statistics of each input and converts the output with restored statistics for better predictability.
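
A minimal sketch of this idea, assuming each input window is normalized by its own mean and standard deviation along the temporal dimension and that the same statistics are restored on the output. The function names are hypothetical and are not the repository's API; an identity mapping stands in for the forecaster.

```python
import math

def stationarize(window, eps=1e-5):
    """Normalize one input window by its own mean and standard deviation."""
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    std = math.sqrt(var + eps)
    return [(v - mean) / std for v in window], mean, std

def de_stationarize(output, mean, std):
    """Restore the input window's statistics on the model output."""
    return [v * std + mean for v in output]

window = [10.0, 12.0, 14.0, 16.0]
normed, mean, std = stationarize(window)
# Identity "model": the normalized window is passed straight through.
restored = de_stationarize(normed, mean, std)
```

With a real model, the forecast produced from `normed` would be passed to `de_stationarize` in place of `normed` itself.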

De-stationary Attention

De-stationary Attention is devised to recover the intrinsic non-stationary information into temporal dependencies by approximating distinguishable attentions learned from unstationarized series.
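
The following sketch illustrates the attention score form described in the paper, Softmax((tau * Q K^T + Delta) / sqrt(d_k)) V, for a single query in plain Python. In the paper tau and Delta are learned from the statistics of the unstationarized series; here they are passed in as ordinary arguments, and the function names are illustrative only.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def de_stationary_attention(q, keys, values, tau, delta):
    """Single-query attention with rescaled (tau) and shifted (delta) scores.

    q: vector of size d_k; keys and values: lists of S vectors;
    tau: positive scalar; delta: list of S shifts, one per key.
    """
    d_k = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    scores = [(tau * s + dl) / math.sqrt(d_k) for s, dl in zip(scores, delta)]
    weights = softmax(scores)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0], [3.0]]
# tau = 1 and delta = 0 reduce to standard scaled dot-product attention.
plain = de_stationary_attention(q, keys, values, tau=1.0, delta=[0.0, 0.0])
# A larger tau sharpens the attention toward the best-matching key.
sharper = de_stationary_attention(q, keys, values, tau=5.0, delta=[0.0, 0.0])
```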

Showcases

Preparation

Install Python 3.7 and the necessary dependencies.

pip install -r requirements.txt

All six benchmark datasets can be obtained from Google Drive or Tsinghua Cloud.

Training scripts

Non-stationary Transformer

We provide the Non-stationary Transformer experiment scripts and hyperparameters for all benchmark datasets under the folder ./scripts.

# Transformer with our framework
bash ./scripts/ECL_script/ns_Transformer.sh
bash ./scripts/Traffic_script/ns_Transformer.sh
bash ./scripts/Weather_script/ns_Transformer.sh
bash ./scripts/ILI_script/ns_Transformer.sh
bash ./scripts/Exchange_script/ns_Transformer.sh
bash ./scripts/ETT_script/ns_Transformer.sh

# Transformer baseline
bash ./scripts/ECL_script/Transformer.sh
bash ./scripts/Traffic_script/Transformer.sh
bash ./scripts/Weather_script/Transformer.sh
bash ./scripts/ILI_script/Transformer.sh
bash ./scripts/Exchange_script/Transformer.sh
bash ./scripts/ETT_script/Transformer.sh

Non-stationary framework to promote other Attention-based models

We also provide the scripts for other Attention-based models (Informer, Autoformer), for example:

# Informer promoted by our Non-stationary framework
bash ./scripts/Exchange_script/Informer.sh
bash ./scripts/Exchange_script/ns_Informer.sh

# Autoformer promoted by our Non-stationary framework
bash ./scripts/Weather_script/Autoformer.sh
bash ./scripts/Weather_script/ns_Autoformer.sh

Experiment Results

Main Results

For multivariate forecasting results, the vanilla Transformer equipped with our framework consistently achieves state-of-the-art performance in all six benchmarks and prediction lengths.

Model Promotion

Applying our framework to six mainstream Attention-based models consistently improves their forecasting ability. Overall, it achieves an average promotion of 49.43% on Transformer, 47.34% on Informer, 46.89% on Reformer, 10.57% on Autoformer, 5.17% on ETSformer and 4.51% on FEDformer, making each of them surpass the previous state of the art.

Future Work

We will keep equipping the following models with our proposed Non-stationary Transformers framework:

Citation

If you find this repo useful, please cite our paper.

@inproceedings{liu2022non,
  title={Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting},
  author={Liu, Yong and Wu, Haixu and Wang, Jianmin and Long, Mingsheng},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Contact

If you have any questions or want to use the code, please contact [email protected].

Acknowledgement

This repo is built on the Autoformer repo. We greatly appreciate the authors for their valuable code and efforts.

nonstationary_transformers's Issues

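I often get this error. Has anyone else run into it? I do not know how to handle it and would appreciate a solution. It seems to be a parameter problem: with other parameters everything works, but I cannot tell what the cause is.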

Args in experiment:
Namespace(is_training=1, model_id='test', model='ns_Autoformer', data='custom', root_path='./all_sj/', data_path='dataset.csv', features='MS', target='close', freq='t', checkpoints='./results/checkpoints/', seq_len=131, label_len=70, pred_len=5, enc_in=5, dec_in=5, c_out=1, d_model=752, n_heads=8, e_layers=4, d_layers=2, d_ff=1648, moving_avg=6, factor=1, distil=True, dropout=0.11499999999999999, embed='timeF', activation='softmax', output_attention=True, do_predict=True, num_workers=0, itr=3, train_epochs=10, batch_size=16, patience=3, learning_rate=2.9954351944587492e-05, des='test', loss='sparse_categorical_crossentropy', lradj='type1', use_amp=False, use_gpu=True, gpu=0, use_multi_gpu=False, devices='0,1', seed=20, p_hidden_dims=[208, 208, 208], p_hidden_layers=3)
Use GPU: cuda:0

start training : test_ns_Autoformer_custom_ftMS_sl131_ll70_pl5_dm752_nh8_el4_dl2_df1648_fc1_ebtimeF_dtTrue_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 5452
val 795
test 1592
Traceback (most recent call last):
File "/tmp/Nonstationary_Transformers/./run_test.py", line 66, in
exp.train(setting)
File "/tmp/Nonstationary_Transformers/exp/exp_main.py", line 148, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
File "/root/anaconda3_gpu_keras/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/tmp/Nonstationary_Transformers/ns_models/ns_Autoformer.py", line 127, in forward
seasonal_init, trend_init = self.decomp(x_enc)
File "/root/anaconda3_gpu_keras/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/tmp/Nonstationary_Transformers/ns_layers/Autoformer_EncDec.py", line 49, in forward
res = x - moving_mean
RuntimeError: The size of tensor a (131) must match the size of tensor b (130) at non-singleton dimension 1
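
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the arguments above use moving_avg=6. If the series decomposition block replicates (kernel_size - 1) // 2 edge values on each side before a stride-1 average pooling, as Autoformer-style implementations do, an even kernel size yields an output one step shorter than the input, which matches the 131 vs. 130 mismatch. The sketch below reproduces the length arithmetic in plain Python (it is not the repository's layer).

```python
def moving_avg(series, kernel_size):
    """Edge-replicated moving average with (kernel_size - 1) // 2 padding per side."""
    pad = (kernel_size - 1) // 2
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    n_out = len(padded) - kernel_size + 1
    return [sum(padded[i:i + kernel_size]) / kernel_size for i in range(n_out)]

series = [float(i) for i in range(131)]
len_even = len(moving_avg(series, 6))  # 130: one step shorter than the input
len_odd = len(moving_avg(series, 7))   # 131: same length as the input
```

Under this assumption, choosing an odd moving_avg value (the provided logs use 25) keeps the trend the same length as the input.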

Using Custom Dataset with Different Input Lengths for Model

My input dimensions are [1500, 50, 6] and my label dimensions are [1500, 1]. Can I use this model? If so, could you please explain how to configure it correctly?

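Hello, regarding the prediction results, how can I obtain de-normalized predictions?

The predictions and errors in the output appear to be the normalized predictions and errors.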
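
As a hedged illustration of de-normalization, assuming the dataset was standardized column by column (z-score) with a known mean and standard deviation: the scaled predictions can be mapped back with x = x_scaled * std + mean. The helper name and the numbers below are hypothetical.

```python
def inverse_transform(scaled, mean, std):
    """Undo z-score scaling column by column: x = x_scaled * std + mean."""
    return [[v * s + m for v, m, s in zip(row, mean, std)] for row in scaled]

mean = [50.0, 0.5]   # per-column mean used when scaling (made-up values)
std = [10.0, 0.1]    # per-column standard deviation (made-up values)
pred_scaled = [[0.0, 1.0], [1.5, -2.0]]
pred = inverse_transform(pred_scaled, mean, std)
```

Metrics computed on `pred` are then in the original units, so they are not directly comparable to metrics computed on scaled data.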

Drawing problem

Hi author! I would like to use this method in my paper. Could you explain how to produce Figure 1?
Thank you for your answer!

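Why is drop_last=True used for both training and testing?

Training this way drops many samples.
Testing also loses samples, which makes the metrics incorrect.
Were the metrics in the paper obtained under this setting as well?

For example, the ILI dataset with input length 36 and prediction length 36: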
Namespace(activation='gelu', batch_size=32, c_out=7, checkpoints='./checkpoints/', d_ff=2048, d_layers=1, d_model=512, data='custom', data_path='national_illness.csv', dec_in=7, des='Exp_h32_l2', devices='0,1,2,3', distil=True, do_predict=False, dropout=0.05, e_layers=2, embed='timeF', enc_in=7, factor=3, features='M', freq='h', gpu=1, is_training=1, itr=1, label_len=18, learning_rate=0.0001, loss='mse', lradj='type1', model='ns_Transformer', model_id='ili_36_36', moving_avg=25, n_heads=8, num_workers=10, output_attention=False, p_hidden_dims=[32, 32], p_hidden_layers=2, patience=3, pred_len=36, root_path='./illness/', seed=2021, seq_len=36, target='OT', train_epochs=10, use_amp=False, use_gpu=True, use_multi_gpu=False)

Testing with drop_last=False:
test shape: (158, 36, 7) (158, 36, 7)
mse:2.9380314350128174, mae:1.051339030265808
Testing with drop_last=True:
test shape: (128, 36, 7) (128, 36, 7) --- 158 - 128 test samples are missing
mse:1.8928638696670532, mae:0.8517037034034729

Please verify this.
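
The sample counts reported above follow from batch arithmetic with batch_size=32. The small helper below (a hypothetical name, not part of the repository) shows how many samples are actually evaluated with and without drop_last.

```python
def evaluated_samples(num_samples, batch_size, drop_last):
    """Number of samples seen by a loader that batches sequentially."""
    full_batches, remainder = divmod(num_samples, batch_size)
    if drop_last or remainder == 0:
        return full_batches * batch_size
    return num_samples

ili_dropped = evaluated_samples(158, 32, drop_last=True)   # 128
ili_all = evaluated_samples(158, 32, drop_last=False)      # 158
```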

nonstationary for fedformer

Hello,

Thanks a lot for a very interesting work. Will this codebase continue to be updated with the code for FEDformer? Is adding the non-stationary framework to FEDformer very different from adding it to Autoformer?

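Poor visualization results

When I run on ETTh1, the MSE is close to the paper's (around 0.3), but the visualized predictions look clearly poor. Why is that?

Hello author, how can de-normalization be implemented in the code?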

Multivariate Time Series ADF Test Stat

Dear authors,

I am curious how you got the multivariate ADF test stat in your paper, e.g. ETTm2 = -6.225. arch.unitroot.ADF is only for univariate.

Thanks!

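Output data

Hello author, after running the code, three npy files are produced, or four counting real_prediction. Why are the values in pred.npy and true.npy all between 0 and 1, which does not match the original data? Do they need to be de-normalized, or fed into another model?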

Ablation with Stat + DeFF + DeAttn

Thank you for publishing code!

I have a question on the following table, and it would be great if you could enlighten me on this: given that combining Stat with either DeFF or DeAttn boosts performance, have you tried to benchmark the case where all three are applied, i.e. Stat + DeFF + DeAttn?

Also for DeFF implementation, how exactly should it be coded?

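What does the step length mean here?

Hello author, I see that your code works well on non-stationary time series, but I am confused about one of these three parameters: what does label_len = 48 mean? Isn't the observed data 96 steps long, with the next 96 time points predicted from those 96 observed rows?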
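
A sketch of how the three lengths typically relate, assuming the Informer/Autoformer-style windowing that this kind of codebase builds on: the encoder sees seq_len steps, and the decoder window starts label_len steps before the end of the encoder window and extends pred_len steps into the future. The function name is hypothetical.

```python
def window_indices(index, seq_len, label_len, pred_len):
    """Return (encoder range, decoder range) as half-open index pairs."""
    s_begin = index
    s_end = s_begin + seq_len
    r_begin = s_end - label_len          # decoder overlaps the last label_len inputs
    r_end = r_begin + label_len + pred_len
    return (s_begin, s_end), (r_begin, r_end)

enc_range, dec_range = window_indices(0, seq_len=96, label_len=48, pred_len=96)
# enc_range == (0, 96); dec_range == (48, 192)
```

Under this assumption, label_len = 48 means the decoder is given the last 48 observed steps as a "start token" before the 96 steps to be predicted.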

How to calculate the stationarity of the output

In this discussion, it is mentioned that all the predicted time series are arranged in chronological order and then compared with the ground truth in terms of stationarity, which results in the following figure:

Could you please explain specifically how the predicted time series are arranged? Assume the length of the test sequence is from t1 to t10, and we have prediction sequences y1=[t1,t2,t3] and y2=[t2,t3,t4]. How do we handle the overlap between y1 and y2 in such cases?

About Implementation of DeFF

Since DeFF has been mentioned in the ablation study and on OpenReview, could you give a more detailed description of how to re-incorporate μ and σ into the feed-forward layers?

ETSFormer: how to apply tau/delta?

Hello,

Thanks a lot for a very interesting work, would you be so kind to provide some hints for how to apply learned tau/delta for ETSFormer architecture you mentioned in the paper?

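Running the scripts directly does not reach the accuracy reported in the paper?

I ran the provided scripts under scripts directly, and the metrics differ considerably from both the paper and the result screenshots in this repo. For example:

Could you take a look at what the problem is? I have not modified any code.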

ETTm2 scripts

Hi, I’m trying to run the experiments on ETTm2. I found in this repository the scripts for ETTh2. Are you using the same settings for ETTm2? If not, would you please add the scripts for ETTm2?

About decoder input

Hello,
I have a question about the training part of exp/exp_main.py.
Is there a reason to fill the prediction part of the decoder's input with zeros instead of true values?
As far as I know, in the usual Transformer training process, true values are also fed to the decoder.
Could you please answer my question?
Thank you.
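
For context, a sketch of the decoder input construction being asked about, assuming the Informer-style one-shot decoding scheme: the known label_len steps are kept and the pred_len future positions are filled with zeros, because all future steps are produced in a single forward pass and no true future values exist at inference time. The helper name and data are illustrative.

```python
def build_decoder_input(batch_y, label_len, pred_len):
    """Keep the known label_len values and zero-fill the pred_len future slots."""
    known = batch_y[:label_len]
    placeholder = [0.0] * pred_len
    return known + placeholder

# One series: 2 known steps followed by 3 future (target) steps.
batch_y = [1.0, 2.0, 3.0, 4.0, 5.0]
dec_inp = build_decoder_input(batch_y, label_len=2, pred_len=3)
# dec_inp == [1.0, 2.0, 0.0, 0.0, 0.0]
```

Feeding the true future values in the same positions would expose the targets to the decoder, since this scheme does not shift the targets as autoregressive teacher forcing does.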

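About the computation of tau and delta

Thank you for your work. I have one question: why can the computation of tau and delta be expressed as the MLP mapping above? Thanks.

About training convergence

Hello, I am testing your ns_Transformer model on ETTh2. During training I found that the validation loss usually reaches its minimum at the end of the first epoch, while the training loss is not yet at its minimum. In later epochs the training loss keeps decreasing while the validation loss oscillates above its minimum, and early stopping triggers after 3 epochs. Is this normal? Could the model be underfitting? My experiment log is attached below.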
@WenWeiTHU
Args in experiment:
Namespace(activation='gelu', batch_size=32, c_out=7, checkpoints='./checkpoints/', d_ff=2048, d_layers=1, d_model=512, data='ETTh2', data_path='ETTh2.csv', dec_in=7, des="'Exp_h256_l2'", devices='0,1,2,3', distil=True, do_predict=False, dropout=0.05, e_layers=2, embed='timeF', enc_in=7, factor=1, features='M', freq='h', gpu=0, is_training=1, itr=3, label_len=48, learning_rate=0.0001, loss='mse', lradj='type1', model='ns_Transformer', model_id='ETTh2_96_96', moving_avg=25, n_heads=8, num_workers=0, output_attention=False, p_hidden_dims=[256, 256], p_hidden_layers=2, patience=3, pred_len=96, root_path='./dataset/ETT-small/', seed=2021, seq_len=96, target='OT', train_epochs=10, use_amp=False, use_gpu=True, use_multi_gpu=False)
Use GPU: cuda:0

start training : ETTh2_96_96_ns_Transformer_ETTh2_ftM_sl96_ll48_pl96_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_'Exp_h256_l2'0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8449
val 2785
test 2785
iters: 100, epoch: 1 | loss: 0.3959244
speed: 0.0768s/iter; left time: 195.2609s
iters: 200, epoch: 1 | loss: 0.2405966
speed: 0.0632s/iter; left time: 154.2832s
Epoch: 1 cost time: 18.004290342330933
Epoch: 1, Steps: 264 | Train Loss: 0.3269685 Vali Loss: 0.2746330 Test Loss: 0.3945290
Validation loss decreased (inf --> 0.274633). Saving model ...
Updating learning rate to 0.0001
iters: 100, epoch: 2 | loss: 0.3052072
speed: 0.1449s/iter; left time: 329.8502s
iters: 200, epoch: 2 | loss: 0.1317495
speed: 0.0626s/iter; left time: 136.2892s
Epoch: 2 cost time: 16.520211696624756
Epoch: 2, Steps: 264 | Train Loss: 0.2197954 Vali Loss: 0.2868253 Test Loss: 0.4253391
EarlyStopping counter: 1 out of 3
Updating learning rate to 5e-05
iters: 100, epoch: 3 | loss: 0.1411438
speed: 0.1452s/iter; left time: 292.2331s
iters: 200, epoch: 3 | loss: 0.1338760
speed: 0.0631s/iter; left time: 120.7719s
Epoch: 3 cost time: 16.639776468276978
Epoch: 3, Steps: 264 | Train Loss: 0.1703486 Vali Loss: 0.2900613 Test Loss: 0.4549606
EarlyStopping counter: 2 out of 3
Updating learning rate to 2.5e-05
iters: 100, epoch: 4 | loss: 0.1420920
speed: 0.1458s/iter; left time: 255.0376s
iters: 200, epoch: 4 | loss: 0.1750177
speed: 0.0632s/iter; left time: 104.2967s
Epoch: 4 cost time: 16.763848543167114
Epoch: 4, Steps: 264 | Train Loss: 0.1526386 Vali Loss: 0.2885140 Test Loss: 0.4750347
EarlyStopping counter: 3 out of 3
Early stopping
testing : ETTh2_96_96_ns_Transformer_ETTh2_ftM_sl96_ll48_pl96_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue'Exp_h256_l2'0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2785
test shape: (87, 32, 96, 7) (87, 32, 96, 7)
test shape: (2784, 96, 7) (2784, 96, 7)
mse:0.3945288360118866, mae:0.41980457305908203
Use GPU: cuda:0
start training : ETTh2_96_96_ns_Transformer_ETTh2_ftM_sl96_ll48_pl96_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue'Exp_h256_l2'1>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8449
val 2785
test 2785
iters: 100, epoch: 1 | loss: 0.2209248
speed: 0.0653s/iter; left time: 165.9310s
iters: 200, epoch: 1 | loss: 0.2358510
speed: 0.0638s/iter; left time: 155.7341s
Epoch: 1 cost time: 17.02840828895569
Epoch: 1, Steps: 264 | Train Loss: 0.3301022 Vali Loss: 0.2849490 Test Loss: 0.3910953
Validation loss decreased (inf --> 0.284949). Saving model ...
Updating learning rate to 0.0001
iters: 100, epoch: 2 | loss: 0.2643779
speed: 0.1497s/iter; left time: 340.8089s
iters: 200, epoch: 2 | loss: 0.2535767
speed: 0.0641s/iter; left time: 139.5449s
Epoch: 2 cost time: 16.864766597747803
Epoch: 2, Steps: 264 | Train Loss: 0.2152764 Vali Loss: 0.2894374 Test Loss: 0.4258763
EarlyStopping counter: 1 out of 3
Updating learning rate to 5e-05
iters: 100, epoch: 3 | loss: 0.1445326
speed: 0.1484s/iter; left time: 298.6893s
iters: 200, epoch: 3 | loss: 0.1796765
speed: 0.0643s/iter; left time: 123.0670s
Epoch: 3 cost time: 17.12177586555481
Epoch: 3, Steps: 264 | Train Loss: 0.1675327 Vali Loss: 0.2817460 Test Loss: 0.4827897
Validation loss decreased (0.284949 --> 0.281746). Saving model ...
Updating learning rate to 2.5e-05
iters: 100, epoch: 4 | loss: 0.1448471
speed: 0.1531s/iter; left time: 267.8125s
iters: 200, epoch: 4 | loss: 0.1152792
speed: 0.0649s/iter; left time: 107.0724s
Epoch: 4 cost time: 17.29974603652954
Epoch: 4, Steps: 264 | Train Loss: 0.1513977 Vali Loss: 0.2860749 Test Loss: 0.4547649
EarlyStopping counter: 1 out of 3
Updating learning rate to 1.25e-05
iters: 100, epoch: 5 | loss: 0.1340131
speed: 0.1480s/iter; left time: 219.7723s
iters: 200, epoch: 5 | loss: 0.1168449
speed: 0.0645s/iter; left time: 89.3987s
Epoch: 5 cost time: 17.2496280670166
Epoch: 5, Steps: 264 | Train Loss: 0.1434257 Vali Loss: 0.2873148 Test Loss: 0.4631858
EarlyStopping counter: 2 out of 3
Updating learning rate to 6.25e-06
iters: 100, epoch: 6 | loss: 0.1203539
speed: 0.1544s/iter; left time: 188.5536s
iters: 200, epoch: 6 | loss: 0.1027171
speed: 0.0647s/iter; left time: 72.5008s
Epoch: 6 cost time: 17.324661016464233
Epoch: 6, Steps: 264 | Train Loss: 0.1395933 Vali Loss: 0.2859809 Test Loss: 0.4624731
EarlyStopping counter: 3 out of 3
Early stopping
testing : ETTh2_96_96_ns_Transformer_ETTh2_ftM_sl96_ll48_pl96_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue'Exp_h256_l2'1<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2785
test shape: (87, 32, 96, 7) (87, 32, 96, 7)
test shape: (2784, 96, 7) (2784, 96, 7)
mse:0.4827895164489746, mae:0.46259817481040955
Use GPU: cuda:0
start training : ETTh2_96_96_ns_Transformer_ETTh2_ftM_sl96_ll48_pl96_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue'Exp_h256_l2'2>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8449
val 2785
test 2785
iters: 100, epoch: 1 | loss: 0.4084912
speed: 0.0673s/iter; left time: 170.8999s
iters: 200, epoch: 1 | loss: 0.2444842
speed: 0.0665s/iter; left time: 162.3365s
Epoch: 1 cost time: 17.56477952003479
Epoch: 1, Steps: 264 | Train Loss: 0.3333128 Vali Loss: 0.2911936 Test Loss: 0.4261954
Validation loss decreased (inf --> 0.291194). Saving model ...
Updating learning rate to 0.0001
iters: 100, epoch: 2 | loss: 0.3382215
speed: 0.1517s/iter; left time: 345.3431s
iters: 200, epoch: 2 | loss: 0.2358498
speed: 0.0658s/iter; left time: 143.2543s
Epoch: 2 cost time: 17.257097005844116
Epoch: 2, Steps: 264 | Train Loss: 0.2223898 Vali Loss: 0.3071573 Test Loss: 0.4283208
EarlyStopping counter: 1 out of 3
Updating learning rate to 5e-05
iters: 100, epoch: 3 | loss: 0.1242966
speed: 0.1516s/iter; left time: 305.2351s
iters: 200, epoch: 3 | loss: 0.1247057
speed: 0.0645s/iter; left time: 123.4376s
Epoch: 3 cost time: 17.220128059387207
Epoch: 3, Steps: 264 | Train Loss: 0.1741069 Vali Loss: 0.3256035 Test Loss: 0.4496353
EarlyStopping counter: 2 out of 3
Updating learning rate to 2.5e-05
iters: 100, epoch: 4 | loss: 0.1237622
speed: 0.1516s/iter; left time: 265.1746s
iters: 200, epoch: 4 | loss: 0.1213419
speed: 0.0649s/iter; left time: 107.0502s
Epoch: 4 cost time: 17.295646905899048
Epoch: 4, Steps: 264 | Train Loss: 0.1562201 Vali Loss: 0.3242903 Test Loss: 0.4585052
EarlyStopping counter: 3 out of 3
Early stopping
testing : ETTh2_96_96_ns_Transformer_ETTh2_ftM_sl96_ll48_pl96_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue'Exp_h256_l2'_2<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2785
test shape: (87, 32, 96, 7) (87, 32, 96, 7)
test shape: (2784, 96, 7) (2784, 96, 7)
mse:0.4261956214904785, mae:0.4425164461135864

standard deviation issue

I believe that when calculating the standard deviation, "x_raw" should be used rather than "x_enc".
std_enc = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5).detach()
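
A small check that may be relevant here, assuming x_enc at that point is simply x_raw with its per-window mean subtracted: variance is invariant to a constant shift, so both choices would give the same standard deviation. The values below are made up.

```python
import math
import statistics

x_raw = [3.0, 7.0, 8.0, 12.0]
mean = sum(x_raw) / len(x_raw)
x_centered = [v - mean for v in x_raw]   # stands in for the mean-subtracted x_enc

# Population variance (unbiased=False) plus the same 1e-5 stabilizer.
std_raw = math.sqrt(statistics.pvariance(x_raw) + 1e-5)
std_centered = math.sqrt(statistics.pvariance(x_centered) + 1e-5)
```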

standard scaler/fit scaler/transform

File: data_loader.py
if self.scale:
    train_data = df_data[border1s[0]:border2s[0]]
    self.scaler.fit(train_data.values)
    data = self.scaler.transform(df_data.values)
Why is the scaler fitted on 'train_data' and then used to transform the whole df_data.values? I do not understand the reason.
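
Fitting on the training split and transforming everything is the usual way to avoid leaking validation and test statistics into preprocessing. A minimal plain-Python sketch of the pattern follows; the split border and data are hypothetical.

```python
import math

def fit(values):
    """Mean and population standard deviation of the given values."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std

def transform(values, mean, std):
    return [(v - mean) / std for v in values]

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
train_end = 7                            # hypothetical train border
mean, std = fit(series[:train_end])      # statistics from training data only
scaled = transform(series, mean, std)    # same statistics reused for val/test
```

The validation and test portions are scaled with the training statistics, so nothing about their distribution is used before evaluation.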

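In data_factory.py, when flag == 'test' and drop_last is set to False, testing fails with "tuple index out of range"

Previously, with drop_last at its default of True and the prediction length set to 1, the resulting test set was shorter. When I then set drop_last to False, this error was raised. I would like to know the cause and how to fix it.

About non-stationarity

Hello, I have long been puzzled about the problems caused by non-stationarity in time series. Non-stationarity means that the statistics of a series change over time, but how exactly does this affect the predictions of a neural network model? The network takes the historical series as input and outputs the future series. As long as this mapping follows some regularity, the network should be able to learn it to some degree. How does non-stationarity affect this mapping? And why would a stationary series necessarily be better? (White noise is also a stationary series, yet it is completely unpredictable.)

Hello, I get a RuntimeError when running both ETTh1 and my own dataset. I tried adding if __name__ == '__main__': but it did not help.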

Traceback (most recent call last):
File "", line 1, in
File "C:\Users\admin\anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\admin\anaconda3\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\admin\anaconda3\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\admin\anaconda3\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\admin\anaconda3\lib\runpy.py", line 288, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\admin\anaconda3\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\admin\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\admin\PycharmProjects\Non-stationary\Nonstationary_Transformers-main\run.py", line 121, in
exp.train(setting)
File "C:\Users\admin\PycharmProjects\Non-stationary\Nonstationary_Transformers-main\exp\exp_main.py", line 120, in train
for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(train_loader):
File "C:\Users\admin\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 435, in iter
return self._get_iterator()
File "C:\Users\admin\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 381, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\admin\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1034, in init
w.start()
File "C:\Users\admin\anaconda3\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\admin\anaconda3\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\admin\anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\admin\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\admin\anaconda3\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\admin\anaconda3\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

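The scale parameter in the dataset

Normalization should not be done in the dataset, should it? It is already done in the model's forward. If the data is normalized beforehand, then what enters the model is already normalized, stationary data, and the computed mean and delta are no longer those of the original data.

How to build my own dataset

Hello, I read your paper and think the idea is very good; it addresses the problem of non-stationary series to a certain extent. I would like to test it empirically on financial exchange-rate data, but while writing the data_loader.py class I was not sure how you construct the dataset, so I am asking for help. If convenient, could you add some comments to the data_loader.py class or give some hints on GitHub? Thank you very much!

patch

How does your series stationarization differ from RevIN?

How can I get it to run on a single GPU? I looked at the sh files and found that they are set up for 4 GPUs.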

real_pred和pred

real_prediction.npy
pred.npy

In this model, which of the two files above should be taken as the final prediction? Thanks.

Error at runtime

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

OT column in datasets

Hello!

Do you know what the OT column in the datasets is? If I understand your experiments correctly, in the "S" case the model gets the OT column as the data? Whereas I thought the data would be all the individual columns?

Thanks!
Kashif

Series Stationarization vs. Normalization

Dear authors,

Thanks for making the code available of your interesting paper.

When I was going through the paper, I found the usage of the term "Stationarization" a little confusing. In Section 3.1 you mention

To attenuate the non-stationarity of each input series, we conduct normalization on the temporal dimension (...)

Yet, I am not sure if normalization actually removes any of the non-stationarity. Let me illustrate this using the famous Air Passenger dataset, which is shown below.

As a very sloppy definition "A time series is considered weakly stationary, if it has no trend or seasonality, constant variance over time, and a consistent auto-correlation over time". From the above plot, none of this is true for the data. This is confirmed by the Augmented-Dickey-Fuller (ADF) test

Augmented Dickey-Fuller Test

data:  Monthly Airline Passenger Numbers
Dickey-Fuller = -1.5094, Lag order = 12, p-value = 0.7807
alternative hypothesis: stationary

as well as an Auto-ARIMA

ARIMA(2,1,1)(0,1,0)[12]

If the series is normalized via (y-mu)/sigma, most of the factors that contribute to the non-stationarity are still present in the data, e.g., trend, seasonality, etc. as shown in the following plot.

The fact that normalization does not remove or attenuate non-stationarity is also reflected by the ADF

Augmented Dickey-Fuller Test

data:  Monthly Airline Passenger Numbers Normalized
Dickey-Fuller = -1.5094, Lag order = 12, p-value = 0.7807
alternative hypothesis: stationary

as well as an Auto-ARIMA

ARIMA(2,1,1)(0,1,0)[12]

In fact, the ADF values are exactly the same as for the non-transformed data.
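
The invariance argued above can be illustrated without an ADF implementation. The sketch below does not run an ADF test; it uses the lag-1 autocorrelation as a simpler statistic and a synthetic trending, seasonal series (not the Air Passengers data) to show that a global (y - mu) / sigma transform leaves the temporal dependence structure unchanged.

```python
import math

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs[:-1], xs[1:]))
    den = sum((v - mean) ** 2 for v in xs)
    return num / den

def zscore(xs):
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((v - mean) ** 2 for v in xs) / len(xs))
    return [(v - mean) / std for v in xs]

# Synthetic series: linear trend plus a 12-step seasonal component.
series = [0.5 * t + 10.0 * math.sin(2 * math.pi * t / 12) for t in range(120)]
rho_raw = lag1_autocorr(series)
rho_norm = lag1_autocorr(zscore(series))   # equal to rho_raw up to rounding
```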

So instead of saying "Stationarization", one should rather use "Normalization". In fact, if the above series is supposed to be stationary, then one would need to use differencing + seasonal differencing as suggested by the Auto-ARIMA.

Have you tried to use (p,P) differencing for the evaluation of your models and compare the results to the proposed instance normalization? Also, using the residuals from a STL-decomposition might get you closer to a stationary series.

Looking very much forward to the discussion.

Visualization code

First, thank you very much for providing such insight into time series forecasting.
Could you please share the code for the ADF test and for the prediction plots?

Autoformer Implementation

First, thanks for such great work.
I saw that the ns_Autoformer model does not actually use tau and delta. Will the code be updated?
In addition, I have a question about the following line:

Nonstationary_Transformers/ns_layers/AutoCorrelation.py

Line 118 in 2f53d98

corr = corr * tau + delta

In the cross autocorrelation layer, the keys and values (originated from the encoder) are padded with zeros for the FFT process and multiplication. Thus there is no way to use the delta again (as you guys mentioned in the self-attention of the Transformer's decoder) because it will be seq_len size in comparison to the corr size, which is label_len + pred_len.
How did you solve this issue?

Thanks!

dataset size question

Hey, thanks for the impressive work. I would like to know whether this model is suitable for "daily" time series. For example, some items only have 2 years of data (about 700 timestamps). Can this model be used on this kind of dataset?
Thanks, and I hope for a reply.

CUDA error: no kernel image is available

I am facing this error and do not know how to solve it.

TypeError: 'NoneType' object is not subscriptable

When I run with ns_Transformer, I get the error TypeError: 'NoneType' object is not subscriptable, but there is no such problem with Transformer.

scaler transformed data

Hello!

If I am not mistaken, you preprocess the time series right at the start with std scaling and then apply this method and I was wondering, does that mean your calculated mu and std would then be close to 0 and 1 respectively? Would you know if this is the case? If you take context windows, due to the std scaling the mu and std would be close to 0 and 1 no?
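
A quick way to probe this question on synthetic data (an illustration, not a result from the paper): after dataset-level standard scaling, the statistics of individual context windows of a trending series can still be far from 0 and 1.

```python
import math

def mean_std(xs):
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((v - mean) ** 2 for v in xs) / len(xs))
    return mean, std

series = [float(t) for t in range(200)]            # synthetic linear trend
g_mean, g_std = mean_std(series)
scaled = [(v - g_mean) / g_std for v in series]    # dataset-level standard scaling

first_window = mean_std(scaled[:20])    # (mean, std) of the earliest window
last_window = mean_std(scaled[-20:])    # (mean, std) of the latest window
```

Here the window means sit well below and above zero and the window standard deviations are much smaller than 1, so per-window statistics still carry information after global scaling.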

Also, I believe the metrics due to the std scaling in your table and those of all the papers you compare with are in reality the NMSE and NMAE metrics rather than MSE and MAE

Finally, if one calculates the mu and std and provides these as static covariates to the transformer do you think there is a need for the projection networks?

I have a version of the Std-Scaling, if you are interested (together with static covariates support for mu and std), here https://github.com/kashif/pytorch-transformer-ts/blob/main/ns-transformer/ns-transformer-electricity.ipynb

Looking forward to your insights!
