
informer2020's Introduction

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting (AAAI'21 Best Paper)

Python 3.6 PyTorch 1.2 cuDNN 7.3.1 License CC BY-NC-SA

This is the original PyTorch implementation of Informer in the following paper: Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Special thanks to Jieqi Peng@cookieminions for building this repo.

🚩News(Mar 27, 2023): We will release Informer V2 soon.

🚩News(Feb 28, 2023): The Informer's extension paper is online on AIJ.

🚩News(Mar 25, 2021): We update all experiment results with hyperparameter settings.

🚩News(Feb 22, 2021): We provide Colab Examples for friendly usage.

🚩News(Feb 8, 2021): Our Informer paper has been awarded AAAI'21 Best Paper [Official][Beihang][Rutgers]! We will continue this line of research and update this repo. Please star this repo and cite our paper if you find our work helpful.



Figure 1. The architecture of Informer.

ProbSparse Attention

The self-attention scores form a long-tail distribution: the "active" queries lie in the "head" of the score distribution while the "lazy" queries lie in the "tail". We designed ProbSparse Attention to select the "active" queries rather than the "lazy" ones, so attending over only the Top-u queries yields a sparse Transformer guided by the query probability distribution. Why not select Top-u keys instead? The self-attention layer's output is a re-representation of its input, formulated as a weighted combination of values w.r.t. the scores of the dot-product pairs. Keeping the top queries with all keys encourages a complete re-representation of the leading components of the input, which is equivalent to selecting the "head" scores among all dot-product pairs. If we instead chose Top-u keys, every query would preserve only the trivial sum of values within the "long tail" scores, wrecking the re-representation of the leading components.



Figure 2. The illustration of ProbSparse Attention.
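For intuition, here is a minimal sketch (plain PyTorch, not the repo's exact implementation; see models/attn.py for that) of selecting the Top-u "active" queries with the max-minus-mean sparsity measurement described above:

import math
import torch

def topu_query_indices(Q, K, u):
    # Q: (B, H, L_Q, D), K: (B, H, L_K, D)
    # sparsity measurement M(q_i, K): max_j <q_i, k_j>/sqrt(D) minus its mean over j
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(Q.shape[-1])
    M = scores.max(dim=-1).values - scores.mean(dim=-1)  # (B, H, L_Q)
    return M.topk(u, dim=-1).indices                     # the u most "active" queries per head

# note: the real ProbSparse attention avoids building the full L_Q x L_K score
# matrix by estimating M from a random subsample of keys (see _prob_QK in attn.py)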

Requirements

  • Python 3.6
  • matplotlib == 3.1.1
  • numpy == 1.19.4
  • pandas == 0.25.1
  • scikit_learn == 0.21.3
  • torch == 1.8.0

Dependencies can be installed using the following command:

pip install -r requirements.txt

Data

The ETT dataset used in the paper can be downloaded from the ETDataset repo. The required data files should be put into the data/ETT/ folder. A demo slice of the ETT data is illustrated in the following figure. Note that the input of each dataset is zero-mean normalized in this implementation.



Figure 3. An example of the ETT data.
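For reference, each ETT csv file is laid out as a date column followed by six load features and the oil-temperature target OT; the data row below is illustrative (check it against your downloaded copy):

date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
2016-07-01 00:00:00,5.827,2.009,1.599,0.462,4.203,1.340,30.531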

The ECL data and Weather data can be downloaded here.

Reproducibility

To reproduce the results easily, follow these steps:

  1. Initialize the docker image using: make init.
  2. Download the datasets using: make dataset.
  3. Run each script in scripts/ using make run_module module="bash ETTh1.sh" for each script.
  4. Alternatively, run all the scripts at once:
for file in `ls scripts`; do make run_module module="bash scripts/$file"; done

Usage

Colab Examples: We provide Google Colab notebooks to help reproduce and customize our repo, covering experiments (train and test), prediction, visualization, and custom data. Open In Colab

Commands for training and testing the model with ProbSparse self-attention on the datasets ETTh1, ETTh2, and ETTm1 respectively:

# ETTh1
python -u main_informer.py --model informer --data ETTh1 --attn prob --freq h

# ETTh2
python -u main_informer.py --model informer --data ETTh2 --attn prob --freq h

# ETTm1
python -u main_informer.py --model informer --data ETTm1 --attn prob --freq t

For more information on the parameters, please refer to main_informer.py.

We provide a more detailed and complete command description for training and testing the model:

python -u main_informer.py --model <model> --data <data>
--root_path <root_path> --data_path <data_path> --features <features>
--target <target> --freq <freq> --checkpoints <checkpoints>
--seq_len <seq_len> --label_len <label_len> --pred_len <pred_len>
--enc_in <enc_in> --dec_in <dec_in> --c_out <c_out> --d_model <d_model>
--n_heads <n_heads> --e_layers <e_layers> --d_layers <d_layers>
--s_layers <s_layers> --d_ff <d_ff> --factor <factor> --padding <padding>
--distil --dropout <dropout> --attn <attn> --embed <embed> --activation <activation>
--output_attention --do_predict --mix --cols <cols> --itr <itr>
--num_workers <num_workers> --train_epochs <train_epochs>
--batch_size <batch_size> --patience <patience> --des <des>
--learning_rate <learning_rate> --loss <loss> --lradj <lradj>
--use_amp --inverse --use_gpu <use_gpu> --gpu <gpu> --use_multi_gpu --devices <devices>

Detailed descriptions of the arguments are as follows:

| Parameter name | Description of parameter |
| --- | --- |
| model | The model of experiment. This can be set to informer, informerstack, informerlight(TBD) |
| data | The dataset name |
| root_path | The root path of the data file (defaults to ./data/ETT/) |
| data_path | The data file name (defaults to ETTh1.csv) |
| features | The forecasting task (defaults to M). This can be set to M, S, MS (M: multivariate predict multivariate, S: univariate predict univariate, MS: multivariate predict univariate) |
| target | Target feature in S or MS task (defaults to OT) |
| freq | Freq for time features encoding (defaults to h). This can be set to s, t, h, d, b, w, m (s: secondly, t: minutely, h: hourly, d: daily, b: business days, w: weekly, m: monthly). You can also use a more detailed freq like 15min or 3h |
| checkpoints | Location of model checkpoints (defaults to ./checkpoints/) |
| seq_len | Input sequence length of the Informer encoder (defaults to 96) |
| label_len | Start token length of the Informer decoder (defaults to 48) |
| pred_len | Prediction sequence length (defaults to 24) |
| enc_in | Encoder input size (defaults to 7) |
| dec_in | Decoder input size (defaults to 7) |
| c_out | Output size (defaults to 7) |
| d_model | Dimension of model (defaults to 512) |
| n_heads | Number of heads (defaults to 8) |
| e_layers | Number of encoder layers (defaults to 2) |
| d_layers | Number of decoder layers (defaults to 1) |
| s_layers | Number of stacked encoder layers (defaults to 3,2,1) |
| d_ff | Dimension of FCN (defaults to 2048) |
| factor | ProbSparse attn factor (defaults to 5) |
| padding | Padding type (defaults to 0) |
| distil | Whether to use distilling in the encoder; passing this argument means not using distilling (defaults to True) |
| dropout | The probability of dropout (defaults to 0.05) |
| attn | Attention used in the encoder (defaults to prob). This can be set to prob (informer), full (transformer) |
| embed | Time features encoding (defaults to timeF). This can be set to timeF, fixed, learned |
| activation | Activation function (defaults to gelu) |
| output_attention | Whether to output attention in the encoder; passing this argument means outputting attention (defaults to False) |
| do_predict | Whether to predict unseen future data; passing this argument means making predictions (defaults to False) |
| mix | Whether to use mix attention in the generative decoder; passing this argument means not using mix attention (defaults to True) |
| cols | Certain cols from the data files used as the input features |
| num_workers | The num_workers of the DataLoader (defaults to 0) |
| itr | Number of experiment repetitions (defaults to 2) |
| train_epochs | Train epochs (defaults to 6) |
| batch_size | The batch size of training input data (defaults to 32) |
| patience | Early stopping patience (defaults to 3) |
| learning_rate | Optimizer learning rate (defaults to 0.0001) |
| des | Experiment description (defaults to test) |
| loss | Loss function (defaults to mse) |
| lradj | Way to adjust the learning rate (defaults to type1) |
| use_amp | Whether to use automatic mixed precision training; passing this argument means using AMP (defaults to False) |
| inverse | Whether to inverse the output data; passing this argument means inversing the output data (defaults to False) |
| use_gpu | Whether to use GPU (defaults to True) |
| gpu | The GPU no., used for training and inference (defaults to 0) |
| use_multi_gpu | Whether to use multiple GPUs; passing this argument means using multiple GPUs (defaults to False) |
| devices | Device ids of multiple GPUs (defaults to 0,1,2,3) |

Results

We have updated the experiment results of all methods due to a change in data scaling. Fortunately, Informer's performance improves under the new scaling. Thanks to @lk1983823 for pointing out the data scaling issue in issue 41.

Besides, the experiment parameters for each dataset are formatted in the .sh files in the ./scripts/ directory. You can refer to these parameters for experiments, and you can also adjust them to obtain better MSE and MAE results or to draw better prediction figures.



Figure 4. Univariate forecasting results.



Figure 5. Multivariate forecasting results.

FAQ

If you run into a problem like RuntimeError: The size of tensor a (98) must match the size of tensor b (96) at non-singleton dimension 1, check your torch version, or modify the code around the Conv1d of TokenEmbedding in models/embed.py, since the behavior of the circular padding mode in Conv1d changed across torch versions.
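As a hedged illustration of the version-dependent fix (mirroring the padding switch used in TokenEmbedding; verify against your copy of models/embed.py):

import torch
import torch.nn as nn

# assumption: Conv1d's circular padding behavior changed around torch 1.5.0,
# so older versions need padding=2 to keep the sequence length unchanged
padding = 1 if torch.__version__ >= '1.5.0' else 2
token_conv = nn.Conv1d(in_channels=7, out_channels=512, kernel_size=3,
                       padding=padding, padding_mode='circular')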

Citation

If you find this repository useful in your research, please consider citing the following papers:

@article{haoyietal-informerEx-2023,
  author    = {Haoyi Zhou and
               Jianxin Li and
               Shanghang Zhang and
               Shuai Zhang and
               Mengyi Yan and
               Hui Xiong},
  title     = {Expanding the prediction capacity in long sequence time-series forecasting},
  journal   = {Artificial Intelligence},
  volume    = {318},
  pages     = {103886},
  issn      = {0004-3702},
  year      = {2023},
}
@inproceedings{haoyietal-informer-2021,
  author    = {Haoyi Zhou and
               Shanghang Zhang and
               Jieqi Peng and
               Shuai Zhang and
               Jianxin Li and
               Hui Xiong and
               Wancai Zhang},
  title     = {Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting},
  booktitle = {The Thirty-Fifth {AAAI} Conference on Artificial Intelligence, {AAAI} 2021, Virtual Conference},
  volume    = {35},
  number    = {12},
  pages     = {11106--11115},
  publisher = {{AAAI} Press},
  year      = {2021},
}

Contact

If you have any questions, feel free to contact Haoyi Zhou through email ([email protected]) or GitHub issues. Pull requests are highly welcome!

Acknowledgments

Thanks for the computing infrastructure provided by the Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC). Thank you all for your attention to this work!


informer2020's Issues

Inconsistent with the original paper?

Hello:

Thank you for your work. During my study I find it confusing that in your code (line 68 in atten.py)

V_sum = V.sum(dim=-2)

you directly sum up V over those flattened attention-matrix rows. However, in your original paper you use the average value of V, which seems more reasonable to me. Which way is correct?


Thank you!

A question about model training

Hello, I ran into something confusing while training the model with this code. Although I set it to train for 6 epochs, it usually trains for only 1 epoch (at most 2) before the model's val loss and test loss stop decreasing, and the final results are not particularly good.

I tried several different datasets: ETTh1, ETTm1, and two other weather datasets. I mostly used the default parameters and made almost no changes to the code.

I suspect I did something wrong somewhere and would appreciate any tuning suggestions. Thank you.

Question about the test results

Hi zhouhaoyi,
I am just a machine-learning beginner. I watched your video about Informer on Bilibili and got curious, so I came here to read the code.
I downloaded the code and ran it, but I was confused by the result.
My understanding is that the test results represent the 'OT' values, but the values I got look like this:
-0.22336945 0.01950708 -0.08188418 -0.00959123 -0.34219092 0.19225414 -0.40179282
while the real values look like this:
10.114 3.55 6.183 1.564 3.716 1.462 9.567
I am confused about this. Can you help me? What is the reason?
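For context, the Data section notes that inputs are zero-mean normalized in this implementation, so raw predictions live in the normalized scale. A hedged sketch of mapping them back, assuming an sklearn-style scaler fit on the training split (as in data_loader.py) and a saved prediction array:

import numpy as np

# hypothetical names: `scaler` is the fitted scaler from the Dataset object,
# and the results folder name must be adjusted to your experiment setting
preds = np.load('./results/informer_ETTh1_.../pred.npy')  # (num_windows, pred_len, c_out)
flat = scaler.inverse_transform(preds.reshape(-1, preds.shape[-1]))
real_preds = flat.reshape(preds.shape)  # predictions back on the original scale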

A question about the double data type

PyTorch defaults to float, which is faster and usually about as accurate as double, so why are both the data and the model in double precision in the code? Is there another consideration? Thanks!

BUG: Wrong attention mask in decoder

A bug when creating the model:

AttentionLayer(FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
               d_model, n_heads),
AttentionLayer(FullAttention(False, factor, attention_dropout=dropout, output_attention=False),
               d_model, n_heads),

The bug causes the cross-attention in the decoder to use NO causal mask, while the self-attention uses a causal mask. Fortunately there is no information leak in Informer, but it is still quite different from what you wrote in the paper.

the weather dataset

The link to the weather dataset in your paper has been unavailable for some time. Is there another way to download the data, or would it be possible for you to provide the weather data on GitHub? Thanks.

About the best result's hyperparameters

Hi.
Thank you for your amazing paper. I think your ideas and the technical structure are awesome.

I am replicating your paper, but I have some questions.
The prediction (len=336) result of Informer below is very interesting,
so I would like to know the detailed hyperparameters behind it,
e.g., seq_len, label_len, pred_len, e_layers, d_layers, and all the other parameters.

  1. Can you tell me the specific hyperparameters?
  2. How could we use this forecasting system for anomaly detection?
    (I just want to know your thoughts.)

Thank you very much.


About ECL dataset in your paper

Congratulations on the best paper award!

We collected the ECL dataset and found that it contains a total of 370 clients. However, in the experimental part of the paper, you say "It collects the electricity consumption (Kwh) of 321 clients" and "set 'MT 320' as the target value". How do you determine the target value? Can you provide the data of these 321 clients? Thank you.

Reproducing the results for ETTh2

Hello,

Thanks a lot for publishing your results and code; I enjoyed reading the paper.
While trying to reproduce the paper's results, the output was way off, especially for the ETTh2 dataset. (I ran it with the same configuration as the Colab notebook.)

testing : informer_ETTh2_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df512_atprob_ebtimeF_dtTrue_exp_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2857
test shape: (89, 32, 24, 7) (89, 32, 24, 7)
test shape: (2848, 24, 7) (2848, 24, 7)
mse:0.8689931831590327, mae:0.7622690107174594

Could you please let me know if you used different hyperparameters, or what I am doing wrong?

Thanks in advance.

Regards,
Kiran

A question about scaling data.

In your model code, I find that you used a data scaler on the train/val/test datasets separately. However, I think you probably use future information during the validation and testing process, because during online prediction we can't get the whole dataset in advance. In addition, I didn't find an inverse transformation, which is important for showing the real model performance on the test dataset. Can you give more information on how to deal with data scaling and its inverse transformation? Thanks.
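For reference, the pattern under discussion fits the scaler on the training borders only and then transforms the whole series; a minimal sklearn sketch of that pattern (stand-in shapes, not the repo's exact code):

import numpy as np
from sklearn.preprocessing import StandardScaler

series = np.random.randn(1000, 7)    # stand-in for df_data.values
train_end = 700                      # stand-in for border2s[0]

scaler = StandardScaler()
scaler.fit(series[:train_end])       # statistics come from the train split only
data = scaler.transform(series)      # val/test reuse the train statistics
recovered = scaler.inverse_transform(data)  # maps model outputs back to the real scale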

How to forecast custom sequence data with a 1-second interval

Thanks for doing such impressive work!

I'd like to ask: the time interval of my custom dataset's sequences is 1 second, but the freq option seems to only go down to the minute level. What do I need to modify to forecast sequences with second-level intervals?

I'd also like to ask: at prediction time, can the input sequence have a variable length? For example, whether I feed the previous 60 seconds or the previous 40 seconds of a trajectory, I always want to predict the trajectory for the next 60 seconds.
Thanks in advance :)

A question about "attn.py"

At attn.py, lines 90-95:

B, L, H, D = queries.shape
_, S, _, _ = keys.shape

queries = queries.view(B, H, L, -1)
keys = keys.view(B, H, S, -1)
values = values.view(B, H, S, -1)

Why does it use view rather than transpose?
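A minimal check of the distinction being asked about (illustrative only): on a (B, L, H, D) tensor, view(B, H, L, -1) only reinterprets the underlying memory, while transpose(1, 2) actually permutes the axes, and in general the two do not agree elementwise:

import torch

B, L, H, D = 2, 4, 3, 5
x = torch.arange(B * L * H * D, dtype=torch.float32).reshape(B, L, H, D)

viewed = x.view(B, H, L, D)                  # reinterprets memory, no data movement
transposed = x.transpose(1, 2).contiguous()  # genuinely swaps the L and H axes

print(viewed.shape == transposed.shape)      # True: same shape...
print(torch.equal(viewed, transposed))       # False: ...different contents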

Request: support converting the model to TorchScript format

It would be great to support converting the model to TorchScript format so it can see wider use.
I want to call a trained Informer model from a Java app; after looking into the JDL library, I learned that the .pth model first needs to be converted to TorchScript.
I tried the following code for the conversion:

Exp = Exp_Informer
exp = Exp(args) # initialize the model with the same args used for training
pthfile = './checkpoints/test/checkpoint.pth'

examples = exp.trace() # a method I added to the Informer class to collect the arguments forward() needs; here it returns a tuple
model = exp.model # get the model and load the weights
model.load_state_dict(torch.load(pthfile))

# try to run inference and convert
traced_script_module = torch.jit.trace(model, examples)
traced_script_module.save("./traced_model.pt")

I got the following error:
File "E:\pythonspace\deep_learning\Informer2020\models\attn.py", line 110, in forward
U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item() # c*ln(L_k)
AttributeError: 'Tensor' object has no attribute 'astype'

https://zhuanlan.zhihu.com/p/146453159
I guess this is caused by the use of the numpy functions np.ceil() and np.log(). I tried replacing them with the corresponding torch functions, but that still didn't work:

# U_part = self.factor * np.ceil(np.log(L_K)).astype('int').item() # avoid numpy during conversion [https://zhuanlan.zhihu.com/p/146453159]
U_part = self.factor * torch.ceil(torch.log(L_K)).int() # attempted torch equivalent

After this change, the error became:
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
encountered an exception while running the Python function with test inputs.
Exception:
log(): argument 'input' (position 1) must be Tensor, not int

I hope someone can advise how to make this work without numpy. Many thanks! ❀❀❀
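One hedged direction (a sketch, not a verified fix): the two errors suggest L_K is a Tensor under tracing but a plain Python int during the eager sanity-check run, so normalizing it to a tensor first may help:

import torch

# sketch only: make the line work for both eager (int) and traced (Tensor) shapes
L_K_t = L_K if torch.is_tensor(L_K) else torch.tensor(float(L_K))
U_part = (self.factor * torch.ceil(torch.log(L_K_t))).long()
# caveat: torch.jit.trace still bakes this value in as a constant; truly
# shape-dependent behavior may require torch.jit.script instead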

Continuing training after an OOM

First of all, thank you for sharing your work.

While training your model, I used different parameters and datasets. Sometimes training hits an OOM partway through; is it possible to continue training the model from the last checkpoint instead of starting over?

Calling train(settings) restarts training from scratch.
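For reference, a hedged sketch of restarting from the saved weights (early stopping writes checkpoint.pth under the checkpoints directory; note this restores model weights only, not the optimizer state, and train() may overwrite the checkpoint):

import torch
from exp.exp_informer import Exp_Informer

exp = Exp_Informer(args)  # same args as the interrupted run
ckpt = './checkpoints/{}/checkpoint.pth'.format(setting)  # `setting` from that run
exp.model.load_state_dict(torch.load(ckpt))
exp.train(setting)  # continues from the loaded weights, but resets epochs/early stopping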

Poor documentation on how to predict on new data

I trained the model, but I have no idea how to use it to predict on new data. I would like to pass a CSV with two columns, "dates" and "past values", to predict the next values with the model, but from what I understand this is not possible without digging into the source code. The Colab example also does not provide any specific code or comments on how to do this.

By the way, what exactly are "batch_x", "batch_y", "batch_x_mark", and "batch_y_mark"? Which ones do I have to pass to the Informer to predict the next values?
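For orientation (a sketch based on how exp_informer.py drives the model; treat the details as assumptions): batch_x is the encoder input window of length seq_len, batch_y carries the label_len + pred_len target window, and batch_x_mark / batch_y_mark are the matching time-feature encodings. The decoder input is the label segment of batch_y padded with zeros over the horizon:

import torch

# sketch of the decoder-input construction the experiment loop uses
dec_inp = torch.zeros([batch_y.shape[0], args.pred_len, batch_y.shape[-1]]).float()
dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1)
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)  # (batch, pred_len, c_out)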

args.data in Colab Example

Thank you for the excellent work! And sorry to bother you again. 😃
Since the Colab Example is used for custom data, why should we input the data name "ETTh1"?
I also found the code d_inp = 4 if data=='ETTh' else 5 in models/embed.py. Does it mean ETTh uses (Y, M, W, D) while ETTm needs an additional Minute? That is unfriendly for custom data, or at least a little confusing.
Also, I found you add the freq to set the dimensionality of the timestamp; isn't that enough to get the time features?

What I want to say is that I don't understand why the data parameter should be passed into the model and the DataEmbedding class.
In my opinion, the data should be independent of the model.

RuntimeError when e_layers>3?

Thank you for sharing your work.

When I try e_layers=4 or larger, training always fails with the error below when calling exp.train(settings).

With e_layers=3 or fewer everything works fine. I don't understand why.


RuntimeError Traceback (most recent call last)
in
13 # train
14 print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
---> 15 exp.train(setting)
16
17 # test

~/max/Informer2020/exp/exp_informer.py in train(self, setting)
169 # encoder - decoder
170 if self.args.output_attention:
--> 171 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
172 else:
173 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
145 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
146 enc_out = self.enc_embedding(x_enc, x_mark_enc)
--> 147 enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
148
149 dec_out = self.dec_embedding(x_dec, x_mark_dec)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
94 inp_len = inp_len//2
95 continue
---> 96 x, attn = encoder(x[:, -inp_len:, :])
97 x_stack.append(x); attns.append(attn)
98 inp_len = inp_len//2

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
65 if self.conv_layers is not None:
66 for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
---> 67 x, attn = attn_layer(x, attn_mask=attn_mask)
68 x = conv_layer(x)
69 attns.append(attn)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/encoder.py in forward(self, x, attn_mask)
43 new_x, attn = self.attention(
44 x, x, x,
---> 45 attn_mask = attn_mask
46 )
47 x = x + self.dropout(new_x)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/attn.py in forward(self, queries, keys, values, attn_mask)
151 keys,
152 values,
--> 153 attn_mask
154 )
155 out = out.view(B, L, -1)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/attn.py in forward(self, queries, keys, values, attn_mask)
109 u = self.factor * np.ceil(np.log(L_Q)).astype('int').item() # c*ln(L_q)
110
--> 111 scores_top, index = self._prob_QK(queries, keys, sample_k=U_part, n_top=u)
112
113 # add scale factor

~/max/Informer2020/models/attn.py in _prob_QK(self, Q, K, sample_k, n_top)
58 # find the Top_k query with sparisty measurement
59 M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
---> 60 M_top = M.topk(n_top, sorted=False)[1]
61
62 # use the reduced Q to calculate Q_K

RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:26

results

After running the program, trues.npy in the results folder is three-dimensional. What is its relationship to the original two-dimensional data? The prediction result preds.npy is also three-dimensional. Why isn't it a single column like the prediction target 'OT'? I don't quite understand; any guidance would be appreciated.
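For reference, the saved arrays stack every sliding test window, so their shape is (num_windows, pred_len, c_out) rather than a single 'OT' column (compare the test shape: (2848, 24, 7) logs quoted in another issue above); a quick way to inspect them:

import numpy as np

# <setting> below is a placeholder for your experiment folder name
preds = np.load('./results/<setting>/preds.npy')
trues = np.load('./results/<setting>/trues.npy')
print(preds.shape, trues.shape)  # e.g. (2848, 24, 7): windows x pred_len x features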

Sample (stochastically) from the model?

Hi,

I was wondering if it is possible to sample from the Informer model?

To be more specific:
During decoding in classic Transformer, each token follows a categorical distribution (i.e. the softmax) and is sampled/generated one by one. I understand that it's an advantage of Informer that this sequential decoding is not needed anymore.
But does this mean that one cannot get diverse samples from the Informer model?
If not, how would one go about generating those samples?

Thanks!

For custom dataset

Congratulations on the best paper award!
I'm new to transformers for time-series data and just found this cool paper!
It would be great if you could provide an example showing how to use custom data.
Thanks in advance!

Eq3

Hi,

Thanks for the nice work. I have one question about Eq. 3. To achieve sparsity, the proposed ProbSparse attends only to the top-u queries. I am a little confused by this design; from my perspective, it seems more rational to attend to a subset of keys for each query. Could you please elaborate on this design choice? Thanks.

on running informer.py file

Use GPU: cuda:0

start training : informer_ETTh1_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df1024_atprob_ebfixed_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8521
val 2857
test 2857
Traceback (most recent call last):
File "main_informer.py", line 69, in
exp.train(setting)
File "/content/Informer2020/exp/exp_informer.py", line 157, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/model.py", line 71, in forward
dec_out = self.decoder(dec_out, enc_out, x_mask=dec_self_mask, cross_mask=dec_enc_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/decoder.py", line 46, in forward
x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/decoder.py", line 23, in forward
attn_mask=x_mask
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/attn.py", line 141, in forward
attn_mask
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/content/Informer2020/models/attn.py", line 25, in forward
attn_mask = TriangularCausalMask(B, L, device=queries.device)
File "/content/Informer2020/utils/masking.py", line 7, in init
self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)
RuntimeError: "triu" not implemented for 'Bool'`

Can't use multi-gpu (Colab example)

Hi, Thanks for your great work.

I just tried to run through the provided Colab example using multiple GPUs (0,1,2,3).

By changing use_multi_gpu from False to True, I got AssertionError: Invalid device id.

Does that mean Colab refuses to offer 4 GPUs, or is there a bug?

By the way, I want to make sure whether my problem can be solved by the Informer model.

Let's say my custom dataset has 1 column of date and N columns of features and 1 target column that is numerical, and the goal is to predict the value of the target column using other columns in the same row.

Is this the kind of problem called "multivariate predict univariate", so that I should choose 'MS' in args.features?

Thanks again.
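That setup matches the MS mode in the argument table above (multivariate inputs predicting a univariate target). An illustrative command, with hypothetical file and column names:

python -u main_informer.py --model informer --data custom --root_path ./data/ --data_path my_data.csv --features MS --target my_target --freq h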

perf test

First of all, I like the idea for calculating the attention matrix, and it should work.

I then ran a small performance test, and the results didn't seem to add up; I'd appreciate it if anyone could shed some light on this.

The code snippet is provided below. I basically removed everything except the attention layers and changed the layer depth to 5.

=================RESULTS=======================
When using full attention: 0.008 sec per forward and 18.8 GB of GPU RAM
When using prob attention: 0.21 sec per forward and >20 GB of GPU RAM

=================ENV===========================
UBUNTU: 20.04
NVIDIA DRIVER: 460.39
CUDA: 11.03
PYTORCH: 1.7.1
GPU: RTX3090
=================CODE==========================

import argparse
import time

import torch
import torch.nn as nn

# assumption: run from the repo root so the Informer modules are importable
from models.attn import ProbAttention, FullAttention, AttentionLayer
from models.encoder import Encoder, EncoderLayer


class InformerSpeedTest(nn.Module):
    def __init__(self, enc_in, dec_in, c_out, seq_len, label_len, out_len,
                 factor=5, d_model=512, n_heads=8, e_layers=5, d_layers=2, d_ff=512,
                 dropout=0.0, attn='prob', embed='fixed', data='ETTh', activation='gelu',
                 device=torch.device('cuda:0')):
        super(InformerSpeedTest, self).__init__()
        self.attn = attn

        # Attention
        Attn = ProbAttention if attn == 'prob' else FullAttention
        # Encoder
        self.encoder = Encoder(
            [
                EncoderLayer(
                    AttentionLayer(Attn(False, factor, attention_dropout=dropout),
                                   d_model, n_heads),
                    d_model,
                    d_ff,
                    dropout=dropout,
                    activation=activation
                ) for l in range(e_layers)
            ]
        )

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
        enc_out, _ = self.encoder(x_enc, attn_mask=enc_self_mask)  # Encoder returns (output, attentions)
        return enc_out

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--prob", action="store_true")
    args = parser.parse_args()
    rounds = 50
    batch_size = 128
    length = 512
    d_model = 512
    device = "cuda:0"
    attn = None

    if args.prob:
        print("Use prob")
        attn = "prob"
    else:
        print("Use full")

    test = InformerSpeedTest(None, None, None, None, None, None, attn=attn).to(device)
    test.train()  # test.eval()

    for i in range(rounds):
        print(f"Round: {i}")
        x = torch.randn(batch_size, length, d_model).to(device)
        s = time.time()
        test(x, None, None, None)
        print(f"Cost: {time.time() - s:.6f}s")

ValueError: could not convert string to float:

Hi
The same script that was working this morning now gives me:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-8eee1c4218f9> in <module>()
     14     target=args.target,
     15     timeenc=timeenc,
---> 16     freq=args.freq
     17 )
     18 data_loader = DataLoader(

5 frames
/content/Informer2020/data/data_loader.py in __init__(self, root_path, flag, size, features, data_path, target, scale, timeenc, freq)
    216         self.root_path = root_path
    217         self.data_path = data_path
--> 218         self.__read_data__()
    219 
    220     def __read_data__(self):

/content/Informer2020/data/data_loader.py in __read_data__(self)
    244         if self.scale:
    245             train_data = df_data[border1s[0]:border2s[0]]
--> 246             self.scaler.fit(train_data.values)
    247             data = self.scaler.transform(df_data.values)
    248         else:

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py in fit(self, X, y)
    667         # Reset internal state before fitting
    668         self._reset()
--> 669         return self.partial_fit(X, y)
    670 
    671     def partial_fit(self, X, y=None):

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y)
    698         X = check_array(X, accept_sparse=('csr', 'csc'),
    699                         estimator=self, dtype=FLOAT_DTYPES,
--> 700                         force_all_finite='allow-nan')
    701 
    702         # Even in the case of `with_mean=False`, we update the mean anyway

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    529                     array = array.astype(dtype, casting="unsafe", copy=False)
    530                 else:
--> 531                     array = np.asarray(array, order=order, dtype=dtype)
    532             except ComplexWarning:
    533                 raise ValueError("Complex data not supported\n"

/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

ValueError: could not convert string to float: 

Note : args.freq = 'h' # freq for time features encoding

Error report

When I used 'informerstack' instead of 'informer' in the command below:
python -u main_informer.py --model informerstack --data ETTm1 --attn prob --freq t

an error occurred:
(informer) lizhaorui@server-System-Product-Name:~/DL/Informer2020$ python -u main_informer.py --model informerstack --data ETTh1 --attn prob --freq h
Args in experiment:
Namespace(activation='gelu', attn='prob', batch_size=32, c_out=7, checkpoints='./checkpoints/', d_ff=2048, d_layers=1, d_model=512, data='ETTh1', data_path='ETTh1.csv', dec_in=7, des='test', device_ids=[0, 1], devices='0,1', distil=True, dropout=0.05, dvices='0,1', e_layers=2, embed='timeF', enc_in=7, factor=5, features='M', freq='h', gpu=0, itr=2, label_len=48, learning_rate=0.0001, loss='mse', lradj='type1', model='informerstack', n_heads=8, num_workers=0, output_attention=False, patience=3, pred_len=48, root_path='./data/ETT/', seq_len=512, target='OT', train_epochs=6, use_amp=False, use_gpu=True, use_multi_gpu=True)
Use GPU: cuda:0

start training : informerstack_ETTh1_ftM_sl512_ll48_pl48_dm512_nh8_el2_dl1_df2048_atprob_fc5_ebtimeF_dtTrue_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8081
val 2833
test 2833
Traceback (most recent call last):
File "main_informer.py", line 91, in
exp.train(setting)
File "/home/lizhaorui/DL/Informer2020/exp/exp_informer.py", line 199, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/DL/Informer2020/models/model.py", line 147, in forward
enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
File "/home/lizhaorui/anaconda3/envs/informer/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lizhaorui/DL/Informer2020/models/encoder.py", line 99, in forward
x_stack = torch.cat(x_stack, -2)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/RegisterCPU.cpp:5925 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/RegisterCUDA.cpp:7100 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:641 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradNestedTensor: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:9122 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:10525 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

About the algorithmic complexity

attn.py, line 48:

K_expand = K.unsqueeze(-3).expand(B, H, S, L, E)

The space complexity of this step is quite high, exceeding that of the original Transformer. Is there a better way to implement it?

Concatenating Feature maps from encoder

Hi,

I really appreciate you helping others to understand the code by proactively replying to the questions.

While going through the paper and the figure in it, it is mentioned that the output of the encoder is a concatenation of the feature maps from each stack:

"we concatenate all the stacks' outputs and have the final hidden representation of encoder."

But in the code, it seems like only the final encoder representation is used as the input to the decoder, without concatenating it with the lower-level embedding representations.

Could you please clarify why I observe this discrepancy?

Thanks in advance.

Error of tensor dimension

After executing the following command:
python -u main_informer.py --model informer --data ETTh1

with your downloaded data, I get the following error:
Use GPU: cuda:0

start training : informer_ETTh1_ftM_sl96_ll48_pl24_dm512_nh8_el3_dl2_df1024_atprob_ebfixed_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 8521
val 2857
test 2857
Traceback (most recent call last):
File "main_informer.py", line 69, in
exp.train(setting)
File "C:\Users\User\Informer2020\exp\exp_informer.py", line 157, in train
outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
File "C:\Users\User\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\User\Informer2020\models\model.py", line 67, in forward
enc_out = self.enc_embedding(x_enc, x_mark_enc)
File "C:\Users\User\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\User\Informer2020\models\embed.py", line 95, in forward
x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
RuntimeError: The size of tensor a (98) must match the size of tensor b (96) at non-singleton dimension 1

Dataloaders for ECL and Weather

Hello everyone,
first of all, thank you for your amazing paper!
We are currently trying to reproduce your results and want to run the experiments on the weather and ECL datasets. Would it be possible for you to also publish the two data loaders you used for those datasets, to keep the preprocessing consistent?
Thanks a lot in advance!

Model test results

I tried tuning with the parameters from the scripts directory, for example:
python -u main_informer.py --model informer --data ETTh2 --features S --seq_len 336 --label_len 336 --pred_len 168 --e_layers 2 --d_layers 1 --attn prob --des 'Exp' --itr 5
The test result is mse:0.25672146677970886, mae:0.43616965413093567,
which is close to your updated results.
However, the plotted predictions differ a lot from the true values. Did I miss some parameter, or is this simply what the results look like?

(attachment: Informer.ipynb - Colaboratory.pdf)

Thank you very much.

Decoder without prob attention

In the paper's overall architecture figure, the decoder contains prob attention, but the code implementation uses full attention:

DecoderLayer(
AttentionLayer(FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
d_model, n_heads),
AttentionLayer(FullAttention(False, factor, attention_dropout=dropout, output_attention=False)

I'd like to ask why it is done this way.

About the inconsistency of the ProbSparse self-attention implementation

Thanks for your great work.

I find that in the code, attn.py line 111:
scores_top, index = self._prob_QK(queries, keys, u, U)
it seems that U is passed as n_top, when according to the paper it should be u?

And U is set to c*ln(n) instead of m*ln(n) in line 108:
U = self.factor * np.ceil(np.log(S)).astype('int').item()

I don't know if I have misunderstood this, and I hope for your reply.

EarlyStopping's occurrence

Hello,

I am currently trying to run your code to see how it works, but every time it terminates too soon due to EarlyStopping. The resulting MSE and MAE were also quite far off from the results shown here. I have been away from programming for a long time, so my knowledge is too limited to solve the problem myself. That said, I did try setting the EarlyStopping patience to 100, but the run still ended on its own despite reporting an EarlyStopping counter of 3 out of 100.

Also, at the start the code prints Use GPU: cuda: 0, which made me worry the training might be running on the CPU, but the Task Manager showed GPU use at almost 100%, so I believe that part is fine. Still, the fact that the code terminates too early every time makes me wonder whether it is using the GPU properly. It would be great if you could help me with this.

In case if any information on specs are needed:
OS: Windows Server 2019 64-bit
Processor: Intel Xeon CPU @ 2.20GHz 2.20GHz
Memory: 30GB
GPU: Nvidia Tesla V100

Thank you in advance. Let me know if there is any additional information you need.

IndexError when using 'learned' or 'fixed' in args.embed

args.model = 'informerstack' # model of experiment, options: [informer, informerstack, informerlight(TBD)]

args.data = 'custom' # data
args.root_path = './' # root path of data file
args.data_path = 'test.csv' # data file
args.features = 'S' # forecasting task, options:[M, S, MS(TBD)]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate
args.target = 'target' # target feature in S or MS task
args.freq = 't' # freq for time features encoding

args.seq_len = 128 # input sequence length of Informer encoder
args.label_len = 96 # start token length of Informer decoder
args.pred_len = 15 # prediction sequence length

args.enc_in = 1 # encoder input size (number of input features)
args.dec_in = 1 # decoder input size (number of input features)
args.c_out = 7 # output size (output dimension before the final projection)
args.factor = 5 # probsparse attn factor
args.d_model = 512 # dimension of model
args.n_heads = 8 # num of heads
args.e_layers = 3 # num of encoder layers
args.d_layers = 2 # num of decoder layers
args.d_ff = 512 # dimension of fcn in model
args.dropout = 0.05 # dropout
args.attn = 'full' # attention used in encoder, options:[prob, full]
args.embed = 'fixed' # time features encoding, options:[timeF, fixed, learned]
args.activation = 'gelu' # activation
args.distil = True # whether to use distilling in encoder
args.output_attention = False # whether to output attention in ecoder

args.batch_size = 64
args.learning_rate = 0.0001 ## 0.0001
args.loss = 'mse'
args.lradj = 'type1'

args.num_workers = 0
args.itr = 1
args.train_epochs = 6
args.patience = 3
args.des = 'exp'

I trained with the parameters above and got the following error:


IndexError Traceback (most recent call last)
in
9 # train
10 print('>>>>>>>start training : {}>>>>>>>>>>>>>>>>>>>>>>>>>>'.format(setting))
---> 11 exp.train(setting)
12
13 # test

~/max/Informer2020/exp/exp_informer.py in train(self, setting)
169 # encoder - decoder
170 if self.args.output_attention:
--> 171 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
172 else:
173 outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
144 def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec,
145 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
--> 146 enc_out = self.enc_embedding(x_enc, x_mark_enc)
147 enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
148

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/embed.py in forward(self, x, x_mark)
105
106 def forward(self, x, x_mark):
--> 107 x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
108
109 return self.dropout(x)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~/max/Informer2020/models/embed.py in forward(self, x)
75 x = x.long()
76
---> 77 minute_x = self.minute_embed(x[:,:,4]) if hasattr(self, 'minute_embed') else 0.
78 hour_x = self.hour_embed(x[:,:,3])
79 weekday_x = self.weekday_embed(x[:,:,2])

IndexError: index 4 is out of bounds for dimension 2 with size 4

If I use 'timeF', everything works fine.

about the distilling process

From your paper I read: "To enhance the robustness of the distilling operation, we build halving replicas of the main stack and progressively decrease the number of self-attention distilling layers by dropping one layer at a time, like a pyramid in Fig. (3), such that their output dimension is aligned. Thus, we concatenate all the stacks' outputs and have the final hidden representation of encoder." Somehow I failed to locate the corresponding code sections (black-boxed in the picture below), though I do notice the max pooling here, which I believe is part of the upper portion of that operation:

self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

Any clue?


Colab error on ETTm1

Hi,

I was trying to run the model on the provided Colab with the ETTm1 dataset.
And I ran into the error RuntimeError: mat1 dim 1 must match mat2 dim 0:

/content/Informer2020/exp/exp_informer.py in train(self, setting)
    171                     outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
    172                 else:
--> 173                     outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
    174 
    175                 f_dim = -1 if self.args.features=='MS' else 0

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/content/Informer2020/models/model.py in forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, enc_self_mask, dec_self_mask, dec_enc_mask)
     67     def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, 
     68                 enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
---> 69         enc_out = self.enc_embedding(x_enc, x_mark_enc)
     70         enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)
     71 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/content/Informer2020/models/embed.py in forward(self, x, x_mark)
    105 
    106     def forward(self, x, x_mark):
--> 107         x = self.value_embedding(x) + self.position_embedding(x) + self.temporal_embedding(x_mark)
    108 
    109         return self.dropout(x)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/content/Informer2020/models/embed.py in forward(self, x)
     92 
     93     def forward(self, x):
---> 94         return self.embed(x)
     95 
     96 class DataEmbedding(nn.Module):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
     91 
     92     def forward(self, input: Tensor) -> Tensor:
---> 93         return F.linear(input, self.weight, self.bias)
     94 
     95     def extra_repr(self) -> str:

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
-> 1692         output = input.matmul(weight.t())
   1693         if bias is not None:
   1694             output += bias

RuntimeError: mat1 dim 1 must match mat2 dim 0

All I changed was:

args.data = 'ETTm1' # data

Am I missing anything in the configuration?

How use the network to predict future data.

I don't understand how to use the network to predict future data; there is no example code for predicting unseen data. I want to pass X previous values and predict the next values.

In order to change the order of dimensions, why is torch.tensor.view used?

Hello,

thank you for making the code public.

In the code (forward in ProbAttention), you used the tensor.view to change the dimensions.

queries = queries.view(B, H, L_Q, -1)
keys = keys.view(B, H, L_K, -1)
values = values.view(B, H, L_K, -1)

Why not use permute? Does the view function not break the relationship between the heads?
Is the goal of using tensor.view here to change the order of dimensions? If so, why tensor.view?
