zero-coder / fecam

About Code release for "FECAM: Frequency Enhanced Channel Attention Mechanism for Time Series Forecasting" ⌚

Python 17.93% Shell 1.71% Jupyter Notebook 80.36%

frequency-analysis time-series-forecasting python forecasting-model pytorch time-series deep-learning

fecam's Introduction

FECAM

This is the original PyTorch implementation of the following paper: [FECAM: Frequency Enhanced Channel Attention Mechanism for Time Series Forecasting](https://www.sciencedirect.com/science/article/pii/S1474034623002860?dgcid=coauthor).

arXiv address: https://arxiv.org/abs/2212.01209

If you find this repository useful for your research work, please consider citing it as follows:


@article{JIANG2023102158,
title = {FECAM: Frequency enhanced channel attention mechanism for time series forecasting},
journal = {Advanced Engineering Informatics},
volume = {58},
pages = {102158},
year = {2023},
issn = {1474-0346},
doi = {https://doi.org/10.1016/j.aei.2023.102158},
url = {https://www.sciencedirect.com/science/article/pii/S1474034623002860},
author = {Maowei Jiang and Pengyu Zeng and Kai Wang and Huan Liu and Wenbo Chen and Haoran Liu},
keywords = {Time series forecasting, Channel attention, Frequency domain, Gibbs phenomenon, Real world datasets},
abstract = {Time series forecasting (TSF) is a challenging problem in various real-world scenarios, such as industry, energy, weather, traffic, economics, and earthquake warning. TSF demands the model to have a high prediction accuracy. Despite the promising performance of deep learning-based methods in TSF tasks, mainstream forecasting models may sometimes produce results that deviate from the actual ground truth. Our analysis suggests that this may be attributed to the models’ limited ability to capture the frequency information that is abundantly present in real-world datasets. Currently, the Fourier Transform (FT) is the most widely used method for extracting frequency information, but it has some issues that lead to poor model performance, such as high-frequency noise caused by the Gibbs phenomenon and computational overhead of the inverse transformation in the FT-IFT process. To address these issues, we propose a novel frequency enhanced channel attention mechanism (FECAM) that models frequency interdependencies between channels based on Discrete Cosine Transform (DCT), which inherently mitigates the high-frequency noise caused by problematic periodicity during Fourier Transform. This approach improves the model’s capability to extract frequency features and resolves computational overhead concerns that arise from inverse transformations. Our contributions are threefold: (1) We propose a novel frequency enhanced channel attention mechanism that models frequency interdependencies between channels based on DCT, which improves the model’s capability to extract frequency features and resolves computational overhead concerns that arise from inverse transformations; (2) We theoretically prove that our method mitigates the Gibbs phenomenon, which introduces high frequency noise during Fourier Transform. 
We demonstrate that the result of 1D GAP linearly varies with the lowest frequency component of 1D DCT; (3) We demonstrate the generalization ability of the proposed method FECAM by embedding it into other networks, resulting in significant performance improvements when compared to the original model, with only a minor increase in parameters. Furthermore, we conduct extensive experiments on six different real-world TSF datasets to validate the effectiveness of our proposed model and compare it with several existing state-of-the-art models. Our findings indicate that the FECAM model is superior to these models in terms of accuracy, making it a promising solution for TSF in diverse real-world scenarios. Our codes and data are available at https://github.com/Zero-coder/FECAM.}
}

@misc{jiang2022fecam,
      title={FECAM: Frequency Enhanced Channel Attention Mechanism for Time Series Forecasting}, 
      author={Maowei Jiang and Pengyu Zeng and Kai Wang and Huan Liu and Wenbo Chen and Haoran Liu},
      year={2022},
      eprint={2212.01209},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Updates

[2022-12-01] FECAM v1.0 is released
[2023-3-23] Model/Linear means FECAM plus a linear projection layer: FECAM is used for feature extraction, and the projection layer controls the output length of the prediction.
[2023-9-8] Our paper was published in Advanced Engineering Informatics.

Features

Supports six popular time-series forecasting datasets, namely Electricity Transformer Temperature (ETTh1, ETTh2, ETTm1 and ETTm2), Traffic, Weather, National Illness, Electricity and Exchange Rate, ranging across the power, energy, finance, illness and traffic domains.
We generalize FECAM into a module that can be flexibly and easily applied to any deep learning model within just a few lines of code.

To-do items

Integrate FECAM into other mainstream models (e.g. Pyraformer, Bi-LSTM, etc.) for better performance and higher efficiency on real-world time series.
Validate FECAM on more spatial-temporal time series datasets.
As a sequence modelling module, we believe it can also work well on NLP tasks such as Machine Translation and Named Entity Recognition. Furthermore, as a frequency enhanced module, it can theoretically work in any deep learning model, such as ResNet.

Stay tuned!

Get started

Install the required packages first (mainly Python 3.8 and PyTorch 1.9.0):

    cd FECAM
    conda create -n fecam python=3.8
    conda activate fecam
    pip install -r requirements.txt

Download data. You can obtain all the six benchmarks from Tsinghua Cloud or Google Drive. All the datasets are well pre-processed and can be used easily.
Train the model. We provide the experiment scripts of all benchmarks under the folder ./scripts. You can reproduce the experiment results by:

    sh ./scripts/electricity.sh
    sh ./scripts/ettm2.sh
    sh ./scripts/exchange_rate.sh
    sh ./scripts/ill.sh
    sh ./scripts/traffic.sh
    sh ./scripts/weather.sh

SENET(channel attention)

FECAM(Frequency Enhanced Channel Attention Mechanism)
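
The paper abstract states that the result of 1D global average pooling (GAP), the squeeze step of SENet-style channel attention, varies linearly with the lowest-frequency component of the 1D DCT. The following is a minimal numerical check of that relationship, assuming the unnormalized DCT-II convention. The function names `dct2` and `gap` and the sample values are illustrative and are not taken from the repository.

```python
import math

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_samples))
            for n in range(n_samples))
        for k in range(n_samples)
    ]

def gap(x):
    """1D global average pooling: the mean over the time axis."""
    return sum(x) / len(x)

# Arbitrary example values for one channel of a series.
series = [0.5, 1.5, -2.0, 4.0, 3.0, 0.0]
coeffs = dct2(series)

# For k = 0 every cosine equals 1, so X[0] is the plain sum, i.e. N * GAP(x).
print(coeffs[0], len(series) * gap(series))
```

Under this convention GAP retains only the k = 0 term, while a DCT-based squeeze also makes the higher-frequency coefficients available to the attention weights.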

As a module to enhance the frequency domain modeling capability of transformers and LSTM

Comparison with Transformers and other mainstream forecasting models

Multivariate Forecasting:

FECAM outperforms all transformer-based methods by a large margin.

Univariate Forecasting:

Efficiency

Compared to vanilla models, applying our method adds only a few parameters (see Table 4), so their computational complexity is preserved.

Performance promotion with FECAM module

Visualization

Forecasting visualization: Visualization of ETTm2 and Exchange predictions given by different models.

FECAM visualization: Visualization of the frequency enhanced channel attention and the output tensor of the Transformer encoder layer. The x-axis represents channels and the y-axis represents frequency from low to high, on the Weather and Exchange datasets.

Used Datasets

We conduct experiments on 6 popular time-series datasets, namely Electricity Transformer Temperature (ETTh1, ETTh2, ETTm1 and ETTm2), Traffic, Weather, Illness, Electricity and Exchange Rate, ranging across the power, energy, finance, health care and traffic domains.

Overall information of the 9 real-world datasets

Datasets	Variates	Timesteps	Granularity	Start time	Task Type
ETTh1	7	17,420	1hour	7/1/2016	Multi-step
ETTh2	7	17,420	1hour	7/1/2016	Multi-step
ETTm1	7	69,680	15min	7/1/2016	Multi-step
ETTm2	7	69,680	15min	7/1/2016	Multi-step&Single-step
ILI	7	966	1week	1/1/2002	Multi-step
Exchange-Rate	8	7,588	1day	1/1/1990	Multi-step&Single-step
Electricity	321	26,304	1hour	1/1/2012	Multi-step
Traffic	862	17,544	1hour	1/1/2015	Multi-step
Weather	21	52,695	10min	1/1/2020	Multi-step

Dataset preparation

Download data. You can obtain all six benchmarks from Tsinghua Cloud or Google Drive. All the datasets are pre-processed and can be used directly. (We thank Haixu Wu, the author of Autoformer, for organizing the datasets and sharing them publicly.)

The data directory structure is shown as follows.

./
└── datasets/
    ├── electricity
    │   └── electricity.csv
    ├── ETT-small
    │   ├── ETTh1.csv
    │   ├── ETTh2.csv
    │   ├── ETTm1.csv
    │   └── ETTm2.csv
    ├── exchange_rate
    │   └── exchange_rate.csv
    ├── illness
    │   └── national_illness.csv
    ├── traffic
    │   └── traffic.csv
    └── weather
        └── weather.csv

Contact

If you have any questions, feel free to contact us or open a GitHub issue. Pull requests are very welcome!

Maowei Jiang: [email protected]

Acknowledgements

Thank you all for your attention to our work!

This code uses Autoformer, Informer, Reformer, Transformer, LSTM, N-HiTS, N-BEATS, Pyraformer and ARIMA as baseline methods for comparison and further improvement.

We greatly appreciate the following GitHub repositories for their valuable code bases and datasets:

https://github.com/zhouhaoyi/Informer2020

https://github.com/thuml/Autoformer

https://github.com/cure-lab/LTSF-Linear

https://github.com/zhouhaoyi/ETDataset

https://github.com/laiguokun/multivariate-time-series-data

Thank you for your attention.

fecam's Issues

To mathematically prove that the Discrete Cosine Transform (DCT) is more energy efficient than the Discrete Fourier Transform (DFT)

To mathematically prove that the Discrete Cosine Transform (DCT) is more energy efficient than the Discrete Fourier Transform (DFT), we can consider the energy consumption of each transform.

The energy consumption of a signal processing algorithm is typically measured in terms of the number of multiplications and additions required to compute the transform. This is because multiplications and additions are the most basic operations in digital signal processing, and they are the operations that consume the most energy in a digital circuit.

A fast DCT algorithm (often called the Fast Cosine Transform, FCT) requires O(N log N) multiplications and additions, where N is the number of samples in the input signal. A direct (brute-force) evaluation of the DFT requires O(N^2) multiplications and additions. Note that the Fast Fourier Transform (FFT) also computes the DFT in O(N log N), so this asymptotic gap applies only to the direct method.

Therefore, compared with a direct DFT, the DCT requires fewer multiplications and additions to compute the transform. Compared with an FFT, the asymptotic cost is the same, and the remaining saving comes from the DCT working entirely with real numbers on real-valued input. This is an operation-count argument rather than a formal proof, but it indicates why the DCT is a practical choice for applications where energy consumption is a concern.
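
To make the operation-count argument concrete, the sketch below compares the rough growth of an O(N^2) direct transform with an O(N log N) fast transform. It is an order-of-magnitude illustration only: constant factors are ignored, the helper names are hypothetical, and the lengths are arbitrary.

```python
import math

def direct_transform_ops(n):
    """Rough multiply-add count for a direct O(N^2) transform."""
    return n * n

def fast_transform_ops(n):
    """Rough multiply-add count for an O(N log N) fast transform."""
    return n * math.log2(n)

for n in (64, 1024, 8192):
    direct = direct_transform_ops(n)
    fast = fast_transform_ops(n)
    # The ratio N / log2(N) grows with N, so the gap widens for longer series.
    print(f"N={n}: direct ~{direct}, fast ~{fast:.0f}, ratio ~{direct / fast:.1f}")
```

The same ratio applies to a fast DCT and to an FFT, which is why the comparison only favours the DCT over a direct DFT evaluation.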

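Hello, performance degrades when the average sequence length is 3. Do you have any suggestions?

On a dataset with an average sequence length of 15, using your model improved performance considerably. After switching to a dataset with an average sequence length of 3, however, performance did not improve and actually got worse. Is there any way to rescue the model in this situation?

I placed your module before the Transformer input.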

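A question about aero-engine life prediction

Hello, I previously asked you about predicting the remaining useful life of aero-engines.

I would like to use this structure for aero-engine life prediction and add your module after the Encoder. Is that feasible? While modifying the code, I am not sure how the arguments of self.dct_layer = dct_channel_block( ) and result = self.dct_layer( ) are determined. Could the author explain? Thank you.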

How to better apply it to aero-engine life prediction?

Thank you for your work. I am currently working on aero-engine life prediction. After seeing your work, I have two ideas. The first is to improve the structure based on the Transformer and then add the FECAM module you extracted. The second is to replace the multi-head attention mechanism in the Transformer with the FECAM structure. What would you suggest? Thank you, and I look forward to hearing from you!

Hello, I have a question for the author

I see that the LSTM and CNN code is marked as having run successfully before. Does it still run now?

experimental effect of FECAM

Hello, thanks for this interesting work. Is the current code the complete version?

In the paper, the experiments on six datasets show an average improvement of about 10% after adding the FECAM module. On my own dataset, however, adding FECAM to Autoformer and FEDformer changed the results only slightly. What might have caused this?

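About unsatisfactory experimental results

Hello, thank you very much for your work. I previously asked you several questions about aero-engine life prediction, and your answers were detailed and helpful. Those questions were about how and where to add the module. The problem now is that the results after adding the module are not ideal, and I do not know which aspects to adjust. Could you offer some suggestions?
Below are my code and related notes:

Because my dataset has 14 features, I changed the argument here to 14

I also changed this to 14

This is the code that adds FECAM after the encoder, followed by a fully connected layer

Looking forward to your reply. Thank you!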

Hello, is this module applicable to univariate time series forecasting?

A question about decomposition

@Zero-coder Thanks for your great work. From the perspective of time series decomposition, besides the Fourier transform there are also mode decomposition methods, such as empirical mode decomposition and variational mode decomposition. Have you tried fusing mode decomposition with a deep network?

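Dimension error

Hello, I extracted the dct function and the dct_channel_block module from your code and fed in three-dimensional data of shape (B, C, L). During processing inside dct, the dimensions do not match: the code tries to use a three-dimensional index on a two-dimensional tensor, V = Vc[:, :, 0] * W_r - Vc[:, :, 1] * W_i, where Vc is a two-dimensional tensor. Is the shape I passed to dct_channel_block incorrect?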
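
The expression quoted in this issue combines the real and imaginary parts of a DFT result with cosine and sine factors, so it expects `Vc` to carry a trailing axis of size 2 holding (real, imaginary). One possible cause of the mismatch, which is an assumption and not confirmed in this thread, is the PyTorch version: `torch.fft.fft` returns a complex tensor without that trailing axis, and `torch.view_as_real` converts a complex tensor into the (..., 2) layout. The pure-Python sketch below reproduces the same formula on one series and checks it against a direct DCT-II; the function names are illustrative and are not the repository's.

```python
import cmath
import math

def dct2_direct(x):
    """Direct DCT-II with the common factor of 2 (no orthonormal scaling)."""
    n = len(x)
    return [2.0 * sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                      for t in range(n))
            for k in range(n)]

def dft(v):
    """Plain O(N^2) DFT returning complex numbers."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def dct2_via_dft(x):
    """DCT-II from a same-length DFT of the reordered input."""
    n = len(x)
    # Even-indexed samples first, then odd-indexed samples in reverse order.
    v = x[::2] + x[1::2][::-1]
    vc = dft(v)
    out = []
    for k in range(n):
        angle = -math.pi * k / (2 * n)
        w_r, w_i = math.cos(angle), math.sin(angle)
        # Same form as V = Vc[..., 0] * W_r - Vc[..., 1] * W_i in the issue:
        # index 0 is the real part, index 1 is the imaginary part.
        out.append(2.0 * (vc[k].real * w_r - vc[k].imag * w_i))
    return out

series = [1.0, 4.0, -2.0, 0.5, 3.0, 2.5, -1.0]
max_diff = max(abs(a - b) for a, b in zip(dct2_direct(series), dct2_via_dft(series)))
print(max_diff)
```

If the complex result is kept as a complex tensor instead, the same value can be obtained from its real and imaginary parts directly, without indexing a third axis.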

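Hello, I have a question for the author

For univariate time series forecasting, does the channel importance that this module emphasizes no longer apply, since a univariate series has only one channel? In univariate forecasting, does the module only serve to extract frequency information?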

Hello

Hello, in the forward pass of dctnet in your code,
lr_weight = self.dct_norm(stack_dct)
lr_weight = self.fc(stack_dct)
lr_weight = self.dct_norm(lr_weight)
is there a problem here?
Also, does layer normalization have a large effect on this module?

why frequency is important for time series forecasting

Frequency is important for time series forecasting because it helps to determine the patterns and trends within the data. By analyzing the data at different frequencies, such as daily, weekly, or monthly, it is possible to identify seasonal and cyclical patterns, which can be used to make more accurate predictions about future values. Additionally, understanding the frequency of the data allows for more appropriate model selection and the application of appropriate forecasting techniques.
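
As a small illustration of how a frequency view exposes a seasonal pattern, the sketch below builds a hypothetical daily series with a weekly cycle and recovers its period from the largest DFT magnitude. The series, its length and the helper name are assumptions made for this example.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of each DFT bin, computed directly in O(N^2)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

# Hypothetical daily series: a constant level plus a weekly cycle, four weeks long.
series = [10.0 + 3.0 * math.sin(2 * math.pi * t / 7) for t in range(28)]
mags = dft_magnitudes(series)

# Skip bin 0 (the mean level) and search only up to the Nyquist bin.
half = len(series) // 2
dominant_bin = max(range(1, half + 1), key=lambda k: mags[k])
dominant_period = len(series) / dominant_bin
print(dominant_bin, dominant_period)
```

The dominant bin corresponds to a period of seven steps, which is the weekly cycle that was put into the series.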

why Discrete cosine transform is more efficient than Discrete Fourier transform

The Discrete Cosine Transform (DCT) is often described as more efficient than the Discrete Fourier Transform (DFT) for several reasons. First, the DCT of a real signal uses only real numbers, whereas the DFT produces complex numbers, which take more storage and arithmetic to process. Second, the DCT's cosine basis implies an even-symmetric extension of the signal, which avoids the artificial boundary jump of the DFT's periodic extension, so for smooth natural signals such as images and audio it typically captures the important frequency components with fewer coefficients. Third, fast DCT algorithms run in O(N log N), the same order as the FFT and far below the O(N^2) of a direct DFT evaluation.

Overall, the DCT is efficient for signal processing tasks because it uses real numbers, has a basis well suited to natural signals, and has fast O(N log N) algorithms. This makes it particularly well suited to applications such as image and audio compression, where a high degree of accuracy and efficiency is required.
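
The "fewer coefficients" point can be checked on a simple case. The sketch below counts how many of the largest coefficients are needed to hold 99% of the energy of a linear ramp, using an orthonormal DCT-II and a unitary DFT so that both preserve total energy. The ramp, the 99% threshold and the function names are assumptions chosen for illustration, and each complex DFT coefficient is counted once.

```python
import cmath
import math

def dct2_ortho(x):
    """Orthonormal DCT-II (energy preserving)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                               for t in range(n)))
    return out

def dft_unitary(x):
    """Unitary DFT (energy preserving), computed directly."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
            for k in range(n)]

def coeffs_for_energy(coeffs, fraction=0.99):
    """Smallest number of largest-magnitude coefficients reaching the energy fraction."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    target = fraction * sum(energies)
    running = 0.0
    for count, energy in enumerate(energies, start=1):
        running += energy
        if running >= target:
            return count
    return len(energies)

# A ramp has mismatched end points, so its periodic extension has a jump.
ramp = [float(t) for t in range(32)]
dct_count = coeffs_for_energy(dct2_ortho(ramp))
dft_count = coeffs_for_energy(dft_unitary(ramp))
print(dct_count, dft_count)
```

On this signal the DCT reaches the threshold with far fewer coefficients than the DFT, because the DFT spends energy representing the jump between the last and first samples. Signals whose end points already match would show a smaller difference.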

why dct is more useful than fft in TSF task?

The discrete cosine transform (DCT) is often used in time series forecasting tasks because it has several properties that suit this type of problem. First, the DCT of a real-valued series is itself real-valued, so it avoids the complex arithmetic that the Fast Fourier Transform (FFT) produces, although both have O(N log N) fast algorithms. Second, as the paper argues, the DCT mitigates the high-frequency noise that the Gibbs phenomenon introduces when a Fourier transform is applied to a series whose two ends do not match. Third, in FECAM the DCT coefficients are used directly to compute channel attention weights, so no inverse transform is needed. These properties make the DCT a reasonable choice for time series forecasting tasks.

experimental effect of FECAM

Hello, thanks for this interesting work. Is the current code the complete version?

In the paper, the experiments on six datasets show an average improvement of about 10% after adding the FECAM module. On my own dataset, however, adding FECAM to Autoformer and FEDformer changed the results only slightly. What might have caused this?

About parameters: Is the channel value the only important parameter that can be adjusted?

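Error when running the code

Hello, when running the example program in decnet.py, a tensor dimension mismatch occurs at line 42. How should I fix it? Is it a problem with the program itself?

Hello, I ran into this problem when using your code. How can I solve it?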

FULL code

When will you release the full script?

modelling in frequency domain is important for deep learning models

Modelling in the frequency domain is important for deep learning models because it allows for the analysis and transformation of the data into a form that is more suitable for the model to process. By performing a Fourier transform on the data, it is possible to decompose the data into its individual frequency components, which can be used to identify patterns and trends within the data. This can then be used to pre-process the data and improve the performance of the deep learning model. Additionally, modelling in the frequency domain allows for the application of appropriate techniques for dealing with seasonality and trends, which can further improve the accuracy of the model's predictions.
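
One way to picture this kind of frequency-domain pre-processing is to transform a series, keep only its strongest components, and transform back. The sketch below does this with an orthonormal DCT-II and its inverse, swapping in the DCT used elsewhere in this repository for the Fourier transform mentioned above. The series, the number of kept coefficients and all function names are hypothetical.

```python
import math

def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct2_ortho(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                               for t in range(n))
            for k in range(n)]

def idct2_ortho(coeffs):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                for k in range(n))
            for t in range(n)]

def keep_largest(coeffs, count):
    """Zero every coefficient except the `count` largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = set(order[:count])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

# Hypothetical series: a linear trend plus a 12-step seasonal cycle.
series = [0.1 * t + math.sin(2 * math.pi * t / 12) for t in range(48)]
coeffs = dct2_ortho(series)

# The full round trip recovers the series; truncation gives a compact approximation.
roundtrip = idct2_ortho(coeffs)
approx = idct2_ortho(keep_largest(coeffs, 6))
max_roundtrip_error = max(abs(a - b) for a, b in zip(series, roundtrip))
rms_error = math.sqrt(sum((a - b) ** 2 for a, b in zip(series, approx)) / len(series))
print(max_roundtrip_error, rms_error)
```

How many coefficients to keep is a modelling choice; FECAM itself, according to the paper, uses the DCT coefficients to weight channels and does not transform back.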

why frequency neural network is important

Frequency neural networks are important because they allow for the analysis and processing of data in the frequency domain. By using a neural network that is specifically designed to operate in the frequency domain, it is possible to capture and model complex patterns and relationships within the data that may not be apparent in the time domain. This can result in improved performance and accuracy of the model's predictions, particularly when dealing with data that exhibits non-linear or non-stationary patterns. Additionally, frequency neural networks can be used to effectively handle seasonality and trends within the data, which is important for making accurate forecasts.
