Comments (5)
Please refer to the comments on Issue #4.
The dimension shuffle layer reduces a univariate time series problem to a multivariate time series problem with a single time step. Doing so reduces the capacity of the LSTM block, so on its own it is not a strong classifier. It works in conjunction with the FCN block, however, which is the primary feature extractor.
Our motivation for doing so was multifold:
- a regular LSTM severely overfits the simple classification problems of the UCR datasets and achieves much lower accuracy than the SOTA;
- an LSTM with dimension shuffle alone severely underfits the task due to its reduced capacity;
- the FCN alone performs well, but not as well as the concatenation of the FCN and LSTM branches;
- we needed to process sequential information quickly without losing all of the sequential semantics of the data.
The dimension-shuffled LSTM trains quickly because it sees only a single time step (for a multivariate input with M variables, the shuffled input instead has M time steps), and it augments the performance of the CNN.
As your work requires a multivariate input (2 variables), I suggest referring to our follow-up work - https://arxiv.org/abs/1801.04503 - which discusses the extension of this model to multivariate time series classification. The model architecture and training scripts are available at: https://github.com/titu1994/MLSTM-FCN
As to why we chose not to use a bidirectional LSTM or a stack of LSTMs: they simply overfit the simple UCR datasets, and the additional capacity reduces the overall performance of the LSTM-FCN model.
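To make the dimension shuffle concrete, here is a minimal NumPy sketch (the function name and shapes are illustrative, not taken from the repository): it swaps the time and channel axes, so a univariate series of shape (batch, T, 1) becomes (batch, 1, T) and a downstream recurrent layer sees one time step whose feature vector is the entire series.

```python
import numpy as np

def dimension_shuffle(x):
    """Swap the time and channel axes of a (batch, time, channels) tensor.

    A univariate series (batch, T, 1) becomes (batch, 1, T): one time
    step whose feature vector is the whole series. A multivariate series
    (batch, T, M) becomes (batch, M, T): M time steps of length T.
    """
    return np.transpose(x, (0, 2, 1))

# Univariate example: 8 series of length 128.
univariate = np.random.randn(8, 128, 1)
print(dimension_shuffle(univariate).shape)  # (8, 1, 128)

# Multivariate example: 8 series, 128 time steps, 3 variables.
multivariate = np.random.randn(8, 128, 3)
print(dimension_shuffle(multivariate).shape)  # (8, 3, 128)
```

In Keras this corresponds to a `Permute((2, 1))` layer placed before the LSTM branch.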
from lstm-fcn.
@titu1994 thanks a lot for your comprehensive answer, highly appreciated!
I started to question your approach because, in my case, the dimension shuffle had a negative effect compared to a non-shuffled version. But of course this may be because my problem is more complicated than the UCR problems. In that context (the model is already too complex), BiLSTM/stacking indeed makes no sense from your perspective, though I found it to improve performance.
Again, thanks for your time and work!
I have a follow-up question: what's the difference between
(1) dimension shuffle + LSTM with 1 time step, and
(2) simply feeding the whole time series to fully connected layers with tanh activation, where the input size equals the number of time steps of the input series?
We had similar questions from others as well, which is why we performed an extensive ablation study which can be found here - Insights into LSTM Fully Convolutional Networks for Time Series Classification.
In it, we replaced the dimension shuffled LSTM with dimension shuffled GRU, basic RNN and a fully connected layer with sigmoid activation function (which is similar to your (2), but with sigmoid instead of tanh).
We find that LSTM with dimension shuffle beats the rest in a large majority of cases. In addition, we find that the simple fully connected layer with sigmoid activation performs closer to the LSTM than all the other RNNs.
We used sigmoid because three of the four LSTM activations are sigmoids, and we believe the LSTM's complex gating is what boosts performance compared to a single fully connected layer with sigmoid/tanh activation.
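The comparison above can be sketched with the standard LSTM equations. Below is a minimal NumPy illustration of a single-time-step LSTM next to the fully connected sigmoid baseline from the ablation; the weight layout `[i, f, c, o]`, the variable names, and the sizes are all my own illustrative assumptions, not the repository's code. With a zero initial state, both are affine maps of the input followed by nonlinearities; the difference is the LSTM's multiplicative gating.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_single_step(x, W, U, b, h0=None, c0=None):
    """One LSTM step on input x of shape (features,).

    Weights are stacked as [input gate, forget gate, candidate, output gate].
    """
    n = b.shape[0] // 4
    h_prev = np.zeros(n) if h0 is None else h0
    c_prev = np.zeros(n) if c0 is None else c0
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate (sigmoid)
    f = sigmoid(z[n:2 * n])      # forget gate (sigmoid)
    g = np.tanh(z[2 * n:3 * n])  # candidate cell state (tanh)
    o = sigmoid(z[3 * n:4 * n])  # output gate (sigmoid)
    c = f * c_prev + i * g       # gated cell update
    return o * np.tanh(c)        # hidden state

def dense_sigmoid(x, W, b):
    """The ablation baseline: a single fully connected layer with sigmoid."""
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
T, units = 128, 8                  # series length, hidden units (illustrative)
x = rng.standard_normal(T)         # dimension-shuffled series: one step, T features
W = rng.standard_normal((4 * units, T)) * 0.1
U = rng.standard_normal((4 * units, units)) * 0.1
b = np.zeros(4 * units)

h = lstm_single_step(x, W, U, b)   # gated output
d = dense_sigmoid(x, W[:units], b[:units])  # ungated baseline
print(h.shape, d.shape)  # (8,) (8,)
```

Both produce a fixed-size vector from the whole series in one pass, which is why the dense baseline comes close; the three sigmoid gates are the extra machinery the LSTM adds.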
@titu1994 I am actually working with the MALSTM-FCN architecture for a classification task and I am impressed by its performance so far. What I want to ask is: do you think a sliding window would increase performance? In time series it is often necessary to find small temporal patterns; wouldn't a sliding window help find them? Have you tested it?
Thank you in advance for a quick response!