Comments (5)
Please refer to the comments on Issue #4.
The dimension shuffle layer reduces a univariate time series problem to a multivariate time series problem with a single time step. Doing so reduces the capacity of the LSTM block, so on its own it is not a strong classifier. It works in conjunction with the FCN block, however, which is the primary feature extractor.
Our motivation for doing so was multifold:
- a regular LSTM severely overfits the simple classification problems of the UCR datasets and achieves much lower accuracy than the SOTA;
- an LSTM with dimension shuffle alone severely underfits the task due to its reduced capacity;
- the FCN alone performs well, but not as well as the concatenation of the FCN and LSTM branches;
- we needed to process sequential information quickly without losing all of the sequential semantics of the data.
The dimension-shuffled LSTM trains quickly because it sees only a single time step (for a multivariate input with M variables, the shuffled input instead has M time steps), and it augments the performance of the CNN.
As your work requires a multivariate input (2 variables), I suggest referring to our follow-up work - https://arxiv.org/abs/1801.04503 - which discusses the extension of this model to multivariate time series classification. The model architecture and training scripts are available at: https://github.com/titu1994/MLSTM-FCN
As to why we chose not to use a bidirectional LSTM or a stack of LSTMs: they simply overfit the simple UCR datasets, and the additional capacity reduces the overall performance of the LSTM-FCN model.
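To make the dimension shuffle concrete, here is a minimal NumPy sketch (the function name and shapes are illustrative, not taken from the repository): it swaps the time and channel axes, so a univariate series of shape (batch, T, 1) becomes (batch, 1, T) and a downstream recurrent layer sees one time step whose feature vector is the entire series.

```python
import numpy as np

def dimension_shuffle(x):
    """Swap the time and channel axes of a (batch, time, channels) tensor.

    A univariate series (batch, T, 1) becomes (batch, 1, T): one time
    step whose feature vector is the whole series. A multivariate series
    (batch, T, M) becomes (batch, M, T): M time steps of length T.
    """
    return np.transpose(x, (0, 2, 1))

# Univariate example: 8 series of length 128.
univariate = np.random.randn(8, 128, 1)
print(dimension_shuffle(univariate).shape)  # (8, 1, 128)

# Multivariate example: 8 series, 128 time steps, 3 variables.
multivariate = np.random.randn(8, 128, 3)
print(dimension_shuffle(multivariate).shape)  # (8, 3, 128)
```

In Keras this corresponds to a `Permute((2, 1))` layer placed before the LSTM branch.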
from lstm-fcn.
@titu1994 thanks a lot for your comprehensive answer, highly appreciated!
I started to question your approach because, in my case, the dimension shuffle had a negative effect compared to a non-shuffled version. But of course this may be because my problem is more complicated than the UCR problems. In that context (the model is already too complex), BiLSTM/stacking indeed makes no sense from your perspective, though I found it to improve performance.
Again, thanks for your time and work!
I have a follow-up question: what's the difference between
(1) dimension shuffle + LSTM with 1 time step, and
(2) simply feeding the whole time series to fully connected layers with tanh activation, where the input size equals the number of time steps of the input series?
We had similar questions from others as well, which is why we performed an extensive ablation study which can be found here - Insights into LSTM Fully Convolutional Networks for Time Series Classification.
In it, we replaced the dimension shuffled LSTM with dimension shuffled GRU, basic RNN and a fully connected layer with sigmoid activation function (which is similar to your (2), but with sigmoid instead of tanh).
We find that LSTM with dimension shuffle beats the rest in a large majority of cases. In addition, we find that the simple fully connected layer with sigmoid activation performs closer to the LSTM than all the other RNNs.
We used sigmoid because three of the four LSTM activations are sigmoids, and we believe the LSTM's complex gating is what boosts performance compared to a single fully connected layer with sigmoid/tanh activation.
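The comparison above can be sketched with the standard LSTM equations. Below is a minimal NumPy illustration of a single-time-step LSTM next to the fully connected sigmoid baseline from the ablation; the weight layout `[i, f, c, o]`, the variable names, and the sizes are all my own illustrative assumptions, not the repository's code. With a zero initial state, both are affine maps of the input followed by nonlinearities; the difference is the LSTM's multiplicative gating.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_single_step(x, W, U, b, h0=None, c0=None):
    """One LSTM step on input x of shape (features,).

    Weights are stacked as [input gate, forget gate, candidate, output gate].
    """
    n = b.shape[0] // 4
    h_prev = np.zeros(n) if h0 is None else h0
    c_prev = np.zeros(n) if c0 is None else c0
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate (sigmoid)
    f = sigmoid(z[n:2 * n])      # forget gate (sigmoid)
    g = np.tanh(z[2 * n:3 * n])  # candidate cell state (tanh)
    o = sigmoid(z[3 * n:4 * n])  # output gate (sigmoid)
    c = f * c_prev + i * g       # gated cell update
    return o * np.tanh(c)        # hidden state

def dense_sigmoid(x, W, b):
    """The ablation baseline: a single fully connected layer with sigmoid."""
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
T, units = 128, 8                  # series length, hidden units (illustrative)
x = rng.standard_normal(T)         # dimension-shuffled series: one step, T features
W = rng.standard_normal((4 * units, T)) * 0.1
U = rng.standard_normal((4 * units, units)) * 0.1
b = np.zeros(4 * units)

h = lstm_single_step(x, W, U, b)   # gated output
d = dense_sigmoid(x, W[:units], b[:units])  # ungated baseline
print(h.shape, d.shape)  # (8,) (8,)
```

Both produce a fixed-size vector from the whole series in one pass, which is why the dense baseline comes close; the three sigmoid gates are the extra machinery the LSTM adds.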
@titu1994 I am actually working with the MALSTM-FCN architecture for a classification task and I am impressed by its performance so far. What I want to ask is: do you think a sliding window would increase performance? In time series it is often necessary to find small temporal patterns; wouldn't a sliding window help find them? Have you tested it?
Thank you in advance for a quick response!