lstm-fcn's Introduction

LSTM FCN for Time Series Classification

LSTM FCN models, from the paper LSTM Fully Convolutional Networks for Time Series Classification, augment the fast classification performance of Temporal Convolutional layers with the precise classification of Long Short Term Memory Recurrent Neural Networks.

Multivariate LSTM-FCN for Time Series Classification

General LSTM-FCNs are high performance models for univariate datasets. However, on multivariate datasets, we find that their performance is not optimal if applied directly. Therefore, we introduce Multivariate LSTM-FCN (MLSTM-FCN) for such datasets.

Paper: Multivariate LSTM-FCNs for Time Series Classification
Repository: MLSTM-FCN

Ablation Study of LSTM-FCN for Time Series Classification

Over the past year, the community has raised several questions about the details of the model, such as:

  • Why did we choose to augment a Fully Convolutional Network with an LSTM?
  • What is dimension shuffle actually doing?
  • After dimension shuffle, does the LSTM simply lose all recurrent behaviour?
  • Why not replace the LSTM with another RNN such as a GRU?
  • Is there any actual improvement to be obtained from this augmentation?

We therefore perform a detailed ablation study, comprising nearly 3,627 experiments, that attempts to analyse and answer these questions and to provide a better understanding of the LSTM-FCN/ALSTM-FCN time series classification model and each of its sub-modules.

The paper, titled Insights into LSTM Fully Convolutional Networks for Time Series Classification can be read for a thorough discussion and statistical analysis of the benefit of the Dimension Shuffled LSTM to the Fully Convolutional Network.

Paper: Insights into LSTM Fully Convolutional Networks for Time Series Classification
Repository: LSTM-FCN-Ablation

Installation

Download the repository and run pip install -r requirements.txt to install the required libraries.

Keras with the Tensorflow backend has been used for the development of the models, and there is currently no support for Theano or CNTK backends. The weights have not been tested with those backends.

The data can be obtained as a zip file from here - http://www.cs.ucr.edu/~eamonn/time_series_data/

Extract it into some folder; this yields 127 different folders. Copy the util script extract_all_datasets.py into this folder and run it to obtain a single folder _data with all 127 datasets extracted. Move these files into the Data directory.

Note: The input to the Input layer of all models is pre-shuffled to the shape (Batchsize, 1, Number of timesteps), and the input is shuffled again before being fed to the CNNs (to obtain the correct shape (Batchsize, Number of timesteps, 1)). This is in contrast to the paper, where the input is of the shape (Batchsize, Number of timesteps, 1) and the shuffle operation is applied before the LSTM to obtain the input shape (Batchsize, 1, Number of timesteps). These operations are equivalent.
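To make the two conventions concrete, here is a minimal sketch of the code-side arrangement, assuming the Keras functional API and the MAX_SEQUENCE_LENGTH naming used elsewhere in this repository (the value 150 is illustrative, taken from GunPoint); this is an illustration of the shape handling, not the exact model code:

from keras.layers import Input, LSTM, Dropout, Permute, Conv1D

MAX_SEQUENCE_LENGTH = 150  # illustrative; e.g. the GunPoint series length

# Code convention: the input arrives pre-shuffled as (Batchsize, 1, timesteps)
ip = Input(shape=(1, MAX_SEQUENCE_LENGTH))

x = LSTM(8)(ip)            # the LSTM consumes the (1, timesteps) view directly
x = Dropout(0.8)(x)

y = Permute((2, 1))(ip)    # shuffle back to (timesteps, 1) for the convolutions
y = Conv1D(128, 8, padding='same', activation='relu')(y)

# Paper convention: the input is (Batchsize, timesteps, 1) and the Permute is
# placed before the LSTM instead; the resulting tensors are identical.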

Training and Evaluation

All 127 UCR datasets can be evaluated with the provided code and weight files. Refer to the weights directory for clarification.

There is now exactly one script to run all combinations of the LSTM-FCN and its attention variant, over the three cell configurations (8, 64, 128), on all 127 datasets in a loop.

  • To use the LSTM FCN model: model = generate_lstmfcn()
  • To use the ALSTM FCN model: model = generate_alstmfcn()
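When called with explicit arguments, a sketch might look like the following (the exact signature may differ; the argument order follows the parameter description in the Training section below - maximum sequence length, number of classes, and optionally the number of cells):

model = generate_lstmfcn(MAX_SEQUENCE_LENGTH, NB_CLASS, NUM_CELLS)  # e.g. NUM_CELLS = 8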

Training

Training now occurs in the innermost loop of all_datasets_training.py.

A few parameters must be set in advance:

  • Datasets: Datasets must be listed as (dataset name, id) pairs. The (name, id) pairs for all 127 datasets have been preset; the ids correspond to those inside constants.py in the utils directory.

  • Models: Models in the list must be defined as (model_name, model_function) pairs. Please note: the model_function must be a function that returns a Keras Model, not a Model instance itself. The model_function can accept 3 parameters - maximum sequence length, number of classes, and optionally the number of cells.

  • Cells: The cell configurations to be trained over. The default is [8, 64, 128], corresponding to the paper. A combined configuration is sketched below.
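Putting the three settings together, a hypothetical configuration might look like the following (the (name, id) pairs shown are taken from the preset list; the generator functions follow the description above):

DATASETS = [('Adiac', 0), ('ArrowHead', 1), ('ChlorineConcentration', 2)]  # (dataset name, id) pairs from constants.py
MODELS = [('lstmfcn', generate_lstmfcn), ('alstmfcn', generate_alstmfcn)]  # (model_name, model_function) pairs
CELLS = [8, 64, 128]  # cell configurations, as in the paper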

Once training begins, each model will be trained according to its specification, and log files will be written describing all the parameters for convenience, along with the training and testing set accuracy at the end of training.

Weight files will automatically be saved in the correct directories and can be used for later analysis.

Training Inner-loop

To train a model, uncomment the line below and execute the script. Note that '???????' will already be provided, so there is no need to replace it; it refers to the prefix of the saved weight file. Also, if weights are already provided, this operation will overwrite those weights.

train_model(model, did, dataset_name_, epochs=2000, batch_size=128, normalize_timeseries=normalize_dataset)

Evaluate Inner-loop

To evaluate the performance of the model, simply execute the script with the below line uncommented.

evaluate_model(model, did, dataset_name_, batch_size=128, normalize_timeseries=normalize_dataset)

Evaluate

There is no separate script for evaluation. To re-evaluate trained models, simply comment out the train_model function in the innermost loop.

Visualization

Due to the automatic name generation of folders and weight paths, careful selection of 3 common parameters will be required for all of the visualizations below:

  • DATASET_ID: The unique integer id inside constants.py referring to the dataset.

  • num_cells: The number of LSTM / Attention LSTM Cells.

  • model: The model function used to build the corresponding Keras Model.

Next is the selection of the dataset_name and model_name. The dataset_name must match the name of the dataset inside the all_datasets_training.py script. Similarly, the model_name must match the name of the model in MODELS inside all_datasets_training.py.
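A hypothetical selection for a single visualization run might therefore look like (values are illustrative and must match the entries in all_datasets_training.py):

DATASET_ID = 0               # id of the dataset inside constants.py (e.g. Adiac)
num_cells = 8                # number of LSTM / Attention LSTM cells
model = generate_lstmfcn     # the model function used during training
dataset_name = 'Adiac'       # must match the dataset name in the training script
model_name = 'lstmfcn'       # must match the model name in MODELS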

Filters Visualization

To visualize the output of the Convolution filters of either the LSTMFCN or the Attention LSTMFCN, utilize the visualize_filters.py script.

There are two parameters: CONV_ID, which refers to the convolution block number (and therefore ranges over [0, 2]), and FILTER_ID, whose value dictates which filter of the convolution layer is selected. Its range depends on the CONV_ID selected, ranging over [0, 127] for CONV_ID = {0, 2} and [0, 255] for CONV_ID = 1.
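For instance, to inspect a filter in the middle convolution block (an illustrative setting within the stated ranges):

CONV_ID = 1    # the middle convolution block, which has 256 filters
FILTER_ID = 4  # any value in [0, 255] is valid when CONV_ID = 1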

Context Visualization

To visualize the context vector of the Attention LSTM module, please utilize the visualize_context.py script.

To generate the context over all samples in the dataset, set LIMIT = None. Setting VISUALIZE_CLASSWISE = False is also recommended to speed up the computation. Note that for the larger datasets, generating the image may take an exorbitant amount of time, and the output may not be legible. We suggest visualizing classwise with 1 sample per class instead.
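As an illustrative configuration contrasting the two modes (parameter names as described above):

# Recommended: classwise visualization with 1 sample per class
LIMIT = 1
VISUALIZE_CLASSWISE = True

# Full dataset instead: all samples, classwise plotting disabled for speed
# LIMIT = None
# VISUALIZE_CLASSWISE = False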

Class Activation Maps

To visualize the class activation map of the final convolution layer, execute the visualize_cam.py script. The class of the input signal being visualized can be changed by setting CLASS_ID (from 0 to NumberOfClasses - 1).

Results

Results Based on Test Validation Checkpoint

Results Based on Minimum Training Loss

Critical Difference Diagram

Wilcoxon Signed Rank Test - Statistical Test

After applying a Dunn-Sidak correction, we compare the p-value table to an alpha level of 0.00465. The results show that ALSTM, LSTM, and the ensemble methods (COTE and EE) are statistically indistinguishable.
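As a worked check of that threshold (assuming the standard Dunn-Sidak formula with a family-wise alpha of 0.05 over 11 pairwise comparisons; the comparison count is our inference, chosen because it reproduces the stated alpha):

alpha_sidak = 1 - (1 - 0.05) ** (1 / 11)  # = 0.00465..., matching the level above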

Citation

@article{karim2018lstm,
  title={LSTM fully convolutional networks for time series classification},
  author={Karim, Fazle and Majumdar, Somshubra and Darabi, Houshang and Chen, Shun},
  journal={IEEE Access},
  volume={6},
  pages={1662--1669},
  year={2018},
  publisher={IEEE}
}


lstm-fcn's Issues

Test set used as validation set in callbacks

It seems like the code is using the test set as a validation set (which is not a problem in itself):

class_weight=class_weight, verbose=2, validation_data=(X_test, y_test))

However, this becomes a problem as 'monitor' is set to 'val_acc' for both model checkpointing and adjusting the learning rate:

monitor='val_acc', save_best_only=True, save_weights_only=True)

I.e., the best model (model checkpoint) is chosen based on the test set accuracy.

reduce_lr = ReduceLROnPlateau(monitor='val_acc', patience=100, mode='max',

I.e., the learning rate is tuned using the test set accuracy.

Thus, monitor has to be set to monitor='acc' or monitor='loss' to avoid any test data leakage.
See Callbacks

Only one time step?

There is no doubt that the performance of your models is very good. But I have a little question.


The input shape of RNN in keras is that

3D tensor with shape (batch_size, timesteps, input_dim).
keras RNN

The following is your code:

ip = Input(shape=(1, MAX_SEQUENCE_LENGTH))
x = LSTM(8)(ip)
x = Dropout(0.8)(x)

Is there only one timestep in your model, with "sequence length" as the input dimension? If so, can this be directly considered a fully connected layer represented by a one-timestep LSTM layer?

Severe overfitting

I don't know the cause of the problem; maybe the data? I used your model to train on my own data. My X_train size is (8800, 128, 2) (signal data), and the label is (8800, 11); it's a classification problem. I changed your code like this:

if __name__ == "__main__":
    from keras.callbacks import EarlyStopping, ModelCheckpoint
    from keras.optimizers import Adam

    model = generate_model()

    early_stopping = EarlyStopping(patience=20)
    checkpointer = ModelCheckpoint('vgg19model_best.h5', verbose=1, save_best_only=True)

    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.001),
                  metrics=['accuracy'])

    print('Train...')
    model.fit(x_train, train_label,
              batch_size=batch_size,
              nb_epoch=500,  # 'epochs' in newer Keras versions
              callbacks=[early_stopping, checkpointer],
              validation_data=(x_test, y_test))
    score, acc = model.evaluate(x_test, y_test,
                                batch_size=batch_size)

It runs directly, but as the title says, the result shows severe overfitting. Could you please give me some advice?

fine tune

Hello, I saw a note in the code of LSTM-FCN that a fine-tuning model can be added, but the input data is one-dimensional. How can an image classification model like VGG16 be used for fine-tuning?

How to operate the attention mechanism model?

Hi,

I give as input exactly the same dataset as for the simple model, which works, but the attention mechanism model doesn't work.

I give the algorithm a dataset of 336 features, and it asks me for a 336*336 input.

Can you help me, please?

LSTM-FCN Help

I'm Raoult, a new Python developer. I would like to run the code in Python, but it shows me this error; please can you help me solve it in the code?
Thank you for your assistance.

Exception has occurred: MemoryError
Unable to allocate 574. TiB for an array with shape (30734400, 5132157) and data type float32
File "C:\Test1\utils\keras_utils.py", line 133, in train_model
y_train = to_categorical(y_train, len(np.unique(y_train)))
File "C:\Test1\acvitivity_model.py", line 196, in
train_model(model, DATASET_INDEX, dataset_prefix='activity_attention', epochs=1000, batch_size=128)

Attention on only one time step?

Since there will only be one timestep, I am curious about what the attention mechanism will do.
As I understand it, attention is performed over the output of each timestep, but since there is only one timestep, the difference between attention and a normal LSTM seems strange.

Am I missing anything here? Thank you!

Dimension Shuffle

Hi,

thanks a lot for your proposed architecture and work!

I fully get your idea and want to refer to and include your architecture in my thesis.

The only thing I do not get: why do you use this dimension shuffle layer before feeding data into the LSTM? Doesn't that cause the LSTM to fully neglect the temporal information from the time series?

Another thing that came to my mind: did you try or think about using a BiLSTM, or more than one LSTM layer? Is there a special reason why you haven't tried those?

I am working with time series from 2 variables with a length of 1200 time ticks each (sensor data).

Glad for any response, folks!

Issue when trying to train model on new dataset

Hello, I am trying to train the model on a new dataset. When I do so, however, I run into this error: ValueError: Error when checking target: expected dense_1 to have shape (60684,) but got array with shape (119278,). How do I fix this error? Thank you very much!

what is not is_timeseries

Hi, thank you very much for the code you provided. I want to test my data with this model. I found that when loading the data there is a check for whether it is time series data. So I want to ask: what would count as non-time-series data?

Cannot reproduce the test accuracy in your paper

Hello @titu1994,
I have tried several times to approach the SOTA results reported in the paper, but failed; I cannot even match the strong baseline score. My GPU is a GTX 1080 under Ubuntu 18.04 LTS.

An example of my results (hyperparameter_search.py script):
dataset_id,dataset_name,dataset_name_,test_accuracy
0,Adiac,lstmfcn_128_cells_weights/Adiac,0.749361
2,ChlorineConcentration,lstmfcn_128_cells_weights/ChlorineConcentration,0.815625
dataset_id,dataset_name,dataset_name_,test_accuracy
1,ArrowHead,lstmfcn_8_cells_weights/ArrowHead,0.811429
Can you give me some suggestions?

Work with Tensorflow 2.2.0 and built-in Keras: Exception has occurred: ValueError

Hi Titu!
First of all, many thanks for sharing this RNN for TSC. It's very useful!

I'm using TensorFlow 2.2.0 and the built-in Keras, so I slightly changed the code (for example, using 'from tensorflow.keras.models import Model' instead of 'from keras.models import Model').
When I try to run all_datasets_training.py, I get the following error in keras_utils.py (on both Windows and Linux):

Exception has occurred: ValueError
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

It occurs in the train_model function at this line:
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, callbacks=callback_list,
class_weight=class_weight, verbose=2, validation_data=(X_test, y_test))

The shape of the input data (GunPoint):
X_train: [50, 1, 150]
X_test: [150, 1, 150]

What am I doing wrong?

Question about the choice of dropout layer rate.

Hi,
I have a small question. In the LSTM model, you set the dropout rate to 0.8. Isn't that too large for the LSTM to fit the data? I always set the rate to 0.1~0.2, so I would like to know the reason for this choice. Thanks a lot.

TypeError: ('Keyword argument not understood:', 'use_chrono_initialization')

First of all: Thanks a lot for providing this code!

Unfortunately, I get an error whenever I try to test the trained models that were written to disk. You can find the traceback below:

Traceback (most recent call last):
  File "pathToProject\Main.py", line 87, in <module>
    ANNs.evaluateSavedModels(ALSTM_FCNOutputFolderAllData,modelTypeString,annotatedDataDictionaryWithoutNoiseAndWithoutExcludedDataTruncated,resultsFolderAllData,scalerID,truncationIndizes,annotationIndizes)
  File "pathToProject\ANNs.py", line 260, in evaluateSavedModels
    [predictions,currentModelSummary] = loadAndEvaluateGivenModel(completePathToModelFile,testXScaled)
  File "pathToProject\ANNs.py", line 75, in loadAndEvaluateGivenModel
    model = load_model(pathToModel,custom_objects={'AttentionLSTM': AttentionLSTM})
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\engine\saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\engine\saving.py", line 584, in load_model
    model = _deserialize_model(h5dict, custom_objects, compile)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\engine\saving.py", line 274, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\engine\saving.py", line 627, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\layers\__init__.py", line 168, in deserialize
    printable_module_name='layer')
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\utils\generic_utils.py", line 147, in deserialize_keras_object
    list(custom_objects.items())))
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\engine\network.py", line 1056, in from_config
    process_layer(layer_data)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\engine\network.py", line 1042, in process_layer
    custom_objects=custom_objects)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\layers\__init__.py", line 168, in deserialize
    printable_module_name='layer')
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\utils\generic_utils.py", line 149, in deserialize_keras_object
    return cls.from_config(config['config'])
  File "pathToLSTMCode/LSTM-FCN/utils\layer_utils.py", line 730, in from_config
    return cls(**config)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "pathToLSTMCode/LSTM-FCN/utils\layer_utils.py", line 596, in __init__
    **kwargs)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\layers\recurrent.py", line 410, in __init__
    super(RNN, self).__init__(**kwargs)
  File "pathToAnaconda\Anaconda\lib\site-packages\keras\engine\base_layer.py", line 147, in __init__
    raise TypeError('Keyword argument not understood:', kwarg)
TypeError: ('Keyword argument not understood:', 'use_chrono_initialization')

I'm using keras (Version: 2.3.1) and TensorFlow (Version: 2.0.0).

the size of the outputs of LSTM and FCN

We need to concatenate the outputs of the LSTM and the FCN, but their sizes are different, so I am not able to concatenate them. What should I change to fix this?

How to make a prediction

Hello, I have trained on my own data; how do I make a prediction on new unknown data (without labels)?
Code like "result = pd.read_csv("unknown108.csv", header=None); model.predict(unknown)" does not work.

Conv input permuted in code, but LSTM in paper?

In the paper you write

the LSTM block (...) receives the input as a multivariate time series with a single time step. This is accomplished by the dimension shuffle layer, which transposes the temporal dimension of the time series.

But in the code you always permute the y section, i.e. the convolutional part, and never the inputs to the LSTM:

def generate_model():
    ip = Input(shape=(MAX_NB_VARIABLES, MAX_TIMESTEPS))

    x = Masking()(ip)
    x = LSTM(8)(x)
    x = Dropout(0.8)(x)

    y = Permute((2, 1))(ip)
    y = Conv1D(...

    x = concatenate([x, y])

    out = Dense(NB_CLASS, activation='softmax')(x)

Did I misunderstand?

Error output when I run all_datasets_training.py

Traceback (most recent call last):
  File "/home/lidar/projects/LSTM-FCN/all_datasets_training.py", line 268, in <module>
    train_model(model, did, dataset_name_, epochs=2000, batch_size=128,
  File "/home/lidar/projects/LSTM-FCN/utils/keras_utils.py", line 63, in train_model
    X_train, y_train, X_test, y_test, is_timeseries = load_dataset_at(dataset_id,
  File "/home/lidar/projects/LSTM-FCN/utils/generic_utils.py", line 65, in load_dataset_at
    y_train = (y_train - y_train.min()) / (y_train.max() - y_train.min()) * (nb_classes - 1)
TypeError: unsupported operand type(s) for -: 'str' and 'str'

I get this error after running all_datasets_training.py.
Can you help me fix it?

OSError: Unable to open file (Unable to open file:

OSError: Unable to open file (Unable to open file: name = './weights/bird_chicken_weights.h5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0)

Hello, there is no %s_weights.h5 file in the weights folder; is that why I get this error? I am new to deep learning, so I do not really understand.

Hello, I plan to change LSTM into BiLSTM in this network. How can I solve such a mistake in the process?

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 304 values, but the requested shape requires a multiple of 48 [[Node: attention_bi_lstm_1/while/Reshape_1 = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](attention_bi_lstm_1/while/BiasAdd, attention_bi_lstm_1/while/stack_1)]]

AttentionLSTM Exception

Hi, thanks for your code. I ran into a problem when running it. The error message is below:

Using TensorFlow backend.
Traceback (most recent call last):
  File "E:/pycharm-program/thesis/scheme1/function2/LSTM-FCN-1.0/adiac_model.py", line 7, in <module>
    from utils.layer_utils import AttentionLSTM
  File "E:\pycharm-program\thesis\scheme1\function2\LSTM-FCN-1.0\utils\layer_utils.py", line 62, in <module>
    class AttentionLSTM(Recurrent):
TypeError: module.__init__() takes at most 2 arguments (3 given)

I want to know why this occurs. Looking forward to your reply.

question about the input with one timestep to the LSTM layer

Excuse me, sir. There is no doubt that the performance of your models is very good, but I have the same question as @zsh965866221.

As we all know, LSTMs are usually used to extract temporal features, and the input shape of an LSTM in Keras is:

3D tensor with shape (batch_size, timesteps, input_dim).

However, you set the timestep to one in your paper and code, for example:

ip = Input(shape=(1, MAX_SEQUENCE_LENGTH))
x = LSTM(8)(ip)
x = Dropout(0.8)(x)

which means the shape of the LSTM's input is (batch_size, 1, MAX_SEQUENCE_LENGTH).

I just want to learn from you whether it can extract temporal features if there is just one timestep.
