Thank you for sharing your code. Using DeepConvLSTM.ipynb, I retrained all models for all tasks myself instead of using your pre-trained models (by simply uncommenting the `#model.fit()` lines, commenting out the `load_model()` calls, and replacing `y_prob = new_model.predict(X_test_new)` with `y_prob = model.predict(X_test_new)`).
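For clarity, this is roughly what the modified cells look like. Variable names (`model`, `new_model`, `X_train`, `y_train`, `X_test_new`) and the checkpoint filename are taken from the notebook; the training arguments shown are placeholders, not the actual hyperparameters.

```python
# Original (pre-trained) path, now commented out:
# new_model = load_model('task_b_without_null_working.h5')
# y_prob = new_model.predict(X_test_new)

# Retraining path, uncommented and adjusted:
# model.fit(X_train, y_train, ...)  # placeholder args, not actual hyperparameters
# y_prob = model.predict(X_test_new)
```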
For TASK B: Applying Model on Gestures (with NULL class), I got an f1-score of 0.874, which is close to your reported result (0.888). However, for TASK B: Applying Model on Gesture dataset (without NULL class), I got an f1-score of only 0.682, which is much lower than your model's result (0.842).
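For reference, I computed the scores the same way in both runs. A minimal sketch of the evaluation, assuming scikit-learn's `f1_score` with weighted averaging (the notebook may use a different averaging mode), with toy labels standing in for the real predictions:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy ground-truth and predicted class labels (placeholders, not real data).
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# Weighted f1 averages per-class f1 weighted by class support.
score = f1_score(y_true, y_pred, average='weighted')
print(round(score, 3))  # → 0.822
```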
Do you have any idea what the problem could be, or why my retrained model performs so much worse than your pre-trained model (task_b_without_null_working.h5)? Could there be a difference in the code (perhaps the code used to train task_b_without_null_working.h5 differs from what is in the notebook)?
The part of the notebook that produced the 0.682 f1-score: https://colab.research.google.com/drive/1tvwa5bboTTXQ8MgH7fcpa-b27KkLwQTS?usp=sharing
Thank you.