
tensorfusionnetworks's Introduction

IMPORTANT NOTICE

The CMU-MultimodalSDK on which this repo depends has drastically changed its API since this code was written, so the code in this repo can no longer be run off the shelf. However, the model code itself may still serve as a reference.

Tensor Fusion Networks

This is a PyTorch implementation of:

Zadeh, Amir, et al. "Tensor fusion network for multimodal sentiment analysis." EMNLP 2017 Oral.

It requires PyTorch and the CMU Multimodal Data SDK (https://github.com/A2Zadeh/CMU-MultimodalDataSDK) to function properly. The training data (the CMU-MOSI dataset) will be downloaded automatically the first time you run the script.

The model is defined in model.py, and the training script is train.py. Here's a list of command-line arguments for train.py (a sketch of the corresponding argparse setup follows the list):

--dataset: default is 'MOSI'; other datasets are not currently supported, so this option can be ignored

--epochs: max number of epochs, default is 50

--batch_size: batch size, default is 32

--patience: early-stopping patience, similar to that in Keras, default is 20

--cuda: whether or not to use the GPU, default is False

--model_path: directory for storing trained models, default is 'models'

--max_len: max sequence length when preprocessing data, default is 20
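
For reference, the options above correspond to an argparse setup roughly like the following (a minimal sketch, not the exact code in train.py; in particular, how --cuda is parsed is an assumption):

    import argparse

    parser = argparse.ArgumentParser(description='Train a Tensor Fusion Network on CMU-MOSI')
    parser.add_argument('--dataset', type=str, default='MOSI')      # only MOSI is supported
    parser.add_argument('--epochs', type=int, default=50)           # max number of epochs
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--patience', type=int, default=20)         # early-stopping patience
    parser.add_argument('--cuda', action='store_true')              # use the GPU (assumed flag style)
    parser.add_argument('--model_path', type=str, default='models')
    parser.add_argument('--max_len', type=int, default=20)          # max sequence length
    options = vars(parser.parse_args())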

In a nutshell, you can train the model using the following command:

python train.py --epochs 100 --patience 10

The script starts with a randomly selected set of hyper-parameters. If you want to tune them, edit them directly in the script.

tensorfusionnetworks's People

Contributors

justin1904


tensorfusionnetworks's Issues

calculating features

Hi,

Let's say I'd like to compute everything for a given video (a test sample) from scratch. Can you post the code that computes the features for all three modalities: visual (OpenFace?), audio (pyAudioAnalysis / COVAREP?), and language? And how do you extract utterances from the audio?

TFN results from paper

Hi, in Table 2 of the TFN (2017) paper, there are performance results for TFN_trimodal and for TFN, and they differ. What's the difference between the two? Since the data is trimodal, shouldn't they be the same?

How can I get the contribution of each modality?

Dear Justin,

I saw the article at https://towardsdatascience.com/multimodal-deep-learning-ce7d1d994f4, but I did not find the code that computes the contribution of each modality. I also saw some descriptions of the loss function at https://purvanshi.github.io/documents/Multimodalppt.pdf, but I could not find where it is implemented in the code.

Thank you very much and I am looking forward to your reply.

Best wishes
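
(A side note for readers with the same question: a common way to estimate per-modality contribution, which is not implemented in this repo, is a test-time ablation: zero out one modality's input and measure the drop in accuracy. A hypothetical sketch, assuming a model that takes (audio, video, text) tensors and an evaluate(predictions, labels) scoring function; neither name comes from this repo:)

    import torch

    # Hypothetical ablation helper; `model` and `evaluate` are assumptions,
    # not names from this repo.
    def modality_contribution(model, evaluate, audio, video, text, labels):
        base = evaluate(model(audio, video, text), labels)
        ablated = {
            'audio': (torch.zeros_like(audio), video, text),
            'video': (audio, torch.zeros_like(video), text),
            'text':  (audio, video, torch.zeros_like(text)),
        }
        # A larger score drop suggests a larger contribution from that modality.
        return {name: base - evaluate(model(a, v, t), labels)
                for name, (a, v, t) in ablated.items()}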

How to apply CMU Multimodal Data SDK into this project

First: in utils.py, I have no idea what the "mmdata" variable is.

Second: after I run train.py, I get the following error:

    Temp location for saving model: models\tfn.pt
    Currently using MOSI dataset.
    Traceback (most recent call last):
      File "D:/A PROJECT/TensorFusionNetworks-master/train.py", line 213, in <module>
        main(PARAMS)
      File "D:/A PROJECT/TensorFusionNetworks-master/train.py", line 91, in main
        train_set, valid_set, test_set, input_dims = preprocess(options)
      File "D:/A PROJECT/TensorFusionNetworks-master/train.py", line 41, in preprocess
        mosi = MultimodalDataset(dataset, max_len=max_len)
      File "D:\A PROJECT\TensorFusionNetworks-master\utils.py", line 56, in __init__
        self.dataloader = mmdata.__dict__[dataset]()
    KeyError: 'MOSI'

It seems it can't find the "MOSI" dataset.

I have downloaded the CMU-Multimodal SDK into this project, but how do I use it? Should I set mmdata like "from mmsdk import mmdatasdk as mmdata"?

I'm looking forward to your reply.
Thanks and Best Regards
Jia Li
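
(A note for readers hitting the same error: as the notice at the top says, the SDK has since been rewritten and no longer exposes the mmdata module this code expects, so the import above will not make utils.py work. Under the current mmdatasdk, CMU-MOSI is fetched roughly like this; a sketch against the new API, not a drop-in fix:)

    from mmsdk import mmdatasdk

    # Download the high-level CMU-MOSI features into ./cmumosi/ using the
    # recipe dictionaries shipped with the rewritten SDK.
    cmumosi = mmdatasdk.mmdataset(mmdatasdk.cmu_mosi.highlevel, 'cmumosi/')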

Unable to replicate paper results

Hey,
I am unable to replicate the results (I get a binary test accuracy of 77.1). I made the following hyperparameter changes in your code:

  1. Changed the output of the fusion layer to 128 instead of 32
  2. Set dropout to 0.15, as stated in the paper
  3. Added an L2 regularizer via a weight decay factor of 0.01
  4. Changed the number of output nodes of the text modality to 64 instead of 32

I ran the model with a batch_size of 128, 1000 epochs, and a patience of 100.
Am I missing something here?

training process

I ran the demo according to your instructions on Ubuntu:
python train.py --epochs 100 --patience 10

and the results are:
    Epoch 21 complete! Average Training loss: 0.358191079584
    Validation loss is: 1.0758594363
    Validation binary accuracy is: 0.685589519651
    MAE on test set is 1.08729708925
    Binary accuracy on test set is 0.69970845481
    Precision on test set is 0.675958188153
    Recall on test set is 0.631921824104
    F1 score on test set is 0.653198653199
    Seven-class accuracy on test set is 0.316326530612
    Correlation w.r.t human evaluation on test set is 0.544894550488

So my question is: why does the process end after 21 epochs (not 100)?
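
(This is the expected behavior of --patience 10: training stops once the validation loss has not improved for 10 consecutive epochs, Keras-style. A minimal sketch of the mechanism, not the exact code in train.py; train_one_epoch and validate are hypothetical helpers:)

    def train_with_patience(model, epochs, patience, train_one_epoch, validate):
        # Keras-style early stopping: stop once validation loss has not
        # improved for `patience` consecutive epochs.
        best_valid = float('inf')
        curr_patience = patience
        for epoch in range(epochs):
            train_one_epoch(model)
            valid_loss = validate(model)
            if valid_loss < best_valid:
                best_valid, curr_patience = valid_loss, patience  # reset on improvement
            else:
                curr_patience -= 1
                if curr_patience <= 0:
                    print('Early stopping at epoch %d' % (epoch + 1))
                    break

With --epochs 100 --patience 10, a run whose validation loss last improves around epoch 11 stops at epoch 21, which matches the log above.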

F-score

Hi,

I am curious why the F-score is returned with the average parameter set to 'binary'. Doesn't that report the F-score only for the 1 (positive-sentiment) class? Wouldn't it be fairer to use a macro F-score, where the individual negative- and positive-class scores are averaged? In the 'binary' setup, the performance on the 0 class is effectively ignored.
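
(For concreteness, in scikit-learn terms: average='binary' scores only the positive class, while average='macro' takes the unweighted mean of the per-class scores. A quick illustration where a degenerate all-positive classifier looks fine under 'binary' but not under 'macro':)

    from sklearn.metrics import f1_score

    y_true = [0, 0, 1, 1, 1]
    y_pred = [1, 1, 1, 1, 1]  # predicts positive for everything
    print(f1_score(y_true, y_pred, average='binary'))  # 0.75: positive class only
    print(f1_score(y_true, y_pred, average='macro'))   # 0.375: class 0 gets F1 = 0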

Old api

Hi Justin,
It seems that the mmsdk API has been rewritten, but the code here still uses the old API, so it cannot be used now. Do you have plans to update the code?

Thank you!

Best Wishes,
Lan-qing

KeyError: 'MOSI'

    Traceback (most recent call last):
      File "train.py", line 202, in <module>
        main(PARAMS)
      File "train.py", line 80, in main
        train_set, valid_set, test_set, input_dims = preprocess(options)
      File "train.py", line 30, in preprocess
        mosi = MultimodalDataset(dataset, max_len=max_len)
      File "E:\My Collection\3rd year\DL\TensorFusionNetworks-master\utils.py", line 53, in __init__
        self.dataloader = mmdata.__dict__[dataset]()
    KeyError: 'MOSI'

model architecture

Hi Justin,
many thanks for the contribution. I read the paper and your implementation; could you answer the following questions:

  1. For each subnet, why do you begin the first layer with dropout?
    y_1 = F.relu(self.linear_1(dropped))

  2. According to Figure 3 in the paper, there are two 128-unit ReLU layers, so why did you implement a plain linear layer:
    y_1 = self.linear_1(h)
    without wrapping it in a ReLU in the forward function, as you did for the subnet layers, for example:
    y_2 = F.relu(self.linear_2(y_1))

  3. What is the output of the network? According to the paper, the binary result achieved the best accuracy, so why do you return a scalar value between -3 and 3?

  4. According to the paper, the fusion tensor is 3-D; what are you doing here to get that?

fusion_tensor = fusion_tensor.view(-1, (self.audio_hidden + 1) * (self.video_hidden + 1), 1)
fusion_tensor = torch.bmm(fusion_tensor, _text_h.unsqueeze(1)).view(batch_size, -1)
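
(For what it's worth, those two lines implement the 3-D outer product in flattened form: the audio and video vectors, each with a constant 1 appended, are combined by a batched outer product, reshaped into one column per sample, and then multiplied against the text row vector, yielding the full trimodal tensor. A standalone sketch with illustrative hidden sizes:)

    import torch

    # Sketch of the trimodal fusion z = [h_a;1] ⊗ [h_v;1] ⊗ [h_t;1],
    # built as two successive outer products and flattened for the linear layers.
    batch_size = 4
    h_a = torch.randn(batch_size, 5)                  # audio hidden (sizes illustrative)
    h_v = torch.randn(batch_size, 6)                  # video hidden
    h_t = torch.randn(batch_size, 7)                  # text hidden
    ones = torch.ones(batch_size, 1)
    _a = torch.cat([h_a, ones], dim=1)                # (B, 6)
    _v = torch.cat([h_v, ones], dim=1)                # (B, 7)
    _t = torch.cat([h_t, ones], dim=1)                # (B, 8)
    av = torch.bmm(_a.unsqueeze(2), _v.unsqueeze(1))  # (B, 6, 7): audio-video outer product
    av = av.view(-1, 6 * 7, 1)                        # one column vector per sample
    fusion = torch.bmm(av, _t.unsqueeze(1))           # (B, 42, 8): the 3-D tensor, per sample
    fusion = fusion.view(batch_size, -1)              # (B, 336), input to post-fusion layers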

Weights of audio sub network returning NAN

The layer-1 weights of the audio subnetwork start producing NaN values, specifically the output of
self.linear_1(dropped)
I tried adjusting the learning rate, but it doesn't help.
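
(Not an official fix, but two mitigations that often help with NaN blow-ups in a small subnet like this are normalizing the inputs and clipping gradients. A self-contained sketch with illustrative sizes; the layer and optimizer below are stand-ins, not code from this repo:)

    import torch
    import torch.nn as nn

    # Stand-in for the audio subnet's first layer and its optimizer (sizes illustrative).
    layer = nn.Linear(74, 32)
    optimizer = torch.optim.Adam(layer.parameters(), lr=1e-4)
    audio = torch.randn(32, 74) * 100          # large-magnitude features invite NaNs

    # 1. z-normalize inputs so the first layer sees well-scaled values
    audio = (audio - audio.mean(dim=0)) / (audio.std(dim=0) + 1e-6)

    # 2. clip gradients before each optimizer step so weights cannot explode to NaN
    loss = layer(audio).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(layer.parameters(), max_norm=1.0)
    optimizer.step()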
