
tensorfusionnetworks's Introduction

IMPORTANT NOTICE

The CMU-MultimodalSDK on which this repo depends has drastically changed its API since this code was written, so the code in this repo can no longer be run off the shelf. However, the model code itself may still serve as a reference.

Tensor Fusion Networks

This is a PyTorch implementation of:

Zadeh, Amir, et al. "Tensor fusion network for multimodal sentiment analysis." EMNLP 2017 Oral.

It requires PyTorch and the CMU Multimodal Data SDK (https://github.com/A2Zadeh/CMU-MultimodalDataSDK) to function properly. The training data (the CMU-MOSI dataset) will be downloaded automatically the first time you run the script.

The model is defined in model.py, and the training script is train.py. Here's a list of command-line arguments for train.py (a sketch of the corresponding argparse setup follows the list):

--dataset: default is 'MOSI'; other datasets are not currently supported, so this option can be ignored

--epochs: max number of epochs, default is 50

--batch_size: batch size, default is 32

--patience: early-stopping patience, similar to that in Keras, default is 20

--cuda: whether or not to use the GPU, default is False

--model_path: directory for storing trained models, default is 'models'

--max_len: max sequence length when preprocessing data, default is 20
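
For reference, the options above correspond to an argparse setup roughly like the following (a minimal sketch, not the exact code in train.py; in particular, how --cuda is parsed is an assumption):

    import argparse

    parser = argparse.ArgumentParser(description='Train a Tensor Fusion Network on CMU-MOSI')
    parser.add_argument('--dataset', type=str, default='MOSI')      # only MOSI is supported
    parser.add_argument('--epochs', type=int, default=50)           # max number of epochs
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--patience', type=int, default=20)         # early-stopping patience
    parser.add_argument('--cuda', action='store_true')              # use the GPU (assumed flag style)
    parser.add_argument('--model_path', type=str, default='models')
    parser.add_argument('--max_len', type=int, default=20)          # max sequence length
    options = vars(parser.parse_args())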

In a nutshell, you can train the model using the following command:

python train.py --epochs 100 --patience 10

The script starts with a randomly selected set of hyper-parameters. If you want to tune them, edit them directly in the script.

tensorfusionnetworks's People

Contributors

justin1904


tensorfusionnetworks's Issues

calculating features

Hi,

Let's say I'd like to compute everything for a given video (a test sample) from scratch. Can you post the code that computes the features for all three modalities: visual (OpenFace?), audio (pyAudioAnalysis / COVAREP?), and language? And how do you extract utterances from the audio?

TFN results from paper

Hi, in Table 2 of the TFN (2017) paper, there are performance results for TFN_trimodal and for TFN, and they differ. What's the difference between the two? Since the data is trimodal, shouldn't they be the same?

How can I get the contribution of each modality?

Dear Justin,

I saw the article at https://towardsdatascience.com/multimodal-deep-learning-ce7d1d994f4, but I did not find the code that computes the contribution of each modality. I also saw some descriptions of the loss function at https://purvanshi.github.io/documents/Multimodalppt.pdf, but I could not find where it is implemented in the code.

Thank you very much and I am looking forward to your reply.

Best wishes
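
(A side note for readers with the same question: a common way to estimate per-modality contribution, which is not implemented in this repo, is a test-time ablation: zero out one modality's input and measure the drop in accuracy. A hypothetical sketch, assuming a model that takes (audio, video, text) tensors and an evaluate(predictions, labels) scoring function; neither name comes from this repo:)

    import torch

    # Hypothetical ablation helper; `model` and `evaluate` are assumptions,
    # not names from this repo.
    def modality_contribution(model, evaluate, audio, video, text, labels):
        base = evaluate(model(audio, video, text), labels)
        ablated = {
            'audio': (torch.zeros_like(audio), video, text),
            'video': (audio, torch.zeros_like(video), text),
            'text':  (audio, video, torch.zeros_like(text)),
        }
        # A larger score drop suggests a larger contribution from that modality.
        return {name: base - evaluate(model(a, v, t), labels)
                for name, (a, v, t) in ablated.items()}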

How to apply CMU Multimodal Data SDK into this project

First: in utils.py, I have no idea what the "mmdata" variable is.

Second: after I run train.py, I get the following error:

    Temp location for saving model: models\tfn.pt
    Currently using MOSI dataset.
    Traceback (most recent call last):
      File "D:/A PROJECT/TensorFusionNetworks-master/train.py", line 213, in <module>
        main(PARAMS)
      File "D:/A PROJECT/TensorFusionNetworks-master/train.py", line 91, in main
        train_set, valid_set, test_set, input_dims = preprocess(options)
      File "D:/A PROJECT/TensorFusionNetworks-master/train.py", line 41, in preprocess
        mosi = MultimodalDataset(dataset, max_len=max_len)
      File "D:\A PROJECT\TensorFusionNetworks-master\utils.py", line 56, in __init__
        self.dataloader = mmdata.__dict__[dataset]()
    KeyError: 'MOSI'

It seems it can't find the "MOSI" dataset.

I have downloaded the CMU-Multimodal SDK into this project, but how do I use it? Should I set mmdata like "from mmsdk import mmdatasdk as mmdata"?

I'm looking forward to your reply.
Thanks and Best Regards
Jia Li
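
(A note for readers hitting the same error: as the notice at the top says, the SDK has since been rewritten and no longer exposes the mmdata module this code expects, so the import above will not make utils.py work. Under the current mmdatasdk, CMU-MOSI is fetched roughly like this; a sketch against the new API, not a drop-in fix:)

    from mmsdk import mmdatasdk

    # Download the high-level CMU-MOSI features into ./cmumosi/ using the
    # recipe dictionaries shipped with the rewritten SDK.
    cmumosi = mmdatasdk.mmdataset(mmdatasdk.cmu_mosi.highlevel, 'cmumosi/')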

Unable to replicate paper results

Hey,
I am unable to replicate the results (I get a binary test accuracy of 77.1). I made the following hyperparameter changes in your code:

  1. Changed the output of the fusion layer to 128 instead of 32
  2. Set dropout to 0.15, as stated in the paper
  3. Added an L2 regularizer via a weight decay factor of 0.01
  4. Changed the number of output nodes of the text modality to 64 instead of 32

I ran the model with a batch_size of 128, 1000 epochs, and a patience of 100.
Am I missing something here?

training process

I ran the demo according to your instructions on Ubuntu:
python train.py --epochs 100 --patience 10

and the results are:
    Epoch 21 complete! Average Training loss: 0.358191079584
    Validation loss is: 1.0758594363
    Validation binary accuracy is: 0.685589519651
    MAE on test set is 1.08729708925
    Binary accuracy on test set is 0.69970845481
    Precision on test set is 0.675958188153
    Recall on test set is 0.631921824104
    F1 score on test set is 0.653198653199
    Seven-class accuracy on test set is 0.316326530612
    Correlation w.r.t human evaluation on test set is 0.544894550488

So my question is: why does the process end after 21 epochs (not 100)?
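
(This is the expected behavior of --patience 10: training stops once the validation loss has not improved for 10 consecutive epochs, Keras-style. A minimal sketch of the mechanism, not the exact code in train.py; train_one_epoch and validate are hypothetical helpers:)

    def train_with_patience(model, epochs, patience, train_one_epoch, validate):
        # Keras-style early stopping: stop once validation loss has not
        # improved for `patience` consecutive epochs.
        best_valid = float('inf')
        curr_patience = patience
        for epoch in range(epochs):
            train_one_epoch(model)
            valid_loss = validate(model)
            if valid_loss < best_valid:
                best_valid, curr_patience = valid_loss, patience  # reset on improvement
            else:
                curr_patience -= 1
                if curr_patience <= 0:
                    print('Early stopping at epoch %d' % (epoch + 1))
                    break

With --epochs 100 --patience 10, a run whose validation loss last improves around epoch 11 stops at epoch 21, which matches the log above.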

F-score

Hi,

I am curious why the F-score is returned with the average parameter set to 'binary'. Doesn't that report the F-score only for the 1 (positive-sentiment) class? Wouldn't it be fairer to use a macro F-score, where the individual negative- and positive-class scores are averaged? In the 'binary' setup, the performance on the 0 class is effectively ignored.
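
(For concreteness, in scikit-learn terms: average='binary' scores only the positive class, while average='macro' takes the unweighted mean of the per-class scores. A quick illustration where a degenerate all-positive classifier looks fine under 'binary' but not under 'macro':)

    from sklearn.metrics import f1_score

    y_true = [0, 0, 1, 1, 1]
    y_pred = [1, 1, 1, 1, 1]  # predicts positive for everything
    print(f1_score(y_true, y_pred, average='binary'))  # 0.75: positive class only
    print(f1_score(y_true, y_pred, average='macro'))   # 0.375: class 0 gets F1 = 0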

Old api

Hi Justin,
It seems that the mmsdk API has been rewritten, but the code here still uses the old API, so it cannot be used now. Do you have plans to update the code?

Thank you!

Best Wishes,
Lan-qing

KeyError: 'MOSI'

    Traceback (most recent call last):
      File "train.py", line 202, in <module>
        main(PARAMS)
      File "train.py", line 80, in main
        train_set, valid_set, test_set, input_dims = preprocess(options)
      File "train.py", line 30, in preprocess
        mosi = MultimodalDataset(dataset, max_len=max_len)
      File "E:\My Collection\3rd year\DL\TensorFusionNetworks-master\utils.py", line 53, in __init__
        self.dataloader = mmdata.__dict__[dataset]()
    KeyError: 'MOSI'

model architecture

Hi Justin,
many thanks for the contribution. I read the paper and your implementation; could you answer the following questions:

  1. For each subnet, why do you begin the first layer with dropout?
    y_1 = F.relu(self.linear_1(dropped))

  2. According to Figure 3 in the paper, there are two 128-unit ReLU layers, so why did you implement a plain linear layer:
    y_1 = self.linear_1(h)
    without wrapping it in a ReLU in the forward function, as you did for the subnet layers, for example:
    y_2 = F.relu(self.linear_2(y_1))

  3. What is the output of the network? According to the paper, the binary result achieved the best accuracy, so why do you return a scalar value between -3 and 3?

  4. According to the paper, the fusion tensor is 3-D; what are you doing here to get that?

fusion_tensor = fusion_tensor.view(-1, (self.audio_hidden + 1) * (self.video_hidden + 1), 1)
fusion_tensor = torch.bmm(fusion_tensor, _text_h.unsqueeze(1)).view(batch_size, -1)
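
(For what it's worth, those two lines implement the 3-D outer product in flattened form: the audio and video vectors, each with a constant 1 appended, are combined by a batched outer product, reshaped into one column per sample, and then multiplied against the text row vector, yielding the full trimodal tensor. A standalone sketch with illustrative hidden sizes:)

    import torch

    # Sketch of the trimodal fusion z = [h_a;1] ⊗ [h_v;1] ⊗ [h_t;1],
    # built as two successive outer products and flattened for the linear layers.
    batch_size = 4
    h_a = torch.randn(batch_size, 5)                  # audio hidden (sizes illustrative)
    h_v = torch.randn(batch_size, 6)                  # video hidden
    h_t = torch.randn(batch_size, 7)                  # text hidden
    ones = torch.ones(batch_size, 1)
    _a = torch.cat([h_a, ones], dim=1)                # (B, 6)
    _v = torch.cat([h_v, ones], dim=1)                # (B, 7)
    _t = torch.cat([h_t, ones], dim=1)                # (B, 8)
    av = torch.bmm(_a.unsqueeze(2), _v.unsqueeze(1))  # (B, 6, 7): audio-video outer product
    av = av.view(-1, 6 * 7, 1)                        # one column vector per sample
    fusion = torch.bmm(av, _t.unsqueeze(1))           # (B, 42, 8): the 3-D tensor, per sample
    fusion = fusion.view(batch_size, -1)              # (B, 336), input to post-fusion layers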

Weights of audio sub network returning NAN

The layer-1 weights of the audio subnetwork start producing NaN values, specifically the output of
self.linear_1(dropped)
I tried adjusting the learning rate, but it doesn't help.
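
(Not an official fix, but two mitigations that often help with NaN blow-ups in a small subnet like this are normalizing the inputs and clipping gradients. A self-contained sketch with illustrative sizes; the layer and optimizer below are stand-ins, not code from this repo:)

    import torch
    import torch.nn as nn

    # Stand-in for the audio subnet's first layer and its optimizer (sizes illustrative).
    layer = nn.Linear(74, 32)
    optimizer = torch.optim.Adam(layer.parameters(), lr=1e-4)
    audio = torch.randn(32, 74) * 100          # large-magnitude features invite NaNs

    # 1. z-normalize inputs so the first layer sees well-scaled values
    audio = (audio - audio.mean(dim=0)) / (audio.std(dim=0) + 1e-6)

    # 2. clip gradients before each optimizer step so weights cannot explode to NaN
    loss = layer(audio).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(layer.parameters(), max_norm=1.0)
    optimizer.step()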
