ft-w2v2-ser's Issues

The result on MELD dataset is not as good as IEMOCAP

  1. I downloaded the MELD dataset from https://affective-meld.github.io/
  2. Then I did some label mapping, mapping 'joy' to 'happy' and 'sadness' to 'sad', with this script:
    def map_label(x):
        if x == 'neutral':
            return 'neutral'
        elif x == 'joy':
            return 'happy'
        elif x == 'anger':
            return 'anger'
        elif x == 'sadness':
            return 'sad'
        else:
            return '-1'  # drop emotions outside the four target classes
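
For reference, a minimal sketch of applying such a mapping to the MELD metadata CSV (the train_sent_emo.csv file name and the Emotion column are assumptions based on the public MELD release, not part of this repository):

import pandas as pd

# Assumed file and column names from the public MELD release.
df = pd.read_csv('train_sent_emo.csv')
df['label'] = df['Emotion'].apply(map_label)   # map_label as defined above
df = df[df['label'] != '-1']                   # keep only the four mapped classes
print(df['label'].value_counts())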

3. Then I extract the wavs from the mp4 files with this shell script:

#!/bin/bash
# Usage: ./extract_wav.sh <mp4_dir> <wav_dir>   (script name is just an example)
files=$(ls "$1")
for filename in $files
do
   echo "${filename%.*}"
   ffmpeg -i "$1/${filename%.*}.mp4" -f wav -ar 16000 -ac 1 "$2/${filename%.*}.wav"
done

4. I selected samples to train the model. Here are my sample statistics:

Statistics of training splits:
----Involved Emotions----
sad: 450 examples
anger: 450 examples
neutral: 450 examples
happy: 450 examples
Total 1800 examples
----Examples Involved----

Statistics of testing splits:
----Involved Emotions----
sad: 233 examples
anger: 233 examples
happy: 233 examples
neutral: 233 examples
Total 932 examples
----Examples Involved----

5. Then I ran https://github.com/b04901014/FT-w2v2-ser/blob/main/bin/run_exp_iemocap.sh
Unfortunately, I got an error like this:

f"`mask_length` has to be smaller than `sequence_length`, but got `mask_length`: {mask_length} and `sequence_length`: {sequence_length}`"`

So I modified the code in https://github.com/b04901014/FT-w2v2-ser/blob/main/modules/FeatureFuser.py from

                # apply SpecAugment along time axis
                batch_size, sequence_length, hidden_size = wav2vec_z.size()
                mask_time_indices = _compute_mask_indices(
                    (batch_size, sequence_length),
                    self.mask_time_prob,
                    self.mask_time_length,
                    min_masks=2,
                    device=x.device
                )

to

                # apply SpecAugment along time axis
                batch_size, sequence_length, hidden_size = wav2vec_z.size()
                mask_length = min(self.mask_time_length, sequence_length)
                mask_time_indices = _compute_mask_indices(
                    (batch_size, sequence_length),
                    self.mask_time_prob,
                    mask_length,
                    min_masks=2,
                    device=x.device
                )
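
As a rough sanity check (assuming wav2vec 2.0's usual output rate of about 49 frames per second at 16 kHz and the common HuggingFace default of mask_time_length = 10; I have not verified the exact value this repository uses), the error is triggered by very short clips:

# Rough arithmetic: how short a clip triggers `mask_length` >= `sequence_length`.
frames_per_second = 49        # approximate wav2vec 2.0 output frame rate at 16 kHz
mask_time_length = 10         # common HuggingFace default; assumed here
min_clip_seconds = mask_time_length / frames_per_second
print(f"Clips shorter than ~{min_clip_seconds:.2f} s cannot hold a single mask span")
# -> roughly 0.20 s; some MELD utterances can be that short, unlike IEMOCAP clips

This is consistent with the clamp above only mattering for MELD's shortest utterances.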

6. Finally, I got the following result.


Here is the P-TAPT.log:

+++ SUMMARY +++
Mean UAR [%]: 39.21
Fold Std. UAR [%]: 0.00
Fold Median UAR [%]: 39.21
Run Std. UAR [%]: 0.62
Run Median UAR [%]: 38.84
Mean WAR [%]: 39.21
Fold Std. WAR [%]: 0.00
Fold Median WAR [%]: 39.21
Run Std. WAR [%]: 0.62
Run Median WAR [%]: 38.84
Mean macroF1 [%]: 38.16
Fold Std. macroF1 [%]: 0.00
Fold Median macroF1 [%]: 38.16
Run Std. macroF1 [%]: 1.45
Run Median macroF1 [%]: 37.73
Mean microF1 [%]: 39.95
Fold Std. microF1 [%]: 0.00
Fold Median microF1 [%]: 39.95
Run Std. microF1 [%]: 0.38
Run Median microF1 [%]: 39.93

And the confusion matrix:

[[358. 310. 207. 290.]
 [145. 512. 230. 278.]
 [175. 299. 370. 321.]
 [156. 231. 191. 587.]]

The result is quite bad. It would be great if you could help.
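
For reference, a minimal sketch of recovering UAR and WAR from the confusion matrix above (assuming rows are true labels and columns are predictions; this reproduces the 39.21% reported in the summary):

import numpy as np

# Confusion matrix from the log above: rows = true labels, columns = predictions.
cm = np.array([[358., 310., 207., 290.],
               [145., 512., 230., 278.],
               [175., 299., 370., 321.],
               [156., 231., 191., 587.]])

per_class_recall = np.diag(cm) / cm.sum(axis=1)  # recall per emotion
uar = per_class_recall.mean()                    # unweighted average recall
war = np.diag(cm).sum() / cm.sum()               # overall (weighted) accuracy
print(f"UAR: {uar:.2%}  WAR: {war:.2%}")         # both come out to ~39.21%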

run_downstream_custom_multiple_fold.py CUDA out of memory

Got the following when running run_downstream_custom_multiple_fold.py
RuntimeError: CUDA out of memory. Tried to allocate 730.00 MiB (GPU 0; 23.70 GiB total capacity; 21.65 GiB already allocated; 426.81 MiB free; 21.81 GiB reserved in total by PyTorch)

I have NVIDIA GeForce RTX 3090 with 24GB.

Any insights on how to work around it?
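
Not an answer from the authors, but a common workaround is to lower the per-step batch size and compensate with gradient accumulation. A sketch with PyTorch Lightning (the numbers are only illustrative, and whether the script exposes a batch-size flag needs to be checked):

from pytorch_lightning import Trainer

# Illustrative settings: halve the batch size fed by the DataLoader (e.g. 64 -> 32)
# and accumulate two batches so the effective batch size stays at 64.
trainer = Trainer(
    gpus=1,
    precision=16,                # mixed precision also reduces activation memory
    accumulate_grad_batches=2,   # 2 x 32 = effective batch of 64
)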

NUM_EXPS

Hello, I would like to know what value NUM_EXPS should take, because I cannot obtain a reasonable confusion matrix with any of the fine-tuning methods.
The results are :
Testing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1085/1085 [00:32<00:00, 33.17it/s]
[[   0. 2519.  916.    0.]
 [   0. 3058. 1112.    0.]
 [   0. 4224. 1536.    0.]
 [   0. 2134.  776.    0.]]
Saved figure to downstream/checkpoints/custom/confmat.png.

Questions about batch size and clustering model

  1. What's the rationale behind making the default batch size 64 for the pre-training, continued pre-training, and fine-tuning loops? Others have mentioned that they had to reduce the batch size to make it run on their systems, considering the original code uses a single GPU. Is this the batch size that produced the best results in your experiments?
  2. I noticed that cluster.py accepts either wav2vec or wav2vec2 as the model_type. Why did you move forward with making wav2vec2 as the default model? Could you have used HuBERT or other variations of a transformer-based model?
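
Not an answer, but regarding question 2: swapping in another self-supervised encoder is mostly a matter of loading a different checkpoint and re-clustering its features. A minimal sketch with HuggingFace Transformers (HuBERT is not wired into cluster.py, so the checkpoint name and usage below are illustrative only):

import numpy as np
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

name = "facebook/hubert-base-ls960"            # illustrative HuBERT checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = HubertModel.from_pretrained(name).eval()

waveform = np.zeros(16000, dtype=np.float32)   # 1 s of dummy 16 kHz audio
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state   # (1, ~49 frames, 768)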

How to test

Thank you for sharing. I am new to SER and have recently been studying this code. I would like to ask how to test it. Is there any reference code?

Unable to get file status

cp: cannot stat 'Dataset/IEMOCAP/labels_sess/label_{SESSION_TO_TEST}.json': No such file or directory

Pre-training

python run_pretrain.py --datadir Audio_Dir \
                       --labelpath Label_Path \
                       --labeling_method hard \
                       --saving_path Saving_Path \
                       --training_step 10000 \
                       --save_top_k 1 \
                       --wav2vecpath Wav2vecCKPT \
                       --precision 16

Here, should --labelpath be a per-session label file or metalabel.json?

Issue about TAPT

We cannot reproduce the TAPT results: our pre-training loss is 'nan' when running FT-w2v2-ser-main\run_baseline_continueFT.py. Can you help us solve this issue?

TypeError: _compute_mask_indices() got an unexpected keyword argument 'device'

Hi, while running run_downstream_custom_multiple_fold.py, I ran into:
TypeError: _compute_mask_indices() got an unexpected keyword argument 'device'
In the transformers package, modeling_wav2vec2.py likewise shows that _compute_mask_indices() does not take a device argument.
Am I missing something here?

Any help is appreciated!
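
Not an official answer, but this looks like a transformers version mismatch: in recent transformers releases _compute_mask_indices no longer accepts a device argument and returns a NumPy array instead of a tensor. Either pin transformers to the version the repository targets, or adapt the call roughly along these lines (a sketch against the recent signature, with dummy shapes for illustration):

import torch
from transformers.models.wav2vec2.modeling_wav2vec2 import _compute_mask_indices

batch_size, sequence_length = 4, 100           # dummy shapes for illustration
mask_time_prob, mask_time_length = 0.065, 10   # illustrative SpecAugment settings

# Recent transformers: no `device` kwarg; a boolean NumPy array is returned.
mask_time_indices = _compute_mask_indices(
    shape=(batch_size, sequence_length),
    mask_prob=mask_time_prob,
    mask_length=mask_time_length,
    min_masks=2,
)
mask_time_indices = torch.from_numpy(mask_time_indices)   # then .to(device) as needed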

How many GPUs is enough?

I'm wondering how many GPUs you used. I used two V100s (16 GB each) and still couldn't run the last phase of the code until I reduced the batch size to 48. I made sure both GPUs are used by modifying the following code:

        trainer = Trainer(
            precision=args.precision,
            amp_backend='native',
            callbacks=[checkpoint_callback] if hasattr(model, 'valid_met') else None,
            checkpoint_callback=hasattr(model, 'valid_met'),
            resume_from_checkpoint=None,
            check_val_every_n_epoch=1,
            max_epochs=hparams.max_epochs,
            num_sanity_val_steps=2 if hasattr(model, 'valid_met') else 0,
            gpus=-1,
            strategy='dp',  # multiple-gpus, 1 machine
            logger=False
        )
