
speech_emotion_recognition_blstm's Introduction

Speech_emotion_recognition_BLSTM

Bidirectional LSTM network for speech emotion recognition.

Environment:

  • Python 2.7/3.6
  • NVIDIA GeForce GTX 1060 6GB
  • Conda version 4.5

Dependencies

Datasets

Usage

  • The function "stFeatureSpeed" in pyAudioAnalysis does not work out of the box, so you have to modify the code in audioFeatureExtraction.py: for the index-related issues, cast the offending values to integers; for the issue in stHarmonic, cast M to an integer (M = int(M)); and comment out the invocation of mfccInitFilterBanks in stFeatureSpeed. A sketch of these edits follows the option table below.
  • If you run the code in Python 3, please upgrade pyAudioAnalysis to the latest version that is compatible with Python 3.
  • You have to prepare at least two different sets of data: one for finding the best model and the other for testing.
Long option             Option   Description
--dataset               -d       dataset type
--dataset_path          -p       path of the dataset, or of the data to predict
--load_data             -l       load the dataset and dump the data stream to a .p file
--feature_extract       -e       extract features from the data and dump them to a .p file
--model_path            -m       path of the model you want to load
--nb_classes            -c       number of classes in your data
--speaker_indipendence  -s       use different actors for the train and test sets in cross validation
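
Below is a minimal sketch of the pyAudioAnalysis edits described in the first bullet above. These are fragments to change inside audioFeatureExtraction.py; the exact locations depend on your installed version:

# Inside stHarmonic: M is computed as a float but used as an index,
# so cast it to an integer.
M = int(M)

# Inside stFeatureSpeed: comment out the invocation of
# mfccInitFilterBanks, as described above.
# [fbank, freqs] = mfccInitFilterBanks(Fs, nfft, lowfreq, linsc,
#                                      logsc, nlinfil, nlogfil)

# For the remaining index-related TypeErrors (see the Issues below),
# cast the offending values to int before slicing, for example:
x = signal[int(cur_p):int(cur_p + window)]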

Example find_best_model.py:

python find_best_model.py -d "berlin" -p [berlin data path] -l -e -c 7
  • The first time you run the script, the -l and -e options are mandatory, since you need to load the data and extract the features.
  • Every time you change the training data and/or the feature-engineering method, specify -l and/or -e again so the .p files are regenerated.
  • You can also modify the code to tune other hyperparameters.

Example prediction.py:

python prediction.py -p [data path] -m [model path] -c 7

Example model_cross_validation.py:

python model_cross_validation.py -d "berlin" -p [berlin data path] -l -e -c 7
  • Use -s for speaker-independent k-fold cross validation, i.e. different actors in the train and test sets.

Experimental result

  • Hyperas is used to tune the optimizer, batch_size and epochs; the remaining hyperparameters take the values used in the paper referenced below. A sketch of the search space follows this list.
  • The average accuracy is about 68.60% (+/- 1.88%) with 10-fold cross validation on the Berlin dataset.
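
A minimal, self-contained sketch of how hyperas tunes these three values. The network, data shapes, and candidate value lists here are illustrative placeholders, not the repo's actual configuration, which lives in find_best_model.py:

from hyperas import optim
from hyperas.distributions import choice
from hyperopt import Trials, STATUS_OK, tpe


def data():
    # Hypothetical stand-in for the pickled features this repo dumps
    # with -l/-e: 100 utterances, 50 frames, 39-dim features, 7 classes.
    import numpy as np
    from keras.utils import to_categorical
    x = np.random.rand(100, 50, 39)
    y = to_categorical(np.random.randint(0, 7, 100), 7)
    return x[:80], y[:80], x[80:], y[80:]


def create_model(x_train, y_train, x_test, y_test):
    from keras.models import Sequential
    from keras.layers import Bidirectional, LSTM, Dense
    model = Sequential()
    model.add(Bidirectional(LSTM(64), input_shape=(50, 39)))
    model.add(Dense(7, activation='softmax'))
    # The double-brace templates are hyperas placeholders for the search space.
    model.compile(loss='categorical_crossentropy',
                  optimizer={{choice(['rmsprop', 'adam', 'sgd'])}},
                  metrics=['accuracy'])
    model.fit(x_train, y_train,
              batch_size={{choice([32, 64, 128])}},
              epochs={{choice([25, 50])}},
              validation_data=(x_test, y_test),
              verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}


if __name__ == '__main__':
    best_run, best_model = optim.minimize(model=create_model, data=data,
                                          algo=tpe.suggest, max_evals=5,
                                          trials=Trials())
    print(best_run)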

References

  • S. Mirsamadi, E. Barsoum, and C. Zhang, “Automatic speech emotion recognition using recurrent neural networks with local attention,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, U.S.A., Mar. 2017, IEEE, pp. 2227–2231.

  • F. Tao and G. Liu, “Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition,” submitted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

  • Video from Microsoft Research

Future work

  • The training data listed above (Berlin) may be insufficient: the validation accuracy and loss cannot be improved, and the training results are not good either.
  • Given sufficient training examples, the parameters of short-term characterization, long-term aggregation, and the attention model can be jointly optimized for best performance.
  • Update the current network architecture to improve the accuracy (already in progress).

speech_emotion_recognition_blstm's People

Contributors

  • rayanwang

speech_emotion_recognition_blstm's Issues

TypeError: 'float' object cannot be interpreted as an integer

Please help me solve this problem (environment: Python 3.5):

Traceback (most recent call last):
File "find_best_model.py", line 167, in
extract_dataset(ds.data, nb_samples=len(ds.targets), dataset=dataset)
File "E:\TensorFlow\GitHub\Speech_emotion_recognition_BLSTM-master\utility\audio.py", line 75, in extract_dataset
hr_pitch = ShortTermFeatures.speed_feature(x, Fs, globalvars.frame_size * Fs, globalvars.step * Fs)
File "C:\Users\asus\Anaconda3\envs\tensorflow\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py", line 473, in speed_feature
logsc, nlinfil, nlogfil)
File "C:\Users\asus\Anaconda3\envs\tensorflow\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py", line 199, in mfcc_filter_banks
fbank = np.zeros((num_filt_total, num_fft))
TypeError: 'float' object cannot be interpreted as an integer

Thanks!
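
A fix consistent with the Usage notes above (a sketch, not an official patch): in pyAudioAnalysis's ShortTermFeatures.py, which already imports numpy as np, cast the float arguments to int at the allocation the traceback points to:

# mfcc_filter_banks, line flagged in the traceback: np.zeros requires
# integer dimensions, but both values arrive as floats.
fbank = np.zeros((int(num_filt_total), int(num_fft)))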

Which method is this repo based on?

Hi RayanWang, I have gone through your repo. First of all, it's good work; thanks for sharing.
But I want to know which method you use. I see you have listed two references here, but which one is this repo based on?
I want more information to understand the algorithm better.
If you can tell me which one you used, that would be great.
Thanks.

Error about mfccInitFilterBanks()

Hello RayanWang, I read your code and want to run it, but I hit the error "TypeError: mfccInitFilterBanks() takes exactly 2 arguments (7 given)". I then deleted five of the arguments, but a new error appeared: "TypeError: 'float' object cannot be interpreted as an index". Can you tell me how to modify the code in audioFeatureExtraction.py? Should I just delete the stFeatureSpeed code?

accuracy

Sir, I applied this code to the IEMOCAP dataset and got 54.5 after running find_best_model.py, but I didn't get the 63.3 accuracy mentioned in the paper. Can you tell me how much you got, and why I got less? Can you also please explain the use of model_cross_validation.py?

Program just stops without any message

Hi,
For some reason, when I try to run the training of the model, the program starts and then just exits without any message. What could be the issue?

I'm using Python 3.6, so I did comment out the invocation of method 'mfccInitFilterBanks' in stFeatureSpeed and did cast M to integer.

Any suggestions?

Met an issue in Python 3

File "F:/123/find_best_model.py", line 160, in
ds = pickle.load(open(dataset + '_db.p', 'rb'))

FileNotFoundError: [Errno 2] No such file or directory: 'berlin_db.p'
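
Per the Usage section above, berlin_db.p is only created by a run that includes the -l option (and the feature .p file by -e), so the first run must load the data and extract the features:

python find_best_model.py -d "berlin" -p [berlin data path] -l -e -c 7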

A question about dataset.py

Could you please explain why you use this line of code?
"for speak_test in itertools.product(males, females): # test_couples:"
Shouldn't you just use one for loop that goes over all the audio files once?
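
For context, a sketch of what that line enumerates. The males and females lists here are hypothetical actor IDs; the real ones come from the dataset code:

import itertools

# Hypothetical actor IDs split by gender.
males = ['03', '10', '11']
females = ['08', '09', '13']

# itertools.product yields every (male, female) pairing; each pair is a
# candidate held-out test couple, so the files are iterated once per
# speaker-independent split rather than once overall.
for speak_test in itertools.product(males, females):
    print(speak_test)  # ('03', '08'), ('03', '09'), ..., ('11', '13')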

error about ipykernel

When I try to run the code, I get the following error:
Usage: ipykernel_launcher.py [options]

ipykernel_launcher.py: error: no such option: -f

An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

Please respond. Thank you in advance.

TypeError: slice indices must be integers or None or have an __index__ method

Hi Rayan! Sorry to disturb you; I have an issue:

python find_best_model.py -d "berlin" -p E:\TensorFlow\Emo-DB\wav -l -e -c 7
Using TensorFlow backend.
Writing berlin data set to file...
Traceback (most recent call last):
File "find_best_model.py", line 167, in
extract_dataset(ds.data, nb_samples=len(ds.targets), dataset=dataset)
File "E:\TensorFlow\GitHub\Speech_emotion_recognition_BLSTM-master\utility\audio.py", line 75, in extract_dataset
hr_pitch = ShortTermFeatures.speed_feature(x, Fs, globalvars.frame_size * Fs, globalvars.step * Fs)
File "C:\Users\asus\Anaconda3\envs\tensorflow\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py", line 485, in speed_feature
x = signal[cur_p:cur_p + window]
TypeError: slice indices must be integers or None or have an __index__ method

Please help me with this! Thanks very much!

Other languages

Hi,

Does Speech_emotion_recognition_BLSTM work with other languages or only in German?

How do you deal with the variable length of the audio?

Hi RayanWang,

I have gone through your code these days; thank you so much for sharing, it is really nice work.

But I still have a question: can you tell me which part of your code deals with the length of the audio data? I also work on the Berlin dataset, but the audio clips all have different lengths. I used a padding method, but my results were not as good as yours.

I am looking forward to getting your reply.

Chason
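
For reference, a sketch of the padding approach mentioned above (whether this repo pads, truncates, or does something else is not stated in this thread; shapes and lengths are hypothetical):

import numpy as np
from keras.preprocessing.sequence import pad_sequences

# Hypothetical example: three utterances with 50, 80 and 65 frames of
# 39-dimensional features.
features = [np.random.rand(n, 39) for n in (50, 80, 65)]

# Zero-pad (or truncate) along the time axis to a common length so the
# batch can be fed to an LSTM.
padded = pad_sequences(features, maxlen=80, dtype='float32',
                       padding='post', truncating='post')
print(padded.shape)  # (3, 80, 39)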

Test Accuracy

Hi RayanWang,
I have started model training with find_best_model.py and achieved the validation accuracies below.
With the training set to 200 epochs, the highest validation accuracy was 0.5652 (early stopping at epoch 35).

With the training set to 100 epochs, the highest validation accuracy was 0.3354 (early stopping at epoch 19).
I want to increase the test accuracy further (to at least 70). What additional steps and modifications should I follow?
Thank you.

A statement I don't quite understand

Hi Rayan:
Sorry to ask for your help again, and thanks very much for sharing your code; it helps me a lot. While reading find_best_model.py I came across a statement I don't quite understand: in the create_model function, what does "globalvars.globalVar += 1" do? Please help explain this statement. Thanks very much!

Speech_emotion_recognition_BLSTM error

Loading data and features...
Number of samples: 535
Traceback (most recent call last):
File "/home/lwin/speech-emotion/Speech_emotion_recognition_BLSTM-master1/find_best_model.py", line 171, in
trials=trials)
File "/usr/local/lib/python3.5/dist-packages/hyperas/optim.py", line 67, in minimize
verbose=verbose)
File "/usr/local/lib/python3.5/dist-packages/hyperas/optim.py", line 115, in base_minimizer
space=get_space(),
File "./temp_model.py", line 203, in get_space
NameError: name 'sgd' is not defined
Please help solve this!
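
One plausible cause, offered as a guess since the modified source is not shown in this thread: in hyperas templates the optimizer names must be string literals, otherwise the generated get_space() evaluates them as undefined Python names:

# optimizer={{choice(['rmsprop', 'adam', sgd])}}    # NameError: name 'sgd' is not defined
# optimizer={{choice(['rmsprop', 'adam', 'sgd'])}}  # quoted strings: works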

about paper

Does this experiment have a corresponding paper?

Real Time Application

Thanks for sharing this awesome repo! I wonder whether the current model could handle real-time prediction on videos. Could you briefly outline how to do that?

Thanks in advance!

TypeError: mfccInitFilterBanks() takes 2 positional arguments but 7 were given

/usr/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/usr/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
/usr/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/usr/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/shakey/.local/lib/python3.6/site-packages/pydub/utils.py:165: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Writing berlin data set to file...
Traceback (most recent call last):
File "find_best_model.py", line 167, in
extract_dataset(ds.data, nb_samples=len(ds.targets), dataset=dataset)
File "/home/shakey/speech_emotion_recongtion/Speech_emotion_recognition_BLSTM/utility/audio.py", line 73, in extract_dataset
hr_pitch = audioFeatureExtraction.stFeatureSpeed(x, Fs, globalvars.frame_size * Fs, globalvars.step * Fs)
File "/home/shakey/.local/lib/python3.6/site-packages/pyAudioAnalysis/audioFeatureExtraction.py", line 685, in stFeatureSpeed
[fbank, freqs] = mfccInitFilterBanks(fs, nfft, lowfreq, linsc, logsc, nlinfil, nlogfil)
TypeError: mfccInitFilterBanks() takes 2 positional arguments but 7 were given

How to modify the code in audioFeatureExtraction.py to fix this error

=========================================================
Writing berlin data set to file...
Traceback (most recent call last):
File "/home/lwin/speech-emotion/Speech_emotion_recognition_BLSTM-master/find_best_model.py", line 163, in
functions.feature_extract(ds.data, nb_samples=len(ds.targets), dataset=dataset)
File "/home/lwin/speech-emotion/Speech_emotion_recognition_BLSTM-master/utility/functions.py", line 20, in feature_extract
hr_pitch = audioFeatureExtraction.stFeatureSpeed(x, Fs, globalvars.frame_size * Fs, globalvars.step * Fs)
File "/usr/local/lib/python3.5/dist-packages/pyAudioAnalysis/audioFeatureExtraction.py", line 669, in stFeatureSpeed
[fbank, freqs] = mfccInitFilterBanks(Fs, nfft, lowfreq, linsc, logsc, nlinfil, nlogfil)
TypeError: mfccInitFilterBanks() takes 2 positional arguments but 7 were given

=================================================================
How can I solve this issue? Please help me. Thank you.

Data cannot be loaded

I downloaded the Berlin data but couldn't find the .p file. Is the training data format .wav?
