mica-gender-from-audio

Generates speech activity detection (SAD) and gender timestamps for audio, using neural network models trained in Keras. The input is a text file containing full paths to either mp4/mkv media files or .wav audio files, one per line, and optionally the path to the directory where all output will be stored (default=$proj_dir/expt).
The output is one text file per movie/audio file; each line contains the start and end times of a speech segment followed by its gender (male/female). Frame-level posteriors are also saved in the output directory.
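
As a rough illustration of consuming these output files, the sketch below totals speaking time per gender for a single file. It assumes each line is whitespace-separated as `start end gender` with times in seconds; the exact delimiter is not stated above, and the function name `gender_durations` is hypothetical and not part of the tool.

```bash
# Hypothetical helper (not part of this repo): total speech time per gender.
# Assumes each line looks like "<start> <end> <gender>", with times in seconds.
gender_durations() {
    awk 'NF >= 3 { dur[$3] += $2 - $1 }
         END { for (g in dur) printf "%s %.2f\n", g, dur[g] }' "$1" | sort
}
```

Calling `gender_durations path/to/output_file.txt` prints one line per label with the summed duration in seconds.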

Usage:

bash generate_gender_timestamps.sh [-h] [-c config_file] movie_paths.txt (out_dir)  
e.g.: bash generate_gender_timestamps.sh -c ./config.sh demo.txt DEMO  
where:   
-h              : Show help 
-c              : Configuration file
movie_paths.txt : Text file listing the complete path to one media file (e.g., .mp4/.mkv) on each line
out_dir         : Directory in which to store all output files (default: "$PWD/gender_out_dir")  
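
One possible way to build the input list is sketched below. It collects every .mp4, .mkv, and .wav file under a directory and writes their absolute paths, one per line. The helper name `make_path_list` and the directory `/data/movies` in the usage line are hypothetical.

```bash
# Hypothetical helper (not part of this repo): list media files under a
# directory as absolute paths, one per line.
# Usage: make_path_list <media_dir> <output_list_file>
make_path_list() {
    media_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    find "$media_dir" -type f \
        \( -name '*.mp4' -o -name '*.mkv' -o -name '*.wav' \) | sort > "$2"
}
```

For example, `make_path_list /data/movies movie_paths.txt` could be followed by `bash generate_gender_timestamps.sh -c ./config.sh movie_paths.txt DEMO`.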

Example config file:

nj=4                # Number of files to process simultaneously 
feats_flag="y"      # "y"/"n" flag to keep kaldi-feature files after inference
wavs_flag="n"       # "y"/"n" flag to keep .wav audio files after inference
sad_overlap=0       # Fractional overlap between SAD segments (range: 0-1, 0 for no overlap; a single segment is 0.64s)
gender_overlap=0    # Fractional overlap between GENDER segments (a single segment is 0.96s)
uniform_seg_len=2.0   # Segment length for uniform speaker-segmentation 
only_vad=0          # Set to 1 if only SAD outputs are required
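
A small pre-flight check can catch malformed values before a long run. The sketch below is not part of the tool: it assumes the config file is a plain shell fragment that can be sourced, and the accepted ranges are inferred from the comments above (in particular, overlaps are assumed to lie in [0, 1) for both SAD and gender). The name `check_config` is hypothetical.

```bash
# Hypothetical pre-flight check (not part of this repo) for config values.
# Call it after sourcing the config file, e.g.:  . ./config.sh && check_config
check_config() {
    # nj: positive integer
    case "$nj" in
        ''|*[!0-9]*) echo "nj must be a positive integer" >&2; return 1 ;;
    esac
    [ "$nj" -ge 1 ] || { echo "nj must be at least 1" >&2; return 1; }

    # feats_flag / wavs_flag: "y" or "n"
    for flag in "$feats_flag" "$wavs_flag"; do
        case "$flag" in
            y|n) ;;
            *) echo "flags must be \"y\" or \"n\"" >&2; return 1 ;;
        esac
    done

    # Overlaps: assumed to be fractions in [0, 1); a value of 1 would leave no hop.
    for ov in "$sad_overlap" "$gender_overlap"; do
        awk -v v="$ov" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v >= 0 && v < 1) }' \
            || { echo "overlap must be in [0, 1): $ov" >&2; return 1; }
    done

    # only_vad: 0 or 1
    case "$only_vad" in
        0|1) ;;
        *) echo "only_vad must be 0 or 1" >&2; return 1 ;;
    esac
}
```

The function returns 0 when every value passes and prints the first problem to stderr otherwise.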

Dependencies :

kaldi                    :   ensure that all kaldi binaries are added to system path. If not,
                                 either add them to system path, or modify kaldi_root in 1st line of
                                 'path.sh' to reflect kaldi installation directory.
keras, tensorflow        :   required to load data, model and make VAD predictions.
Other Python libraries required include numpy, scipy, and resampy.
h5py==2.7.1
Keras==2.1.5
numpy==1.14.2
requests==2.18.4
resampy==0.2.0
scipy==1.0.0
six==1.12.0
tensorflow==1.4.1

This tool can be used for noise-robust gender identification from audio. Two parallel systems are implemented for this purpose:

  1. Speech Activity Detection (SAD), and
  2. Gender Identification (GID) of speech segments.

Both of the DNN-based systems make predictions at segment-level as opposed to traditional frame-level analysis. Segment duration for the SAD system is 0.64s (design choice) and for the GID system is 0.96s (pre-trained VGGish embeddings). For more details about the architecture and training procedures, please refer to the ICASSP '19 paper (SAD), and INTERSPEECH '18 paper (GID).
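
To get a feel for what segment-level prediction means in practice, the sketch below estimates how many fixed-length segments fit in a file of a given duration. It is only an approximation under stated assumptions: the hop between segments is taken to be `segment_length * (1 - overlap)`, and a trailing partial segment is dropped. The actual scripts may pad or segment differently, and `count_segments` is a hypothetical name.

```bash
# Hypothetical helper (not part of this repo): estimate how many fixed-length
# segments cover an audio file.
# Arguments: duration (s), segment length (s), overlap fraction.
# Assumes hop = segment_length * (1 - overlap); a trailing partial segment
# is dropped.
count_segments() {
    awk -v d="$1" -v s="$2" -v o="$3" 'BEGIN {
        hop = s * (1 - o)
        if (d < s || hop <= 0) { print 0; exit }
        print int((d - s) / hop) + 1
    }'
}
```

Under these assumptions, a 10 s file with no overlap gives `count_segments 10 0.64 0` = 15 SAD segments and `count_segments 10 0.96 0` = 10 GID segments.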

mica-gender-from-audio's Issues

About the dataset

Hi, I have requested the open-source dataset mentioned in the paper, but I have not received a response. Can you help me?

Configurable wrapper script

Add a config file which lets you pick which parts of the pipeline to run. e.g., just run gender prediction, don't run wav file extraction, etc.

How to use this project using mp3 files of spoken dialogs

I have followed the readme and made a text file pointing to a list of mp3 files which contain spoken dialogs in various languages which I want to label with gender.

I am getting the error:


Using configuration :
nj=2
feats_flag="n"
wavs_flag="y"
sad_overlap=0.0
gender_overlap=0
uniform_seg_len=1
only_vad=0

 >>>> CREATING WAV FILES <<<< 
bash_scripts/create_wav_files.sh: line 33: 1 % 0 : division by 0 (error token is "0 ")
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
Unable to extract all .wav files, exiting...
 >>>> EXTRACTING FEATURES FOR SAD <<<< 
Usage: split_scp.pl [--utt2spk=<utt2spk_file>] in.scp out1.scp out2.scp ... 
 or: split_scp.pl -j num-jobs job-id [--utt2spk=<utt2spk_file>] in.scp [out.scp]
 ... where 0 <= job-id < num-jobs. at ./split_scp.pl line 64.
cat: results//features/feats.scp: No such file or directory
 >>>> GENERATING SAD LABELS <<<< 
cat: results//features/feats.scp: No such file or directory
generate_gender_pred.sh: line 142: 1 % 0: division by 0 (error token is "0")
 >>>> EXTRACTING VGGISH EMBEDDINGS <<<< 
WARNING:tensorflow:From /home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Traceback (most recent call last):
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
	 [[{{node vggish_load_pretrained/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/extract_vggish_feats.py", line 139, in <module>
    tf.app.run()
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/extract_vggish_feats.py", line 75, in main
    vggish_slim.load_vggish_slim_checkpoint(sess, checkpoint)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_slim.py", line 128, in load_vggish_slim_checkpoint
    saver.restore(session, checkpoint_path)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1276, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
	 [[node vggish_load_pretrained/RestoreV2 (defined at /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_slim.py:127) ]]

Caused by op 'vggish_load_pretrained/RestoreV2', defined at:
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/extract_vggish_feats.py", line 139, in <module>
    tf.app.run()
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/extract_vggish_feats.py", line 75, in main
    vggish_slim.load_vggish_slim_checkpoint(sess, checkpoint)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_slim.py", line 127, in load_vggish_slim_checkpoint
    saver = tf.train.Saver(vggish_vars, name='vggish_load_pretrained')
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 832, in __init__
    self.build()
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 513, in _build_internal
    restore_sequentially, reshape)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 332, in _AddRestoreOps
    restore_sequentially)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 580, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1572, in restore_v2
    name=name)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): Unable to open table file /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
	 [[node vggish_load_pretrained/RestoreV2 (defined at /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_slim.py:127) ]]

 >>>> PREDICTING GENDER SEGMENTS <<<< 
 >>>> GENDER SEGMENTS PER-MOVIE CAN BE FOUND IN results//GENDER/timestamps <<<< 

There is no such command as "spk-seg".

I am trying to run ./generate_gender_pred.sh but it's not executing properly as there is no such command or file for speaker segmentation. Please help me out on this. Thanks.
