usc-sail / mica-gender-from-audio

Gender prediction in movie audio

gender-recognition-by-voice voice-activity-detection movie-data female-speaking-time audioset

mica-gender-from-audio's Introduction

mica-gender-from-audio

Generates gender and speech activity detection (SAD) timestamps for audio, using neural network models trained in Keras. The input is a text file listing full paths to media files (mp4/mkv) or .wav audio files, one per line, plus an optional path to the directory where all output will be stored (default=$proj_dir/expt).
The output is one text file per movie/audio file; each line contains the start and end times of a speech segment followed by the gender (male/female). Frame-level posteriors are also saved in the output directory.
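
As an illustration of how the per-file output could be consumed, the sketch below sums the speaking time per gender label. It assumes each line is whitespace-separated as "start end gender" with times in seconds; the exact delimiter and units are not stated here, so check them against a real output file. The function name speaking_time is hypothetical and not part of this repository.

```python
from collections import defaultdict


def speaking_time(lines):
    """Sum segment durations per gender label.

    Assumes each line looks like 'start end gender' (whitespace-separated,
    times in seconds). Lines with fewer than three fields are skipped.
    """
    totals = defaultdict(float)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        start, end = float(parts[0]), float(parts[1])
        gender = parts[2].lower()
        totals[gender] += end - start
    return dict(totals)


# Hypothetical sample lines, made up for illustration only.
sample = ["0.00 1.92 male", "1.92 4.80 female", "7.68 8.64 female"]
for label, seconds in sorted(speaking_time(sample).items()):
    print("{}: {:.2f}s".format(label, seconds))
```

To use it on a real file, pass the open file object: speaking_time(open(path)).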

Usage:

bash generate_gender_timestamps.sh [-h] [-c config_file] movie_paths.txt (out_dir)  
e.g.: bash generate_gender_timestamps.sh -c ./config.sh demo.txt DEMO  
where:   
-h              : Show help 
-c              : Configuration file
movie_paths.txt : Text file consisting of complete paths to media files (eg, .mp4/.mkv) on each line    
out_dir         : Directory in which to store all output files (default: "$PWD/gender_out_dir")
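
The movie_paths.txt input can be written by hand, or generated with a few lines of Python. The sketch below collects media files under a directory and writes their absolute paths, one per line. The helper name write_path_list and the set of extensions are assumptions chosen for illustration.

```python
from pathlib import Path

# Extensions mentioned in this README; adjust as needed.
MEDIA_EXTS = {".mp4", ".mkv", ".wav"}


def write_path_list(media_dir, list_file):
    """Write absolute paths of media files under media_dir to list_file.

    Returns the list of paths that were written (sorted, one per line).
    """
    paths = sorted(
        str(p.resolve())
        for p in Path(media_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in MEDIA_EXTS
    )
    Path(list_file).write_text("".join(p + "\n" for p in paths))
    return paths
```

For example, write_path_list("/data/movies", "movie_paths.txt") would produce a list file that can be passed as the movie_paths.txt argument; the directory name here is a placeholder.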

Example config file:

nj=4                # Number of files to process simultaneously 
feats_flag="y"      # "y"/"n" flag to keep kaldi-feature files after inference
wavs_flag="n"       # "y"/"n" flag to keep .wav audio files after inference
sad_overlap=0       # Fractional overlap between SAD segments (range: 0-1, 0 for no overlap; a single segment is 0.64s)
gender_overlap=0    # Fractional overlap between GENDER segments (a single segment is 0.96s)
uniform_seg_len=2.0 # Segment length for uniform speaker segmentation
only_vad=0          # 1 if only SAD outputs are required
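
Before launching a long run it can help to sanity-check the config file. The sketch below parses shell-style KEY=VALUE lines and reports suspicious values. The functions parse_config and check_config are hypothetical helpers, not part of this repository; the fallback defaults are made up, and rejecting an overlap of exactly 1 is an assumption (it would imply a zero hop between segments).

```python
import shlex


def parse_config(text):
    """Parse shell-style KEY=VALUE lines, dropping '#' comments and quotes."""
    config = {}
    for line in text.splitlines():
        for token in shlex.split(line, comments=True):
            key, sep, value = token.partition("=")
            if sep:
                config[key] = value
    return config


def check_config(config):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    try:
        if int(config.get("nj", "1")) < 1:
            problems.append("nj must be at least 1")
    except ValueError:
        problems.append("nj must be an integer")
    for key in ("sad_overlap", "gender_overlap"):
        try:
            value = float(config.get(key, "0"))
        except ValueError:
            problems.append(key + " must be a number")
            continue
        if not 0.0 <= value < 1.0:
            problems.append(key + " must satisfy 0 <= value < 1")
    for key in ("feats_flag", "wavs_flag"):
        if config.get(key, "n") not in ("y", "n"):
            problems.append(key + ' must be "y" or "n"')
    if config.get("only_vad", "0") not in ("0", "1"):
        problems.append("only_vad must be 0 or 1")
    return problems


example = 'nj=4  # jobs\nfeats_flag="y"\nsad_overlap=0\nonly_vad=0\n'
print(check_config(parse_config(example)))
```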

Dependencies:

kaldi                    :   all Kaldi binaries must be on the system path. If they are not,
                                 either add them to the system path, or modify kaldi_root in the 1st line of
                                 'path.sh' to point to the Kaldi installation directory.
keras, tensorflow        :   required to load the data and models and to make VAD predictions.
Other required Python libraries include numpy, scipy and resampy. The full list of pinned versions:
h5py==2.7.1
Keras==2.1.5
numpy==1.14.2
requests==2.18.4
resampy==0.2.0
scipy==1.0.0
six==1.12.0
tensorflow==1.4.1

This tool can be used for noise-robust gender identification from audio. Two parallel systems are implemented for this purpose:

Speech Activity Detection (SAD), and
Gender Identification (GID) of speech segments.

Both of the DNN-based systems make predictions at segment-level as opposed to traditional frame-level analysis. Segment duration for the SAD system is 0.64s (design choice) and for the GID system is 0.96s (pre-trained VGGish embeddings). For more details about the architecture and training procedures, please refer to the ICASSP '19 paper (SAD), and INTERSPEECH '18 paper (GID).
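
To make the segment-level framing concrete, the sketch below computes fixed-length segment boundaries for a given audio duration, segment length and fractional overlap. It assumes the hop between segment starts is seg_len * (1 - overlap) and that a trailing partial segment is dropped; how the actual scripts handle these details is not described here, so treat this as an illustration rather than the repository's implementation. The function name segment_bounds is hypothetical.

```python
def segment_bounds(duration, seg_len, overlap=0.0):
    """Return (start, end) pairs, in seconds, for fixed-length segments.

    The hop between consecutive starts is seg_len * (1 - overlap).
    Segments that would run past `duration` are dropped (an assumption).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must satisfy 0 <= overlap < 1")
    hop = seg_len * (1.0 - overlap)
    bounds = []
    i = 0
    # Small tolerance guards against floating-point round-off at the end.
    while i * hop + seg_len <= duration + 1e-9:
        start = i * hop
        bounds.append((round(start, 6), round(start + seg_len, 6)))
        i += 1
    return bounds


# SAD-style segments (0.64s) and GID-style segments (0.96s) on 3s of audio.
print(segment_bounds(3.0, 0.64))
print(segment_bounds(3.0, 0.96))
```

With overlap=0.5 the hop is half the segment length, so roughly twice as many segments cover the same audio.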


Using configuration :
nj=2
feats_flag="n"
wavs_flag="y"
sad_overlap=0.0
gender_overlap=0
uniform_seg_len=1
only_vad=0

 >>>> CREATING WAV FILES <<<< 
bash_scripts/create_wav_files.sh: line 33: 1 % 0 : division by 0 (error token is "0 ")
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
[mp3float @ 0x565405cffc80] Header missing
Error while decoding stream #0:0: Invalid data found when processing input
Unable to extract all .wav files, exiting...
 >>>> EXTRACTING FEATURES FOR SAD <<<< 
Usage: split_scp.pl [--utt2spk=<utt2spk_file>] in.scp out1.scp out2.scp ... 
 or: split_scp.pl -j num-jobs job-id [--utt2spk=<utt2spk_file>] in.scp [out.scp]
 ... where 0 <= job-id < num-jobs. at ./split_scp.pl line 64.
cat: results//features/feats.scp: No such file or directory
 >>>> GENERATING SAD LABELS <<<< 
cat: results//features/feats.scp: No such file or directory
generate_gender_pred.sh: line 142: 1 % 0: division by 0 (error token is "0")
 >>>> EXTRACTING VGGISH EMBEDDINGS <<<< 
WARNING:tensorflow:From /home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Traceback (most recent call last):
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
	 [[{{node vggish_load_pretrained/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/extract_vggish_feats.py", line 139, in <module>
    tf.app.run()
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/extract_vggish_feats.py", line 75, in main
    vggish_slim.load_vggish_slim_checkpoint(sess, checkpoint)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_slim.py", line 128, in load_vggish_slim_checkpoint
    saver.restore(session, checkpoint_path)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1276, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
	 [[node vggish_load_pretrained/RestoreV2 (defined at /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_slim.py:127) ]]

Caused by op 'vggish_load_pretrained/RestoreV2', defined at:
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/extract_vggish_feats.py", line 139, in <module>
    tf.app.run()
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/extract_vggish_feats.py", line 75, in main
    vggish_slim.load_vggish_slim_checkpoint(sess, checkpoint)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_slim.py", line 127, in load_vggish_slim_checkpoint
    saver = tf.train.Saver(vggish_vars, name='vggish_load_pretrained')
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 832, in __init__
    self.build()
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 513, in _build_internal
    restore_sequentially, reshape)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 332, in _AddRestoreOps
    restore_sequentially)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 580, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1572, in restore_v2
    name=name)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/xxx/sources/priavte/mica-gender-from-audio/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): Unable to open table file /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_model.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
	 [[node vggish_load_pretrained/RestoreV2 (defined at /home/xxx/sources/priavte/mica-gender-from-audio/python_scripts/audioset_scripts/vggish_slim.py:127) ]]

 >>>> PREDICTING GENDER SEGMENTS <<<< 
 >>>> GENDER SEGMENTS PER-MOVIE CAN BE FOUND IN results//GENDER/timestamps <<<<
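
The DataLossError in the log above says that vggish_model.ckpt is not in a format TensorFlow can restore. One thing that may be worth ruling out (an assumption, not a confirmed diagnosis for this repository) is that the file on disk is a placeholder rather than the real checkpoint, for example a Git LFS pointer file or an HTML page saved by a failed download. A minimal sketch of such a check follows; describe_checkpoint is a hypothetical helper, not part of the repository.

```python
import os

# First line of every Git LFS pointer file.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def describe_checkpoint(path):
    """Return a rough description of what the file at `path` looks like."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as fh:
        head = fh.read(256)
    if head.startswith(LFS_PREFIX):
        return "git-lfs pointer"
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html page"
    # Not proof of a valid checkpoint, only that no known placeholder matched.
    return "binary or unknown"
```

For example, describe_checkpoint("python_scripts/audioset_scripts/vggish_model.ckpt") returning "git-lfs pointer" would suggest the real file still needs to be fetched.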

There is no such command as "spk-seg".

I am trying to run ./generate_gender_pred.sh but it's not executing properly as there is no such command or file for speaker segmentation. Please help me out on this. Thanks.


mica-gender-from-audio's Issues

About the dataset

Configurable wrapper script

Check if gender prediction can be run without speaker segmentation

How to use this project using mp3 files of spoken dialogs

There is no such command as "spk-seg".
