jrgillick / laughter-detection
License: MIT License
Hi Jon Gillick,
I tried your code with some podcast audio files and it works very well. There are some false positives too, where background music (BGM) is also identified as laughter.
Is there any way to detect the music and remove it? Do you have a model file for music detection?
I am also interested in the Switchboard files and, if possible, the logic for building my own model using laughter samples I have.
Thank you very much in advance.
SSV
Ran into a couple of issues trying to run the laughter detector:
librosa 0.6.1 (the latest version) doesn't run with the latest version of joblib (0.12.0). I had to roll back to joblib 0.11.0 for librosa to work.
laugh_segmenter.py also depends on the python_speech_features library.
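A quick way to address both points, assuming a standard pip setup, is to pin and install the packages directly: pip install joblib==0.11 python_speech_features (both names as published on PyPI).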
And two problems I haven't been able to solve:
UserWarning: Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.
warnings.warn('Error in loading the saved optimizer '
Traceback (most recent call last):
  File "segment_laughter.py", line 50, in <module>
    laughs = laugh_segmenter.segment_laughs(input_path,model_path,output_path,threshold,min_length)
  File "/Users/maneesh/development/testCode/jnotebooks/laughter-detection/laugh_segmenter.py", line 112, in segment_laughs
    librosa.output.write_wav(wav_path, (laughs * maxv).astype(np.int16), full_res_sr)
  File "/Users/maneesh/anaconda2/envs/tensorflow/lib/python2.7/site-packages/librosa/output.py", line 223, in write_wav
    util.valid_audio(y, mono=False)
  File "/Users/maneesh/anaconda2/envs/tensorflow/lib/python2.7/site-packages/librosa/util/utils.py", line 159, in valid_audio
    raise ParameterError('data must be floating-point')
librosa.util.exceptions.ParameterError: data must be floating-point
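For what it's worth, the final error is librosa's write_wav rejecting integer samples: valid_audio() requires floating-point data. A minimal workaround sketch, reusing the variable names from the traceback above and assuming scipy is installed:

    # librosa.output.write_wav() (removed entirely in librosa 0.8) only accepts
    # float arrays, so write the int16 samples with scipy instead...
    import numpy as np
    import scipy.io.wavfile

    scipy.io.wavfile.write(wav_path, full_res_sr, (laughs * maxv).astype(np.int16))

    # ...or skip the int16 conversion and let librosa write float32:
    # librosa.output.write_wav(wav_path, laughs.astype(np.float32), full_res_sr)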
Hey there,
I'm curious which version of tensorflow you're running alongside your keras dependency, because when I follow your requirements I get the following error when trying to run segment_laughter.py:
Traceback (most recent call last):
  File "segment_laughter.py", line 3, in <module>
    import laugh_segmenter
  File "/home/mark/work/work/voice/laughterdetection/laugh_segmenter.py", line 2, in <module>
    config = tf.ConfigProto()
AttributeError: module 'tensorflow' has no attribute 'ConfigProto'
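For context, tf.ConfigProto was removed from the top-level namespace in TensorFlow 2.x, so this error usually means TF2 is installed against TF1-style code. A minimal sketch of the usual compatibility workaround (assuming the rest of the code is TF1-style):

    # TF1-style session setup via the compat.v1 shim that TensorFlow 2.x provides.
    import tensorflow as tf

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True  # optional: allocate GPU memory on demand
    session = tf.compat.v1.Session(config=config)

Alternatively, pinning tensorflow to a 1.x release should let the original code run unchanged.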
Commit 298d8c2 renamed save_cuts, but segment_laughter.py still calls it under the old save_cuts name, causing a NameError.
Can you provide a detailed description of the training process? I don't know where to start with the preprocessing script. Could you lay the steps out explicitly, e.g., step 1, step 2, step 3?
Hello @jrgillick,
Thanks for this work. I have a question about using this code for laughter detection on datasets sampled at 16 kHz. Where in the code would we need to make changes in that case?
Also, do you expect any drop in performance with this change?
Thanks,
Soumya
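One generic option, sketched below as an assumption rather than a repo-specific fix, is to resample the audio at load time to whatever rate the model was trained at and leave the rest of the pipeline untouched:

    # Resample 16 kHz input down to the model's expected sample rate at load time.
    import librosa

    target_sr = 8000  # hypothetical: substitute the rate the model's features assume
    y, sr = librosa.load("input_16khz.wav", sr=target_sr)  # librosa resamples on load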
Hi, thanks a lot for the contribution and the repository.
I have a few questions about the audioset annotations (calling that 999-element set the "audioset-laughter" set from here on):
- There are some odd annotations with start = end = 0 (examples on lines 7, 29, 80, 88, 95, 102, and more). Is that a special annotation (e.g., does it mean the whole file contains laughter)? I don't understand what a zero-length laugh segment means.
- Does "window_start" correspond to the start time, within the YouTube video, of the recorded audio snippet?
- "audio_length" and "window_length" seem to be equal at all times. I'm guessing that's the length (in seconds) of the recorded audio snippet described above; is that correct?
I think this script downloads mp3 audio for the YouTube videos listed in a csv. Some csv files can be downloaded with the script, but none of them seems to correspond exactly to the clips in the audioset-laughter annotations (950 of the IDs in "unbalanced_train_segments.csv" and 38 of the IDs in "eval_segments.csv" match audioset-laughter IDs, but even together that is not the full list). Is there a csv that can be fed to the download script to fetch just the audioset-laughter audio files?
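Failing an official list, a minimal sketch of how such a csv could be built; the laughter annotation filename and its ID column are assumptions, while the AudioSet csv layout is the standard published one:

    # Keep only the AudioSet rows whose YouTube IDs appear in the
    # audioset-laughter annotations, then feed the result to the download script.
    import pandas as pd

    laughter = pd.read_csv("audioset_laughter.csv")  # hypothetical annotations file
    segments = pd.read_csv("unbalanced_train_segments.csv", comment="#", header=None,
                           skipinitialspace=True,
                           names=["YTID", "start_seconds", "end_seconds", "positive_labels"])

    wanted = set(laughter["video_id"])  # hypothetical ID column name
    segments[segments["YTID"].isin(wanted)].to_csv("laughter_segments.csv", index=False)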
When running from CMD
At the moment, segment_laughter.py cuts out the detected laughter sequences and prints a list of their time locations in the original file. It would be useful if, in addition, it could produce a time-aligned annotation file in some commonly used format.
In phonetics, which is my field, one widely used annotation tool is Praat, which saves its annotations in TextGrid files. I could provide a pull request adding such functionality.
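For reference, a minimal sketch of what writing such a TextGrid might look like (an illustration, not the proposed pull request; laughs is assumed to be a list of (start, end) tuples in seconds and duration the total file length):

    # Write laughter intervals to a Praat TextGrid (long text format).
    def write_textgrid(path, laughs, duration, tier_name="laughter"):
        # Praat interval tiers must tile the whole time axis, so insert
        # empty intervals around and between the laughter segments.
        intervals, cursor = [], 0.0
        for start, end in sorted(laughs):
            if start > cursor:
                intervals.append((cursor, start, ""))
            intervals.append((start, end, "laugh"))
            cursor = end
        if cursor < duration:
            intervals.append((cursor, duration, ""))

        with open(path, "w") as f:
            f.write('File type = "ooTextFile"\nObject class = "TextGrid"\n\n')
            f.write("xmin = 0\nxmax = %f\ntiers? <exists>\nsize = 1\nitem []:\n" % duration)
            f.write('    item [1]:\n        class = "IntervalTier"\n')
            f.write('        name = "%s"\n        xmin = 0\n        xmax = %f\n' % (tier_name, duration))
            f.write("        intervals: size = %d\n" % len(intervals))
            for i, (xmin, xmax, text) in enumerate(intervals, 1):
                f.write("        intervals [%d]:\n" % i)
                f.write("            xmin = %f\n            xmax = %f\n" % (xmin, xmax))
                f.write('            text = "%s"\n' % text)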
I have a question about the training data: you mention in section 3.4 of the paper that the AudioSet training set does not mark the start and end times of laughter, while the AudioSet test set does mark them, so how did you compute accuracy at test time?