
audio-classification's Introduction

Audio-Classification (Kapre Version)

Pipeline for prototyping audio classification algorithms with TF 2.3

melspectrogram

YouTube

This series has been reworked, and new videos accompany this repository. Following the new series is recommended.

https://www.youtube.com/playlist?list=PLhA3b2k8R3t0SYW_MhWkWS5fWg-BlYqWn

If you want to follow the old videos, check out the earlier commit:

git checkout 404f2a6f989cec3421e8217d71ef070f3593a84d

Environment

conda create -n audio python=3.7
conda activate audio
pip install -r requirements.txt

Jupyter Notebooks

Assuming ipykernel is installed in your conda environment:

ipython kernel install --user --name=audio

conda activate audio

jupyter-notebook

Audio Preprocessing

clean.py can be used to preview the signal envelope at a given threshold, which is used to remove low-magnitude data.

When you uncomment split_wavs, a clean directory is created containing downsampled mono audio, split into segments of length delta time.

python clean.py

signal envelope
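The splitting step described above can be sketched as follows. This is a minimal illustration, not the code in clean.py: `split_by_delta` is a hypothetical helper, and the choice to zero-pad clips shorter than one segment while dropping the leftover tail of longer clips is an assumption.

```python
import numpy as np


def split_by_delta(wav, sr, dt):
    """Split a 1-D signal into consecutive chunks of int(sr * dt) samples.

    Hypothetical helper: a clip shorter than one chunk is zero-padded to full
    length; for longer clips, a leftover tail shorter than a chunk is dropped.
    """
    n = int(sr * dt)
    wav = np.asarray(wav)
    if wav.shape[0] < n:
        padded = np.zeros(n, dtype=wav.dtype)
        padded[: wav.shape[0]] = wav
        return [padded]
    n_chunks = wav.shape[0] // n
    return [wav[i * n:(i + 1) * n] for i in range(n_chunks)]


# Example: 2.5 s at 16 kHz with dt=1.0 yields two 1-second chunks.
chunks = split_by_delta(np.random.randn(40000).astype(np.float32), sr=16000, dt=1.0)
print(len(chunks), chunks[0].shape)  # 2 (16000,)
```

Every segment has exactly int(sr * dt) samples, which is why the sample rate and delta time used for training have to match the values used here.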

Training

Set model_type to one of: conv1d, conv2d, lstm

The sample rate and delta time must match the values used in clean.py.

python train.py

Plot History

Assuming you have run all three models and saved the images into logs, see notebooks/Plot History.ipynb

history

notebooks/Confusion Matrix and ROC.ipynb

Confusion Matrix

conf_mat

Receiver Operating Characteristic

roc

Kapre

Used to compute audio transforms from the time domain to the frequency domain on the fly

https://github.com/keunwoochoi/kapre
https://arxiv.org/pdf/1706.05781.pdf

audio-classification's People

Contributors

dependabot[bot], seth814


audio-classification's Issues

train.py not accepting sample rates other than 16000 | ValueError: could not broadcast input array from shape (1,32000) into shape (1,16000)

After fixing or bypassing issue #14, python train.py fails for sample rates other than 16000:

Traceback (most recent call last):
  File "train.py", line 114, in <module>
    train(args)
  File "train.py", line 96, in train
    callbacks=[csv_logger, cp])
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 235, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 593, in _process_training_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 706, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 952, in __init__
    **kwargs)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 747, in __init__
    peek, x = self._peek_and_restore(x)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 956, in _peek_and_restore
    return x[0], x
  File "train.py", line 46, in __getitem__
    X[i,] = wav.reshape(1, -1)
ValueError: could not broadcast input array from shape (1,32000) into shape (1,16000)
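The traceback shows the batch buffer X being allocated for 16000 samples per clip while each clip has 32000 (for example, 1 s of audio at 32 kHz). A minimal sketch of one way to avoid this, assuming clips are stored in the (batch, 1, samples) layout implied by `wav.reshape(1, -1)` above: derive the buffer width from the sample rate and delta time instead of a fixed 16000. `make_batch` is a hypothetical helper, not the repository's generator.

```python
import numpy as np


def make_batch(wavs, sr, dt):
    """Stack fixed-length clips into a (batch, 1, samples) float32 array.

    The number of samples is derived from sr and dt instead of being
    hard-coded, so the buffer matches clips produced at that rate.
    """
    n_samples = int(sr * dt)
    X = np.empty((len(wavs), 1, n_samples), dtype=np.float32)
    for i, wav in enumerate(wavs):
        if wav.shape[0] != n_samples:
            raise ValueError(
                f"clip {i} has {wav.shape[0]} samples, expected {n_samples} "
                f"(sr={sr}, dt={dt}); re-run the cleaning step with matching settings"
            )
        X[i,] = wav.reshape(1, -1)
    return X
```

Raising an explicit error on a length mismatch makes a sample-rate mismatch between cleaning and training show up with a readable message rather than a broadcast error.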

Power Decibel

Why are you using power_melgram=1.0 instead of 2.0, which is the usual value for a mel spectrogram? Also, what do you think about deltas? Kapre has no module to add deltas, but is it a good idea to stack them?
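A small numerical sketch of both questions, assuming the decibel conversion is 10·log10 applied to whatever the layer outputs (I have not verified Kapre's internals). Since log10(x²) = 2·log10(x), switching from magnitude (power 1.0) to power (2.0) only rescales the dB features by a constant factor of 2, aside from any dynamic-range clipping. For deltas, `np.gradient` along the time axis is used here as a simple swapped-in approximation (librosa's `feature.delta` uses a smoothed Savitzky-Golay estimate instead); the random array is a stand-in, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a magnitude mel spectrogram: (time frames, mel bins), strictly positive.
mag = rng.uniform(0.01, 1.0, size=(100, 128))

# Both converted with 10*log10:
db_from_magnitude = 10 * np.log10(mag)    # like power_melgram = 1.0
db_from_power = 10 * np.log10(mag ** 2)   # like power_melgram = 2.0

# log10(x**2) = 2*log10(x), so the two differ only by a constant factor of 2.
assert np.allclose(db_from_power, 2 * db_from_magnitude)

# First-order deltas along time, stacked as a second channel.
delta = np.gradient(db_from_power, axis=0)
features = np.stack([db_from_power, delta], axis=-1)
print(features.shape)  # (100, 128, 2)
```

Stacking deltas as extra channels keeps the (time, frequency, channels) layout that the 2D models consume; whether it helps accuracy would have to be measured.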

mean() instead of max() ?

def envelope(y, rate, threshold):
    mask = []
    y = pd.Series(y).apply(np.abs)
    y_mean = y.rolling(window=int(rate/20),
                       min_periods=1,
                       center=True).max()

In the envelope function, shouldn't it be mean() instead of max(), as explained in the YouTube video? Or was this changed on purpose? Could you please clarify?
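To see what the choice changes, here is a small sketch comparing a rolling max and a rolling mean on the same signal. The window of rate/20 samples follows the quoted snippet; the `agg` switch and the completed return value are hypothetical additions, not the repository's function.

```python
import numpy as np
import pandas as pd


def envelope(y, rate, threshold, agg="max"):
    """Rolling envelope of |y| and a keep-mask where it exceeds threshold.

    agg is a hypothetical switch between the rolling max (as quoted in the
    issue) and the rolling mean; the window is rate/20 samples (50 ms).
    """
    rolling = pd.Series(y).abs().rolling(window=int(rate / 20), min_periods=1, center=True)
    env = rolling.max() if agg == "max" else rolling.mean()
    return (env > threshold).to_numpy(), env.to_numpy()


# A single loud sample in silence: the max envelope keeps a full window around it,
# while the mean envelope dilutes it by the window length.
rate = 100  # window = 5 samples
y = np.zeros(100)
y[50] = 1.0
mask_max, _ = envelope(y, rate, threshold=0.5, agg="max")
mask_mean, _ = envelope(y, rate, threshold=0.5, agg="mean")
print(mask_max.sum(), mask_mean.sum())  # 5 0
```

With max, any sample within half a window of a loud peak is kept, which preserves short transients; with mean, isolated peaks are averaged down and can be trimmed at the same threshold.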

ImportError: cannot import name 'Melspectrogram'

A few Jupyter notebooks in your notebooks directory contain the following import statement:

from kapre.time_frequency import Melspectrogram, Spectrogram

It gives the error:
ImportError: cannot import name 'Melspectrogram'

tensorflow.python.framework.errors_impl.NotFoundError: logs/conv1d_history.csv; No such file or directory

Good morning. I am trying to run the training script and get the NotFoundError below. I am using my own data and want the model to classify two classes. How can I solve this?

  File "training.py", line 120, in <module>
    train(args)
  File "training.py", line 103, in train
    callbacks=[csv_logger, cp])
  File "/home/marco/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1230, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/marco/.local/lib/python3.6/site-packages/keras/callbacks.py", line 413, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/marco/.local/lib/python3.6/site-packages/keras/callbacks.py", line 2775, in on_epoch_end
    self.writer.writeheader()
  File "/usr/lib/python3.6/csv.py", line 144, in writeheader
    self.writerow(header)
  File "/usr/lib/python3.6/csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/home/marco/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 103, in write
    self._prewrite_check()
  File "/home/marco/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 89, in _prewrite_check
    compat.path_to_bytes(self.__name), compat.as_bytes(self.__mode))
tensorflow.python.framework.errors_impl.NotFoundError: logs/conv1d_history.csv; No such file or directory
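The error is raised when the CSV logger tries to open logs/conv1d_history.csv for writing; "No such file or directory" on a write usually means the parent directory (logs, relative to the working directory) does not exist. A minimal sketch of creating it first; `ensure_log_path` is a hypothetical helper, and the CSVLogger line in the comment only illustrates where the path would be used.

```python
import os
import tempfile


def ensure_log_path(log_dir, model_type):
    """Create log_dir if needed and return the CSV history path inside it."""
    os.makedirs(log_dir, exist_ok=True)
    return os.path.join(log_dir, f"{model_type}_history.csv")


# Usage before building the callback, e.g.
#   csv_path = ensure_log_path("logs", "conv1d")
#   csv_logger = tf.keras.callbacks.CSVLogger(csv_path, append=False)
with tempfile.TemporaryDirectory() as tmp:
    path = ensure_log_path(os.path.join(tmp, "logs"), "conv1d")
    with open(path, "w") as f:  # writing now succeeds
        f.write("epoch,loss\n")
    print(os.path.basename(path))  # conv1d_history.csv
```

Running train.py from the repository root, where a logs folder is expected, should have the same effect.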

Can you update the code for Kapre 0.3?

def Conv1D(N_CLASSES=6, SR=16000, DT=0.5):
    i = layers.Input(shape=(1, int(SR*DT)), name='input')
    x = Melspectrogram(n_dft=512, n_hop=160,
                       padding='same', sr=SR, n_mels=128,
                       fmin=0.0, fmax=SR/2, power_melgram=2.0,
                       return_decibel_melgram=True, trainable_fb=False,
                       trainable_kernel=False,
                       name='melbands')(i)

tqdm evaluation number - What does the '72' stand for?

Not really an issue, but...

I'm running your tutorial. Could you tell me what this number represents: "72" (see picture)?
It comes from train.py run on your data. I couldn't figure out where it comes from or how to modify it.

image

Clarification on TimeDistributed

Hi,

I have a question about the TimeDistributed wrapper. I think I understand how it works, but I'm not sure, and I would really appreciate it if someone could check my understanding. The first 'real' layer of the 1D Conv model:

x = TimeDistributed(layers.Conv1D(8, kernel_size=(4), activation='tanh'), name='td_conv_1d_tanh')(x)

with input shape (100, 128, 1) yields 40 trainable parameters and an output shape of (100, 125, 8).

This makes sense to me as applying a 1D convolution along the 128-bin frequency (mel) axis for each of the 100 time bins, learning filter weights and biases as it goes. I'm guessing the stride of the convolution is 1, which is why the output has length 125. For the LSTM:

s = TimeDistributed(layers.Dense(64, activation='tanh'), name='td_dense_tanh')(x)

with input shape (100, 128) yields 8256 trainable parameters and an output shape of (100, 64).

We have 64 nodes fully connected to the 128 frequencies, so we wind up with 128 * 64 + 64 = 8256 parameters. Then these are trained over each of the 100 time bins. Does that sound right?

Thanks a lot to Seth for the code and videos, they are very helpful.
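The arithmetic in the question can be checked with the standard parameter-count formulas for Keras Conv1D (default 'valid' padding, stride 1) and Dense layers. TimeDistributed applies one shared copy of the wrapped layer to every time step, so the counts do not grow with the 100 time bins.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One kernel_size x in_channels weight slice per filter, plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def dense_params(in_features, units):
    # Full weight matrix plus one bias per unit.
    return in_features * units + units


def conv1d_out_len(length, kernel_size, stride=1):
    # Output length with 'valid' padding (the Keras default).
    return (length - kernel_size) // stride + 1


print(conv1d_params(4, 1, 8), conv1d_out_len(128, 4))  # 40 125
print(dense_params(128, 64))                           # 8256
```

Both numbers match the model summary quoted above, which supports the reading that each layer is applied independently per time bin with shared weights.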

Version Mismatching

I just went through your video and got this error when running pip install:

[screenshot]

I assume this is caused by the TensorFlow version?

BTW, Thanks for the tutorials

Can we detect instruments list from a song?

Hi all,

Is it possible to separate individual instruments from a mixed recording?

input: WAV song

output:
WAV of instrument 1:
WAV of instrument 2:

or just detect instrument tag:

instruments: ["Piano","Guitar","Double Bass"]

Missing Files Related to YouTube Tutorials Part by Part

Hi,
I see that you are updating the repo; I'm grateful for your efforts.
I can't find the docx and instruments.csv files related to the Pt. 2 video.
Could you please arrange the files part by part, as in the YouTube series?
ref:
[screenshot]

Thanks

Error: Unknown Format: 3

Good day again Seth,

I have uploaded my own WAV files, and when I try to clean them with clean.py I get the error "Error: unknown format: 3". Is there a solution that would let me run the cleaning process without this recurring error?

Looking forward to your feedback.
[screenshot]
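"unknown format: N" matches the message Python's standard `wave` module raises when a WAV file's fmt chunk declares anything other than plain PCM (code 1); code 3 is IEEE float and 65534 is WAVE_FORMAT_EXTENSIBLE. A stdlib-only sketch for finding which of your files are affected; `wav_format_code` is a hypothetical helper that reads the code directly from the RIFF header.

```python
import struct

# Common values of the WAVE fmt-chunk "audio format" field.
WAVE_FORMATS = {1: "PCM", 3: "IEEE float", 65534: "WAVE_FORMAT_EXTENSIBLE"}


def wav_format_code(path):
    """Return the audio-format code stored in a WAV file's fmt chunk."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError(f"{path} has no fmt chunk")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                return struct.unpack("<H", f.read(2))[0]
            # Skip other chunks; chunk bodies are padded to an even size.
            f.seek(chunk_size + (chunk_size % 2), 1)


# Usage: for fn in paths: print(fn, WAVE_FORMATS.get(wav_format_code(fn), "other"))
```

Files reporting anything other than PCM can then be re-encoded as 16-bit PCM before cleaning.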

Input signal length=1 is too small to resample from 44100->16000

Hi!
While running the original code, I ran into the issue below.
What could be the solution?
Thanks, and I look forward to your response.
Reloaded modules: clean
  0%|          | 0/300 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\CYY\Desktop\AudioClassification\4.predict.py", line 77, in <module>
    make_prediction(args)
  File "C:\Users\CYY\Desktop\AudioClassification\4.predict.py", line 35, in make_prediction
    rate, wav = downsample_mono(wav_fn, args.sr)
  File "C:\Users\CYY\Desktop\AudioClassification\clean.py", line 36, in downsample_mono
    wav = resample(wav, rate, sr)
  File "C:\Users\CYY\Anaconda3\lib\site-packages\librosa\core\audio.py", line 584, in resample
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=-1)
  File "C:\Users\CYY\Anaconda3\lib\site-packages\resampy\core.py", line 97, in resample
    raise ValueError('Input signal length={} is too small to '
ValueError: Input signal length=1 is too small to resample from 44100->16000

Python and Wav files - error 65534

Hi Seth,

This is a Python issue, but I have a large audio collection that I am processing with your code. For some of my files, I get:
wave.Error: unknown format: 65534

I opened one of the files in Audacity, saved it as 16-bit PCM, and ran your code again. It worked!
I have thousands of files, so modifying each one by hand is not practical. Could anything be done directly in your code?

Thanks

Not able to run predict.py on IRMAS: a dataset for instrument recognition in musical audio signals

I ran the previous version of predict.py (the one used in the video), because the latest version was not printing any output to the terminal. I have two problems.

  1. When I run prediction on IRMAS (a dataset for instrument recognition in musical audio signals), I get an error that I don't understand.
    image
    My folder structure was as follows
    image

  2. When I copied only the Acoustic guitar folder from the wavfiles directory to a new directory, wavfiles2, and then ran predict.py, I got incomplete output:

image

NotADirectoryError when splitting wavfiles with split_wavs(args)

Hi, thanks for the material and the videos.
I am trying to run the clean.py script with my own audio files, but I get this error:

Traceback (most recent call last):
  File "clean.py", line 129, in <module>
    split_wavs(args)
  File "clean.py", line 66, in split_wavs
    for fn in tqdm(os.listdir(src_dir)):
NotADirectoryError: [Errno 20] Not a directory: 'wavfiles/.DS_Store'

I also tried the directory you provided, but after iterating over three subfolders I still get the same error:

100%|███████████████| 30/30 [00:05<00:00,  5.31it/s]
100%|███████████████| 30/30 [00:04<00:00,  6.84it/s]
100%|███████████████| 30/30 [00:06<00:00,  4.75it/s]
Traceback (most recent call last):
  File "clean.py", line 129, in <module>
    split_wavs(args)
  File "clean.py", line 66, in split_wavs
    for fn in tqdm(os.listdir(src_dir)):
NotADirectoryError: [Errno 20] Not a directory: 'wavfiles/.DS_Store'
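The traceback shows os.listdir being called on every entry of wavfiles, including the .DS_Store file that macOS Finder creates, which is not a directory. A minimal sketch of listing only class folders; `class_dirs` is a hypothetical helper that could replace the bare os.listdir call in the loop over class folders.

```python
import os


def class_dirs(src_root):
    """Return sorted subdirectory names, skipping stray files such as .DS_Store."""
    return sorted(
        d for d in os.listdir(src_root)
        if os.path.isdir(os.path.join(src_root, d)) and not d.startswith(".")
    )
```

The inner loop over files can apply the same idea by keeping only names that end in .wav, since .DS_Store files can also appear inside class folders.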

Audio buffer is not Fortran-contiguous

rate, wav = downsample_mono(wav_path[0], args.sr)

wav = resample(wav.astype(np.float32), rate, sr)

raise ParameterError('Audio buffer is not Fortran-contiguous. '
librosa.util.exceptions.ParameterError: Audio buffer is not Fortran-contiguous. Use numpy.asfortranarray to ensure Fortran contiguity.
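A small NumPy illustration of why this can happen, assuming the file is stereo: scipy's wavfile.read returns multichannel audio as a C-ordered (samples, channels) array, and astype keeps that order, so the result is not Fortran-contiguous. Requesting Fortran order, or downmixing to a 1-D mono signal before resampling, gives an array that passes the check. Downmixing first is usually preferable, since librosa treats the last axis as time.

```python
import numpy as np

wav = np.zeros((16000, 2), dtype=np.int16)  # stereo, (samples, channels), C order

a = wav.astype(np.float32)               # keeps C order: not Fortran-contiguous
b = wav.astype(np.float32, order="F")    # explicitly Fortran-ordered
c = np.ascontiguousarray(wav.mean(axis=1), dtype=np.float32)  # 1-D mono

print(a.flags["F_CONTIGUOUS"], b.flags["F_CONTIGUOUS"], c.flags["F_CONTIGUOUS"])  # False True True
```

A 1-D contiguous array is both C- and Fortran-contiguous, which is why downmixing first sidesteps the error.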

Leave one out

I tried to implement leave-one-out cross-validation for your code but have not succeeded yet. I would appreciate any suggestions.
My main reason for using leave-one-out cross-validation is that your code works with a dependent dataset; how would it handle an independent one?
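One reading of "dependent dataset" is that segments cut from the same recording can land in both the training and validation sets. A sketch of leave-one-recording-out cross-validation with scikit-learn's LeaveOneGroupOut, used here as a swapped-in technique: the grouping key and the "<recording>_<chunk index>.wav" naming scheme are assumptions, and `recording_id` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut


def recording_id(clip_path):
    # Hypothetical naming scheme: "<class>/<recording>_<chunk index>.wav".
    return clip_path.rsplit("_", 1)[0]


clips = ["cello/a_0.wav", "cello/a_1.wav", "cello/b_0.wav",
         "flute/c_0.wav", "flute/c_1.wav", "flute/d_0.wav"]
labels = np.array([p.split("/")[0] for p in clips])
groups = np.array([recording_id(p) for p in clips])

for train_idx, test_idx in LeaveOneGroupOut().split(clips, labels, groups):
    held_out = set(groups[test_idx])
    # No recording contributes segments to both sides of the split.
    assert held_out.isdisjoint(groups[train_idx])
    print(sorted(held_out), len(train_idx), len(test_idx))
```

With many recordings, GroupKFold gives the same no-leakage guarantee with far fewer training runs.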

clean.py being picky about WAV bit depth | ValueError: Unsupported bit depth: the wav file has 24-bit data

I was attempting to use my sample and multitrack collections to train a model geared more towards electronic music, following your YouTube posts from February. I made sure all my files were in WAV format (as suggested in the video), but scipy seems unable to process "high" bit depths.
What would be the best way to convert the custom WAV files in my "wavfiles" folder to the appropriate bit depth (and/or sample rate)?

Output of python clean.py:

  0%|                                                                                       | 0/4 [00:00<?, ?it/s]
clean.py:27: WavFileWarning: Chunk (non-data) not understood, skipping it.
  rate, wav = wavfile.read(path)
  0%|                                                                                       | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "clean.py", line 128, in <module>
    split_wavs(args)
  File "clean.py", line 68, in split_wavs
    rate, wav = downsample_mono(src_fn, args.sr)
  File "clean.py", line 27, in downsample_mono
    rate, wav = wavfile.read(path)
  File "/home/myusername/anaconda3/lib/python3.7/site-packages/scipy/io/wavfile.py", line 298, in read
    "has {}-bit data.".format(bit_depth))
ValueError: Unsupported bit depth: the wav file has 24-bit data.

downsample_mono method results in ValueError and y variable from Pandas Series results in Exception

When I run the code as is, I first get an error that seems related to downsampling the data to 16000 Hz:

"ValueError: Input signal length=1 is too small to resample from 44100->16000"

After commenting out line 36, I then get an error on line 15:

Exception: Data must be 1-dimensional

I fixed this by reshaping the y variable using this code:
y = pd.Series(np.reshape(y, (len(y),))).apply(np.abs)

I know someone else got the repo working without any changes on a Mac. Could the inconsistency be because I'm on Windows?

index_error

FutureWarning: norm=1 behavior will change in librosa 0.8.0. To maintain forward compatibility, use norm='slaney' instead

labels = [self.labels[k] for k in indexes]
IndexError: index 7 is out of bounds for axis 0 with size 3

Deploying the Model

I was wondering how I can deploy this model on Azure to see it in action.

Using from google colab notebook

Good day Seth,

I have been going over your work, and when I import the libraries, the code throws an import error: "cannot import name 'downsample_mono'". I am not sure which version of the clean module you imported; could you kindly help me solve this error?
I am using Google Colab so that I can use its GPU.

Thank you for sharing this work; we are really grateful for it. Thank you for your time.
[screenshot]

train.py: IndexError: index 9 is out of bounds for axis 1 with size 9

Hi Seth

I'm working on an amended clean.py - one that will support other audio file types.

I'm testing it with your wavfiles folder as a benchmark, hoping to reproduce the score I got before any changes. I've noticed it creates a few more files (due to differences in the trimming threshold, I think); however, they are all still one second long and the file structure is identical (i.e., 10 classes named exactly as yours are).

running train.py returns the following:

IndexError: index 9 is out of bounds for axis 1 with size 9

I think the problem is with wav_train, wav_val, label_train, label_val = train_test_split(wav_paths, labels, test_size=0.1, random_state=0). To look into it I added:

print(len(set(label_train)))
10
print(len(set(label_val)))
9

if I set test_size=0.5 then I get:

print(len(set(label_train)))
10
print(len(set(label_val)))
10

Training then runs without error but the accuracy is about 10%

I'm really not sure what's going on here. I've listened to the audio files and looked at the spectrograms of the files created by my clean.py, and they seem very normal. They can be found here; I'd be very grateful if you could try to reproduce this.
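The prints show one class missing from the validation split at test_size=0.1. One plausible cause of the IndexError is that the number of classes (and so the one-hot width) ends up depending on which labels a split contains, rather than on the full label set. A sketch under that assumption, using scikit-learn's `stratify` option and a LabelEncoder fitted on all labels; the toy paths stand in for real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy stand-in: 10 classes x 10 clips each.
wav_paths = [f"class{c}/clip{i}.wav" for c in range(10) for i in range(10)]
labels = [p.split("/")[0] for p in wav_paths]

# Fit the encoder on *all* labels so the one-hot width never depends on a split.
le = LabelEncoder().fit(labels)
n_classes = len(le.classes_)
y = le.transform(labels)

# stratify keeps every class in both splits (the validation set needs
# at least one slot per class).
wav_train, wav_val, label_train, label_val = train_test_split(
    wav_paths, y, test_size=0.1, random_state=0, stratify=y)

onehot_val = np.eye(n_classes, dtype=np.float32)[label_val]
print(len(set(label_train)), len(set(label_val)), onehot_val.shape)  # 10 10 (10, 10)
```

This addresses the IndexError only; it does not explain the roughly 10% accuracy, which is chance level for 10 classes.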

Overlapping window ??

If I am not mistaken, you use a window frame of 1/10 of a second. What overlap between windows do you use? None of your tutorials mention the window overlap.
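Overlap follows directly from the frame length and hop. This sketch assumes the STFT settings in the Kapre snippet quoted on this page (n_dft=512, n_hop=160 at SR=16000), and the rolling window of rate/20 samples in the quoted envelope function, which advances one sample at a time. The 1/10 s window the question mentions may come from a different version of the code. `frame_overlap` is a hypothetical helper.

```python
def frame_overlap(frame_len, hop, sr):
    """Return (frame ms, hop ms, overlap in samples, overlap as % of the frame)."""
    overlap = frame_len - hop
    return 1000 * frame_len / sr, 1000 * hop / sr, overlap, 100 * overlap / frame_len


print(frame_overlap(512, 160, 16000))         # (32.0, 10.0, 352, 68.75)  STFT frames
print(frame_overlap(16000 // 20, 1, 16000))   # (50.0, 0.0625, 799, 99.875)  rolling envelope
```

The STFT frames overlap by about 69%, while the rolling envelope window overlaps its neighbor by all but one sample.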

Model Predict Error

Hi Seth,

wonderful and really useful project and videos! I love it.

I am trying to predict on the original files, as you do at the end of the video; however, when I do so, I get the following error:
"local variable 'batch_outputs' referenced before assignment"

After some quick research, I found this happens because the batch is actually an empty array. The error occurs here:
X_batch = np.array(batch, dtype=np.float32)
y_pred = model.predict(X_batch)

X_batch is built from the empty batch list, so the call to model.predict throws the error.

I tried with my original data, which trained fine, and also with a single file, and got the same issue.

Any ideas or is this on my end?

Thank you!
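Keras raises this error when predict receives zero samples. In a windowing loop like the one quoted, the batch is empty when the cleaned signal has no samples, for example when the envelope threshold removes everything. A sketch with an explicit guard: `predict_clip` is a hypothetical helper, `predict_fn` stands in for model.predict, and the (samples, 1) channels-last window shape is an assumption.

```python
import numpy as np


def predict_clip(predict_fn, clean_wav, sr, dt):
    """Window a cleaned signal into dt-second pieces and average predictions.

    Returns None when the cleaned signal is empty instead of calling
    predict_fn on an empty batch.
    """
    step = int(sr * dt)
    batch = []
    for start in range(0, clean_wav.shape[0], step):
        sample = np.zeros((step, 1), dtype=np.float32)  # zero-pad the last piece
        piece = clean_wav[start:start + step]
        sample[: piece.shape[0], 0] = piece
        batch.append(sample)
    if not batch:
        return None  # nothing survived cleaning: lower the threshold or skip the file
    X_batch = np.array(batch, dtype=np.float32)
    return predict_fn(X_batch).mean(axis=0)
```

If every file returns None, the threshold is probably too high for your recordings' level, which would also explain why a single file fails the same way.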

Clean.py memory usage

Hi Seth,

In clean.py, memory (RAM) usage keeps growing, so with a large dataset containing many classes the computer eventually runs out of memory. I get this message from time to time:

MemoryError: Unable to allocate 1.22 MiB for an array with shape (160399,) and data type int64

This can be worked around by increasing the pagefile.sys size, upgrading the computer, or running clean.py on one subset of the data at a time, but I was wondering whether it would be better to free memory (garbage-collect) between classes. In your YouTube video, that would mean freeing memory between Cello and Clarinet, for example.

Thanks
