seth814 / audio-classification

Code for YouTube series: Deep Learning for Audio Classification

License: MIT License

Python 0.90% Jupyter Notebook 99.10%

kapre youtube audio-classification tensorflow2 keras

audio-classification's Introduction

Audio-Classification (Kapre Version)

Pipeline for prototyping audio classification algorithms with TF 2.3

YouTube
Environment
Jupyter Notebooks
Audio Preprocessing
Training
Plot History
Confusion Matrix
Receiver Operating Characteristic
Kapre

YouTube

This series has been re-worked. There are new videos to support this repository. It is recommended to follow the new series.

https://www.youtube.com/playlist?list=PLhA3b2k8R3t0SYW_MhWkWS5fWg-BlYqWn

If you want to follow the old videos, check out the earlier commit:

git checkout 404f2a6f989cec3421e8217d71ef070f3593a84d

Environment

conda create -n audio python=3.7
conda activate audio
pip install -r requirements.txt

Jupyter Notebooks

Assuming you have ipykernel installed from your conda environment

ipython kernel install --user --name=audio

conda activate audio

jupyter-notebook

Audio Preprocessing

clean.py can be used to preview the signal envelope at a given threshold, which is used to remove low-magnitude data.

When you uncomment split_wavs, a clean directory is created containing downsampled mono audio split into segments of delta time.

python clean.py

Training

Change model_type to: conv1d, conv2d, lstm

Sample rate and delta time should match the values used in clean.py

python train.py

Plot History

Assuming you have run all 3 models and saved the images into logs, check notebooks/Plot History.ipynb

notebooks/Confusion Matrix and ROC.ipynb

Confusion Matrix

Receiver Operating Characteristic

Kapre

For computation of audio transforms from time to frequency domain on the fly

https://github.com/keunwoochoi/kapre
https://arxiv.org/pdf/1706.05781.pdf

audio-classification's People

Contributors

Stargazers

Watchers

Forkers

mrmthornton akhilesh97 covarj bharat0to amka96 teora ambika55 luis-ramirez-r rmangino jiangbin713 cyhe50 manjitha-teshara kriyeng frizzid07 m-kaminska hakanaku1234 fastaro shubhamgoel90 robrown97 atlantis13 byxhy lbalido flyzero1114 beardedbioelectronics mikful tiravata nindidooo tripleorange swirkes briandannenmueller abdullahalnutayfat lin18846164924 exciteddeimos ufolei rahulkumar1112 themidwestcanapps wuweitao needs-searcher thejawker msgreat srosen3 wirelesswizard gotbutchi rohanbanerjee thegodparticle nickbetke umair13adil iamsaransh gpdsec maoxin7676 cyyeh martindisley pragunmangla3 airhorizons markusbuchholz yoojinhwang ridwan689 melinghu mxe191 swissbeats93 shiva2410 kevinnazhar krakenkrak ashishpatel26 charithcherry mryelameli lixianyi jwatq aamitabhforks nkgevorgyan atrayeeneog godisloveforme bigsoftcms ugemassolo oriankeith001 mulkiah musaho vanova allanpichardo davidwdw maybeee18 woogonchung jay-chan farzee155 sabrinat012899 u7karshs simely-simnz flitx murattkilinc shivam-131 amrintdv 4rthurmonteiro reekithak christianfares1 andrewhpump srinivasgutta7 srovolis yychen-code chionh ch2ohch2oh

audio-classification's Issues

train.py not accepting sample rates other than 16000 | ValueError: could not broadcast input array from shape (1,32000) into shape (1,16000)

After fixing/bypassing issue #14, python train.py seems to dislike sample rates other than 16000:

Traceback (most recent call last):
  File "train.py", line 114, in <module>
    train(args)
  File "train.py", line 96, in train
    callbacks=[csv_logger, cp])
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 235, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 593, in _process_training_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 706, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 952, in __init__
    **kwargs)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 747, in __init__
    peek, x = self._peek_and_restore(x)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 956, in _peek_and_restore
    return x[0], x
  File "train.py", line 46, in __getitem__
    X[i,] = wav.reshape(1, -1)
ValueError: could not broadcast input array from shape (1,32000) into shape (1,16000)
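
The traceback shows a clip of 32000 samples being written into a buffer sized for 16000. A minimal sketch of the idea, assuming the batch buffer is sized as int(sr * dt) and that each cleaned clip has exactly that many samples; make_batch is a hypothetical helper, not a function from train.py:

```python
import numpy as np

def make_batch(wavs, sr, dt):
    """Stack 1-D clips into a (batch, 1, sr*dt) array, checking lengths first."""
    n_samples = int(sr * dt)
    X = np.empty((len(wavs), 1, n_samples), dtype=np.float32)
    for i, wav in enumerate(wavs):
        if wav.shape[0] != n_samples:
            # Fail with a readable message instead of a broadcast error.
            raise ValueError(
                f"clip {i} has {wav.shape[0]} samples, "
                f"expected {n_samples} (sr={sr}, dt={dt})"
            )
        X[i,] = wav.reshape(1, -1)
    return X

# Clips cleaned at 32 kHz with a 1 s delta time need sr=32000, dt=1.0 here too.
clips = [np.zeros(32000, dtype=np.int16) for _ in range(4)]
batch = make_batch(clips, sr=32000, dt=1.0)
```

If the sample rate and delta time passed at training time differ from those used when the clips were cleaned, the length check fails before any array assignment is attempted.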

Power Decibel

Why are you using power_melgram=1.0 instead of 2.0, which is the usual setting for a mel spectrogram? And what do you think about deltas? Kapre has no module for adding deltas, but is it a good idea to stack them?
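
As a rough illustration of both questions: once a log is taken, an amplitude spectrogram and a power spectrogram differ only by a factor of 2 in decibels (ignoring any clipping to a dynamic range), and deltas can be approximated outside Kapre with a finite difference along the time axis. This is a NumPy sketch on toy data, not Kapre's implementation; the array shapes are assumptions.

```python
import numpy as np

# Amplitude (power 1.0) versus power (power 2.0) on a decibel scale.
amplitude = np.array([0.5, 1.0, 2.0, 10.0])
power = amplitude ** 2
db_amplitude = 10.0 * np.log10(amplitude)
db_power = 10.0 * np.log10(power)   # equals 2 * db_amplitude

# Toy log-mel spectrogram with shape (time frames, mel bins).
rng = np.random.default_rng(0)
mel_db = rng.normal(size=(100, 128)).astype(np.float32)

# Simple delta: finite difference along the time axis.
delta = np.gradient(mel_db, axis=0)

# Stack the original and its delta as two channels: (time, mel, 2).
stacked = np.stack([mel_db, delta], axis=-1)
```

Whether stacked deltas help is an empirical question for a given dataset; the sketch only shows that they are cheap to compute and to feed in as an extra channel.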

mean() instead of max() ?

def envelope(y, rate, threshold):
    mask = []
    y = pd.Series(y).apply(np.abs)
    y_mean = y.rolling(window=int(rate/20),
                       min_periods=1,
                       center=True).max()

In the envelope function, shouldn't it be mean() instead of max(), as explained in the YouTube video?
Or was this changed deliberately? Can you please clarify?
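
To make the difference concrete, here is a sketch of an envelope function that can use either reducer. The reducer argument and the return values are assumptions added for illustration; only the rolling-window call mirrors the snippet quoted above.

```python
import numpy as np
import pandas as pd

def envelope(y, rate, threshold, reducer="max"):
    """Return (mask, envelope) where mask marks samples above the threshold."""
    y = pd.Series(y).abs()
    window = y.rolling(window=int(rate / 20), min_periods=1, center=True)
    env = window.max() if reducer == "max" else window.mean()
    mask = (env > threshold).to_numpy()
    return mask, env.to_numpy()

# A steady 440 Hz tone with peak amplitude 100, sampled at 16 kHz.
rate = 16000
t = np.arange(rate) / rate
tone = 100.0 * np.sin(2 * np.pi * 440.0 * t)

mask_max, _ = envelope(tone, rate, threshold=80, reducer="max")
mask_mean, _ = envelope(tone, rate, threshold=80, reducer="mean")
```

For this tone the rolling maximum sits near the peak (about 100) while the rolling mean of the absolute value sits near 64, so a threshold of 80 keeps every sample with max() and drops every sample with mean(). The same threshold therefore means different things under the two reducers.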

ImportError: cannot import name 'Melspectrogram'

In your notebooks directory you have a few Jupyter notebooks with the following import statement:

from kapre.time_frequency import Melspectrogram, Spectrogram

It gives this error:
ImportError: cannot import name 'Melspectrogram'

Could the author please say which version of Kapre we should install?
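
One likely explanation, stated here as an assumption rather than something confirmed in this thread, is that the Melspectrogram class belongs to the older Kapre API and that releases from 0.3 onward expose composed layers such as kapre.composed.get_melspectrogram_layer instead. The helper below is hypothetical and only maps a version string to the import that would be expected under that assumption.

```python
def melspectrogram_import_hint(kapre_version):
    """Suggest an import line for a given Kapre version string (assumed cutoff: 0.3)."""
    major, minor = (int(part) for part in kapre_version.split(".")[:2])
    if (major, minor) < (0, 3):
        # Older API: layer classes live in kapre.time_frequency.
        return "from kapre.time_frequency import Melspectrogram"
    # Newer API: build the layer through the composed helpers.
    return "from kapre.composed import get_melspectrogram_layer"

print(melspectrogram_import_hint("0.1.7"))
print(melspectrogram_import_hint("0.3.5"))
```

Pinning Kapre to the version listed in requirements.txt, if one is listed there, is the simplest way to keep older notebooks running.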

tensorflow.python.framework.errors_impl.NotFoundError: logs/conv1d_history.csv; No such file or directory

Good morning. I am trying to run the train script and I get this NotFoundError. How can I correct it? I am using my own data and want the model to classify 2 classes.

File "training.py", line 120, in
train(args)
File "training.py", line 103, in train
callbacks=[csv_logger, cp])
File "/home/marco/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1230, in fit
callbacks.on_epoch_end(epoch, epoch_logs)
File "/home/marco/.local/lib/python3.6/site-packages/keras/callbacks.py", line 413, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/home/marco/.local/lib/python3.6/site-packages/keras/callbacks.py", line 2775, in on_epoch_end
self.writer.writeheader()
File "/usr/lib/python3.6/csv.py", line 144, in writeheader
self.writerow(header)
File "/usr/lib/python3.6/csv.py", line 155, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/home/marco/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 103, in write
self._prewrite_check()
File "/home/marco/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 89, in _prewrite_check
compat.path_to_bytes(self.__name), compat.as_bytes(self.__mode))
tensorflow.python.framework.errors_impl.NotFoundError: logs/conv1d_history.csv; No such file or directory
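
The error says the logger could not open logs/conv1d_history.csv, which is what happens when the logs directory does not exist. A minimal sketch that creates the parent directory before training starts; ensure_parent_dir is a hypothetical helper, not part of the repository:

```python
import os

def ensure_parent_dir(file_path):
    """Create the directory that will contain file_path, if it is missing."""
    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return parent

csv_path = os.path.join("logs", "conv1d_history.csv")
# Call ensure_parent_dir(csv_path) once before constructing the CSV logger.
```

Creating an empty logs folder by hand next to the training script has the same effect.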

Can you update the code with kapre3.0?

def Conv1D(N_CLASSES=6, SR=16000, DT=0.5):
    i = layers.Input(shape=(1, int(SR*DT)), name='input')
    x = Melspectrogram(n_dft=512, n_hop=160,
                       padding='same', sr=SR, n_mels=128,
                       fmin=0.0, fmax=SR/2, power_melgram=2.0,
                       return_decibel_melgram=True, trainable_fb=False,
                       trainable_kernel=False,
                       name='melbands')(i)

tqdm evaluation number - What does the '72' stand for?

Not really an issue, but...

I'm running your tutorial. Could you tell me what this number represents: "72" (see picture)?
It's from the train.py using your data. I couldn't figure out where it comes from and how to modify it.

Clarification on TimeDistributed

Hi,

I have a question about the TimeDistributed wrapper - I think I understand how it works, but I'm not sure. I would really appreciate if someone could check my understanding. The first 'real' layer of the 1D Conv model:

x = TimeDistributed(layers.Conv1D(8, kernel_size=(4), activation='tanh'), name='td_conv_1d_tanh')(x)

with input 100, 128, 1 yielding 40 trainable parameters and an output of 100, 125, 8.

This makes sense to me as applying a 1D conv along the 128-shape frequency/melbin axis, for each of the 100 time bins, and learning filters weights/biases as it goes. I'm guessing the step size for the convolution is 1, which is why we get 125 on the output. For the LSTM:

s = TimeDistributed(layers.Dense(64, activation='tanh'), name='td_dense_tanh')(x)

with input 100, 128 yielding 8256 trainable parameters and output 100, 64.

We have 64 nodes fully connected to the 128 frequencies, so we wind up with 128 * 64 + 64 = 8256 parameters. Then these are trained over each of the 100 time bins. Does that sound right?

Thanks a lot to Seth for the code and videos, they are very helpful.
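
The parameter counts and output length quoted in this question can be checked with plain arithmetic. The helper names below are made up for illustration; the formulas are the standard ones for a 1-D convolution without padding and for a dense layer with a bias.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

def conv1d_valid_output_length(input_length, kernel_size, stride=1):
    """Output length of an unpadded ('valid') convolution."""
    return (input_length - kernel_size) // stride + 1

def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return in_features * units + units

print(conv1d_params(kernel_size=4, in_channels=1, filters=8))   # 40
print(conv1d_valid_output_length(input_length=128, kernel_size=4))  # 125
print(dense_params(in_features=128, units=64))                  # 8256
```

The counts do not depend on the 100 time bins, which is consistent with TimeDistributed applying one shared set of weights to every time step rather than learning a separate set per step.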

Jupyter: running !python clean.py only displays "audio file not found for sub string: 3a3d0279" #84

Version Mismatching

Just went through your video and got this error when pip installing

I assume this is caused by Tensorflow version?

BTW, Thanks for the tutorials

Can we detect instruments list from a song?

Hi all,

Is splitting the instruments out of a mixed recording supported?

input: WAV song

output:
WAV of instrument 1:
WAV of instrument 2:

or just detect instrument tag:

instruments: ["Piano","Guitar","Double Bass"]

Missing Files Related to YouTube Tutorials Part by Part

Hi,
I see that you are updating the repo; I'm grateful for your efforts.
I'm not able to find the docx and instruments.csv files related to the pt. 2 video.
Could you please arrange the files part by part, as in the YouTube series?
ref:

Thanks

Error: Unknown Format: 3

Good day again Seth,

I have uploaded my own data wavfiles and when I try to clean the wave files using the clean.py code I get this error; “Error: unknown format: 3.” Is there a solution which you could provide so I can execute the cleaning process without this recurring error?

Looking forward to your feedback.

Jupyter run !python clean.py is None

Errors appear on the pip install -r requirements command

Hello,
I am trying your code, but when I run the "pip install -r requirements" command it shows an error saying the object-detection package was not found.

Input signal length=1 is too small to resample from 44100->16000

Hi!
While running the original code I hit the issue below.
What could be a possible solution?
Thanks, and looking forward to your response.
Reloaded modules: clean
0%| | 0/300 [00:00<?, ?it/s]
Traceback (most recent call last):

File "C:\Users\CYY\Desktop\AudioClassification\4.predict.py", line 77, in
make_prediction(args)

File "C:\Users\CYY\Desktop\AudioClassification\4.predict.py", line 35, in make_prediction
rate, wav = downsample_mono(wav_fn, args.sr)

File "C:\Users\CYY\Desktop\AudioClassification\clean.py", line 36, in downsample_mono
wav = resample(wav, rate, sr)

File "C:\Users\CYY\Anaconda3\lib\site-packages\librosa\core\audio.py", line 584, in resample
y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=-1)

File "C:\Users\CYY\Anaconda3\lib\site-packages\resampy\core.py", line 97, in resample
raise ValueError('Input signal length={} is too small to '

ValueError: Input signal length=1 is too small to resample from 44100->16000

Python and Wav files - error 65534

Hi Seth,

This is a Python issue, but... I have a large audio collection that I am processing using your code. For some of my files, I get this:
wave.Error: unknown format: 65534

I opened one of the files in Audacity, saved it as 16-bit PCM and ran your code again. It worked!
I have thousands of files, so it's not practical to modify each of them by hand. Is there anything that could be done directly from your code?

Thanks
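
The number in "unknown format" is the format tag stored in the WAV header: 1 is integer PCM, 3 is IEEE float, and 65534 (0xFFFE) is WAVE_FORMAT_EXTENSIBLE. Older versions of Python's wave module accept only tag 1. The sketch below reads the tag with the standard library so a large collection can be scanned for files that need conversion; wav_format_tag is a hypothetical helper, not part of clean.py.

```python
import struct

WAVE_FORMATS = {1: "PCM", 3: "IEEE float", 0xFFFE: "WAVE_FORMAT_EXTENSIBLE"}

def wav_format_tag(path):
    """Return the numeric format tag from the 'fmt ' chunk of a WAV file."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                (tag,) = struct.unpack("<H", f.read(2))
                return tag
            # Skip other chunks; chunk bodies are padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Files whose tag is not 1 can then be batch-converted to 16-bit PCM with a dedicated audio tool before running the cleaning step.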

TypeError: Value passed to parameter 'input' has DataType int16 not in list of allowed values: float16, bfloat16, float32, float64

I got an error on this line:
y_pred = model.predict(X_batch)
Please check and help me.
All files are available here:
https://drive.google.com/drive/folders/1hRpOLXArm0gP9QPB9RuYgeWNFQerkbuP?usp=sharing
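
The message says the model received int16 data where a floating-point type is required. A minimal sketch of the conversion, assuming the batch is a list of equally sized int16 arrays; to_float32_batch is a hypothetical helper and no rescaling is applied, so the values keep the range the model was trained on.

```python
import numpy as np

def to_float32_batch(batch):
    """Convert a list of equally shaped integer clips to one float32 array."""
    return np.asarray(batch, dtype=np.float32)

# Two toy int16 clips shaped (1, samples), as a stand-in for real audio.
batch = [np.array([[0, 100, -100, 32767]], dtype=np.int16) for _ in range(2)]
X_batch = to_float32_batch(batch)
```

The resulting X_batch can be passed to model.predict without triggering the dtype check.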

Not able to run predict.py on IRMAS: a dataset for instrument recognition in musical audio signals

I ran the previous version of predict.py that you are using in the video as the latest version was not showing output on the terminal. I have two problems.

When I run the prediction on the IRMAS: a dataset for instrument recognition in musical audio signals, I get an error that I am not able to understand.

My folder structure was as follows
and when I copied the only Acoustic guitar folder from wavfiles directory to a new directory wavfiles2 and then ran the predict.py I got an incomplete output

NotADirectoryError when splitting wavfiles with split_wavs(args)

Hi, thanks for the material and the videos.
I am trying to run the clean.py script with my own audio files, but I get this error:

Traceback (most recent call last):
  File "clean.py", line 129, in <module>
    split_wavs(args)
  File "clean.py", line 66, in split_wavs
    for fn in tqdm(os.listdir(src_dir)):
NotADirectoryError: [Errno 20] Not a directory: 'wavfiles/.DS_Store'

I tried with the directory you provided, but after it iterates for 3 sub-folders I still get the same error:

100%|███████████████| 30/30 [00:05<00:00,  5.31it/s]
100%|███████████████| 30/30 [00:04<00:00,  6.84it/s]
100%|███████████████| 30/30 [00:06<00:00,  4.75it/s]
Traceback (most recent call last):
  File "clean.py", line 129, in <module>
    split_wavs(args)
  File "clean.py", line 66, in split_wavs
    for fn in tqdm(os.listdir(src_dir)):
NotADirectoryError: [Errno 20] Not a directory: 'wavfiles/.DS_Store'
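
The failure comes from treating every entry of wavfiles as a class folder, including the .DS_Store file that macOS creates. A sketch of a directory listing that keeps only real sub-folders; class_dirs is a hypothetical helper, not a function from clean.py.

```python
import os

def class_dirs(src_root):
    """Return the names of the sub-directories of src_root, sorted."""
    return sorted(
        name for name in os.listdir(src_root)
        if os.path.isdir(os.path.join(src_root, name))
    )
```

Iterating over class_dirs(src_root) instead of os.listdir(src_root) skips stray files such as .DS_Store or Thumbs.db without having to delete them.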

Audio buffer is not Fortran-contiguous

rate, wav = downsample_mono(wav_path[0], args.sr)

wav = resample(wav.astype(np.float32), rate, sr)

raise ParameterError('Audio buffer is not Fortran-contiguous. '
librosa.util.exceptions.ParameterError: Audio buffer is not Fortran-contiguous. Use numpy.asfortranarray to ensure Fortran contiguity.
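
This error typically appears when the array handed to the resampler is a strided view, for example one channel sliced out of a stereo array. The NumPy sketch below reproduces the situation and applies the fix the message suggests; the array shape is an assumption for illustration.

```python
import numpy as np

# Stereo audio laid out as (samples, channels), the default C order.
stereo = np.zeros((1000, 2), dtype=np.float32)

# Taking one channel gives a view that skips every other value in memory.
left = stereo[:, 0]
print(left.flags["F_CONTIGUOUS"])   # False

# Copy into a contiguous buffer before passing it on.
fixed = np.asfortranarray(left)
print(fixed.flags["F_CONTIGUOUS"])  # True
```

For a 1-D array, np.ascontiguousarray gives the same result, since C and Fortran order coincide in one dimension.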

Leave one out

I tried to implement leave-one-out cross-validation for your code but have not had success yet. I would appreciate any suggestions.
The main reason for using leave-one-out cross-validation is that your code works with a dependent dataset, but what about an independent dataset?

clean.py being picky about WAV bit depth | ValueError: Unsupported bit depth: the wav file has 24-bit data

I was attempting to use my sample and multitrack collections in order to train a model geared more towards electronic music by following your YouTube posts from February. I made sure that all my files were in WAV format (as suggested in the video) but scipy seems to be unable to process "high" bit depths.
What would be the best way to convert the custom WAV files in my "wavfiles" folder to the appropriate bit depth (and/or sample rate)?

Output of python clean.py:

  0%|                                                                                       | 0/4 [00:00<?, ?it/s]
clean.py:27: WavFileWarning: Chunk (non-data) not understood, skipping it.
  rate, wav = wavfile.read(path)
  0%|                                                                                       | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "clean.py", line 128, in <module>
    split_wavs(args)
  File "clean.py", line 68, in split_wavs
    rate, wav = downsample_mono(src_fn, args.sr)
  File "clean.py", line 27, in downsample_mono
    rate, wav = wavfile.read(path)
  File "/home/myusername/anaconda3/lib/python3.7/site-packages/scipy/io/wavfile.py", line 298, in read
    "has {}-bit data.".format(bit_depth))
ValueError: Unsupported bit depth: the wav file has 24-bit data.
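
One way around the limitation, sketched here under the assumption that the files are plain 24-bit integer PCM, is to decode the 3-byte samples with the standard wave module and NumPy and keep the top 16 bits. read_pcm24_as_int16 is a hypothetical helper; it does not resample and applies no dithering.

```python
import wave
import numpy as np

def read_pcm24_as_int16(path):
    """Read a 24-bit PCM WAV file and return (rate, int16 array of shape (frames, channels))."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 3:
            raise ValueError("expected 24-bit PCM data")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    # Each sample is 3 little-endian bytes; place them in the upper
    # 3 bytes of a 4-byte integer so the sign bit lands correctly.
    raw = np.frombuffer(frames, dtype=np.uint8).reshape(-1, 3)
    padded = np.zeros((raw.shape[0], 4), dtype=np.uint8)
    padded[:, 1:] = raw
    int32 = padded.view("<i4").reshape(-1)
    # Drop the low 16 bits (8 of padding, 8 of the original sample).
    int16 = (int32 >> 16).astype(np.int16)
    return rate, int16.reshape(-1, channels)
```

The returned array has the same layout as the output of scipy.io.wavfile.read for a 16-bit file, so it could be written back out as 16-bit PCM or passed on directly.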

IndexError: index 63 is out of bounds for axis 0 with size 57

Hello Seth, as you said, I need more files for this kind of network, so now I have more than 30 files in one class, and I have two classes, but it gave me this error:

IndexError: index 63 is out of bounds for axis 0 with size 57
Please help me. You can find all the project files here:
https://drive.google.com/drive/folders/1DretGkD66hQ0zV2Fwo8YzWwh0woNFS8X?usp=sharing

What is the source of our dataset?

I am just curious about the origin of the dataset. Was it collected from YouTube videos or from somewhere else?

downsample_mono method results in ValueError and y variable from Pandas Series results in Exception

When I run the code as is, I first get an error that seems like it has something to do with the data not liking being downsampled to 16000:

"ValueError: Input signal length=1 is too small to resample from 44100->16000"

After commenting out line 36, I then get an error in line 15:

Exception: Data must be 1-dimensional

I fixed this by reshaping the y variable using this code:
y = pd.Series(np.reshape(y, (len(y),))).apply(np.abs)

I know someone else got their version of the repo to work without making any changes and that was on a Mac. Perhaps since I'm on Windows, there is some inconsistency?
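
Both errors are consistent with the audio array arriving as two-dimensional, for example shaped (samples, 1), although that cause is an assumption and not confirmed in this thread. A sketch that collapses any input to a 1-D mono signal before resampling or building a pandas Series; to_mono_1d is a hypothetical helper and assumes the (samples, channels) layout returned by scipy.io.wavfile.read.

```python
import numpy as np

def to_mono_1d(wav):
    """Return a 1-D float32 signal, averaging channels if there are several."""
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim == 2:
        # Average across the channel axis: (samples, channels) -> (samples,)
        wav = wav.mean(axis=1)
    return wav.reshape(-1)

single_column = np.ones((100, 1), dtype=np.int16)
stereo = np.stack([np.zeros(100), np.full(100, 2.0)], axis=1)

mono_a = to_mono_1d(single_column)
mono_b = to_mono_1d(stereo)
```

This has the same effect as the reshape fix quoted above for single-channel files, and also handles true stereo input.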

ValueError: The innermost dimension of input_shape must be defined, but saw: (None, None)

The kapre package has been updated and some of the notebooks no longer work

Please update them. Thanks a lot!

No module named 'numba.decorators'

It seems numba removed the decorators module in version 0.50. A hotfix is to pip install numba==0.48.

index_error

FutureWarning: norm=1 behavior will change in librosa 0.8.0. To maintain forward compatibility, use norm='slaney' instead

labels = [self.labels[k] for k in indexes]
IndexError: index 7 is out of bounds for axis 0 with size 3

Deploying the Model

I was wondering how I can deploy this model on Azure to actually see it in action.

Using from google colab notebook

Good day Seth,

I have been going over your work and as I am about to import the libraries, the code throws an import error, "cannot import name 'downsample_mono'." I am not sure which version of the 'clean' library did you import but could you kindly assist me in solving this error?
I am using Google Colab so I could utilize the GPU.

Thank you for supplying us with such, we are really grateful for it. Thank you for your time.

train.py: IndexError: index 9 is out of bounds for axis 1 with size 9

Hi Seth

I'm working on an amended clean.py - one that will support other audio file types.

I'm testing it with your wavfiles folder as a benchmark with the hope that I can reproduce the score I got before any changes. I've noticed it creates a few more files (differences in threshold for trimming i think), however, they are all still one second long and the file structure is identical (ie 10 classes named exactly as yours are).

running train.py returns the following:

IndexError: index 9 is out of bounds for axis 1 with size 9

I think the problem is with wav_train, wav_val, label_train, label_val = train_test_split(wav_paths, labels, test_size=0.1, random_state=0). To look into it I added:

print(len(set(label_train)))
10
print(len(set(label_val)))
9

if I set test_size=0.5 then I get:

print(len(set(label_train)))
10
print(len(set(label_val)))
10

Training then runs without error but the accuracy is about 10%

I'm really not sure what's going on here. I've listened to the audio files and looked at the spectrograms of the files created by my clean.py and they seem normal. They can be found here; I'd be very grateful if you could try to reproduce this.
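
The printed counts suggest that one class is missing from the validation split, so a one-hot width derived from the validation labels alone is 9 while label index 9 still occurs. The sketch below reproduces that with NumPy; one_hot is a hypothetical helper, and the assumption that the class count is inferred from the validation labels is not confirmed by the thread.

```python
import numpy as np

def one_hot(label_ids, n_classes):
    """One-hot encode integer labels into an array of shape (n, n_classes)."""
    label_ids = np.asarray(label_ids, dtype=int)
    out = np.zeros((label_ids.shape[0], n_classes), dtype=np.float32)
    out[np.arange(label_ids.shape[0]), label_ids] = 1.0
    return out

all_labels = list(range(10))                 # 10 classes in the full dataset
val_labels = [0, 1, 2, 3, 4, 5, 6, 7, 9]     # class 8 is missing from this split

# Size the encoding from the full label set, not from the split.
n_classes = len(set(all_labels))
Y_val = one_hot(val_labels, n_classes)
```

Calling one_hot(val_labels, len(set(val_labels))) instead raises an IndexError for index 9 on an axis of size 9. Passing stratify=labels to scikit-learn's train_test_split is another way to keep every class present in both splits.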

use a.any or a.all error

Rec_048.mp4

Overlapping window ??

You have taken a window frame of 1/10 of a second, if I am not wrong. I also want to know what window overlap you used, because the tutorials never mention it.
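
For the spectrogram itself, the overlap follows from the transform length and hop size. The arithmetic below assumes the values quoted in another issue on this page (n_dft=512, n_hop=160, SR=16000); the settings in the current code may differ.

```python
sr = 16000      # sample rate in Hz (assumed)
n_dft = 512     # samples per analysis window (assumed)
n_hop = 160     # samples between successive windows (assumed)

window_ms = 1000 * n_dft / sr            # 32.0 ms per window
hop_ms = 1000 * n_hop / sr               # 10.0 ms between windows
overlap_samples = n_dft - n_hop          # 352 samples shared by neighbours
overlap_fraction = overlap_samples / n_dft   # 0.6875
frames_per_second = sr // n_hop          # 100 frames for one second of audio
```

Under these assumptions each window overlaps its neighbour by about 69 percent, and one second of audio gives the 100 time bins mentioned in the TimeDistributed question.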

valueError

X[i,] = wav.reshape(1, -1)
ValueError: could not broadcast input array from shape (1,3080192) into shape (1,16000)

please find files in attached link :- https://drive.google.com/drive/folders/1hRpOLXArm0gP9QPB9RuYgeWNFQerkbuP?usp=sharing
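
A clip of 3080192 samples is far longer than the 16000 the model expects, which suggests the file was never split into delta-time segments. A sketch of that splitting step, assuming fixed-length segments and dropping the incomplete remainder; split_by_delta_time is a hypothetical helper, not the split_wavs function from clean.py.

```python
import numpy as np

def split_by_delta_time(wav, sr, dt):
    """Cut a 1-D signal into consecutive segments of int(sr * dt) samples."""
    step = int(sr * dt)
    n_full = len(wav) // step
    return [wav[i * step:(i + 1) * step] for i in range(n_full)]

long_clip = np.zeros(3080192, dtype=np.int16)
segments = split_by_delta_time(long_clip, sr=16000, dt=1.0)
```

Each element of segments has exactly 16000 samples and fits the input buffer; the 8192 samples left over at the end are discarded in this sketch.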

Model Predict Error

Hi Seth,

wonderful and really useful project and videos! I love it.

I am working on predicting on the original files as you finish with in the video however, when I do so, I get the following error:
"local variable 'batch_outputs' referenced before assignment"

After some quick research, this is due to the batch actually being an empty array and the error shows here:
X_batch = np.array(batch, dtype=np.float32)
y_pred = model.predict(X_batch)

X_batch is built from the batch array, so when the batch is empty, model.predict throws the error.

I tried with my original data that trained just fine and I also tried this with a singular file and got the same issue.

Any ideas or is this on my end?

Thank you!
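
An empty batch can occur when no samples survive the cleaning step, for example if the threshold removes the whole file; that cause is an assumption here. The sketch below builds the windows and makes the empty case explicit so the caller can skip the file before predicting. make_windows is a hypothetical helper, not code from predict.py.

```python
import numpy as np

def make_windows(wav, step):
    """Split a 1-D signal into (n, 1, step) windows, zero-padding the last one."""
    windows = []
    for start in range(0, wav.shape[0], step):
        chunk = wav[start:start + step]
        if chunk.shape[0] < step:
            padded = np.zeros(step, dtype=np.float32)
            padded[:chunk.shape[0]] = chunk
            chunk = padded
        windows.append(chunk.reshape(1, -1))
    if not windows:
        # Nothing to predict on: return an empty, correctly shaped array.
        return np.empty((0, 1, step), dtype=np.float32)
    return np.array(windows, dtype=np.float32)

empty = make_windows(np.array([], dtype=np.float32), step=16000)
two = make_windows(np.ones(20000, dtype=np.float32), step=16000)
```

A caller can then check the first dimension of the result and skip the file, or lower the threshold, when it is zero.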

Clean.py memory usage

Hi Seth,

In the clean.py, the memory usage (RAM) keeps going up. So, if a large dataset with many classes is being studied, eventually, the computer goes out of memory. I get this message from time to time:

MemoryError: Unable to allocate 1.22 MiB for an array with shape (160399,) and data type int64

This can easily be dealt with by increasing the pagefile.sys size, upgrading the computer, or applying clean.py to one subset of the data at a time, but I was wondering whether it would be better to free memory ("dump garbage") between classes. In your YT video, that would mean freeing memory between Cello and Clarinet, for example.

Thanks
