seth814 / audio-classification
Code for YouTube series: Deep Learning for Audio Classification
License: MIT License
I got an error on this line:
y_pred = model.predict(X_batch)
Please check and help me.
All files are available here:
https://drive.google.com/drive/folders/1hRpOLXArm0gP9QPB9RuYgeWNFQerkbuP?usp=sharing
Hi, thanks for the material and the videos.
I am trying to run the clean.py script with my own audio files, but I get this error:
Traceback (most recent call last):
File "clean.py", line 129, in <module>
split_wavs(args)
File "clean.py", line 66, in split_wavs
for fn in tqdm(os.listdir(src_dir)):
NotADirectoryError: [Errno 20] Not a directory: 'wavfiles/.DS_Store'
I tried with the directory you provided, but after it iterates over 3 sub-folders I still get the same error:
100%|███████████████| 30/30 [00:05<00:00, 5.31it/s]
100%|███████████████| 30/30 [00:04<00:00, 6.84it/s]
100%|███████████████| 30/30 [00:06<00:00, 4.75it/s]
Traceback (most recent call last):
File "clean.py", line 129, in <module>
split_wavs(args)
File "clean.py", line 66, in split_wavs
for fn in tqdm(os.listdir(src_dir)):
NotADirectoryError: [Errno 20] Not a directory: 'wavfiles/.DS_Store'
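The traceback shows os.listdir() being called on 'wavfiles/.DS_Store', a hidden file that macOS Finder creates. This suggests the loop over the top-level folder treats every entry as a class directory. Below is a minimal sketch of one workaround. It assumes class folders sit directly under the source root. The helper name list_class_dirs is hypothetical and is not part of clean.py.

```python
import os


def list_class_dirs(src_root):
    """Return sorted sub-directory names of src_root.

    Hypothetical helper (not from clean.py): skips hidden entries such as
    macOS's '.DS_Store' and any plain files sitting next to the class folders.
    """
    return sorted(
        name
        for name in os.listdir(src_root)
        if not name.startswith(".")  # hidden entries like .DS_Store
        and os.path.isdir(os.path.join(src_root, name))
    )
```

The class loop in split_wavs would then iterate over list_class_dirs(...) applied to the source folder, rather than over the raw os.listdir() result.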
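from tensorflow.keras import layers
from kapre.time_frequency import Melspectrogram  # older kapre API; this class is not in kapre 0.3+

def Conv1D(N_CLASSES=6, SR=16000, DT=0.5):
    i = layers.Input(shape=(1, int(SR*DT)), name='input')
    x = Melspectrogram(n_dft=512, n_hop=160,
                       padding='same', sr=SR, n_mels=128,
                       fmin=0.0, fmax=SR/2, power_melgram=2.0,
                       return_decibel_melgram=True, trainable_fb=False,
                       trainable_kernel=False,
                       name='melbands')(i)
    # ... (rest of the model not included in the post)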
If I am not wrong, you have taken a window frame of 1/10 of a second. I would also like to know what window overlap you used, because none of your tutorials mention it.
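The overlap can be derived from the Melspectrogram parameters in the snippet above (n_dft=512, n_hop=160, SR=16000). This assumes each analysis window spans the full n_dft samples. The calculation says nothing about the 1/10-second figure, which may come from elsewhere in the code. A small sketch with a hypothetical helper name:

```python
def stft_framing(n_dft, n_hop, sr):
    """Window length, hop and overlap implied by an STFT's n_dft / n_hop.

    Assumes the analysis window is n_dft samples long (no zero-padding).
    """
    window_ms = 1000.0 * n_dft / sr
    hop_ms = 1000.0 * n_hop / sr
    overlap_samples = n_dft - n_hop
    overlap_ratio = overlap_samples / n_dft
    return window_ms, hop_ms, overlap_samples, overlap_ratio


window_ms, hop_ms, overlap, ratio = stft_framing(n_dft=512, n_hop=160, sr=16000)
print(f"window={window_ms} ms, hop={hop_ms} ms, overlap={overlap} samples ({ratio:.2%})")
# window=32.0 ms, hop=10.0 ms, overlap=352 samples (68.75%)
```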
Hi Seth,
This is a Python issue, but... I have a large audio collection that I am processing with your code. For some of my files, I get this:
wave.Error: unknown format: 65534
I opened one of the files in Audacity, saved it as 16-bit PCM and ran your code again. It worked!
I have thousands of files, so it's not practical to modify each of them by hand. Is there anything that could be done directly in your code?
Thanks
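Format 65534 is WAVE_FORMAT_EXTENSIBLE (0xFFFE). The wave module in the Python version you are using rejects this format. Re-saving as plain 16-bit PCM, which is what worked in Audacity, can be scripted. Below is a rough sketch. It assumes the third-party soundfile package is installed for reading; soundfile is imported lazily inside convert_to_pcm16, so the other helpers work without it. The names to_pcm16, write_pcm16_wav and convert_to_pcm16 are hypothetical and are not part of the repo.

```python
import wave

import numpy as np


def to_pcm16(samples):
    """Convert float samples in [-1, 1] or signed integer samples to int16."""
    samples = np.asarray(samples)
    if np.issubdtype(samples.dtype, np.floating):
        return np.round(np.clip(samples, -1.0, 1.0) * 32767.0).astype(np.int16)
    if np.issubdtype(samples.dtype, np.signedinteger):
        # Keep the 16 most significant bits of the integer format.
        shift = 8 * samples.dtype.itemsize - 16
        if shift >= 0:
            return (samples >> shift).astype(np.int16)
        return samples.astype(np.int16) << -shift
    raise TypeError(f"unsupported sample dtype: {samples.dtype}")


def write_pcm16_wav(path, samples, rate):
    """Write samples shaped (n,) or (n, channels) as a 16-bit PCM WAV file."""
    data = to_pcm16(samples)
    n_channels = 1 if data.ndim == 1 else data.shape[1]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(n_channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(int(rate))
        wf.writeframes(data.astype("<i2").tobytes())  # interleaved, little-endian


def convert_to_pcm16(src_path, dst_path):
    """Re-save any file soundfile can read as 16-bit PCM (assumes soundfile is installed)."""
    import soundfile as sf  # third-party dependency, imported only when needed

    data, rate = sf.read(src_path, dtype="float32")
    write_pcm16_wav(dst_path, data, rate)
```

Looping convert_to_pcm16 over the collection and writing to a separate folder would avoid editing each file by hand before running clean.py.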
Good day Seth,
I have been going over your work, and when I import the libraries, the code throws an import error: "cannot import name 'downsample_mono'". I am not sure which version of the 'clean' module you imported; could you kindly help me solve this error?
I am using Google Colab so that I can use the GPU.
Thank you for providing this material; we are really grateful for it. Thank you for your time.
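"Cannot import name" differs from "No module named". It usually means Python found a module called clean, but that module does not define downsample_mono. Possible causes include an outdated copy of clean.py or another module with the same name on sys.path. In Colab, an edited file can also be shadowed by the copy already cached in sys.modules until the runtime is restarted. Below is a standard-library diagnostic sketch that shows which file is actually imported. The function name describe_module is hypothetical.

```python
import importlib
import importlib.util


def describe_module(name, attr):
    """Report where `name` would be imported from and whether it defines `attr`."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    module = importlib.import_module(name)
    return f"{name}: loaded from {spec.origin}; has {attr}: {hasattr(module, attr)}"
```

Running print(describe_module("clean", "downsample_mono")) in the notebook shows whether the clean.py from the repo is the one being picked up.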
I tried to implement leave-one-out cross-validation for your code but have not succeeded yet. I would appreciate any suggestions.
The main reason for using leave-one-out cross-validation is that your code works with a dependent dataset; what about an independent dataset?
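The dependence may come from clips cut from the same recording landing in both training and validation. In that case, one option is to leave out one recording at a time rather than one clip. Below is a sketch using scikit-learn's LeaveOneGroupOut. It assumes each cleaned clip is named "<original>_<index>.wav" inside its class folder; please check this against what clean.py actually writes. The names recording_id and leave_one_recording_out are hypothetical.

```python
import os

from sklearn.model_selection import LeaveOneGroupOut


def recording_id(path):
    """Assumed naming: '<class dir>/<original name>_<clip index>.wav' -> '<class dir>/<original name>'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return os.path.join(os.path.dirname(path), stem.rsplit("_", 1)[0])


def leave_one_recording_out(wav_paths, labels):
    """Yield (train_idx, val_idx) pairs; each fold holds out every clip of one recording."""
    groups = [recording_id(p) for p in wav_paths]
    yield from LeaveOneGroupOut().split(wav_paths, labels, groups=groups)
```

Each fold would train a fresh model on the clips at train_idx and evaluate it on the clips at val_idx. With many recordings, this means many training runs. GroupKFold applies the same grouping idea with a fixed number of folds and is cheaper to run.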
Hi Seth,
In clean.py, memory (RAM) usage keeps going up, so if a large dataset with many classes is being processed, the computer eventually runs out of memory. I get this message from time to time:
MemoryError: Unable to allocate 1.22 MiB for an array with shape (160399,) and data type int64
This can easily be dealt with by increasing the pagefile.sys size, upgrading the computer, or running clean.py on one subset of the data at a time. However, I was wondering whether it would be better to free memory (collect garbage) between two classes. In your YouTube video, that would mean freeing memory between Cello and Clarinet, for example.
Thanks
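Below is a sketch of freeing memory between classes. It assumes the per-class work can be wrapped in a function; process_class is a hypothetical stand-in for the body of the class loop in split_wavs. Note that gc.collect() only reclaims objects that are no longer referenced, including reference cycles. It helps only if nothing from the previous class is kept alive, for example in a list that grows across iterations.

```python
import gc


def run_per_class(class_names, process_class):
    """Call process_class(name) for each class, forcing a garbage-collection pass in between."""
    results = []
    for name in class_names:
        summary = process_class(name)  # arrays created inside go out of scope on return
        results.append(summary)        # keep only small summaries, not the audio itself
        gc.collect()
    return results
```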
Good day again Seth,
I have uploaded my own WAV files, and when I try to clean them using clean.py I get this error: “Error: unknown format: 3.” Is there a solution that would let me run the cleaning process without this recurring error?
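Format 3 is WAVE_FORMAT_IEEE_FLOAT (floating-point samples). The 65534 in the earlier report is WAVE_FORMAT_EXTENSIBLE. The wave module, at least in the Python versions of that time, only accepts plain integer PCM (format 1). Re-saving such files as 16-bit PCM, as the earlier poster did in Audacity, would be one option. To find which files are affected first, here is a standard-library sketch. The helper name wav_format_tag is hypothetical, and it assumes a well-formed RIFF/WAVE file.

```python
import struct


def wav_format_tag(path):
    """Return the format tag in a WAV file's 'fmt ' chunk.

    1 = PCM, 3 = IEEE float, 65534 (0xFFFE) = WAVE_FORMAT_EXTENSIBLE.
    """
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError(f"{path} has no 'fmt ' chunk")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                (tag,) = struct.unpack("<H", f.read(2))
                return tag
            # Chunks are word-aligned: skip the body plus a pad byte if the size is odd.
            f.seek(chunk_size + (chunk_size & 1), 1)
```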
Hi!
While running the original code, I ran into the issue below.
What could be the possible solution?
Thanks, and I look forward to your response.
Reloaded modules: clean
0%| | 0/300 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\CYY\Desktop\AudioClassification\4.predict.py", line 77, in
make_prediction(args)
File "C:\Users\CYY\Desktop\AudioClassification\4.predict.py", line 35, in make_prediction
rate, wav = downsample_mono(wav_fn, args.sr)
File "C:\Users\CYY\Desktop\AudioClassification\clean.py", line 36, in downsample_mono
wav = resample(wav, rate, sr)
File "C:\Users\CYY\Anaconda3\lib\site-packages\librosa\core\audio.py", line 584, in resample
y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=-1)
File "C:\Users\CYY\Anaconda3\lib\site-packages\resampy\core.py", line 97, in resample
raise ValueError('Input signal length={} is too small to '
ValueError: Input signal length=1 is too small to resample from 44100->16000
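The message says the signal being resampled has length 1. That would fit wav arriving as a 2-D array of shape (n_samples, 1): the traceback shows resampy resampling along axis=-1, and that axis has length 1. This is an inference from the traceback, not a confirmed diagnosis. Below is a minimal sketch that flattens the audio to a 1-D mono signal before resampling. The helper name to_1d_mono is hypothetical.

```python
import numpy as np


def to_1d_mono(wav):
    """Return a contiguous 1-D float32 signal.

    Accepts (n_samples,), (n_samples, 1) or (n_samples, n_channels) arrays, the
    layouts scipy.io.wavfile.read returns; multi-channel audio is averaged.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim == 2:
        wav = wav.mean(axis=1)  # (n, 1) -> (n,); (n, 2) -> average of the channels
    elif wav.ndim != 1:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {wav.shape}")
    return np.ascontiguousarray(wav)
```

If the "Data must be 1-dimensional" error reported further down has the same cause, this would avoid it too.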
import numpy as np
import pandas as pd

def envelope(y, rate, threshold):
    mask = []
    y = pd.Series(y).apply(np.abs)
    y_mean = y.rolling(window=int(rate/20),
                       min_periods=1,
                       center=True).max()
    # ... (rest of the function not shown in the post)
In the envelope function, shouldn't it be mean() instead of max(), as explained in the YouTube video?
Or was this an intentional change? Could you please clarify?
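Here is one way to see the practical difference. Over the same window, a rolling max is never smaller than the rolling mean. So, with the same threshold, the max() version keeps every sample the mean() version keeps, plus some quieter stretches next to loud ones. Below is a sketch comparing the two masks on a 1-D signal. envelope_mask is a hypothetical variant of envelope with an aggregation switch.

```python
import numpy as np
import pandas as pd


def envelope_mask(y, rate, threshold, agg="max"):
    """Keep-mask from a centred rolling max or mean of |y| (window = rate/20 samples)."""
    rolling = pd.Series(np.abs(y)).rolling(window=int(rate / 20), min_periods=1, center=True)
    env = rolling.max() if agg == "max" else rolling.mean()
    return (env > threshold).to_numpy(), env.to_numpy()
```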
When I run the code as is, I first get an error that seems to be related to downsampling the data to 16000 Hz:
"ValueError: Input signal length=1 is too small to resample from 44100->16000"
After commenting out line 36, I then get an error on line 15:
Exception: Data must be 1-dimensional
I fixed this by reshaping the y variable using this code:
y = pd.Series(np.reshape(y, (len(y),))).apply(np.abs)
I know someone else got their copy of the repo to work without making any changes, but that was on a Mac. Perhaps there is some inconsistency because I'm on Windows?
Hi Seth
I'm working on an amended clean.py, one that will support other audio file types.
I'm testing it with your wavfiles folder as a benchmark, hoping to reproduce the score I got before making any changes. I've noticed it creates a few more files (differences in the trimming threshold, I think). However, they are all still one second long, and the file structure is identical (i.e. 10 classes named exactly as yours are).
Running train.py returns the following:
IndexError: index 9 is out of bounds for axis 1 with size 9
I think the problem is with wav_train, wav_val, label_train, label_val = train_test_split(wav_paths, labels, test_size=0.1, random_state=0). To look into it, I added:
print(len(set(label_train)))
10
print(len(set(label_val)))
9
If I set test_size=0.5, then I get:
print(len(set(label_train)))
10
print(len(set(label_val)))
10
Training then runs without error, but the accuracy is about 10%.
I'm really not sure what's going on here. I've listened to the audio files and looked at the spectrograms of the files created by my clean.py, and they seem very normal. They can be found here; I'd be very grateful if you could try to reproduce this.
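The printed set sizes show that with test_size=0.1, one of the 10 classes got no validation clips at all. Whether that alone explains the IndexError depends on how train.py builds its label encoding. Still, passing stratify=labels to scikit-learn's train_test_split makes both splits contain every class in roughly the original proportions. Below is a sketch on made-up labels (10 classes, 20 clips each).

```python
from sklearn.model_selection import train_test_split

# Made-up example data: 10 classes with 20 clips each.
wav_paths = [f"clean/class{c}/clip_{i}.wav" for c in range(10) for i in range(20)]
labels = [f"class{c}" for c in range(10) for _ in range(20)]

wav_train, wav_val, label_train, label_val = train_test_split(
    wav_paths, labels, test_size=0.1, random_state=0, stratify=labels
)
print(len(set(label_train)), len(set(label_val)))
# 10 10
```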
Hello Seth, as you said, I need more files for this kind of network, so now I have more than 30 files per class and two classes, but I get this error:
IndexError: index 63 is out of bounds for axis 0 with size 57
Please help me. Here you can find all the project files:
https://drive.google.com/drive/folders/1DretGkD66hQ0zV2Fwo8YzWwh0woNFS8X?usp=sharing
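The message alone does not reveal the cause. However, an index of 63 into an array of 57 means the batch indexes were computed for more items than the labels array holds. This can happen, for instance, when the paths and labels lists end up with different lengths. Below is a framework-free sketch of the indexing a Keras-style data generator might do, with a length check up front. BatchIndexer is a hypothetical name, not the repo's DataGenerator.

```python
import math

import numpy as np


class BatchIndexer:
    """Hypothetical, framework-free stand-in for a Keras Sequence's indexing logic."""

    def __init__(self, wav_paths, labels, batch_size, shuffle=True, seed=0):
        if len(wav_paths) != len(labels):
            raise ValueError(f"{len(wav_paths)} paths but {len(labels)} labels")
        self.wav_paths = list(wav_paths)
        self.labels = np.asarray(labels)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.on_epoch_end()

    def __len__(self):
        # Round up so the final partial batch is served; slicing keeps it in bounds.
        return math.ceil(len(self.wav_paths) / self.batch_size)

    def __getitem__(self, index):
        batch = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        return [self.wav_paths[k] for k in batch], self.labels[batch]

    def on_epoch_end(self):
        # Indexes always come from the data this object holds, never from another list.
        self.indexes = np.arange(len(self.wav_paths))
        if self.shuffle:
            self.rng.shuffle(self.indexes)
```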
Your notebooks directory contains a few Jupyter notebooks with the following import statement:
from kapre.time_frequency import Melspectrogram, Spectrogram
It gives an error:
ImportError: cannot import name 'Melspectrogram'
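As far as I know, kapre 0.3 reorganised its API. The Melspectrogram layer class was removed, and mel-spectrogram layers are built with kapre.composed.get_melspectrogram_layer instead. So this import presumably needs an older kapre release, or the notebooks need porting. Below is a small sketch that reports which of the two APIs the installed kapre provides, without assuming either is present. detect_kapre_api is a hypothetical helper.

```python
def detect_kapre_api():
    """Return 'legacy' if kapre.time_frequency.Melspectrogram imports (older kapre),
    'composed' if kapre.composed.get_melspectrogram_layer imports (kapre 0.3+),
    or None if neither is available."""
    try:
        from kapre.time_frequency import Melspectrogram  # noqa: F401
        return "legacy"
    except ImportError:
        pass
    try:
        from kapre.composed import get_melspectrogram_layer  # noqa: F401
        return "composed"
    except ImportError:
        return None


print(detect_kapre_api())
```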
Good morning, I am trying to run the training script and I get the NotFoundError below. I am using my own data and want the model to classify 2 classes. How can I solve this?
File "training.py", line 120, in
train(args)
File "training.py", line 103, in train
callbacks=[csv_logger, cp])
File "/home/marco/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1230, in fit
callbacks.on_epoch_end(epoch, epoch_logs)
File "/home/marco/.local/lib/python3.6/site-packages/keras/callbacks.py", line 413, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/home/marco/.local/lib/python3.6/site-packages/keras/callbacks.py", line 2775, in on_epoch_end
self.writer.writeheader()
File "/usr/lib/python3.6/csv.py", line 144, in writeheader
self.writerow(header)
File "/usr/lib/python3.6/csv.py", line 155, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/home/marco/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 103, in write
self._prewrite_check()
File "/home/marco/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 89, in _prewrite_check
compat.path_to_bytes(self.__name), compat.as_bytes(self.__mode))
tensorflow.python.framework.errors_impl.NotFoundError: logs/conv1d_history.csv; No such file or directory
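The NotFoundError points at logs/conv1d_history.csv. The CSV logger tries to open a file inside a logs folder, and that folder apparently does not exist in your working directory. Creating it before training starts should avoid the error. Below is a minimal sketch. ensure_parent_dir is a hypothetical helper, and the CSVLogger line appears only as a comment because the exact callback setup in training.py is not shown here.

```python
import os


def ensure_parent_dir(file_path):
    """Create the directory that will contain file_path if it is missing."""
    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return file_path


# Usage, before the callbacks are built in train():
# csv_path = ensure_parent_dir(os.path.join("logs", "conv1d_history.csv"))
# csv_logger = CSVLogger(csv_path, append=False)
```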
Hello,
I am trying your code, but I am facing an error: when running the "pip install -r requirements" command, it reports that the object-detection package was not found!
I ran the previous version of predict.py (the one used in the video), because the latest version was not showing any output in the terminal. I have two problems.
When I run the prediction on the IRMAS dataset ("IRMAS: a dataset for instrument recognition in musical audio signals"), I get an error that I am not able to understand.
My folder structure was as follows.
When I copied only the Acoustic guitar folder from the wavfiles directory to a new directory, wavfiles2, and then ran predict.py, I got an incomplete output.
X[i,] = wav.reshape(1, -1)
ValueError: could not broadcast input array from shape (1,3080192) into shape (1,16000)
Please find the files at the attached link: https://drive.google.com/drive/folders/1hRpOLXArm0gP9QPB9RuYgeWNFQerkbuP?usp=sharing
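The shapes in the error suggest that a whole recording (3,080,192 samples) is being written into a slot sized for a single 16,000-sample window. The file would have to be cut into model-sized windows first. Below is a sketch that splits a 1-D signal into consecutive, non-overlapping windows of int(sr*dt) samples and zero-pads the last one. The function name frame_signal and the padding choice are assumptions and not necessarily what predict.py does.

```python
import numpy as np


def frame_signal(wav, sr, dt):
    """Split a 1-D signal into non-overlapping windows of int(sr*dt) samples.

    The final partial window is zero-padded. Returns shape (n_windows, 1, step),
    matching a per-example input of shape (1, int(SR*DT)).
    """
    step = int(sr * dt)
    wav = np.asarray(wav, dtype=np.float32).reshape(-1)
    n_windows = max(1, int(np.ceil(len(wav) / step)))
    padded = np.zeros(n_windows * step, dtype=np.float32)
    padded[: len(wav)] = wav
    return padded.reshape(n_windows, 1, step)
```

The per-window predictions, e.g. from model.predict(frame_signal(wav, 16000, 1.0)), could then be averaged into one label per file.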
Hi,
I have a question about the TimeDistributed wrapper. I think I understand how it works, but I'm not sure, and I would really appreciate it if someone could check my understanding. The first 'real' layer of the 1D Conv model:
x = TimeDistributed(layers.Conv1D(8, kernel_size=(4), activation='tanh'), name='td_conv_1d_tanh')(x)
with input (100, 128, 1), yields 40 trainable parameters and an output of (100, 125, 8).
This makes sense to me: a 1D convolution is applied along the 128-bin frequency (mel-bin) axis for each of the 100 time bins, learning the filter weights/biases as it goes. I'm guessing the stride of the convolution is 1, which is why we get 125 on the output. For the LSTM model:
s = TimeDistributed(layers.Dense(64, activation='tanh'), name='td_dense_tanh')(x)
with input (100, 128), yields 8256 trainable parameters and an output of (100, 64).
We have 64 nodes fully connected to the 128 frequencies, so we wind up with 128 * 64 + 64 = 8256 parameters. Then these are trained over each of the 100 time bins. Does that sound right?
Thanks a lot to Seth for the code and videos; they are very helpful.
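The counts can be checked by hand. A Conv1D layer has kernel_size × input_channels × filters weights plus one bias per filter. A Dense layer has inputs × units weights plus one bias per unit. TimeDistributed applies the same layer, with the same weights, to each of the 100 time steps, so it adds no parameters of its own. Here is a plain-Python sketch of that bookkeeping, assuming Keras's default 'valid' padding and stride 1 for the convolution.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights (kernel_size * in_channels * filters) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 'valid' (unpadded) 1-D convolution."""
    return (length - kernel_size) // stride + 1


def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


# TimeDistributed shares one set of weights across all time steps,
# so none of these counts depend on the 100 time bins.
print(conv1d_params(4, 1, 8), conv1d_valid_length(128, 4), dense_params(128, 64))
# 40 125 8256
```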
rate, wav = downsample_mono(wav_path[0], args.sr)
wav = resample(wav.astype(np.float32), rate, sr)
raise ParameterError('Audio buffer is not Fortran-contiguous. '
librosa.util.exceptions.ParameterError: Audio buffer is not Fortran-contiguous. Use numpy.asfortranarray to ensure Fortran contiguity.
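The librosa versions that raise this message require the audio buffer to be Fortran-contiguous. A C-ordered 2-D array, such as scipy's (n_samples, n_channels) output, fails that check, and so does a single channel sliced out of one. A contiguous 1-D array passes, because for 1-D arrays C- and Fortran-contiguity are the same thing. Below is a NumPy-only sketch of the check and of the fix the message recommends. Which of these cases occurs in your copy of downsample_mono is an assumption.

```python
import numpy as np

stereo = np.zeros((16000, 2), dtype=np.float32)  # (n_samples, n_channels), C-ordered
print(stereo.flags["F_CONTIGUOUS"])              # False
left = stereo[:, 0]                              # strided view of one channel
print(left.flags["F_CONTIGUOUS"])                # False

mono = np.asfortranarray(left)                   # contiguous 1-D copy
print(mono.flags["F_CONTIGUOUS"], mono.flags["C_CONTIGUOUS"])  # True True
```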
I was attempting to use my sample and multitrack collections to train a model geared more towards electronic music, following your YouTube posts from February. I made sure that all my files were in WAV format (as suggested in the video), but scipy seems to be unable to process "high" bit depths.
What would be the best way to convert my custom WAV files in the "wavfiles" folder to the appropriate bit depth (and/or sample rate)?
Output of python clean.py:
0%| | 0/4 [00:00<?, ?it/s]
clean.py:27: WavFileWarning: Chunk (non-data) not understood, skipping it.
rate, wav = wavfile.read(path)
0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "clean.py", line 128, in <module>
split_wavs(args)
File "clean.py", line 68, in split_wavs
rate, wav = downsample_mono(src_fn, args.sr)
File "clean.py", line 27, in downsample_mono
rate, wav = wavfile.read(path)
File "/home/myusername/anaconda3/lib/python3.7/site-packages/scipy/io/wavfile.py", line 298, in read
"has {}-bit data.".format(bit_depth))
ValueError: Unsupported bit depth: the wav file has 24-bit data.
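The scipy version in this traceback rejects 24-bit files. Python's built-in wave module can still read 24-bit integer PCM as raw bytes, and NumPy can unpack them. Below is a sketch. It assumes little-endian 24-bit PCM with format tag 1; WAVE_FORMAT_EXTENSIBLE files may still be refused by wave. read_wav_24bit is a hypothetical helper. Dividing the result by 2**23 gives floats in [-1, 1), which can then be handled like any other signal.

```python
import wave

import numpy as np


def read_wav_24bit(path):
    """Read a 24-bit PCM WAV into int32 samples shaped (n_frames, n_channels)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 3:
            raise ValueError(f"expected 24-bit audio, got {8 * wf.getsampwidth()}-bit")
        rate = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())

    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
    samples = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)  # little-endian bytes -> 0 .. 2**24 - 1
    samples = np.where(samples >= 1 << 23, samples - (1 << 24), samples)  # sign-extend
    return rate, samples.reshape(-1, n_channels)
```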
After fixing/bypassing issue #14, python train.py seems to dislike sample rates other than 16000:
Traceback (most recent call last):
File "train.py", line 114, in <module>
train(args)
File "train.py", line 96, in train
callbacks=[csv_logger, cp])
File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
use_multiprocessing=use_multiprocessing)
File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 235, in fit
use_multiprocessing=use_multiprocessing)
File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 593, in _process_training_inputs
use_multiprocessing=use_multiprocessing)
File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 706, in _process_inputs
use_multiprocessing=use_multiprocessing)
File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 952, in __init__
**kwargs)
File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 747, in __init__
peek, x = self._peek_and_restore(x)
File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 956, in _peek_and_restore
return x[0], x
File "train.py", line 46, in __getitem__
X[i,] = wav.reshape(1, -1)
ValueError: could not broadcast input array from shape (1,32000) into shape (1,16000)
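The error shows a 32,000-sample clip arriving where a 16,000-sample slot was allocated. That would be consistent with the cleaned clips being stored at 32 kHz while the input is sized as int(SR*DT) with SR=16000 (compare the Conv1D definition earlier on this page). Either the SR used by the model must match the data, or the audio must be resampled to the rate the model was built for. Below is a sketch of the second option with scipy.signal.resample_poly. resample_to is a hypothetical helper and assumes integer sample rates.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to(wav, orig_sr, target_sr):
    """Polyphase resampling of a 1-D signal from orig_sr to target_sr (integer rates)."""
    wav = np.asarray(wav, dtype=np.float32)
    if orig_sr == target_sr:
        return wav
    g = gcd(int(orig_sr), int(target_sr))
    up, down = int(target_sr) // g, int(orig_sr) // g
    return resample_poly(wav, up, down).astype(np.float32)
```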
ValueError: The innermost dimension of input_shape must be defined, but saw: (None, None)
It seems numba removed the decorators module in version 0.50. A hotfix is to pip install numba==0.48.
Please update them. Thanks a lot!
FutureWarning: norm=1 behavior will change in librosa 0.8.0. To maintain forward compatibility, use norm='slaney' instead
labels = [self.labels[k] for k in indexes]
IndexError: index 7 is out of bounds for axis 0 with size 3
I would like to ask the author: which version of kapre should we install?
Hi all,
Is it possible to separate the individual instruments in a mixed recording?
input: WAV song
output:
WAV of instrument 1:
WAV of instrument 2:
Or can it only detect instrument tags:
instruments: ["Piano","Guitar","Double Bass"]
Hi Seth,
Wonderful and really useful project and videos! I love it.
I am trying to predict on the original files, as you do at the end of the video; however, when I do so, I get the following error:
"local variable 'batch_outputs' referenced before assignment"
After some quick research, I found this is due to the batch actually being an empty array; the error occurs here:
X_batch = np.array(batch, dtype=np.float32)
y_pred = model.predict(X_batch)
X_batch is built from the batch array, so the error is thrown when model.predict is called on it.
I tried this with my original data, which trained just fine, and also with a single file, and got the same issue.
Any ideas, or is this on my end?
Thank you!
Why do you use power_melgram=1.0 instead of 2.0, which is the usual value for a mel spectrogram? And what do you think about deltas? Kapre has no module for adding deltas, but is it a good idea to stack them?
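Deltas can be computed outside kapre before the data reaches the model. If you want to try them with a CNN, one option is to stack them onto the spectrogram as an extra input channel. Below is a NumPy sketch of the regression-style delta, d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2). Edge frames are repeated at the borders, and the result is stacked onto a (mels, frames) spectrogram as a second channel. This is one common delta definition, not kapre's or librosa's exact implementation; librosa.feature.delta, for instance, uses a Savitzky-Golay filter. The function names are hypothetical.

```python
import numpy as np


def deltas(spec, N=2):
    """Regression deltas along the time axis (axis=1) of a (n_mels, n_frames) array.

    d[:, t] = sum_{n=1..N} n * (c[:, t+n] - c[:, t-n]) / (2 * sum_{n=1..N} n**2),
    with the first/last frames repeated so every t has neighbours.
    """
    spec = np.asarray(spec, dtype=np.float32)
    n_frames = spec.shape[1]
    padded = np.pad(spec, ((0, 0), (N, N)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(spec)
    for n in range(1, N + 1):
        out += n * (padded[:, N + n:N + n + n_frames] - padded[:, N - n:N - n + n_frames])
    return out / denom


def stack_with_deltas(spec, N=2):
    """Return (n_mels, n_frames, 2): the spectrogram and its deltas as two channels."""
    return np.stack([np.asarray(spec, dtype=np.float32), deltas(spec, N)], axis=-1)
```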
I was wondering how I could deploy this model on Azure to actually see it in action.
I am just curious: what is the origin of your dataset? Was it collected from YouTube videos or somewhere else?