seth814 / audio-classification

Code for YouTube series: Deep Learning for Audio Classification

License: MIT License

Python 0.90% Jupyter Notebook 99.10%

kapre youtube audio-classification tensorflow2 keras

audio-classification's Issues

TypeError: Value passed to parameter 'input' has DataType int16 not in list of allowed values: float16, bfloat16, float32, float64

I get an error on this line:
y_pred = model.predict(X_batch)
Please check and help me.
All files are available here:
https://drive.google.com/drive/folders/1hRpOLXArm0gP9QPB9RuYgeWNFQerkbuP?usp=sharing
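
The message suggests the batch reaches the model as int16 while the layer only accepts floating-point input. A minimal sketch of one possible workaround is to cast the batch before calling `model.predict`. The helper name `as_float32_batch` is hypothetical and not part of the repository, and whether the values should also be rescaled depends on how the model was trained.

```python
import numpy as np

def as_float32_batch(batch):
    """Cast a batch of audio windows to float32 without rescaling the values."""
    return np.asarray(batch, dtype=np.float32)

# int16 samples, as typically returned when reading 16-bit PCM WAV data
X_batch = as_float32_batch(np.array([[0, 100, -100]], dtype=np.int16))
# y_pred = model.predict(X_batch) would then receive float32 input
```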

use a.any or a.all error

Rec_048.mp4

NotADirectoryError when splitting wavfiles with split_wavs(args)

Hi, thanks for the material and the videos.
I am trying to run the clean.py script with my own audio files, but I get this error:

Traceback (most recent call last):
  File "clean.py", line 129, in <module>
    split_wavs(args)
  File "clean.py", line 66, in split_wavs
    for fn in tqdm(os.listdir(src_dir)):
NotADirectoryError: [Errno 20] Not a directory: 'wavfiles/.DS_Store'

I tried with the directory you provided, but after it iterates for 3 sub-folders I still get the same error:

100%|███████████████| 30/30 [00:05<00:00,  5.31it/s]
100%|███████████████| 30/30 [00:04<00:00,  6.84it/s]
100%|███████████████| 30/30 [00:06<00:00,  4.75it/s]
Traceback (most recent call last):
  File "clean.py", line 129, in <module>
    split_wavs(args)
  File "clean.py", line 66, in split_wavs
    for fn in tqdm(os.listdir(src_dir)):
NotADirectoryError: [Errno 20] Not a directory: 'wavfiles/.DS_Store'
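
The traceback shows `os.listdir` being called on `wavfiles/.DS_Store`, a hidden file that macOS Finder creates, as if it were a class folder. A minimal sketch of one way around it is to keep only real sub-directories when listing the classes. `list_class_dirs` is a hypothetical helper, not a function from clean.py.

```python
import os

def list_class_dirs(src_root):
    """Return sub-directory names of src_root, skipping plain files such as .DS_Store."""
    return sorted(
        name for name in os.listdir(src_root)
        if os.path.isdir(os.path.join(src_root, name))
    )
```

Deleting the `.DS_Store` file by hand also works, but Finder may recreate it.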

Can you update the code with kapre3.0?

def Conv1D(N_CLASSES=6, SR=16000, DT=0.5):
    i = layers.Input(shape=(1, int(SR*DT)), name='input')
    x = Melspectrogram(n_dft=512, n_hop=160,
                       padding='same', sr=SR, n_mels=128,
                       fmin=0.0, fmax=SR/2, power_melgram=2.0,
                       return_decibel_melgram=True, trainable_fb=False,
                       trainable_kernel=False,
                       name='melbands')(i)

Overlapping window ??

If I am not wrong, you have taken a window frame of 1/10 of a second. I would also like to know what overlap between windows you used, because the tutorials do not mention an overlapping window anywhere.
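
For the spectrogram itself, the overlap can be worked out from the `Melspectrogram` arguments quoted in the kapre issue above (`n_dft=512`, `n_hop=160`, `SR=16000`). The sketch below is plain arithmetic on those numbers, assuming they are the ones used in the models. `frame_overlap` is a hypothetical helper.

```python
def frame_overlap(n_dft, n_hop, sr):
    """Return (window_ms, hop_ms, overlap_fraction) for an STFT."""
    window_ms = 1000.0 * n_dft / sr
    hop_ms = 1000.0 * n_hop / sr
    overlap = (n_dft - n_hop) / n_dft  # share of each window reused by the next
    return window_ms, hop_ms, overlap

window_ms, hop_ms, overlap = frame_overlap(n_dft=512, n_hop=160, sr=16000)
# window_ms = 32.0, hop_ms = 10.0, overlap = 0.6875
```

Under these assumptions each 32 ms analysis window advances by 10 ms, so consecutive windows share about 69% of their samples.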

Python and Wav files - error 65534

Hi Seth,

This is a Python issue, but... I have a large audio collection that I am processing using your code. For some of my files, I get this:
wave.Error: unknown format: 65534

I opened one of the files in Audacity, saved it as 16-bit PCM and ran your code again. It worked!
I have thousands of files, so it's not practical to modify each of them. Is there anything that could be done directly in your code?

Thanks
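
Format code 65534 is 0xFFFE, the WAVE_FORMAT_EXTENSIBLE tag. In such files the real sample format is stored in the first two bytes of the SubFormat field of the `fmt ` chunk. As a first step towards batch handling, here is a minimal standard-library sketch that reports the effective format tag and bit depth, so the files that need conversion can be listed automatically. `wav_format_info` is a hypothetical helper and not part of the repository.

```python
import struct

WAVE_FORMAT_EXTENSIBLE = 0xFFFE  # 65534

def wav_format_info(path):
    """Return (format_tag, bits_per_sample) read from the 'fmt ' chunk of a WAV file."""
    with open(path, 'rb') as f:
        riff, _, wave_id = struct.unpack('<4sI4s', f.read(12))
        if riff != b'RIFF' or wave_id != b'WAVE':
            raise ValueError('not a RIFF/WAVE file: %s' % path)
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError('no fmt chunk found in %s' % path)
            chunk_id, size = struct.unpack('<4sI', header)
            if chunk_id == b'fmt ':
                fmt = f.read(size)
                break
            f.seek(size + (size & 1), 1)  # chunks are padded to an even size
    tag, _, _, _, _, bits = struct.unpack('<HHIIHH', fmt[:16])
    if tag == WAVE_FORMAT_EXTENSIBLE and len(fmt) >= 40:
        # The SubFormat GUID starts at byte 24; its first two bytes hold the real tag.
        tag = struct.unpack('<H', fmt[24:26])[0]
    return tag, bits
```

A result of `(1, 16)` would indicate 16-bit PCM inside an extensible header, and a tag of 3 would indicate IEEE float data.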

Using from google colab notebook

Good day Seth,

I have been going over your work, and when I try to import the libraries the code throws an import error: "cannot import name 'downsample_mono'." I am not sure which version of the 'clean' library you imported, but could you kindly assist me in solving this error?
I am using Google Colab so I could utilize the GPU.

Thank you for supplying us with such, we are really grateful for it. Thank you for your time.

Leave one out

I tried to implement leave one out cross-validation for your code but have not had success yet. I would appreciate it if you could have any suggestions.
The main reason behind using leave one out cross-validation is that your code is working with a dependent data set, but how about an independent dataset?
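
One reading of the question is that clips cut from the same original recording should never appear in both the training and the test split. Under that assumption, a leave-one-group-out split is a possible starting point. The sketch below is pure Python and the names are hypothetical; scikit-learn's `LeaveOneGroupOut` provides a similar splitter.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one group at a time."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx

# Hypothetical example: each clip is tagged with the recording it was cut from.
groups = ['rec_a', 'rec_a', 'rec_b', 'rec_c', 'rec_c']
splits = list(leave_one_group_out(groups))
```

Each fold would then train a fresh model on `train_idx` and evaluate it on `test_idx`.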

Clean.py memory usage

Hi Seth,

In the clean.py, the memory usage (RAM) keeps going up. So, if a large dataset with many classes is being studied, eventually, the computer goes out of memory. I get this message from time to time:

MemoryError: Unable to allocate 1.22 MiB for an array with shape (160399,) and data type int64

This can easily be dealt with by increasing the page file (pagefile.sys) size, upgrading the computer, or applying clean.py to one subset of the data at a time, but I was wondering if it would not be better to dump 'garbage' between two classes. In your YT video, that would mean dumping memory between Cello and Clarinet, for example.

Thanks
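
A minimal sketch of the idea described above is to process one class at a time and trigger a garbage collection after each. `process_classes` and `process_one_class` are hypothetical names. Note that `gc.collect()` can only free objects that are no longer referenced, so it will not help if the arrays are still held in a list or a DataFrame.

```python
import gc

def process_classes(class_names, process_one_class):
    """Run process_one_class for each class, collecting garbage after each one."""
    for name in class_names:
        process_one_class(name)
        gc.collect()  # ask Python to release unreachable objects now
```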

Jupyter: running !python clean.py only displays "audio file not found for sub string: 3a3d0279" #84

Error: Unknown Format: 3

Good day again Seth,

I have uploaded my own WAV files, and when I try to clean them using the clean.py code I get this error: “Error: unknown format: 3.” Is there a solution you could provide so I can run the cleaning process without this recurring error?

Looking forward to your feedback.
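
WAV format code 3 is IEEE floating-point data, which Python's `wave` module does not read. Assuming the samples can be loaded with a reader that supports float WAV files (for example `scipy.io.wavfile.read`), one possible fix is to convert them to 16-bit integers before saving them again. The helper below is a hypothetical sketch and only covers the conversion step.

```python
import numpy as np

def float_to_int16(wav):
    """Convert float samples in [-1.0, 1.0] to int16, clipping anything outside."""
    clipped = np.clip(np.asarray(wav, dtype=np.float64), -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)
```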

Input signal length=1 is too small to resample from 44100->16000

Hi!
As I was running the original code, I ran into the issue below.
What could be a possible solution?
Thanks, and looking forward to your response.
Reloaded modules: clean
0%| | 0/300 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\CYY\Desktop\AudioClassification\4.predict.py", line 77, in <module>
    make_prediction(args)
  File "C:\Users\CYY\Desktop\AudioClassification\4.predict.py", line 35, in make_prediction
    rate, wav = downsample_mono(wav_fn, args.sr)
  File "C:\Users\CYY\Desktop\AudioClassification\clean.py", line 36, in downsample_mono
    wav = resample(wav, rate, sr)
  File "C:\Users\CYY\Anaconda3\lib\site-packages\librosa\core\audio.py", line 584, in resample
    y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=-1)
  File "C:\Users\CYY\Anaconda3\lib\site-packages\resampy\core.py", line 97, in resample
    raise ValueError('Input signal length={} is too small to '
ValueError: Input signal length=1 is too small to resample from 44100->16000

mean() instead of max() ?

def envelope(y, rate, threshold):
    mask = []
    y = pd.Series(y).apply(np.abs)
    y_mean = y.rolling(window=int(rate/20),
                       min_periods=1,
                       center=True).max()

In the envelope function, shouldn't it be mean() instead of max(), as explained in the YouTube video?
Or, if this was changed on purpose, can you please clarify that?
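
To see what the choice changes in practice, here is a small sketch that applies both statistics to a toy signal with the same rolling-window arguments as the snippet above. `rolling_envelope` is a hypothetical helper, and the toy signal and rate are made up for illustration.

```python
import numpy as np
import pandas as pd

def rolling_envelope(y, rate, how):
    """Rolling envelope of |y| over a rate/20 window, using 'max' or 'mean'."""
    y = pd.Series(y).abs()
    roll = y.rolling(window=int(rate / 20), min_periods=1, center=True)
    return roll.max() if how == 'max' else roll.mean()

# Toy signal: silence with one full-scale sample in the middle.
y = np.zeros(100)
y[50] = 1.0
env_max = rolling_envelope(y, rate=200, how='max')    # window of 10 samples
env_mean = rolling_envelope(y, rate=200, how='mean')
```

With a 10-sample window, the max envelope stays at 1.0 around the spike, while the mean envelope peaks at 0.1. For the same threshold, max() therefore keeps short transients that mean() would discard.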

downsample_mono method results in ValueError and y variable from Pandas Series results in Exception

When I run the code as is, I first get an error that seems like it has something to do with the data not liking being downsampled to 16000:

"ValueError: Input signal length=1 is too small to resample from 44100->16000"

After commenting out line 36, I then get an error in line 15:

Exception: Data must be 1-dimensional

I fixed this by reshaping the y variable using this code:
y = pd.Series(np.reshape(y, (len(y),))).apply(np.abs)

I know someone else got their version of the repo to work without making any changes and that was on a Mac. Perhaps since I'm on Windows, there is some inconsistency?
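
Both errors would be consistent with the audio array having shape `(n, 1)` instead of `(n,)`: resampling along the last axis then sees a signal of length 1, and `pd.Series` rejects 2-D input. A minimal sketch that normalises the shape before resampling, instead of commenting the resampling out, could look like this. `to_1d_mono` is a hypothetical helper.

```python
import numpy as np

def to_1d_mono(wav):
    """Return a 1-D float32 signal from input of shape (n,), (n, 1) or (n, channels)."""
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim == 2:
        wav = wav.mean(axis=1)  # average channels; (n, 1) simply loses the extra axis
    return wav
```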

train.py: IndexError: index 9 is out of bounds for axis 1 with size 9

Hi Seth

I'm working on an amended clean.py - one that will support other audio file types.

I'm testing it with your wavfiles folder as a benchmark, with the hope that I can reproduce the score I got before any changes. I've noticed it creates a few more files (differences in the threshold for trimming, I think); however, they are all still one second long and the file structure is identical (i.e. 10 classes named exactly as yours are).

Running train.py returns the following:

IndexError: index 9 is out of bounds for axis 1 with size 9

I think the problem is with wav_train, wav_val, label_train, label_val = train_test_split(wav_paths, labels, test_size=0.1, random_state=0). To look into it I added:

print(len(set(label_train)))
10
print(len(set(label_val)))
9

If I set test_size=0.5 then I get:

print(len(set(label_train)))
10
print(len(set(label_val)))
10

Training then runs without error, but the accuracy is about 10%.

I'm really not sure what's going on here. I've listened to the audio files and looked at the spectrograms of files created by my clean.py and they seem very normal. They can be found here; I'd be very grateful if you could try and reproduce this.
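
One thing that may help with the missing class in the validation split is scikit-learn's `stratify` argument, which keeps the class proportions the same in both splits. The sketch below uses made-up stand-in data. It addresses only the 9-versus-10 class count, not the low accuracy.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 classes with 10 clips each.
labels = [c for c in range(10) for _ in range(10)]
wav_paths = ['clip_%d.wav' % i for i in range(len(labels))]

wav_train, wav_val, label_train, label_val = train_test_split(
    wav_paths, labels, test_size=0.1, random_state=0,
    stratify=labels,  # keep every class present in both splits
)
```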

IndexError: index 63 is out of bounds for axis 0 with size 57

Hello Seth, as you said, I need more files for this kind of network, so now I have more than 30 files in one class, and I have two classes, but it gave me this error:

IndexError: index 63 is out of bounds for axis 0 with size 57
Please help me. You can find all the project files here:
https://drive.google.com/drive/folders/1DretGkD66hQ0zV2Fwo8YzWwh0woNFS8X?usp=sharing

ImportError: cannot import name 'Melspectrogram'

In your notebook directory you have a few Jupyter notebooks with the following import statement:

from kapre.time_frequency import Melspectrogram, Spectrogram

It gives this error:
ImportError: cannot import name 'Melspectrogram'

tqdm evaluation number - What does the '72' stand for?

Not really an issue, but...

I'm running your tutorial. Could you tell me what this number represents: "72" (see picture)?
It's from the train.py using your data. I couldn't figure out where it comes from and how to modify it.

tensorflow.python.framework.errors_impl.NotFoundError: logs/conv1d_history.csv; No such file or directory

Good morning. I am trying to run the train script and I get this NotFoundError. I am using my own separate data and want the model to classify 2 classes. How can I solve this?

  File "training.py", line 120, in <module>
    train(args)
  File "training.py", line 103, in train
    callbacks=[csv_logger, cp])
  File "/home/marco/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1230, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/marco/.local/lib/python3.6/site-packages/keras/callbacks.py", line 413, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/marco/.local/lib/python3.6/site-packages/keras/callbacks.py", line 2775, in on_epoch_end
    self.writer.writeheader()
  File "/usr/lib/python3.6/csv.py", line 144, in writeheader
    self.writerow(header)
  File "/usr/lib/python3.6/csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/home/marco/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 103, in write
    self._prewrite_check()
  File "/home/marco/.local/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 89, in _prewrite_check
    compat.path_to_bytes(self.__name), compat.as_bytes(self.__mode))
tensorflow.python.framework.errors_impl.NotFoundError: logs/conv1d_history.csv; No such file or directory
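
The last line points at `logs/conv1d_history.csv`, so the likely cause is that the `logs` directory does not exist when the CSV logger first writes to it. A minimal sketch that creates the parent directory before training starts is shown below. `ensure_parent_dir` is a hypothetical helper.

```python
import os

def ensure_parent_dir(file_path):
    """Create the directory that will contain file_path, if it is missing."""
    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return file_path

# Possible usage before building the callbacks in the training script:
# csv_path = ensure_parent_dir(os.path.join('logs', 'conv1d_history.csv'))
```

Creating the `logs` folder by hand next to the training script would have the same effect.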

Jupyter run !python clean.py is None

Errors appear on the pip install -r requirements command

Hello,
I am trying your code, but when I run the "pip install -r requirements" command it shows an error saying the object-detection package was not found.

Not able to run predict.py on IRMAS: a dataset for instrument recognition in musical audio signals

I ran the previous version of predict.py that you are using in the video as the latest version was not showing output on the terminal. I have two problems.

When I run the prediction on the IRMAS: a dataset for instrument recognition in musical audio signals, I get an error that I am not able to understand.

My folder structure was as follows
When I copied only the Acoustic guitar folder from the wavfiles directory to a new directory, wavfiles2, and then ran predict.py, I got incomplete output.

Missing Files Related to YouTube Tutorials Part by Part

Hi,
I see that you are updating the repo; grateful for your efforts.
I'm not able to find the docx and instruments.csv files related to the pt. 2 video.
Could you please try to arrange the files part by part, as in the YouTube series?
ref:

Thanks

ValueError

X[i,] = wav.reshape(1, -1)
ValueError: could not broadcast input array from shape (1,3080192) into shape (1,16000)

Please find the files at the attached link: https://drive.google.com/drive/folders/1hRpOLXArm0gP9QPB9RuYgeWNFQerkbuP?usp=sharing
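
The shapes suggest the generator allocates room for 16000 samples per clip while the file being loaded holds 3080192 samples, which would happen if the audio was not split into fixed-length clips first. If cropping or zero-padding is acceptable, a sketch such as the following forces every clip to the expected length. `fit_length` is a hypothetical helper, and cropping discards everything after the first `n_samples`.

```python
import numpy as np

def fit_length(wav, n_samples):
    """Crop or zero-pad a 1-D signal to exactly n_samples."""
    wav = np.asarray(wav, dtype=np.float32).reshape(-1)
    out = np.zeros(n_samples, dtype=np.float32)
    n = min(len(wav), n_samples)
    out[:n] = wav[:n]
    return out
```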

Clarification on TimeDistributed

Hi,

I have a question about the TimeDistributed wrapper - I think I understand how it works, but I'm not sure. I would really appreciate if someone could check my understanding. The first 'real' layer of the 1D Conv model:

x = TimeDistributed(layers.Conv1D(8, kernel_size=(4), activation='tanh'), name='td_conv_1d_tanh')(x)

with input 100, 128, 1 yielding 40 trainable parameters and an output of 100, 125, 8.

This makes sense to me as applying a 1D conv along the 128-shape frequency/melbin axis, for each of the 100 time bins, and learning filters weights/biases as it goes. I'm guessing the step size for the convolution is 1, which is why we get 125 on the output. For the LSTM:

s = TimeDistributed(layers.Dense(64, activation='tanh'), name='td_dense_tanh')(x)

with input 100, 128 yielding 8256 trainable parameters and output 100, 64.

We have 64 nodes fully connected to the 128 frequencies, so we wind up with 128 * 64 + 64 = 8256 parameters. Then these are trained over each of the 100 time bins. Does that sound right?
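
The numbers above can be checked with a few lines of arithmetic, assuming the Keras defaults of `padding='valid'` and a stride of 1 for the convolution. The helper names are hypothetical.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 'valid' (unpadded) 1-D convolution."""
    return (length - kernel_size) // stride + 1

def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return in_features * units + units

conv_count = conv1d_params(kernel_size=4, in_channels=1, filters=8)   # 40
conv_length = conv1d_valid_length(length=128, kernel_size=4)          # 125
dense_count = dense_params(in_features=128, units=64)                 # 8256
```

TimeDistributed applies the same layer, with the same weights, to each of the 100 time steps, which is why neither count is multiplied by 100.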

Thanks a lot to Seth for the code and videos, they are very helpful.

Audio buffer is not Fortran-contiguous

rate, wav = downsample_mono(wav_path[0], args.sr)

wav = resample(wav.astype(np.float32), rate, sr)

raise ParameterError('Audio buffer is not Fortran-contiguous. '
librosa.util.exceptions.ParameterError: Audio buffer is not Fortran-contiguous. Use numpy.asfortranarray to ensure Fortran contiguity.
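
This error typically appears when the array handed to `resample` is a strided view, for example one channel sliced out of a stereo array. A minimal sketch of the fix suggested by the error message itself, on made-up data:

```python
import numpy as np

stereo = np.zeros((8, 2), dtype=np.float32)
left = stereo[:, 0]               # strided view into the stereo buffer, not contiguous
fixed = np.asfortranarray(left)   # contiguous copy, as the error message suggests
```

In the call quoted above, that would mean passing `np.asfortranarray(wav.astype(np.float32))` to `resample`, assuming the rest of the function stays unchanged.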

clean.py being picky about WAV bit depth | ValueError: Unsupported bit depth: the wav file has 24-bit data

I was attempting to use my sample and multitrack collections in order to train a model geared more towards electronic music by following your YouTube posts from February. I made sure that all my files were in WAV format (as suggested in the video), but scipy seems to be unable to process "high" bit depths.
What would be the best way to convert my custom WAV files in the "wavfiles" folder to the appropriate bit depth (and/or sample rate)?

Output of python clean.py:

  0%|                                                                                       | 0/4 [00:00<?, ?it/s]
clean.py:27: WavFileWarning: Chunk (non-data) not understood, skipping it.
  rate, wav = wavfile.read(path)
  0%|                                                                                       | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "clean.py", line 128, in <module>
    split_wavs(args)
  File "clean.py", line 68, in split_wavs
    rate, wav = downsample_mono(src_fn, args.sr)
  File "clean.py", line 27, in downsample_mono
    rate, wav = wavfile.read(path)
  File "/home/myusername/anaconda3/lib/python3.7/site-packages/scipy/io/wavfile.py", line 298, in read
    "has {}-bit data.".format(bit_depth))
ValueError: Unsupported bit depth: the wav file has 24-bit data.
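
One possible way to convert the files in bulk, sketched with the standard `wave` module and NumPy, is to drop the lowest byte of every 24-bit sample. This assumes plain PCM files that `wave` can open, and it truncates without dithering. `convert_24bit_to_16bit` is a hypothetical helper and not part of clean.py.

```python
import wave
import numpy as np

def convert_24bit_to_16bit(src_path, dst_path):
    """Rewrite a 24-bit PCM WAV file as 16-bit PCM, keeping rate and channels."""
    with wave.open(src_path, 'rb') as src:
        params = src.getparams()
        if params.sampwidth != 3:
            raise ValueError('expected 24-bit PCM, got %d-byte samples' % params.sampwidth)
        raw = src.readframes(params.nframes)
    # Each sample is 3 little-endian bytes; dropping the lowest byte
    # keeps the 16 most significant bits.
    as_bytes = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    pcm16 = as_bytes[:, 1:].copy().view('<i2').reshape(-1)
    with wave.open(dst_path, 'wb') as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(2)
        dst.setframerate(params.framerate)
        dst.writeframes(pcm16.tobytes())
```

Audio tools such as Audacity or SoX can do the same conversion with dithering, which may sound better for quiet material.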

train.py not accepting sample rates other than 16000 | ValueError: could not broadcast input array from shape (1,32000) into shape (1,16000)

After fixing/bypassing issue #14, python train.py seems to dislike sample rates other than 16000:

Traceback (most recent call last):
  File "train.py", line 114, in <module>
    train(args)
  File "train.py", line 96, in train
    callbacks=[csv_logger, cp])
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 235, in fit
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 593, in _process_training_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 706, in _process_inputs
    use_multiprocessing=use_multiprocessing)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 952, in __init__
    **kwargs)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 747, in __init__
    peek, x = self._peek_and_restore(x)
  File "/home/myuser/anaconda3/envs/audio/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/data_adapter.py", line 956, in _peek_and_restore
    return x[0], x
  File "train.py", line 46, in __getitem__
    X[i,] = wav.reshape(1, -1)
ValueError: could not broadcast input array from shape (1,32000) into shape (1,16000)
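
The two shapes in the message look like a mismatch between the clip length on disk and the buffer the generator allocates: 1-second clips at 32000 Hz hold 32000 samples, while the buffer was sized for 16000. Assuming both scripts take a sample rate and a clip duration (named `sr` and `dt` here), the expected length is simply their product, so both scripts would need the same values.

```python
def samples_per_clip(sr, dt):
    """Number of samples in a clip of dt seconds at sample rate sr."""
    return int(sr * dt)

buffer_len = samples_per_clip(sr=16000, dt=1.0)   # what the generator allocated
clip_len = samples_per_clip(sr=32000, dt=1.0)     # what the files on disk contain
```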

ValueError: The innermost dimension of input_shape must be defined, but saw: (None, None)

No module named 'numba.decorators'

It seems numba removed the decorators module in version 0.50. A hotfix is to pip install numba==0.48

The kapre package has updated and some of notebooks cannot work anymore

Please update them. Thanks a lot!

Version Mismatching

Just went through your video and got this error when pip installing

I assume this is caused by Tensorflow version?

BTW, Thanks for the tutorials

index_error

FutureWarning: norm=1 behavior will change in librosa 0.8.0. To maintain forward compatibility, use norm='slaney' instead

labels = [self.labels[k] for k in indexes]
IndexError: index 7 is out of bounds for axis 0 with size 3
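
The failing line indexes `self.labels` with positions generated elsewhere in the generator, so one possible cause is that the label array is shorter than the list of file paths. A quick sanity check before building the generator might narrow it down. The helper below is a hypothetical sketch and the diagnosis is a guess.

```python
def check_paths_and_labels(wav_paths, labels):
    """Raise early if the file list and the label list have different lengths."""
    if len(wav_paths) != len(labels):
        raise ValueError(
            'got %d wav paths but %d labels' % (len(wav_paths), len(labels))
        )
    return len(wav_paths)
```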

Could the author please tell us which version of kapre we should install?

Can we detect instruments list from a song?

Hi all,

Does it support splitting the instruments out of a mixed sound?

input: WAV song

output:
WAV of instrument 1:
WAV of instrument 2:

or just detect instrument tag:

instruments: ["Piano","Guitar","Double Bass"]

Model Predict Error

Hi Seth,

wonderful and really useful project and videos! I love it.

I am working on predicting on the original files, as you do at the end of the video; however, when I do so, I get the following error:
"local variable 'batch_outputs' referenced before assignment"

After some quick research, it seems this is due to the batch actually being an empty array, and the error shows here:
X_batch = np.array(batch, dtype=np.float32)
y_pred = model.predict(X_batch)

X_batch is built from the batch array, so when model.predict is called on it, it throws the error.

I tried with my original data that trained just fine and I also tried this with a singular file and got the same issue.

Any ideas or is this on my end?

Thank you!
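
If the batch really is empty, the signal probably had no samples left by the time it was windowed. A minimal sketch of the windowing step with an explicit shape is given below, so that an empty result can be detected and skipped before calling `model.predict`. `window_batch` is a hypothetical helper and may differ from what predict.py does.

```python
import numpy as np

def window_batch(wav, step):
    """Split a 1-D signal into zero-padded windows of shape (n, step, 1)."""
    wav = np.asarray(wav, dtype=np.float32).reshape(-1)
    batch = []
    for start in range(0, len(wav), step):
        window = np.zeros((step, 1), dtype=np.float32)
        chunk = wav[start:start + step]
        window[:len(chunk), 0] = chunk  # the last window may be shorter
        batch.append(window)
    return np.array(batch, dtype=np.float32).reshape(-1, step, 1)

# Possible guard in the prediction loop:
# X_batch = window_batch(clean_wav, step)
# if len(X_batch) == 0:
#     continue  # nothing left to predict on for this file
```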
