
audio-classification's Introduction

Environmental Sound Classification using Deep Learning

A project from Digital Signal Processing course

Dependencies

  • Python 3.6
  • numpy
  • librosa
  • pysoundfile
  • sounddevice
  • matplotlib
  • scikit-learn
  • tensorflow
  • keras

Dataset

The dataset can be downloaded from Dataverse or GitHub.

I'd recommend using ESC-10 for the sake of convenience.

Example:

├── 001 - Cat
│  ├── cat_1.ogg
│  ├── cat_2.ogg
│  ├── cat_3.ogg
│  ...
...
└── 002 - Dog
   ├── dog_barking_0.ogg
   ├── dog_barking_1.ogg
   ├── dog_barking_2.ogg
   ...

Feature Extraction

Put the audio files (.wav is untested) under the data directory and run the following command:

python feat_extract.py

Features and labels will be generated and saved in the directory.
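
As a rough sketch of how class folders such as "001 - Cat" could be turned into numeric labels, the snippet below scans a data directory and parses the numeric prefix. This is not the code in feat_extract.py: the function name list_class_dirs and the prefix-to-label rule are assumptions made for illustration. It also skips hidden folders such as .ipynb_checkpoints, which would otherwise be treated as a class.

```python
import os
import re


def list_class_dirs(data_dir):
    """Return (label, folder_name) pairs for class folders like '001 - Cat'.

    Hypothetical helper: hidden folders (e.g. '.ipynb_checkpoints'), plain
    files, and folders without a numeric prefix are skipped.
    """
    pairs = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if name.startswith(".") or not os.path.isdir(path):
            continue
        match = re.match(r"(\d+)\s*-\s*(.+)", name)
        if match is None:
            continue
        # Assumption: the numeric prefix is the class label ('001' -> 1).
        pairs.append((int(match.group(1)), name))
    return pairs
```

Calling list_class_dirs("data") before extraction is a quick way to check which folders will be counted as classes.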

Classify with SVM

Make sure you have scikit-learn installed and that feat.npy and label.npy are in the same directory. Run svm.py to see the result.
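
For orientation, here is a minimal sketch of an SVM classifier on feature and label arrays using scikit-learn. It is not the content of svm.py: the RBF kernel, C=10.0, the feature scaling, and the 75/25 split are assumptions, and the demo uses synthetic data so that it runs without feat.npy and label.npy.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_and_score_svm(X, y, seed=0):
    """Fit a scaled RBF SVM and return accuracy on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Assumed hyperparameters; tune them for real data.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# With real data: X = np.load("feat.npy"); y = np.load("label.npy").ravel()
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c * 5.0, size=(40, 8)) for c in range(3)])
y_demo = np.repeat(np.arange(1, 4), 40)  # three synthetic classes: 1, 2, 3

acc = train_and_score_svm(X_demo, y_demo)
print(f"acc={acc:.3f}")
```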

Classify with Multilayer Perceptron

Install tensorflow and keras first. Run nn.py to train and test the network.

Classify with Convolutional Neural Network

  • Run cnn.py -t to train and test a CNN. Optionally set how many epochs to train on.
  • Predict files by either:
    • Putting target files under predict/ directory and running cnn.py -p
    • Recording on the fly with cnn.py -P
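
The flags above could be wired up roughly as follows. This is a sketch, not the parser in cnn.py: the short flags -t, -p and -P come from this README and the names args.train and args.predict appear in the tracebacks further down, but the long name --record, the epochs flag (-e/--epochs) and its default are hypothetical.

```python
import argparse


def build_parser():
    """Build a command-line parser mirroring the flags described above."""
    parser = argparse.ArgumentParser(description="Train or run the CNN (sketch)")
    parser.add_argument("-t", "--train", action="store_true",
                        help="train and test the CNN")
    parser.add_argument("-p", "--predict", action="store_true",
                        help="predict files under predict/")
    # Hypothetical long name for the live-recording flag.
    parser.add_argument("-P", "--record", action="store_true",
                        help="record audio and predict on the fly")
    # Hypothetical flag and default for the number of epochs.
    parser.add_argument("-e", "--epochs", type=int, default=10,
                        help="number of training epochs")
    return parser


def dispatch(args):
    """Return the name of the action selected by the parsed arguments."""
    if args.train:
        return "train"
    elif args.predict:
        return "predict"
    elif args.record:
        return "record"
    return "help"
```

For example, dispatch(build_parser().parse_args(["-t", "-e", "5"])) selects the training branch with five epochs.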

audio-classification's People

Contributors

imfing, micah5, mvrozanti


audio-classification's Issues

Error! feat.npy is not UTF-8 encoded

Hi, I am using the same structure as you described. When I run feat_extract.py, it runs smoothly and gives me this output:

extract .ipynb_checkpoints features done
extract 001 - Dog bark features done
extract 002 - Rain features done
extract 003 - Sea waves features done
extract 004 - Baby cry features done
extract 005 - Clock tick features done
extract 006 - Person sneeze features done
extract 007 - Helicopter features done
extract 008 - Chainsaw features done
extract 009 - Rooster features done
extract 010 - Fire crackling features done

But for feat.npy, it says: Error! feat.npy is not UTF-8 encoded

Please help me understand this and how I can resolve it.
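
A note on the message itself: .npy is a binary format, so a text editor or notebook file viewer that expects UTF-8 text will typically refuse to display it. That alone probably does not mean the file is corrupt. The sketch below writes a small array, shows the binary header, and reads the array back with np.load. The array contents are made up for illustration.

```python
import os
import tempfile

import numpy as np

features = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feat.npy")
    np.save(path, features)

    # The file starts with a binary magic string, not UTF-8 text.
    with open(path, "rb") as fh:
        magic = fh.read(6)

    # np.load is the intended way to read the file back.
    loaded = np.load(path)

print(magic)
print(loaded.shape)
```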

UnboundLocalError: local variable 'X_predict' referenced before assignment

The following is the error I have encountered. Could you please guide me out of it?

Traceback (most recent call last):
  File "cnn.py", line 139, in <module>
    main(args)
  File "cnn.py", line 125, in main
    elif args.predict: predict(args)
  File "cnn.py", line 88, in predict
    X_predict = np.expand_dims(X_predict, axis=2)
UnboundLocalError: local variable 'X_predict' referenced before assignment

Still getting an error after executing the command cnn.py -p

Steps I followed:
1) Downloaded the ESC-10 dataset and placed it in the data directory
2) Executed the command python feat_extract.py
3) Four files were created: feat.npy, label.npy, predict_feat, predict_filenames
4) Executed svm.py; the output was: fitting... acc=0.756
5) Executed nn.py
6) Executed cnn.py -t
7) Put a single sample audio file in the predict folder and executed cnn.py -p, but an error occurred:

Traceback (most recent call last):
  File "cnn.py", line 123, in <module>
    main(args)
  File "cnn.py", line 109, in main
    elif args.predict: predict(args)
  File "cnn.py", line 73, in predict
    pred = model.predict_classes(X_predict)
  File "C:\Users\Abhijeet\Anaconda3\envs\tfp3.6\lib\site-packages\keras\engine\sequential.py", line 268, in predict_classes
    if proba.shape[-1] > 1:
AttributeError: 'list' object has no attribute 'shape'

errors in cnn.py

I got two errors when executing cnn.py.

  1. Unresolved reference 'feat_extract' in the definition of train(args).
    Solution: use "import feat_extract" instead of "from feat_extract import *".
  2. Unresolved reference 'X_predict' in the definition of predict(args).
    In "X_predict = np.expand_dims(X_predict, axis=2)", the X_predict on the right-hand side has not been defined or computed beforehand.
    Solution: replace that X_predict with predict_feat_path.
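
The second error comes down to a variable being read before it is assigned. Here is a minimal sketch of one way to avoid it, assuming predict_feat_path is the path of a .npy file holding a 2-D feature array. That is an assumption about cnn.py, not something confirmed here, and load_predict_features is a hypothetical helper name.

```python
import numpy as np


def load_predict_features(predict_feat_path):
    """Load saved features and add a trailing channel axis.

    Assumes predict_feat_path points to a .npy file containing an array
    of shape (n_files, n_features).
    """
    # Assign X_predict first, so the next line has something to read.
    X_predict = np.load(predict_feat_path)
    # Add a channel axis: (n_files, n_features) -> (n_files, n_features, 1).
    X_predict = np.expand_dims(X_predict, axis=2)
    return X_predict
```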

Error in svm.py

I got the following error after running svm.py. How can I solve this one?

[Screenshot: audio-classification-error]

The class labels seem to differ from the actual labels.

When I tried to predict the classes, there seemed to be a pattern: the class number given by the folder name differs from the predicted one.

Name          Predicted   Actual
rain          0           2
seawaves      1           3
baby          2           4
clock         3           5
sneeze        4           6
helicopter    5           7
chainsaw      6           8
rooster       7           9
firecracker   8           10
dog           13          1

What could be the reason?

index out of bound

Executing nn.py gives the following error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "nn.py", line 37, in <module>
    y_train = keras.utils.to_categorical(y_train-1, num_classes=10)
  File "C:\Users\TUSHAR\Anaconda3\lib\site-packages\keras\utils\np_utils.py", line 32, in to_categorical
    categorical[np.arange(n), y] = 1
IndexError: index 21 is out of bounds for axis 1 with size 10
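
An index such as 21 with only 10 classes suggests that the label values are not the contiguous range the one-hot step expects. One possible workaround, sketched here with made-up labels, is to remap whatever label values occur to indices 0..K-1 before encoding. The helper name to_contiguous_labels is hypothetical and this step is not part of nn.py.

```python
import numpy as np


def to_contiguous_labels(labels):
    """Map arbitrary integer labels to indices 0..K-1.

    Returns (classes, indices): classes holds the sorted original label
    values and indices[i] is the position of labels[i] within classes.
    """
    classes, indices = np.unique(labels, return_inverse=True)
    return classes, indices.ravel()


raw = np.array([1, 1, 5, 21, 5, 21])  # made-up labels with gaps
classes, y = to_contiguous_labels(raw)
print(classes)  # original label values, sorted
print(y)        # indices usable with num_classes=len(classes)
```

The classes array can be kept to translate a predicted index back to the original folder number.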

Attribute error

Hello, when I run nn.py and cnn.py, Spyder shows: AttributeError: 'ProgbarLogger' object has no attribute 'log_values'. It seems that something is wrong with Keras.

Classes

What are the classes if I use ESC-10? Is it like: 0 - dog bark, 1 - ***, ..., etc.?

I put a dog bark .ogg file from the training data into the predict folder, but it outputs 13. What is wrong?

Thank you.

keras.utils.to_categorical(y, num_classes) question

Hello,
I have a question.

nn.py (below line 34)

# Convert label to onehot
y_train = keras.utils.to_categorical(y_train-1, num_classes=num_classes)
y_test = keras.utils.to_categorical(y_test-1, num_classes=num_classes)

Why did you use (y_train - 1) rather than just (y_train) when calling keras.utils.to_categorical()?
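
A likely reason, assuming the labels come from the 1-based folder prefixes (001, 002, ...): one-hot encoding uses each label as a column index, and valid column indices for num_classes columns run from 0 to num_classes - 1. The sketch below reimplements the idea in plain NumPy, using the same indexing pattern that appears in the tracebacks on this page. It is an illustration, not Keras's actual code.

```python
import numpy as np


def one_hot(y, num_classes):
    """One-hot encode integer labels that lie in the range 0..num_classes-1."""
    y = np.asarray(y, dtype=int)
    out = np.zeros((y.size, num_classes), dtype=np.float32)
    out[np.arange(y.size), y] = 1.0  # each label selects one column
    return out


labels = np.array([1, 2, 10])  # 1-based labels, as with folders 001..010

# Shifting by one maps labels 1..10 onto column indices 0..9.
encoded = one_hot(labels - 1, num_classes=10)

# Without the shift, label 10 points past the last column (index 9).
try:
    one_hot(labels, num_classes=10)
    unshifted_ok = True
except IndexError:
    unshifted_ok = False
```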

Frequency band exceeds Nyquist. Reduce either fmin or n_bands.

I am experiencing the same issue. Should I prep the .wav files in a specific way?

Also, if I reduce fmin to 120.0, I receive the following error:
librosa.util.exceptions.ParameterError: Filter pass-band lies beyond Nyquist

Please advise.

Okay, thanks. It's a problem related to the audio files.
I'll close this issue for now.

Originally posted by @mtobeiyf in #10 (comment)

I get an index 12 out of bounds when I run the cnn.py -t

So this happened with both my own OGG data and the ESC data, with one OGG file in the predict folder.

Everything went normally; the SVM and MLP worked. Then I tried the CNN:

Phonecian:audio-classification skiwheelr$ python3 cnn.py -t
Using TensorFlow backend.
2020-04-03 23:29:32.002190: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-03 23:29:32.015288: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f8e13dfbac0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-03 23:29:32.015310: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Traceback (most recent call last):
  File "cnn.py", line 124, in <module>
    main(args)
  File "cnn.py", line 109, in main
    if args.train: train(args)
  File "cnn.py", line 51, in train
    y_train = keras.utils.to_categorical(y_train, num_classes=class_count)
  File "/usr/local/lib/python3.7/site-packages/keras/utils/np_utils.py", line 52, in to_categorical
    categorical[np.arange(n), y] = 1
IndexError: index 12 is out of bounds for axis 1 with size 12

What could be causing this? There is enough data.

wav file does not work

When I try to put wav files in data directory, it appears:
[Error] extract feature error in data_wav/6/148835-6-0-0.wav. Frequency band exceeds Nyquist. Reduce either fmin or n_bands.
Solution: tune "fmin" parameter in function "extract_feature(file_name=None)" as follows:
contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate, fmin=180.0).T,axis=0)
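
To estimate which fmin values are safe for a given sample rate, the sketch below assumes that the spectral contrast bands are octave-spaced, so each band edge doubles starting from fmin, and that the highest edge checked must stay below the Nyquist frequency sr / 2. This is my reading of librosa's behaviour and the exact check may differ between versions. The defaults fmin=200.0 and n_bands=6 are librosa's documented defaults, and the helper names are hypothetical.

```python
def highest_band_edge(fmin, n_bands):
    """Highest lower band edge, assuming edges double from fmin."""
    return fmin * 2 ** (n_bands - 1)


def bands_fit(sr, fmin=200.0, n_bands=6):
    """Return True if all assumed band edges lie below Nyquist (sr / 2)."""
    return highest_band_edge(fmin, n_bands) < sr / 2.0


for sr in (8000, 16000, 22050):
    print(sr, bands_fit(sr), bands_fit(sr, fmin=120.0))
```

Under these assumptions, files with a low sample rate need a smaller fmin or fewer bands, which matches the wording of the error message.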
