Speech Recognition ANN Implementation

An implementation of Speech Recognition using Artificial Neural Networks.

Language used: Python

Requires NumPy and SciPy.

Words Recognized: "Apple", "Banana", "Kiwi", "Lime", "Orange"

# How to add new words

  1. Record your new word in Audacity or any other audio software. Set the sampling rate to 44100 Hz, then export to a .wav file. Recording many samples from different speakers should improve accuracy.

  2. Put the .wav files into the training_sets directory. Rename each file to the word you want to add, followed by a hyphen and a sample index (e.g. hello-1.wav, hello-2.wav). This lets the feature extractor iterate over the files easily.

  3. In featureExtractor.py, append your new word to the words array.

  4. Run featureExtractor.py. NumPy files containing the mel-frequency cepstral coefficients (MFCCs) will be generated in the mfccData folder.

  5. In anntrainer.py, go to the main function and open another file handle, e.g. f6 = open("mfccData/hello_mfcc.npy").

  6. Load the .npy file with np.load(), then concatenate it into inputArray.

  7. Edit the neural network's target outputs. For example, to add the word "hello", change the targets as follows:

t1 = np.array([[1,0,0,0,0,0] for _ in range(len(inputArray1))]) #Apple
t2 = np.array([[0,1,0,0,0,0] for _ in range(len(inputArray2))]) #Banana
t3 = np.array([[0,0,1,0,0,0] for _ in range(len(inputArray3))]) #Kiwi
t4 = np.array([[0,0,0,1,0,0] for _ in range(len(inputArray4))]) #Lime
t5 = np.array([[0,0,0,0,1,0] for _ in range(len(inputArray5))]) #Orange
t6 = np.array([[0,0,0,0,0,1] for _ in range(len(inputArray6))]) #Hello

target = np.concatenate([t1,t2,t3,t4,t5,t6])

Then run anntrainer.py. Training can take a long time, so grab a coffee while you wait =)
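
Writing the target rows by hand gets tedious as the word list grows, because every existing row needs an extra column. The following is a minimal sketch of building the same one-hot targets from the number of samples per word. It is not the code in anntrainer.py: the function name `build_targets`, its `sample_counts` parameter, and the sample counts are hypothetical and used only for illustration.

```python
import numpy as np


def build_targets(sample_counts):
    """Build one-hot target rows, one row per training sample.

    sample_counts[i] is the number of samples for word i, given in the
    same order as the words array (hypothetical helper, not in the repo).
    """
    num_words = len(sample_counts)
    # Row i of the identity matrix is the one-hot vector for word i;
    # np.repeat copies that row once for every sample of the word.
    return np.repeat(np.eye(num_words, dtype=int), sample_counts, axis=0)


# Made-up sample counts for Apple, Banana, Kiwi, Lime, Orange, Hello.
counts = [2, 1, 3, 1, 1, 2]
target = build_targets(counts)
print(target.shape)
```

In practice, each count would be the length of the corresponding loaded MFCC array, so the target rows stay aligned with the concatenated inputs.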

# Running the speech recognizer

Just run main.py! =) You can view demo.mp4 for sample usage.

# Developers

A CS 180 Artificial Intelligence project, University of the Philippines Diliman.

Developers: Romelio Tavas Jr., Dion Melosantos

speech-recognition-ann's Issues

Error rate not reducing

I wanted to recognise 10 digits. With BackPropagationNetwork((260,25,25,10)), the error is still greater than 100 after more than 12 hours of training. What changes are needed in the code?
