
ECOZ2

Linear Predictive Coding Vector Quantization and Hidden Markov Modeling for speech recognition.

This project is a general revision of an implementation I did as part of my BS thesis [1] on isolated word speech recognition using LPC vector quantization [2] and HMM [3][4]. The implementation was directly based on the algorithms described in the literature.

With the implementation essentially remaining as originally written, this revision mainly involves style adjustments and modernization (e.g., better alignment with C99), parallelization, the use of scaling factors [5][6] in HMM operations (so the system can deal with larger models, longer observation sequences, and larger vocabularies), and facilitating the creation of wrappers (Python, Rust).

Status

The programs are already usable. Pending aspects include better automation of model tuning and, perhaps, a more flexible and interoperable file format for the various generated artifacts (predictor files, quantized observation sequences, codebooks, and HMM models). Also note that large-scale model training and tuning have not been considered at all.

Documentation? There's no documentation (yet), but you can take a look at how the programs are used in some exercises under https://github.com/ecoz2/.

References. Some key literature references are already mentioned here and in some parts of the code, but I haven't really addressed this aspect in any systematic way, so I'm likely missing some! (This would be a straightforward task if I had a copy of my BS thesis at hand, but, alas, I don't at the moment!)

Building the programs

This is a pretty straightforward Makefile-based project with no dependencies other than GNU GCC and standard C libraries. Under a Unix-like environment just type:

$ make

On macOS, you may need something like CC=gcc-10 make to indicate the use of GNU GCC, which can be installed via brew install gcc.

The programs will be placed under _out/bin/.

The main programs are:

 lpc            - Performs LPC analysis on wav files
 vq.learn       - Trains codebooks for vector quantization
 vq.quantize    - Generates observation sequences
 hmm.learn      - Trains HMM models
 hmm.classify   - Performs HMM-based classification of observation sequences
 vq.classify    - Performs VQ-based classification of predictor files

Other utilities:

 sgn.show       - Displays info about a wav file
 sgn.endp       - Performs endpoint detection
 prd.show       - Displays info about generated predictor files
 vq.show        - Displays info about a generated codebook
 hmm.show       - Displays info about a generated HMM
 seq.show       - Displays info about observation and state sequences

The programs will print a basic usage message when called with no arguments.

There's no program installation per se, but you can simply include _out/bin in your $PATH:

$ export PATH=`pwd`/_out/bin:$PATH

I have added some plotting utilities. These are Python-based, requiring pandas and matplotlib:

$ pip install pandas matplotlib 

You may want to add src/py to your $PATH as well:

$ export PATH=`pwd`/src/py:$PATH

Codebook related:

cb.plot_evaluation.py        - Distortion, σ-ratio, inertia
cb.plot_cards_dists.py       - Cell cardinality and distortion plots
cb.plot_reflections.py       - Reflection coefficient scatter plot
                               to visualize clusters and centroids

Footnotes

  1. Rueda, C.A., "Implementation and Experimentation with Speech Recognition –Isolated Digits– Using LPC Vector Quantization and Hidden Markov Models," B.S. Thesis, Systems Engineering Dept., Universidad Autónoma de Manizales, Colombia, 1993.

  2. Juang, B-H., Wong, D.Y., Gray, A.H., "Distortion Performance of Vector Quantization for LPC Voice Coding," IEEE Trans. ASSP, Vol. 30, No. 2, April, 1982.

  3. Rabiner, L. R., Levinson, S.D., Sondhi, M.M., "On the Application of Vector Quantization and Hidden Markov Models to Speaker-Independent, Isolated Word Recognition," The Bell System Technical Journal, Vol. 62, No.4, April 1983.

  4. Rabiner, L. R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, Vol. 77, No. 2, 1989.

  5. Shen, D., "Some Mathematics for HMM," http://courses.media.mit.edu/2010fall/mas622j/ProblemSets/ps4/tutorial.pdf

  6. Stamp, M., "A Revealing Introduction to Hidden Markov Models," Computer Science Dept, San Jose State University, https://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf

Contributors

carueda
