Comments (9)

dhdaines commented on July 19, 2024

This is because, for some reason, -remove_silence and -remove_noise are enabled by default in the pocketsphinx configuration. The decoder then drops silence and noise frames, so the frame indices only count actual speech frames, not positions in the input audio. It speeds up decoding but confuses everybody! You can fix it like this:

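import pocketsphinx

cfg = pocketsphinx.Decoder.default_config()
# Keep every input frame so that segment frame indices match the audio timeline
cfg.set_boolean('-remove_silence', False)
cfg.set_boolean('-remove_noise', False)
# ... set -hmm, -dict, etc. here as usual ...
decoder = pocketsphinx.Decoder(cfg)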
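For illustration, here is a minimal plain-Python sketch (not pocketsphinx code) of why the indices are confusing when silence is removed. It assumes 100 frames per second (a 10 ms frame shift, the usual pocketsphinx default) and a hypothetical per-frame `kept` mask recording which input frames survived removal; pocketsphinx does not hand you such a mask, so the helper names are illustrative only.

```python
# Sketch only: `kept` is a hypothetical mask of which input frames were
# retained after silence/noise removal; it is NOT a pocketsphinx object.

FRAME_RATE = 100  # assumed frames per second (10 ms frame shift)


def speech_to_input_frames(kept):
    """Map each speech-only frame index to its original input frame index."""
    return [i for i, k in enumerate(kept) if k]


def speech_frame_to_seconds(speech_idx, kept, frame_rate=FRAME_RATE):
    """Convert a speech-only frame index back to a time in the original audio."""
    mapping = speech_to_input_frames(kept)
    return mapping[speech_idx] / frame_rate


# 30 frames of silence, 50 of speech, 20 of silence, 40 of speech
kept = [False] * 30 + [True] * 50 + [False] * 20 + [True] * 40
print(speech_frame_to_seconds(0, kept))   # 0.3, not 0.0
print(speech_frame_to_seconds(50, kept))  # 1.0, not 0.5
```

Read naively, speech-only frame 50 would be at 0.5 s, but in the audio it is at 1.0 s because 50 frames of silence were dropped before it.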

from pocketsphinx-python.

nshmyrev commented on July 19, 2024

Hi @dhdaines

remove_noise should almost always be enabled because the models are trained with the noise removal feature.

remove_silence is also a very good thing because it allows the CMN (cepstral mean normalization) estimate to be computed properly, which gives much better accuracy.
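A minimal sketch of the CMN point, in plain Python: cepstral mean normalization subtracts the per-dimension mean of the feature vectors, and if that mean is estimated over long silent stretches it gets pulled toward the silence frames. The `is_speech` mask and the `cmn` function are hypothetical, and this is the textbook technique with toy numbers, not pocketsphinx's implementation.

```python
def cmn(frames, is_speech):
    """Hypothetical CMN helper: estimate the mean over speech frames only,
    then subtract it from every frame."""
    speech = [f for f, s in zip(frames, is_speech) if s]
    if not speech:
        raise ValueError("no speech frames to estimate the mean from")
    mean = [sum(col) / len(speech) for col in zip(*speech)]
    normalized = [[x - m for x, m in zip(f, mean)] for f in frames]
    return normalized, mean


# Two silent frames followed by two speech frames (2-dimensional toy cepstra)
frames = [[0.0, 0.0], [0.0, 0.0], [4.0, 2.0], [6.0, 4.0]]
is_speech = [False, False, True, True]

_, mean_speech = cmn(frames, is_speech)          # silence excluded
_, mean_all = cmn(frames, [True] * len(frames))  # silence included
print(mean_speech)  # [5.0, 3.0]
print(mean_all)     # [2.5, 1.5]
```

With silence included, the estimate is dragged halfway toward the silence values, so the speech frames are no longer centred; the longer the silence, the larger the bias.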

The current proposal is to use continuous processing; then the timing will be correct. The implementation of continuous processing is not great, though.

dhdaines commented on July 19, 2024

If you use continuous processing, doesn't this prevent you from getting whole-utterance CMN, though?
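To make the trade-off concrete: in continuous (streaming) processing the mean has to be estimated incrementally as frames arrive, whereas whole-utterance (batch) CMN can look at every frame before normalizing any of them. The sketch below contrasts a simple running average with a batch mean; `RunningCMN` and `batch_cmn` are hypothetical names, and the running-average rule is an illustrative assumption, not pocketsphinx's actual live-CMN update.

```python
class RunningCMN:
    """Hypothetical streaming CMN: the mean is updated as each frame arrives."""

    def __init__(self, dim, prior_mean=None, prior_weight=0):
        # Optionally seed the estimate with a prior mean worth `prior_weight` frames.
        prior = prior_mean if prior_mean is not None else [0.0] * dim
        self.sum = [m * prior_weight for m in prior]
        self.count = prior_weight

    def process(self, frame):
        # Update the running estimate, then normalize with the mean so far.
        self.sum = [s + x for s, x in zip(self.sum, frame)]
        self.count += 1
        mean = [s / self.count for s in self.sum]
        return [x - m for x, m in zip(frame, mean)]


def batch_cmn(frames):
    """Whole-utterance CMN: one mean computed over all frames first."""
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    return [[x - m for x, m in zip(f, mean)] for f in frames]


frames = [[2.0], [4.0], [6.0]]
live = RunningCMN(dim=1)
print([live.process(f) for f in frames])  # [[0.0], [1.0], [2.0]]
print(batch_cmn(frames))                  # [[-2.0], [0.0], [2.0]]
```

Early frames in the streaming version are normalized against a poor estimate; seeding it with a prior mean (the optional `prior_mean`/`prior_weight` arguments) is one common way to soften that.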

dhdaines commented on July 19, 2024

I suppose that is only relevant if you're doing offline recognition, of course...

nshmyrev commented on July 19, 2024

Whole-utterance CMN is also OK, but CMN estimation is still best without silence frames, which can be really long in real life (e.g. 2 seconds of silence around a short command), unlike in common ASR databases. That's what remove_silence does, and that is why it is enabled by default.

It is true that timing and whole-utterance processing are not very easy.

ToxicSam commented on July 19, 2024

Hi, is there a way to get the start and end time step of each phoneme? I am still a little confused by that.

dhdaines commented on July 19, 2024

ToxicSam commented on July 19, 2024

> Hi, there isn't a particularly easy way to do that at the moment. The search is word-based (otherwise it would be horribly slow); if you use allphone decoding you will get phone segmentations, but the phoneme accuracy isn't very good. The intention is that state-align search can be used as a second pass to get phone alignments (in fact, it will give them to you, but this involves writing code). I am trying to find some time to implement this.

Thank you for your reply! I'm still wondering: does continuous processing approximate the time steps?
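If you do get phone segments (e.g. from allphone decoding), turning their frame ranges into times is simple arithmetic. This sketch assumes 100 frames per second, inclusive end frames, and frame indices on the original audio timeline (i.e. with `-remove_silence` disabled as in the configuration earlier in this thread); `PhoneSeg` is a hypothetical record type, not a pocketsphinx class, and the segment values are made up.

```python
from typing import NamedTuple

FRAME_RATE = 100  # assumed frames per second (10 ms frame shift)


class PhoneSeg(NamedTuple):
    """Hypothetical phone segment; end_frame is assumed to be inclusive."""
    phone: str
    start_frame: int
    end_frame: int


def to_times(segs, frame_rate=FRAME_RATE):
    """Convert inclusive frame ranges into (phone, start_s, end_s) tuples."""
    return [
        (s.phone, s.start_frame / frame_rate, (s.end_frame + 1) / frame_rate)
        for s in segs
    ]


segs = [PhoneSeg("SIL", 0, 29), PhoneSeg("HH", 30, 36), PhoneSeg("AH", 37, 44)]
for phone, start, end in to_times(segs):
    print(f"{phone}\t{start:.2f}\t{end:.2f}")
```

Adding one to the inclusive end frame makes consecutive segments share a boundary time, so there are no 10 ms gaps between phones.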

lenzo-ka commented on July 19, 2024

This module is obsolete; the Python bindings are now in pocketsphinx.
