Comments (4)
So, I'm assuming that I don't need to train a language model or fill a dictionary, as my current focus is to understand how it works with pre-built models. The only things I'd like to tune are probably the feature extraction part and perhaps the acoustic model.
I had a chance to try the same audio file with the Google Speech API, Microsoft Cognitive Services (Bing Speech API), and IBM Watson via their online services.
The Google API offers sync and async modes with audio duration limits of 1 min and 80 min respectively, and Microsoft's limit is 10 sec per file. So I split my full audio (3:12) into 10-second parts using ffmpeg, and with the pocketsphinx_continuous utility I got worse results than when running the same utility with the same parameters on the entire audio file. So the question is: how does pocketsphinx get the utterance?
From this paper, I got that
Feature vectors are typically computed every 10 ms using an overlapping analysis window of around 25 ms
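To make the quoted numbers concrete, here is a minimal sketch of how a 25 ms window with a 10 ms shift maps onto sample indices. The 16 kHz sample rate and the function name `frame_starts` are assumptions for illustration, not taken from the paper or from pocketsphinx; real front ends may also pad the last window, so exact frame counts can differ slightly.

```python
def frame_starts(num_samples, sample_rate=16000, window_ms=25, shift_ms=10):
    """Return the start index of every full analysis window."""
    window = sample_rate * window_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000     # 160 samples at 16 kHz
    if num_samples < window:
        return []
    return list(range(0, num_samples - window + 1, shift))


starts = frame_starts(16000)   # one second of audio
print(len(starts), starts[:3])  # 98 full windows, starting at 0, 160, 320
```

Consecutive windows overlap by 15 ms, which is why roughly 100 feature vectors come out of each second of audio.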
I'm not too worried about accuracy, because eventually I may use Kaldi :) But I would like to understand Sphinx first, as it is a great toolkit to start with (I appreciate your work, guys).
This is probably more of a Quora question, but please tell me whether my understanding is correct and whether I'm heading in the right direction in understanding speech recognition. Thanks again.
from pocketsphinx-python.
def recognize_audio(audio_file, args):
This code is not correct. It is fine for processing the whole audio at once, but not for audio chunks arriving one by one, because you restart after the first word is recognized, so the chunks are processed independently rather than continuously. A continuous processing example with VAD (voice activity detection) is here:
https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/continuous_test.py
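For readers who cannot open the link, the pattern can be sketched roughly as follows. This is a reconstruction from the decoder calls that appear in this thread (`start_utt`, `process_raw`, `get_in_speech`, `end_utt`, `hyp`), not a copy of the linked file, which remains the authoritative version. `decode_stream` and `FakeDecoder` are hypothetical names; the fake decoder exists only so the sketch runs without pocketsphinx installed.

```python
import io


def decode_stream(decoder, stream, chunk_size=1024):
    """Feed raw audio chunk by chunk; collect one result per detected utterance."""
    results = []
    in_speech = False
    decoder.start_utt()
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break
        decoder.process_raw(buf, False, False)  # no_search=False, full_utt=False
        if decoder.get_in_speech() != in_speech:
            in_speech = decoder.get_in_speech()
            if not in_speech:
                # Speech just ended: close the utterance, read it, open a new one.
                decoder.end_utt()
                hyp = decoder.hyp()
                if hyp is not None:
                    results.append(hyp.hypstr)
                decoder.start_utt()
    decoder.end_utt()
    return results


class FakeDecoder:
    """Hypothetical stand-in: any non-zero byte in a chunk counts as speech."""

    class Hyp:
        def __init__(self, hypstr):
            self.hypstr = hypstr

    def __init__(self):
        self._count = 0
        self._in_speech = False

    def start_utt(self):
        self._count = 0

    def process_raw(self, buf, no_search, full_utt):
        self._in_speech = any(buf)
        if self._in_speech:
            self._count += 1

    def get_in_speech(self):
        return self._in_speech

    def end_utt(self):
        pass

    def hyp(self):
        return self.Hyp("%d speech chunks" % self._count) if self._count else None


# silence, speech, silence, speech, silence (4-byte chunks)
audio = b"\0" * 8 + b"\1" * 12 + b"\0" * 8 + b"\1" * 4 + b"\0" * 4
results = decode_stream(FakeDecoder(), io.BytesIO(audio), chunk_size=4)
print(results)
```

The point of the pattern is that the utterance is restarted only when the VAD reports a speech-to-silence transition, not after every chunk or every partial hypothesis.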
So I split my full audio (3:12) into 10-second parts using ffmpeg, and with the pocketsphinx_continuous utility I got worse results than when running the same utility with the same parameters on the entire audio file.
The recognizer needs a few seconds to estimate channel parameters (CMN, cepstral mean normalization); for that reason it is better to process audio continuously without restarts. The alternative is batch processing, where the audio parameters are estimated from the whole utterance at once. There is a full_utt parameter in ps_process_raw to perform batch processing.
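A toy illustration of why restarts hurt, under loud assumptions: the features are reduced to a single number per frame, the channel is a constant offset, and the running estimate is a plain cumulative mean starting from a hypothetical `prior_mean`. pocketsphinx's actual live CMN works on cepstral vectors and uses its own update rule; this sketch only shows the general effect.

```python
def live_cmn(frames, prior_mean=0.0):
    """Subtract a running mean that starts at prior_mean and adapts per frame."""
    out, total, mean = [], 0.0, prior_mean
    for count, value in enumerate(frames, start=1):
        out.append(value - mean)   # only the estimate so far is available
        total += value
        mean = total / count       # refine the estimate with this frame
    return out


def batch_cmn(frames):
    """Subtract the mean computed over the whole utterance at once."""
    mean = sum(frames) / len(frames)
    return [value - mean for value in frames]


def mean_abs_error(normalised, clean):
    return sum(abs(a - b) for a, b in zip(normalised, clean)) / len(clean)


# Toy feature track: a zero-mean signal plus a constant channel offset of 5.
clean = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
observed = [value + 5.0 for value in clean]

batch_err = mean_abs_error(batch_cmn(observed), clean)
whole_err = mean_abs_error(live_cmn(observed), clean)

# Restarting every 10 frames throws the running estimate away each time.
restarted = []
for start in range(0, len(observed), 10):
    restarted.extend(live_cmn(observed[start:start + 10]))
restart_err = mean_abs_error(restarted, clean)

print(batch_err, whole_err, restart_err)
```

In this toy setup the batch estimate removes the offset completely, the continuous running estimate is wrong mainly at the very beginning, and the restarted version pays that start-up cost again in every short segment.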
from pocketsphinx-python.
it is better to process audio continuously without restarts
OK, thanks. I tried the code below; note that I start the utterance only once here.
Does this 1024 mean that 1024 bytes of audio are read per loop iteration, and that the decoder works only with that 1024-byte chunk? I mean, there is no information about whether a chunk contains a full utterance or was cut off abruptly. Does pocketsphinx_continuous work the same way? Does it use 1024 bytes too?
def recognize_audio(audio_file, args):
    try:
        decoder.start_utt()
        stream = open(audio_file, 'rb')
        in_speech_bf = False
        while True:
            buf = stream.read(args.chunk_size)
            if buf:
                decoder.process_raw(buf, False, False)  # full_utt - False
                if decoder.get_in_speech() != in_speech_bf:
                    in_speech_bf = decoder.get_in_speech()
                    if decoder.hyp() is not None:
                        # decoder.end_utt()
                        hypothesis.append(decoder.hyp().hypstr)
                        # decoder.start_utt()
            else:
                break
    except Exception as ex:
        print('Error occurred with %s \n%s' % (audio_file, ex))
from pocketsphinx-python.
OK, thanks. I tried the code below; note that I start the utterance only once here.
It is a bad idea to modify an example without understanding it. The original code is correct; your modification is wrong.
Does this 1024 mean that 1024 bytes of audio are read per loop iteration, and that the decoder works only with that 1024-byte chunk?
The decoder remembers all audio processed since start_utt.
I mean, there is no information about whether a chunk contains a full utterance or was cut off abruptly.
Chunks are not treated as full utterances, since full_utt is False.
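A toy model of what "remembers previously processed audio" can mean in practice. `StreamingFramer` and `frames_from_chunks` are hypothetical names, and the window and shift are assumed values in samples. This is not pocketsphinx's implementation and makes no claim about its exact output; it only shows that when state is kept between calls, an arbitrary chunk boundary does not cut a frame in half.

```python
class StreamingFramer:
    """Toy front end that keeps leftover samples between calls to feed()."""

    def __init__(self, window=400, shift=160):
        self.window = window
        self.shift = shift
        self._pending = []   # samples not yet consumed by a complete window

    def feed(self, samples):
        """Accept any number of samples; return the windows completed by them."""
        self._pending.extend(samples)
        frames = []
        while len(self._pending) >= self.window:
            frames.append(self._pending[:self.window])
            del self._pending[:self.shift]
        return frames


def frames_from_chunks(samples, chunk_len):
    """Push samples through one framer in pieces of chunk_len samples."""
    framer = StreamingFramer()
    frames = []
    for start in range(0, len(samples), chunk_len):
        frames.extend(framer.feed(samples[start:start + chunk_len]))
    return frames


samples = list(range(5000))
whole = frames_from_chunks(samples, len(samples))
print(len(whole))
print(frames_from_chunks(samples, 100) == whole)
print(frames_from_chunks(samples, 777) == whole)
```

In this sketch the chunk length only controls how much is read per call; the frames produced are the same as long as the framer object, like the decoder between start_utt and end_utt, is not recreated.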
Does pocketsphinx_continuous work the same way?
Yes
Does it use 1024 bytes too?
It uses 4096
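To put the two buffer sizes in perspective, here is a small conversion from bytes to audio duration. It assumes raw 16-bit mono PCM at 16 kHz, which is an assumption about the input format and is not stated in this thread; `chunk_duration_ms` is a hypothetical helper.

```python
def chunk_duration_ms(chunk_bytes, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Duration in milliseconds of a raw PCM chunk of the given size."""
    samples = chunk_bytes / (bytes_per_sample * channels)
    return 1000.0 * samples / sample_rate


for size in (1024, 4096):
    print(size, "bytes ->", chunk_duration_ms(size), "ms")
```

Under those assumptions, 1024 bytes is 32 ms of audio and 4096 bytes is 128 ms, so either way a single chunk is far shorter than a typical utterance.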
from pocketsphinx-python.