The speech-to-text-wavenet from en10

# Speech-to-Text-WaveNet

Based on: https://github.com/buriburisuri/speech-to-text-wavenet
I have included the asset folder with the pre-trained model, which is not included in the original repository.

The pre-trained model is from here:
https://github.com/buriburisuri/speech-to-text-wavenet#pre-trained-models
The model was trained on the CSTR VCTK Corpus:
http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html

Dependencies

The dependencies listed in the original repository are not 100% correct:
https://github.com/buriburisuri/speech-to-text-wavenet#dependencies
The code seems to break with newer versions of tensorflow or sugartensor.

My updated dependencies file: https://github.com/EN10/STT/blob/master/requirements.txt

Working Dependencies

Works with:
pandas 0.19.2 (latest)
librosa 0.5.0 (latest)
tqdm 4.11.2 (latest)
tensorflow 0.11.0 only; 1.0.0, 0.12.1 and 0.12.0 do not work.
sugartensor 0.0.1.9 only; versions newer than 0.0.1.9 do not work.

Changing Dependencies

To see which versions are installed, use:

pip freeze
pip show tensorflow
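
The `pip freeze` output can also be checked against the working versions listed above with a few lines of Python. This is only a minimal sketch: `parse_freeze` and `find_mismatches` are hypothetical helper names, and the sample text is made up for illustration rather than taken from a real install.

```python
# Known-working versions, copied from the list above.
WORKING_VERSIONS = {
    "pandas": "0.19.2",
    "librosa": "0.5.0",
    "tqdm": "4.11.2",
    "tensorflow": "0.11.0",
    "sugartensor": "0.0.1.9",
}


def parse_freeze(text):
    """Turn `pip freeze` output into a {name: version} dict."""
    installed = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not plain name==version pins.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        installed[name.strip().lower()] = version.strip()
    return installed


def find_mismatches(installed, wanted=WORKING_VERSIONS):
    """Return {name: (installed_or_None, wanted)} for every package that differs."""
    return {
        name: (installed.get(name), version)
        for name, version in wanted.items()
        if installed.get(name) != version
    }


# Illustrative sample only; paste real `pip freeze` output here instead.
sample = """\
pandas==0.19.2
librosa==0.5.0
tqdm==4.11.2
tensorflow==1.0.0
sugartensor==0.0.2.4
"""
for name, (found, wanted) in sorted(find_mismatches(parse_freeze(sample)).items()):
    print("%s: found %s, want %s" % (name, found, wanted))
```

A package that is missing altogether is reported with `None` as the found version.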

If a newer version is installed, uninstall it:

sudo pip uninstall sugartensor

Then install the correct version:

sudo pip install sugartensor==0.0.1.9

To install the correct version of tensorflow:

sudo pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl

Run

Run recognition on the test file:

python recognize.py --file test.wav

Other Issues

ImportError: No module named ...

Install the missing module, replacing <module> with the name given in the error:

sudo -H pip install <module>

Convert Audio:
http://superuser.com/questions/23930/how-to-decode-aac-m4a-audio-files-into-wav
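
If ffmpeg is installed (an assumption; the linked answer covers other tools as well), the conversion can be scripted from Python. This is a rough sketch only: `build_ffmpeg_command` and `convert_to_wav` are hypothetical helpers and the file names are placeholders.

```python
import subprocess


def build_ffmpeg_command(src, dst):
    """Build the argument list for converting `src` to `dst` with ffmpeg."""
    # -i names the input file; ffmpeg infers the output format from the
    # destination's extension (".wav" here).
    return ["ffmpeg", "-i", src, dst]


def convert_to_wav(src, dst):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.check_call(build_ffmpeg_command(src, dst))


# Show the command that would be run for a placeholder file pair.
print(" ".join(build_ffmpeg_command("input.m4a", "input.wav")))
```

Calling `convert_to_wav("input.m4a", "input.wav")` would then produce a file that can be passed to recognize.py with --file.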

speech-to-text-wavenet's Issues

How to add my own data

How can I use my own data samples?

If you have any information, please send it to me by email.

train.py error

I was getting the error "ValueError: Shape must be rank 1 but is rank 0 for 'CTCLoss' (op: 'CTCLoss') with input shapes: [?,16,28], [?,2], [?], []."

I just changed
"seq_len = tf.not_equal(x.sg_sum(dims=2), 0.).sg_int().sg_sum(dims=1)"
to "seq_len = tf.not_equal(x.sg_sum(axis=2), 0.).sg_int().sg_sum(axis=1)"

and it worked.
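
The last shape in that error message, `[]`, appears to be the sequence-length input: it arrived as a scalar, while the CTC loss expects one length per utterance in the batch. Below is a plain-Python analogue of what the `axis=` version of the line computes. It assumes x is laid out as [batch, time, feature] and that padding frames are all zeros; both are assumptions, and `sequence_lengths` is a hypothetical helper, not a sugartensor function.

```python
def sequence_lengths(batch):
    """Count non-padding frames per utterance.

    `batch` is a list of utterances; each utterance is a list of frames;
    each frame is a list of feature values. Padding frames are all zeros.
    """
    lengths = []
    for utterance in batch:
        # Sum over the feature axis (axis=2 in the tensor version) ...
        frame_sums = [sum(frame) for frame in utterance]
        # ... mark non-zero frames, then sum over the time axis (axis=1).
        lengths.append(sum(1 for s in frame_sums if s != 0.0))
    return lengths


batch = [
    [[0.5, 1.0], [0.2, 0.1], [0.0, 0.0]],  # 2 real frames + 1 padding frame
    [[0.3, 0.3], [0.0, 0.0], [0.0, 0.0]],  # 1 real frame + 2 padding frames
]
print(sequence_lengths(batch))  # prints [2, 1]
```

The result has one entry per utterance, which is the rank-1 shape the error message asks for.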

recognize.py error

Hi, I used your code and it works fine with the pre-trained model. But when I train on my own system and then try to run it, it throws lots of errors. Please advise.

NotFoundError (see above for traceback): Key block_0_8/conv_gate/variance not found in checkpoint
[[Node: save/RestoreV2_69 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_69/tensor_names, save/RestoreV2_69/shape_and_slices)]]

Error while running recognize.py

health@health-desktop:~/Desktop/lang_detec/Speech-to-Text-WaveNet-master$ python recognize.py --file test.wav
/usr/local/lib/python2.7/dist-packages/numba/errors.py:104: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
warnings.warn(msg)
INFO:tensorflow:0121:18:16:26.521:data.py:41] VCTK vocabulary loaded.
Traceback (most recent call last):
  File "recognize.py", line 92, in <module>
    mfcc = np.transpose(np.expand_dims(librosa.feature.mfcc(wav, sr), axis=0), [0, 2, 1])
  File "/usr/local/lib/python2.7/dist-packages/librosa/feature/spectral.py", line 1279, in mfcc
    S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/librosa/feature/spectral.py", line 1371, in melspectrogram
    mel_basis = filters.mel(sr, n_fft, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/librosa/filters.py", line 238, in mel
    lower = -ramps[i] / fdiff[i]
ValueError: operands could not be broadcast together with shapes (1,1025) (0,)
