speech-to-text-wavenet's Introduction

Speech-to-Text-WaveNet

Based on: https://github.com/buriburisuri/speech-to-text-wavenet
I have included the asset folder with the pre-trained model, which is not included in the original repository.

The pre-trained model is from here:
https://github.com/buriburisuri/speech-to-text-wavenet#pre-trained-models
The model was trained on the CSTR VCTK Corpus:
http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html

Dependencies

The original dependencies, as described here, are not 100% correct:
https://github.com/buriburisuri/speech-to-text-wavenet#dependencies
The code seems to break with newer versions of tensorflow or sugartensor.

My updated dependencies file: https://github.com/EN10/STT/blob/master/requirements.txt

Working Dependencies

Works with ("latest" means the latest release at the time of writing):
pandas 0.19.2 (latest)
librosa 0.5.0 (latest)
tqdm 4.11.2 (latest)
tensorflow 0.11.0 only; 1.0.0, 0.12.1 and 0.12.0 do not work.
sugartensor 0.0.1.9 only; versions newer than 0.0.1.9 do not work.

Changing Dependencies

To see which versions are installed, use:

pip freeze
pip show tensorflow
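
The pins listed under "Works with" can also be checked programmatically. The following is a minimal sketch and not part of the original repository: it parses text in the name==version format that pip freeze prints and reports packages whose version differs from the pin. The function names and the PINNED table are illustrative assumptions, and only the two strict pins (tensorflow and sugartensor) are included.

```python
# Hypothetical helper, not part of the original repository.
# Pins are taken from the "Works with" list; names are lowercased for comparison.
PINNED = {
    "tensorflow": "0.11.0",
    "sugartensor": "0.0.1.9",
}


def parse_freeze(text):
    """Parse `pip freeze` style output into a {name: version} dict."""
    installed = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not plain name==version pins.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        installed[name.strip().lower()] = version.strip()
    return installed


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (installed_version_or_None, pinned_version)} for each mismatch."""
    problems = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found != wanted:
            problems[name] = (found, wanted)
    return problems
```

One way to use it is to capture the output of pip freeze as a string, pass it to parse_freeze, and print whatever find_mismatches returns. An empty result means both pins are satisfied.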

If a newer version is installed, uninstall it:

sudo pip uninstall sugartensor

Then install the correct version:

sudo pip install sugartensor==0.0.1.9

To install the correct version of tensorflow (this wheel is the CPU build for Python 2.7 on 64-bit Linux):

sudo pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl
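
The wheel filename itself encodes which interpreter and platform it targets, which is why this URL only suits CPython 2.7 on 64-bit Linux. As an illustration, the sketch below splits a wheel filename into its tags following the PEP 427 naming convention. The function name is hypothetical, and filenames with unusual fields are simply rejected.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components.

    Format: {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields in %r" % filename)
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        # The last three fields are always python tag, ABI tag, platform tag.
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


url = ("https://storage.googleapis.com/tensorflow/linux/cpu/"
       "tensorflow-0.11.0-cp27-none-linux_x86_64.whl")
tags = parse_wheel_filename(url.rsplit("/", 1)[-1])
print("%s %s %s" % (tags["python"], tags["abi"], tags["platform"]))
# prints: cp27 none linux_x86_64
```

Here cp27 stands for CPython 2.7 and linux_x86_64 for 64-bit Linux, so a different interpreter or operating system needs a different wheel.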

Run

Run recognition on the test file:

python recognize.py --file test.wav

Other Issues

ImportError: No module named (followed by the name of the missing module)

Install the missing module by appending its name to:

sudo -H pip install

To convert other audio formats (for example AAC/M4A) to WAV, see:
http://superuser.com/questions/23930/how-to-decode-aac-m4a-audio-files-into-wav
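
If ffmpeg is available, the conversion can also be scripted. This is a sketch under that assumption: ffmpeg is not mentioned in this README and has to be installed separately, and the helper names are hypothetical. It builds the basic ffmpeg -i input output invocation, in which ffmpeg chooses the WAV format from the output file extension.

```python
import os
import subprocess


def build_ffmpeg_command(src, dst=None):
    """Return the argument list for converting `src` to a WAV file."""
    if dst is None:
        # Default: same path with the extension replaced by .wav
        dst = os.path.splitext(src)[0] + ".wav"
    if os.path.abspath(dst) == os.path.abspath(src):
        raise ValueError("input and output paths are the same: %r" % src)
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", src, dst]


def convert_to_wav(src, dst=None):
    """Run ffmpeg and return the output path.

    Raises subprocess.CalledProcessError if ffmpeg exits with an error.
    """
    cmd = build_ffmpeg_command(src, dst)
    subprocess.check_call(cmd)
    return cmd[-1]
```

For example, convert_to_wav("recording.m4a") would write recording.wav next to the input, and that file can then be passed to recognize.py with --file.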

speech-to-text-wavenet's Issues

train.py error

I was getting the error "ValueError: Shape must be rank 1 but is rank 0 for 'CTCLoss' (op: 'CTCLoss') with input shapes: [?,16,28], [?,2], [?], []."

I just changed
"seq_len = tf.not_equal(x.sg_sum(dims=2), 0.).sg_int().sg_sum(dims=1)"
to "seq_len = tf.not_equal(x.sg_sum(axis=2), 0.).sg_int().sg_sum(axis=1)"

and it worked.
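
For context, the changed line computes one sequence length per batch item by counting the frames whose features do not sum to zero. The pure-Python sketch below mirrors that logic on nested lists; it does not use tensorflow or sugartensor, and the function names are made up for illustration. It also shows one plausible reading of the error, stated here as an assumption rather than a confirmed diagnosis: if the dims keyword is not honoured, the reduction runs over every axis and yields a single scalar, which would match the rank 0 input at the end of the CTCLoss message.

```python
def sequence_lengths(batch):
    """Per-item lengths for a batch laid out as [batch][time][feature].

    Mirrors: not_equal(sum over axis 2, 0) -> int -> sum over axis 1.
    """
    lengths = []
    for sequence in batch:
        # A frame counts as real (not padding) if its features do not sum to zero.
        lengths.append(sum(1 for frame in sequence if sum(frame) != 0.0))
    return lengths


def collapsed_length(batch):
    """What a reduction over all axes would give: one scalar for the whole batch."""
    total = sum(sum(sum(frame) for frame in sequence) for sequence in batch)
    return int(total != 0.0)


# Two items padded to three frames each; all-zero frames are padding.
batch = [
    [[0.5, 1.0], [0.25, 0.25], [0.0, 0.0]],
    [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]],
]
print(sequence_lengths(batch))  # [2, 1]
print(collapsed_length(batch))  # 1
```

The first result has one entry per batch item (rank 1), which is the shape CTC loss expects for sequence lengths; the second is a bare scalar (rank 0).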

recognize.py error

Hi, I used your code and it works fine with the pre-trained model. But when I train on my own system and then try to run it, it throws lots of errors. Please guide.

NotFoundError (see above for traceback): Key block_0_8/conv_gate/variance not found in checkpoint
[[Node: save/RestoreV2_69 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_69/tensor_names, save/RestoreV2_69/shape_and_slices)]]

Error while running recognize.py

health@health-desktop:~/Desktop/lang_detec/Speech-to-Text-WaveNet-master$ python recognize.py --file test.wav
/usr/local/lib/python2.7/dist-packages/numba/errors.py:104: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
warnings.warn(msg)
INFO:tensorflow:0121:18:16:26.521:data.py:41] VCTK vocabulary loaded.
Traceback (most recent call last):
  File "recognize.py", line 92, in <module>
    mfcc = np.transpose(np.expand_dims(librosa.feature.mfcc(wav, sr), axis=0), [0, 2, 1])
  File "/usr/local/lib/python2.7/dist-packages/librosa/feature/spectral.py", line 1279, in mfcc
    S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/librosa/feature/spectral.py", line 1371, in melspectrogram
    mel_basis = filters.mel(sr, n_fft, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/librosa/filters.py", line 238, in mel
    lower = -ramps[i] / fdiff[i]
ValueError: operands could not be broadcast together with shapes (1,1025) (0,)
