rakeshvar / rnn_ctc

Recurrent Neural Networks and Long Short-Term Memory (LSTM) with Connectionist Temporal Classification (CTC), implemented in Theano. Includes a toy training example.

License: Apache License 2.0

Python 100.00%

recurrent-neural-networks python theano ctc rnn rnn-ctc ctc-loss neural-network ocr speech-recognition

rnn_ctc's Issues

float32 vs float64 on GPU

Using gpu device 0: GeForce GTX TITAN Black

Config: 0
   Midlayer: <class 'reccurent.RecurrentLayer'> {'nunits': 9}
Input Dim: 8
Num Classes: 10
Num Samples: 1000
FloatX: float32
Using log space: True

Preparing the Data
Building the Network
Traceback (most recent call last):
  File "train.py", line 115, in <module>
    ntwk = NeuralNet(nDims, nClasses, midlayer, midlayerargs, log_space)
  File "/home/rakesha/rnn_ctc/neuralnet.py", line 16, in __init__
    layer3 = CTCLayer(layer2.output, labels, n_classes, logspace)
  File "/home/rakesha/rnn_ctc/ctc.py", line 68, in __init__
    self.log_ctc()
  File "/home/rakesha/rnn_ctc/ctc.py", line 117, in log_ctc
    outputs_info=[safe_log(_1000)]
  File "/home/rakesha/.virtualenvs/python3.4/lib/python3.4/site-packages/theano/scan_module/scan.py", line 1017, in scan
    scan_outs = local_op(*scan_inputs)
  File "/home/rakesha/.virtualenvs/python3.4/lib/python3.4/site-packages/theano/gof/op.py", line 481, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/rakesha/.virtualenvs/python3.4/lib/python3.4/site-packages/theano/scan_module/scan_op.py", line 339, in make_node
    inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (`fn`) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
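The error says that the initial state given to `theano.scan` is float32 while the step function returns float64, so some operand inside the step upcasts the result. The sketch below does not use Theano. It is a NumPy emulation, with hypothetical names `scan_like`, `step_upcasting` and `step_fixed`, showing how one float64 operand upcasts a float32 state and how casting back to floatX restores the dtype. In Theano itself the analogous fixes would be wrapping the step's result in `theano.tensor.cast(..., theano.config.floatX)` or creating the constants the step uses with `dtype=theano.config.floatX`. Which operand in ctc.py actually causes the upcast is not identified here.

```python
import numpy as np

FLOATX = np.float32  # plays the role of theano.config.floatX = 'float32'


def scan_like(step, init, n_steps):
    """Toy loop mimicking scan's rule: the step output must keep the initial state's dtype."""
    state = init
    for _ in range(n_steps):
        new_state = step(state)
        if new_state.dtype != init.dtype:
            raise ValueError(
                f"initial state has dtype {init.dtype}, "
                f"but the step returns dtype {new_state.dtype}"
            )
        state = new_state
    return state


# A float64 array (NumPy's default dtype) used inside the step: the source of the upcast.
transition = np.full(3, 0.5)


def step_upcasting(prev):
    return prev * transition  # float32 * float64 array -> float64


def step_fixed(prev):
    return (prev * transition).astype(FLOATX)  # cast the result back to floatX
```

Casting at the end of the step is the blunt fix. Creating the constant as `np.full(3, 0.5, dtype=FLOATX)` avoids the float64 intermediate altogether, which matters on the old Theano GPU backend, as it only computed in float32.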

tiny bug in readme.md!

python3 rnn_ctc.py data.pkl [configuration_num]

See configurations.py for various configurations.

It should be ctc.py, not rnn_ctc.py.

Better GPU support.

Currently, training is slower on GPUs than on CPUs because the training data is not stored in a Theano shared variable.
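A common remedy in Theano is to copy the whole training set to the device once with `theano.shared(...)` and let the compiled function pick a sample by integer index (through the `givens` argument of `theano.function`), instead of passing NumPy arrays on every call. Because the samples here are variable-length sequences, they would first need to be packed into one array plus an offsets array. The plain-Python sketch below models only the data movement; `ToyDevice`, `pack` and the two epoch functions are hypothetical names, and no GPU or Theano is involved.

```python
class ToyDevice:
    """Hypothetical stand-in for GPU memory that counts host-to-device copies."""

    def __init__(self):
        self.copies = 0

    def upload(self, data):
        self.copies += 1
        return list(data)


def pack(sequences):
    """Concatenate variable-length sequences; flat[offsets[i]:offsets[i+1]] is sample i."""
    flat, offsets = [], [0]
    for seq in sequences:
        flat.extend(seq)
        offsets.append(len(flat))
    return flat, offsets


def epochs_copying_each_sample(device, samples, epochs):
    frames = 0
    for _ in range(epochs):
        for seq in samples:
            frames += len(device.upload(seq))  # one copy per training step
    return device.copies, frames


def epochs_with_shared_data(device, samples, epochs):
    flat, offsets = pack(samples)
    flat_dev = device.upload(flat)  # copied once, like a shared variable
    offsets_dev = device.upload(offsets)
    frames = 0
    for _ in range(epochs):
        for i in range(len(offsets_dev) - 1):
            seq = flat_dev[offsets_dev[i]:offsets_dev[i + 1]]  # slicing stays on the "device"
            frames += len(seq)
    return device.copies, frames
```

Both loops see the same frames; the difference is that the shared version transfers the data once, after which each step only needs an index.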

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2588' in position 0: ordinal not in range(256)

Hello, I ran 'python3 gen_data.py data.pkl' and got the following error:

(0, 1, 2)  !"
 0¦
Traceback (most recent call last):
  File "gen_data.py", line 27, in <module>
    utils.slab_print(x)
  File "/home/speech/wudan14/rnn_ctc-master/utils.py", line 20, in slab_print
    elif val <= 1.:  print('\u2588', end=''),
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2588' in position 0: ordinal not in range(256)

How can I solve this problem?
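The traceback shows that standard output uses the latin-1 codec, which cannot represent U+2588 (the full-block character that `slab_print` prints). Two workarounds, neither verified against this repository: run the script with the environment variable `PYTHONIOENCODING=utf-8` (for example `PYTHONIOENCODING=utf-8 python3 gen_data.py data.pkl`), or fall back to an ASCII character when the output encoding cannot represent the block. A sketch of the second option, using a hypothetical helper `pick_block_char`:

```python
import sys


def pick_block_char(encoding=None, fallback="#"):
    """Return U+2588 if `encoding` can represent it, otherwise an ASCII fallback."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        "\u2588".encode(encoding)
        return "\u2588"
    except (UnicodeEncodeError, LookupError):  # LookupError: unknown codec name
        return fallback


# Hypothetical use inside a printing helper such as slab_print:
# block = pick_block_char()
# print(block, end="")
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` at the start of the script is another option, provided the terminal can display UTF-8.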

Is the CTC implementation GPU-runnable?

@rakeshvar I was wondering whether this CTC implementation can run on a GPU. Do you know of any such implementations in Theano/Keras? Apart from warp_ctc for Torch, I could not find any implementation that runs on GPUs.

Question about the CTC and the data structure used for the target values

Hi,

I was wondering if you could explain how you stored the target values. Did you one-hot encode them? I am trying to understand this better so that I can apply it to speech recognition (TIMIT dataset), where my target values are phonemes that I am trying to align.
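For context, CTC losses usually take the target as a plain sequence of integer class indices, one per label rather than one per frame, because the alignment to frames is exactly what CTC sums over. The forward-backward recursion then runs over an extended sequence with a blank inserted between labels and at both ends, of length 2|l|+1. Whether rnn_ctc stores `labels` exactly this way, and which index it reserves for the blank, is not stated here. The sketch passes the blank explicitly, and the phoneme inventory is a made-up example.

```python
def encode_targets(symbols, inventory):
    """Map symbols (e.g. TIMIT phonemes) to integer class indices; no one-hot needed."""
    index = {sym: i for i, sym in enumerate(inventory)}
    return [index[s] for s in symbols]


def extend_with_blanks(labels, blank):
    """Build the CTC extended label sequence: blank, l1, blank, l2, ..., blank."""
    extended = [blank]
    for label in labels:
        extended.extend([label, blank])
    return extended


# Hypothetical inventory: four phonemes, with the blank as an extra last class.
PHONEMES = ["sil", "k", "ae", "t"]
BLANK = len(PHONEMES)

labels = encode_targets(["k", "ae", "t"], PHONEMES)  # [1, 2, 3]
extended = extend_with_blanks(labels, BLANK)          # [4, 1, 4, 2, 4, 3, 4]
```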

Saving the trained model for future use

Thanks for posting your code; I played with it to understand the LSTM implementation. I have a quick question about saving models for later use. Right now, when I run train.py, I can see the model being trained and some outputs. Is there a way to save the model to a file and later use it to retrain or to predict on new data? I tried using pickle, but I get an error about exceeding the maximum recursion depth. Please post your thoughts.

Thanks!
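Pickling a whole Theano graph can hit Python's recursion limit, likely because the graph is a deeply nested object. Two commonly used workarounds, not tested against this repository: raise the limit with `sys.setrecursionlimit(...)` before pickling, or save only the parameter values and rebuild the network when loading. In Theano each parameter would be a shared variable whose array is read with `get_value()` and restored with `set_value()`; the sketch below imitates that with a hypothetical `ToyParam` class so it runs without Theano.

```python
import pickle


class ToyParam:
    """Hypothetical stand-in for a Theano shared variable."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def save_params(params, path):
    """Pickle only the raw values, keyed by parameter name."""
    with open(path, "wb") as f:
        pickle.dump({name: p.get_value() for name, p in params.items()}, f)


def load_params(params, path):
    """Copy saved values into the parameters of a freshly built network."""
    with open(path, "rb") as f:
        values = pickle.load(f)
    for name, p in params.items():
        p.set_value(values[name])
```

Saving values rather than the graph also keeps the saved file independent of the compiled functions, so the network can be rebuilt (and recompiled) with the same constructor before loading.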

speech recognition

Can rnn_ctc be used for speech recognition?

TypeError: super() takes at least 1 argument (0 given)

If I run this program with Python 2, I get the error:
TypeError: super() takes at least 1 argument (0 given)

I know this error is caused by using Python 2, but how can I fix it?
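Bare `super()` with no arguments is Python 3 syntax. Python 2 requires the two-argument form `super(ClassName, self)`, which also works in Python 3, and in Python 2 the base class must derive from `object` (a new-style class). The class names below are hypothetical, not the repository's. Other Python 3-only constructs, such as the `print(..., end='')` call visible in utils.py above, would additionally need `from __future__ import print_function` at the top of each module.

```python
class LayerBase(object):  # explicit `object` base: a new-style class in Python 2
    def __init__(self, n_units):
        self.n_units = n_units


class ExampleRecurrentLayer(LayerBase):
    def __init__(self, n_units, activation="tanh"):
        # super().__init__(n_units) fails on Python 2 with
        # "TypeError: super() takes at least 1 argument (0 given)".
        super(ExampleRecurrentLayer, self).__init__(n_units)
        self.activation = activation
```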

Momentum-based SGD

May help train on longer sequences.
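Classical momentum keeps a velocity per parameter, v ← μv − η∇L, and then moves the parameter by v, so gradient directions that stay consistent across steps accumulate. A minimal scalar sketch follows; the function name and the values of η (`lr`) and μ (`mu`) are illustrative. In Theano the same two assignments would go into the `updates` list of the training function, with one velocity shared variable per parameter.

```python
def momentum_sgd(grad_fn, theta, lr=0.1, mu=0.9, steps=100):
    """Classical momentum: v <- mu * v - lr * grad(theta); theta <- theta + v."""
    velocity = 0.0
    for _ in range(steps):
        velocity = mu * velocity - lr * grad_fn(theta)
        theta = theta + velocity
    return theta


# Example objective: L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def grad_quadratic(theta):
    return 2.0 * (theta - 3.0)
```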

Problem with long sequences (over 300 frames) in speech recognition (TIMIT data)

Hello!

We have a problem with long sequences (over 300 frames) in speech recognition on the TIMIT data.

Speech recognition typically uses long feature sequences, for example 300 frames for a 3-second utterance. When we analyzed the scan function in your 'ctc.py', the probabilities variable appears to become zero for sequences longer than 300 frames, and the 'cost' variable shows 'Inf'.

How can we deal with this problem?
Do you have any suggestions?

I look forward to your comments.

Best regards.
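The symptom is consistent with numerical underflow: the CTC forward variables are built from products of per-frame probabilities, so they shrink roughly geometrically with the number of frames. In float64 the smallest positive value is about 5e-324, so 300 frames at probability 0.05 (about 1e-390) already round to 0, and −log 0 gives an infinite cost; in float32 (smallest subnormal about 1.4e-45) far fewer frames suffice. The usual remedy is to run the recursion in log space, replacing sums with log-sum-exp. The configuration printout above shows that rnn_ctc has a "Using log space" option, though whether it resolves this TIMIT case is not established here. A plain-Python illustration with hypothetical helper names:

```python
import math


def path_prob(frame_probs):
    """Probability of one path: a product of per-frame probabilities (underflows)."""
    p = 1.0
    for q in frame_probs:
        p *= q
    return p


def path_log_prob(frame_probs):
    """The same quantity in log space: a sum of logs (stays finite)."""
    return sum(math.log(q) for q in frame_probs)


def logsumexp(log_values):
    """log(sum(exp(x))) computed stably by factoring out the maximum."""
    m = max(log_values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in log_values))


frames = [0.05] * 300
print(path_prob(frames))      # 0.0 in float64
print(path_log_prob(frames))  # about -898.7
```

In the CTC forward pass, each update α_t(s) = (α_{t-1}(s) + α_{t-1}(s-1) [+ α_{t-1}(s-2)]) · y_t becomes a `logsumexp` over the previous log-alphas plus log y_t, so no intermediate ever leaves the representable range.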

float32 failure

float32 does not work.

$python train_online.py 
Arguments:
FloatX         : float32
Num Epochs     : 1000
Num Samples    : 1000
Scribe:
  Alphabet:  !"#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  Noise: 0.05
  Buffers (vert, horz): 5, 3
  Characters per sample: Depends on the random length
  Length: Avg:60 Range:(45, 75)
  Height: 11
Building the Network
Traceback (most recent call last):
  File "train_online.py", line 26, in <module>
    ntwk = nn.NeuralNet(scriber.nDims, scriber.nClasses, **nnet_args)
  File "/home/rakesha/rnn_ctcs/rnn_ctc/nnet/neuralnet.py", line 23, in __init__
    layer3 = CTCLayer(layer2.output, labels, n_classes, use_log_space)
  File "/home/rakesha/rnn_ctcs/rnn_ctc/nnet/ctc.py", line 64, in __init__
    self._log_ctc()
  File "/home/rakesha/rnn_ctcs/rnn_ctc/nnet/ctc.py", line 115, in _log_ctc
    outputs_info=[safe_log(_1000)]
  File "/home/rakesha/.local/lib/python3.3/site-packages/Theano-0.7.0-py3.3.egg/theano/scan_module/scan.py", line 1044, in scan
    scan_outs = local_op(*scan_inputs)
  File "/home/rakesha/.local/lib/python3.3/site-packages/Theano-0.7.0-py3.3.egg/theano/gof/op.py", line 600, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/rakesha/.local/lib/python3.3/site-packages/Theano-0.7.0-py3.3.egg/theano/scan_module/scan_op.py", line 550, in make_node
    inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (`fn`) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.

nDims, nClasses, image transposition, etc.

Hi, I have a question

When I use the code, it performs very well on the "hindu" data set with the default configuration, but the results on the "ascii" data set are very poor. I think my network configuration is wrong. Could you tell me which configuration you used for the "ascii" data set? Thank you.
