sherjilozair / char-rnn-tensorflow
Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow
License: MIT License
Hi,
It looks like your code doesn't have a validation or testing step. It would be nice if it could hold out some fraction of input.txt for validation/testing. Any plans?
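A minimal sketch of one way to hold out part of the corpus, assuming the text has already been encoded into a 1-D array of character ids (as TextLoader does). split_train_valid and valid_frac are hypothetical names, not options of this repo. The held-out slice is taken from the end so it stays contiguous.

```python
import numpy as np

def split_train_valid(tensor, valid_frac=0.1):
    """Hold out the last valid_frac of a 1-D array of character ids.

    The split is contiguous, so no training sequence overlaps the held-out text.
    """
    if not 0.0 < valid_frac < 1.0:
        raise ValueError("valid_frac must be between 0 and 1")
    n_valid = int(round(len(tensor) * valid_frac))
    if n_valid == 0 or n_valid == len(tensor):
        raise ValueError("input too small for this valid_frac")
    cut = len(tensor) - n_valid
    return tensor[:cut], tensor[cut:]

ids = np.arange(1000)  # stand-in for the encoded input.txt
train_ids, valid_ids = split_train_valid(ids, valid_frac=0.1)
print(len(train_ids), len(valid_ids))  # 900 100
```

Each part would then be batched separately, and the validation loss computed by running model.cost without model.train_op.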
When sampling a model, whose vocab does not contain a space I get the following error:
Traceback (most recent call last):
File "sample.py", line 46, in <module>
main()
File "sample.py", line 27, in main
sample(args)
File "sample.py", line 43, in sample
args.sample).encode('utf-8'))
File "E:\Projekte\tensorflow\rnn\char-rnn-tensorflow\model.py", line 107, in sample
x[0, 0] = vocab[char]
KeyError: ' '
The simplest test case: "Hello world!" works, but "Helloworld!" does not.
I'm using tensorflow-gpu on Python 3.5.0.
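The KeyError means a character of the prime string (here a space, which appears to be the default prime) is not in the model's vocabulary. A minimal sketch of checking the prime before sampling, assuming vocab is the char-to-id dict saved with the model; encode_prime and skip_unknown are hypothetical names, not part of sample.py.

```python
import numpy as np

def encode_prime(prime, vocab, skip_unknown=False):
    """Map a prime string to vocab ids, reporting characters the model never saw."""
    missing = sorted(set(c for c in prime if c not in vocab))
    if missing and not skip_unknown:
        raise ValueError("prime contains characters not in the vocabulary: %r" % missing)
    # With skip_unknown=True, unseen characters are silently dropped.
    return np.array([vocab[c] for c in prime if c in vocab], dtype=np.int32)

vocab = {c: i for i, c in enumerate("Hdelorw!")}  # toy vocabulary without a space
print(encode_prime("Helloworld!", vocab))
print(encode_prime("Hello world!", vocab, skip_unknown=True))
```

Raising by default gives a readable message instead of a bare KeyError; dropping unknown characters is the more forgiving alternative.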
Hi!
I just downloaded the code and tried to run train.py with TF v1.0. According to the README, the default parameters should train the Shakespeare example. However, it fails:
Traceback (most recent call last):
File "train.py", line 11, in <module>
from model import Model
File "/opt/gpu-project/char-rnn-tensorflow/model.py", line 3, in <module>
from tensorflow.contrib import legacy_seq2seq
ImportError: cannot import name legacy_seq2seq
Traceback (most recent call last):
File "train.py", line 114, in <module>
main()
File "train.py", line 48, in main
train(args)
File "train.py", line 66, in train
saved_model_args = cPickle.load(f)
File "C:\Users\User\AppData\Local\Programs\Python\Python35\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 25: character maps to <undefined>
I stopped a run with Ctrl+C and assumed the checkpoint would let me restart it. Instead, I get the above error. I'm running TF 0.12 on Windows 10 with CUDA 8 and cuDNN, Python 3.5.2.
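The traceback passes through cp1252.py while cPickle.load reads the file, which suggests the pickle was opened in text mode, so Python 3 on Windows tries to decode binary pickle data as cp1252. A minimal sketch of loading and saving pickles in binary mode, assuming that is the cause; load_pickle, save_pickle and the sample dict are illustrative.

```python
import os
import pickle
import tempfile

def load_pickle(path):
    # Binary mode: in text mode Python 3 decodes the bytes with the platform
    # codec (cp1252 on many Windows setups), which fails on pickle data.
    with open(path, "rb") as f:
        return pickle.load(f)

def save_pickle(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)

path = os.path.join(tempfile.mkdtemp(), "config.pkl")
save_pickle({"rnn_size": 128, "model": "lstm"}, path)
print(load_pickle(path))
```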
Hi
First of all, great effort putting char-rnn into TensorFlow. I am curious whether we can get word vectors using this approach.
Thanks
root@ip-172-31-23-174:/home/ubuntu/char-rnn# python train.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
loading preprocessed files
Traceback (most recent call last):
File "train.py", line 75, in <module>
main()
File "train.py", line 39, in main
train(args)
File "train.py", line 42, in train
data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
File "/home/ubuntu/char-rnn/utils.py", line 22, in __init__
self.create_batches()
File "/home/ubuntu/char-rnn/utils.py", line 52, in create_batches
ydata[-1] = xdata[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
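An empty xdata most likely means the input has fewer than batch_size * seq_length characters: num_batches becomes 0, the array is truncated to length 0, and ydata[-1] = xdata[0] fails. A minimal sketch of the batching step with an explicit check, assuming it mirrors create_batches; make_batches is a hypothetical stand-in, not the repo's function.

```python
import numpy as np

def make_batches(tensor, batch_size, seq_length):
    """Cut a 1-D id array into (x, y) batches, failing early on tiny inputs."""
    num_batches = tensor.size // (batch_size * seq_length)
    if num_batches == 0:
        # Without this check the array is truncated to length 0 and
        # ydata[-1] = xdata[0] raises the IndexError shown above.
        raise ValueError(
            "input has %d characters but one batch needs batch_size * seq_length = %d; "
            "use a larger input or a smaller batch_size/seq_length"
            % (tensor.size, batch_size * seq_length))
    tensor = tensor[:num_batches * batch_size * seq_length]
    xdata = tensor
    ydata = np.copy(tensor)
    ydata[:-1] = xdata[1:]
    ydata[-1] = xdata[0]
    x_batches = np.split(xdata.reshape(batch_size, -1), num_batches, 1)
    y_batches = np.split(ydata.reshape(batch_size, -1), num_batches, 1)
    return x_batches, y_batches
```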
Line 53 of model.py contains the code:
self.final_state = states[-1]
This throws the following exception: TensorFlow does not support negative indices on Tensors (at least in the publicly available version). What is the workaround? Many thanks.
File "/Library/Python/2.7/site-packages/tensorflow/python/ops/array_ops.py", line 124, in _SliceHelper
raise NotImplementedError("Negative indices are currently unsupported")
NotImplementedError: Negative indices are currently unsupported
Exception TypeError: TypeError("'NoneType' object is not callable",) in <function _remove at 0x101c9b488> ignored
Is there a way to tune the temperature parameter?
Temperature. An important parameter you may want to play with is -temperature, which takes a number in range (0, 1] (0 not included), default = 1. The temperature is dividing the predicted log probabilities before the Softmax, so lower temperature will cause the model to make more likely, but also more boring and conservative predictions. Higher temperatures cause the model to take more chances and increase diversity of results, but at a cost of more mistakes.
https://github.com/karpathy/char-rnn
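The quoted README describes temperature as dividing the model's log-scores before the softmax. A minimal NumPy sketch of that, assuming you have the unnormalized logits for the next character; sample_with_temperature is a hypothetical helper, not this repo's sampling code.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample an index after dividing logits by temperature (must be > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()   # numerical stability before exp
    probs = np.exp(scaled)
    probs /= probs.sum()     # softmax
    return int(rng.choice(len(probs), p=probs)), probs

idx, probs = sample_with_temperature([1.0, 2.0, 3.0], temperature=0.01)
_, flat = sample_with_temperature([1.0, 2.0, 3.0], temperature=100.0)
```

At temperature 0.01 almost all probability mass lands on the largest logit (near-greedy); at 100 the distribution is nearly uniform.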
loading preprocessed files
Traceback (most recent call last):
File "train.py", line 75, in <module>
main()
File "train.py", line 39, in main
train(args)
File "train.py", line 50, in train
model = Model(args)
File "/home/tensorflow/tests/char-lstm-16/model.py", line 53, in __init__
self.final_state = states[-1]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 126, in _SliceHelper
raise NotImplementedError("Negative indices are currently unsupported")
NotImplementedError: Negative indices are currently unsupported
The following lines reshape xdata into arrays with the correct dimensions, but the resulting data are no longer in the correct order:
self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)
I think the correct transformation should be the following:
self.x_batches = xdata.reshape(-1, self.batch_size, self.seq_length)
self.y_batches = ydata.reshape(-1, self.batch_size, self.seq_length)
Here is an example:
xdata = np.array(range(100))
xdata => array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
batch_size = 5
seq_length = 5
num_batches = 4
m = np.split(xdata.reshape(batch_size, -1), num_batches, 1)
m => [array([[ 0, 1, 2, 3, 4],
[20, 21, 22, 23, 24],
[40, 41, 42, 43, 44],
[60, 61, 62, 63, 64],
[80, 81, 82, 83, 84]]), array([[ 5, 6, 7, 8, 9],
[25, 26, 27, 28, 29],
[45, 46, 47, 48, 49],
[65, 66, 67, 68, 69],
[85, 86, 87, 88, 89]]), array([[10, 11, 12, 13, 14],
[30, 31, 32, 33, 34],
[50, 51, 52, 53, 54],
[70, 71, 72, 73, 74],
[90, 91, 92, 93, 94]]), array([[15, 16, 17, 18, 19],
[35, 36, 37, 38, 39],
[55, 56, 57, 58, 59],
[75, 76, 77, 78, 79],
[95, 96, 97, 98, 99]])]
and
n = xdata.reshape(-1, batch_size, seq_length)
n => array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]],
[[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]],
[[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59],
[60, 61, 62, 63, 64],
[65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]],
[[75, 76, 77, 78, 79],
[80, 81, 82, 83, 84],
[85, 86, 87, 88, 89],
[90, 91, 92, 93, 94],
[95, 96, 97, 98, 99]]])
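Which layout is appropriate depends on how RNN state is carried between batches. In the train.py quoted elsewhere in this thread, model.final_state is fetched after each step and model.initial_state is fed from it, which suggests row i of batch k+1 is meant to continue row i of batch k. The hypothetical check below tests that property for both layouts from the example; it relies on xdata being np.arange, so each value equals its position in the text.

```python
import numpy as np

def split_layout(xdata, batch_size, num_batches):
    # Layout in utils.py: one contiguous stream per row, cut into batches along axis 1.
    return np.split(xdata.reshape(batch_size, -1), num_batches, 1)

def reshape_layout(xdata, batch_size, seq_length):
    # Proposed layout: consecutive seq_length chunks fill each batch row by row.
    return list(xdata.reshape(-1, batch_size, seq_length))

def rows_continue(batches):
    # True if, for every row i, batch k+1 row i starts right after batch k row i ends.
    for prev, nxt in zip(batches, batches[1:]):
        if not np.all(nxt[:, 0] == prev[:, -1] + 1):
            return False
    return True

xdata = np.arange(100)
print(rows_continue(split_layout(xdata, 5, 4)))    # True
print(rows_continue(reshape_layout(xdata, 5, 5)))  # False
```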
My goal is to predict the probability of a new sentence. Could you give an example of how to calculate the probability of a new sentence?
I am trying to build my network with a 21M text file, but whatever I do, it gets stuck at a train_loss of ~1.6 and does not progress any further. I tried changing:
But nothing helps: the network always stops learning and gets stuck at a train_loss of about 1.6-1.7.
How can I diagnose the problem? Can someone advise?
The output is produced in paragraph form; I want it in dialogue form, separated by line breaks. How can I do that?
In def create_batches(self) (utils.py):
ydata[:-1] = xdata[1:]
ydata[-1] = xdata[0]
The first line makes sense. However, why do we need the second line? Say our data is "hello"; then
x = "hello"
y = "elloh"
So when h is given we expect e (h->e), then e->l, and so on. But why o->h (ydata[-1] = xdata[0])? Perhaps this hurts training.
Did I miss something here? Or is the idea that it's only one character, so we can ignore it?
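The two lines can be reproduced on the "hello" example to see exactly which pairs are produced. The second half shows a hypothetical alternative without the wrap-around pair (drop the last input position). It is not what utils.py does, and it would change the length that batching has to divide.

```python
import numpy as np

text = "hello"
chars = sorted(set(text))
vocab = {c: i for i, c in enumerate(chars)}
xdata = np.array([vocab[c] for c in text])

# Targets as built in create_batches: shift left by one and wrap around.
ydata = np.copy(xdata)
ydata[:-1] = xdata[1:]
ydata[-1] = xdata[0]
print("".join(chars[i] for i in ydata))  # elloh

# Hypothetical alternative without wrap-around: pair each char with the next one only.
x_alt, y_alt = xdata[:-1], xdata[1:]
print("".join(chars[i] for i in x_alt), "".join(chars[i] for i in y_alt))  # hell ello
```

With a corpus of N characters, the wrapped pair is one of N training pairs.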
weighted_pick(weights) in model.py can return an index larger than len(chars)-1. This happens if sum(weights) < 1 and, at the same time, np.random.rand(1) > sum(weights). Then int(np.searchsorted(t, np.random.rand(1)*s)) == len(t), which leads to an IndexError:
import numpy as np
p = np.array([0.1, 0.2, 0.699], dtype=np.float32)
t = np.cumsum(p)
s = np.sum(p)
randval = 0.9999
print(int(np.searchsorted(t, randval)))  # gives 3, which is too large, as len(t) == 3
So numpy.random.choice() is probably the better option, despite being slower.
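If you would rather keep the cumulative-sum approach, one option is to scale the random draw by the cumsum's own last value and clamp the result. A minimal sketch; weighted_pick_safe is a hypothetical replacement, not the repo's function. side="right" is a real np.searchsorted option and also ensures zero-weight entries are never returned.

```python
import numpy as np

def weighted_pick_safe(weights, rng=None):
    """Inverse-CDF sampling that can never return len(weights)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.cumsum(np.asarray(weights, dtype=np.float64))
    s = t[-1]  # compare against the cumsum's own total, not a separate np.sum
    idx = int(np.searchsorted(t, rng.random() * s, side="right"))
    return min(idx, len(t) - 1)  # guard against rounding at the upper edge
```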
I got this error when trying to train on a preprocessed input.
The error comes from here: https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/utils.py#L42
self.chars is a tuple with (1) a list of chars and (2) a dict of char to integer mapping. I changed the function to just get it running:
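def load_preprocessed(self, vocab_file, tensor_file):
    # Pickle files must be opened in binary mode under Python 3.
    with open(vocab_file, 'rb') as f:
        # Workaround used in this report: index into the pickled tuple.
        self.chars = cPickle.load(f)[0][0]
    print(self.chars)
    self.vocab_size = len(self.chars)
    self.vocab = dict(zip(self.chars, range(len(self.chars))))
    self.tensor = np.load(tensor_file)
    # Integer division keeps num_batches an int under Python 3.
    self.num_batches = self.tensor.size // (self.batch_size * self.seq_length)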
I am getting an error while running train.py:
Traceback (most recent call last):
File "train.py", line 111, in <module>
main()
File "train.py", line 48, in main
train(args)
File "train.py", line 98, in train
train_loss, state, _= sess.run([model.cost,model.final_state,model.train_op],feed)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string,options,run_metadata)
When I do "print repr(sum(p))", it always gives me numbers very close to 1.0, such as 1.00000052 and 0.9999999248.
Traceback (most recent call last):
File "sample.py", line 38, in <module>
main()
File "sample.py", line 21, in main
sample(args)
File "sample.py", line 35, in sample
print model.sample(sess, chars, vocab, args.n, args.prime)
File "/home/patro/Documents/Programming/NN/char-rnn-tensorflow/model.py", line 77, in sample
sample = int(np.random.choice(len(p),p=p))
File "mtrand.pyx", line 1094, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:10565)
ValueError: probabilities do not sum to 1
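A sum of 1.00000052 is off by about 5e-7, which apparently exceeded the tolerance np.random.choice applied in this setup. A minimal sketch, assuming p is the float32 softmax output: renormalize in float64 before sampling. sample_index is a hypothetical helper.

```python
import numpy as np

def sample_index(p, rng=None):
    """Renormalize in float64 so choice() accepts float32 softmax output."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=np.float64)
    p = p / p.sum()
    return int(rng.choice(len(p), p=p))
```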
Hi,
I tried this implementation of char-rnn and I have an issue with the train script:
Traceback (most recent call last):
File "train.py", line 111, in <module>
main()
File "train.py", line 48, in main
train(args)
File "train.py", line 93, in train
state = model.initial_state.eval()
AttributeError: 'tuple' object has no attribute 'eval'
I'm using the latest version of TensorFlow and Python 3.4.
Thanks
After building from HEAD of TF, I get some import errors for various things (almost every example on the web has this problem!)
I had to change imports in model.py to
from tensorflow.python.ops import rnn_cell
from tensorflow.python.ops import seq2seq
The downside is that importing tensorflow.models.rnn now raises an ImportError, so something like this is probably needed to preserve backwards compatibility:
try:
    from tensorflow.models.rnn import rnn_cell, seq2seq
except ImportError:
    from tensorflow.python.ops import rnn_cell, seq2seq
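When more than two locations need to be tried, the same idea can be written as a loop over candidate module paths. A minimal sketch using only the standard library; import_first is a hypothetical helper, and the TensorFlow module paths in the commented usage are the ones mentioned in this thread, not a verified list for every release.

```python
import importlib

def import_first(*module_names):
    """Return the first module in module_names that imports successfully."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append("%s: %s" % (name, exc))
    raise ImportError("none of the candidate modules could be imported:\n" + "\n".join(errors))

# Hypothetical usage across TensorFlow versions:
# seq2seq = import_first("tensorflow.contrib.legacy_seq2seq",
#                        "tensorflow.python.ops.seq2seq",
#                        "tensorflow.models.rnn.seq2seq")
```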
When training on large files, I get a MemoryError despite having more than enough memory to hold the file:
reading text file
Traceback (most recent call last):
File "train.py", line 111, in <module>
main()
File "train.py", line 48, in main
train(args)
File "train.py", line 51, in train
data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
File "/home/ren/Projects/char-rnn-tensorflow/utils.py", line 18, in __init__
self.preprocess(input_file, vocab_file, tensor_file)
File "/home/ren/Projects/char-rnn-tensorflow/utils.py", line 35, in preprocess
self.tensor = np.array(list(map(self.vocab.get, data)))
MemoryError
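np.array(list(map(self.vocab.get, data))) first builds a Python list with one pointer per character and then converts it to a default-integer array (typically 8 bytes per element), which likely contributes to the peak memory. A minimal sketch of a leaner encoding with np.fromiter, which fills a preallocated array directly. encode_text is a hypothetical helper, and the uint8 choice assumes at most 256 distinct characters.

```python
import numpy as np

def encode_text(data, vocab):
    """Encode a string to a compact integer array without an intermediate list."""
    dtype = np.uint8 if len(vocab) <= 256 else np.int32
    # vocab[c] (not vocab.get) raises KeyError on unseen characters instead of storing None.
    return np.fromiter((vocab[c] for c in data), dtype=dtype, count=len(data))
```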
I am receiving a deprecation warning when I run train.py or sample.py: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
Is there a workaround for this?
Thanks,
CJ
Hi,
train.py contains these lines of code:
train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
summ, train_loss, state, _ = sess.run([summaries, model.cost, model.final_state, model.train_op], feed)
This means we run two training steps on a single batch. Why?
Hi, does weighted_pick() return random characters?
With sampling_type=1, return(int(np.searchsorted(t, np.random.rand(1)*s))) looks like it returns a random choice from the array t, and that's what I'm seeing right now. Is this intentional, or am I doing something wrong?
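The quoted line is inverse-CDF sampling: a uniform draw scaled by s is located in the cumulative sums t, so each index is returned with probability proportional to its weight. The output is random, but not uniform. A small empirical check, with toy weights chosen to be exact in binary:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.25, 0.25])
t = np.cumsum(weights)
s = np.sum(weights)

# Same expression as in weighted_pick: invert the cumulative distribution.
draws = [int(np.searchsorted(t, rng.random() * s)) for _ in range(20000)]
freqs = np.bincount(draws, minlength=len(weights)) / len(draws)
print(freqs)  # close to [0.5, 0.25, 0.25]
```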
What a wonderful project! I have used it to solve some problems.
But there is one problem that always bothers me.
In one case, I have to use rnn_size=512, num_layers=2, and seq_length=1200.
Other arguments: batch_size=10, num_epochs=50, grad_clip=5.0, and so on.
But it allocates 7.23 GiB on the GPU, which has only 8 GB free.
So I wonder whether I can reduce GPU memory usage to 7 GiB or less. If so, I can run it on the GPU.
rnn_size, num_layers, and seq_length cannot be modified.
Here is some of the output:
I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 22 Chunks of size 256 totalling 5.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 512 totalling 2.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7499 Chunks of size 2048 totalling 14.65MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1087 Chunks of size 4096 totalling 4.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 4608 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 6144 totalling 6.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 616 Chunks of size 8192 totalling 4.81MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 9984 totalling 9.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 10240 totalling 40.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 12288 totalling 24.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 303 Chunks of size 14336 totalling 4.14MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 198656 totalling 970.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 208384 totalling 203.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 919 Chunks of size 8388608 totalling 7.18GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 10775552 totalling 10.28MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 14428160 totalling 13.76MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 7.23GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 7967745639
InUse: 7764832256
MaxInUse: 7764842496
NumAllocs: 60834
MaxAllocSize: 14428160
W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 8.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[1024,2048]
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Sorry for my poor English, and thanks a lot!
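The log can be sanity-checked with some arithmetic. A shape[1024,2048] tensor at 4 bytes per element (consistent with float32) is exactly 8388608 bytes, the dominant chunk size, and 919 such chunks account for about 7.18 GiB of the 7.23 GiB in use. The shape also appears to match an LSTM weight matrix for rnn_size=512, assuming the cell's input size equals rnn_size; the log alone does not show why so many copies exist.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor; 4 bytes per element assumes float32."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element

rnn_size = 512
# Assumption: the LSTM kernel maps [input; hidden] (rnn_size + rnn_size) to four gates.
lstm_kernel_shape = (rnn_size + rnn_size, 4 * rnn_size)

chunk = tensor_bytes((1024, 2048))
total_gib = 919 * chunk / 2**30
print(lstm_kernel_shape, chunk, total_gib)  # (1024, 2048) 8388608 7.1796875
```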
I'm using Python 2.7, the following libraries, and the latest master branch. This leads to the error below, which doesn't happen when I roll back to the earlier ae-rnn branch.
(venv)root@# python train.py
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
....
File "train.py", line 64, in train
train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
File "/usr/share/venv/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 330, in run
% (subfetch, fetch, type(subfetch), str(e)))
TypeError: Fetch argument [<tf.Tensor 'zeros:0' shape=(?, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_1:0' shape=(50, 512) dtype=float32>, ... (rnnlm_1/concat_2:0 through rnnlm_1/concat_48:0, each shape=(50, 512) dtype=float32) ..., <tf.Tensor 'rnnlm_1/concat_49:0' shape=(50, 512) dtype=float32>] of [the same 51-tensor list] has invalid type <type 'list'>, must be a string or Tensor. (Can not convert a list into a Tensor or Operation.)
Hi,
I'm using char-rnn for computing sentence probability, which is the main functionality of language modeling. This piece of code feeds the sentence's characters one by one and finds the probability of correctly predicting the next character:
state = self.cell.zero_state(1, tf.float32).eval(session=session)
char_probas = []
input = np.zeros((1, 1))
for c, char in enumerate(sentence[:-1]):
    input[0, 0] = vocab[char]
    feed = {self.input_data: input, self.initial_state: state}
    [probs, state] = session.run([self.probs, self.final_state], feed)
    char_probas.append(probs[0][vocab[sentence[c+1]]])
probability = np.mean(char_probas)
It works fine and prefers well-written sentences over ill-formed ones. But I think it's not optimized for performance. Is it possible to feed one sequence of characters and receive the probability of each character given the previous ones? Currently, data transfer between host and device seems to be a major bottleneck.
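Two hedged notes. By the chain rule, the probability of a sentence is the product of the per-character conditional probabilities, so the sum of their logs is a log-probability; np.mean of the raw probabilities is a usable score but not a probability. And if the whole sentence could be run in one pass, producing an array of shape [len(sentence) - 1, vocab_size] (an assumption about how you would rebuild the graph), the per-character lookups collapse into one NumPy gather. sentence_log_prob is a hypothetical helper.

```python
import numpy as np

def sentence_log_prob(step_probs, sentence, vocab):
    """Chain-rule log-probability of sentence[1:] given sentence[0].

    step_probs[t] is the model's distribution over the next character after
    reading sentence[:t + 1], i.e. one row per input position.
    """
    targets = np.array([vocab[c] for c in sentence[1:]])
    picked = step_probs[np.arange(len(targets)), targets]  # one gather, no Python loop
    return float(np.sum(np.log(picked)))

vocab = {"a": 0, "b": 1}
step_probs = np.array([[0.5, 0.5], [0.25, 0.75]])  # toy model output for "abb"
lp = sentence_log_prob(step_probs, "abb", vocab)
```

Dividing lp by len(sentence) - 1 gives an average log-probability per character, which makes sentences of different lengths comparable.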
I want to see what the activations are for individual neurons of a given layer for a given input character.
Any suggestions?
Hi
I wanted to see whether I could learn things about other types of text, but it seems to be problematic. Is there something specific about the headline/chapter format of these plays that somehow made it into the model? I'm curious because when I give it other texts, the loss never decreases below 1.2.
Just curious
Andy
This works fine for me with TensorFlow 0.10 but not with TensorFlow 0.11.
There is some issue with tuples. Is anyone else having this issue?
state_is_tuple=True might be the solution, but I'm not sure where to use it in train.py and model.py.
There should be an option to add a bidirectional recurrent neural network using the three core RNN cells.
Hi, sorry for the basic question. How should I run subsequent training sessions (continuing from the last point)?
And which files should I save? Are the 'data' and 'save' directories enough?
I can't test it myself due to space limits on AWS.
These show as modified:
data/tinyshakespeare/data.npy
data/tinyshakespeare/vocab.pkl
model.pyc
sample.pyc
train.pyc
utils.pyc
Can you post an example of the exact data you are feeding into the placeholders?
In my TensorFlow version (0.12.1), I couldn't use "from tensorflow.contrib import rnn" or "from tensorflow.nn import legacy_seq2seq"; they were moved.
So I used tf.nn.rnn_cell and tf.nn.seq2seq instead.
When I use --init_from=./save to restore my model, it starts initializing a new model. Is there anything wrong with my command?
python3 train.py --data_dir=./data/mydata --init_from=./save
Hi, I was wondering if someone could confirm my suspicion. I think this code in model.py is not ever used with the way sampling is done currently.
def loop(prev, _):
    prev = tf.matmul(prev, softmax_w) + softmax_b
    prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
    return tf.nn.embedding_lookup(embedding, prev_symbol)
outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, loop_function=loop if infer else None, scope='rnnlm')
When I change it to this, training and sampling seem to work fine:
# def loop(prev, _):
# prev = tf.matmul(prev, softmax_w) + softmax_b
# prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
# return tf.nn.embedding_lookup(embedding, prev_symbol)
outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, scope='rnnlm')
Looking at the source for seq2seq.rnn_decoder, if input has length 1 (which it does when infer == True), the loop function is never used. Am I missing something? It almost looks like this code could replicate this paper.
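The reading above can be checked with a simplified, pure-Python model of the legacy rnn_decoder control flow (variable scoping omitted; this is not TensorFlow's code). The loop function only replaces inputs after the first, so with a single input it is never called. toy_cell and toy_loop are illustrative stand-ins.

```python
def rnn_decoder_sketch(inputs, state, cell, loop_function=None):
    """Simplified control flow of a legacy-style rnn_decoder.

    When loop_function is given, every input after the first is replaced by
    loop_function(previous_output, i); the first input is always used as-is.
    """
    outputs, prev = [], None
    for i, inp in enumerate(inputs):
        if loop_function is not None and prev is not None:
            inp = loop_function(prev, i)
        output, state = cell(inp, state)
        outputs.append(output)
        if loop_function is not None:
            prev = output
    return outputs, state

calls = []

def toy_cell(inp, state):
    # Toy "cell": output and new state are both inp + state.
    return inp + state, inp + state

def toy_loop(prev, i):
    calls.append(i)
    return prev

rnn_decoder_sketch([1], 0, toy_cell, toy_loop)
print(calls)  # [] : with one input the loop function never runs
rnn_decoder_sketch([1, 2, 3], 0, toy_cell, toy_loop)
print(calls)  # [1, 2]
```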
loading preprocessed files
Traceback (most recent call last):
File "train.py", line 75, in <module>
main()
File "train.py", line 39, in main
train(args)
File "train.py", line 42, in train
data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
File "/home/onetipp/char-rnn/char-rnn-tensorflow/utils.py", line 22, in __init__
self.create_batches()
File "/home/onetipp/char-rnn/char-rnn-tensorflow/utils.py", line 52, in create_batches
ydata[-1] = xdata[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
Hello,
If I understand correctly, the three-dimensional inputs tensor is built by looking up the n-th row of embeddings for each number in the two-dimensional self.input_data tensor. The rows of embeddings have the same size as the RNN's internal layers. This seems to be how the different characters are fed into the network.
The TensorFlow variable "embeddings" has nothing assigned to it explicitly, so it is drawn from a uniform distribution each time train.py is run. Why is that? I would have expected embeddings to be a matrix of one-hot row vectors encoding the different characters, mapped to the internal layer by weights as in https://gist.github.com/karpathy/d4dee566867f8291f086 .
Also, when I print embeddings at the end of every run, I notice that its value changes every time.
I would be very grateful if someone would explain to me what is going on here.
Yours truly,
rhaps0dy
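Looking up row n of a matrix gives the same result as multiplying a one-hot vector for n by that matrix. So a randomly initialized embedding that is trained along with the rest of the network plays the role of the "one-hot times input weights" layer in the linked gist. Its values presumably differ between runs because the initialization is random and training updates it. A NumPy sketch with illustrative sizes and values:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, rnn_size = 5, 3
# Stand-in for the randomly initialized, trainable embedding matrix.
embedding = rng.uniform(-0.1, 0.1, size=(vocab_size, rnn_size))

input_data = np.array([[0, 3, 3], [4, 1, 2]])  # [batch, seq_length] of char ids
looked_up = embedding[input_data]              # what an embedding lookup returns
one_hot = np.eye(vocab_size)[input_data]       # [batch, seq_length, vocab_size]
projected = one_hot @ embedding                # one-hot rows times a weight matrix

print(looked_up.shape)                   # (2, 3, 3)
print(np.allclose(looked_up, projected)) # True
```

The lookup avoids materializing the one-hot tensor, which matters more as the vocabulary grows.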
Hi,
I want to use dynamic_rnn to train my convLSTM. The original data are videos with dimensions [batch_size, max_time_step, height, width, channel], but I failed to feed the data to dynamic_rnn.
I get this error:
ValueError: Dimension must be 5 but is 3 for 'transpose' (op: 'Transpose') with input shapes: [16,?,11,40,1], [3].
What should I do to use dynamic_rnn?
version: tf 1.0
thanks
Under Python 2, open() returns byte strings by default. It would be nice if TextLoader.preprocess read the file and decoded it to Unicode.
I'll submit a PR.
Thanks.
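A minimal sketch of reading the input as Unicode text on both Python 2 and 3, assuming the file is UTF-8. read_text is a hypothetical helper, not TextLoader's API.

```python
import io
import os
import tempfile

def read_text(path, encoding="utf-8"):
    # io.open decodes to Unicode text on both Python 2 and 3.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "input.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9 \u2603\n")
text = read_text(path)
print(len(text))  # 7 characters, not the 10 UTF-8 bytes
```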
I am on tensorflow 1.0, however it failed on 0.12 too.
Traceback (most recent call last):
File "train.py", line 114, in <module>
main()
File "train.py", line 48, in main
train(args)
File "train.py", line 98, in train
for i, (c, h) in enumerate(model.initial_state):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 516, in __iter__
raise TypeError("'Tensor' object is not iterable.")
TypeError: 'Tensor' object is not iterable.
Regards.
Also, dropout doesn't seem to be there. Any way we can use some regularization?
First and foremost thanks to everybody involved in this. I really appreciate the work you are putting into this.
Previously I was using Karpathy's char-rnn, but I couldn't get Torch running with my GPU after updating my hardware, so I looked for a different solution, and that brought me here. With Karpathy's char-rnn I was getting beautiful results even with very small datasets (around 1 MB). With your TensorFlow implementation the results are not as good, and I wonder why. I tried fiddling with the parameters (rnn_size, num_layers, etc.), but the improvements were small or nonexistent.
It would be really cool if you could add some explanatory comments on the different parameters, i.e. how each one affects the result. For someone relatively new to NNs like me, this would help a lot in getting better results.
Thanks again for your efforts!
I drafted this Dockerfile with Parsey McParseface.
https://github.com/johndpope/DockerParseyMcParsefaceAPI/blob/master/docker/dsparseyapi/Dockerfile
(There's a script that builds Parsey with a gRPC API; it takes more than 90 minutes on a Mac.)
But you could cherry-pick the base file.
The interface differs from char-rnn's, so I didn't set up checkpoint saving correctly. I started a larger network with tensor-rnn, and it looks like it won't write a checkpoint until epoch 50.
It's currently on epoch 36 and runs about 2 epochs per day (250 buffer), i.e. another week to go.
Looking at the learning rate, it would have been great to save a checkpoint on demand, e.g. via a keyboard command, at certain points: in this case at epoch 18, where training leveled off.
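One way to get on-demand checkpoints without stopping training is a flag that a POSIX signal sets and the training loop checks once per step. A minimal sketch; CheckpointOnDemand, train, train_step and save_checkpoint are hypothetical names standing in for the real loop and saver call, and SIGUSR1 is not available on Windows.

```python
import signal

class CheckpointOnDemand:
    """Flag set from outside (e.g. a POSIX signal) and checked once per training step."""

    def __init__(self):
        self.requested = False

    def request(self, signum=None, frame=None):
        # Signature matches a signal handler, so it can be installed directly.
        self.requested = True

    def install(self):
        # SIGUSR1 does not exist on Windows; there you would need another trigger.
        if hasattr(signal, "SIGUSR1"):
            signal.signal(signal.SIGUSR1, self.request)

    def consume(self):
        """Return True once per request, then clear the flag."""
        if self.requested:
            self.requested = False
            return True
        return False

def train(num_steps, train_step, save_checkpoint, on_demand, save_every=1000):
    for step in range(num_steps):
        train_step(step)
        wanted = on_demand.consume()  # always consume, so a request is never left pending
        if wanted or (step + 1) % save_every == 0:
            save_checkpoint(step)
```

With flag.install() called once at startup, running kill -USR1 <pid> from another terminal would trigger a save at the end of the current step.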
What if I'd like to use attention_decoder instead of rnn_decoder?
I wonder how to modify outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, loop_function=loop if infer else None, scope='rnnlm').
What should attention_states be?
C02QH2D7G8WM:char-rnn-tensorflow userone$ python train.py
Traceback (most recent call last):
File "train.py", line 3, in <module>
import tensorflow as tf
File "/usr/local/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 35, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 16, in <module>
from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 22, in <module>
serialized_pb=_b('\n,tensorflow/core/framework/tensor_shape.proto\x12\ntensorflow"z\n\x10TensorShapeProto\x12-\n\x03\x64im\x18\x02 \x03(\x0b\x32 .tensorflow.TensorShapeProto.Dim\x12\x14\n\x0cunknown_rank\x18\x03 \x01(\x08\x1a!\n\x03\x44im\x12\x0c\n\x04size\x18\x01 \x01(\x03\x12\x0c\n\x04name\x18\x02 \x01(\tB/\n\x18org.tensorflow.frameworkB\x11TensorShapeProtosP\x01\x62\x06proto3')
TypeError: __init__() got an unexpected keyword argument 'syntax'
I've searched the repository, read the documentation a few times, and tried invoking python sample.py on anything that looked remotely interesting in the data directory. Calling python sample.py works great on the model that's currently in training. How do I call an arbitrary model? Is there some parameter keyword I need to pass along with the filename argument to sample.py? "--sample" seemed like a good bet, but on closer inspection that option doesn't look related.
chars_vocab.pkl model.ckpt-1000.index model.ckpt-3000.meta model.ckpt-62000.data-00000-of-00001 model.ckpt-64000.index
checkpoint model.ckpt-1000.meta model.ckpt-4000.data-00000-of-00001 model.ckpt-62000.index model.ckpt-64000.meta
config.pkl model.ckpt-2000.data-00000-of-00001 model.ckpt-4000.index
(tensorflow) Nobodys-MacBook-Pro:char-rnn-tensorflow kz$ python sample.py model.ckpt-64149.index
usage: sample.py [-h] [--save_dir SAVE_DIR] [-n N] [--prime PRIME]
[--sample SAMPLE]
sample.py: error: unrecognized arguments: model.ckpt-64149.index
sample.py: error: unrecognized arguments: model.ckpt-64149.data-00000-of-00001
(tensorflow) Nobodys-MacBook-Pro:char-rnn-tensorflow kz$ python sample.py -h
usage: sample.py [-h] [--save_dir SAVE_DIR] [-n N] [--prime PRIME]
[--sample SAMPLE]
(tensorflow) Nobodys-MacBook-Pro:save kz$ python ../sample.py --sample model.ckpkt-64149.meta
usage: sample.py [-h] [--save_dir SAVE_DIR] [-n N] [--prime PRIME]
[--sample SAMPLE]
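The usage text shows that sample.py takes --save_dir but no checkpoint filename. If sample.py restores whichever checkpoint is named by model_checkpoint_path in save_dir/checkpoint (an assumption; check your copy of sample.py), one workaround is to point that file at the checkpoint you want. point_checkpoint_at is a hypothetical helper that assumes the text-format checkpoint file TensorFlow's Saver writes; back the file up first.

```python
import os
import re

def point_checkpoint_at(save_dir, prefix):
    """Rewrite save_dir/checkpoint so it names prefix (e.g. "model.ckpt-62000")."""
    if not os.path.exists(os.path.join(save_dir, prefix + ".index")):
        raise FileNotFoundError("no %s.index in %s" % (prefix, save_dir))
    path = os.path.join(save_dir, "checkpoint")
    with open(path) as f:
        text = f.read()
    # Only the model_checkpoint_path line changes; all_model_checkpoint_paths stay as-is.
    new_text, n = re.subn(r'^model_checkpoint_path: ".*"$',
                          lambda m: 'model_checkpoint_path: "%s"' % prefix,
                          text, flags=re.M)
    if n != 1:
        raise ValueError("could not find model_checkpoint_path in %s" % path)
    with open(path, "w") as f:
        f.write(new_text)
```

After that, python sample.py --save_dir save would presumably load the chosen checkpoint. Note that model.ckpt-64149 does not appear in the directory listing above, so the helper's .index check would reject it.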