kyubyong / expressive_tacotron
Tensorflow Implementation of Expressive Tacotron
I really want to know what audio quality will be achieved after 200 steps.
(tf-gpu) [pranaw@login expressive_tacotron-master]$ python train.py
Traceback (most recent call last):
File "train.py", line 101, in <module>
g = Graph(); print("Training Graph loaded")
File "train.py", line 35, in __init__
self.x, self.y, self.z, self.fnames, self.num_batch = get_batch()
File "/home/pranaw/TTS/expressive_tacotron-master/data_load.py", line 65, in get_batch
fpaths, texts = load_data() # list
File "/home/pranaw/TTS/expressive_tacotron-master/data_load.py", line 40, in load_data
lines = codecs.open(transcript, 'r', 'utf-8').readlines()
File "/home/pranaw/TTS/anaconda3/envs/tf-gpu/lib/python3.6/codecs.py", line 897, in open
file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: '/data/private/voice/LJSpeech-1.0/metadata.csv'
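The traceback shows that the transcript path is hardcoded to the author's machine. A likely fix (a sketch, assuming the repo keeps the dataset location in hyperparams.py, as kyubyong's related repos do) is to point it at your own LJSpeech download:

```python
# hyperparams.py -- a sketch; the hardcoded default from the traceback is
# /data/private/voice/LJSpeech-1.0, which only exists on the author's machine.
# Replace it with your local download path (hypothetical path shown here).
data = "/home/pranaw/TTS/LJSpeech-1.0"
```

After this change, data_load.py should find metadata.csv under that directory.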
I have trained the model on LJ Speech for around 670k steps. When running python synthesize.py,
I received this error.
Graph loaded
2018-04-25 19:19:45.131340: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-04-25 19:19:45.188585: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-25 19:19:45.188789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 114.62MiB
2018-04-25 19:19:45.188802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-25 19:19:45.189340: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 114.62M (120193024 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Restored!
0%| | 0/200 [00:00<?, ?it/s]Exception KeyError: KeyError(<weakref at 0x7fc729ad8100; to 'tqdm' at 0x7fc729b2e350>,) in <bound method tqdm.__del__ of 0%| | 0/200 [00:00<?, ?it/s]> ignored
Traceback (most recent call last):
File "synthesize.py", line 69, in <module>
synthesize()
File "synthesize.py", line 59, in synthesize
_y_hat = sess.run(g.y_hat, {g.x: texts, g.y: y_hat, g.ref: ref})
File "/home/m/anaconda3/envs/ttse/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/m/anaconda3/envs/ttse/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1096, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (16, 188) for Tensor u'Placeholder:0', which has shape '(32, 188)'
Can you suggest any solution?
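The placeholder was built for batch_size = 32, but only 16 texts were fed at synthesis time. One hedged workaround (a sketch, not the repo's own fix) is to pad the input batch with zero rows up to batch_size and discard the extra outputs afterwards:

```python
import numpy as np

def pad_batch(texts, batch_size):
    """Pad a (n, T) int array with zero rows so n matches batch_size.

    Returns the padded array plus the original row count, so the padded
    rows' outputs can be discarded after sess.run.
    """
    n, t = texts.shape
    if n >= batch_size:
        return texts[:batch_size], batch_size
    pad = np.zeros((batch_size - n, t), dtype=texts.dtype)
    return np.concatenate([texts, pad], axis=0), n

# Shapes taken from the error message above: fed (16, 188), expected (32, 188).
texts = np.ones((16, 188), dtype=np.int32)
padded, real = pad_batch(texts, 32)
# padded.shape == (32, 188); keep only the first `real` synthesized outputs
```

Alternatively, setting batch_size in hyperparams.py to match the number of test sentences avoids the mismatch entirely.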
Just like for Tacotron, can someone share the pretrained model (on the LJ Speech dataset) for this as well?
Thank you very much for your contribution. I have trained the model on LJ Speech
for 835k steps. However, the results are not as good as the samples you provided for 420k steps. Maybe there is some problem with my training? Below you can find the attention plot and the sample audio at 835k steps. What kind of attention plot signals a good checkpoint for the synthesizer?
And the progress was like this:
The samples synthesized from this checkpoint can be found here:
https://www.dropbox.com/sh/n5ld72rn9otxl7a/AAACyplZMtxiYtuUgvWN8OGaa?dl=0
Also, the trained model (checkpoint), is uploaded here:
https://www.dropbox.com/sh/ks91bdputl5ujo7/AABRIqpviRDBgWuFIJn1yuhba?dl=0
Also, I was wondering if you have any plans to release your trained model.
Another thing: tf.train.Saver keeps only the last 5 checkpoints by default, and the wrapper used here (i.e. tf.train.Supervisor) does not easily allow changing the max_to_keep property of the saver.
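For reference, the rotation that max_to_keep controls behaves roughly like the sketch below (in TF1 one would construct tf.train.Saver(max_to_keep=N) and pass it to the Supervisor via its saver keyword argument; the file handling here is a pure-Python stand-in for the real save):

```python
import os
import tempfile

def save_checkpoint(logdir, step, kept, max_to_keep=5):
    """Sketch of the rotation tf.train.Saver performs: write a new
    checkpoint file, then delete the oldest once more than
    max_to_keep checkpoints are kept."""
    fname = os.path.join(logdir, "model_gs_%d" % step)
    open(fname, "w").close()          # stand-in for the real save
    kept.append(fname)
    while len(kept) > max_to_keep:
        os.remove(kept.pop(0))        # oldest checkpoint is dropped
    return kept

tmp = tempfile.mkdtemp()
kept = []
for step in range(0, 800, 100):       # 8 saves at steps 0, 100, ..., 700
    save_checkpoint(tmp, step, kept)
# only the 5 most recent checkpoints (steps 300..700) remain on disk
```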
PS. The hyperparameters are kept as default.
# signal processing
sr = 22050 # Sample rate.
n_fft = 2048 # fft points (samples)
frame_shift = 0.0125 # seconds
frame_length = 0.05 # seconds
hop_length = int(sr*frame_shift) # samples.
win_length = int(sr*frame_length) # samples.
n_mels = 80 # Number of Mel banks to generate
power = 1.2 # Exponent for amplifying the predicted magnitude
n_iter = 50 # Number of inversion iterations
preemphasis = .97 # or None
max_db = 100
ref_db = 20
# model
embed_size = 256 # alias = E
encoder_num_banks = 16
decoder_num_banks = 8
num_highwaynet_blocks = 4
r = 5 # Reduction factor.
dropout_rate = .5
# training scheme
lr = 0.001 # Initial learning rate.
logdir = "logdir"
sampledir = 'samples'
batch_size = 32
num_iterations = 1000000
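As a quick sanity check on the derived frame sizes above (plain arithmetic, no assumptions beyond the listed hyperparameters):

```python
sr = 22050            # sample rate from hyperparams
frame_shift = 0.0125  # seconds
frame_length = 0.05   # seconds

hop_length = int(sr * frame_shift)   # 275 samples between successive frames
win_length = int(sr * frame_length)  # 1102 samples per analysis window
# with n_fft = 2048, each 1102-sample window is zero-padded to 2048 FFT points
```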
From where shall I run Step 3, as mentioned by you:
STEP 3. Run python eval.py regularly during training.
I have checked your sample results from 420k steps of training and tried to align the reference sound with the target sound; they seem quite different.
So I am really confused about how the author of the paper did that. Maybe he used such a large dataset (100+ hours) that he could get good results.
Thanks for your contribution.
I followed the training steps and ran python train.py.
However, I got this error: :(
0%| | 0/409 [00:00<?, ?b/s]2018-04-08 11:58:13.819432:
W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: TypeError: a bytes-like object is required, not 'str'
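This TypeError typically appears when a codebase written for Python 2 runs under Python 3: somewhere in the input pipeline, a call expects bytes but receives str (or vice versa). The exact call site in data_load.py may differ, but normalizing at the boundary is the usual fix; a hedged helper:

```python
def as_bytes(s):
    """Encode str to UTF-8 bytes; pass bytes through unchanged."""
    return s.encode("utf-8") if isinstance(s, str) else s

def as_text(s):
    """Decode UTF-8 bytes to str; pass str through unchanged."""
    return s.decode("utf-8") if isinstance(s, bytes) else s
```

Wrapping the value that crosses into (or out of) tf.py_func with the appropriate helper usually resolves "a bytes-like object is required, not 'str'".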
python synthesize.py
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
@Kyubyong I'm able to run train.py successfully, but there are no checkpoints in logdir, so I'm unable to run synthesize.py.
I have changed parameters such as:
lr = 0.9 # Initial learning rate.
logdir = "logdir"
sampledir = 'samples'
batch_size = 16
and I also made some changes in train.py:

if __name__ == '__main__':
    g = Graph(); print("Training Graph loaded")
    sv = tf.train.Supervisor(logdir=hp.logdir, save_summaries_secs=60, save_model_secs=0)
    with sv.managed_session() as sess:
        if len(sys.argv) == 2:
            sv.saver.restore(sess, sys.argv[1])
            print("Model restored.")
        # while 1:
        for _ in tqdm(range(g.num_batch), total=g.num_batch, ncols=70, leave=False, unit='b'):
            _, gs = sess.run([g.train_op, g.global_step])
            # Write checkpoint files
            if gs % 100 == 0:
                sv.saver.save(sess, hp.logdir + '/model_gs_{}k'.format(gs // 100))
                # plot the first alignment for logging
                al = sess.run(g.alignments)
                plot_alignment(al[0], gs)
        # if gs > hp.num_iterations:
        #     break
    print("Done")
In addition to rhythm, can this model control the emotion of the audio? Is there an example?