
Comments (4)

ruclion avatar ruclion commented on August 17, 2024 2

By the way, your code is very clean; even the logs are tidy. Thank you for your hard work.

from fastspeech.

xcmyz avatar xcmyz commented on August 17, 2024

in train.py:

                if args.frozen_learning_rate:
                    scheduled_optim.step_and_update_lr_frozen(
                        args.learning_rate_frozen)
                else:
                    scheduled_optim.step_and_update_lr()

I am now trying to improve the quality of the slower generated waves by training for more steps,
but:

  1. Why always use learning_rate = 1e-3?
  2. Why is the batch_size so small? Should I increase it up to my GPU's memory limit?

Thank you~
Here is my write-up of your code's workflow; it may be of some help to us.
https://blog.csdn.net/u013625492/article/details/103076158
  1. The frozen learning rate is for testing: during debugging I need to check whether the loss is decreasing, and with the scheduled learning rate it takes much longer to observe the loss drop. For actual training, the scheduled learning rate is still used.
  2. You can increase your batch size; the small batch size here is only because my server's GPU memory is limited 🤦‍
  3. Thank you very much for your blog. If you don't mind, I would like to add it to the README for everyone's reference.
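For context, a scheduled learning rate in Transformer-style training typically does a linear warmup followed by inverse-square-root decay (the "Noam" schedule). A minimal sketch, with illustrative values for `d_model` and `n_warmup_steps` (assumptions, not taken from this repository):

```python
# Sketch of a "Noam"-style scheduled learning rate: warmup, then
# inverse-square-root decay. The constants below are illustrative.
def scheduled_lr(step, d_model=256, n_warmup_steps=4000):
    scale = d_model ** -0.5
    # during warmup, step * n_warmup_steps**-1.5 is the smaller term
    # (linear ramp); afterwards step**-0.5 takes over (decay)
    return scale * min(step ** -0.5, step * n_warmup_steps ** -1.5)

# The rate ramps up, peaks at step == n_warmup_steps, then decays.
lrs = [scheduled_lr(s) for s in (1, 4000, 100000)]
```

With these assumed constants the peak rate is `256**-0.5 * 4000**-0.5`, on the order of 1e-3, which is why a frozen 1e-3 is a reasonable debugging stand-in.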


ruclion avatar ruclion commented on August 17, 2024

(quoting the previous comment)

Thanks for the answers~
1. About frozen vs scheduled: understood.
2. I will tune the batch size as circumstances allow. I have actually already tried, even setting it to 192 (a multiple based on GPU memory), but, limited by my weak understanding of PyTorch distributed training 🤦‍, the memory usage only ever fluctuates between 2000M and 4000M, or else it fails to run at all 🤦‍:

model = nn.DataParallel(FastSpeech()).to(device)

And also:

batch_size=hp.batch_size**2,

Compared with the original paper's 64 (16 * 4 GPUs), I may need to dig into this some more.
Overall, though, tuning the batch size is enough; my fundamental question is resolved: the batch size does not have to be 8, and in the conventional sense "bigger is better". I will keep experimenting.
3. I haven't gotten the Chinese version working yet~ Once I have Chinese samples I will update this issue~
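On the memory question: `nn.DataParallel` splits each input batch along dimension 0 across the visible GPUs, so each card holds only roughly `batch_size / n_gpus` samples; that is how the paper's total of 64 becomes 16 per GPU on 4 GPUs. A simplified pure-Python sketch of that split (PyTorch's actual chunking may differ slightly at the margins):

```python
# Sketch of how a DataParallel-style wrapper divides a batch across
# devices: the batch dimension is chunked roughly evenly over n_gpus,
# so per-GPU memory scales with the per-GPU share, not the total.
def split_batch(batch_size, n_gpus):
    base, rem = divmod(batch_size, n_gpus)
    # the first `rem` devices each get one extra sample
    return [base + (1 if i < rem else 0) for i in range(n_gpus)]

# the paper's setting: total batch 64 over 4 GPUs -> 16 per GPU
per_gpu = split_batch(64, 4)  # [16, 16, 16, 16]
```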

On the approach:
There is actually another key question. If we focus specifically on slow-speed synthesis, my tests at 1.5x and even 2.0x achieve very satisfying results in prosody, tone, and even the feel of connected speech, which surprised me (after all, the slowing-down mechanism is so simple and crude that there may be some hidden principle behind it). The hard flaw, however, is the audio quality: the sound has a slightly raspy, sizzling character.
For example:
https://drive.google.com/drive/folders/1cgdSarm-AXLAqbfCdAX9VePH9SUYW_2p?usp=sharing

  1. What I can think of is to keep training FastSpeech so that the loss drops further and the mel spectrogram gets more detailed.
  2. Replace the vocoder with a stronger or longer-trained WaveGlow, or another vocoder.

What do you think~
Any ideas for improving the quality so it is not just research-grade but product-grade (like Tacotron 2)?
Reply when you have time~ Thank you!


xcmyz avatar xcmyz commented on August 17, 2024

(quoting the previous comment)
  1. The purpose of hp.batch_size**2 here is to sort that pool of samples by length from long to short and then split it into hp.batch_size groups, so that the sequence length the Transformer sees in each batch is smaller, which significantly speeds up training;
  2. As for making FastSpeech perform better, further training will probably bring only a small improvement. One idea I have for later is to use RL. Because the length regulator's target comes from Tacotron 2 or TransformerTTS, and those models' parameter distributions differ from FastSpeech's, FastSpeech's performance will inevitably fall short of Tacotron 2 and TransformerTTS. If we could use RL to drop the original target and instead take the quality of the synthesized mel spectrogram as the reward, letting the model find the length regulator's target on its own, it might achieve better results; it's worth trying (if you'd like to discuss more, feel free to message me on Zhihu)
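The length-sorted bucketing in point 1 can be sketched as follows: draw a pool of hp.batch_size**2 samples, sort them by length, and slice the sorted pool into groups of hp.batch_size so each group pads to a similar length. The `lengths` values here are illustrative:

```python
# Sketch of length-sorted bucketing: sort a large pool of samples by
# sequence length, then cut it into equal-sized groups. Sequences in
# each group have similar lengths, so per-batch padding is minimal.
def make_buckets(lengths, batch_size):
    # indices of the pool, sorted from longest to shortest sequence
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    # slice the sorted pool into `batch_size`-sized groups
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# pool of batch_size**2 = 16 samples for batch_size = 4
lengths = [5, 40, 12, 33, 7, 29, 18, 3, 25, 9, 31, 14, 21, 6, 37, 11]
buckets = make_buckets(lengths, 4)
# buckets[0] holds the 4 longest sequences (40, 37, 33, 31), and so on
```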

