atomicoo / fch-tts Goto Github PK

A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (tested so far).

License: MIT License

Python 100.00%

tts english tibetan mandarin japanese russian dctts tacotron fastspeech korean

fch-tts's Issues

Data Error!!!

Dear atomicoo:
The URL https://open-speech-data.oss-cn-hangzhou.aliyuncs.com does not work, so the data cannot be downloaded when running prepare_dataset.py.

Also, could you please share the structure of the "datasets" directory? It differs between your script:
    dataset_path = osp.join(datasets_path, dataset_dir)
    wavfile_path = osp.join(dataset_path, "wavs")
    melspec_path = osp.join(dataset_path, "mels")
and the official BiaoBei data:
    PhoneLabeling
    ProsodyLabeling
    Wave
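The script lines quoted above expect `<datasets_path>/<dataset_dir>/wavs` and `<datasets_path>/<dataset_dir>/mels`, while the official BiaoBei release ships `Wave/`, `PhoneLabeling/` and `ProsodyLabeling/`. A minimal sketch of bridging the two, assuming that `Wave/` simply becomes `wavs/` and that the ProsodyLabeling file alternates an `ID<TAB>text` line (with prosody marks such as `#1` to `#4`) and a pinyin line. The helpers `expected_paths`, `parse_prosody_labeling` and `to_metadata_lines`, and the pipe-separated output, are hypothetical; the metadata format fch-tts actually expects is not stated in the issue.

```python
import os.path as osp
import re

# Prosody-break marks such as "#1".."#4" inside BiaoBei text lines (assumed format).
PROSODY_MARK = re.compile(r"#\d")


def expected_paths(datasets_path, dataset_dir):
    """Mirror the paths built by the script: <dataset_dir>/wavs and <dataset_dir>/mels."""
    dataset_path = osp.join(datasets_path, dataset_dir)
    return osp.join(dataset_path, "wavs"), osp.join(dataset_path, "mels")


def parse_prosody_labeling(lines):
    """Pair up (ID<TAB>text, pinyin) lines into (id, text, pinyin) tuples.

    Assumption: the ProsodyLabeling file alternates a text line and a pinyin line.
    """
    # Strip surrounding whitespace (including any leading tab) and drop blank lines.
    lines = [line.strip() for line in lines if line.strip()]
    records = []
    for header, pinyin in zip(lines[0::2], lines[1::2]):
        utt_id, text = header.split("\t", 1)
        records.append((utt_id, PROSODY_MARK.sub("", text), pinyin))
    return records


def to_metadata_lines(records):
    """Format records as pipe-separated rows (hypothetical metadata layout)."""
    return [f"{utt_id}|{text}|{pinyin}" for utt_id, text, pinyin in records]
```

Under these assumptions, the `.wav` files from `Wave/` would be copied or symlinked into the `wavs` directory returned by `expected_paths`, and the rows written to a metadata file in the dataset directory.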

Is there an open-source Chinese model?

Has the Chinese model been open-sourced?

MelGAN model

Is there a pretrained Mandarin model (bbspeech-melgan-epoch*.pth), or which script should I run to train one myself?

Is there a way to speed up synthesis?

For example, via TFLite, or by switching to a different backbone: is there any way to make it faster?

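Given synthesized speech and its text, is it possible to score the quality of the speech?

As the title says. I have already generated the speech by other means.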

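Japanese text-to-speech

Hello, is there a Japanese model? The model currently provided in the project is the LJSpeech one, and using it to synthesize Japanese raises the following error: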
RuntimeError: Calculated padded input size per channel: (5). Kernel size: (7). Kernel size can't be greater than actual input size
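If I want to train a Japanese model myself, downloading JPSpeech-1.1.tar.bz2 fails because the link cannot be opened. Is there another way to obtain it? Currently I can only download the jsut_ver1.1 dataset, which contains only wav audio. I downloaded the matching basic5000 .lab annotations, but there is no metadata.csv file.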
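A minimal sketch for producing an LJSpeech-style `metadata.csv` from JSUT, assuming that `basic5000/transcript_utf8.txt` holds one `ID:text` line per utterance and that the project reads pipe-separated `id|text|normalized_text` rows as LJSpeech does; neither assumption is confirmed in this issue. The helpers `jsut_to_ljspeech` and `write_metadata` are hypothetical, and the sketch does not address how Japanese text should be converted into the model's input symbols.

```python
def jsut_to_ljspeech(transcript_lines):
    """Convert assumed JSUT 'ID:text' lines into LJSpeech-style 'id|text|text' rows."""
    rows = []
    for line in transcript_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first ASCII colon only; the text itself is left untouched.
        utt_id, text = line.split(":", 1)
        # LJSpeech has raw and normalized columns; here both are the same text.
        rows.append(f"{utt_id}|{text}|{text}")
    return rows


def write_metadata(rows, path="metadata.csv"):
    """Write rows as UTF-8, one per line (hypothetical output location)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(rows) + "\n")
```

Usage, under the same assumptions: read `transcript_utf8.txt` with `encoding="utf-8"`, pass its lines to `jsut_to_ljspeech`, and write the result next to the `wavs` directory.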

Synthesize - MelGAN: Runs out of memory with CUDA

Env: (Nvidia T4, torch 1.9.0)
Tried the quick start with the following steps:
$ conda create -n ParallelTTS python=3.7.9
$ conda activate ParallelTTS
$ pip install -r requirements.txt
$ python synthesize.py \
    --checkpoint ./pretrained/ljspeech-parallel-epoch0100.pth \
    --melgan_checkpoint ./pretrained/ljspeech-melgan-epoch3200.pth \
    --input_texts ./samples/english/synthesize.txt
Failed with:

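I trained my own Chinese model with the bbspeech.yaml configuration, but the results are poor: no sound is produced

I trained my own Chinese model with the bbspeech.yaml configuration, but the results are poor: no sound is produced.

GPU: RTX 3060 12 GB
Duration: 24 hours
Dataset: Baker
Batch size: 64

I am not sure whether something is wrong with my training steps and is causing this.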

RuntimeError

RuntimeError: Calculated padded input size per channel: (5). Kernel size: (7). Kernel size can't be greater than actual input size
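This message comes from PyTorch's convolution check: the input sequence, after padding, is shorter than the (dilated) kernel. The sketch below reproduces that check in pure Python using the output-length formula from the `torch.nn.Conv1d` documentation, so it runs without PyTorch. One plausible but unconfirmed cause is a text front end that keeps only a few symbols of the input, for example when non-English text is fed to the LJSpeech model, which would leave a 5-step sequence in front of a kernel of size 7. `conv1d_output_length` and `pad_ids` are hypothetical helpers, not part of fch-tts, and whether padding short inputs degrades the audio is untested.

```python
def conv1d_output_length(l_in, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a 1-D convolution (formula from the PyTorch Conv1d docs).

    Raises ValueError in the case PyTorch reports as
    "Kernel size can't be greater than actual input size".
    """
    padded = l_in + 2 * padding
    # Span covered by a dilated kernel.
    effective_kernel = dilation * (kernel_size - 1) + 1
    if padded < effective_kernel:
        raise ValueError(
            f"padded input size {padded} < kernel size {effective_kernel}"
        )
    # Equivalent to floor((L_in + 2p - d(k-1) - 1) / s) + 1.
    return (padded - effective_kernel) // stride + 1


def pad_ids(ids, min_len, pad_id=0):
    """Right-pad a token-ID sequence to at least min_len (hypothetical stopgap)."""
    return list(ids) + [pad_id] * max(0, min_len - len(ids))
```

With the numbers from the error, `conv1d_output_length(5, 7)` raises, while padding the five IDs to seven lets a kernel of size 7 produce one output step.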

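Processing of Tibetan text

Hello! I am very excited to see a TTS system that supports Tibetan speech synthesis, and I am curious how you process Tibetan text. Specifically, do you use Tibetan characters as input, or do you convert Tibetan into phonemes? If you convert to phonemes, is it rule-based, do you use a lexicon, or did you choose some other method? Finally, did you handle converting Tibetan into phonemes together with the corresponding tones, or do you have a good method for doing that?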
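The issue does not say how fch-tts handles Tibetan, so the following is only a hedged illustration of a common first segmentation step, whichever input unit is chosen: Tibetan syllables are delimited by the tsheg mark (U+0F0B, with U+0F0C as its non-breaking variant), and clauses end with a shad (U+0F0D) or double shad (U+0F0E). The helper name `tibetan_syllables` is hypothetical, and it performs no phoneme or tone conversion.

```python
import re

# Tsheg (U+0F0B), delimiter tsheg bstar (U+0F0C), shad (U+0F0D),
# nyis shad (U+0F0E), plus ordinary whitespace, all treated as separators.
_SEPARATORS = re.compile(r"[\u0F0B\u0F0C\u0F0D\u0F0E\s]+")


def tibetan_syllables(text):
    """Split Tibetan text into syllables at tsheg/shad marks, dropping empties."""
    return [s for s in _SEPARATORS.split(text) if s]
```

A lexicon or rule-based grapheme-to-phoneme step, if one is used, would then operate on these syllables.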

Could you release the TWLSpeech dataset?

As the title says.

Error during training with train-parallel.py

Traceback (most recent call last):
  File "/home/gaol/codes/Voices/FCH-TTS/train-parallel.py", line 69, in <module>
    loggers=loggers
  File "/home/gaol/codes/Voices/FCH-TTS/helpers/trainer.py", line 319, in fit
    valid_losses = self._validate(valid_loader)
  File "/home/gaol/codes/Voices/FCH-TTS/helpers/trainer.py", line 419, in _validate
    loss.item(), l1_loss.item(), ssim_loss.item(), drn_loss.item()
AttributeError: 'float' object has no attribute 'item'
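A hedged workaround sketch: the traceback shows that at least one of `loss`, `l1_loss`, `ssim_loss` or `drn_loss` reaches `_validate` as a plain Python `float`, which has no `.item()` method, rather than as a tensor. Which one, and why, cannot be told from the traceback alone. The hypothetical helper `to_scalar` below accepts both cases; it is not part of fch-tts.

```python
def to_scalar(value):
    """Return a Python float whether value is a 0-d tensor or already a number."""
    # torch.Tensor (and NumPy scalars) expose .item(); plain floats and ints do not.
    if hasattr(value, "item"):
        return value.item()
    return float(value)
```

Line 419 could then read `to_scalar(loss), to_scalar(l1_loss), to_scalar(ssim_loss), to_scalar(drn_loss)`. Tracking down where the float is produced would be the cleaner long-term fix.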
