karim23657 / persian-tts-coqui


Persian/Farsi text-to-speech (TTS) training using Coqui TTS

License: MIT License

Jupyter Notebook 83.69% Python 16.31%

farsi farsi-datasets persian persian-dataset persian-language text-to-speech tts coqui coqui-ai coqui-tts


persian-tts-coqui's Issues

[W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.

Hello. I installed it on a virtual server, and when I run the text-to-speech command I get the following message:

[W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.

I would appreciate your guidance.

The conversion itself does complete successfully, though.

Problem in training

Hi,
I get an error when I run the last line of https://github.com/karim23657/Persian-tts-coqui/blob/main/recepies/glowtts/01-glowtts-train.ipynb

The error is:
AssertionError: 24000 vs 22050

During handling of the above exception, another exception occurred:

SystemExit Traceback (most recent call last)
[... skipping hidden 1 frame]

SystemExit: 1

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
[... skipping hidden 1 frame]

/usr/local/lib/python3.10/dist-packages/IPython/core/ultratb.py in find_recursion(etype, value, records)
380 # first frame (from in to out) that looks different.
381 if not is_recursion_error(etype, value, records):
--> 382 return len(records), 0
383
384 # Select filename, lineno, func_name to track frames with

TypeError: object of type 'NoneType' has no len()

How can I fix this?
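
The message `24000 vs 22050` suggests that the sample rate set in the training config and the sample rate of at least one audio file disagree. A minimal sketch for finding such files, assuming the dataset consists of uncompressed PCM WAV files in a single folder; the folder name `wavs` and the helper name are hypothetical, and only the Python standard library is used:

```python
import wave
from pathlib import Path


def find_sample_rate_mismatches(wav_dir, expected_rate):
    """Return {file name: sample rate} for WAV files whose rate differs from expected_rate."""
    mismatches = {}
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        # The stdlib wave module reads the header of uncompressed PCM WAV files.
        with wave.open(str(wav_path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatches[wav_path.name] = rate
    return mismatches


if __name__ == "__main__":
    # "wavs" is a hypothetical dataset folder; 22050 is one of the two rates in the error.
    print(find_sample_rate_mismatches("wavs", expected_rate=22050))
```

If the report is not empty, either the files can be resampled to the configured rate or the configured rate can be changed to match the files; which of the two is appropriate depends on the checkpoint and recipe in use.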

Fine-tuning stops in the final epochs

When we use the VITS model for training and fine-tuning, the run stops in the final epochs, for example at epoch 940 of the intended 1000, and does not continue.
This happens every time, at varying epochs such as 920 or 940. Each time we monitor the CPU, GPU, and RAM usage logs,
but none of them shows any anomaly, and usage never exceeds half of the available capacity.
We used the following code for training.

import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig , CharactersConfig
from TTS.config.shared_configs import BaseAudioConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsAudioConfig
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = "./TTSCoqui/V11/Outputs"

dataset_config = BaseDatasetConfig(
    formatter="mozilla", meta_file_train="metadata.csv", path="./TTSCoqui/Data/dataset" 
)

audio_config = BaseAudioConfig(
    sample_rate=22050,
    do_trim_silence=True,
    resample=False,
    mel_fmin=0,
    mel_fmax=None 
)

character_config=CharactersConfig(
    characters='ءابتثجحخدذرزسشصضطظعغفقلمنهويِپچژکگیآأؤإئ',
    punctuations='!(),-.:;? ̠،؛؟‌<>',
    phonemes='ˈˌːˑpbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟaegiouwyɪʊ̩æɑɔəɚɛɝɨ̃ʉʌʍ0123456789"#$%*+/=ABCDEFGHIJKLMNOPRSTUVWXYZ[]^_{}',
    pad="<PAD>",
    eos="<EOS>",
    bos="<BOS>",
    blank="<BLNK>",
    characters_class="TTS.tts.utils.text.characters.IPAPhonemes",
)

config = VitsConfig(
    audio=audio_config,
    run_name="vits_fa_female_finetune",
    batch_size=8,
    eval_batch_size=4,
    batch_group_size=5,
    num_loader_workers=0,
    num_eval_loader_workers=2,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    save_step=1000,
    text_cleaner="basic_cleaners",
    use_phonemes=True,
    phoneme_language="fa",
    characters=character_config,
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    compute_input_seq_cache=True,
    print_step=25,
    print_eval=True,
    mixed_precision=False,
    test_sentences=[
        ["سلام وقت بخیر عزیزی هستم چطور میتونم کمکتون کنم"],
        ["من بهتون اطلاع میدم عذرخواهی می کنم"],
        ["خواهش می کنم خداحافظ"],
        ["الو صدای بنده رو دارید فرمایید"]
    ],
    output_path=output_path,
    datasets=[dataset_config],
    lr = 0.00001,
)

# INITIALIZE THE AUDIO PROCESSOR
# Audio processor is used for feature extraction and audio I/O.
# It mainly serves to the dataloader and the training loggers.
ap = AudioProcessor.init_from_config(config)

# INITIALIZE THE TOKENIZER
# Tokenizer is used to convert text to sequences of token IDs.
# config is updated with the default characters if not defined in the config.
tokenizer, config = TTSTokenizer.init_from_config(config)

# LOAD DATA SAMPLES
# Each sample is a list of ```[text, audio_file_path, speaker_name]```
# You can define your custom sample loader returning the list of samples.
# Or define your custom formatter and pass it to the `load_tts_samples`.
# Check `TTS.tts.datasets.load_tts_samples` for more details.
train_samples, eval_samples = load_tts_samples(
    dataset_config,
    eval_split=True,
    eval_split_max_size=config.eval_split_max_size,
    eval_split_size=config.eval_split_size,
)

# init model
model = Vits(config, ap, tokenizer, speaker_manager=None)

# init the trainer and 🚀
trainer = Trainer(
    TrainerArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()

Resources used:

- Python version 3.9.18
- CUDA Version: 12.2
- GPU RTX 4060 Ti
- GPU Memory 16G
- PyTorch version 2.2.0+cu121
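
If the run has to be restarted after such a stop, it helps to know the most recent checkpoint that was written (the script above saves one every 1000 steps via `save_step`). The sketch below only locates that file. It assumes checkpoints are named `checkpoint_<step>.pth` or `best_model_<step>.pth`; the helper names are hypothetical, and how the file is handed back to the trainer is deliberately left out:

```python
import re
from pathlib import Path

# Assumed naming scheme: checkpoint_<step>.pth or best_model_<step>.pth
CHECKPOINT_PATTERN = re.compile(r"^(?:checkpoint|best_model)_(\d+)\.pth$")


def checkpoint_step(file_name):
    """Return the training step encoded in a checkpoint file name, or None."""
    match = CHECKPOINT_PATTERN.match(file_name)
    return int(match.group(1)) if match else None


def latest_checkpoint(run_dir):
    """Return the path of the checkpoint with the highest step in run_dir, or None."""
    candidates = []
    for path in Path(run_dir).glob("*.pth"):
        step = checkpoint_step(path.name)
        if step is not None:
            candidates.append((step, path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

Comparing the step of the latest checkpoint with the step at which the log stops also shows how much work would be lost on a restart.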

Share your TensorBoard logs

Would you please share your TensorBoard logs?
I want to check if my training procedure is going well by comparing the losses of my training with yours.
Thanks a lot!

Problem converting the letter "و"

Hello,

I am using https://huggingface.co/Kamtera/persian-tts-male-vits
When I synthesize only "و", the output is 8 seconds of noise.
99d151ef66088fafaa4168dc0f65a878.zip

Request for deployment on a server

Hello. I need this service deployed on my server so that I can use it offline. I would also be grateful if you could teach me how to train it myself and add more voices to the program. Thank you for the effort you put into building this program; I would appreciate your help.
We can come to an agreement on the costs and the time you spend. I would appreciate a reply.

Share your models

If you trained a model you can share it here.

Add support for Python 3.11

Problem writing Python TTS code

I have a 300 MB model that is of no use at all: it does not process Persian text, it only reads Persian digits, and English text sounds as if it is being choked.
I will set that one aside.

I also have a 1 GB model, and it produces nothing but errors in the output:
from pathlib import Path

from TTS.utils.synthesizer import Synthesizer

# Raw strings keep the backslashes in Windows paths from being read as escapes
# ("\U" in a normal string literal is a syntax error).
basepath = Path(r"C:\Users\computer\Desktop\Persian_TTS_models\1_73_GB")
config_path = str(basepath / "config.json")  # Absolute path to the model config.json
model_path = str(basepath / "model")  # Folder that contains model.pth
text = ".زندگی فقط یک بار است؛ از آن به خوبی استفاده کن"
synthesizer = Synthesizer(model_path, config_path)
wavs = synthesizer.tts(text)
synthesizer.save_wav(wavs, r"C:\Users\computer\Desktop\output.wav")

The model variable contains only the folder name because there is a folder named model, and the library finds model.pth inside it by itself.
It keeps telling me to choose a language, and when I pass a language I get another error saying that I must not pass one.
Output:

Text splitted to sentences.
['.زندگی فقط یک بار است؛ از آن به خوبی استفاده کن']
Traceback (most recent call last):
File "C:\Users\computer\Desktop\Persian_TTS_models\TTS\vid1.py", line 72, in
wavs = synthesizer.tts(text)
File "C:\Program Files\Python310\lib\TTS\utils\synthesizer.py", line 378, in tts
outputs = self.tts_model.synthesize(
File "C:\Program Files\Python310\lib\TTS\tts\models\xtts.py", line 392, in synthesize
return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
File "C:\Program Files\Python310\lib\TTS\tts\models\xtts.py", line 399, in inference_with_config
"zh-cn" if language == "zh" else language in self.config.languages
AssertionError: ❗ Language is not supported. Supported languages are ['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar', 'zh-cn', 'hu', 'ko', 'ja', 'hi']
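
The traceback passes through `xtts.py`, so this checkpoint is being loaded as an XTTS model, and the list of supported languages in the assertion does not contain `fa`. The sketch below mirrors the check visible in the traceback (including the `zh` to `zh-cn` mapping) so the language can be validated before synthesis; the function names are hypothetical, and the language list is copied from the error message above rather than from the library:

```python
# Language codes copied from the AssertionError above.
XTTS_LANGUAGES = [
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja", "hi",
]


def normalize_language(language):
    """Map 'zh' to 'zh-cn', as the expression shown in the traceback does."""
    return "zh-cn" if language == "zh" else language


def is_supported(language, supported=XTTS_LANGUAGES):
    """Return True if the (normalized) language code is in the supported list."""
    return normalize_language(language) in supported


print(is_supported("fa"))  # Persian is not in the list from the error message
print(is_supported("en"))
```

Under this reading, passing a different language code cannot make this checkpoint speak Persian; a model trained for Persian is needed instead.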

Working with the config.json file is unclear

Hello.
Everything about the model is great, but I need some custom settings.
For example, I would like it to read words a little more slowly, or to not read punctuation marks aloud when it reaches them, such as ? - ! - : - . and so on.
If possible, please explain the settings in the config file a bit more clearly.
Thank you very much for your efforts.
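
Both requests can be approximated outside the model. The sketch below removes punctuation from the text before it is sent to the synthesizer and raises a `length_scale` value in the loaded config. It assumes a VITS-style config in which `model_args.length_scale` controls speaking rate (values above 1 giving slower speech); that key, the punctuation set, and the helper names are assumptions to be checked against the actual config.json:

```python
import copy
import json

# Assumed set of marks to drop; the zero-width non-joiner is deliberately kept.
PUNCTUATION = "!(),-.:;?،؛؟"


def strip_punctuation(text, punctuation=PUNCTUATION):
    """Replace punctuation marks with spaces and collapse repeated whitespace."""
    table = {ord(mark): " " for mark in punctuation}
    return " ".join(text.translate(table).split())


def with_length_scale(config, length_scale):
    """Return a copy of config with model_args.length_scale set (assumed key)."""
    updated = copy.deepcopy(config)
    updated.setdefault("model_args", {})["length_scale"] = length_scale
    return updated


def update_config_file(path, length_scale):
    """Rewrite a config.json in place with the new length_scale."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(with_length_scale(config, length_scale), handle, ensure_ascii=False, indent=4)
```

Keeping a backup of the original config.json before rewriting it makes it easy to revert if the key turns out to have no effect for the model in use.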

Preparing phoneme based dataset.

I'm dealing with Arabic text mapped to phonemes using my grapheme-to-phoneme model,
e.g. این مخزن شامل نمونه is mapped to ' E N - M KH Z N - SH AE M L - N M W N HH '.
My phoneme list is the following: pho_ids = {'-':0, ' ZH':1, 'AE':2, 'SS':3, 'AE':4,'IY':5,.....,'eos': 55}, where a single phoneme can be written with two letters.

character_config=CharactersConfig(
  characters='ءابتثجحخدذرزسشصضطظعغفقلمنهويِپچژکگیآأؤإئ',
  punctuations='!(),-.:;? ̠،؛؟‌<>',
  phonemes='ˈˌːˑpbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟaegiouwy',
  pad="<PAD>",
  eos="<EOS>",
  bos="<BOS>",
  blank="<BLNK>",
  characters_class="TTS.tts.utils.text.characters.IPAPhonemes",
  )

I want to adapt character_config so that it suits my experiment.
Many thanks
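
Because the `characters` and `phonemes` fields above are plain strings, a multi-letter phoneme such as `KH` would presumably be treated as two separate symbols there. One way to reason about the problem is to tokenize on whitespace instead, as sketched below. The symbol subset and helper names are hypothetical; the builder also strips stray spaces and rejects repeated symbols, since in a Python dict literal a repeated key such as 'AE' silently keeps only its last value:

```python
def build_phoneme_ids(symbols):
    """Assign consecutive IDs to symbols, stripping spaces and rejecting duplicates."""
    ids = {}
    for symbol in symbols:
        symbol = symbol.strip()
        if symbol in ids:
            raise ValueError(f"duplicate phoneme symbol: {symbol!r}")
        ids[symbol] = len(ids)
    return ids


def encode_phonemes(phoneme_text, phoneme_ids):
    """Convert a space-separated phoneme string into a list of integer IDs."""
    encoded = []
    for token in phoneme_text.split():
        if token not in phoneme_ids:
            raise KeyError(f"unknown phoneme: {token!r}")
        encoded.append(phoneme_ids[token])
    return encoded


# Hypothetical subset of the symbol inventory, for illustration only.
pho_ids = build_phoneme_ids(["-", "SH", "AE", "M", "L"])
print(encode_phonemes(" SH AE M L - ", pho_ids))
```

Whether the training recipe can consume such pre-tokenized IDs directly, or needs a custom character class, depends on the Coqui TTS version and is not settled by this sketch.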

Problem installing TTS in Colab

I get this error when installing TTS in Colab. What should I do?

Share your datasets

If you've created a dataset or found any good datasets, you can share them with us here.

Voice Cloning With YourTTS

Hi Karim,
I want to use the model for a voice cloning task. Could you help me with how to use it?
For example:
https://huggingface.co/spaces/ismot/1802t1
Best regards.

Add an Android TTS version for better usability

Training the XTTS model

Which dataset do you recommend for training the XTTS model?

Fatha and kasra

Hello, and thank you for your work. I use Persian TTS in the lovo ai service (also known as genny), which I think is based on these same models, since it has the same male and female voices. One difference, though, is that this site recognizes and pronounces fatha, kasra, and damma about 90 percent of the time.

Can't this be done in Coqui TTS as well?

Hi, Multi Speaker Tutorial

Hi Karim,
How can I use the multi-speaker VITS train.py (Kamtera/persian-tts-multispeaker-vits) to train or fine-tune a multi-speaker model?
Could you help me?
Best regards.

Error while training

Hello! I am trying to train VITS using exactly the same dataset and recipes you have provided; the only thing I changed is the dataset paths in the recipes. But for some reason I am getting this error:

/workspace/TTS/TTS/tts/models/vits.py:1454: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3571.)
  test_figures["{}-alignment".format(idx)] = plot_alignment(alignment.T, output_fig=False)
 ! Run is removed from /workspace/Persian-tts-coqui/recepies/vits/vits_fa_female-April-24-2023_09+16AM-9a4e5f8
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/opt/conda/lib/python3.8/site-packages/trainer/trainer.py", line 1548, in _fit
    self.test_run()
  File "/opt/conda/lib/python3.8/site-packages/trainer/trainer.py", line 1466, in test_run
    test_outputs = self.model.test_run(self.training_assets)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/TTS/TTS/tts/models/vits.py", line 1454, in test_run
    test_figures["{}-alignment".format(idx)] = plot_alignment(alignment.T, output_fig=False)
  File "/workspace/TTS/TTS/tts/utils/visual.py", line 18, in plot_alignment
    im = ax.imshow(
  File "/opt/conda/lib/python3.8/site-packages/matplotlib/__init__.py", line 1447, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/matplotlib/axes/_axes.py", line 5523, in imshow
    im.set_data(X)
  File "/opt/conda/lib/python3.8/site-packages/matplotlib/image.py", line 711, in set_data
    raise TypeError("Invalid shape {} for image data"
TypeError: Invalid shape (10,) for image data

Could you help me figure out what the issue is? Thank you, and great work on the model!

Noisy output and long inference time

Hello,
Why is the output just noise? And why does each inference take 10 minutes?
If possible, please provide a more comprehensive inference guide. Thank you.
@karim23657

!tts --text "زندگی فقط یک بار است؛ از آن به خوبی استفاده کن" \
    --config_path "/content/persian-tts-female-tacotron2/config-0.json" \
    --model_path "/content/persian-tts-female-tacotron2/best_model_305416.pth" \
    --vocoder_config_path "/content/persian-tts-female-Hifigan/config.json" \
    --vocoder_path "/content/persian-tts-female-Hifigan/best_model_222302.pth" \
    --out_path "speech2.wav"
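
When the same call is made from a script rather than a notebook cell, building the argument list in Python avoids shell quoting problems with Persian text and long paths. The sketch below only assembles the command from the flags used above; the helper name is hypothetical, and running the command (for example with `subprocess.run`) is left commented out:

```python
import shlex


def build_tts_command(text, config_path, model_path,
                      vocoder_config_path, vocoder_path, out_path):
    """Assemble the tts CLI call as an argument list, using the flags shown above."""
    return [
        "tts",
        "--text", text,
        "--config_path", config_path,
        "--model_path", model_path,
        "--vocoder_config_path", vocoder_config_path,
        "--vocoder_path", vocoder_path,
        "--out_path", out_path,
    ]


command = build_tts_command(
    text="زندگی فقط یک بار است؛ از آن به خوبی استفاده کن",
    config_path="/content/persian-tts-female-tacotron2/config-0.json",
    model_path="/content/persian-tts-female-tacotron2/best_model_305416.pth",
    vocoder_config_path="/content/persian-tts-female-Hifigan/config.json",
    vocoder_path="/content/persian-tts-female-Hifigan/best_model_222302.pth",
    out_path="speech2.wav",
)
print(shlex.join(command))  # shell-quoted form, useful for logging
# import subprocess; subprocess.run(command, check=True)  # uncomment to run
```

Passing a list rather than a single string means each argument reaches the program unchanged, so a mismatched model and vocoder path is easier to spot in the logged command.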