rishikksh20 / istftnet-pytorch

iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

License: Apache License 2.0

Python 100.00%

istftnet-pytorch's Issues

Single frequency line problem

Thanks for the implementation of iSTFTNet. It has better inference speed than HiFi-GAN v1. However, I found a single frequency line that causes a little noise. I use a 16 kHz dataset for training, and the line is always exactly at 4 kHz, which is the middle of the frequency range. I'm trying to fix this problem. Do you have the same problem?

Different sample rate

Hi @rishikksh20 , thanks for your work.

I have a question. If I want to use a 16 kHz sampling rate, how do I modify the configuration file?
Presumably it is not enough to change only sampling_rate in the JSON.
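
A minimal sketch of the kind of consistency check a sample-rate change calls for, assuming the config follows the HiFi-GAN layout. Only sampling_rate and gen_istft_n_fft appear in this thread; fmax, hop_size, upsample_rates and gen_istft_hop_size are assumed key names, check_config is a hypothetical helper, and all values are made up for illustration. The two checks rest on general facts: mel filters cannot extend past the Nyquist frequency, and the generator has to emit hop_size samples per mel frame.

```python
from math import prod


def check_config(h: dict) -> list:
    """Return a list of human-readable problems found in a config dict."""
    problems = []

    # The mel filterbank cannot extend past the Nyquist frequency.
    nyquist = h["sampling_rate"] / 2
    if h["fmax"] > nyquist:
        problems.append(f"fmax {h['fmax']} exceeds Nyquist {nyquist:g}")

    # Each mel frame must expand to hop_size samples:
    # conv upsampling factor times the hop of the final iSTFT.
    samples_per_frame = prod(h["upsample_rates"]) * h["gen_istft_hop_size"]
    if samples_per_frame != h["hop_size"]:
        problems.append(
            f"upsampling gives {samples_per_frame} samples per frame, "
            f"but hop_size is {h['hop_size']}"
        )
    return problems


# Hypothetical 16 kHz config where only sampling_rate and hop_size were edited.
config_16k = {
    "sampling_rate": 16000,
    "fmax": 11025,
    "hop_size": 200,
    "upsample_rates": [8, 8],
    "gen_istft_hop_size": 4,
}
problems = check_config(config_16k)

# Hypothetical consistent variant: fmax at or below 8000, hop_size = 8 * 8 * 4.
config_fixed = dict(config_16k, fmax=8000, hop_size=256)
fixed_problems = check_config(config_fixed)
```

Whether these are the only keys that need to change for this repository is not something the thread confirms.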

Fix TypeError: 'torch.device' object is not callable

Following issue #1, line 164 in stft.py was changed to:

iSTFTNet-pytorch/stft.py

Line 164 in e928a6b

 window_sum = window_sum.to(inverse_transform.device()) if magnitude.is_cuda else window_sum 

But inverse_transform.device() raises the exception mentioned in the title, because device is an attribute of a tensor, not a method. Changing it to inverse_transform.device fixes the problem.
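
A small sketch that reproduces the error and the corrected call on CPU tensors. The tensor shapes are made up; only the names inverse_transform and window_sum come from the quoted line.

```python
import torch

# Stand-ins for the tensors in stft.py; the shapes are illustrative only.
inverse_transform = torch.zeros(1, 1, 8)
window_sum = torch.ones(8)

# `device` is an attribute holding a torch.device object, so calling it fails.
try:
    inverse_transform.device()
except TypeError as err:
    message = str(err)

# Corrected form: move window_sum to wherever inverse_transform lives.
# `.to()` returns the tensor unchanged when it is already on that device.
window_sum = window_sum.to(inverse_transform.device)
inverse_transform /= window_sum
```

Because .to() does nothing when the devices already match, the same line covers the CPU, single-GPU and multi-GPU cases.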

A multi-gpu training bug

stft.py, lines 164-165:
window_sum = window_sum.cuda() if magnitude.is_cuda else window_sum
inverse_transform[:, :, approx_nonzero_indices] /= window_sum[approx_nonzero_indices]
These lines raise an error in multi-GPU training, because inverse_transform might be on cuda:1 while window_sum is on cuda:0.
Changing line 164 to window_sum = window_sum.to(inverse_transform.device()) if magnitude.is_cuda else window_sum will fix the problem.

The output channels of the final convolutional layer

iSTFTNet-pytorch/models.py

Line 97 in ecbf0f6

self.conv_post = weight_norm(Conv1d(ch, self.post_n_fft + 2, 7, 1, padding=3))

iSTFTNet-pytorch/config_v1.json

Line 16 in ecbf0f6

"gen_istft_n_fft": 16,

Why is fs 16?
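
One way to read the post_n_fft + 2 in conv_post: a one-sided STFT with FFT size n has n // 2 + 1 frequency bins, and the layer appears to emit one magnitude channel and one phase channel per bin. The sketch below only redoes that arithmetic. The helper name is hypothetical, and the magnitude/phase split is inferred from the slicing in models.py rather than confirmed by the author.

```python
def conv_post_channels(n_fft: int) -> dict:
    """Channel layout implied by a one-sided STFT of size n_fft."""
    bins = n_fft // 2 + 1  # frequency bins from DC up to Nyquist
    return {
        "bins": bins,
        "magnitude": bins,  # first block of output channels
        "phase": bins,      # second block of output channels
        "total": 2 * bins,  # equals n_fft + 2 when n_fft is even
    }


layout = conv_post_channels(16)  # gen_istft_n_fft from config_v1.json
```

With gen_istft_n_fft set to 16 this gives 9 bins and 18 output channels, which matches the Conv1d(128, 18, ...) shown in the model printout further down this page.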

A sample as good as HiFiGAN

Thanks very much!

iSTFT_sample.mp4

Can the STFT module be converted to ONNX format?

Hi. Does this repo implement tinyVITS?

https://arxiv.org/abs/2206.00208

Directly model complex numbers

Has anyone tried to directly model the complex numbers instead of the phase and magnitude? What would be the problem if we model the real and imaginary parts directly?
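
For reference, the two parameterizations describe the same complex bin. The sketch below shows the conversion in both directions and one practical difference: phase is only defined up to multiples of 2*pi, while the real/imaginary pair is unique. The function names are hypothetical, and the sketch does not say which target is easier for a network to learn.

```python
import math


def polar_to_rect(magnitude: float, phase: float) -> tuple:
    """Magnitude and phase (radians) to real and imaginary parts."""
    return magnitude * math.cos(phase), magnitude * math.sin(phase)


def rect_to_polar(real: float, imag: float) -> tuple:
    """Real and imaginary parts to magnitude and phase in (-pi, pi]."""
    return math.hypot(real, imag), math.atan2(imag, real)


real, imag = polar_to_rect(2.0, 3.0)
magnitude, phase = rect_to_polar(real, imag)

# A phase shifted by a full turn maps to the same real/imaginary pair.
real_wrapped, imag_wrapped = polar_to_rect(2.0, 3.0 + 2 * math.pi)
```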

RuntimeError: istft input and window must be on the same device but got self on cuda:0 and window on cpu

My command to run:

python3 train.py --config config_v1.json --input_wavs_dir /home/yehor/iSTFTNet-pytorch/lada_wavs --input_training_file /home/yehor/iSTFTNet-pytorch/training_list.txt --input_validation_file /home/yehor/iSTFTNet-pytorch/validation_list.txt

Error:

...        (2): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(5,))
      )
    )
  )
  (conv_post): Conv1d(128, 18, kernel_size=(7,), stride=(1,), padding=(3,))
  (reflection_pad): ReflectionPad1d((1, 0))
)
checkpoints directory :  cp_hifigan
Epoch: 1
/home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "train.py", line 280, in <module>
    main()
  File "train.py", line 276, in main
    train(0, a, h)
  File "train.py", line 126, in train
    y_g_hat = stft.inverse(spec, phase)
  File "/home/yehor/iSTFTNet-pytorch/stft.py", line 198, in inverse
    inverse_transform = torch.istft(
RuntimeError: istft input and window must be on the same device but got self on cuda:0 and window on cpu
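
A minimal sketch of one common way to avoid this class of error, assuming the window is created on the CPU while the model runs on the GPU. Registering the window as a module buffer lets .to(device) move it together with the rest of the module. The InverseSTFT class and its parameters are hypothetical and are not the repository's stft.py.

```python
import torch


class InverseSTFT(torch.nn.Module):
    """Hypothetical wrapper around torch.istft that owns its window."""

    def __init__(self, n_fft: int, hop_length: int):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # Buffers follow the module across devices, unlike plain attributes.
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, magnitude: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
        # Build the complex spectrogram from magnitude and phase (radians).
        spec = torch.polar(magnitude, phase)
        return torch.istft(
            spec,
            self.n_fft,
            hop_length=self.hop_length,
            win_length=self.n_fft,
            window=self.window,
        )


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
istft = InverseSTFT(n_fft=16, hop_length=4).to(device)

# Illustrative input: batch of 2, 16 // 2 + 1 = 9 bins, 10 frames.
magnitude = torch.rand(2, 9, 10, device=device)
phase = torch.rand(2, 9, 10, device=device)
audio = istft(magnitude, phase)
```

The alternative is to move the window explicitly at call time with window.to(spec.device), which has the same effect for a single call.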

Is window_sum in stft just a constant?

I printed window_sum in stft.py (line 155) and found that its value is constant, except at the leading and trailing padding positions. So the window function only acts as a linear scaling. Does this result match what windowing is expected to do?
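
A constant envelope is what overlap-add predicts for some window and hop combinations. The sketch below sums squared periodic Hann windows at a hop of a quarter of the window length and shows that the sum is flat away from the edges. The sizes 16 and 4 are illustrative assumptions, the function names are hypothetical, and the code is a plain-Python stand-in, not the window_sum computation in stft.py.

```python
import math


def hann(n_fft: int) -> list:
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def window_sumsquare(n_fft: int, hop: int, n_frames: int) -> list:
    """Overlap-add the squared window over n_frames frames."""
    win_sq = [w * w for w in hann(n_fft)]
    total = [0.0] * (n_fft + hop * (n_frames - 1))
    for frame in range(n_frames):
        start = frame * hop
        for i, value in enumerate(win_sq):
            total[start + i] += value
    return total


envelope = window_sumsquare(n_fft=16, hop=4, n_frames=20)

# Skip one window length at each end, where fewer frames overlap.
interior = envelope[16:-16]
spread = max(interior) - min(interior)
```

In this setting the interior is flat and only the edges deviate, so dividing by the envelope acts as a constant rescaling there. Other window and hop choices can give a rippled envelope instead.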

How is the quality of this network?

Have you got good audio?

Pretrained models

Hello, thank you very much for this repo.

Can you please provide pre-trained models for testing?

Predicted phase not in range [-pi .. pi], but in range [-1 .. 1]

The phase output of the generator can currently only range from -1 to 1, which is not enough, because stft.inverse() later expects the full phase in radians (either 0..2*pi or -pi..pi).

The paper mentions somewhat cryptically that "we apply a sine activation function to represent the periodic characteristics of the phase spectrogram", but in any case the current implementation is faulty, since it cannot represent the full range of possible phases.

iSTFTNet-pytorch/models.py

Line 118 in ecbf0f6

phase = torch.sin(x[:, self.post_n_fft // 2 + 1:, :])

As a suggestion, either try scaling the output by 2*pi, or directly predict sin(phase) and cos(phase) in the generator (the predicted values can be normalized by dividing both by the square root of sin(phase)**2 + cos(phase)**2).
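
The range problem and both suggestions can be checked with plain arithmetic. In the sketch below the raw values are made-up stand-ins for the generator's pre-activation outputs and phase_from_pair is a hypothetical helper. The sketch scales by pi, which already spans -pi..pi; the 2*pi factor mentioned above would span two full turns.

```python
import math

# Hypothetical pre-activation values from the phase channels.
raw = [-5.0, -1.5, 0.0, 1.5, 5.0]

# Current behaviour: sin() alone bounds the "phase" to [-1, 1] radians.
sin_only = [math.sin(v) for v in raw]

# Suggestion 1: rescale the sine output so it spans [-pi, pi].
scaled = [math.pi * math.sin(v) for v in raw]


def phase_from_pair(sin_like: float, cos_like: float) -> float:
    """Suggestion 2: recover a phase in [-pi, pi] from two predicted values.

    atan2 depends only on the direction of the pair, so the two values
    do not have to be normalized to the unit circle first.
    """
    return math.atan2(sin_like, cos_like)


half_turn = phase_from_pair(0.0, -2.0)
quarter_turn = phase_from_pair(3.0, 0.0)
```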

How about the audio quality?

Hi, thanks for the implementation; the inference speed is impressive. How is the audio quality? And have you tried the v2 config? Thanks in advance.
