rishikksh20 / istftnet-pytorch
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
License: Apache License 2.0
Thanks for the implementation of iSTFTNet. It has better inference speed than HiFi-GAN V1. However, I found a single-frequency line in the output that causes a little noise. I trained on a 16 kHz dataset, and the line is always exactly at 4 kHz, which is the middle of the frequency range. I'm trying to fix this problem; do you have the same issue?
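A quick way to check whether such a line coincides with an iSTFT bin centre is to list the bin frequencies. This sketch assumes the generator's iSTFT uses n_fft = 16 (the model printout in another issue on this page shows conv_post with 18 output channels, which matches 2 × 9 bins, i.e. 9 magnitude plus 9 phase channels); the helper name is hypothetical.

```python
def stft_bin_frequencies(sampling_rate, n_fft):
    # A one-sided real STFT has n_fft // 2 + 1 bins; bin k is centred on
    # k * sampling_rate / n_fft Hz, running from DC up to the Nyquist frequency.
    return [k * sampling_rate / n_fft for k in range(n_fft // 2 + 1)]


freqs = stft_bin_frequencies(16000, 16)  # 0.0, 1000.0, ..., 8000.0 Hz
middle_bin = freqs.index(4000.0)         # bin 4, the middle of the 9 bins
```

Under that assumption, 4 kHz at a 16 kHz sampling rate is exactly the centre of bin 4, so the artifact may be worth inspecting per bin in the generator's magnitude and phase outputs.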
Hi @rishikksh20, thanks for your work.
I have a question: if I want to use a 16 kHz sampling rate, how should I modify the configuration file?
Presumably changing sampling_rate in the JSON alone is not enough.
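One constraint that must hold at any sampling rate is that the generator turns each mel frame into exactly hop_size samples, so the product of the transposed-convolution upsample rates and the iSTFT hop has to equal the mel hop. Below is a minimal sketch of a consistency check, assuming the HiFi-GAN-style keys used in config_v1.json (sampling_rate, hop_size, upsample_rates, gen_istft_hop_size, fmax); the function and the 16 kHz values in the usage note are hypothetical.

```python
from math import prod


def check_istftnet_config(h):
    """Hypothetical helper: list consistency problems in an iSTFTNet config
    dict (HiFi-GAN-style keys assumed). An empty list means none were found."""
    problems = []
    # Every mel frame must become exactly hop_size output samples: the
    # transposed-conv upsampling factors and the iSTFT hop multiply together.
    total = prod(h["upsample_rates"]) * h["gen_istft_hop_size"]
    if total != h["hop_size"]:
        problems.append(
            f"prod(upsample_rates) * gen_istft_hop_size = {total}, "
            f"but hop_size = {h['hop_size']}"
        )
    # The mel filterbank cannot extend beyond the Nyquist frequency.
    fmax = h.get("fmax")
    if fmax is not None and fmax > h["sampling_rate"] / 2:
        problems.append(f"fmax = {fmax} exceeds Nyquist = {h['sampling_rate'] / 2}")
    return problems
```

For example, a hypothetical 16 kHz setup with hop_size = 200 fails with upsample_rates [8, 8] and gen_istft_hop_size 4 (8 × 8 × 4 = 256), but passes with [10, 5] (10 × 5 × 4 = 200). By HiFi-GAN convention, upsample_kernel_sizes are usually kept at twice the corresponding rates, and the mel settings (n_fft, win_size, hop_size, fmin, fmax) must match whatever produces the input mel-spectrograms.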
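In stft.py, lines 164-165:
window_sum = window_sum.cuda() if magnitude.is_cuda else window_sum
inverse_transform[:, :, approx_nonzero_indices] /= window_sum[approx_nonzero_indices]
raise an error on multi-GPU runs, because inverse_transform may be on cuda:1 while window_sum.cuda() moves window_sum to the current CUDA device (cuda:0 here).
Changing line 164 to window_sum = window_sum.to(inverse_transform.device) fixes the problem. Note that device is an attribute, not a method: writing inverse_transform.device() raises a TypeError. The "if magnitude.is_cuda" check also becomes unnecessary, because .to() returns the tensor unchanged when it is already on the target device.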
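A minimal sketch of a device-agnostic version of that normalization step, assuming inverse_transform has shape (batch, 1, samples) and window_sum is a 1-D tensor built on the CPU; the function name and the boolean mask (instead of the original index tensor) are illustrative choices, not the repo's code.

```python
import torch


def normalize_overlap_add(inverse_transform, window_sum, tiny=1e-11):
    """Hypothetical helper: divide an overlap-added signal by the window
    sum-square envelope wherever the envelope is non-negligible.

    inverse_transform: (batch, 1, samples) tensor on any device.
    window_sum: (samples,) tensor, possibly still on the CPU.
    """
    # Follow the data's device (an attribute, not a method), so this works
    # on CPU, cuda:0 or cuda:1 without an is_cuda check.
    window_sum = window_sum.to(inverse_transform.device)
    nonzero = window_sum > tiny
    out = inverse_transform.clone()  # leave the caller's tensor untouched
    out[:, :, nonzero] = out[:, :, nonzero] / window_sum[nonzero]
    return out
```

Unlike the original in-place `/=`, this version returns a new tensor; either works as long as the window sum is on the same device as the signal.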
Has anyone tried modelling the complex spectrogram directly instead of the phase and magnitude? What problems would arise if we predicted the real and imaginary parts directly?
My command to run:
python3 train.py --config config_v1.json --input_wavs_dir /home/yehor/iSTFTNet-pytorch/lada_wavs --input_training_file /home/yehor/iSTFTNet-pytorch/training_list.txt --input_validation_file /home/yehor/iSTFTNet-pytorch/validation_list.txt
Error:
... (2): Conv1d(128, 128, kernel_size=(11,), stride=(1,), padding=(5,))
)
)
)
(conv_post): Conv1d(128, 18, kernel_size=(7,), stride=(1,), padding=(3,))
(reflection_pad): ReflectionPad1d((1, 0))
)
checkpoints directory : cp_hifigan
Epoch: 1
/home/yehor/.local/lib/python3.8/site-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Traceback (most recent call last):
File "train.py", line 280, in <module>
main()
File "train.py", line 276, in main
train(0, a, h)
File "train.py", line 126, in train
y_g_hat = stft.inverse(spec, phase)
File "/home/yehor/iSTFTNet-pytorch/stft.py", line 198, in inverse
inverse_transform = torch.istft(
RuntimeError: istft input and window must be on the same device but got self on cuda:0 and window on cpu
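One way to avoid this error is to make the analysis window follow the model. The sketch below assumes a hypothetical TorchISTFT module (not the repo's class) with n_fft = 16 and hop = 4: registering the window with register_buffer means module.to(device) or module.cuda() moves it along with the weights, and the window is also aligned with the input's device in forward as a fallback. persistent=False keeps the buffer out of the state_dict, so existing checkpoints still load.

```python
import torch


class TorchISTFT(torch.nn.Module):
    """Hypothetical minimal iSTFT head: magnitude + phase -> waveform."""

    def __init__(self, n_fft=16, hop_length=4, win_length=16):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.win_length = win_length
        # A buffer follows module.to(device) / module.cuda(), unlike a plain
        # tensor attribute; persistent=False keeps it out of the state_dict.
        self.register_buffer("window", torch.hann_window(win_length), persistent=False)

    def forward(self, magnitude, phase):
        # magnitude, phase: (batch, n_fft // 2 + 1, frames), phase in radians.
        spec = torch.polar(magnitude, phase)  # magnitude * exp(1j * phase)
        # Fallback: align the window with the input even if this module was
        # not moved together with the rest of the model.
        window = self.window.to(spec.device)
        return torch.istft(spec, n_fft=self.n_fft, hop_length=self.hop_length,
                           win_length=self.win_length, window=window)
```

With the default center=True, 10 input frames give hop × (frames − 1) = 36 output samples per batch item.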
I printed window_sum in stft.py (line 155) and found that its value is constant except at the padded positions at the beginning and end, so the window normalization only acts as a linear scaling. Is this the expected windowing behaviour?
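A constant interior is what one would expect when the squared window overlap-adds to a constant. The sketch below reproduces the envelope in plain Python, assuming a periodic Hann window with n_fft = 16 and hop = 4 (values assumed, not read from the config); the helper names are hypothetical.

```python
import math


def periodic_hann(n_fft):
    # Periodic Hann window (the "fftbins=True" variant used for STFT analysis).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def window_sumsquare(n_frames, n_fft=16, hop=4):
    # Overlap-add the squared window once per frame, as the iSTFT
    # normalization does, and return the per-sample envelope.
    win_sq = [w * w for w in periodic_hann(n_fft)]
    total = [0.0] * (n_fft + hop * (n_frames - 1))
    for frame in range(n_frames):
        start = frame * hop
        for n, v in enumerate(win_sq):
            total[start + n] += v
    return total


env = window_sumsquare(n_frames=10)
interior = env[12:40]  # samples covered by all 4 overlapping frames
```

Every interior sample equals 1.5 (four overlapping frames × 3/8, the mean of the squared Hann window), while the first and last n_fft − hop samples are smaller. So the division is a uniform scaling in the middle and only changes the shape of the signal at the edges.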
Have you got good audio?
Hello, thank you very much for this repo.
Could you please provide pre-trained models for testing?
The phase output of the generator currently ranges only from -1 to 1, which is not enough, since stft.inverse() expects the full phase in radians (either 0..2*pi or -pi..pi).
The paper mentions somewhat cryptically that "we apply a sine activation function to represent the periodic characteristics of the phase spectrogram", but in any case the current implementation is faulty because it cannot represent the full range of possible phases.
Line 118 in ecbf0f6
As a suggestion, either try scaling the output by 2*pi, or predict sin(phase) and cos(phase) directly in the generator (the predicted pair can be normalized by dividing both values by sqrt(sin(phase)**2 + cos(phase)**2)).
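Both suggestions can be sketched in plain Python for clarity; the function names are hypothetical, and in the model the same logic would apply element-wise to tensors (for instance with torch.sin and torch.atan2). Note that scaling by pi is already enough to cover (-pi, pi], and atan2 is unaffected by a positive rescaling of the (sin, cos) pair, so the normalization mainly matters if the pair is used directly.

```python
import math


def phase_from_scaled_sine(x, scale=math.pi):
    # Sine activation scaled so the output spans [-scale, scale]; with
    # scale=math.pi it covers the full (-pi, pi] range (scale=1 does not).
    return scale * math.sin(x)


def phase_from_sin_cos(s, c, eps=1e-8):
    # Two-output parameterisation: the network predicts an unnormalised
    # (sin, cos) pair. Dividing by the Euclidean norm puts it on the unit
    # circle; atan2 then returns an angle in (-pi, pi].
    norm = max(math.hypot(s, c), eps)
    return math.atan2(s / norm, c / norm)
```

With scale=1.0 the largest reachable phase is 1 radian, which illustrates the limitation described above.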
Hi, thanks for the implementation; the inference speed is impressive. How is the audio quality? And have you tried the v2 config? Thanks in advance.