felixkreuk / hideandspeak

Python 100.00%

hideandspeak's Issues

pretrained model

I'm interested in your work and would like to follow it. If you could provide pretrained models and audio samples (with a working link), I could test them directly. Thanks.

about YOHO dataset

Hello, I'm a graduate student from China. I'm interested in your paper and would like to try to reproduce it, but after searching online for a long time I can't find the YOHO dataset. Could you provide it to me? I promise to use it for private use only. Thank you very much!

Inquiry about Message Retrieval in Time Domain and SNR Calculation

Hi Felix,

I hope this message finds you well. I came across your repository and found your paper implementation to be very interesting. I'm quite intrigued by your research and would like to follow your work more closely. In this regard, I have a question regarding the calculation of the Signal-to-Noise Ratio (SNR) mentioned in the paper.

While examining your code, I noticed that it provides two methods for waveform recovery: one using the original phase and another using the Griffin-Lim algorithm. However, the paper does not explicitly state how the message is retrieved from the spectrogram and converted back to a time-domain waveform. I ran some experiments on the VCTK dataset with the default settings. Recovering the message waveform with the original phase gave a time-domain SNR higher than the one reported in the paper (14.34 vs 8.76). Recovering it with the Griffin-Lim algorithm gave a time-domain SNR of -2.53. Although the message remains intelligible in both cases, both results differ significantly from the value reported in the paper (8.76).

I would greatly appreciate it if you could clarify how the message is converted back to a time-domain waveform and how the SNR is calculated in the paper. Any additional insights or guidance would be very valuable to me.

Thank you for your time and consideration. I look forward to your response.
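
For reference, here is a minimal sketch of one common time-domain SNR definition (reference energy divided by the energy of the residual, in dB). It is an assumption that this matches what is meant above; the thread does not state which definition the paper uses. The function name `snr_db` and the toy sine signals are hypothetical and are not taken from the repository. The second comparison shows that a waveform with the same magnitude but a different phase can score a negative time-domain SNR, which may be relevant when comparing original-phase recovery against Griffin-Lim.

```python
import math


def snr_db(reference, estimate):
    """Time-domain SNR in dB: reference energy over residual energy."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal is silent")
    if noise_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy signals (hypothetical): a 100 Hz tone sampled at 16 kHz, 10 full periods.
sample_rate = 16000
freq = 100.0
n = 1600
reference = [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]

# Same phase, 10% amplitude error.
scaled = [1.1 * r for r in reference]

# Same amplitude, phase shifted by 90 degrees.
shifted = [math.cos(2 * math.pi * freq * t / sample_rate) for t in range(n)]

print(snr_db(reference, scaled))   # about 20 dB
print(snr_db(reference, shifted))  # about -3 dB
```

In this toy case the phase-shifted tone would sound identical to the reference, yet its sample-wise SNR is negative, so time-domain SNR values are only comparable when the phase handling is the same.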

STFT and exploding training

Hi Felix,

First of all, thanks for sharing this repo; it's amazingly interesting work. Since the datasets you used are, to my knowledge, paid and only available at 16k and 8k, respectively, I wanted to train your network on the VCTK dataset. For training, I chose only short files with high activity from the trimmed version of the dataset, to avoid long sections of silence. However, when I do so, training goes well for a few epochs and the message becomes intelligible very quickly, but then the loss (I am using L1) explodes to a value orders of magnitude higher, as shown here. Have you observed this pattern in your training runs?

As a second question, I wanted to know why you used your own STFT implementation as opposed to the torchaudio ones, which allow backprop.

Thanks!
