
hideandspeak's People

Contributors

felixkreuk


hideandspeak's Issues

Pretrained model

I'm interested in your work and would like to follow it. If you could provide pretrained models and audio samples (with a valid link), I could test your samples directly. Thanks.

About the YOHO dataset

Hello, I'm a graduate student from China. I'm interested in your paper and would like to try to reproduce it, but after searching online for a long time I still can't find the YOHO dataset. Could you provide me with this dataset? I promise to use it for private purposes only. Thank you very much!

Inquiry about Message Retrieval in Time Domain and SNR Calculation

Hi Felix,

I hope this message finds you well. I came across your repository and found your paper implementation to be very interesting. I'm quite intrigued by your research and would like to follow your work more closely. In this regard, I have a question regarding the calculation of the Signal-to-Noise Ratio (SNR) mentioned in the paper.

While examining your code, I noticed that it provides two methods for waveform recovery: one using the original phase and another using the Griffin-Lim algorithm. However, the paper itself does not explicitly say how the message is retrieved from the spectrogram and converted back to a time-domain waveform. I ran some experiments on the VCTK dataset with the default settings and found that recovering the message waveform with the original phase gave a time-domain SNR better than the one reported in the paper (14.34 vs. 8.76). When I used the Griffin-Lim algorithm instead, the time-domain SNR was -2.53. Although the message remains intelligible, both results differ significantly from the value reported in the paper (8.76).

I would greatly appreciate it if you could clarify how the message is converted back to a time-domain waveform and how the SNR is calculated in your paper. Any additional insights or guidance would be highly valuable to me.

Thank you for your time and consideration. I look forward to your response.
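
To make the question concrete, here is a minimal sketch of one common time-domain SNR definition, which treats the residual between the reference and the recovered waveform as noise. Whether the paper uses this exact definition is an assumption, and the function name `snr_db` and the sine test signals are hypothetical, chosen only for illustration. The sketch also shows why a phase mismatch alone (as can happen with Griffin-Lim reconstruction) can push the time-domain SNR below 0 dB even when the magnitude content is unchanged.

```python
import math


def snr_db(reference, estimate):
    """Time-domain SNR in dB, treating (reference - estimate) as noise.

    Assumed definition: 10 * log10(sum(s^2) / sum((s - s_hat)^2)).
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)


# Hypothetical test signal: 10 full periods of a sine over 1600 samples.
n = 1600
reference = [math.sin(2 * math.pi * 10 * i / n) for i in range(n)]

# Same phase, slightly attenuated amplitude: small residual, high SNR.
attenuated = [0.9 * r for r in reference]

# Same amplitude, phase shifted by 90 degrees: large residual, negative SNR.
shifted = [math.sin(2 * math.pi * 10 * i / n + math.pi / 2) for i in range(n)]

print(round(snr_db(reference, attenuated), 2))  # roughly 20 dB
print(round(snr_db(reference, shifted), 2))     # roughly -3 dB
```

Under this definition, the phase-shifted copy scores about -3 dB despite sounding identical to the reference, so a metric computed this way penalizes any reconstruction that does not preserve the original phase.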

STFT and exploding training

Hi Felix,

First of all, thanks for sharing this repo; it's amazingly interesting work. Since the datasets you used are, to my knowledge, paid and only at 16k and 8k, respectively, I wanted to train your network on the VCTK dataset. For training, I chose only short files with high speech activity from the trimmed version of the dataset, to avoid long sections of silence. With this setup, training goes well for a few epochs and the message becomes intelligible very quickly, but then the loss (I am using L1) explodes to a value orders of magnitude higher, as shown here. Have you observed this pattern during your training runs?

As a second question, I wanted to know why you used your own STFT implementation rather than the torchaudio ones, which allow backprop.

Thanks!
