Hi,
Do you have links to the datasets you use? I am new to these datasets, but the paper is very interesting and I want to reproduce the results. However, the datasets are not easy to find, and it is unclear how you convert the downloaded data into the formatted dataset. Although the dataset structure is described in the README, it is still not clear to me what a formatted dataset should look like. Would you mind elaborating on that a little, for example by adding a sample dataset to the repo?
Thanks!
Thanks for your excellent code. I am trying to reproduce your results, but training gets stuck while loading an item into the cache. Below is the running output log:
Do you have any idea what causes this? The program hangs on this line: self.update_next_cache_item(self.communication_queue.get())
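One way to narrow this down, sketched under the assumption that communication_queue behaves like a standard queue.Queue (the actual class in the repo may differ): a blocking get() with no timeout waits forever if the worker that fills the queue has died or never produces an item. Passing a timeout turns the silent hang into something visible. The helper name get_with_timeout is hypothetical and not part of the repository.

```python
import queue


def get_with_timeout(q, timeout_s=5.0):
    """Return the next item, or None if nothing arrives within timeout_s.

    Hypothetical debugging helper; not part of the repository.
    """
    try:
        # Queue.get blocks until an item is available or the timeout expires.
        return q.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing was produced in time: the producer is likely stalled or dead.
        return None


if __name__ == "__main__":
    q = queue.Queue()
    q.put("item")
    print(get_with_timeout(q, timeout_s=0.1))  # an item is available
    print(get_with_timeout(q, timeout_s=0.1))  # queue is empty, so None
```

If the call returns None every time, the problem is probably on the producer side (for example, a worker that crashed while reading a dataset file) rather than in the consuming line itself.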
I know this is a bit late, but it would be nice if a pretrained model were available for download, both to easily recreate the original results and to use on custom audio.
In lieu of that, I'm trying to recreate the experiment, but I'm having some difficulty. Although the readme helpfully explains what to do, I'm not sure if I can obtain the same datasets. iKala is apparently no longer available at all, and MedleyDB is only available on request. I guess I'll try training using only the other two...
As far as I know, the result of log1p(x) can be negative. You use this function to 'normalize' the spectrograms of target accompaniment and vocals and then use the difference between network outputs and these spectrograms in your loss function. However, network outputs after ReLU can't be negative.
I have read the paper and I realize that it must work, so what am I missing? Please help me.
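A quick numerical check may help here. It assumes the spectrograms are magnitude spectrograms, so every entry satisfies x >= 0; that is an assumption about the preprocessing, not something confirmed in this thread. Under that assumption, log1p(x) = log(1 + x) >= log(1) = 0, so the targets are never negative and a ReLU output can match them. log1p only becomes negative for inputs in (-1, 0).

```python
import math

# Assumed example magnitudes (non-negative), not values from the real dataset.
magnitudes = [0.0, 0.001, 0.5, 1.0, 100.0]
normalized = [math.log1p(x) for x in magnitudes]

# log1p(x) = log(1 + x) is >= 0 whenever x >= 0.
all_non_negative = all(v >= 0.0 for v in normalized)

# Only inputs in (-1, 0) give a negative result, e.g. x = -0.5.
negative_case = math.log1p(-0.5)

print(all_non_negative)
print(negative_case < 0.0)
```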
Before I try it myself, I wanted to ask whether you tried training the network from scratch with fully adversarial training, without finetuning. Is that too hard to train? Did you try any other conditional GAN flavors?