Input: "Any text prompt"
Duration: I tried 40, 45, and 70
Result: AudioLDM gives me …

I tried generating audio with different prompts using transfer mode, but the output audio seems to be empty (audioldm, CLOSED)

haoheliu commented on July 19, 2024

I tried generating audio with different prompts using transfer mode, but the output audio seems to be empty.

from audioldm.

Comments (30)

tonymacx86PRO commented on July 19, 2024

I will read the paper to understand how to prompt this model.
And by the way, it's still a good model, because there really are no other models like it.
Keep it up, man!

haoheliu commented on July 19, 2024

@tonymacx86PRO Should work now. But a shorter duration may still yield better results.

haoheliu commented on July 19, 2024

@tonymacx86PRO Hi, could you please share the Linux command you used when you hit this issue?

I just tested with the following command and the result looks fine to me:

audioldm --mode "transfer" --file_path trumpet.wav -t "Children Singing"

tonymacx86PRO commented on July 19, 2024

audioldm --mode "transfer" -f mpmn.wav -t "Acapella" -dur 20
audioldm --mode "transfer" -f mpmn.wav -t "Drum kit and flute" -dur 10
audioldm --mode "transfer" -f mpmn.wav -t "Drum kit and flute" -dur 40
audioldm --mode "transfer" -f mpmn.wav -t "Guitar" -dur 40

haoheliu commented on July 19, 2024

@tonymacx86PRO how long is the audio file mpmn.wav?

tonymacx86PRO commented on July 19, 2024

00:01:09
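
To check a file's length (as asked above for mpmn.wav) without opening an audio editor, here is a minimal sketch using Python's standard `wave` module. It assumes an uncompressed PCM WAV, since `wave` cannot read compressed or floating-point WAV files; `wav_info` and `format_hms` are hypothetical helpers, not part of audioldm.

```python
import wave


def wav_info(path: str) -> dict:
    """Read basic properties of a PCM WAV file (hypothetical helper, stdlib only)."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": rate,
            "duration_s": wf.getnframes() / rate,
        }


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS, rounded to the nearest second."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

For example, `format_hms(wav_info("mpmn.wav")["duration_s"])` returns "00:01:09" for a 69-second file like the one reported here.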

tonymacx86PRO commented on July 19, 2024

It's also very strange: sometimes I get an empty generation without any CUDA errors, but when I tried 70 seconds again, it said I didn't have 3 GB of video memory available.
I have Windows 11 with Anaconda and an RTX 3060 12GB.

haoheliu commented on July 19, 2024

OK thanks for reporting this. I'll look into it now.

tonymacx86PRO commented on July 19, 2024

tonymacx86PRO commented on July 19, 2024

I'm willing to be patient; I'd like to turn my friend's music into an a cappella version, for example.

haoheliu commented on July 19, 2024

@tonymacx86PRO AudioLDM is still a very early version of text-to-audio, and there is a lot left to improve. So it may be better to lower your expectations a bit. :)

tonymacx86PRO commented on July 19, 2024

But it's wild to me that it basically says, "Well, here's your a cappella" and gives me an empty file.
It's funny, but OK.
On the other hand, when I gave it a recording of a gun, for example, it made a similar sound, and that's cool.
But transfer is broken for some reason.
I'm waiting for a fix :D

haoheliu commented on July 19, 2024

@tonymacx86PRO I've spotted some problems in the code, but haven't found the root cause yet. For now, you can safely use AudioLDM with a duration shorter than 20 seconds. Please reinstall the latest version of audioldm via pip and try again. Thanks.
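
Following the advice to stay under 20 seconds, one possible workaround is to trim the input file itself before running transfer mode. This is a minimal sketch under assumptions: the input is a PCM WAV readable by Python's `wave` module, and `trim_wav` is a hypothetical helper. The thread only recommends a shorter `-dur`; whether pre-trimming the input changes AudioLDM's output is not confirmed here.

```python
import wave


def trim_wav(src: str, dst: str, max_seconds: float = 20.0) -> float:
    """Copy at most the first `max_seconds` of a PCM WAV file from `src` to `dst`.

    Hypothetical helper. Returns the duration in seconds of the written file.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        n_keep = min(reader.getnframes(), int(max_seconds * params.framerate))
        frames = reader.readframes(n_keep)
    with wave.open(dst, "wb") as writer:
        # setparams() copies channels, sample width and rate; the frame count
        # stored in the header is patched automatically to match what is written.
        writer.setparams(params)
        writer.writeframes(frames)
    return n_keep / params.framerate
```

For example, `trim_wav("mpmn.wav", "mpmn_20s.wav")` writes the first 20 seconds to a new file, which could then be passed with `-f mpmn_20s.wav`.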

haoheliu commented on July 19, 2024

BTW, the latest version is 0.0.17.

tonymacx86PRO commented on July 19, 2024

Ran git pull (in the repo) and pip install --upgrade audioldm
Console input and output

(audioldm) C:\Users\Coder\Documents\AI\AudioLDM>audioldm -f mpmn.wav --mode "transfer" -t "Acapella" -dur 20
DiffusionWrapper has 185.04 M params.
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torchlibrosa\stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = librosa.util.pad_center(fft_window, n_fft)
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:42: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = pad_center(fft_window, filter_length)
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:145: FutureWarning: Pass sr=16000, n_fft=1024, n_mels=64, fmin=0, fmax=8000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel_basis = librosa_mel_fn(
Decoding image: 100%|████████████████████████████████████████████████████████████████| 100/100 [00:10<00:00,  9.14it/s]
Save audio to ./output\transfer\mpmn\01_03_2023_15_05_16_Acapella_0.wav

Another empty file
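
To tell whether a generated file is truly empty (all-zero samples) or merely very quiet, checking the peak sample value is enough. This sketch assumes the output is a 16-bit PCM WAV, which the thread does not state; `peak_amplitude`, `is_effectively_silent`, and the threshold value are hypothetical choices.

```python
import sys
import wave
from array import array


def peak_amplitude(path: str) -> float:
    """Return the peak absolute sample of a 16-bit PCM WAV, scaled to [0, 1]."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        samples = array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0


def is_effectively_silent(path: str, threshold: float = 1e-4) -> bool:
    """True if no sample exceeds `threshold` (a hypothetical cut-off, about 3 LSB)."""
    return peak_amplitude(path) <= threshold
```

A result of exactly 0.0 means every sample in the file is zero, rather than the audio just being very soft.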

tonymacx86PRO commented on July 19, 2024

Oh, maybe I should choose 17.5 seconds.
P.S.: It doesn't work.
Console input and output

(audioldm) C:\Users\Coder\Documents\AI\AudioLDM>audioldm -f mpmn.wav --mode "transfer" -t "Acapella" -dur 17.5
DiffusionWrapper has 185.04 M params.
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torchlibrosa\stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = librosa.util.pad_center(fft_window, n_fft)
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:42: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = pad_center(fft_window, filter_length)
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:145: FutureWarning: Pass sr=16000, n_fft=1024, n_mels=64, fmin=0, fmax=8000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel_basis = librosa_mel_fn(
Decoding image: 100%|████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.94it/s]
Save audio to ./output\transfer\mpmn\01_03_2023_15_11_15_Acapella_0.wav

haoheliu commented on July 19, 2024

@tonymacx86PRO Oh, that's bad. Sorry about that. I'll keep looking into it. Also, could you please share your audio file so that I can test it on my side?

haoheliu commented on July 19, 2024

Looks like there are some numerical problems in the VAE decoder that lead to NaN output. I'll try to solve it today.
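
NaN samples coming out of the decoder are presumably why the saved files looked empty instead of the run failing. A minimal sketch of a guard, assuming the decoded waveform is available as a NumPy array before it is written to disk; `check_waveform` is a hypothetical name, and where such a check would go inside audioldm is not shown in this thread.

```python
import numpy as np


def check_waveform(waveform: np.ndarray) -> np.ndarray:
    """Fail loudly if a decoded waveform contains NaN or infinite samples.

    Hypothetical guard: call it before saving, instead of writing a broken file.
    """
    bad = ~np.isfinite(waveform)
    if bad.any():
        raise FloatingPointError(
            f"{int(bad.sum())} of {waveform.size} samples are NaN or Inf"
        )
    return waveform
```

The sketch raises rather than replacing bad values with zeros (for example via `np.nan_to_num`), because zero-filling would just reproduce the silent-file symptom and hide the numerical problem.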

tonymacx86PRO commented on July 19, 2024

I'm sending it as a zip because I can't attach a WAV file:
01_03_2023_15_11_15_Acapella_0.zip

haoheliu commented on July 19, 2024

I am sended in zip because i can't send wav 01_03_2023_15_11_15_Acapella_0.zip

Hi, I meant sharing mpmn.wav.

tonymacx86PRO commented on July 19, 2024

First of all, the file I'm sending is named DK_-_MP_Main_Menu because a friend asked me to keep that name, but it's the same file as mpmn.wav.
The original file was an OGG, and I fed the version converted to WAV to the neural network. I don't know whether the converter could be causing a problem, but here are the files:
Original (*.ogg) and Converted (*.wav)

haoheliu commented on July 19, 2024

First of all, I sent a file named DK_-_MP_Main_Menu right now because a friend asked me to leave it, but it's an mpmn.wav file And the original file was in ogg and I threw the converted file to wav to the neural network, I don't know if there may be a converter problem, but here are the files. Original (.ogg) and Converted (.wav)

Thanks I'll look into it.

tonymacx86PRO commented on July 19, 2024

How is it going?

tonymacx86PRO commented on July 19, 2024

What audio formats does the CLI support?

haoheliu commented on July 19, 2024

.wav files will work.
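
Since .wav is the supported input and the file in this thread was converted from OGG, it can help to confirm that a file really is a RIFF/WAVE container and not, say, an OGG file (which starts with "OggS") that was only renamed. This is a cheap signature check on the first 12 bytes; `looks_like_wav` is a hypothetical helper, it does not validate the audio data itself, and it does not accept WAV variants such as RF64.

```python
def looks_like_wav(path: str) -> bool:
    """Return True if the file starts with a RIFF/WAVE header signature."""
    with open(path, "rb") as f:
        header = f.read(12)
    # Bytes 0-3: "RIFF", bytes 4-7: chunk size, bytes 8-11: "WAVE".
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```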

tonymacx86PRO commented on July 19, 2024

I will check out your fix.
Regarding ".wav file will work": OK.

tonymacx86PRO commented on July 19, 2024

20 seconds works, but the prompt "Acapella" doesn't give the best result: my music doesn't come out as a cappella. I don't know, I'll try "man voice".

haoheliu commented on July 19, 2024

Yes, the model does not guarantee good results. It might need some tuning of transfer_strength, seed, text, etc.
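
The tuning suggested above can be scripted as a small sweep. This sketch only builds argument lists for the `audioldm` CLI: `--mode`, `-f`, `-t`, and `-dur` appear in this thread, while `--transfer_strength` and `--seed` are assumed flag names that should be confirmed with `audioldm --help`. `build_transfer_commands` is a hypothetical helper.

```python
import itertools


def build_transfer_commands(input_wav, prompts, strengths, seeds, duration=10.0):
    """Return one audioldm transfer-mode argument list per (prompt, strength, seed)."""
    if duration > 20:
        # The maintainer suggested short durations; 20 s was reported working after the fix.
        raise ValueError("durations above 20 s are not recommended in this thread")
    commands = []
    for prompt, strength, seed in itertools.product(prompts, strengths, seeds):
        commands.append([
            "audioldm",
            "--mode", "transfer",
            "-f", input_wav,
            "-t", prompt,
            "-dur", str(duration),
            # ASSUMPTION: these two flag names; verify with `audioldm --help`.
            "--transfer_strength", str(strength),
            "--seed", str(seed),
        ])
    return commands
```

Each list can be printed with `shlex.join(cmd)` or run with `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting issues with multi-word prompts such as "Drum kit and flute".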

tonymacx86PRO commented on July 19, 2024

With "man voice", the result is very similar.

tonymacx86PRO commented on July 19, 2024

I'm going to get some rest. Well, I think it's time to close this issue.
