Comments (30)

tonymacx86PRO commented on July 19, 2024

I will read the paper to understand how prompting works for this model.
And by the way, it is still a good model, because there are actually no other models like it.
Keep working, man!

from audioldm.

haoheliu commented on July 19, 2024

@tonymacx86PRO It should work now, but shorter durations may still yield better results.

haoheliu commented on July 19, 2024

@tonymacx86PRO Hi, could you please share the command line you used when you hit this issue?

I just tested with the following command and the result looks fine to me:

audioldm --mode "transfer" --file_path trumpet.wav -t "Children Singing" 

tonymacx86PRO commented on July 19, 2024

audioldm --mode "transfer" -f mpmn.wav -t "Acapella" -dur 20
audioldm --mode "transfer" -f mpmn.wav -t "Drum kit and flute" -dur 10
audioldm --mode "transfer" -f mpmn.wav -t "Drum kit and flute" -dur 40
audioldm --mode "transfer" -f mpmn.wav -t "Guitar" -dur 40

haoheliu commented on July 19, 2024

@tonymacx86PRO How long is the audio file mpmn.wav?

tonymacx86PRO commented on July 19, 2024

00:01:09

tonymacx86PRO commented on July 19, 2024

It's also very strange that sometimes I get an empty output without any CUDA errors, but when I tried 70 seconds again, it said I don't have 3 GB of video memory.
I have Windows 11 with Anaconda and an RTX 3060 12GB.

haoheliu commented on July 19, 2024

OK, thanks for reporting this. I'll look into it now.

tonymacx86PRO commented on July 19, 2024

Ok

tonymacx86PRO commented on July 19, 2024

I'm willing to be patient. I want to turn my friend's music into an a cappella version, for example.

haoheliu commented on July 19, 2024

@tonymacx86PRO AudioLDM is still quite an early version of text-to-audio. There is a lot left to improve, so it may be better to lower your expectations a bit. :)

tonymacx86PRO commented on July 19, 2024

But I'm honestly baffled that it goes, well, here's an a cappella for you: (gives an empty file)
It's funny, but OK.
But when I, for example, gave it an audio recording of a gun, it produced a similar sound, and that's cool.
But the transfer is broken for some reason.
I'm waiting for a fix :D

haoheliu commented on July 19, 2024

@tonymacx86PRO I've spotted some problems in the code, but haven't found the root cause yet. For now, you can safely play with AudioLDM using a duration shorter than 20 seconds. Please reinstall the latest version of audioldm via pip and try again. Thanks.

haoheliu commented on July 19, 2024

BTW, the latest version is 0.0.17.

tonymacx86PRO commented on July 19, 2024

git pull (repo) & pip install --upgrade audioldm
Console input and output

(audioldm) C:\Users\Coder\Documents\AI\AudioLDM>audioldm -f mpmn.wav --mode "transfer" -t "Acapella" -dur 20
DiffusionWrapper has 185.04 M params.
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torchlibrosa\stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = librosa.util.pad_center(fft_window, n_fft)
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:42: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = pad_center(fft_window, filter_length)
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:145: FutureWarning: Pass sr=16000, n_fft=1024, n_mels=64, fmin=0, fmax=8000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel_basis = librosa_mel_fn(
Decoding image: 100%|████████████████████████████████████████████████████████████████| 100/100 [00:10<00:00,  9.14it/s]
Save audio to ./output\transfer\mpmn\01_03_2023_15_05_16_Acapella_0.wav

Another empty file

tonymacx86PRO commented on July 19, 2024

Oh, maybe I should try 17.5 seconds.
P.S.: It doesn't work.
Console input and output

(audioldm) C:\Users\Coder\Documents\AI\AudioLDM>audioldm -f mpmn.wav --mode "transfer" -t "Acapella" -dur 17.5
DiffusionWrapper has 185.04 M params.
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torchlibrosa\stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = librosa.util.pad_center(fft_window, n_fft)
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:42: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = pad_center(fft_window, filter_length)
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:145: FutureWarning: Pass sr=16000, n_fft=1024, n_mels=64, fmin=0, fmax=8000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel_basis = librosa_mel_fn(
Decoding image: 100%|████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.94it/s]
Save audio to ./output\transfer\mpmn\01_03_2023_15_11_15_Acapella_0.wav

haoheliu commented on July 19, 2024

@tonymacx86PRO Oh, that's bad. Sorry about that. I'll keep looking into it. Also, could you please share your audio file so that I can test on my side?

haoheliu commented on July 19, 2024

Looks like there are some numerical problems in the VAE decoder that lead to NaN output. I will try to solve it today.
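
A symptom like this can be caught before an empty file is written by checking the decoded waveform for non-finite values. The following is only a minimal sketch, not the fix that went into audioldm: `describe_waveform` is a hypothetical helper, and the waveform is assumed to be available as a flat sequence of floats.

```python
import math
from typing import Iterable


def describe_waveform(samples: Iterable[float]) -> str:
    """Classify a decoded waveform as 'non_finite', 'silent', or 'ok'.

    Hypothetical helper for illustration; not part of audioldm.
    """
    peak = 0.0
    for value in samples:
        # NaN or +/-inf anywhere means the decoder output is unusable.
        if not math.isfinite(value):
            return "non_finite"
        peak = max(peak, abs(value))
    # An all-zero (or empty) waveform plays back as an empty file.
    return "silent" if peak == 0.0 else "ok"
```

Calling such a check right after decoding would separate "the decoder produced NaN" from "the decoder produced silence", which are two different bugs that both look like an empty file.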

tonymacx86PRO commented on July 19, 2024

I sent it as a zip because I can't attach a wav file.
01_03_2023_15_11_15_Acapella_0.zip

haoheliu commented on July 19, 2024

> I sent it as a zip because I can't attach a wav file. 01_03_2023_15_11_15_Acapella_0.zip

Hi, I meant sharing the mpmn.wav file.

tonymacx86PRO commented on July 19, 2024

First of all, the file I'm sending now is named DK_-_MP_Main_Menu because a friend asked me to keep that name, but it is the mpmn.wav file.
The original file was in ogg format, and I gave the neural network a version converted to wav. I don't know whether the converter could be the problem, but here are both files.
Original (*.ogg) and Converted (*.wav)
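
To rule out a converter problem, the header of the converted file can be inspected with Python's standard `wave` module. This is a rough sketch, assuming the converter produced an uncompressed PCM WAV; `wav_info` is a hypothetical helper name, and `wave.open` raises `wave.Error` when it cannot parse the file.

```python
import wave


def wav_info(path: str) -> dict:
    """Return basic header fields of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            # Duration follows from frame count divided by sample rate.
            "duration_seconds": frames / rate,
        }
```

For the file in this thread, `wav_info("mpmn.wav")` should report a duration of roughly 69 seconds if the conversion kept the whole track.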

haoheliu commented on July 19, 2024

> First of all, the file I'm sending now is named DK_-_MP_Main_Menu because a friend asked me to keep that name, but it is the mpmn.wav file. The original file was in ogg format, and I gave the neural network a version converted to wav. I don't know whether the converter could be the problem, but here are both files. Original (.ogg) and Converted (.wav)

Thanks, I'll look into it.

tonymacx86PRO commented on July 19, 2024

How is it going?

tonymacx86PRO commented on July 19, 2024

What audio formats does the CLI support?

haoheliu commented on July 19, 2024

A .wav file will work.

tonymacx86PRO commented on July 19, 2024

I will check out your fix
Reply: .wav file will work; Ok

tonymacx86PRO commented on July 19, 2024

20 seconds works now, but the prompt "Acapella" doesn't give great results. My music doesn't come out sounding a cappella. I don't know, I'll try "Man voice" instead.

haoheliu commented on July 19, 2024

Yes, the model does not guarantee good results. It might need some tuning on transfer_strength, seed, text, etc.
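
One way to do that tuning systematically is to generate one command per combination of transfer strength and seed. The sketch below only builds the argument lists and does not run anything. `transfer_commands` is a hypothetical helper, and the `--transfer_strength` and `--seed` flag spellings are assumptions that should be checked against `audioldm -h`; the other flags are the ones used earlier in this thread.

```python
import itertools


def transfer_commands(wav_path, text, strengths, seeds, duration=10):
    """Build one audioldm argument list per (transfer_strength, seed) pair."""
    commands = []
    for strength, seed in itertools.product(strengths, seeds):
        commands.append([
            "audioldm", "--mode", "transfer",
            "-f", wav_path, "-t", text, "-dur", str(duration),
            # Assumed flag spellings; verify with `audioldm -h`.
            "--transfer_strength", str(strength),
            "--seed", str(seed),
        ])
    return commands
```

Each list can then be passed to `subprocess.run(args, check=True)`, which avoids shell quoting differences between Windows and Linux. The default duration of 10 seconds is an arbitrary choice below the 20-second limit mentioned above.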

tonymacx86PRO commented on July 19, 2024

The "Man voice" result is very similar.

tonymacx86PRO commented on July 19, 2024

I'm going to get some rest. I think it's time to close this issue.
