Comments (30)
I'll read the paper to understand how to prompt this model.
And by the way, it's still a good model, because there are actually no other models like it.
Keep working, man!
from audioldm.
@tonymacx86PRO It should work now, though a shorter duration may still yield better results.
@tonymacx86PRO Hi, could you please provide the Linux command you used when you hit that issue?
I just tested with the following command and the result looks fine to me:
audioldm --mode "transfer" --file_path trumpet.wav -t "Children Singing"
audioldm --mode "transfer" -f mpmn.wav -t "Acapella" -dur 20
audioldm --mode "transfer" -f mpmn.wav -t "Drum kit and flute" -dur 10
audioldm --mode "transfer" -f mpmn.wav -t "Drum kit and flute" -dur 40
audioldm --mode "transfer" -f mpmn.wav -t "Guitar" -dur 40
@tonymacx86PRO How long is the audio file mpmn.wav?
00:01:09
It's also very strange: sometimes I get an empty generation with no CUDA errors, but when I tried 70 seconds again, it said I don't have 3 GB of video memory.
I have Windows 11 with Anaconda and an RTX 3060 12GB.
OK, thanks for reporting this. I'll look into it now.
Ok
I'm willing to be patient if it lets me turn my friend's music into an a cappella, for example.
@tonymacx86PRO AudioLDM is still quite an early version of text-to-audio. There is a lot left to improve, so it may be better to lower your expectations a bit. :)
But I'm generally freaked out that it's like, "well, here's an a cappella for you" (and gives me an empty file).
It's funny, but OK.
When I gave it an audio recording of a gun, for example, it made a similar sound, and that's cool.
But the transfer is broken for some reason.
I'm waiting for a fix :D
@tonymacx86PRO I've spotted some problems in the code, but haven't found the root cause yet. For now, you can safely play with AudioLDM using a duration shorter than 20 seconds. Please reinstall the latest version of audioldm via pip and try again. Thanks.
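One way to experiment within that limit is to cut a short excerpt from a longer recording before passing it to the CLI. The sketch below uses only Python's standard `wave` module and assumes a plain PCM .wav input; the function name `trim_wav` and the file names are hypothetical, and the thread does not say whether trimming the input is needed in addition to `-dur`.

```python
import wave


def trim_wav(src_path: str, dst_path: str, max_seconds: float = 20.0) -> float:
    """Copy at most max_seconds of PCM audio to dst_path; return the new duration in seconds."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Number of frames that fit in the requested duration.
        max_frames = int(max_seconds * src.getframerate())
        frames = src.readframes(min(src.getnframes(), max_frames))

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and sample rate; the frame count
        # in the header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    with wave.open(dst_path, "rb") as out:
        return out.getnframes() / out.getframerate()
```

For example, `trim_wav("mpmn.wav", "mpmn_short.wav", 15.0)` would write the first 15 seconds to a new file, which could then be passed with `-f mpmn_short.wav`.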
BTW, the latest version is 0.0.17.
git pull (repo) & pip install --upgrade audioldm
Console input and output
(audioldm) C:\Users\Coder\Documents\AI\AudioLDM>audioldm -f mpmn.wav --mode "transfer" -t "Acapella" -dur 20
DiffusionWrapper has 185.04 M params.
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torchlibrosa\stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = librosa.util.pad_center(fft_window, n_fft)
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:42: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:145: FutureWarning: Pass sr=16000, n_fft=1024, n_mels=64, fmin=0, fmax=8000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
mel_basis = librosa_mel_fn(
Decoding image: 100%|████████████████████████████████████████████████████████████████| 100/100 [00:10<00:00, 9.14it/s]
Save audio to ./output\transfer\mpmn\01_03_2023_15_05_16_Acapella_0.wav
Another empty file
Oh, maybe I should try 17.5 seconds.
P.S. It doesn't work.
Console input and output
(audioldm) C:\Users\Coder\Documents\AI\AudioLDM>audioldm -f mpmn.wav --mode "transfer" -t "Acapella" -dur 17.5
DiffusionWrapper has 185.04 M params.
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torchlibrosa\stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = librosa.util.pad_center(fft_window, n_fft)
C:\Users\Coder\.conda\envs\audioldm\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3191.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:42: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
C:\Users\Coder\Documents\AI\AudioLDM\audioldm\audio\stft.py:145: FutureWarning: Pass sr=16000, n_fft=1024, n_mels=64, fmin=0, fmax=8000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
mel_basis = librosa_mel_fn(
Decoding image: 100%|████████████████████████████████████████████████████████████████| 100/100 [00:09<00:00, 10.94it/s]
Save audio to ./output\transfer\mpmn\01_03_2023_15_11_15_Acapella_0.wav
@tonymacx86PRO Oh, that's bad. Sorry about that. I'll keep looking into it. Also, could you please share your audio file so that I can test on my side?
It looks like there are some numerical problems in the VAE decoder that lead to NaN output. I'll try to solve it today.
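A quick way to tell a NaN problem apart from plain silence is to scan the decoded samples before they are written to disk. The following is a minimal sketch in pure Python, not AudioLDM's own code: `describe_waveform` and the silence threshold are hypothetical, and it assumes the samples are available as an iterable of floats.

```python
import math
from typing import Iterable


def describe_waveform(samples: Iterable[float], silence_threshold: float = 1e-6) -> str:
    """Classify samples as 'non-finite', 'silent', or 'ok'."""
    peak = 0.0
    for sample in samples:
        # NaN or infinity anywhere in the signal points to a numerical problem.
        if math.isnan(sample) or math.isinf(sample):
            return "non-finite"
        peak = max(peak, abs(sample))
    # An empty signal, or one whose peak is below the threshold, is treated as silence.
    if peak < silence_threshold:
        return "silent"
    return "ok"
```

A "non-finite" result would be consistent with the NaN explanation above, while "silent" would suggest the empty file has a different cause.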
I'm sending it as a zip because I can't attach a wav file.
01_03_2023_15_11_15_Acapella_0.zip
Hi, I meant sharing mpmn.wav.
First of all, the file I'm sending is named DK_-_MP_Main_Menu because a friend asked me to leave the name as it is, but it is the mpmn.wav file.
The original file was in ogg format, and I gave the neural network the version converted to wav. I don't know whether the converter might be the problem, but here are both files.
Original (*.ogg) and Converted (*.wav)
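To help rule out the converter, the header of the converted file can be inspected with Python's standard `wave` module. This is only a sketch: `wav_info` is a hypothetical helper, it assumes a plain PCM .wav, and the thread does not state which sample rate or channel layout AudioLDM expects from its input (the console log only shows its mel filterbank being built with sr=16000).

```python
import wave
from typing import Dict, Union


def wav_info(path: str) -> Dict[str, Union[int, float]]:
    """Return basic header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate": sample_rate,
            "duration_seconds": wav_file.getnframes() / sample_rate,
        }
```

If `wave.open` raises `wave.Error` on the converted file, the converter probably wrote a format this module does not read (for example, a non-PCM encoding), which would be worth mentioning in the report.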
Thanks, I'll look into it.
How is it going?
Which audio formats does the CLI support?
A .wav file will work.
I'll check out your fix.
Ok
20 seconds works, but the prompt "Acapella" doesn't give the best result. My music doesn't come out as an a cappella. I don't know; I'll try "man voice".
Yes, the model does not guarantee good results. It might need some tuning on transfer_strength, seed, text, etc.
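A small sweep over those settings can be scripted rather than typed by hand. The sketch below only builds the command lines; it does not run them. The `--mode`, `-f`, `-t` and `-dur` flags appear earlier in this thread, while `--transfer_strength` and `--seed` are assumed flag spellings that should be confirmed with `audioldm -h`. The helper name `build_transfer_commands` and the example values are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def build_transfer_commands(
    file_path: str,
    text: str,
    strengths: Sequence[float],
    seeds: Sequence[int],
    duration: float = 10.0,
) -> List[List[str]]:
    """Build one audioldm command per (transfer strength, seed) pair."""
    commands = []
    for strength, seed in product(strengths, seeds):
        commands.append([
            "audioldm", "--mode", "transfer",
            "-f", file_path,
            "-t", text,
            "-dur", str(duration),
            # Assumed flag names; check `audioldm -h` before relying on them.
            "--transfer_strength", str(strength),
            "--seed", str(seed),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)`, and the resulting files compared by ear.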
The result with "man voice" is very similar.
I'm going to get some rest. I think it's time to close this issue.