I am a bit confused how to do this. Any help would be appreciated! :)

How to convert to wav file to listen to result? (audio-diffusion-pytorch issue, closed)

archinetai commented on August 25, 2024

How to convert to wav file to listen to result?

from audio-diffusion-pytorch.

Comments (6)

tralala87 commented on August 25, 2024

I would also like to know this. I can't seem to download the audio file after running the code.

After running this:

```python
from audio_diffusion_pytorch import DiffusionModel, UNetV0, VDiffusion, VSampler
import torch

model = DiffusionModel(
    # ... same as unconditional model
    net_t=UNetV0, # The model type used for diffusion (U-Net V0 in this case)
    in_channels=2, # U-Net: number of input/output (audio) channels
    channels=[8, 32, 64, 128, 256, 512, 512, 1024, 1024], # U-Net: channels at each layer
    factors=[1, 4, 4, 4, 2, 2, 2, 2, 2], # U-Net: downsampling and upsampling factors at each layer
    items=[1, 2, 2, 2, 2, 2, 2, 4, 4], # U-Net: number of repeating items at each layer
    attentions=[0, 0, 0, 0, 0, 1, 1, 1, 1], # U-Net: attention enabled/disabled at each layer
    attention_heads=8, # U-Net: number of attention heads per attention item
    attention_features=64, # U-Net: number of attention features per attention item
    diffusion_t=VDiffusion, # The diffusion method used
    sampler_t=VSampler, # The diffusion sampler used
    use_text_conditioning=True, # U-Net: enables text conditioning (default T5-base)
    use_embedding_cfg=True, # U-Net: enables classifier free guidance
    embedding_max_length=64, # U-Net: text embedding maximum length (default for T5-base)
    embedding_features=768, # U-Net: text embedding features (default for T5-base)
    cross_attentions=[0, 0, 0, 1, 1, 1, 1, 1, 1], # U-Net: cross-attention enabled/disabled at each layer
)

# Train model with audio waveforms
audio_wave = torch.randn(1, 2, 2**18) # [batch, in_channels, length]
loss = model(
    audio_wave,
    text=['The audio description'], # Text conditioning, one element per batch
    embedding_mask_proba=0.1 # Probability of masking text with learned embedding (Classifier-Free Guidance Mask)
)
loss.backward()

# Turn noise into new audio sample with diffusion
noise = torch.randn(1, 2, 2**18)
sample = model.sample(
    noise,
    text=['The audio description'],
    embedding_scale=5.0, # Higher for more text importance, suggested range: 1-15 (Classifier-Free Guidance Scale)
    num_steps=2 # Higher for better quality, suggested num_steps: 10-100
)
```

Where and how can I download the audio file?


dillfrescott commented on August 25, 2024

I don't think the audio file is actually being written anywhere. I believe the data is just stored in memory.


mangoleaf commented on August 25, 2024

I have played around with saving the tensors as wav files (sample below for others interested), but the audio files I get out are complete static.

I would really appreciate it if someone can offer a correct solution to this. I would be happy to submit code adding a utility for this as well.

# Turn noise into new audio sample with diffusion
noise = torch.randn(1, 2, 2**18)
sample = model.sample(
    noise,
    text=['Bird chirping'],
    embedding_scale=5.0, # Higher for more text importance, suggested range: 1-15 (Classifier-Free Guidance Scale)
    num_steps=15 # Higher for better quality, suggested num_steps: 10-100
)


import soundfile as sf
def save_wav(tensor, path):
    tensor = tensor.squeeze()
    tensor = tensor / tensor.max()
    nparray = tensor.squeeze().numpy(force=True).astype('float32').T
    sf.write(path, nparray, samplerate=44100, format='wav')
    print("Done saving file")

save_wav(sample, "test_generated_sound.wav")
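
One detail in the helper above that may matter once the model is trained: `tensor / tensor.max()` divides by the largest positive value, not the largest magnitude, so a waveform whose loudest peak is negative can still end up outside [-1, 1]. Below is a minimal, dependency-free sketch of peak normalization by absolute value. `peak_normalize` is a hypothetical name used only for illustration; on a tensor the equivalent would be `tensor / tensor.abs().max()`.

```python
def peak_normalize(samples, eps=1e-12):
    """Scale a flat sequence of floats so the largest magnitude becomes 1.0."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak < eps:
        # Silent (or empty) input: return it unchanged to avoid dividing by zero.
        return list(samples)
    return [x / peak for x in samples]


# The loudest peak here is negative; dividing by max() (1.0) would leave -2.0 as is.
print(peak_normalize([0.5, -2.0, 1.0]))  # [0.25, -1.0, 0.5]
```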


flavioschneider commented on August 25, 2024

This is a library for researchers to train audio diffusion models; no pre-trained models are provided here. That's why you are getting only static: the model is not trained.


mangoleaf commented on August 25, 2024

Good to know. I'll admit I thought I saw it pulling down a pre-trained model when installing everything. That said, you ignored my question, which is still relevant: others have already asked it, and I will be looking into training this later today.

I was asking whether I am correctly interpreting the tensor data and converting it to a wav file. This should be included in the repo as a utility, and I was offering to help by opening a pull request to add it to the utility package.


flavioschneider commented on August 25, 2024

The correct way to save a tensor to .wav file is as follows:

import torchaudio
sample_rate = 48000
torchaudio.save('test_generated_sound.wav', sample[0], sample_rate)

Where sample[0] indicates that you want to save the first element of the batch.

No additional utility or library is required for that.
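
If torchaudio is not available, a similar file can be written with Python's standard `wave` module. The sketch below is an alternative under stated assumptions, not part of the library: `write_wav_pcm16` is a hypothetical helper that takes one batch element as a list of per-channel float lists (for example `sample[0].tolist()`), clips values to [-1, 1], and writes interleaved 16-bit PCM.

```python
import struct
import wave


def write_wav_pcm16(path, channels, sample_rate=48000):
    """Write [num_channels][num_samples] floats in [-1, 1] as a 16-bit PCM WAV."""
    num_channels = len(channels)
    num_frames = len(channels[0])
    if any(len(ch) != num_frames for ch in channels):
        raise ValueError("all channels must have the same length")

    ints = []
    for frame in zip(*channels):  # interleave samples: L0, R0, L1, R1, ...
        for value in frame:
            clamped = max(-1.0, min(1.0, value))  # clip out-of-range samples
            ints.append(int(round(clamped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Little-endian signed 16-bit integers, as the WAV format expects.
        wav_file.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

With the variables from this thread, a call might look like `write_wav_pcm16('test_generated_sound.wav', sample[0].tolist(), 48000)`. The sample rate should match the audio the model was trained on; the 48000 default only mirrors the value used in the snippet above.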

