
Comments (6)

tralala87 commented on August 25, 2024

I would also like to know this. I can't seem to download the audio file after running the code.

After running this: `from audio_diffusion_pytorch import DiffusionModel, UNetV0, VDiffusion, VSampler
import torch

model = DiffusionModel(
    # ... same as unconditional model
    net_t=UNetV0, # The model type used for diffusion (U-Net V0 in this case)
    in_channels=2, # U-Net: number of input/output (audio) channels
    channels=[8, 32, 64, 128, 256, 512, 512, 1024, 1024], # U-Net: channels at each layer
    factors=[1, 4, 4, 4, 2, 2, 2, 2, 2], # U-Net: downsampling and upsampling factors at each layer
    items=[1, 2, 2, 2, 2, 2, 2, 4, 4], # U-Net: number of repeating items at each layer
    attentions=[0, 0, 0, 0, 0, 1, 1, 1, 1], # U-Net: attention enabled/disabled at each layer
    attention_heads=8, # U-Net: number of attention heads per attention item
    attention_features=64, # U-Net: number of attention features per attention item
    diffusion_t=VDiffusion, # The diffusion method used
    sampler_t=VSampler, # The diffusion sampler used
    use_text_conditioning=True, # U-Net: enables text conditioning (default T5-base)
    use_embedding_cfg=True, # U-Net: enables classifier free guidance
    embedding_max_length=64, # U-Net: text embedding maximum length (default for T5-base)
    embedding_features=768, # U-Net: text embedding features (default for T5-base)
    cross_attentions=[0, 0, 0, 1, 1, 1, 1, 1, 1], # U-Net: cross-attention enabled/disabled at each layer
)

# Train model with audio waveforms
audio_wave = torch.randn(1, 2, 2**18) # [batch, in_channels, length]
loss = model(
    audio_wave,
    text=['The audio description'], # Text conditioning, one element per batch
    embedding_mask_proba=0.1 # Probability of masking text with learned embedding (Classifier-Free Guidance Mask)
)
loss.backward()

# Turn noise into new audio sample with diffusion
noise = torch.randn(1, 2, 2**18)
sample = model.sample(
    noise,
    text=['The audio description'],
    embedding_scale=5.0, # Higher for more text importance, suggested range: 1-15 (Classifier-Free Guidance Scale)
    num_steps=2 # Higher for better quality, suggested num_steps: 10-100
)`

Where and how can I download the audio file?

from audio-diffusion-pytorch.

dillfrescott commented on August 25, 2024

I don't think the audio file is actually being written anywhere. I believe the data is only held in memory.


mangoleaf commented on August 25, 2024

I have played around with saving the tensors as wav files (sample below for others interested), but the audio files I get out are complete static.

I would really appreciate it if someone can offer a correct solution to this. I would be happy to submit code adding a utility for this as well.

# Turn noise into new audio sample with diffusion
noise = torch.randn(1, 2, 2**18)
sample = model.sample(
    noise,
    text=['Bird chirping'],
    embedding_scale=5.0, # Higher for more text importance, suggested range: 1-15 (Classifier-Free Guidance Scale)
    num_steps=15 # Higher for better quality, suggested num_steps: 10-100
)


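import soundfile as sf

def save_wav(tensor, path):
    # Drop the batch dimension: [1, channels, length] -> [channels, length]
    tensor = tensor.squeeze(0)
    # Peak-normalize by the largest absolute value so the signal stays within [-1, 1]
    tensor = tensor / tensor.abs().max().clamp(min=1e-8)
    # soundfile expects [frames, channels], so transpose before writing
    nparray = tensor.numpy(force=True).astype('float32').T
    sf.write(path, nparray, samplerate=44100, format='wav')
    print("Done saving file")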

save_wav(sample, "test_generated_sound.wav")


flavioschneider commented on August 25, 2024

This is a library for researchers to train audio diffusion models; no pre-trained models are provided here. That's why you are getting only static: the model is not trained.


mangoleaf commented on August 25, 2024

Good to know. I'll admit I thought I saw it pulling down a pre-trained model when installing everything. That said, you ignored my question, which is still relevant: others have already asked, and I will be looking into training this later today.

I was asking whether I am correctly interpreting the tensor data and converting it to a wav file. This should be included in the repo as a utility, and I was offering to help by opening a pull request to add it to the utility package.
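One way to check the conversion step separately from the model is to push a known signal, such as a 440 Hz sine tone, through the same kind of [channels, length] to wav conversion and confirm it survives a round trip. The following is a minimal sketch under assumptions: it uses NumPy and the standard-library `wave` module instead of soundfile, and `write_wav_pcm16` and `read_wav_pcm16` are hypothetical helpers, not part of audio-diffusion-pytorch.

```python
import os
import tempfile
import wave

import numpy as np


def write_wav_pcm16(path, audio, sample_rate):
    """Write float audio shaped [channels, length] as a 16-bit PCM wav file.

    Hypothetical helper for illustration only.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim != 2:
        raise ValueError("expected audio shaped [channels, length]")
    # Clip to [-1, 1] rather than dividing by max(): max() ignores negative
    # peaks, so dividing by it can leave values outside [-1, 1].
    clipped = np.clip(audio, -1.0, 1.0)
    # wav frames are interleaved, so transpose to [length, channels].
    pcm = np.ascontiguousarray((clipped.T * 32767.0).round().astype("<i2"))
    with wave.open(path, "wb") as f:
        f.setnchannels(audio.shape[0])
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())


def read_wav_pcm16(path):
    """Read a 16-bit PCM wav file back as float32 shaped [channels, length]."""
    with wave.open(path, "rb") as f:
        channels = f.getnchannels()
        rate = f.getframerate()
        frames = f.getnframes()
        raw = f.readframes(frames)
    pcm = np.frombuffer(raw, dtype="<i2").reshape(frames, channels)
    return pcm.T.astype(np.float32) / 32767.0, rate


sample_rate = 48000
t = np.arange(sample_rate) / sample_rate      # one second of timestamps
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)    # 440 Hz sine at half scale
stereo = np.stack([tone, tone])               # [channels=2, length]

path = os.path.join(tempfile.gettempdir(), "sine_check.wav")
write_wav_pcm16(path, stereo, sample_rate)
restored, rate = read_wav_pcm16(path)
```

If the file plays back as a clean tone and `restored` matches `stereo` up to 16-bit rounding, the conversion is sound, and any static in generated files comes from the samples themselves.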


flavioschneider commented on August 25, 2024

The correct way to save a tensor to .wav file is as follows:

import torchaudio
sample_rate = 48000
torchaudio.save('test_generated_sound.wav', sample[0], sample_rate)

Where sample[0] indicates that you want to save the first element of the batch.

No additional utility or library is required for that.
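To make the shapes concrete: by default `torchaudio.save` takes a 2-D tensor laid out as [channels, time], which is why the batch axis has to be indexed away first. The sketch below uses a NumPy array as a stand-in for the sampler output so it runs without a model; `duration_seconds` is a hypothetical helper. If the model lives on a GPU, the tensor likely needs `.cpu()` before saving.

```python
import numpy as np

# Stand-in for the sampler output, shaped [batch, channels, length] as in the thread.
sample = np.zeros((1, 2, 2**18), dtype=np.float32)

first = sample[0]                  # drop the batch axis -> [channels, length]
channels, length = first.shape


def duration_seconds(num_frames, sample_rate):
    """Playback length of num_frames audio frames at sample_rate Hz."""
    return num_frames / sample_rate


at_48k = duration_seconds(length, 48000)
at_44k = duration_seconds(length, 44100)
print(f"{channels} channels, {length} frames")
print(f"{at_48k:.2f} s at 48 kHz, {at_44k:.2f} s at 44.1 kHz")
```

The sample rate written to the file only tells the player how fast to step through the frames. The thread uses 44100 in one snippet and 48000 in another; the value to pass is whichever rate the training audio used, otherwise playback comes out pitch-shifted and slightly faster or slower.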

