Would love to see code to reproduce the paper's super resolution

Super resolution example? about audioldm HOT 13 OPEN

haoheliu commented on July 19, 2024

Super resolution example?

from audioldm.

Comments (13)

haoheliu commented on July 19, 2024 3

@galfaroth Super-resolution and inpainting will be available this Friday. Thanks for your patience.

from audioldm.

galfaroth commented on July 19, 2024 3

Hey! Thanks for the reply! What if I wanted to test the super resolution? Can you provide an example for that too, and possibly sample input and output audio?

from audioldm.

haoheliu commented on July 19, 2024

Sure. We will open-source that part, which is also in the TODO list.

from audioldm.

galfaroth commented on July 19, 2024

Could you possibly just send the Audio Super Resolution model you used so that we don't have to download the dataset and train ourselves?

from audioldm.

devilismyfriend commented on July 19, 2024

Awesome! Excited to test it out

On Tue, Feb 21, 2023, 2:56 p.m. haoheliu wrote:

@galfaroth Super-resolution and inpainting will be available this Friday. Thanks for your patience.

from audioldm.

galfaroth commented on July 19, 2024

@galfaroth Super-resolution and inpainting will be available this Friday. Thanks for your patience.

No way!

from audioldm.

haoheliu commented on July 19, 2024

Hi all, the code related to super-resolution and inpainting is available here: https://github.com/haoheliu/AudioLDM/blob/main/audioldm/pipeline.py#L223

It has not been integrated into the command-line usage yet because I haven't come up with an elegant and simple interface, and I'm trying to avoid making this tool exceedingly heavy. From my perspective, super-resolution and inpainting may also not be of broad interest (correct me if I'm wrong). So for now I'll leave super-resolution and inpainting in this Python function form. You can still play with the function, though. I've already tested it, and it works fine.

from audioldm.

galfaroth commented on July 19, 2024

Hey, I tried using the new method:

def upsample(original_filepath,text, duration, guidance_scale, random_seed, n_candidates, steps):
  waveform = super_resolution_and_inpainting(audioldm,text,original_filepath,
                                  seed=random_seed,ddim_steps=steps,
                                  duration=duration, batchsize=1,
                                  guidance_scale=guidance_scale,
                                  n_candidate_gen_per_text=int(n_candidates),
                                  time_mask_ratio_start_and_end=(1.0, 1.0), # no inpainting,
                                  freq_mask_ratio_start_and_end=(0.75, 1.0), # regenerate the higher 75% to 100% mel bins
                                  )
  if(len(waveform) == 1):
    waveform = waveform[0]
  return waveform

but then I get:

[<ipython-input-11-eac161f8fca7>](https://localhost:8080/#) in upsample(original_filepath, text, duration, guidance_scale, random_seed, n_candidates, steps)
      8 
      9 def upsample(original_filepath,text, duration, guidance_scale, random_seed, n_candidates, steps):
---> 10   waveform = super_resolution_and_inpainting(audioldm,text,original_filepath,
     11                                   seed=random_seed,ddim_steps=steps,
     12                                   duration=duration, batchsize=1,

[/content/AudioLDM/audioldm/pipeline.py](https://localhost:8080/#) in super_resolution_and_inpainting(latent_diffusion, text, original_audio_file_path, seed, ddim_steps, duration, batchsize, guidance_scale, n_candidate_gen_per_text, time_mask_ratio_start_and_end, freq_mask_ratio_start_and_end, config)
    258     )
    259 
--> 260     batch = make_batch_for_text_to_audio(text, fbank=mel[None,...], batchsize=batchsize)
    261 
    262     # latent_diffusion.latent_t_size = duration_to_latent_t_size(duration)

[/content/AudioLDM/audioldm/pipeline.py](https://localhost:8080/#) in make_batch_for_text_to_audio(text, waveform, fbank, batchsize)
     26     else:
     27         fbank = torch.FloatTensor(fbank)
---> 28         fbank = fbank.expand(batchsize, 1024, 64)
     29         assert fbank.size(0) == batchsize
     30

RuntimeError: The expanded size of the tensor (1024) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 1024, 64]. Tensor sizes: [1, 512, 64]

I know the base sampling rate is 16000 Hz; where do I specify the target sampling rate? Can it upsample to 96000 Hz, for example?
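The RuntimeError in the traceback comes from the hard-coded `fbank.expand(batchsize, 1024, 64)` call: expanding can only stretch dimensions of size 1, so a 512-frame spectrogram cannot become 1024 frames. A minimal pure-Python sketch of that rule follows. `can_expand` is a hypothetical helper written for illustration (it only handles shapes with the same number of dimensions) and is not part of PyTorch or AudioLDM.

```python
def can_expand(src_shape, target_shape):
    """Return True if a tensor of src_shape could be expanded to target_shape.

    Simplified rule (same number of dimensions only): each dimension must
    either already match the target or have size 1.
    """
    if len(src_shape) != len(target_shape):
        return False
    return all(s == t or s == 1 for s, t in zip(src_shape, target_shape))


# Shapes taken from the error message above.
print(can_expand((1, 512, 64), (1, 1024, 64)))   # False: 512 is neither 1024 nor 1
print(can_expand((1, 1024, 64), (1, 1024, 64)))  # True: shapes already match
print(can_expand((1, 1024, 64), (4, 1024, 64)))  # True: the size-1 batch dimension is stretched
```

Under this reading, the spectrogram needs 1024 frames before it reaches that line. The 512 suggests the input covered half the length the function expects, although the thread does not confirm the cause.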

from audioldm.

haoheliu commented on July 19, 2024

@galfaroth Here, super-resolution means upsampling audio from a lower sampling rate (<16 kHz) to 16 kHz. Going to a higher sampling rate would be a separate research topic.
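To relate an input sampling rate to the `freq_mask_ratio_start_and_end` argument used earlier in this thread, one rough approach is to ask which fraction of the mel axis lies below the input's Nyquist frequency. The sketch below assumes the HTK mel formula and a mel axis spanning 0 Hz to 8 kHz (the Nyquist frequency of 16 kHz audio). The actual AudioLDM filterbank settings may differ, and `hz_to_mel` and `mel_fraction_below` are hypothetical helpers.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel scale (an assumption; other mel variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_fraction_below(input_sr, target_sr=16000):
    """Fraction of the mel axis (0 Hz .. target_sr / 2) that audio sampled
    at input_sr can contain, i.e. the part below input_sr / 2."""
    if not 0 < input_sr <= target_sr:
        raise ValueError("input_sr must be in (0, target_sr]")
    return hz_to_mel(input_sr / 2) / hz_to_mel(target_sr / 2)


for sr in (4000, 8000, 12000):
    print(sr, round(mel_fraction_below(sr), 3))
```

Under these assumptions, 8 kHz input covers roughly the lower three quarters of the mel axis, which lines up with the `(0.75, 1.0)` value that appears in the snippets in this thread.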

from audioldm.

galfaroth commented on July 19, 2024

@galfaroth Here, super-resolution means upsampling audio from a lower sampling rate (<16 kHz) to 16 kHz. Going to a higher sampling rate would be a separate research topic.

Setting the target sampling rate aside, why do I get the error? Can you post an example of how to do the upsampling with this method?

from audioldm.

haoheliu commented on July 19, 2024

@galfaroth You can use the following script (sr_inpainting.py):

#!/usr/bin/python3
import os
# Only the names this script actually uses are imported.
from audioldm import build_model, save_wave, get_time, super_resolution_and_inpainting
import argparse

CACHE_DIR = os.getenv(
    "AUDIOLDM_CACHE_DIR",
    os.path.join(os.path.expanduser("~"), ".cache/audioldm"))

parser = argparse.ArgumentParser()

parser.add_argument(
    "-t",
    "--text",
    type=str,
    required=False,
    default="",
    help="Text prompt to the model for audio generation",
)

parser.add_argument(
    "-f",
    "--file_path",
    type=str,
    required=False,
    default=None,
    help="(--mode transfer): Original audio file for style transfer; Or (--mode generation): the guidance audio file for generating similar audio",
)

parser.add_argument(
    "--transfer_strength",
    type=float,
    required=False,
    default=0.5,
    help="A value between 0 and 1. 0 means original audio without transfer, 1 means completely transfer to the audio indicated by text",
)

parser.add_argument(
    "-s",
    "--save_path",
    type=str,
    required=False,
    help="The path to save model output",
    default="./output",
)

parser.add_argument(
    "-ckpt",
    "--ckpt_path",
    type=str,
    required=False,
    help="The path to the pretrained .ckpt model",
    default=os.path.join(
                CACHE_DIR,
                "audioldm-s-full.ckpt",
            ),
)

parser.add_argument(
    "-b",
    "--batchsize",
    type=int,
    required=False,
    default=1,
    help="Generate how many samples at the same time",
)

parser.add_argument(
    "--ddim_steps",
    type=int,
    required=False,
    default=200,
    help="The sampling step for DDIM",
)

parser.add_argument(
    "-gs",
    "--guidance_scale",
    type=float,
    required=False,
    default=2.5,
    help="Guidance scale (Large => better quality and relevance to text; Small => better diversity)",
)

parser.add_argument(
    "-dur",
    "--duration",
    type=float,
    required=False,
    default=10.0,
    help="The duration of the samples",
)

parser.add_argument(
    "-n",
    "--n_candidate_gen_per_text",
    type=int,
    required=False,
    default=3,
    help="Automatic quality control. This number controls the number of candidates (e.g., generate three audios and choose the best to show you). A larger value usually leads to better quality at a higher computational cost",
)

parser.add_argument(
    "--seed",
    type=int,
    required=False,
    default=42,
    help="Changing this value (any integer) leads to a different generation result.",
)

args = parser.parse_args()
assert args.duration % 2.5 == 0, "Duration must be a multiple of 2.5"

mode = "super_resolution_and_inpainting"
        
save_path = os.path.join(args.save_path, mode)

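if args.file_path is not None:
    # Use the file name without its extension; splitting the whole path on "." breaks for paths such as "./audio/trumpet.wav"
    save_path = os.path.join(save_path, os.path.splitext(os.path.basename(args.file_path))[0])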

text = args.text
random_seed = args.seed
duration = args.duration
guidance_scale = args.guidance_scale
n_candidate_gen_per_text = args.n_candidate_gen_per_text

os.makedirs(save_path, exist_ok=True)
audioldm = build_model(ckpt_path=args.ckpt_path)

waveform = super_resolution_and_inpainting(
    audioldm,
    text,
    args.file_path,
    random_seed,
    duration=duration,
    guidance_scale=guidance_scale,
    ddim_steps=args.ddim_steps,
    n_candidate_gen_per_text=n_candidate_gen_per_text,
    batchsize=args.batchsize,
    time_mask_ratio_start_and_end=(0.10, 0.15), # regenerate the 10% to 15% of the time steps in the spectrogram
    # time_mask_ratio_start_and_end=(1.0, 1.0), # no inpainting
    # freq_mask_ratio_start_and_end=(0.75, 1.0), # regenerate the higher 75% to 100% mel bins
    freq_mask_ratio_start_and_end=(1.0, 1.0), # no super-resolution
)


save_wave(waveform, save_path, name="%s_%s" % (get_time(), text))

On the command line, run the script with:

python3 sr_inpainting.py -f trumpet.wav

The script will then inpaint the region of the audio between 10% and 15% of the time steps.
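As a rough illustration of what the two ratio tuples select, the sketch below converts them to index ranges on a spectrogram of 1024 time frames by 64 mel bins (the shape seen in the traceback earlier in the thread). `ratio_to_range` is a hypothetical helper, and truncating with `int()` is an assumption; the rounding used in `pipeline.py` may differ.

```python
def ratio_to_range(size, start_ratio, end_ratio):
    """Map a (start, end) ratio pair to a half-open index range [lo, hi)."""
    if not 0.0 <= start_ratio <= end_ratio <= 1.0:
        raise ValueError("ratios must satisfy 0 <= start <= end <= 1")
    return int(size * start_ratio), int(size * end_ratio)


N_FRAMES, N_MELS = 1024, 64  # shape taken from the traceback

print(ratio_to_range(N_FRAMES, 0.10, 0.15))  # (102, 153): time frames to inpaint
print(ratio_to_range(N_MELS, 0.75, 1.0))     # (48, 64): upper mel bins to regenerate
print(ratio_to_range(N_MELS, 1.0, 1.0))      # (64, 64): empty range, nothing regenerated
```

For the default 10-second duration, the 10% to 15% window corresponds to roughly 1.0 s to 1.5 s of audio.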

from audioldm.

bitnom commented on July 19, 2024

omg it's happening

from audioldm.

Hikari-Tsai commented on July 19, 2024

Hi @galfaroth,
Just modify the freq_mask_ratio_start_and_end parameter in @haoheliu's sample code.
It's worth spending a little time to understand this repo. It's a good investment.
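Concretely, the two modes differ only in which mask tuple is non-trivial. The sketch below collects the values from the comments in the script above. `mask_settings` and `PRESETS` are hypothetical names for illustration, not part of AudioLDM.

```python
# Values copied from the comments in sr_inpainting.py above.
NO_MASK = (1.0, 1.0)

PRESETS = {
    "inpainting": {
        "time_mask_ratio_start_and_end": (0.10, 0.15),
        "freq_mask_ratio_start_and_end": NO_MASK,
    },
    "super_resolution": {
        "time_mask_ratio_start_and_end": NO_MASK,
        "freq_mask_ratio_start_and_end": (0.75, 1.0),
    },
}


def mask_settings(mode):
    """Return the two mask keyword arguments for the chosen mode."""
    try:
        return dict(PRESETS[mode])
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


print(mask_settings("super_resolution"))
```

The returned dictionary could then replace the two mask arguments in the script's call, for example by passing `**mask_settings("super_resolution")` to `super_resolution_and_inpainting`.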

from audioldm.
