
Images that Sound

Ziyang Chen, Daniel Geng, Andrew Owens

University of Michigan, Ann Arbor

arXiv 2024

[Paper] [Project Page]


This repository contains the code to generate images that sound: special spectrograms that can be viewed as images and played as sound.

teaser
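To build intuition for what it means to "play" an image, here is a minimal NumPy-only sketch (not part of this repo, whose pipeline uses a learned vocoder) that treats a grayscale image as a magnitude spectrogram and inverts it to a waveform with the classic Griffin-Lim algorithm:

```python
# Minimal sketch (illustrative, not from this repo): treat a grayscale image
# as a magnitude spectrogram and invert it to audio via Griffin-Lim.
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Hann-windowed short-time Fourier transform, shape (freq, time).
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1).T

def istft(S, n_fft=512, hop=128):
    # Overlap-add inverse STFT with window-power normalization.
    win = np.hanning(n_fft)
    n_frames = S.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for t in range(n_frames):
        frame = np.fft.irfft(S[:, t], n=n_fft)
        out[t * hop:t * hop + n_fft] += frame * win
        norm[t * hop:t * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=512, hop=128):
    # Iteratively estimate a phase consistent with the given magnitudes.
    phase = np.exp(2j * np.pi * np.random.default_rng(0).random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)

# An "image" of shape (freq_bins, time_frames): a bright horizontal band
# becomes a steady tone when played back as audio.
img = np.zeros((257, 64))
img[40, :] = 1.0
audio = griffin_lim(img)
```

The repo's actual method goes the other way, denoising a latent that decodes to both a plausible image and a plausible spectrogram; the snippet above only illustrates the image-to-sound direction.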

Environment

To set up the environment, simply run:

conda env create -f environment.yml
conda activate soundify

Pro tip: we highly recommend using mamba instead of conda for much faster environment solving and installation.

DeepFloyd: our repo also uses DeepFloyd IF. To use DeepFloyd IF, you must accept its usage conditions. To do so:

  1. Sign up for or log in to your Hugging Face account.
  2. Accept the license on the model card of DeepFloyd/IF-I-XL-v1.0.
  3. Log in locally by running python huggingface_login.py and entering your Hugging Face Hub access token when prompted. It does not matter how you answer the Add token as git credential? (Y/n) question.

Usage

We use the pretrained image latent diffusion model Stable Diffusion v1.5 and the pretrained audio latent diffusion model Auffusion, which is finetuned from Stable Diffusion. We provide the code (including visualization) and instructions for our approach (multimodal denoising) and two proposed baselines: Imprint and SDS. Note that our code is built on Hydra, so you can override config parameters from the command line using Hydra's syntax.
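For example, a Hydra-style override might look like the following (the extra keys here are purely illustrative; check the actual experiment configs under configs/ for the real parameter names):

```shell
# Select an experiment config and override a hypothetical parameter inline:
python src/main_denoise.py experiment=examples/bell some.config.key=value
```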

Multimodal denoising

To create images that sound using our multimodal denoising method, run the code with config files under configs/main_denoise/experiment:

python src/main_denoise.py experiment=examples/bell

Note: our method does not have a high success rate, since it is zero-shot and depends heavily on the initial random noise. We recommend generating many samples (e.g., N=100) and hand-picking the high-quality results.

Imprint baseline

To create images that sound using our proposed imprint baseline method, run the code with config files under configs/main_imprint/experiment:

python src/main_imprint.py experiment=examples/bell

SDS baseline

To create images that sound using our proposed multimodal SDS baseline method, run the code with config files under configs/main_sds/experiment:

python src/main_sds.py experiment=examples/bell

Note: we find that audio SDS does not work for many audio prompts. We hypothesize that this is because latent diffusion models do not work as well as pixel-based diffusion models for SDS.

Colorization

We also provide colorization code under src/colorization, adapted from Factorized Diffusion. To directly generate colorized videos with audio, run:

python src/colorization/create_color_video.py \
  --sample_dir /path/to/generated/sample/dir \
  --prompt "a colorful photo of [object]" \
  --num_samples 16 --guidance_scale 10 \
  --num_inference_steps 30 --start_diffusion_step 7

Note: since our generated images are out of distribution, we recommend running more trials (e.g., num_samples=16) and selecting the best colorized results.

Acknowledgement

Our code is based on Lightning-Hydra-Template, diffusers, stable-dreamfusion, Diffusion-Illusions, Auffusion, and visual-anagrams. We appreciate their open-source codes.


images-that-sound's Issues

Need Help Running Code - Hydra Path Issue on Mac with M-Series Chip

Hi,
Sorry for the basic question.

I'm encountering an issue while trying to run the code. I get the following error message:

 File "/opt/anaconda3/envs/soundify/lib/python3.10/site-packages/hydra/_internal/defaults_list.py", line 799, in config_not_found_error
    raise MissingConfigException(
hydra.errors.MissingConfigException: In 'hydra/config': Could not find 'hydra/job_logging/colorlog'

Available options in 'hydra/job_logging':
	default
	disabled
	none
	stdout
Config search path:
	provider=hydra, path=pkg://hydra.conf
	provider=main, path=file:///Users/username/Downloads/images-that-sound/configs/main_imprint
	provider=schema, path=structured:/

I am not sure I follow the suggestion to “run the code with config files under configs/main_imprint/experiment”.

I checked that the config file exists at the path indicated in this line of code: @hydra.main(version_base="1.3", config_path="../configs/main_imprint", config_name="main.yaml")

I ran the command python src/main_imprint.py experiment=examples/bell in the root dir of the project.

Also, could you clarify whether I can run this on a personal M-chip Mac? I saw pytorch-cuda in the conda environment.yml, and I found that some of the code provides CPU alternatives while some requires CUDA.

Additionally, if you could specify which tasks are possible to run with a Mac M-Series chip, that would be extremely helpful.

As a side note, I previously saw a video on Bilibili where mountain shapes were used to create music with ancient Chinese instruments. It was really impressive (sorry, I couldn't find the video link). I believe your project could achieve a similar effect. I suspect there might be a way to introduce specific instrument styles when generating the audio. If you have read any related works, I would love to hear about them.

Thank you for your help!
