
Images that Sound

Ziyang Chen, Daniel Geng, Andrew Owens

University of Michigan, Ann Arbor

arXiv 2024

[Paper] [Project Page]


This repository contains the code to generate images that sound: special spectrograms that can be viewed as images and played as sound.

teaser
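To build intuition for what it means to "play" an image, here is a minimal NumPy-only sketch (not part of this repo, whose pipeline uses a learned vocoder) that treats a grayscale image as a magnitude spectrogram and inverts it to a waveform with the classic Griffin-Lim algorithm:

```python
# Minimal sketch (illustrative, not from this repo): treat a grayscale image
# as a magnitude spectrogram and invert it to audio via Griffin-Lim.
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Hann-windowed short-time Fourier transform, shape (freq, time).
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1).T

def istft(S, n_fft=512, hop=128):
    # Overlap-add inverse STFT with window-power normalization.
    win = np.hanning(n_fft)
    n_frames = S.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for t in range(n_frames):
        frame = np.fft.irfft(S[:, t], n=n_fft)
        out[t * hop:t * hop + n_fft] += frame * win
        norm[t * hop:t * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=512, hop=128):
    # Iteratively estimate a phase consistent with the given magnitudes.
    phase = np.exp(2j * np.pi * np.random.default_rng(0).random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)

# An "image" of shape (freq_bins, time_frames): a bright horizontal band
# becomes a steady tone when played back as audio.
img = np.zeros((257, 64))
img[40, :] = 1.0
audio = griffin_lim(img)
```

The repo's actual method goes the other way, denoising a latent that decodes to both a plausible image and a plausible spectrogram; the snippet above only illustrates the image-to-sound direction.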

Environment

To set up the environment, simply run:

conda env create -f environment.yml
conda activate soundify

Pro tip: we highly recommend using mamba instead of conda for much faster environment solving and installation.

DeepFloyd: our repo also uses DeepFloyd IF. To use DeepFloyd IF, you must accept its usage conditions. To do so:

  1. Sign up for or log in to your Hugging Face account.
  2. Accept the license on the model card of DeepFloyd/IF-I-XL-v1.0.
  3. Log in locally by running python huggingface_login.py and entering your Hugging Face Hub access token when prompted. It does not matter how you answer the Add token as git credential? (Y/n) question.

Usage

We use the pretrained image latent diffusion model Stable Diffusion v1.5 and the pretrained audio latent diffusion model Auffusion, which is finetuned from Stable Diffusion. We provide the code (including visualization) and instructions for our approach (multimodal denoising) and two proposed baselines: Imprint and SDS. Note that our code is built on Hydra, so you can override config parameters from the command line using Hydra's syntax.
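For example, a Hydra-style override might look like the following (the extra keys here are purely illustrative; check the actual experiment configs under configs/ for the real parameter names):

```shell
# Select an experiment config and override a hypothetical parameter inline:
python src/main_denoise.py experiment=examples/bell some.config.key=value
```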

Multimodal denoising

To create images that sound using our multimodal denoising method, run the code with config files under configs/main_denoise/experiment:

python src/main_denoise.py experiment=examples/bell

Note: our method does not have a high success rate, since it is zero-shot and depends heavily on the initial random noise. We recommend generating many samples (e.g., N=100) and hand-picking the high-quality results.

Imprint baseline

To create images that sound using our proposed imprint baseline method, run the code with config files under configs/main_imprint/experiment:

python src/main_imprint.py experiment=examples/bell

SDS baseline

To create images that sound using our proposed multimodal SDS baseline method, run the code with config files under configs/main_sds/experiment:

python src/main_sds.py experiment=examples/bell

Note: we find that audio SDS does not work for many audio prompts. We hypothesize that this is because latent diffusion models do not work as well as pixel-based diffusion models for SDS.

Colorization

We also provide colorization code under src/colorization, adapted from Factorized Diffusion. To directly generate colorized videos with audio, run:

python src/colorization/create_color_video.py \
  --sample_dir /path/to/generated/sample/dir \
  --prompt "a colorful photo of [object]" \
  --num_samples 16 --guidance_scale 10 \
  --num_inference_steps 30 --start_diffusion_step 7

Note: since our generated images are out of distribution, we recommend running more trials (e.g., num_samples=16) and selecting the best colorized results.

Acknowledgement

Our code is based on Lightning-Hydra-Template, diffusers, stable-dreamfusion, Diffusion-Illusions, Auffusion, and visual-anagrams. We appreciate their open-source codes.


images-that-sound's Issues

Need Help Running Code - Hydra Path Issue on Mac with M-Series Chip

Hi,
Sorry for the basic question.

I'm encountering an issue while trying to run the code. I get the following error message:

 File "/opt/anaconda3/envs/soundify/lib/python3.10/site-packages/hydra/_internal/defaults_list.py", line 799, in config_not_found_error
    raise MissingConfigException(
hydra.errors.MissingConfigException: In 'hydra/config': Could not find 'hydra/job_logging/colorlog'

Available options in 'hydra/job_logging':
	default
	disabled
	none
	stdout
Config search path:
	provider=hydra, path=pkg://hydra.conf
	provider=main, path=file:///Users/username/Downloads/images-that-sound/configs/main_imprint
	provider=schema, path=structured:/

I am not sure I follow the suggestion to “run the code with config files under configs/main_imprint/experiment”.

I checked that the config file exists at the path indicated in this line of code: @hydra.main(version_base="1.3", config_path="../configs/main_imprint", config_name="main.yaml")

I ran the command python src/main_imprint.py experiment=examples/bell in the root dir of the project.

Also, could you clarify whether I can run this on a personal M-chip Mac? I saw pytorch-cuda in the conda environment.yml, and I found that some of the code provides CPU alternatives while some requires CUDA.

Additionally, if you could specify which tasks are possible to run with a Mac M-Series chip, that would be extremely helpful.

As a side note, I previously saw a video on Bilibili where mountain shapes were used to create music with ancient Chinese instruments. It was really impressive (sorry, I couldn't find the video link). I believe your project could achieve a similar effect. I suspect there might be a way to introduce specific instrument styles when generating the audio. If you have read any related works, I would love to hear about them.

Thank you for your help!
