Giter Club home page Giter Club logo

biggan-visualizer's Introduction

BigGAN Audio Visualizer

biggan-visualizer-example

Description

This visualizer explores BigGAN (Brock et al., 2018) latent space by using pitch/tempo of an audio file to generate and interpolate between noise/class vector inputs to the model. Classes are chosen manually or optionally using semantic similarity on BERT encodings of a lyrics corpus.

Usage:

usage: visualizer.py [-h] -s SONG [-r {128,256,512}] [-d DURATION]
                     [-ps [200-295]] [-ts [0.05-0.8]]
                     [-c CLASSES [CLASSES ...]] [-n NUM_CLASSES] [-j [0-1]]
                     [-fl i*2^6] [-t [0.1-1]] [-sf [10-30]] [-bs BATCH_SIZE]
                     [-o OUTPUT_FILE] [--use_last_vectors]
                     [--use_last_classes] [--sort_pitch] [-l LYRICS]
                     [-e {sbert,doc2vec}] [-es {best,random,ransac}]
  • In order to speed up runtime, code can be run on Google Colab GPUs (or other cloud notebook providers) using biggan_music_visualizer.ipynb (hosted here).
  • The [-n NUM_CLASSES] parameter selects the number of classes to interpolate between.
  • Default behavior is to select [-n NUM_CLASSES] random classes. The [-c CLASSES [CLASSES ...]] parameter can be used to select specific ImageNet classes. A full list can be found here, and a list categorized by coarse descriptors here. Be sure to use the int ids and not the string labels, and set [-n NUM_CLASSES] to the number of chosen classes.
  • Use the [--sort_by_power] flag to map classes to the [-n NUM_CLASSES] highest power pitches. By default, classes are mapped to a chromatic scale.
  • The [-d DURATION] parameter can be useful to generate short videos while tweaking other parameters. Once the desired parameters are set, use the [--use_last_vector] flag and remove the [-d DURATION] parameter to generate the same video at full length.
  • Reducing the output resolution with [-r {128,256,512}] and/or increasing the frame length with [-fl i*2^6] can help reduce the runtime.
  • To compute classes through semantic similarity to a lyrics file, use the [-l LYRICS] parameter. The embedding technique and strategy for choosing classes can be set with [-e {sbert,doc2vec}] and [-es {best,random,ransac}] respectively.
  • Pitch and tempo sensitivity can be set with [-ps [200-295]] and [-ts [0.05-0.8]] respectively. Jitter, truncation and smooth factor can be set with [-j [0-1]], [-t [0.1-1]] and [-sf [10-30]] respectively.
  • See the help column of the arguments section for details on all parameters.

Arguments

short long default range help
-h --help show this help message and exit
-s --song path to input audio file [REQUIRED]
-r --resolution 512 {128,256,512} output video resolution
-d --duration None int output video duration
-ps --pitch_sensitivity 220 [200-295] controls the sensitivity of the class vector to changes in pitch
-ts --tempo_sensitivity 0.25 [0.05-0.8] controls the sensitivity of the noise vector to changes in volume and tempo
-c --classes None manually specify [--num_classes] ImageNet classes
-n --num_classes 12 [1-12] number of unique classes to use
-j --jitter 0.5 [0-1] controls jitter of the noise vector to reduce repitition
-fl --frame_length 512 i*2^6 number of audio frames to video frames in the output
-t --truncation 1 [0.1-1] BigGAN truncation parameter controls complexity of structure within frames
-sf --smooth_factor 20 [10-30] controls interpolation between class vectors to smooth rapid flucations
-bs --batch_size 20 int BigGAN batch_size
-o --output_file name of output file stored in output/, defaults to [--song] path base_name
--use_last_vectors False bool set flag to use previous saved class/noise vectors
--use_last_classes False bool set flag to use previous classes
--sort_pitches False bool set flag to sort pitches by the ordering of classes
-l --lyrics None path to lyrics file; setting [--lyrics LYRICS] computes classes by semantic similarity under BERT encodings
-e --encoding sbert {sbert,doc2vec} controls choice of sentence embeddings technique
-es --encoding_strategy None {random,best,ransac} controls strategy for choosing classes: [-e sbert] can use best or random while [-e doc2vec] can use ransac

Acknowledgments

Thanks to Matt Siegelman for providing the inspiration as well as a boilerplate for the project.

References

biggan-visualizer's People

Contributors

rushk014 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Forkers

bellacaprino

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.