BigGAN Audio Visualizer

Description

This visualizer explores BigGAN (Brock et al., 2018) latent space by using pitch/tempo of an audio file to generate and interpolate between noise/class vector inputs to the model. Classes are chosen manually or optionally using semantic similarity on BERT encodings of a lyrics corpus.
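
As a rough illustration of interpolating between class vector inputs, the sketch below linearly blends between two one-hot class vectors. It is a minimal stand-in, not the visualizer's actual code: the helpers `one_hot` and `lerp` and the example class ids are hypothetical, and the only fact assumed about BigGAN is that it is conditioned on the 1000 ImageNet classes.

```python
NUM_IMAGENET_CLASSES = 1000  # BigGAN is conditioned on the 1000 ImageNet classes


def one_hot(class_id, size=NUM_IMAGENET_CLASSES):
    """Return a class vector with 1.0 at class_id and 0.0 elsewhere."""
    vec = [0.0] * size
    vec[class_id] = 1.0
    return vec


def lerp(start, end, steps):
    """Yield `steps` vectors moving linearly from start to end (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    for i in range(steps):
        w = i / (steps - 1)  # blend weight: 0.0 at the start, 1.0 at the end
        yield [(1.0 - w) * a + w * b for a, b in zip(start, end)]


# Hypothetical example: blend from class 1 to class 7 over 5 video frames.
frames = list(lerp(one_hot(1), one_hot(7), steps=5))
```

Each intermediate vector keeps a total weight of 1 split between the two classes, so the generated image morphs gradually from one class to the other.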

Usage:

usage: visualizer.py [-h] -s SONG [-r {128,256,512}] [-d DURATION]
                     [-ps [200-295]] [-ts [0.05-0.8]]
                     [-c CLASSES [CLASSES ...]] [-n NUM_CLASSES] [-j [0-1]]
                     [-fl i*2^6] [-t [0.1-1]] [-sf [10-30]] [-bs BATCH_SIZE]
                     [-o OUTPUT_FILE] [--use_last_vectors]
                     [--use_last_classes] [--sort_pitch] [-l LYRICS]
                     [-e {sbert,doc2vec}] [-es {best,random,ransac}]

To reduce runtime, the code can be run on Google Colab GPUs (or other cloud notebook providers) using biggan_music_visualizer.ipynb (hosted here).
The [-n NUM_CLASSES] parameter selects the number of classes to interpolate between.
By default, [-n NUM_CLASSES] random classes are selected. The [-c CLASSES [CLASSES ...]] parameter can be used to select specific ImageNet classes instead. A full list can be found here, and a list categorized by coarse descriptors here. Be sure to use the integer ids, not the string labels, and set [-n NUM_CLASSES] to the number of chosen classes.
Use the [--sort_pitch] flag to map classes to the [-n NUM_CLASSES] highest-power pitches. By default, classes are mapped to a chromatic scale.
The [-d DURATION] parameter can be useful for generating short videos while tweaking other parameters. Once the desired parameters are set, use the [--use_last_vectors] flag and remove the [-d DURATION] parameter to generate the same video at full length.
Reducing the output resolution with [-r {128,256,512}] and/or increasing the frame length with [-fl i*2^6] can help reduce the runtime.
To compute classes through semantic similarity to a lyrics file, use the [-l LYRICS] parameter. The embedding technique and strategy for choosing classes can be set with [-e {sbert,doc2vec}] and [-es {best,random,ransac}] respectively.
Pitch and tempo sensitivity can be set with [-ps [200-295]] and [-ts [0.05-0.8]] respectively. Jitter, truncation and smooth factor can be set with [-j [0-1]], [-t [0.1-1]] and [-sf [10-30]] respectively.
See the help column of the arguments section for details on all parameters.
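
The two class-to-pitch mappings described above (chromatic order by default, highest-power pitches when sorting is enabled) can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the function names are hypothetical, the chromagram is assumed to be a 12 x T table of per-pitch energies, and the tie-breaking and ordering used by the real code are not documented here.

```python
def total_power(chroma):
    """Sum each pitch row of a 12 x T chromagram over time."""
    return [sum(row) for row in chroma]


def map_classes_to_pitches(classes, chroma, by_power=False):
    """Return a {pitch_index: class_id} mapping.

    by_power=False: class i is assigned to pitch i of the chromatic scale.
    by_power=True:  classes are assigned to the len(classes) pitches with
                    the highest total power, strongest pitch first.
    """
    if len(classes) > len(chroma):
        raise ValueError("more classes than pitches")
    if by_power:
        power = total_power(chroma)
        pitches = sorted(range(len(chroma)), key=lambda p: power[p], reverse=True)
    else:
        pitches = list(range(len(chroma)))
    return dict(zip(pitches, classes))


# Toy chromagram: 12 pitches, 3 time steps; pitches 9 and 4 dominate.
chroma = [[0.1, 0.1, 0.1] for _ in range(12)]
chroma[9] = [0.9, 0.8, 0.9]
chroma[4] = [0.5, 0.6, 0.5]

classes = [207, 985]  # hypothetical ImageNet class ids
default_map = map_classes_to_pitches(classes, chroma)
power_map = map_classes_to_pitches(classes, chroma, by_power=True)
```

With the toy data, the default mapping ties the two classes to the first two pitches of the scale, while the power-based mapping ties them to the two pitches that carry the most energy in the song.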

Arguments

short	long	default	range	help
`-h`	`--help`			show this help message and exit
`-s`	`--song`			path to input audio file `[REQUIRED]`
`-r`	`--resolution`	`512`	`{128,256,512}`	output video resolution
`-d`	`--duration`	`None`	`int`	output video duration
`-ps`	`--pitch_sensitivity`	`220`	`[200-295]`	controls the sensitivity of the class vector to changes in pitch
`-ts`	`--tempo_sensitivity`	`0.25`	`[0.05-0.8]`	controls the sensitivity of the noise vector to changes in volume and tempo
`-c`	`--classes`	`None`		manually specify `[--num_classes]` ImageNet classes
`-n`	`--num_classes`	`12`	`[1-12]`	number of unique classes to use
`-j`	`--jitter`	`0.5`	`[0-1]`	controls jitter of the noise vector to reduce repetition
`-fl`	`--frame_length`	`512`	`i*2^6`	number of audio frames per video frame in the output
`-t`	`--truncation`	`1`	`[0.1-1]`	BigGAN truncation parameter controls complexity of structure within frames
`-sf`	`--smooth_factor`	`20`	`[10-30]`	controls interpolation between class vectors to smooth rapid fluctuations
`-bs`	`--batch_size`	`20`	`int`	BigGAN batch_size
`-o`	`--output_file`			name of output file stored in `output/`, defaults to `[--song]` path base_name
	`--use_last_vectors`	`False`	`bool`	set flag to use previous saved class/noise vectors
	`--use_last_classes`	`False`	`bool`	set flag to use previous classes
	`--sort_pitches`	`False`	`bool`	set flag to sort pitches by the ordering of classes
`-l`	`--lyrics`	`None`		path to lyrics file; setting `[--lyrics LYRICS]` computes classes by semantic similarity under BERT encodings
`-e`	`--encoding`	`sbert`	`{sbert,doc2vec}`	controls choice of sentence embeddings technique
`-es`	`--encoding_strategy`	`None`	`{random,best,ransac}`	controls strategy for choosing classes: `[-e sbert]` can use `best` or `random` while `[-e doc2vec]` can use `ransac`
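
To see why a larger `--frame_length` reduces runtime, the sketch below checks the `i*2^6` constraint and estimates how many video frames a clip needs. It rests on two assumptions that the table does not confirm: that "audio frames" means audio samples, and that the audio is loaded at 22050 Hz. The function names are hypothetical.

```python
import math


def check_frame_length(frame_length):
    """Accept only positive multiples of 2**6 = 64, per the `i*2^6` range."""
    if frame_length <= 0 or frame_length % 64 != 0:
        raise ValueError("frame_length must be a positive multiple of 64")
    return frame_length


def video_frame_count(duration_s, frame_length, sample_rate=22050):
    """Estimate the number of video frames for a clip.

    Assumes one video frame is generated per `frame_length` audio samples
    and that the audio is sampled at `sample_rate` Hz (both assumptions).
    """
    samples = duration_s * sample_rate
    return math.ceil(samples / check_frame_length(frame_length))


frames_default = video_frame_count(30, 512)   # 30 s clip, default frame length
frames_doubled = video_frame_count(30, 1024)  # same clip, doubled frame length
```

Under these assumptions, doubling the frame length roughly halves the number of images BigGAN has to generate, at the cost of a lower output frame rate.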

Acknowledgments

Thanks to Matt Siegelman for providing the inspiration as well as a boilerplate for the project.
