Giter Club home page Giter Club logo

text-to-music's Introduction

Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task

Model description

This language-music model takes BART-base fine-tunes on 282,870 English text-music pairs, where all scores are represented in ABC notation. It was introduced in the paper Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task by Wu et al. and released in huggingface.

It is capable of generating complete and semantically consistent sheet music directly from descriptions in natural language based on text. To the best of our knowledge, this is the first model that achieves text-conditional symbolic music generation which is trained on real text-music pairs, and the music is generated entirely by the model and without any hand-crafted rules.

Intended uses & limitations

You can use this model for text-conditional music generation. All scores generated by this model can be written on one stave (for vocal solo or instrumental solo) in standard classical notation, and are in a variety of styles, e.g., blues, classical, folk, jazz, pop, and world music. The generated tunes are in ABC notation, and can be converted to sheet music or audio using this website, or this software.

Its creativity is limited, can not perform well on tasks requiring a high degree of creativity (e.g., melody style transfer), and it is input-sensitive. For more information, please check our paper.

How to use

  1. Install dependencies:
torch                        1.9.1+cu111
Unidecode                    1.3.4
samplings                    0.1.7
transformers                 4.18.0
  1. Set the input text input_text.txt for conditional music generation.
This is a traditional Irish dance music.
Note Length-1/8
Meter-6/8
Key-D

You can use the following types of text as prompts: 1) musical analysis (e.g., tonal analysis and harmonic analysis), 2) meta-information (e.g., key and meter), 3) the context in which the piece was composed (e.g., history and story), and 4) subjective perceptions (e.g., sentiment and preference).

Of course, you can also mix several types of descriptions together. In the folder output_tunes we provided some samples of inputs and outputs.

  1. Run the script run_inference.py. When running a script for the first time, the downloaded files will be cached for future reuse.
python run_inference.py -num_tunes 3 -max_length 1024 -top_p 0.9 -temperature 1.0 -seed 0
  1. Enjoy tunes in the folder output_tunes! If you want to convert these ABC tunes to MIDI or audio, please refer to Intended uses & limitations.
X:1
L:1/8
M:6/8
K:D
 A | BEE BEE | Bdf edB | BAF FEF | DFA BAF | BEE BEE | Bdf edB | BAF DAF | FED E2 :: A |
 Bef gfe | faf edB | BAF FEF | DFA BAF | Bef gfe | faf edB | BAF DAF | FED E2 :|

X:2
L:1/8
M:6/8
K:D
 A |: DED F2 A | d2 f ecA | G2 B F2 A | E2 F GFE | DED F2 A | d2 f ecA | Bgf edc |1 d3 d2 A :|2
 d3 d2 a || a2 f d2 e | f2 g agf | g2 e c2 d | e2 f gfe | fed gfe | agf bag | fed cde | d3 d2 a |
 agf fed | Adf agf | gfe ecA | Ace gfe | fed gfe | agf bag | fed cde | d3 d2 ||

X:3
L:1/8
M:6/8
K:D
 BEE BEE | Bdf edB | BAF FEF | DFA dBA | BEE BEE | Bdf edB | BAF FEF |1 DED DFA :|2 DED D2 e |:
 faf edB | BAF DFA | BAF FEF | DFA dBA | faf edB | BAF DFA | BdB AFA |1 DED D2 e :|2 DED DFA ||

Usage

optional arguments:
  -h, --help            show this help message and exit
  -num_tunes NUM_TUNES  the number of independently computed returned tunes
  -max_length MAX_LENGTH
                        integer to define the maximum length in tokens of each
                        tune
  -top_p TOP_P          float to define the tokens that are within the sample
                        operation of text generation
  -temperature TEMPERATURE
                        the temperature of the sampling operation
  -seed SEED            seed for randomstate

BibTeX entry and citation info

@inproceedings{
wu2023exploring,
title={Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task}, 
author={Shangda Wu and Maosong Sun},
booktitle={The AAAI-23 Workshop on Creative AI Across Modalities},
year={2023},
url={https://openreview.net/forum?id=QmWXskBhesn}
}

text-to-music's People

Contributors

sander-wood avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.