
remi's Introduction

REMI

Authors: Yu-Siang Huang, Yi-Hsuan Yang

Paper (arXiv) | Blog | Audio demo (Google Drive) | Online interactive demo

REMI, which stands for REvamped MIDI-derived events, is a new event representation we propose for converting MIDI scores into text-like discrete tokens. Compared to the MIDI-like event representation adopted in existing Transformer-based music composition models, REMI provides sequence models with a metrical context for modeling the rhythmic patterns of music. Using REMI as the event representation, we train a Transformer-XL model to generate minute-long Pop piano music with expressive, coherent and clear structure of rhythm and harmony, without needing any post-processing to refine the result. The model also provides controllability over local tempo changes and chord progression.
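For intuition, a REMI sequence interleaves metrical tokens (Bar, Position) with tempo, chord and note tokens, each encoded as a "Name_Value" string. The snippet below is an illustrative sketch only; the exact vocabulary (velocity/duration bins, chord spelling) is defined by the implementation and may differ.

# An illustrative (not exhaustive) REMI-style token sequence for one bar.
# The event names follow the paper; the specific values here (chord spelling,
# velocity/duration bin indices) are hypothetical.
bar_tokens = [
    'Bar_None',
    'Position_1/16', 'Tempo Class_mid', 'Tempo Value_30',
    'Position_1/16', 'Chord_C:maj',
    'Position_1/16', 'Note Velocity_9', 'Note On_52', 'Note Duration_7',
    'Position_9/16', 'Note Velocity_12', 'Note On_60', 'Note Duration_3',
]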

Citation

@inproceedings{10.1145/3394171.3413671,
  author = {Huang, Yu-Siang and Yang, Yi-Hsuan},
  title = {Pop Music Transformer: Beat-Based Modeling and Generation of Expressive Pop Piano Compositions},
  year = {2020},
  isbn = {9781450379885},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3394171.3413671},
  doi = {10.1145/3394171.3413671},
  pages = {1180–1188},
  numpages = {9},
  location = {Seattle, WA, USA},
  series = {MM '20}
}

Getting Started

Install Dependencies

  • python 3.6 (recommend using Anaconda)
  • tensorflow-gpu 1.14.0 (pip install tensorflow-gpu==1.14.0)
  • miditoolkit (pip install miditoolkit)

Download Pre-trained Checkpoints

We provide two pre-trained checkpoints, REMI-tempo-checkpoint and REMI-tempo-chord-checkpoint, for generating samples.

Obtain the MIDI Data

We provide the MIDI files, including local tempo changes and estimated chords (5 MB).

  • data/train: 775 files used for training models
  • data/evaluation: 100 files (prompts) used for the continuation experiments

Generate Samples

See main.py as an example:

from model import PopMusicTransformer
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

def main():
    # declare model
    model = PopMusicTransformer(
        checkpoint='REMI-tempo-checkpoint',
        is_training=False)
        
    # generate from scratch
    model.generate(
        n_target_bar=16,
        temperature=1.2,
        topk=5,
        output_path='./result/from_scratch.midi',
        prompt=None)
        
    # generate continuation
    model.generate(
        n_target_bar=16,
        temperature=1.2,
        topk=5,
        output_path='./result/continuation.midi',
        prompt='./data/evaluation/000.midi')
        
    # close model
    model.close()

if __name__ == '__main__':
    main()

Convert MIDI to REMI

You can find out how to convert MIDI messages into REMI events in midi2remi.ipynb.
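As a quick alternative to the notebook, the sketch below reuses the model's own extract_events and event2word (the same calls used internally when handling continuation prompts) to inspect the REMI events of a single MIDI file; it assumes, as in model.py, that extract_events accepts a path to a MIDI file.

# A minimal sketch for inspecting REMI events, assuming extract_events takes a
# MIDI path as it does for continuation prompts in model.py.
from model import PopMusicTransformer

model = PopMusicTransformer(checkpoint='REMI-tempo-checkpoint', is_training=False)
events = model.extract_events('./data/evaluation/000.midi')
for e in events[:10]:
    print(e.name, e.value)
# map events to vocabulary indices, as the generation code does
words = [model.event2word['{}_{}'.format(e.name, e.value)] for e in events]
model.close()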

FAQ

1. How to synthesize the audio files (e.g., mp3)?

We strongly recommend using a DAW (e.g., Logic Pro) to open/play the generated MIDI files. Alternatively, you can use FluidSynth with a SoundFont; however, it may not handle the tempo changes correctly (see fluidsynth/issues/141).
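As a rough sketch (assuming fluidsynth is installed and on your PATH, and that you have a SoundFont file; the paths below are placeholders), rendering a generated MIDI file to audio from Python could look like this:

# Render a generated MIDI file with FluidSynth; 'soundfont.sf2' is a placeholder.
import subprocess

subprocess.run([
    'fluidsynth', '-ni',
    'soundfont.sf2',                  # your SoundFont
    './result/from_scratch.midi',     # a file generated by main.py
    '-F', './result/from_scratch.wav',
    '-r', '44100',
], check=True)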

2. What is the function of the inputs "temperature" and "topk"?

Temperature-controlled stochastic sampling with top-k filtering is used to generate tokens from the trained language model. You can find more details in the reference paper CTRL, Section 4.1 (Sampling).

It is worth noting that the sampling method used for generation is critical to the quality of the output and remains a research topic worthy of further exploration.
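Conceptually, the sampling step looks roughly like the sketch below (a simplified NumPy version, not the exact code in model.py): the logits are divided by the temperature, then one token is drawn from the renormalized top-k candidates.

# A simplified sketch of temperature-controlled top-k sampling.
import numpy as np

def temperature_topk_sample(logits, temperature=1.2, topk=5):
    logits = np.asarray(logits, dtype=np.float64) / temperature  # >1 flattens, <1 sharpens
    top_idx = np.argsort(logits)[-topk:]                         # keep the k best candidates
    probs = np.exp(logits[top_idx] - np.max(logits[top_idx]))
    probs /= probs.sum()                                         # renormalize over the top-k
    return int(np.random.choice(top_idx, p=probs))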

3. How to finetune with my personal MIDI data?

Please see the issue "Training on custom MIDI corpus?".

Acknowledgement


remi's Issues

Generated music quality with finetuning

Thank you for the amazing work on this project!

However, I have an issue with the generated music quality after finetuning on the MAESTRO dataset. For more information, I was using the "REMI-tempo-chord-checkpoint" as the base model and trained it for 5 epochs over the whole dataset. My hypothesis after reading the paper is that the time signatures of songs in the MAESTRO dataset are not supported by the REMI encoding method.

Do you have any insights about this problem?

.pb file output

Hello,

Is it possible to save a .pb file instead of a .ckpt when training is 'complete'? I am looking into porting this model to ONNX and PyTorch and would need a .pb file to accomplish this. Many thanks!

I can't train from scratch after removing line 99

I removed this line but I can't train; it says: "self._traceback = tf_stack.extract_stack_for_node(self._c_op)"

You can see #99 in model.py which is used to restore the pre-trained checkpoint. You can remove this line if you want to train from scratch.

Originally posted by @remyhuang in #17 (comment)

module 'miditoolkit.pianoroll.parser' has no attribute 'get_pianoroll'

Hi, this is a great project, but I ran into an error in chord_recognition.py:

module 'miditoolkit.pianoroll.parser' has no attribute 'get_pianoroll' (line 34)

Isn't the function here supposed to be miditoolkit.pianoroll.parser.notes2pianoroll instead of miditoolkit.pianoroll.parser.get_pianoroll?
Thank you!

self.group_size*2 should be self.group_size in prepare_data()

Thank you for providing the code for finetuning the model on custom midi datasets!

I notice that at line 251 in model.py, the "step" argument of np.arange is set to self.group_size*2. I think this leads to skipping half of the segments in pairs. Should it be just self.group_size instead of self.group_size*2? (See the quick check after the snippet below.)

for i in np.arange(0, len(pairs)-self.group_size, self.group_size*2):
    data = pairs[i:i+self.group_size]
    if len(data) == self.group_size:
        segments.append(data)
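For reference, a quick check of the start indices each step size produces (here for a hypothetical file with 30 (x, y) pairs) shows the *2 step skipping half of the candidate segments:

# Compare segment start indices for the two step sizes (group_size = 5, 30 pairs).
import numpy as np

group_size, num_pairs = 5, 30
print(np.arange(0, num_pairs - group_size, group_size * 2))  # [ 0 10 20]       -> 3 segments
print(np.arange(0, num_pairs - group_size, group_size))      # [ 0  5 10 15 20] -> 5 segments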

questions about your data preprocessing

Hi, your work is really amazing! I have some questions about the prepare_data() function at line 252.
# abandon the last
for i in np.arange(0, len(pairs)-self.group_size, self.group_size*2):
    data = pairs[i:i+self.group_size]
    if len(data) == self.group_size:
        segments.append(data)

  1. Why do you abandon the last pairs elements in one MIDI file? Does it improve the final result?
  2. In the for loop, the third parameter is self.group_size*2, which means 5 pairs are skipped on each loop iteration over one MIDI file. For example, if a "pairs" variable has shape [30, 2, 512], only pairs[0:5], pairs[10:15] and pairs[20:25] will be added to the final "segments" variable. Could you tell me why you did not feed the whole pairs to "segments"?

Thank you!

midi2remi does not run

Hi, I have a question: why does my midi2remi not run? When midi2remi runs, this problem appears: FileNotFoundError: [Errno 2] No such file or directory: '. \pop data'. I hope you can help solve it.

The code for evaluation(downbeat and so on)

Hi,

Thank you for the information about the dataset; I can generate pretty good results! Now I would like to evaluate the beat std, downbeat std and downbeat salience. After doing some research, I think you might use the madmom package to calculate the three scores (I am not sure), but I am not familiar with that package. Therefore, could you please share your evaluation code with me?


Some questions about "Controllable CHORD and TEMPO"

Thanks for your amazing work! When I read your paper and blog, I felt confused about the details of the method for controlling the chord and tempo of the generated results. I cannot find where to input or control the chords (which I want the output to have) in the code. It would be helpful if you could give me some suggestions.

Please help with REMI to MIDI conversion

Hey guys,

I would really appreciate it if you could help me make my Colab implementation of the REMI encoding work.

Here is the link.
https://github.com/asigalov61/DOREMI/blob/main/DOREMI.ipynb

I have no idea how to convert REMI / REMI model output back to MIDI. I would really appreciate any advice or help that you can offer.

Your proposal/encoding is very interesting and a lot of people want to try it, but because you did not provide a full implementation/Colab, very few of us can evaluate or use your work.

Please help.

Thank you so much.

Error in training from scratch

When I try to train the model from scratch, errors occur when initializing some vars. The error is : Failed precondition: Error while reading resource variable transformer/layer_0/rel_attn/layer_normalization/gamma from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/transformer/layer_0/rel_attn/layer_normalization/gamma/N10tensorflow3VarE does not exist.

something is wrong! Tempo Value_57

Hello, I changed the original training music to my own and this problem appeared. How can I solve it? Thank you very much for your answer!

MIDI files end up with their last note pushed up a position

Hello, thank you for publishing this repo. I noticed there's a bug that occurs at the end of a MIDI file when encoding to REMI.
The very last note, or multiple notes that occur after a certain position in the last bar, will be "pushed up" a position, making them off rhythmically. I'm not sure how to fix this. I assume there must be an issue in the item2event function, but I'm not very experienced with numpy.

questions about model.py

I ran into an error in model.py.
Here is my code:
from model import PopMusicTransformer
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
model = PopMusicTransformer(checkpoint='REMI-tempo-checkpoint', is_training=False)

And it raises:
ValueError: Variable transformer/r_w_bias already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

      File "C:\Users\ailab502\anaconda3\envs\REMI\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
        self._traceback = tf_stack.extract_stack()
      File "C:\Users\ailab502\anaconda3\envs\REMI\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
        op_def=op_def)
      File "C:\Users\ailab502\anaconda3\envs\REMI\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "C:\Users\ailab502\anaconda3\envs\REMI\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
        op_def=op_def)
      File "C:\Users\ailab502\anaconda3\envs\REMI\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 2023, in variable_v2
        shared_name=shared_name, name=name)

How can I fix this error?

style question

Hi remy,

 Thanks for your amazing work.
 In your blog, I see different music styles. When you finish training, how do you control what style your model will generate?
 In my understanding, it depends on the style of the training dataset; if we would like to generate a different music style, we must use a different dataset to retrain the model. Am I right?

Best,

More detailed information about finetuning data format

Thank you very much for your work and also especially for providing the finetune code!

However, I intend to finetune the model with data not in MIDI but in a format where I have information about notes in the form of the following properties:

  • for each note: Position, Pitch, Velocity, Duration (in positions)
  • additionally: overall time signature and chord tokens + their positions.

So what I am attempting is to convert these events into a notation that the finetune method can work with. However, from the paper I am unfortunately unable to discern the meaning of each Event the method extracts from a MIDI file. Here's an example of the Events I get from a MIDI file when running finetune:

 Event(name=Bar, time=None, value=None, text=1),
 Event(name=Position, time=0, value=1/16, text=0),
 Event(name=Tempo Class, time=0, value=mid, text=None),
 Event(name=Tempo Value, time=0, value=30, text=None),

 Event(name=Position, time=960, value=9/16, text=960), 
 Event(name=Note Velocity, time=960, value=9, text=38/36),
 Event(name=Note On, time=960, value=52, text=52),
 Event(name=Note Duration, time=960, value=63, text=5255/3840), ...

I understand much of it but the meaning of a few bits and pieces is unclear to me:

  • lines 3-4: Tempo Class mid is clear, but what does Tempo Value: value = 30 mean, 30 BPM?
  • line 6: What does text=38/36 mean with regard to velocity?
  • line 7: What does value/text = 52 mean?
  • line 8: What does value=63 mean, 63 32nd-note multiples? What does text=5255/3840 mean?

Kind regards and thank you in advance!
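One observation that may help: the text fields above look like raw/quantized value pairs, and the numbers are consistent with simple uniform bins (velocity in steps of 4 up to 128, duration in 60-tick multiples of a 32nd note up to 3840 ticks), while the Note On value is most likely just the MIDI pitch number. The sketch below reproduces the two examples under that assumption; the repository's actual lookup tables may differ.

# Hypothetical quantization tables, chosen only because they reproduce the
# example values above; the repo's actual tables may differ.
import numpy as np

velocity_bins = np.linspace(0, 128, 32 + 1, dtype=int)  # 0, 4, 8, ..., 128
duration_bins = np.arange(60, 3841, 60)                 # 64 multiples of a 32nd note (60 ticks)

def nearest_bin(raw, bins):
    idx = int(np.argmin(np.abs(bins - raw)))
    return idx, int(bins[idx])

print(nearest_bin(38, velocity_bins))    # (9, 36)    -> Note Velocity value=9,  text=38/36
print(nearest_bin(5255, duration_bins))  # (63, 3840) -> Note Duration value=63, text=5255/3840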

Training on custom MIDI corpus?

Hello! This is a great project, and I was able to get it running immediately. Thank you for sharing it!

Is there a way to train the model from scratch -- and/or fine-tune one of your published models -- on my own corpus of MIDI files?

Errors with some seeds/stems MIDIs

Hey guys,

Reporting an error that happens when you run the continuation function:

Is there a specific requirement for seed/stems files for your models?


KeyError                                  Traceback (most recent call last)
in <module>()
      5     topk=5,
      6     output_path='./result/continuation.midi',
----> 7     prompt='/content/remi/prompt.mid')
      8
      9 # close model

1 frames
/content/remi/model.py in <listcomp>(.0)
    139         if prompt:
    140             events = self.extract_events(prompt)
--> 141             words = [[self.event2word['{}_{}'.format(e.name, e.value)] for e in events]]
    142             words[0].append(self.event2word['Bar_None'])
    143         else:

KeyError: 'Note Velocity_31'

Training from scratch for Classical Piano

Hey guys,

Another tiny issue here...

Is there any way you could share or help write training-from-scratch code for classical piano (think MAESTRO or similar datasets)?

I know you have suggested adjusting the parameters of the training data (which is useful), but there is no code. You only provide fine-tuning code.

So if you can do this/help with this, I will be very grateful to you and your help will be much appreciated.

Thank you again

Issues when training with other datasets

Hi,

I would like to train the Transformer-XL with other MIDI datasets such as MAESTRO. However, when I convert the MIDI to REMI and then convert the REMI back to MIDI, the rhythm has some inconsistencies. In addition, only the first track of a multi-track piano MIDI is obtained, which causes some polyphony problems.

Could you give me any advice on where I can find another suitable piano MIDI dataset without loss after REMI conversion? Thanks!

Issues with evaluation (std and salience for beat/downbeat)

Hi! I was trying to reproduce the evaluation part of the code according to the paper (see the attached picture) Pop Music Transformer: Beat-Based Modeling and Generation of Expressive Pop Piano Compositions:

[screenshot of the evaluation table from the paper]

However, there must be some mistake in my calculation, because there are perceptible differences between my results for beat std and downbeat salience and the ones in the paper. I also had difficulty reproducing the downbeat std part because of the (perhaps?) differing descriptions in the paper and the madmom documentation. The data presented in my results is based on the train set (Remi/data/train) and madmom, and I think I converted the .midi files into correct .wav files:

[screenshots of the reported beat/downbeat results]

I've viewed https://madmom.readthedocs.io/en/latest/modules/features/downbeats.html for related information but still failed to solve it by myself. Therefore, could you please share the evaluation part with me? Thank you so much!

Training Model from Scratch

Hello, thank you for making your source code available for this project. Is it possible to train the model from scratch using our own midi dataset without using one of the shared checkpoints? I've tried the finetune.py script you've included but the resulting model is still too biased toward the initial training set for what I'm trying to do. Thanks again for sharing your work in this space.

Questions about the training data

Hello,

I have tried to train Transformer-XL from scratch. However, the generated results are not as good as yours. The theme of a generated MIDI file is not consistent, meaning the latest slice of the sequence sounds different from the prompt. Therefore, could you tell me whether the dataset you provide is the whole dataset, or whether you used another, bigger dataset for pre-training and used the provided one only for finetuning?

My experiment setting is:
n_layers: 12
x_len: 512
m_len: 512
ff: 2048

And could you tell me the test-set cross-entropy loss you got in your experiment?

Thank you!

Checkpoints saved with a NAN value

Hi, thanks for your amazing work.

I'm trying to finetune your model on a smaller dataset (292 MIDI files). I'm using the REMI-tempo-chord-checkpoint as the base model. However, my checkpoints are saved with a NaN value from the first epoch onwards (e.g., model-000-nan.data).

I also get these two warning messages during the training process:

/miniconda3/envs/tfEnv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
/miniconda3/envs/tfEnv/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars

I first checked whether something went wrong during the event extraction. It seems that the extraction worked well: len(all_events) = 292, which is the size of my dataset. Same with all_words.

However, the segments length is only 3, which means training_data = 3 and num_batches = 0.

So I guess something went wrong in that part, but I don't know how to fix it:

# to training data
self.group_size = 5
segments = []
for words in all_words:
    pairs = []
    for i in range(0, len(words)-self.x_len-1, self.x_len):
        x = words[i:i+self.x_len]
        y = words[i+1:i+self.x_len+1]
        pairs.append([x, y])
    pairs = np.array(pairs)

    # abandon the last
    for i in np.arange(0, len(pairs)-self.group_size, self.group_size*2):
        data = pairs[i:i+self.group_size]
        if len(data) == self.group_size:
            segments.append(data)
segments = np.array(segments)

Does anyone have an idea how to fix it? Thanks a lot
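As a rough sanity check (a sketch mirroring the pairing/grouping logic quoted above, not the repo's code), a single file needs on the order of group_size * x_len tokens before it contributes even one segment, so a corpus of short pieces can easily end up with only a handful of segments and num_batches = 0:

# How many segments one file of a given token count contributes,
# mirroring the pairing/grouping logic above (x_len = 512, group_size = 5).
import numpy as np

def num_segments(num_tokens, x_len=512, group_size=5):
    num_pairs = len(range(0, num_tokens - x_len - 1, x_len))
    starts = np.arange(0, num_pairs - group_size, group_size * 2)
    return int(sum(1 for s in starts if s + group_size <= num_pairs))

print(num_segments(2000))   # 0 -> short files contribute nothing
print(num_segments(6000))   # 1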

problem when using REMI-tempo-chord-checkpoint

Thanks for this great work! I can generate samples with REMI-tempo-checkpoint successfully. But when I use REMI-tempo-chord-checkpoint, the model shows the errors below:
[screenshots of the error messages]

Is there anything I did wrong? Thanks!

Data Augmentation Used?

After looking at the repository closely, I failed to find any data augmentation (such as transposition and time stretching). So, is there a way I can add this in or enable this feature somehow?

Thank you very much!

Hey guys!

Just wanted to thank you for this awesome repo/paper/approach. I think you are really onto something here. I will test soon and let you know what I think if you are interested.

The only thing I wanted to humbly recommend is to create a nice Google Colab for your idea so that it can easily be tried and evaluated. This is what I am going to do with your code so that everyone can easily check it out. I hope you do not mind.

Otherwise, great job! Thank you again :)))

AS
