MusicScore-script

Official toolkit for the paper "MusicScore: A Dataset for Music Score Modeling and Generation".

Yuheng Lin, Zheqi Dai and Qiuqiang Kong

This codebase contains two parts:

Two-step scripts for cleaning sheet music score images; see ./data_process/.
Evaluation scripts for the music score generation experiment, which reproduce the Fréchet Inception Distance (FID) scores in Section 4.2 of the paper; see ./evaluation/.

The dataset itself is not included in this codebase; please download it through this portal.

Data process

This codebase provides two filtering steps for sheet music score images.

1. Color depth filter

We judge whether an image is high quality by its color depth. A color depth of 1 bit corresponds to black-and-white images, while a color depth of 8 or 16 bits corresponds to color images. The color depth filter script is provided at data_process/color_depth_filter/color_depth.py and supports multiprocessing.
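
As a rough illustration of the idea (not the code in color_depth.py), the sketch below reads the bit depth straight from a PNG header and groups files by it. It assumes PNG inputs; the function names `png_bit_depth` and `group_by_bit_depth` are hypothetical, and which depth counts as "high quality" is left to the caller. Note that PNG stores the bit depth per sample, not per pixel.

```python
import struct
from collections import defaultdict
from multiprocessing import Pool

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_bit_depth(path):
    """Return (bit_depth, color_type) from a PNG's IHDR chunk, or None if not a PNG."""
    with open(path, "rb") as f:
        header = f.read(26)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR",
    # then width (4), height (4), bit depth (1), color type (1).
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    bit_depth, color_type = struct.unpack(">BB", header[24:26])
    return bit_depth, color_type


def group_by_bit_depth(paths, workers=1):
    """Group file paths by bit depth; files that are not PNGs go under the key None."""
    paths = list(paths)
    if workers > 1:
        # Header reads are independent, so they can be spread over processes.
        with Pool(workers) as pool:
            infos = pool.map(png_bit_depth, paths)
    else:
        infos = [png_bit_depth(p) for p in paths]
    groups = defaultdict(list)
    for path, info in zip(paths, infos):
        groups[info[0] if info else None].append(path)
    return dict(groups)
```

With `workers` greater than 1, call `group_by_bit_depth` from under an `if __name__ == "__main__":` guard so that it also works on platforms that spawn worker processes.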

2. Non-score filter

We implement a classification model to separate score images from non-score images; see data_process/non_cover_filter/cover.ipynb. The notebook contains:

Training and inference scripts for the non-score filter model.
A processing script for restoring hd_data after applying the classification model.
An evaluation script for the non-score filter model.

The training and test datasets are located in ./cover_data/ and contain 450 and 50 images, respectively. We also provide a trained model checkpoint that can be loaded for inference.

Our score/non-score classification achieved 98% accuracy on our test dataset. The evaluation metrics are presented in the table below:

Class	Precision	Recall	F1-score	Support
Non-score	0.9524	1.0000	0.9756	20
Score	1.0000	0.9667	0.9831	30
Accuracy		0.9800		50
Macro avg	0.9762	0.9833	0.9793	50
Weighted avg	0.9810	0.9800	0.9801	50
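
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below assumes the confusion counts implied by the reported precision, recall, and support (all 20 non-score images classified correctly, and 29 of 30 score images classified correctly, with the remaining one predicted as non-score). These counts are inferred from the table and are not stated in the notebook.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed counts, inferred from the table above.
non_score = class_metrics(tp=20, fp=1, fn=0)  # one score image predicted as non-score
score = class_metrics(tp=29, fp=0, fn=1)      # that same image is a miss for "score"

total = 50
accuracy = (20 + 29) / total  # correctly classified images over all test images

# Macro average weights both classes equally; weighted average uses class support.
macro = tuple((a + b) / 2 for a, b in zip(non_score, score))
weighted = tuple((20 * a + 30 * b) / total for a, b in zip(non_score, score))

for name, row in [("Non-score", non_score), ("Score", score),
                  ("Macro avg", macro), ("Weighted avg", weighted)]:
    print(name, [round(v, 4) for v in row])
print("Accuracy", round(accuracy, 4))
```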

Add-on:

We also provide a multiprocessing-enhanced pdf2img script, which we used to slice music score PDF files into single-page images. The script can be adapted to any task that requires slicing PDFs into images.

Evaluation

In the paper, we conduct a music score generation experiment, which is a text-driven image generation task. We fine-tuned Stable Diffusion 2.0 starting from the stable-diffusion-2-base checkpoint. In Section 4.2, we evaluated the model by computing FID-k scores on three subsets, where k is the number of samples. We provide an inference script for text-to-score generation in t2i_eval.py. Example usage:

# --scale: choose from ["MS-400","MS-14k","MS-200k"]
python evaluation/t2i_eval.py \
    --scale "MS-400" \
    --data_dir /path/to/your/real_images

The FID calculation requires the pytorch-fid library, which can be installed with pip install pytorch-fid. For our use case, run:

python -m pytorch_fid         \
    /path/to/real_images      \
    /path/to/generated_images \
    --device cuda:0           \
    --num-workers 14
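
For an FID-k style comparison, the image folders need to hold k samples each. The sketch below is one possible way to prepare such folders and assemble the command shown above; it is an assumption about workflow, not the procedure used in the paper. The helper names `sample_k_images` and `fid_command`, the seed, and the accepted file suffixes are all hypothetical.

```python
import random
import shutil
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def sample_k_images(src_dir, dst_dir, k, seed=0):
    """Copy k randomly chosen images from src_dir into dst_dir; return their names."""
    files = sorted(p for p in Path(src_dir).iterdir()
                   if p.suffix.lower() in IMAGE_SUFFIXES)
    if k > len(files):
        raise ValueError(f"requested {k} images but only {len(files)} found")
    # A seeded generator keeps the chosen subset reproducible across runs.
    chosen = random.Random(seed).sample(files, k)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for p in chosen:
        shutil.copy(p, dst / p.name)
    return sorted(p.name for p in chosen)


def fid_command(real_dir, generated_dir, device="cuda:0", num_workers=14):
    """Build the pytorch_fid command line as an argument list."""
    return [sys.executable, "-m", "pytorch_fid",
            str(real_dir), str(generated_dir),
            "--device", device, "--num-workers", str(num_workers)]
```

The returned list can be passed to `subprocess.run(fid_command(real_dir, generated_dir), check=True)` once both folders are populated.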

In our experiments, all inference is performed at 512x512 resolution. We use the DDIM sampler with 250 sampling steps and Classifier-Free Guidance with a guidance scale of 4.0. The evaluation results reported in our paper are shown in the table below.

Subset	MusicScore-400	MusicScore-14k	MusicScore-200k
FID-8	114.65	297.60	294.76
FID-16	85.81	221.42	314.06
FID-32	84.33	255.00	264.02
FID-64	74.46	229.16	261.28
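
For reference, the Classifier-Free Guidance setting used for these results can be written out as follows. This assumes the guidance scale s is applied in the form commonly used by Stable Diffusion implementations, where c is the text prompt embedding and \varnothing is the empty (unconditional) prompt; the notation is ours and does not come from the paper.

```latex
\[
\hat{\epsilon}_\theta(x_t, c)
  = \epsilon_\theta(x_t, \varnothing)
  + s \,\bigl( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \bigr),
  \qquad s = 4.0
\]
```

Under this form, s = 1 recovers the plain conditional prediction, and larger values push each denoising step further toward the prompt.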

Sample generated results are shown in the figure below.

Prompts (from left to right):

a music score, instrumentation is violin, key is A major

a music score, instrumentation is violin

a music score, instrumentation is piano, key is A major

a music score, instrumentation is piano

License

The data, code and model weights are licensed under CC-BY 4.0.

BibTeX

If you use this work, please consider citing it with the following BibTeX entries:

@misc{lin2024musicscore,
      title={MusicScore: A Dataset for Music Score Modeling and Generation},
      author={Yuheng Lin and Zheqi Dai and Qiuqiang Kong},
      year={2024},
      journal={arXiv preprint arXiv:2406.11462},
}

@misc{dai2024msscript,
  author={Zheqi Dai and Yuheng Lin and Qiuqiang Kong},
  title={{MusicScore-script: Data Processing Toolkit}},
  month={June},
  year={2024},
  note={Version 0.1.0},
  howpublished={\url{https://github.com/dzq84/MusicScore-script}},
}
