ICASSP-A2S

Accompanying code for the paper:

  • Lele Liu, Veronica Morfi and Emmanouil Benetos, "Joint Multi-pitch Detection and Score Transcription for Polyphonic Piano Music", in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Canada, June 2021.

Environment setup

This project uses PyTorch and Python 3. It is recommended that you first create a Python 3 virtual environment:

python3 -m venv ICASSP2021-A2S-ENV
source ICASSP2021-A2S-ENV/bin/activate
git clone https://github.com/cheriell/ICASSP-A2S

Run the following command to install the Python packages required by this project:

cd ICASSP-A2S
pip install -r requirements.txt

In this project, we use the MV2H metric (McLeod et al., 2018) to evaluate audio-to-score transcription. Please refer to the original GitHub repository for installation details.
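As a rough sketch (the MV2H README is authoritative and the build steps may change), installation amounts to cloning the repository and compiling it with make; the --MV2H_path flag used below then points at the resulting bin directory:

git clone https://github.com/apmcleod/MV2H.git
cd MV2H
make  # compiles the Java sources into bin/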

Before running, make the shell scripts executable:

chmod +x runme.sh
chmod +x audio2score/utilities/evaluate_midi_mv2h.sh 

Data

We use the MuseSyn dataset for our experiments. To download the dataset, please refer to: MuseSyn: A dataset for complete automatic piano music transcription research.

Running

Please refer to runme.sh for examples of the relevant commands for model training and evaluation. Before you run the script, remember to change the paths at the top of the shell script to where you store your datasets, features, models and the MV2H metric. Uncomment the commands you want to run.
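For example, the top of runme.sh might be configured along these lines (the variable names here are illustrative; use the ones actually defined in the script):

# Illustrative path configuration -- adapt names and values to runme.sh
dataset_folder=/path/to/MuseSyn            # where the MuseSyn dataset is stored
feature_folder=/path/to/MuseSyn/features   # where extracted features are cached
model_folder=/path/to/models               # where model checkpoints are saved
MV2H_path=/path/to/MV2H/bin                # compiled MV2H metric (see above)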

Multi-pitch detection with different time-frequency representations

To train a multi-pitch detection model, use the following command.

python train.py audio2pr --dataset_folder path/to/MuseSyn --feature_folder path/to/MuseSyn/features

For evaluation, run

python test.py audio2pr --dataset_folder path/to/MuseSyn --feature_folder path/to/MuseSyn/features --model_checkpoint model_checkpoint_file
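Here model_checkpoint_file is a checkpoint saved during training. For instance (this path is copied from the issue report further down; your checkpoint name will differ):

python test.py audio2pr --dataset_folder path/to/MuseSyn --feature_folder path/to/MuseSyn/features --model_checkpoint tensorboard_logs/audio2pr-VQT-bins_per_octave=60-n_octaves=8-gamma=20/version_0/checkpoints/epoch=56-valid_loss=43.3211.ckpt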

For different time-frequency representations, please modify the spectrogram settings in the file audio2score/settings.py. The spectrogram settings tested in the paper are listed in metadata/spectrogram_settings.csv.
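To check which configurations were tested before editing the settings (judging by the checkpoint name in the issue report below, a setting comprises parameters such as VQT with bins_per_octave=60, n_octaves=8 and gamma=20), you can inspect the CSV directly:

head metadata/spectrogram_settings.csv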

Audio-to-Score transcription with different score representations

To train a single-task audio-to-score transcription model, run

python train.py audio2score --score_type score_type --dataset_folder path/to/MuseSyn --feature_folder path/to/MuseSyn/features

score_type should be Reshaped or LilyPond.
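For example, to train with the reshaped score representation:

python train.py audio2score --score_type Reshaped --dataset_folder path/to/MuseSyn --feature_folder path/to/MuseSyn/features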

For evaluation, run

python test.py audio2score --score_type score_type --MV2H_path path/to/MV2H/bin --dataset_folder path/to/MuseSyn --feature_folder path/to/MuseSyn/features --model_checkpoint model_checkpoint_file

Joint Transcription

To train a joint transcription model, run

python train.py joint --score_type "Reshaped" --dataset_folder path/to/MuseSyn --feature_folder path/to/MuseSyn/features

For evaluation, run

python test.py joint --score_type "Reshaped" --MV2H_path path/to/MV2H/bin --dataset_folder path/to/MuseSyn --feature_folder path/to/MuseSyn/features --model_checkpoint model_checkpoint_file

Transcription output example

An example set of transcription outputs can be found in the folder output/scores_joint_transcription.

Issues

After training and testing the outputs folder is empty; how can I run inference on my own audio?

After reading your paper, I became interested in reproducing this work.

After training and testing, the outputs folder is empty. Training ended at epoch 56; I then ran test.py and got the evaluation results shown in the attached text below, but the outputs folder is still empty.

I also wonder how to run inference on my own MIDI file and obtain the transcribed music score as output.

(base) ilc@ilc:~/Desktop/workplace/ICASSP2021-A2S$ python test.py audio2pr --dataset_folder ./MuseSyn --feature_folder ./MuseSyn/features --model_checkpoint ./tensorboard_logs/audio2pr-VQT-bins_per_octave=60-n_octaves=8-gamma=20/version_0/checkpoints/epoch=56-valid_loss=43.3211.ckpt
Get train metadata, 4 pianos
Get valid metadata, 4 pianos
Get test metadata, 4 pianos
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
Preparing spectrogram 672/672
Preparing pianoroll 672/672
Preparing spectrogram 80/80
Preparing pianoroll 80/80
Preparing spectrogram 84/84
Preparing pianoroll 84/84
The following callbacks returned in LightningModule.configure_callbacks will override existing callbacks passed to Trainer: ModelCheckpoint
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Get test dataloader
Testing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 458/458 [07:21<00:00, 1.04it/s]


DATALOADER:0 TEST RESULTS
{'logs': {'test_accuracy': 0.7291517294943333,
'test_epoch': 0,
'test_f-score': 0.8169069737195969,
'test_f-score_n_on': 0.6852234803499391,
'test_f-score_n_onoff': 0.4228041814737893,
'test_loss': 89.91925048828125,
'test_precision': 0.9348128437995911,
'test_precision_n_on': 0.8518075491759702,
'test_precision_n_onoff': 0.5432599993745504,
'test_recall': 0.766946617513895,
'test_recall_n_on': 0.631439393939394,
'test_recall_n_onoff': 0.3829577285459639},
'loss': 89.91925048828125,
'test_accuracy': 0.8398997187614441,
'test_epoch': 0.0,
'test_f-score': 0.9031959772109985,
'test_f-score_n_on': 0.8432270884513855,
'test_f-score_n_onoff': 0.664776623249054,
'test_loss': 45.26031494140625,
'test_precision': 0.9279804229736328,
'test_precision_n_on': 0.920455813407898,
'test_precision_n_onoff': 0.7133415937423706,
'test_recall': 0.8941190838813782,
'test_recall_n_on': 0.8031598329544067,
'test_recall_n_onoff': 0.6364219784736633}


Thank you for taking the time to read my question.
