keunwoochoi / music-auto_tagging-keras Goto Github PK

View Code? Open in Web Editor NEW

614.0 32.0 142.0 34.03 MB

Music auto-tagging models and trained weights in keras/theano

License: MIT License

Python 100.00%

music-auto_tagging-keras's Introduction

Music Auto-Tagger

Music auto-tagger using keras

WARNING! Alternatives available

IF YOU WANT A TAGGER, please also look at
- compact_cnn
IF YOU WANT A FEATURE EXTRACTOR, lookg at either
- compact_cnn
- transfer learning music

..because MusicTaggerCNN and MusicTaggerCRNN is based on an old (and a bit incorrect) implementation of Batch Normalization of old Keras (thanks god it worked anyway), it's quite tricky to fix.

Keras Versions

use keras == 1.0.6 for MusicTaggerCNN.
use 1.2 >= keras > 1.0.6 for MusicTaggerCRNN.
use 1.2 >= keras >= 1.1 for compact_cnn.

The prerequisite -- READ IT!

You need keras to run example.py.
- To use your own audio file, you need librosa.
The input data shape is (None, channel, height, width), i.e. following theano convention. If you're using tensorflow as your backend, you should check out ~/.keras/keras.json if image_dim_ordering is set to th, i.e.

"image_dim_ordering": "th",

To use compact_cnn, You need to install Kapre.

Files (1)

For MusicTaggerCNN and MusicTaggerCRNN.

example_tagging.py: tagging example, example_feat_extract.py: feature extraction example
music_tagger_cnn.py, music_tagger_crnn.py: Models

Files (2)

For compact_cnn

main.py for examples.
More info on the sub README.md.

Structures

Left: compact_cnn CNN, music_tager_cnn. Right: music_tagger_crnn

MusicTaggerCNN

5-layer 2D Convolutions
num_parameter: 865,950
AUC score of 0.8654
WARNING with keras >1.0.6, this model does not work properly. Please use MusicTaggerCRNN until it is updated! (FYI: with 3M parameter, a deeper ConvNet showed 0.8595 AUC.)

MusicTaggerCRNN

4-layer 2D Convolutions + 2 GRU
num_parameter: 396,786
AUC score: 0.8662

How was it trained?

Using 29.1s music files in Million Song Dataset
split setting: A repo for split setting for an identical setting.
See papers
The tags are...

['rock', 'pop', 'alternative', 'indie', 'electronic', 'female vocalists', 
'dance', '00s', 'alternative rock', 'jazz', 'beautiful', 'metal', 
'chillout', 'male vocalists', 'classic rock', 'soul', 'indie rock',
'Mellow', 'electronica', '80s', 'folk', '90s', 'chill', 'instrumental',
'punk', 'oldies', 'blues', 'hard rock', 'ambient', 'acoustic', 'experimental',
'female vocalist', 'guitar', 'Hip-Hop', '70s', 'party', 'country', 'easy listening',
'sexy', 'catchy', 'funk', 'electro' ,'heavy metal', 'Progressive rock',
'60s', 'rnb', 'indie pop', 'sad', 'House', 'happy']

Which is the better predictor?

UPDATE: The most efficient computation, use compact_cnn. Otherwise read below.
Training: MusicTaggerCNN is faster than MusicTaggerCRNN (wall-clock time)
Prediction: They are more or less the same.
Memory Usage: MusicTaggerCRNN have smaller number of trainable parameters. Actually you can even decreases the number of feature maps. The MusicTaggerCRNN still works quite well in the case - i.e., the current setting is a little bit rich (or redundant). With MusicTaggerCNN, you will see the performance decrease if you reduce down the parameters.

Therefore, if you just wanna use the pre-trained weights, use MusicTaggerCNN. If you wanna train by yourself, it's up to you. I would use MusicTaggerCRNN after downsizing it to, like, 0.2M parameters (then the training time would be similar to MusicTaggerCNN) in general. To reduce the size, change number of feature maps of convolution layers.

Which is the better feature extractor?

By setting include_top=False, you can get 256-dim (MusicTaggerCNN) or 32-dim (MusicTaggerCRNN) feature representation.

In general, I would recommend to use MusicTaggerCRNN and 32-dim feature as for predicting 50 tags, 256 features actually sound bit too large. I haven't looked into 256-dim feature but only 32-dim features. I thought of using PCA to reduce the dimension more, but ended up not applying it because mean(abs(recovered - original) / original) are .12 (dim: 32->16), .05 (dim: 32->24) - which don't seem good enough.

Probably the 256-dim features are redundant (which then you can reduce them down effectively with PCA), or they just include more information than 32-dim ones (e.g., features in different hierarchical levels). If the dimension size would not matter, it's worth choosing 256-dim ones.

Usage

$ python example_tagging.py
$ python example_feat_extract.py

Result

theano, MusicTaggerCRNN

data/bensound-cute.mp3
[('jazz', '0.444'), ('instrumental', '0.151'), ('folk', '0.103'), ('Hip-Hop', '0.103'), ('ambient', '0.077')]
[('guitar', '0.068'), ('rock', '0.058'), ('acoustic', '0.054'), ('experimental', '0.051'), ('electronic', '0.042')]

data/bensound-actionable.mp3
[('jazz', '0.416'), ('instrumental', '0.181'), ('Hip-Hop', '0.085'), ('folk', '0.085'), ('rock', '0.081')]
[('ambient', '0.068'), ('guitar', '0.062'), ('Progressive rock', '0.048'), ('experimental', '0.046'), ('acoustic', '0.046')]

data/bensound-dubstep.mp3
[('Hip-Hop', '0.245'), ('rock', '0.183'), ('alternative', '0.081'), ('electronic', '0.076'), ('alternative rock', '0.053')]
[('metal', '0.051'), ('indie', '0.028'), ('instrumental', '0.027'), ('electronica', '0.024'), ('hard rock', '0.023')]

data/bensound-thejazzpiano.mp3
[('jazz', '0.299'), ('instrumental', '0.174'), ('electronic', '0.089'), ('ambient', '0.061'), ('chillout', '0.052')]
[('rock', '0.044'), ('guitar', '0.044'), ('funk', '0.033'), ('chill', '0.032'), ('Progressive rock', '0.029')]

And...

More info - CNN:
- on this paper, or blog post.
- Also please take a look on the slide at ismir 2016. It includes some results that are not in the paper.
More info - RNN:
- paper, or blog post

Reproduce the experiment

A repo for split setting for an identical setting of experiments in two papers.
Audio file: find someone around you who happened to have the preview clips. or you have to crawl the files. I would recommend you to crawl your colleagues...

Credits

Compact CNN: will be updated.
Convnet: Automatic Tagging using Deep Convolutional Neural Networks, Keunwoo Choi, George Fazekas, Mark Sandler 17th International Society for Music Information Retrieval Conference, New York, USA, 2016
ConvRNN : Convolutional Recurrent Neural Networks for Music Classification, Keunwoo Choi, George Fazekas, Mark Sandler, Kyunghyun Cho, arXiv:1609.04243, 2016
Test music items are from http://www.bensound.com.

music-auto_tagging-keras's People

Contributors

Stargazers

Watchers

Forkers

fangzheng354 thierry-silbermann suraj-deshmukh techscientist awesome-archive ml-lab ohyeslk arasharchor aironashish bhaveshoswal beckgom teslasloan crosofg piefu pinglmlcv jpauwels cvertex rollingstone kafkafield diggerdu zx3xyy hotzenklotz sam-coln gustibimo sunnycat2013 ycheng30 rikima simmoncn vanova yyd19921214 laventura tspannhw exeex corajr ankitshah009 ericpresas megamanics tobby2002 tbarnier robotiko zhengyu19921215 vansdev nelca agangzz stevenlol mixcoder fx-cc signalogy ashiq24 kirill380 tonytongzhao jliuliu had27 gooliath12 enjoysport2022 allensmile deeplearningsky jiangyangbo flfjfj aihill nanfengpo llmin zhangyang5511 sxpistols omarun nemocpp capt4ce awalin shubhampachori12110095 colinsongf yizong98 jumper3 qqyes blank-wang ramananm philkuz veersingh007 excerebrose treerachai lym0302 hsmoonjohn bienkyo sushantjha8 super-louis fitrialif manojknit gatarelib tedoreve shuvro leihuayi zhwtf rntjrwn yuyangchen0122 hongsixin danirb97 lallilo94 liangtao123 trash-inc cookiekira vital95

music-auto_tagging-keras's Issues

sound file duration

Hi, If I want to extract a music file with duration of 2s.
Can I use compact_cnn to extract the music sound features? I see the duration in your code is 29.s.

Why this error happens? -Input dimension mis-match

When I run example_tagging.py ,
This error message shows up.

ValueError: Input dimension mis-match. (input[0].shape[2] = 96, input[1].shape[2] = 1366)
Apply node that caused the error: Elemwise{Composite{(((i0 - i1) * i2) + i3)}}(input_1, InplaceDimShuffle{x,x,0,x}.0, Elemwise{Composite{(i0 / sqrt((i1 + i2)))}}.0, InplaceDimShuffle{x,x,0,x}.0)
Toposort index: 56
Inputs types: [TensorType(float32, 4D), TensorType(float32, (True, True, False, True)), TensorType(float32, (True, True, False, True)), TensorType(float32, (True, True, False, True))]
Inputs shapes: [(4, 1, 96, 1366), (1, 1, 1366, 1), (1, 1, 1366, 1), (1, 1, 1366, 1)]
Inputs strides: [(524544, 524544, 5464, 4), (5464, 5464, 4, 4), (5464, 5464, 4, 4), (5464, 5464, 4, 4)]
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[if{inplace}(keras_learning_phase, Elemwise{Composite{(((i0 - i1) * i2 * i3) + i4)}}.0, Elemwise{Composite{(((i0 - i1) * i2) + i3)}}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

I have apply the setting in keras.json

    "backend": "theano",
    "image_dim_ordering": "th"

The enviroment is:

OS : macOS
Python version : 3.5.3 (with Pycharm venv)
Backend: Theano 0.9.0rc3

It looks like the tensor dim / input data type is not correctly set.
How do I fix it up?

issue related to cost function

In your paper you have specified to use binary cross-entropy function. Why is that and what will happen on using categorical cross-entropy function?

Compact cnn training precision

compact cnn에서 tag추출하기 위한 pretrained model을 구할 수 있나요?

compact cnn에서 tag을 추출하려고 하는데,
제공되는 pretrained model에는 마지막 레이어 weight 값은 없는 것 같습니다.
혹시 해당 파일 있으면 얻을 수 있는 지 궁금합니다.

Audioread module

Dear sir/madam,
When I run example_tagging.py I get the following error
Couldn't import dot_parser, loading of dot files will not be possible.
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
Traceback (most recent call last):
File "example_tagging.py", line 6, in
import audio_processor as ap
File "/user/HS224/mv00147/Desktop/py_env/lib/python2.7/site-packages/audio_processor.py", line 1, in
import librosa
File "/user/HS224/mv00147/Desktop/py_env/lib/python2.7/site-packages/librosa/init.py", line 15, in
from . import core
File "/user/HS224/mv00147/Desktop/py_env/lib/python2.7/site-packages/librosa/core/init.py", line 90, in
from .audio import * # pylint: disable=wildcard-import
File "/user/HS224/mv00147/Desktop/py_env/lib/python2.7/site-packages/librosa/core/audio.py", line 9, in
import audioread
ImportError: No module named audioread
Can anyone help?
Warm regards,
Mahi :)

How did you get the actual mp3s ?

the npy for mp3 is not the same with the logamplitude melspectrogram I got using the code audio_processor.py

I use the code audio_processor.py to get the logamplitude melspectrogram for
bensound-dubstep.mp3, but find not the same with the array in bensound-dubstep.npy.

Why is this?

How do I train my CNN with my own dataset

Dear Scott,
I wish to train the CNN presented by Keunwoochoi with my dataset which is available in the path_1 specified in this code. They are .wav files. How do I represent it in a suitable format so it can be input to the network for training? This is my code. Unfortunately most tutorials on the web only explain training CNN on image data and not audio.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD, RMSprop
from keras.utils import np_utils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import os
import theano
import wave
from numpy import *
from sklearn.utils import shuffle
from sklearn.cross_validation import train_test_split
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import ELU, PReLU
from keras.utils.data_utils import get_file
from keras.models import Model
import time
from keras import backend as K
import audio_processor as ap
import pdb
import soundfile as sf
path_1 = 'C:\\Users\\Admin\\keras cnn tutorial\\input_data'
path_2 = 'C:\\Users\\Admin\\keras cnn tutorial\\input_data_resized'

listing = os.listdir(path_1)
num_samples=size(listing)
print(num_samples)
for file in listing:
    data, sr = sf.read(path_1 + "\\" + file)
    data_1=numpy.fromfile(file, dtype=float, count=-1, sep='')
    data_f=data.resize(200,200)

How to use compact_cnn for tagging?

Do you have any example about tagging with compact_cnn?
it seems that your example main.py is for feature extraction

what is the 29.1 file of million song?

do you refer the subset of one million songs?

What was the GPU that you used to train your model??

I am very new to this but which GPU did you use while training it on MTT?? My kernal dies constantly on Tensorflow-GPU with Nvidia 960m.

Epochs on MTT experiment

Hello! I wish to know, how many epochs did you require to train MTT on CNN to obtain an AUC score of 0.894 as mentioned in your paper?

Cannot use CRNN with Theano backend

Hi,

Do you know if the pre-trained CRNN model works for any combination of library versions? I've tried tensorflow, but I have seen another issue here saying that it is impossible to load tensorflow weights.
I'm using Keras 1.2.0 and Theano 0.9.0 and I'm getting the following error output during model.predict():

ValueError: GpuReshape: trying to reshape an array of total size 1440 into an array of total size 96. Apply node that caused the error: GpuReshape{4}(bn_0_freq_running_mean, TensorConstant{[ 1 1 96 1]}) Toposort index: 39 Inputs types: [GpuArrayType<None>(float32, (False,)), TensorType(int64, vector)] Inputs shapes: [(1440,), (4,)] Inputs strides: [(4,), (8,)] Inputs values: ['not shown', array([ 1, 1, 96, 1])] Outputs clients: [[GpuElemwise{sub,no_inplace}(GpuIncSubtensor{Set;::, ::, int64:int64:, int64:int64:}.0, GpuReshape{4}.0)]]

FCN-4 Convolve problem

hello my friends ,i am beginner and i have problem with convolve operation in FCN-4 in AUTOMATIC TAGGING USING DEEP CONVOLUTIONAL NEURAL NETWORKS (table1 page 3 -

)
i calculate output like below :

conv (3x3xkernel_size) reduce the input dimention 2 unit.
for example (250x250x3) with conv(3x3x3) output is ( (250-2=248) x (250-2=248) x 3)

i attach my caculation file below,

please help, thanks alot

No backend error

while runningexample_tagger.pyI am constantly getiing this error:
Traceback (most recent call last):
File "I:\Coding\music-auto_tagging-keras-master\example_feat_extract.py", line 431, in
main(net)
File "I:\Coding\music-auto_tagging-keras-master\example_feat_extract.py", line 408, in main
melgram = ap.compute_melgram(audio_path)
File "I:\Coding\music-auto_tagging-keras-master\audio_processor.py", line 24, in compute_melgram
src, sr = librosa.load(audio_path, sr=SR) # whole signal
File "C:\Python27\lib\site-packages\librosa\core\audio.py", line 107, in load
with audioread.audio_open(os.path.realpath(path)) as input_file:
File "C:\Python27\lib\site-packages\audioread_init_.py", line 116, in audio_open
raise NoBackendError()
NoBackendError
what should I do about this

this is my data set
['music/bensound-cute.mp3',
'music/bensound-actionable.mp3',
'music/bensound-dubstep.mp3',
'music/bensound-thejazzpiano.mp3',
'music/"24k Magic.mp3"',
'music/"7 Years.mp3"',
'music/"A-Sky-Full-Of-Stars-Hardwell-Remix.mp3"',
'music/"Addicted To You.mp3"',
'music/"Alesso-Heroes.mp3"',
'music/"All Time Low.mp3"',
'music/"Alone Marshmallow.mp3"',
'music/Alone.mp3''music/Animals_MartinGarrix.mp3',
'music/Arcadia.mp3',
'music/"Area 51".mp3',
'music/Ariana_Grande-Baby.mp3',
'music/"Avicii - Waiting For Love (Marshmello Remix).mp3"',
'music/"Avicii - Waiting for Love (Original Mix).mp3"',
'music/"Avicii feat Aluna George What Would I Change It To.mp3"',
'music/"Avicii feat Billy Raffoul You Be Love.mp3"',
'music/"Avicii feat Rita Ora Lonely Together.mp3"',
'music/"Avicii feat. Sandro Cavazza Without You.mp3"',
'music/"Avicii feat. Vargas and Lagola Friend Of Mine.mp3"',
'music/"avicii wake me up.mp3"',
'music/"Bad (feat. Vassy) [Radio Edit].mp3"',
'music/"Bad (feat. Vassy).mp3"',
'music/"Bang my Head (feat. Sia).mp3"',
'music/Barcelona.mp3',
'music/"Bibia Be Ye Ye.mp3"',
'music/"Birds Fly.mp3"',
'music/"Blank Space.mp3"',
'music/"Boulevard of broken dreams.mp3"',
'music/"Broken Arrows.mp3"',
'music/"Carry Me (feat. Julia Michaels).mp3"',
'music/"Castle On the Hill.mp3"',
'music/"City Lights.mp3"',
'music/classic-mkto.mp3',
'music/"Clean Bandit - Rockabye ft. Sean Paul Lyrics Video NEW 2016.mp3"',
'music/Colors.mp3',
'music/Dangerous (feat. Sam Martin).mp3"',
'music/Daylight.mp3',
'music/"Dear Boy.mp3"',
'music/"Despacito - Remix.mp3"',
'music/Dive.mp3',
'music/"DJ Snake - Let Me Love You ft Justin Bieber (Marshmello Remix).mp3"',
'music/"don t you worry child.mp3"',
'music/Echo.mp3',
'music/Eclipse.mp3',
'music/"Ed Sheeran - Perfect.mp3"',
'music/Edom.mp3',
'music/"Enrique Iglesias - Somebody s Me.mp3"',
'music/Eraser.mp3',
'music/example.mp3',
'music/Faded.mp3',
'music/"Fiction (feat. Tom Odell).mp3"',
'music/"Firestone (feat. Conrad Sewell).mp3"',
'music/"Florida whistle.mp3"',
'music/"Follow Me.mp3"',
'music/"For A Better Day.mp3"',
'music/Fragile.mp3',
'music/"Galway Girl.mp3"',
'music/"Give me love.mp3"',
'music/"Gonna Love Ya.mp3"',
'music/Good_Enough.mp3',
'music/"Goodbye Friend (feat. The Script).mp3"',
'music/Happier.mp3',
'music/"Happy Birthday (feat. John Legend).mp3"',
'music/"Hardwell feat ,Jake Reese-Mad World.MP3"',
'music/"Heart Attack .mp3"',
'music/"Heart Upon My Sleeve.mp3"'
'music/"Hearts Don t Break Around Here.mp3"',
'music/"Hello-marshmello-remix.mp3"',
'music/"Hey Baby.mp3"',
'music/"Hey Brother.mp3"',
'music/"Hey Mama (feat. Nicki Minaj & Afrojack).mp3"',
'music/Homeless.mp3',
'music/"How Deep Is Your Love.mp3"',
'music/"How Would You Feel (Paean).mp3"',
'music/"I Love It.mp3"',
'music/"I Took A Pill In Ibiza.mp3"',
'music/I_Bet_My_Life.mp3',
'music/"If I Never See Your Face Again.mp3"',
'music/"Imagine Dragons - Believer.mp3"',
'music/"Imagine Dragons - Walking The Wire (Audio).mp3"',
'music/"in the end.mp3"',
'music/"Intro.mp3',
'music/"Just the Way You Are .mp3"',
'music/"Kiss Me.mp3"',
'music/"Lay Me Down.mp3"',
'music/"Lego house .mp3"',
'music/"Let Me Be Your Home.mp3"',
'music/"Let Me Be Your Lover (feat. Pitbull).mp3"',
'music/"Let Me Love You - DJ Snake Justin Bieber (320kbps)-(MusicVilla.In).mp3"',
'music/"Levels.mp3"',
'music/"Liar Liar.mp3"',
'music/"Lift me up (feat. Nico & Vinz, Ladysmith Black Mambazo).mp3"',
'music/Linkin_Park_-Powerless.mp3',
'music/"Listen (feat. John Legend).mp3"',
'music/"Long Road To Hell.mp3"',
'music/love-story.mp3',
'music/"Lovers on the Sun (feat. Sam Martin).mp3"',
'music/"Major Lazer - Cold Water.mp3"',
'music/"Mark Ronson - Uptown Funk (Ft. Bruno Mars).mp3"',
'music/"Maroon 5 - Maps.mp3"',
'music/"Maroon 5 Misery.mp3"',
'music/"Maroon-5-Love-Somebody.mp3"',
'music/"Marshmello Ookay ft. Noah Cyrus - Chasing Colors.mp3"',
'music/"Marshmello - Summer.mp3"',
'music/"Marshmello_ft_Khalid-_Silence_Talkmuzik.mp3"',
'music/"Marvin Gaye.mp3"',
'music/"moves like jagger.mp3"',
'music/"Nancy Mulligan.mp3"',
'music/"New Man.mp3"',
'music/"No Money no Love (feat. Elliphant & Ms. Dynamite).mp3"',
'music/"Not Alone (feat. RHODES).mp3"',
'music/"Nothing Can Hold Us Down.mp3"',
'music/"Nothing Left (feat. Will Heard).mp3"',
'music/numb.mp3',
'music/"Oasis (feat. Foxes).mp3"',
'music/"One last time.mp3"',
'music/"Owl City - Fireflies2.mp3"',
'music/"Payphone (feat. Wiz Khalifa).mp3"',
'music/"payphone alex goot.mp3"',
'music/"Photograph.mp3"',
'music/"Pure Grinding.mp3"',
'music/"Raging (feat. Kodaline).mp3"',
'music/"Rap God.mp3"',
'music/"Rise (feat. Skylar Grey).mp3"',
'music/"S.T.O.P (feat. Ryan Tedder).mp3"',
'music/"Sally.mp3"',
'music/"Save Myself.mp3"',
'music/"See You Again.mp3"',
'music/"Selena Gomez - Hands To Myself.mp3"',
'music/"Selena Gomez - Kill Em With Kindness.mp3"',
'music/"Serious (feat. Matt Corby).mp3"',
'music/"Shame On Me.mp3"',
'music/"Shape of you.mp3"',
'music/"She Will Be Loved.mp3"',
'music/"Shot me Down (feat. Skylar Grey) [Radio Edit].mp3"',
'music/"Sia - Cheap Thrills (ft. Sean Paul) - 320 Kbps-(MusicVilla.In).mp3"',
'music/"Simple Plan - Perfect.mp3"',
'music/"Sing Me to Sleep (Marshmello Remix).mp3"',
'music/Sing Me to Sleep.mp3"',
'music/"Sing.mp3"',
'music/"Somewhere In Stockholm.mp3"',
'music/"Stand By Me.mp3"',
'music/"Stay (feat. Maty Noyes).mp3"',
'music/"Stereo Hearts.mp3"',
'music/"Stole the Show (feat. Parson James).mp3"',
'music/"Sugar.mp3"',
'music/"SUMMER OF 69.MP3"',
'music/"Sun Goes Down (feat. MAGIC! & Sonny Wilson).mp3"',
'music/"Sunset Jesus.mp3"',
'music/"Supermarket Flowers.mp3"',
'music/"Sweater Weather.mp3"',
'music/"Talk To Myself.mp3"',
'music/"Ten More Days.mp3"',
'music/"The A Team.mp3"',
'music/"The Chainsmokers Coldplay - Something Just Like This.mp3"',
'music/"The Chainsmokers - All We Know ft. Phoebe Ryan.mp3"',
'music/"The Chainsmokers - Roses.mp3"',
'music/"The Chainsmokers DJ Snake Ft. Zayn - I Know - 2017.mp3 (Mp3goo.com).mp3"',
'music/"The Whisperer (feat. Sia).mp3"',
'music/"Thinking Out Loud .mp3"',
'music/"Thousands years.mp3"',
'music/"Thunder - Imagine Dragons.mp3"',
'music/"Titanium Feat Sia.mp3"',
'music/"Tonight (I m Lovin You).mp3"',
'music/"Touch Me.mp3"',
'music/"Trouble.mp3"',
'music/"True Believer.mp3"',
'music/"United We Are.mp3',
'music/"Urban Cone Weekends.mp3"',
'music/"Versace On The Floor.mp3"',
'music/"Waiting For Love.mp3"',
'music/"waiting for love(prinston $ astrid s Acoustic version).MP3"',
'music/"Wake Me Up.mp3"',
'music/"What Do I Know.mp3"',
'music/"What I did for Love.mp3"',
'music/"When i was you man.mp3"',
'music/"Where Is Here Now.mp3"',
'music/"Wont Go Home Without You.mp3"',
'music/"Wrecking Ball.mp3"',
'music/"Yesterday (feat.Bebe Rexha).mp3"',
'music/"You Make Me.mp3"',
'music/"Youre Beautiful.mp3"',
'music/"Young Again.mp3"',
'music/"your song.mp3"']

these are the dependencies that i have installed
appdirs (1.4.3)
audioread (2.1.5)
cycler (0.10.0)
Cython (0.25.2)
decorator (4.2.1)
enum34 (1.1.6)
ffmpy (0.2.2)
funcsigs (1.0.2)
functools32 (3.2.3.post2)
h5py (2.7.0)
joblib (0.11)
Keras (2.1.3)
librosa (0.5.1)
llvmlite (0.21.0)
matplotlib (2.0.0)
numba (0.36.2)
numpy (1.14.0)
packaging (16.8)
pip (9.0.1)
pyparsing (2.2.0)
python-dateutil (2.6.0)
pytz (2017.2)
PyYAML (3.12)
resampy (0.2.0)
scikit-learn (0.19.1)
scipy (1.0.0)
setuptools (38.5.1)
singledispatch (3.4.0.3)
six (1.11.0)
subprocess32 (3.2.7)
Theano (0.9.0)
wheel (0.30.0)

Music similarity using music-auto_tagging for feature extraction

Hi ,

I have used the music-auto_tagging for feature extraction along with a classifier and come up with an approach to recommend music based on music similarity.

Accoustically similar music have some accoustic features which can be used to identify similar sounding music. We have considered the following four accoustic features in order to distinguish different kind of music.
1)Drag : Songs which are very slow moving
2)Beats : Songs in which rythmic beats are more prominent and highlighted
3)Melody : Songs without too much beats /slow or medium tempo (speed) /
4)Fast : Songs which have fast tempo (speed) .

We picked manually around 60-80 music samples under each of the above categories.
Then we formed a binary cluster for each category as follows :
For example in order to train for Drag category , we formed two groups with 65 samples in each group.
1)Drag group (65 samples) : all samples in this group are very slow moving songs.
2)Others group (65 samples ) : We mixed samples from beats melody and fast category which are not slow moving but with medium to fast tempo(speed).

Then we trained a convolutional neural network (CNN) model using normalized spectrograms of all samples from both groups. The trained model was used to identify the dragscore for a song.

Using the same approach as explained above , separate CNN models were trained in order to get beatsscore, melodyscore and fastscore from a song.

Now using the four separately trained CNN models , we obtained dragscore,beatsscore,melodyscore and fastscore for every song in the test data (around 3000+ music samples) which were not part of training data.

For any selected song from test set , based on the four scores of the selected sample song , songs which have similar scores in the test set were listed in the result page with option to listen to the selected and listed samples.

The listing based on score similarity seemed to match the selected song in terms of music similarity for 85-90% of the selections.

Input dimension mismatch

Hello,
I managed to debug the audio read related errors and now when I try to run the code (example_tagging.py), I get the following error
ValueError: Input dimension mismatch (input[0].shape[2]=96, input[1].shape[2]=1366)
Is there something you can suggest to correct this?
Warm regards,#Mahalakshmi

Low AUC for MTT

I trained this CNN model on MTT dataset and I got the results given in the files attached.
The acc is 90.69% but AUC is quite low being 0.58.
Can you view once and suggest any change for improvement?

Code.txt
output_cnn.txt

Having trouble running the compact CNN model

Hello, dear friends, I have a problem implementing this compact CNN code.
I will send you my pipe list.
I installed the kapre with tricks, I installed the keras.
I read what noticed about the other two models(CRNN and FCN)
I can't actually test model with my own model. Neither compact model, nor CRNN model, nor FCN one.
Do you know a way for me to use article results in my project without having to implement them myself ?

Input file data shape of .npy file in compact_cnn

Is the test file , 1100103.clip.npy , generated by the same way of other .npy files through audio_processor.py ?

When I use bensound-thejazzpiano.npy instead of 1100103.clip.npy main.py throw out a data shape mismatch msg:
ValueError: Error when checking : expected melspectrogram_input_1 to have 3 dimensions, but got array with shape (1, 1, 1, 1, 96, 1366)

How to modify the audio_processor.py to fix this error?

Thank you again.

Loss function and Optimizer

What kind of loss function and what kind of optimizer are you using for this? I couldn't find them in the code. According to the paper, it is cross entropy and adam resp.

Can i use MusicTaggerCRNN for extract feature from another feature audio like chromagram not mel-spectrogram?

I just want use your pretrained model of you to extract feature from another feature audio i get from librosa like chroma_stft, chrom_cens, mfcc ...

Reproduce and how to avoid overfitting

Hi Keunwoo,

This project is great. Thanks for making it reproducible.

What is the best way to train this network to reproduce the results?

What type of GPU's and how many do you recommend to avoid overfitting?

Many thanks,

Nicc

Keras 2.0

Are the CompactCnn weights compatible with the last version of keras?

Wrong Prediction with GTZAN Music Genre Dataset

Hi,

I downloaded GTZAN Music genre dataset from http://marsyasweb.appspot.com/download/data_sets/?_sm_au_=i7HSSSWqdVMd13T7.
I converted the GTZAN dataset from 22050hz to 12000 hz sampling rate using sox. (ex: sox inputfile.wav -b16 -r12000 out.wav)
When I ran the example tagging script with audio files from GTZAN/rock directory, most of the predictions are showing it as jazz.
What am I doing wrong?

regards
Srinidhi

Dockerfile for run experiments

Hi,
I am trying to run your code, but i have the problem with the different version that you need for run. So is possible if you create a Dockerfile with all dependencies to create an enviroment to run your examples?

thanks for your time! :)

audio_processor.py bug

In compute_melgram() function, just need to make sure you provide integer values for the indices

Adapt Line 31:
src = src[(n_sample - n_sample_fit) / 2:(n_sample + n_sample_fit) / 2]

To:
src = src[int((n_sample - n_sample_fit) / 2):(int(n_sample + n_sample_fit) / 2)]

Weights shape for CRNN for TF are incorrect

The CRNN takes a 96*1366 spectrogram image as input. However the weights (for input batch normalization layer) provided for Tensorflow for CRNN have the shape of the

bn_0_freq_beta_1 (1440,)
bn_0_freq_gamma_1 (1440,)
bn_0_freq_running_mean_1 (1440,)
bn_0_freq_running_std_1 (1440,)

which is incompatible with the input image size.

too many syntax error in, say compact cnn

compact cnn

preprocess.py

line 8 or 9
missing a "
and
src
should be src_loaded
too many errors
old

Filters kernel

I want to ask you: What kind filters can we use for Conv2D? I see only 3*3 (That's the dimension) but I don't know what this Matrix has as parameters or coeffecients. Or when I just give that function Conv2D with numbers of filters and the filter's dimension, does it choose random coeffecients for this Matrix? Can you give me some informations about it?

small error with argument 'by_name'

Hi, sorry but why?

rustam@rustam-ssd2:~/github/music-auto_tagging-keras$ python example_feat_extract.py
Using Theano backend.
Running main() with network: crnn and backend: theano
Traceback (most recent call last):
  File "example_feat_extract.py", line 64, in <module>
    main(net)
  File "example_feat_extract.py", line 52, in main
    model = MusicTaggerCRNN(weights='msd', include_top=False)
  File "/home/rustam/github/music-auto_tagging-keras/music_tagger_crnn.py", line 140, in MusicTaggerCRNN
    by_name=True)
TypeError: load_weights() got an unexpected keyword argument 'by_name'

ubuntu 16
anaconda python 2.7
keras 1.0.7
theano 0.8.2

Suggestion needed

I'm trying to use the compact CNN features for a music singing language classification. Do you have any suggestion about what layer should i choose?

Thanks!

Preview Audio files for training

Hi @keunwoochoi ! Great work on this repo! Seriously amazing.

I was just wondering how it is possible to get the short preview clips for the tracks in the MSD. I know that, in one of your repos, you stated that there are currently only 2 options: crawling the web or "crawling your colleagues"

But I just wanted to ask, do you know some place where I can get them, so that I can retrain the network on my own. If I am not mistaken, I need the preview clips to retrain the network, right? (The 7Digital API recommended by MSD to get the preview audio files is not working, so I do no know how to "scrape" the preview files from the internet? How were you able to get access to the preview audio files?

Thanks!

Further elaboration on Permute/Reshape for CNN/RNN stack

finally had some time to dig deeper into your auto-tagging network and compared it to what I am currently using for classifying music. I can confirm that CRNNs work nice, but I stacked them a bit differently. My main question is about how you actually connect the CNN and RNN layers.

Let me explain it closely related to your layers/code. These are the main ingredients for your network:

input             -> (nb_samples, 1, time, bins)
conv              -> (nb_samples, nb_filters, new_time, new_bins)
permute(2, 1, 3)  -> (nb_samples, new_time, nb_filters, new_bins)
reshape(15, 128)  -> (nb_samples, new_time * new_bins, nb_filters)

So essentially your RNN sees time*bins sequences of the filters.

Can you elaborate on your decision for your permutation and reshape?
Are there any other references you've found with a similar stacking architecture?
Why didn't you keep the time dimension as is and reshape to (nb_samples, new_time, new_bins*nb_filters)?

Another minor question: while comparing your code I wanted to adjust the reshape to my input, and thought: "hey, lets use an unknown dimension" so I don't have to calculate the exact shape from the pooling operations. Unfortunately Reshape((-1, 128)) does not work. Do you know a workaround in keras?

Shapes are not compatible

Hi @keunwoochoi

I got the following error:

  File "example.py", line 92, in <module>
    main(net)
  File "example.py", line 67, in main
    model = AudioConvnet()
  File "/home/rudy/AOITEK/DeepLearning/music-auto_tagging-keras/audio_convnet.py", line 121, in AudioConvnet
    model.load_weights('data/%s_weights_%s.h5' % ('cnn', K._BACKEND))
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2489, in load_weights
    self.load_weights_from_hdf5_group(f)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2563, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 926, in batch_set_value
    assign_op = x.assign(assign_placeholder)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 501, in assign
    return state_ops.assign(self._variable, value, use_locking=use_locking)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2319, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1711, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 210, in _AssignShape
    return [op.inputs[0].get_shape().merge_with(op.inputs[1].get_shape())]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 570, in merge_with
    (self, other))
ValueError: Shapes (3, 3, 1, 32) and (32, 1, 3, 3) are not compatible

compact cnn how to obtain the dominant tags ?

Hello again,

i've implemented the compact-cnn neural network however the output is a vector of 0 & 1 corresponding to inditifed taggs. Is it possible like in the CRNN to only have the dominant taggs like :

[('jazz', '0.444'), ('instrumental', '0.151'), ('folk', '0.103'), ('Hip-Hop', '0.103'), ('ambient', '0.077')]

Kind regards

Emmanuel

Can't run any of models with the latest python packages

https://gist.github.com/anonymous/062721cae29f62ab7262ceb1d0e66a59 - stack trace
https://gist.github.com/anonymous/cfb146880d54675b213e125b1624f313 - requirements.txt
https://gist.github.com/anonymous/b30226c691a3d2bd7999b67356dd5461 - keras.conf

I tried both backends - Theano and Tensorflow.

Could you provide your requirements.txt?

I'm using only CPU.

Value Error


  File "example.py", line 62, in main
    model = AudioConvnet()
  File "/home/ubuntu/gg_code/music-auto_tagging-keras/audio_convnet.py", line 77, in AudioConvnet
    x = BatchNormalization(axis=time_axis, name='bn_0_freq')(melgram_input)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 515, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 573, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 150, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/layers/normalization.py", line 131, in call
    epsilon=self.epsilon)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 392, in batch_normalization
    'spatial', epsilon)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/sandbox/cuda/dnn.py", line 2770, in dnn_batch_normalization_test
    (gamma.ndim, beta.ndim, ndim))
ValueError: gamma and beta must be of the same dimensionality as inputs; got 1 and 1 instead of 4

Is there something wrong with the setup?

No Normalization2D for compact_cnn?

I see that you have disabled the normalization of melspec in compact_cnn. was there any reason?

problem with the model generation

when I run the example_tagging an error occurs in the MusicTaggerCNN:

ValueError: Dimension 0 in both shapes must be equal, but are 96 and 1366 for 'Assign' (op: 'Assign') with input shapes: [96], [1366].

How can we solve this problem ?

Kind regards

Emmanuel

Please provide crnn training example

Hi Keunwoo!
Thank you for this repo!
But can you please provide a code chunks for model training?
Number of epochs, batch size and number of samples, etc...
I would like to train my own model with this dataset:
https://github.com/mdeff/fma

For now I have created a custom generator but maybe you can point out few hints.
Thank you.

def MelGenerator(features, labels, batch_size):
 # Create empty arrays to contain batch of features and labels#
    batch_features = np.zeros((batch_size, 96, 1366, 1))
    batch_labels = np.zeros((batch_size,1))
    while True:
        for i in range(batch_size):
        # choose random index in features
            index = randint(0, np.size(batch_features,0))
            batch_features[i] = features[index]
            batch_labels[i] = labels[index]
        yield batch_features, batch_labels

parameter problem

the input is 96X1366, and as we multiply all the pooling param in that dimension, we get:
96=2x2x2x3x4， but in another dimension it doesn't fit at all: 4x4x4x5x4=1280，shouldn't it be1366？

I think the weight file for cnn_tensorflow is not the correct file.

I am pretty sure the weight file for tensorflow version of CNN is not correct. It seems it is the same as theano weight file. Do you still have access to tensorflow weight file to upload the right file? Thanks

h5py error

Dear sir,
Thank you so much for your immediate response. I installed audio read. Also, since my GPU is still not set up, I plan to run it on the CPU itself. But I get the following error when I try to install h5py module.

Downloading/unpacking h5py
Downloading h5py-2.7.0rc2.tar.gz (256Kb): 256Kb downloaded
Running setup.py egg_info for package h5py
zip_safe flag not set; analyzing archive contents...

Installed /user/HS224/mv00147/py_env/lib/python2.7/site-packages/build/h5py/pkgconfig-1.2.2-py2.7.egg
Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/user/HS224/mv00147/py_env/lib/python2.7/site-packages/build/h5py/setup.py", line 165, in <module>
    cmdclass = CMDCLASS,
  File "/usr/lib/python2.7/distutils/core.py", line 112, in setup
    _setup_distribution = dist = klass(attrs)
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 221, in __init__
    self.fetch_build_eggs(attrs.pop('setup_requires'))
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 245, in fetch_build_eggs
    parse_requirements(requires), installer=self.fetch_build_egg
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 588, in resolve
    raise VersionConflict(dist,req) # XXX put more info here
pkg_resources.VersionConflict: (numpy 1.6.1 (/usr/lib/python2.7/dist-packages), Requirement.parse('numpy>=1.7'))
Complete output from command python setup.py egg_info:
zip_safe flag not set; analyzing archive contents...

Installed /user/HS224/mv00147/py_env/lib/python2.7/site-packages/build/h5py/pkgconfig-1.2.2-py2.7.egg

Traceback (most recent call last):

File "", line 14, in

File "/user/HS224/mv00147/py_env/lib/python2.7/site-packages/build/h5py/setup.py", line 165, in

cmdclass = CMDCLASS,

File "/usr/lib/python2.7/distutils/core.py", line 112, in setup

_setup_distribution = dist = klass(attrs)

File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 221, in init

self.fetch_build_eggs(attrs.pop('setup_requires'))

File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 245, in fetch_build_eggs

parse_requirements(requires), installer=self.fetch_build_egg

File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 588, in resolve

raise VersionConflict(dist,req) # XXX put more info here

pkg_resources.VersionConflict: (numpy 1.6.1 (/usr/lib/python2.7/dist-packages), Requirement.parse('numpy>=1.7'))

Command python setup.py egg_info failed with error code 1
Storing complete log in /user/HS224/mv00147/.pip/pip.log
Could you suggest a way forward?
Many thanks,

Mahalakshmi

ValueError: Input dimension mis-match.

keras backend도 theano로 설정하고 버전도 keras version도 =1.1로 설정했는데 ValueError: Input dimension mis-match.라는 오류문을 받았는데 해결방법을 알 수 있을까요 ?