ml's Issues

Make it easy to use the predicted pitch classes, e.g. in Sonic Visualizer

  • take a trained model
  • take an audio file
  • pre-process the audio into features
  • predict the labels
  • save the per-frame class labels into a TSV file
  • save the per-frame probability labels into a TSV file
  • convert labels to segments
  • store the segment labels into another TSV file

For frame labels, the format can look like this:

C   Db  D   Eb  E   F   Gb  G   Ab  A   Bb  B
1   0   0   0   1   0   0   1   0   0   0   0
0   0   1   0   0   1   0   1   0   0   0   1
[...]

I.e., a TSV file with a header. The columns represent pitch classes. Each value is 0 or 1, indicating whether the pitch class is predicted as active or not. For probability labels, each value can be a float between 0.0 and 1.0 (before thresholding).
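
A minimal sketch of writing such a file, assuming the predictions arrive as a (frames × 12) NumPy array (the function and variable names are hypothetical):

import numpy as np

PITCH_CLASSES = ['C', 'Db', 'D', 'Eb', 'E', 'F', 'Gb', 'G', 'Ab', 'A', 'Bb', 'B']

def save_frame_labels(probs, tsv_path, threshold=None):
    """Save per-frame labels to a TSV file: raw probabilities as floats,
    or 0/1 class labels if a threshold is given."""
    values = (probs >= threshold).astype(int) if threshold is not None else probs
    fmt = '%d' if threshold is not None else '%.6f'
    np.savetxt(tsv_path, values, fmt=fmt, delimiter='\t',
               header='\t'.join(PITCH_CLASSES), comments='')

E.g. save_frame_labels(probs, 'frame_probs.tsv') for probabilities and save_frame_labels(probs, 'frame_labels.tsv', threshold=0.5) for binary labels.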

For segment labels, the format looks like this:

start   end C   Db  D   Eb  E   F   Gb  G   Ab  A   Bb  B
0.0 2.612267    0   0   0   0   0   0   0   0   0   0   0   0
2.612267    11.45907    0   0   0   0   1   0   0   0   1   0   0   1

The frames are collapsed into time intervals. Each interval is represented by its start and end time (in float seconds). The segments should be ordered by increasing time and should not overlap. Ideally they should contain no holes (i.e. be perfectly adjacent to each other).
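
A sketch of the collapsing step, assuming a fixed frame duration in seconds and PITCH_CLASSES as in the sketch above; runs of identical label vectors are merged, which makes the segments ordered, non-overlapping and perfectly adjacent:

import csv
import numpy as np

def frames_to_segments(labels, frame_duration):
    """Merge runs of identical per-frame binary label vectors
    into (start, end, vector) segments."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or not np.array_equal(labels[i], labels[start]):
            segments.append((start * frame_duration, i * frame_duration, labels[start]))
            start = i
    return segments

def save_segments(segments, tsv_path):
    with open(tsv_path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t')
        writer.writerow(['start', 'end'] + PITCH_CLASSES)
        for start, end, vector in segments:
            writer.writerow(['%f' % start, '%f' % end] + list(vector))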

TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

I am currently trying to run the program with Python 3 on Mac OS X, but when I run it via the .sh script, I do not seem to get any output. To investigate further, I used a simple:

python3 predict.py sample.flac

This gave me the set of errors below:

/usr/local/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "predict.py", line 131, in <module>
    model = InstrumentClassifier(args.model_dir)
  File "predict.py", line 47, in __init__
    self.model = load_model_from_dir(model_dir)
  File "predict.py", line 44, in load_model_from_dir
    model_dir + '/model_arch.yaml',
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Could you please guide me as to the proper procedure to fix this error?
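
For what it's worth, the traceback shows that model_dir is None when load_model_from_dir concatenates it with '/model_arch.yaml', which suggests the model directory argument was never passed (or defaults to None). A sketch of a stricter argument definition that would fail with a clearer message (the flag name is an assumption, not necessarily what predict.py uses):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('audio_file')
# required=True prevents a None default from leaking into path concatenation
parser.add_argument('-m', '--model-dir', required=True)
args = parser.parse_args()

if not os.path.isdir(args.model_dir):
    parser.error('model directory does not exist: %s' % args.model_dir)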

There are missing .py tools

There are imports of tools that are missing from the repository. Could you please commit these files?

For example:

"sys.path.append('../../tools/music-processing-experiments')" from ml-master/chord-recognition/notebooks/chord_classification_data_preparation.ipynb

There's no 'music-processing-experiments' folder....

predict.py raising different errors

Hi! I'm trying to test prediction on WAV and FLAC files, but each of them raises a different error. Running a WAV file, I get:

File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 98, in predict_class_label x_features = self.load_features(audio_file)
File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 135, in <module> print(model.predict_class_label(args.audio_file))
File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 87, in load_features
x_features = self.ch.transform(x0)
AttributeError: 'dict' object has no attribute 'transform'

and on a FLAC file:

Traceback (most recent call last):
  File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 135, in <module>
    print(model.predict_class_label(args.audio_file))
  File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 98, in predict_class_label
    x_features = self.load_features(audio_file)
  File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 83, in load_features
    x, fs = sf.read(audio_file)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\soundfile.py", line 374, in read
    frames = f._prepare_read(start, stop, frames)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\soundfile.py", line 1447, in _prepare_read
    self.seek(start, SEEK_SET)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\soundfile.py", line 870, in seek
    _error_check(self._errorcode)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\soundfile.py", line 1455, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Internal psf_fseek() failed.

I'm using Python 3.6.2 on Windows and the model from here. Could you help me figure out the issue?
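
The FLAC failure happens inside soundfile/libsndfile before any model code runs, so it can be isolated from the rest of the pipeline. A minimal check using only the soundfile API (nothing repository-specific):

import soundfile as sf

# if this already raises, the problem is in libsndfile or the file itself,
# not in predict.py
with sf.SoundFile('sample.flac') as f:
    print(f.samplerate, f.channels, f.format, len(f))

x, fs = sf.read('sample.flac')
print(x.shape, fs)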

Prepare a library for loading and saving data

  • high-level tasks:
    • preparing training data (features + labels)
    • preparing data for prediction (features)
  • low-level tasks:
    • audio recording to chromagram (frame-wise features)
    • frame-wise features to shapes suitable for various DNN architectures (see the sketch after this list):
      • individual frames
      • individual frames + 1D convolution
      • individual frames with context
      • individual frames with context + 2D convolution
      • sequences of frames (for LSTM)
      • sequences of frames (for LSTM) + 1D convolution
    • splitting the dataset into training, validation and test splits
    • storing data
  • post-processing tasks
  • saving & loading models

visualize the chromagrams

Visualize the input feature matrices so that we can more easily understand what the network learns or why it has problems (a plotting sketch follows the list).

  • chromagram for each data point
  • average chromagram for each class
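
A minimal matplotlib sketch, assuming each chromagram is a (frames, bins) array; the names X (data points) and y (class labels) are hypothetical:

import matplotlib.pyplot as plt

def plot_chromagram(chromagram, title=''):
    """Show a (frames, bins) chromagram with time on the x axis."""
    plt.imshow(chromagram.T, origin='lower', aspect='auto',
               interpolation='nearest')
    plt.xlabel('frame')
    plt.ylabel('pitch bin')
    plt.title(title)
    plt.colorbar()
    plt.show()

# chromagram for one data point:
# plot_chromagram(X[i])
# average chromagram for one class:
# plot_chromagram(X[y == some_class].mean(axis=0), title='class average')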

Compare a few approaches to binary pitch class vector classification

  • Input: continuous chroma vector (many octaves, values in dB).
  • Output: binary vector of activity of pitch classes (merged to single octave).

Approaches (a Keras sketch of the first variant follows the list):

  • single frame (no context) + 1D convolution net
  • fixed window of history (some context) + 2D convolution net
  • "unlimited" history + RNN (e.g. LSTM)
  • bidirectional RNN (also future context)
  • CNN+RNN (?)
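
For illustration, a sketch of the first variant (single frame + 1D convolution net) in Keras; the layer sizes are arbitrary assumptions and the input is one chroma vector of n_bins values with a single channel (so X must be shaped (samples, n_bins, 1)):

from keras.models import Sequential
from keras.layers import Conv1D, Dense, Flatten, MaxPooling1D

n_bins = 115  # e.g. bin_range (-48, 67) with bin_division=1; an assumption

model = Sequential([
    # convolve along the frequency axis of a single frame
    Conv1D(32, 3, activation='relu', input_shape=(n_bins, 1)),
    MaxPooling1D(2),
    Conv1D(32, 3, activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    # 12 independent sigmoids, one per pitch class (multi-label, not softmax)
    Dense(12, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])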

Evaluate the errors in pitch class vector prediction

  • overall metrics for the full dataset splits (a sketch follows the list)
    • accuracy (on binary classes, full chords)
    • Hamming distance (on binary classes, individual tones)
    • AUC (on probabilities, more informative)
  • metrics on separate songs
    • how do they differ?
  • confusion matrix (matrices)
    • 12x 2x2 - for each pitch class separately
    • 1x 4096x4096 - for whole pitch class vectors
  • plot a line of error per time frame
    • which places are more erroneous?
  • How to take into consideration class imbalance?
    • imbalance in tones vs. silence (1 vs. 0)
    • imbalance across pitch classes
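
A sketch of the overall metrics with scikit-learn, assuming y_true is a (frames, 12) binary array and y_prob the predicted probabilities (both names are hypothetical):

import numpy as np
from sklearn.metrics import hamming_loss, roc_auc_score

y_pred = (y_prob >= 0.5).astype(int)

# accuracy on full chords: the whole 12-bit vector must match exactly
full_chord_accuracy = np.mean(np.all(y_pred == y_true, axis=1))
# Hamming: fraction of individual pitch-class labels that are wrong
per_tone_error = hamming_loss(y_true, y_pred)
# AUC on the raw probabilities, averaged over the 12 pitch classes
auc = roc_auc_score(y_true, y_prob, average='macro')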

extract_features.py gives a very weird error

Hi again! I've tried to run the whole preprocessing, but got stuck on extracting features. I prepared the dataset in FLAC and ran extract_features.py {AUDIO_DIR} {FEATURE_DIR} without any changes in the code, and I got this:

Traceback (most recent call last):
  File "C:/dipl0m/ml-master/instrument-classification/extract_features.py", line 68, in <module>
    args.block_size, args.hop_size, args.bin_range, args.bin_division)
  File "C:/dipl0m/ml-master/instrument-classification/extract_features.py", line 36, in extract_pitchgrams
    for i in range(len(dataset.samples)))
  File "C:\Users\jackb\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\shape_base.py", line 421, in dstack
    return _nx.concatenate([atleast_3d(_m) for _m in tup], 2)
  File "C:\Users\jackb\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\shape_base.py", line 421, in <listcomp>
    return _nx.concatenate([atleast_3d(_m) for _m in tup], 2)
  File "C:/dipl0m/ml-master/instrument-classification/extract_features.py", line 36, in <genexpr>
    for i in range(len(dataset.samples)))
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\tfr\sklearn.py", line 32, in transform
    bin_division=self.bin_division)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\tfr\reassignment.py", line 298, in pitchgram
    output_frame_size, PitchTransform(bin_range, bin_division), magnitudes=magnitudes)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\tfr\reassignment.py", line 144, in reassigned
    self.signal_frames.sample_rate)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\tfr\reassignment.py", line 50, in transform_freqs
    output_bin_count = (self.bin_range[1] - self.bin_range[0]) * self.bin_division
TypeError: 'int' object is not subscriptable

I checked reassignment.py in tfr, and I guess the problem could be in line 35:

def __init__(self, bin_range=(-48, 67), bin_division=1, tuning=Tuning()):

because print(type(bin_range)) gives me <class 'int'>, even though it's initialized as a tuple...

P.S. I'm running this on Windows, but I don't think that's the root of the problem.
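
For what it's worth, the traceback shows bin_range arriving in tfr as a single int, so the likely culprit is how extract_features.py parses its command line rather than tfr itself. A sketch of parsing it as a pair (the flag name and default are assumptions):

import argparse

parser = argparse.ArgumentParser()
# two ints, e.g. --bin-range -48 67, passed on as a tuple, not a single int
parser.add_argument('--bin-range', type=int, nargs=2, default=(-48, 67))
args = parser.parse_args()
bin_range = tuple(args.bin_range)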

beatles time map - inverse polar mapping

In the classic time map, regularly spaced events lie on the diagonal, and accelerating/decelerating events fall progressively further away from the diagonal. The radius represents velocity and the angle from the diagonal represents acceleration.

It might be more natural to transform this space via an inverse polar mapping of the single quadrant. In this case the Cartesian x axis would represent the velocity and the y axis the acceleration. y = 0 means steady velocity, y > 0 acceleration and y < 0 deceleration. It could be combined with a logarithmic mapping of the radius.

Any quantization could be done after the inverse polar mapping.
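
A sketch of one possible reading of this, assuming a time-map point for event i is the pair of adjacent inter-onset intervals (t_i - t_{i-1}, t_{i+1} - t_i); the sign convention is chosen so that y > 0 means acceleration, as described above:

import numpy as np

def inverse_polar_time_map(onsets, log_radius=True):
    """Map classic time-map points into velocity/acceleration coordinates."""
    dt = np.diff(np.asarray(onsets, dtype=float))
    x, y = dt[:-1], dt[1:]              # classic time-map coordinates
    r = np.hypot(x, y)                  # radius ~ local inter-event interval
    acc = np.pi / 4 - np.arctan2(y, x)  # angle from the diagonal; positive when
                                        # the next interval is shorter (speeding up)
    if log_radius:
        r = np.log(r)                   # optional logarithmic mapping of the radius
    # new Cartesian axes: x ~ velocity (via interval length), y ~ acceleration
    return r, acc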
