ml's Issues

Make it easy to use the predicted pitch classes, e.g. in Sonic Visualizer

  • take a trained model
  • take an audio file
  • pre-process the audio into features
  • predict the labels
  • save the per-frame class labels into a TSV file
  • save the per-frame probability labels into a TSV file
  • convert labels to segments
  • store the segment labels into another TSV file

For frame labels, the format can look like this:

C   Db  D   Eb  E   F   Gb  G   Ab  A   Bb  B
1   0   0   0   1   0   0   1   0   0   0   0
0   0   1   0   0   1   0   1   0   0   0   1
[...]

I.e., a TSV file with a header. The columns represent pitch classes. Each value is 0 or 1, indicating whether the pitch class is predicted as active or not. For probability labels, each value can be a float between 0.0 and 1.0 (before thresholding).
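
A minimal sketch of writing such a file, assuming the predictions arrive as a (frames × 12) NumPy array (the function and variable names are hypothetical):

import numpy as np

PITCH_CLASSES = ['C', 'Db', 'D', 'Eb', 'E', 'F', 'Gb', 'G', 'Ab', 'A', 'Bb', 'B']

def save_frame_labels(probs, tsv_path, threshold=None):
    """Save per-frame labels to a TSV file: raw probabilities as floats,
    or 0/1 class labels if a threshold is given."""
    values = (probs >= threshold).astype(int) if threshold is not None else probs
    fmt = '%d' if threshold is not None else '%.6f'
    np.savetxt(tsv_path, values, fmt=fmt, delimiter='\t',
               header='\t'.join(PITCH_CLASSES), comments='')

E.g. save_frame_labels(probs, 'frame_probs.tsv') for probabilities and save_frame_labels(probs, 'frame_labels.tsv', threshold=0.5) for binary labels.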

For segment labels, the format looks like this:

start   end C   Db  D   Eb  E   F   Gb  G   Ab  A   Bb  B
0.0 2.612267    0   0   0   0   0   0   0   0   0   0   0   0
2.612267    11.45907    0   0   0   0   1   0   0   0   1   0   0   1

The frames are collapsed into time intervals. Each interval is represented by its start and end time (in float seconds). The segments should be ordered by increasing time and should not overlap. Ideally they should contain no holes (i.e. be perfectly adjacent to each other).
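
A sketch of the collapsing step, assuming a fixed frame duration in seconds and PITCH_CLASSES as in the sketch above; runs of identical label vectors are merged, which makes the segments ordered, non-overlapping and perfectly adjacent:

import csv
import numpy as np

def frames_to_segments(labels, frame_duration):
    """Merge runs of identical per-frame binary label vectors
    into (start, end, vector) segments."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or not np.array_equal(labels[i], labels[start]):
            segments.append((start * frame_duration, i * frame_duration, labels[start]))
            start = i
    return segments

def save_segments(segments, tsv_path):
    with open(tsv_path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t')
        writer.writerow(['start', 'end'] + PITCH_CLASSES)
        for start, end, vector in segments:
            writer.writerow(['%f' % start, '%f' % end] + list(vector))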

TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

I am currently trying to run the program with Python 3 on Mac OS X, but when I run it via the .sh script, I do not seem to get any output. To investigate further, I used a simple:

python3 predict.py sample.flac

This gave me the set of errors below:

/usr/local/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "predict.py", line 131, in <module>
    model = InstrumentClassifier(args.model_dir)
  File "predict.py", line 47, in __init__
    self.model = load_model_from_dir(model_dir)
  File "predict.py", line 44, in load_model_from_dir
    model_dir + '/model_arch.yaml',
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Could you please guide me as to the proper procedure to fix this error?
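
For what it's worth, the traceback shows that model_dir is None when load_model_from_dir concatenates it with '/model_arch.yaml', which suggests the model directory argument was never passed (or defaults to None). A sketch of a stricter argument definition that would fail with a clearer message (the flag name is an assumption, not necessarily what predict.py uses):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('audio_file')
# required=True prevents a None default from leaking into path concatenation
parser.add_argument('-m', '--model-dir', required=True)
args = parser.parse_args()

if not os.path.isdir(args.model_dir):
    parser.error('model directory does not exist: %s' % args.model_dir)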

There are missing .py tools

There are imports of tools that are missing from the repository. Could you please commit these files?

For example:

"sys.path.append('../../tools/music-processing-experiments')" from ml-master/chord-recognition/notebooks/chord_classification_data_preparation.ipynb

There's no 'music-processing-experiments' folder....

predict.py raising different errors

Hi! I'm trying to test prediction on WAV and FLAC files, but each of them raises a different error. Running a WAV file, I get:

File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 98, in predict_class_label x_features = self.load_features(audio_file)
File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 135, in <module> print(model.predict_class_label(args.audio_file))
File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 87, in load_features
x_features = self.ch.transform(x0)
AttributeError: 'dict' object has no attribute 'transform'

and on a FLAC file:

Traceback (most recent call last):
  File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 135, in <module>
    print(model.predict_class_label(args.audio_file))
  File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 98, in predict_class_label
    x_features = self.load_features(audio_file)
  File "C:/dipl0m/ml-master/instrument-classification/predict.py", line 83, in load_features
    x, fs = sf.read(audio_file)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\soundfile.py", line 374, in read
    frames = f._prepare_read(start, stop, frames)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\soundfile.py", line 1447, in _prepare_read
    self.seek(start, SEEK_SET)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\soundfile.py", line 870, in seek
    _error_check(self._errorcode)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\soundfile.py", line 1455, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Internal psf_fseek() failed.

I'm using Python 3.6.2 on Windows and the model from here. Could you help me figure out the issue?
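
The FLAC failure happens inside soundfile/libsndfile before any model code runs, so it can be isolated from the rest of the pipeline. A minimal check using only the soundfile API (nothing repository-specific):

import soundfile as sf

# if this already raises, the problem is in libsndfile or the file itself,
# not in predict.py
with sf.SoundFile('sample.flac') as f:
    print(f.samplerate, f.channels, f.format, len(f))

x, fs = sf.read('sample.flac')
print(x.shape, fs)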

Prepare a library for loading and saving data

  • high-level tasks:
    • preparing training data (features + labels)
    • preparing data for prediction (features)
  • low-level tasks:
    • audio recording to chromagram (frame-wise features)
    • frame-wise features to shapes suitable for various DNN architectures (see the sketch after this list):
      • individual frames
      • individual frames + 1D convolution
      • individual frames with context
      • individual frames with context + 2D convolution
      • sequences of frames (for LSTM)
      • sequences of frames (for LSTM) + 1D convolution
    • splitting the dataset into training, validation and test splits
    • storing data
  • post-processing tasks
  • saving & loading models

visualize the chromagrams

Visualize the input feature matrices so that we can more easily understand what the network learns or why it has problems (a plotting sketch follows the list).

  • chromagram for each data point
  • average chromagram for each class
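
A minimal matplotlib sketch, assuming each chromagram is a (frames, bins) array; the names X (data points) and y (class labels) are hypothetical:

import matplotlib.pyplot as plt

def plot_chromagram(chromagram, title=''):
    """Show a (frames, bins) chromagram with time on the x axis."""
    plt.imshow(chromagram.T, origin='lower', aspect='auto',
               interpolation='nearest')
    plt.xlabel('frame')
    plt.ylabel('pitch bin')
    plt.title(title)
    plt.colorbar()
    plt.show()

# chromagram for one data point:
# plot_chromagram(X[i])
# average chromagram for one class:
# plot_chromagram(X[y == some_class].mean(axis=0), title='class average')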

Compare a few approaches to binary pitch class vector classification

  • Input: continuous chroma vector (many octaves, values in dB).
  • Output: binary vector of activity of pitch classes (merged to single octave).

Approaches (a Keras sketch of the first variant follows the list):

  • single frame (no context) + 1D convolution net
  • fixed window of history (some context) + 2D convolution net
  • "unlimited" history + RNN (e.g. LSTM)
  • bidirectional RNN (also future context)
  • CNN+RNN (?)
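
For illustration, a sketch of the first variant (single frame + 1D convolution net) in Keras; the layer sizes are arbitrary assumptions and the input is one chroma vector of n_bins values with a single channel (so X must be shaped (samples, n_bins, 1)):

from keras.models import Sequential
from keras.layers import Conv1D, Dense, Flatten, MaxPooling1D

n_bins = 115  # e.g. bin_range (-48, 67) with bin_division=1; an assumption

model = Sequential([
    # convolve along the frequency axis of a single frame
    Conv1D(32, 3, activation='relu', input_shape=(n_bins, 1)),
    MaxPooling1D(2),
    Conv1D(32, 3, activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    # 12 independent sigmoids, one per pitch class (multi-label, not softmax)
    Dense(12, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])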

Evaluate the errors in pitch class vector prediction

  • overall metrics for the full dataset splits (a sketch follows the list)
    • accuracy (on binary classes, full chords)
    • Hamming distance (on binary classes, individual tones)
    • AUC (on probabilities, more informative)
  • metrics on separate songs
    • how do they differ?
  • confusion matrix (matrices)
    • 12x 2x2 - for each pitch class separately
    • 1x 4096x4096 - for whole pitch class vectors
  • plot a line of error per time frame
    • which places are more erroneous?
  • How to take into consideration class imbalance?
    • imbalance in tones vs. silence (1 vs. 0)
    • imbalance across pitch classes
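
A sketch of the overall metrics with scikit-learn, assuming y_true is a (frames, 12) binary array and y_prob the predicted probabilities (both names are hypothetical):

import numpy as np
from sklearn.metrics import hamming_loss, roc_auc_score

y_pred = (y_prob >= 0.5).astype(int)

# accuracy on full chords: the whole 12-bit vector must match exactly
full_chord_accuracy = np.mean(np.all(y_pred == y_true, axis=1))
# Hamming: fraction of individual pitch-class labels that are wrong
per_tone_error = hamming_loss(y_true, y_pred)
# AUC on the raw probabilities, averaged over the 12 pitch classes
auc = roc_auc_score(y_true, y_prob, average='macro')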

extract_features.py gives a very weird error

Hi again! I've tried to run the whole preprocessing, but got stuck on extracting features. I prepared the dataset in FLAC and ran extract_features.py {AUDIO_DIR} {FEATURE_DIR} without any changes in the code, and I got this:

Traceback (most recent call last):
  File "C:/dipl0m/ml-master/instrument-classification/extract_features.py", line 68, in <module>
    args.block_size, args.hop_size, args.bin_range, args.bin_division)
  File "C:/dipl0m/ml-master/instrument-classification/extract_features.py", line 36, in extract_pitchgrams
    for i in range(len(dataset.samples)))
  File "C:\Users\jackb\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\shape_base.py", line 421, in dstack
    return _nx.concatenate([atleast_3d(_m) for _m in tup], 2)
  File "C:\Users\jackb\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\shape_base.py", line 421, in <listcomp>
    return _nx.concatenate([atleast_3d(_m) for _m in tup], 2)
  File "C:/dipl0m/ml-master/instrument-classification/extract_features.py", line 36, in <genexpr>
    for i in range(len(dataset.samples)))
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\tfr\sklearn.py", line 32, in transform
    bin_division=self.bin_division)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\tfr\reassignment.py", line 298, in pitchgram
    output_frame_size, PitchTransform(bin_range, bin_division), magnitudes=magnitudes)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\tfr\reassignment.py", line 144, in reassigned
    self.signal_frames.sample_rate)
  File "C:\Users\jackb\AppData\Roaming\Python\Python36\site-packages\tfr\reassignment.py", line 50, in transform_freqs
    output_bin_count = (self.bin_range[1] - self.bin_range[0]) * self.bin_division
TypeError: 'int' object is not subscriptable

I checked reassignment.py in tfr, and I guess the problem could be in line 35:

def __init__(self, bin_range=(-48, 67), bin_division=1, tuning=Tuning()):

because print(type(bin_range)) gives me <class 'int'>, even though it's initialized as a tuple...

P.S. I'm running this on Windows, but I don't think that's the root of the problem.
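
For what it's worth, the traceback shows bin_range arriving in tfr as a single int, so the likely culprit is how extract_features.py parses its command line rather than tfr itself. A sketch of parsing it as a pair (the flag name and default are assumptions):

import argparse

parser = argparse.ArgumentParser()
# two ints, e.g. --bin-range -48 67, passed on as a tuple, not a single int
parser.add_argument('--bin-range', type=int, nargs=2, default=(-48, 67))
args = parser.parse_args()
bin_range = tuple(args.bin_range)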

beatles time map - inverse polar mapping

In the classic time map, regularly spaced events lie on the diagonal, and accelerating/decelerating events fall progressively further away from the diagonal. The radius represents velocity and the angle from the diagonal represents acceleration.

It might be more natural to transform this space via an inverse polar mapping of the single quadrant. In this case the Cartesian x axis would represent the velocity and the y axis the acceleration. y = 0 means steady velocity, y > 0 acceleration and y < 0 deceleration. It could be combined with a logarithmic mapping of the radius.

Any quantization could be done after the inverse polar mapping.
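
A sketch of one possible reading of this, assuming a time-map point for event i is the pair of adjacent inter-onset intervals (t_i - t_{i-1}, t_{i+1} - t_i); the sign convention is chosen so that y > 0 means acceleration, as described above:

import numpy as np

def inverse_polar_time_map(onsets, log_radius=True):
    """Map classic time-map points into velocity/acceleration coordinates."""
    dt = np.diff(np.asarray(onsets, dtype=float))
    x, y = dt[:-1], dt[1:]              # classic time-map coordinates
    r = np.hypot(x, y)                  # radius ~ local inter-event interval
    acc = np.pi / 4 - np.arctan2(y, x)  # angle from the diagonal; positive when
                                        # the next interval is shorter (speeding up)
    if log_radius:
        r = np.log(r)                   # optional logarithmic mapping of the radius
    # new Cartesian axes: x ~ velocity (via interval length), y ~ acceleration
    return r, acc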
