interactiveaudiolab / penn

200.0 9.0 14.0 74.92 MB

Pitch Estimating Neural Networks (PENN)

License: MIT License

Python 97.39% Shell 2.61%

frequency music periodicity pitch speech voicing

penn's People

Contributors

Stargazers

Watchers

Forkers

maxmax2016 ishine shaun95 chenchy wgsh3706 fredadiv robert-dl sharvil mirror116 anthonio9 turian cameronchurchwell iamunr4v31

penn's Issues

Add warning that multichannel audio is not supported

#3 demonstrates that stereo audio throws an error. The solution is to convert to mono. A more descriptive error or warning would help.
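A minimal sketch of what a more descriptive check could look like. The function name `require_mono` and the assumed `(channels, samples)` layout are illustrative assumptions and are not part of penn's API.

```python
def require_mono(audio):
    """Raise a descriptive error unless audio has exactly one channel.

    Assumes (hypothetically) that audio exposes a .shape whose first
    dimension is the channel count, as torchaudio.load returns by default.
    """
    shape = tuple(audio.shape)
    if len(shape) != 2:
        raise ValueError(
            f'Expected audio of shape (channels, samples), got shape {shape}')
    channels = shape[0]
    if channels != 1:
        raise ValueError(
            f'Multichannel audio is not supported: got {channels} channels. '
            'Convert to mono first, e.g., by averaging over the channel '
            'dimension.')
    return audio
```

With a torch tensor, averaging over channels can be done with `audio.mean(dim=0, keepdim=True)` before running the check.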

Issue in Running in Colab with GPU, even same on Mac (M2) [MPS]

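import penn
import torchaudio

# Load the audio file
audio, sample_rate = torchaudio.load('/content/try this.wav')

# Hop size in seconds (10 milliseconds)
hopsize = .01

# Frequency range to search, in Hz
fmin = 30.
fmax = 1000.

# Index of the GPU to use for inference
gpu = 0

# Number of frames per batch
batch_size = 2048

# Use the default checkpoint
checkpoint = None

# Frame-centering mode
center = 'half-hop'

# Periodicity threshold below which pitch is interpolated
interp_unvoiced_at = .065

# Infer pitch and periodicity
pitch, periodicity = penn.from_audio(
    audio,
    sample_rate,
    hopsize=hopsize,
    fmin=fmin,
    fmax=fmax,
    checkpoint=checkpoint,
    batch_size=batch_size,
    center=center,
    interp_unvoiced_at=interp_unvoiced_at,
    gpu=gpu)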

CLI uses --files not --audio_files (as in README)

Nice work, I just read it this morning.

The CLI uses --files, whereas the docs say --audio_files.
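One possible way to reconcile the CLI with the docs is to accept both spellings. The sketch below is a hypothetical argparse parser written for illustration; it is not penn's actual CLI code, and the help text and `nargs` choice are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser that accepts both flag spellings."""
    parser = argparse.ArgumentParser(description='Illustrative CLI sketch')
    # Both option strings store into the same destination, args.files
    parser.add_argument(
        '--files',
        '--audio_files',
        dest='files',
        nargs='+',
        required=True,
        help='The audio files to process')
    return parser


# Either spelling populates args.files
args = build_parser().parse_args(['--audio_files', 'speech.wav'])
print(args.files)
```

Keeping the old spelling as an alias would avoid breaking anyone who copied the command from the README.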

(P.S. I'm curious whether you have separate results tables for PTDB and MDB-STEM-SYNTH. Table II seems to indicate that the scores are mixed.)

Need support for 0-centered frames and support to sample_rate 22.05kHz

It seems that neither pad=True nor pad=False produces zero-centered frames. With pad=True, the first frame starts at -(winsz-hopsz)//2 instead of -winsz//2.
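To make the two conventions concrete, the sketch below computes the center of each frame from the start offset of the first frame. The window size of 1024 samples and hop size of 80 samples are assumed values chosen for illustration, and `frame_centers` is a hypothetical helper, not a penn function.

```python
def frame_centers(first_start, window_size, hop_size, num_frames):
    """Center sample index of each frame, given where the first frame starts."""
    return [
        first_start + index * hop_size + window_size // 2
        for index in range(num_frames)]


# Assumed values for illustration only
winsz, hopsz = 1024, 80

# Start offset described in the issue for pad=True
half_hop = frame_centers(-(winsz - hopsz) // 2, winsz, hopsz, 3)

# Start offset that would give zero-centered frames
zero_centered = frame_centers(-winsz // 2, winsz, hopsz, 3)

print(half_hop)       # [40, 120, 200]
print(zero_centered)  # [0, 80, 160]
```

Under these assumptions, the first offset centers frames at hopsz/2, 3*hopsz/2, and so on, which is half a hop later than the zero-centered grid.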

When using this model on audio with a sample rate of 22.05 kHz and a hop size of 256 samples, the rounding in time_to_samples makes the effective hop size inaccurate, so the number of frames ends up larger or smaller than the hopsize field indicates.
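The effect of the rounding can be checked with a little arithmetic. The sketch below assumes the model operates at 8 kHz internally and that the hop size is converted to samples by rounding seconds times sample rate. Both are assumptions for illustration, and `seconds_to_samples` is a hypothetical stand-in for the conversion mentioned in the issue.

```python
def seconds_to_samples(seconds, sample_rate):
    """Hypothetical conversion: round a duration to a whole number of samples."""
    return int(round(seconds * sample_rate))


input_rate = 22050   # sample rate of the input audio
model_rate = 8000    # assumed internal sample rate of the model
hop_input = 256      # requested hop size in input samples

# Requested hop in seconds, then in (rounded) model samples
hop_seconds = hop_input / input_rate
hop_model = seconds_to_samples(hop_seconds, model_rate)

# The exact hop would be about 92.88 model samples, but it rounds to 93
exact_hop_model = hop_seconds * model_rate
print(exact_hop_model, hop_model)

# Frame counts for 10 seconds of audio (ignoring padding)
duration = 10
frames_expected = (duration * input_rate) // hop_input
frames_actual = (duration * model_rate) // hop_model
print(frames_expected, frames_actual)  # 861 860
```

The per-frame error is small, but it accumulates over the length of the file, which is why the frame count eventually disagrees with the requested hop size.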

Steps to reproduce the reported RPA accuracy of 98%

Hi,

I'm working on a project similar to yours, but focused solely on guitar pitch recognition. To get a better look at how the penn models train, I've integrated Weights & Biases into the project (check out my fork). I'm pretty sure the only thing I changed in the config is the LOG_INTERVAL value, which I set to 500. However, the training and validation accuracy reported during training oscillate around 50%, and the evaluation run after training reports similar results.

The figures below are the result of my attempt at training the fcnf0++ model from scratch.

Training accuracy reported every epoch

Validation accuracy reported every epoch

Training loss

The estimated performance is included in the overall.json file generated by the training script.

It's clear that I'm missing something. Do you have any advice on how to achieve the best results, or do you see any obvious issues in my approach? I tried to follow the README instructions: download, preprocess, and partition the mdb and ptdb datasets according to the fcnf0++ config, then run the training. The overall.json file reports that evaluation reaches around 60% on mdb and only around 20% on ptdb.

overall.json
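For reference when comparing numbers, raw pitch accuracy (RPA) is commonly computed as the fraction of voiced frames whose predicted pitch lies within a cent threshold of the ground truth. The sketch below uses a 50-cent threshold, a common convention. Whether it matches penn's evaluation code exactly is an assumption, and `raw_pitch_accuracy` is a hypothetical helper.

```python
import math


def raw_pitch_accuracy(predicted, target, voiced, threshold_cents=50.):
    """Fraction of voiced frames with pitch error below threshold_cents.

    predicted, target: pitch values in Hz (positive on voiced frames)
    voiced: ground-truth voicing decisions, one boolean per frame
    """
    correct, total = 0, 0
    for pred_hz, target_hz, is_voiced in zip(predicted, target, voiced):
        if not is_voiced:
            continue
        total += 1
        # Pitch error in cents (1200 cents per octave)
        error = abs(1200. * math.log2(pred_hz / target_hz))
        if error < threshold_cents:
            correct += 1
    return correct / total if total else float('nan')


# One exact match, one error of about 84 cents, one octave error,
# and one unvoiced frame that is ignored
predicted = [100., 210., 880., 150.]
target = [100., 200., 440., 300.]
voiced = [True, True, True, False]
print(raw_pitch_accuracy(predicted, target, voiced))
```

Only the first voiced frame falls within the threshold here, so the example evaluates to one third.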

RuntimeError                              Traceback (most recent call last)
<ipython-input-5-487bd8f6f6cf> in <module>
     29 
     30 # Infer pitch and periodicity
---> 31 pitch, periodicity = penn.from_audio(
     32     audio,
     33     penn.SAMPLE_RATE,

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    307                             weight, bias, self.stride,
    308                             _single(0), self.dilation, self.groups)
--> 309         return F.conv1d(input, weight, bias, self.stride,
    310                         self.padding, self.dilation, self.groups)
    311 

RuntimeError: Given groups=1, weight of size [256, 1, 32], expected input[2048, 2, 993] to have 1 channels, but got 2 channels instead

I'm getting this with both CPU and GPU inference, whether I install via pip or clone from GitHub. Do you know what might be the problem?

penn's Issues

Add warning that multichannel audio is not supported

Issue in Running in Colab with GPU, even same on Mac (M2) [MPS]

CLI uses --files not --audio_files (as in README)

Need support for 0-centered frames and support to sample_rate 22.05kHz

Steps to reproduce the reported RPA accuracy of 98%

Batch inference is misaligned after thousands of files

Where can we read your paper?

Annotating Vocals into MIDI

Getting RuntimeError when attempting to run inference
