penn's People

Contributors

caedonhsieh, maxrmorrison, nathanpruyne, sharvil

penn's Issues

Issue running in Colab with GPU; same issue on Mac (M2) [MPS]

import penn
import torchaudio

# Load the audio file to analyze
audio, sample_rate = torchaudio.load('/content/try this.wav')

# Hop size in seconds
hopsize = .01

# Frequency search range in Hz
fmin = 30.
fmax = 1000.

# Index of the GPU to use
gpu = 0

batch_size = 2048
checkpoint = None
center = 'half-hop'
interp_unvoiced_at = .065

pitch, periodicity = penn.from_audio(
    audio,
    sample_rate,
    hopsize=hopsize,
    fmin=fmin,
    fmax=fmax,
    checkpoint=checkpoint,
    batch_size=batch_size,
    center=center,
    interp_unvoiced_at=interp_unvoiced_at,
    gpu=gpu)

[image]

CLI uses --files not --audio_files (as in README)

Nice work, I just read it this morning.

The CLI uses --files, whereas the docs say --audio_files.

(P.S. I'm curious whether you have the results tables for PTDB and MDB-STEM-SYNTH individually. Table II seems to indicate that the scores are mixed.)

Need support for zero-centered frames and for a sample rate of 22.05 kHz

It seems that neither pad=True nor pad=False produces zero-centered frames. With pad=True, the first frame starts at -(winsz-hopsz)//2 instead of -winsz//2.

When using this model on audio with a sample rate of 22.05 kHz and a hop size of 256 samples, the rounding in time_to_samples makes the effective hop size inaccurate, so the number of frames ends up larger or smaller than the hopsize field indicates.
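
The hop-size mismatch can be illustrated with plain arithmetic. This is only a sketch under assumptions: the model rate of 8000 Hz and the round-to-nearest conversion are stand-ins for whatever time_to_samples does internally, and both helper names are hypothetical rather than part of penn.

```python
def hop_in_model_samples(hop_samples, input_rate, model_rate):
    """Convert a hop given in input samples to model-rate samples.

    Returns the exact (fractional) value and the rounded value.
    """
    hop_seconds = hop_samples / input_rate
    exact = hop_seconds * model_rate
    return exact, round(exact)


def frame_counts(duration_seconds, hop_samples, input_rate, model_rate):
    """Frames implied by the requested hop vs. frames after rounding."""
    _, rounded = hop_in_model_samples(hop_samples, input_rate, model_rate)
    expected = int(duration_seconds * input_rate) // hop_samples
    actual = int(duration_seconds * model_rate) // rounded
    return expected, actual


# Assumed values: 22.05 kHz input, hop of 256 samples, 8 kHz model rate
exact, rounded = hop_in_model_samples(256, 22050, 8000)
print(exact, rounded)  # the exact hop is fractional; it rounds to 93

# One minute of audio: the two frame counts no longer agree
expected, actual = frame_counts(60, 256, 22050, 8000)
print(expected, actual)  # 5167 5161
```

Under these assumptions the rounded hop is slightly longer than the requested one, so the frame count drifts by a few frames per minute of audio.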

Steps to reproduce the reported RPA accuracy of 98%

Hi,

I'm working on a project similar to yours, but focused solely on guitar pitch recognition. To get a better look into how penn models train, I've integrated Weights & Biases into the project; check out my fork. I'm pretty sure the only thing I've changed in the config is LOG_INTERVAL, which I set to 500. However, the training and validation accuracy reported during training oscillate around 50%, and the evaluation run after training reports similar results.

The figures below are the result of my attempt at training the fcnf0++ model from scratch.

[image: Training accuracy reported every epoch]

[image: Validation accuracy reported every epoch]

[image: Training loss]

The estimated performance is included in the overall.json file generated by the training script.

It's clear that I'm missing something. Do you have any advice on steps to achieve the best results, or do you see any obvious issues in my approach? I tried to follow the README instructions: download, preprocess, and partition the mdb and ptdb datasets according to the fcnf0++ config, then run the training. The overall.json file reports that evaluation on mdb reaches around 60% and on ptdb only around 20%.

overall.json
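
For reference when comparing against the reported numbers, raw pitch accuracy (RPA) is commonly defined as the fraction of voiced frames whose predicted pitch lies within a cent threshold of the ground truth. The sketch below assumes a 50-cent threshold, strictly positive pitch values in Hz, and plain Python lists. It is not the evaluation code used by penn, and the function name is hypothetical.

```python
import math


def raw_pitch_accuracy(predicted_hz, target_hz, voiced, threshold_cents=50.0):
    """Fraction of voiced frames with pitch error below the threshold."""
    correct = 0
    total = 0
    for pred, true, is_voiced in zip(predicted_hz, target_hz, voiced):
        if not is_voiced:
            continue  # Unvoiced frames are excluded from RPA
        total += 1
        # Pitch error in cents (1200 cents per octave)
        error_cents = abs(1200.0 * math.log2(pred / true))
        if error_cents < threshold_cents:
            correct += 1
    return correct / total if total else float('nan')


# Toy data: one exact match, one small error, one octave error, one unvoiced
predicted = [220.0, 446.0, 100.0, 330.0]
target = [220.0, 440.0, 200.0, 330.0]
voiced = [True, True, True, False]
print(raw_pitch_accuracy(predicted, target, voiced))  # 2 of 3 voiced frames
```

An octave error such as 100 Hz against 200 Hz counts as a miss, so a model that often lands an octave off could plausibly score near the 50% range even when its loss looks reasonable.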

Getting RuntimeError when attempting to run inference

Hi team - can't wait to try this! I'm getting the following RuntimeError when trying to run inference with the pretrained model:

RuntimeError                              Traceback (most recent call last)
<ipython-input-5-487bd8f6f6cf> in <module>
     29 
     30 # Infer pitch and periodicity
---> 31 pitch, periodicity = penn.from_audio(
     32     audio,
     33     penn.SAMPLE_RATE,

/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    307                             weight, bias, self.stride,
    308                             _single(0), self.dilation, self.groups)
--> 309         return F.conv1d(input, weight, bias, self.stride,
    310                         self.padding, self.dilation, self.groups)
    311 

RuntimeError: Given groups=1, weight of size [256, 1, 32], expected input[2048, 2, 993] to have 1 channels, but got 2 channels instead

I'm getting this with both CPU and GPU inference, and with both the pip install and a clone from GitHub. Do you know what might be the problem?
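
The shape in the error message, input[2048, 2, 993], suggests that the loaded audio has two channels while the first convolution expects one. Below is a minimal sketch of a possible workaround, assuming the file is stereo and that averaging the channels is acceptable for pitch tracking. The helper name to_mono is hypothetical and not part of penn.

```python
import torch


def to_mono(audio: torch.Tensor) -> torch.Tensor:
    """Average a (channels, samples) tensor down to shape (1, samples)."""
    if audio.dim() != 2:
        raise ValueError('expected a tensor of shape (channels, samples)')
    return audio.mean(dim=0, keepdim=True)


# Stand-in for a stereo file loaded with torchaudio.load
stereo = torch.randn(2, 16000)
mono = to_mono(stereo)
print(mono.shape)  # torch.Size([1, 16000])
```

The mono tensor could then be passed to penn.from_audio in place of the original audio. Selecting a single channel with audio[:1] would be an alternative if averaging is undesirable.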
