interactiveaudiolab / penn
Pitch Estimating Neural Networks (PENN)
License: MIT License
#3 demonstrates that stereo audio throws an error. The solution is to convert to mono. A more descriptive error or warning would help.
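One way to surface the problem earlier is to validate the channel dimension before inference. The sketch below is not part of PENN: `require_mono` is a hypothetical helper, and it assumes audio is shaped (channels, samples), which is the layout `torchaudio.load` returns by default.

```python
from types import SimpleNamespace


def require_mono(audio):
    """Raise a descriptive error unless audio is shaped (1, samples).

    Hypothetical helper, not part of PENN. Works with any object that
    exposes a `.shape` attribute, such as a torch tensor.
    """
    shape = tuple(audio.shape)
    if len(shape) != 2:
        raise ValueError(
            f"Expected audio of shape (channels, samples), got {shape}")
    channels = shape[0]
    if channels != 1:
        raise ValueError(
            f"Expected mono audio with 1 channel, got {channels} channels. "
            "Downmix to mono before running inference.")
    return audio


# Stand-in objects so the example runs without torch installed
mono = SimpleNamespace(shape=(1, 16000))
stereo = SimpleNamespace(shape=(2, 16000))

require_mono(mono)  # passes silently
try:
    require_mono(stereo)
except ValueError as error:
    message = str(error)
```

With a torch tensor, averaging the channels with `audio.mean(dim=0, keepdim=True)` is one common way to downmix before calling `penn.from_audio`.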
import penn
import torchaudio

# Load audio as a (channels, samples) tensor
audio, sample_rate = torchaudio.load('/content/try this.wav')

# Inference settings
hopsize = .01
fmin = 30.
fmax = 1000.
gpu = 0
batch_size = 2048
checkpoint = None
center = 'half-hop'
interp_unvoiced_at = .065

# Infer pitch and periodicity
pitch, periodicity = penn.from_audio(
    audio,
    sample_rate,
    hopsize=hopsize,
    fmin=fmin,
    fmax=fmax,
    checkpoint=checkpoint,
    batch_size=batch_size,
    center=center,
    interp_unvoiced_at=interp_unvoiced_at,
    gpu=gpu)
Nice work, I just read it this morning.
The CLI uses --files, whereas the docs say --audio_files.
(P.S. I'm curious whether you have the results tables for PTDB and MDB-STEM-SYNTH individually. Table II seems to indicate that the scores are mixed.)
It seems that neither pad=True nor pad=False is zero-centered. When pad=True, the first frame starts at -(winsz-hopsz)//2 instead of -winsz//2.
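The two offsets in this report can be compared directly. This is a minimal sketch of the arithmetic only, not PENN's framing code: the function name, the mode labels 'zero' and 'half-hop', and the window and hop sizes (1024 and 80 samples) are assumptions chosen for illustration.

```python
def first_frame_start(window_size, hop_size, center):
    """Start index of the first frame relative to sample 0.

    'zero' places the middle of the first window on sample 0, while
    'half-hop' places it half a hop later. Both formulas are taken from
    the issue text; the mode names are illustrative assumptions.
    """
    if center == 'zero':
        return -(window_size // 2)
    if center == 'half-hop':
        return -((window_size - hop_size) // 2)
    raise ValueError(f'Unknown centering mode: {center}')


window_size = 1024  # assumed window size in samples
hop_size = 80       # assumed hop size in samples

zero_start = first_frame_start(window_size, hop_size, 'zero')
half_hop_start = first_frame_start(window_size, hop_size, 'half-hop')

# The middle of the first window sits half a window after its start
zero_center = zero_start + window_size // 2
half_hop_center = half_hop_start + window_size // 2

print(zero_start, zero_center)          # -512 0
print(half_hop_start, half_hop_center)  # -472 40
```

Under these assumptions, the reported offset puts the first frame's center at half a hop (sample 40) instead of at sample 0.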
When using this model on audio with a sample rate of 22.05 kHz and a hop size of 256 samples, the rounding in time_to_samples makes the effective hop size inaccurate, so the number of frames ends up larger or smaller than the hopsize field indicates.
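The drift can be illustrated with plain arithmetic. The sketch below is an assumption-laden illustration and not PENN's code: `seconds_to_samples` is a hypothetical stand-in for the conversion, it assumes rounding to the nearest integer, and it assumes the model operates at 8 kHz.

```python
def seconds_to_samples(seconds, sample_rate):
    """Hypothetical stand-in: convert seconds to a whole number of samples."""
    return round(seconds * sample_rate)


source_rate = 22050  # sample rate of the input audio
source_hop = 256     # desired hop size in samples at the source rate
model_rate = 8000    # assumed sample rate used internally by the model

# Hop size in seconds, as it would be passed via the hopsize field
hopsize_seconds = source_hop / source_rate

# The exact hop at the model rate is not an integer, so it gets rounded
exact_model_hop = hopsize_seconds * model_rate
rounded_model_hop = seconds_to_samples(hopsize_seconds, model_rate)

# Compare frame counts over one minute of audio
duration_seconds = 60
expected_frames = (duration_seconds * source_rate) // source_hop
actual_frames = (duration_seconds * model_rate) // rounded_model_hop

print(rounded_model_hop)               # 93
print(expected_frames, actual_frames)  # 5167 5161
```

Under these assumptions the exact hop is roughly 92.88 samples, which rounds to 93, and the frame count falls 6 frames short per minute of audio.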
Hi,
I'm working on a project similar to yours, but focused solely on guitar pitch recognition. To get a better look into penn model training, I've integrated Weights & Biases into the project; check out my fork. I'm fairly sure the only thing I changed in the config is the LOG_INTERVAL value, which I set to 500. However, the training and validation accuracy reported during training oscillate around 50%, and the evaluation run after training reports similar results.
The figures below are the result of my attempt at training the fcnf0++ model from scratch.
[Figure: Training accuracy reported every epoch]
[Figure: Validation accuracy reported every epoch]
The estimated performance is included in the overall.json file generated by the training script.
It's clear that I'm missing something. Do you have any advice on steps to achieve the best results, or do you see any obvious issues in my approach? I tried to follow the README instructions: download, preprocess, and partition the mdb and ptdb datasets according to the fcnf0++ config, then run the training. The overall.json file reports that evaluation on mdb reaches around 60% and on ptdb only around 20%.
After a few thousand files of batch inference, the pitch and periodicity become misaligned. This is some sort of multiprocessing issue.
"Cross-domain Neural Pitch and Periodicity Estimation"
Can we export vocals to MIDI, as Omnizart Vocals does?
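As a starting point for such an export, a frame-level pitch track can be mapped to MIDI note numbers with the standard equal-temperament formula (A4 = 440 Hz = note 69). The sketch below is not a PENN feature: both function names are hypothetical, and using a periodicity threshold of 0.065 to decide voicing is an assumption. Writing an actual MIDI file would need a third-party library and is not shown.

```python
import math


def hz_to_midi(frequency_hz):
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return 69.0 + 12.0 * math.log2(frequency_hz / 440.0)


def frames_to_notes(pitch_hz, periodicity, voicing_threshold=0.065):
    """Group consecutive voiced frames with the same rounded MIDI note.

    Returns (note, start_frame, end_frame) tuples with an exclusive end.
    The voicing threshold is an assumed value, not a recommendation.
    """
    notes = []
    current = None  # [note, start_frame] of the note being extended
    for frame, (hz, confidence) in enumerate(zip(pitch_hz, periodicity)):
        voiced = confidence >= voicing_threshold and hz > 0
        note = round(hz_to_midi(hz)) if voiced else None
        # Close the running note when the pitch changes or voicing stops
        if current is not None and note != current[0]:
            notes.append((current[0], current[1], frame))
            current = None
        if note is not None and current is None:
            current = [note, frame]
    if current is not None:
        notes.append((current[0], current[1], len(pitch_hz)))
    return notes


pitch = [440.0, 442.0, 262.0, 262.0, 300.0]
confidence = [0.9, 0.9, 0.8, 0.8, 0.01]
notes = frames_to_notes(pitch, confidence)
print(notes)  # [(69, 0, 2), (60, 2, 4)]
```

Frame indices can be converted to seconds by multiplying by the hop size used during inference.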
Hi team - can't wait to try this! I'm getting the following RuntimeError when trying to run inference with the pretrained model:
RuntimeError Traceback (most recent call last)
<ipython-input-5-487bd8f6f6cf> in <module>
29
30 # Infer pitch and periodicity
---> 31 pitch, periodicity = penn.from_audio(
32 audio,
33 penn.SAMPLE_RATE,
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
307 weight, bias, self.stride,
308 _single(0), self.dilation, self.groups)
--> 309 return F.conv1d(input, weight, bias, self.stride,
310 self.padding, self.dilation, self.groups)
311
RuntimeError: Given groups=1, weight of size [256, 1, 32], expected input[2048, 2, 993] to have 1 channels, but got 2 channels instead
I'm getting this with both CPU and GPU inference, whether I install via pip or clone from GitHub. Do you know what might be the problem?