gudgud96 / frechet-audio-distance
A lightweight library for Frechet Audio Distance calculation.
License: MIT License
So far this library includes Frechet Audio Distance and CLAP score metrics.
We are also looking for help implementing the Kullback-Leibler divergence.
Can this be used to identify whether two sounds are from the same person?
Hi there,
Thanks for making this!
I have some issues when calculating the score for WAV files sampled at 44.1 kHz. The error that is thrown is:
exception thrown, Input signal length=2 is too small to resample from 44100->16000
I've tried to replicate the loading process as done in this repo.
wav, sr = sf.read(file_name, dtype='int16')
But this only returns an array of mostly zeros.
The WAV files are not corrupted, as far as I can see.
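For what it's worth, "Input signal length=2" might mean a stereo array of shape (frames, 2) is being resampled along the wrong axis, so the resampler sees the channel count as the signal length. A hedged sketch of a loader-side guard (the function name is mine, and I'm using scipy for resampling rather than whatever the library uses internally):

```python
import numpy as np
from scipy.signal import resample_poly

def to_mono_16k(wav: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz for VGGish-style models."""
    wav = np.asarray(wav, dtype=np.float32)
    # soundfile returns shape (frames, channels); downmix along the channel
    # axis so the resampler sees the full signal length, not the channel count.
    if wav.ndim > 1:
        wav = wav.mean(axis=1)
    if wav.size < sr // 100:  # fewer than ~10 ms of audio
        raise ValueError(f"signal length={wav.size} is too small to resample")
    if sr != 16000:
        # resample_poly reduces 16000/44100 to 160/441 internally
        wav = resample_poly(wav, 16000, sr)
    return wav
```

With a one-second stereo 44.1 kHz input this returns a mono array of exactly 16000 samples, instead of failing on the channel axis.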
Hello,
Thank you for providing this tool! It's great!
If you are interested in improvement suggestions, I would like to suggest the following:
For vggish, the model checkpoint is downloaded automatically, but this is not the case for pann:
FileNotFoundError: [Errno 2] No such file or directory: 'mypath/.cache/torch/hub/Cnn14_16k_mAP%3D0.438.pth'
Again, thank you for your work!
As mentioned in fcaspe/ddx7#1, I am not yet able to replicate the FAD scores reported in the paper to a satisfactory level.
Further investigation is needed on whether the difference comes from inherent implementation differences compared to the Google version, or from differences outside the FAD calculation itself. Hence I have decided to look into some major works and do a more detailed benchmark of reported FAD scores versus those calculated here. Candidates are to be listed (starting with DDX7); paper suggestions are welcome.
So far this library includes Frechet Audio Distance and CLAP score metrics.
We are also looking for help implementing the Inception Score.
We are in dire need of proper CI! Given the amount of content now in this repo, and the importance of correctness in calculating each score, CI is a crucial safety net for further deployment and iteration.
We do have extensive test examples in notebooks, but the goal is to ensure everything stays correct and nothing breaks when PRs are merged.
Right now we have only one unit test (only one...), for VGGish. It can be replicated for the other embedding types. The only blocker I faced previously is that checkpoint downloading is extremely slow for some models (e.g. PANN), so CI would run forever. But I believe well-known packages like EnCodec and CLAP should do fine.
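One cheap, checkpoint-free test that could go into CI right away: verify the Fréchet distance math itself on synthetic statistics, with no model download involved. This is a sketch, and the function names are illustrative, not the library's actual API:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians: ||mu1-mu2||^2
    + Tr(sigma1 + sigma2 - 2*sqrtm(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def test_frechet_distance_sanity():
    rng = np.random.default_rng(0)
    x = rng.standard_normal((512, 8))
    mu, sigma = x.mean(axis=0), np.cov(x, rowvar=False)
    # Identical distributions must score ~0.
    assert abs(frechet_distance(mu, sigma, mu, sigma)) < 1e-5
    # A shifted mean must score strictly higher.
    assert frechet_distance(mu + 1.0, sigma, mu, sigma) > 1.0
```

Tests like this run in milliseconds on GitHub Actions, so the slow-download problem only affects the per-model embedding tests, not the score math.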
Thanks for the great work!
I'm just wondering whether the transformers version pin (<=4.30) is really necessary.
I ask because there are newer audio models (e.g. generative models, or models like EnCodec) that are only available in newer versions of transformers.
Is there potential for conflict between the newer version of transformers and your implementation?
As we have more and more pull requests coming in, I need to set up proper CI for testing >.<
The unit test folders are in place; I just need to prepare GitHub Actions for automated CI.
Recently, a few papers have calculated FAD scores using alternatives to the VGGish model, such as:
MusicLM - uses (1) TRILL (Shor et al., 2020) and (2) VGGish (Hershey et al., 2017) trained on the YouTube-8M audio event dataset (Abu-El-Haija et al., 2016).
AudioLDM - uses PANN (Kong et al., 2020b)
A clear list of models to support is to be determined by the demand of the research community, but we should try to support TRILL and PANN first. Basically, we need to:
(i) source for open-source PyTorch checkpoints for these models
(ii) refactor code to abstract out audio embedding calculation
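For (ii), a sketch of what the abstraction could look like; all names here (EmbeddingModel, get_embeddings, the dummy backend) are illustrative, not the library's actual API:

```python
from abc import ABC, abstractmethod

import numpy as np

class EmbeddingModel(ABC):
    """Common interface so FAD code stays agnostic of the backbone."""

    sample_rate: int  # each backbone declares the rate it expects

    @abstractmethod
    def get_embeddings(self, wav: np.ndarray) -> np.ndarray:
        """Map a mono waveform at self.sample_rate to (n_frames, dim)."""

class DummyMeanModel(EmbeddingModel):
    """Toy backbone (windowed means) just to exercise the interface;
    real backends would wrap VGGish, PANN, TRILL, etc."""

    sample_rate = 16000

    def get_embeddings(self, wav: np.ndarray) -> np.ndarray:
        # One 400-sample frame (25 ms at 16 kHz) per embedding row.
        frames = wav[: len(wav) // 400 * 400].reshape(-1, 400)
        return frames.mean(axis=1, keepdims=True)
```

The FAD pipeline would then only ever call `model.get_embeddings(wav)` and read `model.sample_rate`, so adding TRILL or PANN becomes a matter of writing one subclass each.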
Floating-point PCM WAV files are currently not handled correctly. Such files are not very common, but they are created when passing float32 numpy arrays to scipy.io.wavfile.write() or torchaudio.save().
What currently happens on loading is this:
wav_data, sr = sf.read(fname, dtype='int16')
assert wav_data.dtype == np.int16, 'Bad sample type: %r' % wav_data.dtype
wav_data = wav_data / 32768.0  # Convert to [-1.0, +1.0]
For floating-point files, wav_data will usually only contain -1, 0, and +1 samples, in line with the documentation of soundfile.read().
One solution would be to not pass a dtype argument to soundfile.read() and then convert whatever comes back, something like:
wav_data, sr = sf.read(fname)
if np.issubdtype(wav_data.dtype, np.floating):
    wav_data = np.asarray(wav_data, dtype=np.float32)
else:
    wav_data = np.divide(wav_data, np.iinfo(wav_data.dtype).max, dtype=np.float32)
Another solution would be to pass dtype=np.float32 to soundfile.read(). Was there a specific reason against this route? With an example OGG file, I've seen that dtype=np.float32 produced values between -1.2 and +1.4, while dtype=np.int16 scaled by 32767 and clipped values that were too high or too low. I'm not sure the latter is preferable in any case.
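To make both routes consistent, a sketch of a normalizer that clips float output (floats and lossy codecs can overshoot [-1, 1], as with the OGG example above) and scales integer PCM by the type's full range; the helper name is mine:

```python
import numpy as np

def to_float32_pcm(wav: np.ndarray) -> np.ndarray:
    """Normalize any soundfile output to float32 in [-1.0, +1.0]."""
    if np.issubdtype(wav.dtype, np.floating):
        # Float files and lossy codecs can slightly overshoot [-1, 1].
        return np.clip(wav, -1.0, 1.0).astype(np.float32)
    # Integer PCM: scale by the type's full negative range
    # (e.g. 32768 for int16), so the minimum maps exactly to -1.0.
    return (wav / -float(np.iinfo(wav.dtype).min)).astype(np.float32)
```

This keeps the existing int16 path byte-for-byte compatible (division by 32768) while handling float and 24/32-bit PCM files as well.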
Hello, I was trying to use the Vggish model, but got the following error. It seems Vggish is not available on PyTorch Hub.
UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
warnings.warn(
Downloading: "https://github.com/harritaylor/torchvggish/zipball/master" to /Users/zihaohe/.cache/torch/hub/master.zip
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/zihaohe/miniconda3/lib/python3.9/site-packages/torch/hub.py", line 555, in load
repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, trust_repo, "load",
File "/Users/zihaohe/miniconda3/lib/python3.9/site-packages/torch/hub.py", line 230, in _get_cache_or_reload
download_url_to_file(url, cached_file, progress=False)
File "/Users/zihaohe/miniconda3/lib/python3.9/site-packages/torch/hub.py", line 611, in download_url_to_file
u = urlopen(req)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 555, in error
result = self._call_chain(*args)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 747, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 523, in open
response = meth(req, response)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 632, in http_response
response = self.parent.error(
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 561, in error
return self._call_chain(*args)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/Users/zihaohe/miniconda3/lib/python3.9/urllib/request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable