aliutkus / speechmetrics

A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR

License: MIT License

Python 100.00%

speechmetrics's Issues

The range of MOSNet and SRMR?

I was wondering what the exact range of MOSNet and SRMR is, because I have seen a few utterances get a result larger than 5, even up to 8.xx. I would really appreciate your answer! 🌼

Hi, I just came across this nice package and would like to use it in some of our projects. Do you think it would be possible to publish it on PyPI as well, rather than only installing it from a URL? The main advantage is that a PyPI package can be cached, whereas a URL install always has to be repeated even if the package is already in site-packages. BTW, the name on PyPI still seems to be available 🐰

How to comprehend output?

Hi
First, the package is super cool; it saved me from downloading each of the metrics separately. Thanks!
I also wanted to know how to interpret the output. It would be great if you could add that to the README.
Here is the output for two of the files in your dataset. Could you elaborate on the results, that is, what does a high positive value, a negative value, or a value close to zero mean?

reference = 'data/m2_script1_produced.wav'
distorted = 'data/m2_script1_clean.wav'

{'mosnet': array([[5.0981326]], dtype=float32),
 'srmr': 4.653473083972128}
{'sdr': array([[-0.39609285]]),
 'isr': array([[0.24738725]]),
 'sar': array([[-0.37060632]]),
 'pesq': 4.354660987854004,
 'sisdr': -14.740691053217517,
 'stoi': 0.9718856108717927}
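
As a rough aid for reading the sign of a value such as 'sisdr', here is a minimal pure-Python sketch of the usual scale-invariant SDR definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of what is left over, in dB. The function name `si_sdr` and the `eps` guard are illustrative assumptions; this is not necessarily how speechmetrics computes the metric internally.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sequences of floats.

    Hypothetical helper for illustration, not part of speechmetrics.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Optimal scaling of the reference that best explains the estimate
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)
    # Split the estimate into a scaled-reference part and a residual
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Toy signals: a sine as reference, a cosine at the same frequency as "noise"
n = 100
ref = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
noise = [math.cos(2 * math.pi * 5 * i / n) for i in range(n)]

close = [0.5 * r + 0.01 * v for r, v in zip(ref, noise)]  # mostly reference
unrelated = noise                                         # nothing of the reference

print(si_sdr(close, ref))      # clearly positive: target dominates the residual
print(si_sdr(unrelated, ref))  # strongly negative: residual dominates the target
```

Under this definition, 0 dB means the projected target and the residual carry equal energy, positive values mean the estimate is mostly explained by the reference, and negative values mean most of the estimate's energy is unrelated to the reference. The value has no fixed upper or lower bound.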

Stack overflow occurred when running relative indicators

Failed to install speechmetrics following steps in Installation section

Hello everyone,
I am following the steps in the README to install speechmetrics, but I am getting the following errors:
(base) [bilal@Fedora ~]$ pip install git+https://github.com/aliutkus/speechmetrics#egg=speechmetrics[cpu]
Collecting speechmetrics[cpu] from git+https://github.com/aliutkus/speechmetrics#egg=speechmetrics[cpu]
Cloning https://github.com/aliutkus/speechmetrics to /tmp/pip-install-3ibjsrho/speechmetrics
Running command git clone -q https://github.com/aliutkus/speechmetrics /tmp/pip-install-3ibjsrho/speechmetrics
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
ERROR: Command "git clone -q https://github.com/aliutkus/speechmetrics /tmp/pip-install-3ibjsrho/speechmetrics" failed with error code 128 in None

Can't install this package according to the steps in the readme

When I follow the steps in the readme I am unable to install this package.

conda create --name myenv python=3.7
pip install numpy
pip install git+https://github.com/aliutkus/speechmetrics#egg=speechmetrics[cpu]

The error I get is as follows:

Collecting speechmetrics[cpu]
  Cloning https://github.com/aliutkus/speechmetrics to /tmp/pip-install-niosg_5m/speechmetrics_e8b8c8981b054e5abf0eb066869fb2be
Requirement already satisfied: numpy in /home/bram/miniconda3/envs/myenv/lib/python3.6/site-packages (from speechmetrics[cpu]) (1.19.5)
Collecting gammatone@ git+https://github.com/detly/gammatone
  Cloning https://github.com/detly/gammatone to /tmp/pip-install-niosg_5m/gammatone_895b5433ba3942b89e09332d77220272
Collecting pypesq@ git+https://github.com/vBaiCai/python-pesq
  Cloning https://github.com/vBaiCai/python-pesq to /tmp/pip-install-niosg_5m/pypesq_10d103a6483e4b49baf71e1a6c5e1860
Collecting srmrpy@ git+https://github.com/jfsantos/SRMRpy
  Cloning https://github.com/jfsantos/SRMRpy to /tmp/pip-install-niosg_5m/srmrpy_bdd74c3ba26f435a9e50257e2b5ae2ee
Collecting Gammatone@ https://github.com/detly/gammatone/archive/master.zip#egg=Gammatone
  Using cached https://github.com/detly/gammatone/archive/master.zip
INFO: pip is looking at multiple versions of speechmetrics to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of pypesq to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of gammatone to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of speechmetrics[cpu] to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install speechmetrics, speechmetrics==1.0 and speechmetrics[cpu]==1.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    speechmetrics[cpu] 1.0 depends on gammatone 1.0 (from git+https://github.com/detly/gammatone)
    speechmetrics 1.0 depends on gammatone 1.0 (from git+https://github.com/detly/gammatone)
    srmrpy 1.0 depends on gammatone 1.0 (from https://github.com/detly/gammatone/archive/master.zip#egg=Gammatone)

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

Where are the versions specified and can they be updated to make the install commands work? Any other tips are welcome as well.

License

I was hoping to use this repository for a commercial project. Would it be possible for you to add a license to this repository that would allow it?

Really awesome repository by the way!

Can you list what the parameters of each function are?

such as:
metrics = speechmetrics.load('bsseval' , window_length)
scores = metrics(?, ?)

metrics = speechmetrics.load('nb_pesq', window_length)
scores = metrics(?, ?)

metrics = speechmetrics.load('stoi', window_length)
scores = metrics(?, ?)
......
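
For what it is worth, the calls quoted in other issues on this page suggest the following pattern: `load` takes a metric name (or a list of names) plus a window length in seconds, and the returned object is called with the estimate, the reference, and optionally `rate=`. The sketch below wraps that pattern in a hypothetical helper, `score_pair`, which is not part of speechmetrics. The argument order (estimate first, then reference) is an assumption taken from the README-style example and the museval comparison quoted elsewhere on this page.

```python
def score_pair(estimate, reference, rate, names=("sisdr", "stoi"), window=5):
    """Score one estimate against its reference with the named relative metrics.

    Hypothetical convenience wrapper, not part of speechmetrics.
    """
    # Imported lazily so that this file can be read without the package installed
    import speechmetrics

    # load() takes a metric name or list of names and a window length in seconds
    metrics = speechmetrics.load(list(names), window)
    # Assumed order: estimate first, then reference; arrays need an explicit rate
    return metrics(estimate, reference, rate=rate)
```

The result is expected to be a dict keyed by metric name, as in the outputs pasted in other issues. With arrays as input, a call might look like `score_pair(est, ref, rate=16000)`.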

MOSNet and SRMR for evaluating enhanced speech

Hi,
I am asking about the two absolute metrics, MOSNet and SRMR. Can I use these absolute metrics to evaluate enhanced speech produced by deep-learning-based models?

ERROR: 0xC00000FD: Stack overflow

Python output:
Loaded speechmetrics.relative.nb_pesq
Loaded speechmetrics.relative.pesq
Loaded speechmetrics.relative.sisdr
Loaded speechmetrics.relative.stoi

Visual Studio debugger output:
Unhandled exception at 0x00007FFD86C9FA07 (pesq_core.cp38-win_amd64.pyd) in python.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x000000C2BF203000).

Wide-band PESQ instead of Narrow band?

Hey,

I wonder what you would think about making the WB PESQ from here the default in speechmetrics.
This replicates the results from Loizou's Matlab code.

We could still keep the current pesq under raw_pesq or something.
I'm willing to make a PR if needed.

Non-existent "tensorflow==2.0.0"

The issue is that "tensorflow==2.0.0" is no longer available.
I ended up installing straight from the stable packages of tensorflow here: link
Note: after a brief check, I noticed that the code works with tensorflow 2.3.0 (at least MOSNet), and after installing tensorflow-cpu==2.3.0 everything works well.

Cannot print the score

a,sr=sf.read('E:/speech/sliced_test_clean/S_01_01.wav')
b,sr=sf.read('E:/speech/sliced_test_-5/S_01_01.wav')
score=pesq(a,b,sr)
print(score)
Hello, my code looks like this, but it does not print the score and I don't know why.

Feature request: Add support for ViSQOL

Thanks for making speechmetrics!

Here's a candidate implementation of ViSQOL: https://github.com/google/visqol

Why does each file have 4 elements, and why do 8 files have the same elements?

No argument to specify the GPU id when creating absolute metrics?

Hi, is there any argument to specify the GPU id? The default device may be out of memory.

Can these methods be applied to VoIP?

Hi
Which method is most commonly used in VoIP scenarios? Is there a non-intrusive method?
thanks!

Can I train MOSNet myself?

Poetry solver problem when trying to install both `speechmetrics` and `pysepm`

Hi,
First and foremost, thank you for this amazing code! 🔥

I want to install both pysepm and speechmetrics in my poetry env; however, I get this error:

$ poetry add git+https://github.com/aliutkus/speechmetrics.git

Updating dependencies
Resolving dependencies... (24.7s)

  SolverProblemError

  Because speechmetrics (rev master) depends on srmrpy (branch master)
   and pysepm (rev master) depends on SRMRpy (1.0), speechmetrics (rev master) is incompatible with pysepm (rev master).
  So, because semi-guided-speech depends on both pysepm (branch master) and speechmetrics (branch master), version solving failed.

  at /usr/lib/python3.10/site-packages/poetry/puzzle/solver.py:241 in _solve

Why is there a tight dependency on the master/branch version of SRMRpy?

speechmetrics/setup.py

Line 23 in 6e15429

'srmrpy @ git+https://github.com/jfsantos/SRMRpy',

fyi: this is the setup.py of pysepm:

https://github.com/schmiph2/pysepm/blob/3c3f35ef5846d0e976adbc9d72469c3d4ae99a4f/setup.py#L17

An installation problem about 'speechmetrics' repository

Hi, yesterday I failed to install speechmetrics on Windows with either Python 3 or Python 2, and I also tried on Ubuntu but failed again. Could you tell me what the problem is and how I can solve it? Thanks!
There is no Issues button in your 'speechmetrics' repository, so I am raising the question here.

What should the window setting of SISDR be?

I understand that the window is the maximum value of the metric, so the window of STOI is 1 and the window of PESQ is 5. But what should the window setting of SISDR be?

ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

How do I handle this problem during installation?

The SRMR score is larger than 1

Hi, when I use SRMR to evaluate audio quality, the score is sometimes larger than 1, and the average score over 2430 audio files is 2.23. Do you have any idea why?

Is this going to depend on `bsseval`?

Thanks for the wrapper, it's a nice idea!
Is speechmetrics going to depend on bsseval once it is done?
Would that also mean that speechmetrics will include SI-SDR? It would be really practical to share the same metrics across the community.

Can this code be adapted to Mac?

example/test.py

test.py: scores = metrics(reference, test)
example_use: scores = metrics(path_to_estimate_file, path_to_reference)

Which of the above two is correct?

'str' object has no attribute 'decode'

I followed the examples and something went wrong.
Here is my code:

import speechmetrics
window_length = 5 # seconds
metrics = speechmetrics.load('absolute', window_length)
scores = metrics("/Wave/000001.wav")
print(scores)

and I get errors like this:

'str' object has no attribute 'decode'
  File "/home/mike/testaudioquality/test.py", line 3, in <module>
    metrics = speechmetrics.load('absolute', window_length)

My Python version is 3.6 and my tensorflow-gpu version is 2.0.0.

setup.py installation error: `tensorflow-gpu` package has been removed.

The installation command in the readme throws an error, because setup.py is trying to build the tensorflow-gpu package, which has been removed from pip.

From their pypi page:

tensorflow-gpu has been removed. Please install tensorflow instead.

As of December 2022, tensorflow-gpu has been removed and has been replaced with this new, empty package that generates an error upon installation.

Installing tensorflow separately does not help, as setup.py always looks for the empty tensorflow-gpu package instead.

Keras model error

When running test.py, I get the following error :

Trying ABSOLUTE metrics: 
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    metrics = sm.load('absolute', window)
  File "/home/mparient/code_perso/cloned/speechmetrics/speechmetrics/__init__.py", line 151, in load
    new_metric = load_function(window)
  File "/home/mparient/code_perso/cloned/speechmetrics/speechmetrics/absolute/mosnet/__init__.py", line 22, in load
    mosnet = MOSNet(window, hop)
  File "/home/mparient/code_perso/cloned/speechmetrics/speechmetrics/absolute/mosnet/model.py", line 36, in __init__
    padding='same'))(re_input)
  File "/home/mparient/.virtualenvs/speechmetrics/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 817, in __call__
    self._maybe_build(inputs)
  File "/home/mparient/.virtualenvs/speechmetrics/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 2141, in _maybe_build
    self.build(input_shapes)
  File "/home/mparient/.virtualenvs/speechmetrics/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/convolutional.py", line 153, in build
    raise ValueError('The channel dimension of the inputs '
ValueError: The channel dimension of the inputs should be defined. Found `None`.

Installing the b1 version (as specified in the original repo) doesn't solve the problem. Any idea?

Process finished with exit code -1073741571 (0xC00000FD)

Why?

Inconsistency between museval and speechmetrics-bsseval

I wrote the following code to compare the behavior between museval and speechmetrics-bsseval.

from museval.metrics import bss_eval
import speechmetrics as sm
import numpy as np

metrics = sm.load(['bsseval'],window=1)

ref = np.random.randn(1, 44100*3, 2)  # [nsrc, nsample, channel], a single audio source with two channels 
est = np.random.randn(1, 44100*3, 2)

res = bss_eval(ref,est,window=44100,hop=44100)

bsseval = metrics(est[0,...],ref[0,...],rate=44100)

print(res)

print(bsseval)

It outputs the following:

Loaded  speechmetrics.relative.bsseval
(array([[-3.02169448, -2.98148236, -3.01738321]]), array([[-0.03463801, -0.03900151, -0.0400294 ]]), array([[inf, inf, inf]]), array([[-21.09888836, -21.01320054, -21.05034071]]), array([[0]]))
{'sdr': array([[-2.99676764, -2.98088619, -2.99560498],
       [-3.04682562, -2.98208233, -3.03924334]]), 'isr': array([[-0.01493135, -0.01706893, -0.01804728],
       [-0.02410121, -0.02879832, -0.0266307 ]]), 'sar': array([[-21.00349928, -20.94294041, -20.91428113],
       [-21.19664823, -21.08528379, -21.18806085]])}

It seems that speechmetrics treats the two channels as two sources.

Changing the code at bsseval.py:16 to the following solves the problem:

result = bss_eval(reference_sources=audios[1][None,...], # shape: [nsrc, nsample, nchannels]
                estimated_sources=audios[0][None,...],
                window=self.bss_window * rate,
                hop=self.bss_hop * rate)

Multiple Values for Single Metric?

Hello. Thank you for the efforts in this all-in-one package.
I am mainly using MOSNet and SRMR now, as they are non-intrusive (absolute).

But I find that more than one value is output for each metric, even though I only load one audio file for evaluation.
Here is an example of the Python output:
{'mosnet': array([2.75408196, 3.04858017, 3.26394176]), 'srmr': array([10.25382248, 7.35339144, 8.33086446]), 'stoi': array([ 0.09140952, -0.08605568, -0.0059758 ])}
You can see there are 3 values for each metric. Sometimes I get only 1 or 2 for each.

I am quite sure my input audio is a 1-D array, as numpy.shape(audio) gives (101984,).
Is it because of the length of the input audio?

Also, please advise why SRMR returns large numbers, while in your introduction it is described as between 0 and 1, with 1 being the best.
Thanks!
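
One plausible reading, offered as a sketch and not as a description of the package internals: since `load` takes a window length in seconds, a signal longer than one window may be scored once per window, which would give one value per window for each metric. The helper `window_bounds` below is hypothetical. It assumes non-overlapping windows by default and that a trailing partial window is dropped. The 16 kHz rate and 2 s window are illustrative assumptions; only the 101984-sample length comes from the report.

```python
def window_bounds(num_samples, rate, window_seconds, hop_seconds=None):
    """Return (start, end) sample indices of each full analysis window.

    Hypothetical helper; assumes hop == window unless a hop is given,
    and drops any trailing partial window.
    """
    window = int(window_seconds * rate)
    hop = window if hop_seconds is None else int(hop_seconds * rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if num_samples < window:
        return []
    count = (num_samples - window) // hop + 1
    return [(i * hop, i * hop + window) for i in range(count)]


# Assumed 16 kHz audio and a 2 s window; the sample count is from the report
bounds = window_bounds(101984, rate=16000, window_seconds=2)
print(len(bounds))  # 3 full windows under these assumptions
print(bounds)
```

Under these assumptions, a shorter file or a longer window would produce fewer windows, which would be consistent with sometimes seeing only 1 or 2 values per metric.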

Why is the MOSNet value not 5 when the two files are the same?

I pointed both the reference file and the evaluation file to the same audio file and found that the MOSNet value is 2.6, which does not make sense. What went wrong?