
seqmetrics's People

Contributors

atrcheema, codacy-badger, fazilarubab, sara-iftikhar


seqmetrics's Issues

[REVIEW]: SeqMetrics

Related to: openjournals/joss-reviews#6450

This is a longer review with explicit comments on the paper and the software. I have subdivided it into categories related to different aspects of the paper and the software. Bullet points that I have marked with (Important) are currently blocking issues for me to recommend this paper for acceptance at JOSS.

Documentation

Overall the documentation is clear and concise. The API is clearly laid out and there are examples included. Some minor suggestions for improvements in the near future:

  • You mention that easy_mpl is needed for plotting the metrics. However, it is not mentioned in the documentation or README that you can actually install this by writing pip install SeqMetrics[all]. Please add these additional install instructions.
  • Consider having the docs for the plot_metrics utility function show not only the API but also the corresponding plot. See this page for how to embed plots in Sphinx.
  • Documentation for the class-based API is essentially missing: https://seqmetrics.readthedocs.io/en/latest/rgr.html#SeqMetrics.RegressionMetrics. I know this is because it is just calling the functional API, but it would then be great if there were a reference per metric to its functional counterpart (see the sketch below).
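
For reference, this is the correspondence such per-metric cross-references could point to. A minimal sketch based on the README's usage examples; the exact import paths and call signatures should be double-checked against the actual API:

import numpy as np
from SeqMetrics import RegressionMetrics, r2

true = np.random.random(20)
predicted = np.random.random(20)

# class-based API: the method delegates to the functional implementation
er = RegressionMetrics(true, predicted)
print(er.r2())

# functional API: the counterpart the class docs could link to per metric
print(r2(true, predicted))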

References

The paper mentions multiple frameworks for comparison and clearly lays out that SeqMetrics' core value is having more metrics than any other package. I would have preferred if the authors had taken a more nuanced view of the other frameworks, e.g. there are features that the metrics packages from Keras or torchmetrics have that SeqMetrics does not. I do not consider this a huge problem, but it is worth considering.

Additionally, in the README.md of the project there are multiple related projects mentioned at the bottom that are not included in the paper (forecasting_metrics, hydroeval, etc.). I would like to ask the authors why these are not referenced in the paper.

On the other hand, not all of the frameworks mentioned in the paper are listed in the related-projects section of the README.md. Again, minor stuff.

App

Overall, adding such an application really makes SeqMetrics easy to use even for non-programmers. Really nice. I would like the authors to consider the following problems:

  • The app should be better documented, especially the instructions for typing/pasting values. From the code I can see that a comma-separated list is expected, but this is not clear from the instructions; a plain numpy array, for example, does not work. Including fig2 and fig3 in the documentation and README file would definitely help.

  • Since it is a simple streamlit app that users can deploy themselves without too much hassle, I really think the authors should consider adding instructions on how the app can be deployed locally by users (let's say that I do not trust streamlit servers with my data but still want the nice interface). This would probably require a bit of refactoring of the repository to include the app in the src directory and the addition of a pip install SeqMetrics[app] option for installing; see the sketch below for one possible shape of such an entry point. Additionally, the paper should be updated to reflect that the web interface can be self-hosted.
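
To illustrate what local deployment could look like once the app ships inside the package, here is a minimal sketch of a launcher. The module layout, the app.py location and the console-script name are my own hypothetical choices, not the authors' code:

import subprocess
from pathlib import Path

def main():
    # assumes the streamlit app is bundled inside the package, e.g. as
    # SeqMetrics/app.py, and that this function is exposed as a console
    # script (say, `seqmetrics-app`) installed via pip install SeqMetrics[app]
    app_path = Path(__file__).parent / "app.py"
    subprocess.run(["streamlit", "run", str(app_path)], check=True)

if __name__ == "__main__":
    main()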

Robustness

The paper uses the word robust or robustness 7 times. I therefore went into this thinking that the kind of unit testing the authors have implemented would be excellent. However, I am a bit underwhelmed by the authors' unit testing. As someone who is working in the exact same space, I cannot stress enough how important it is to test that a given metric computes the correct result. If it does not compute the correct result, the core value of such a package vanishes.
As an example of the problems I have:

def test_r2(self):
    new_r2 = r2(t11, p11)
    assert np.allclose(new_r2, 0.0003276772244559177)
    return

def test_nse(self):
    new_nse = nse(t11, p11)
    assert np.allclose(new_nse, -1.068372251749874)
    return

Here two different metrics are tested for a single given input. However, it is not at all clear why these tests actually check that the implementation is correct. For comparison, here is some of the code from scikit-learn testing the r2_score metric:
https://github.com/scikit-learn/scikit-learn/blob/fad66eb8296d0d7da27e041d78a8edff5b7d4361/sklearn/metrics/tests/test_regression.py#L187-L220
I therefore really recommend the authors to:

  • (Important) Implement unit testing against other frameworks whenever possible. The authors are already doing this for certain classification metrics (
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    from sklearn.metrics import confusion_matrix, balanced_accuracy_score
    ); however, it should be done for most metrics. Metrics for which there is no reference implementation to compare against should be tested for multiple inputs, and preferably it should be clearer where the reference values come from. A sketch of such a comparison test is given below.
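
As a concrete illustration of what such a comparison test could look like, here is a sketch for nse. This is my own example, not the authors' code: the random inputs, seed and test name are mine, and it relies on the Nash–Sutcliffe efficiency being mathematically identical to scikit-learn's coefficient of determination:

import numpy as np
from sklearn.metrics import r2_score
from SeqMetrics import nse

def test_nse_against_sklearn(self):
    # compare against a reference implementation on several random inputs
    # instead of a single hard-coded value of unclear origin
    rng = np.random.default_rng(313)
    for _ in range(10):
        true, predicted = rng.random(100), rng.random(100)
        assert np.allclose(nse(true, predicted), r2_score(true, predicted))
    return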

Once the unit tests are improved, a general improvement to the CI would also be needed:

  • (Important) Currently only Python 3.7 is tested, which is officially end-of-life. Either run CI that checks multiple versions of Python or, at the very least, a newer supported version.
  • Tests only run on Ubuntu right now and on no other major OS. I recommend that the authors either add tests for other operating systems or explicitly state which OSes are supported in their README.md.
  • (Important) Because the CI only uses Python 3.7, the numpy version actually being tested is numpy-1.21.6, which is around 2 years old at this point. I see this as an overall consequence of the authors not having included upper/lower bounds on the supported numpy/scipy versions in the requirements file.

I want to stress that the reason I am nitpicking on this is that the authors claim in the paper that the framework is very robust, which is a claim not fully supported by the code.

Other

  • (Important) Missing community guidelines: are contributions welcome? What should a contribution look like, etc.?
  • (nitpicking) In fig1 of the paper it is redundant to say "class-based api" both at the bottom and the top of the figure (the same goes for the functional API). Mentioning this once should be enough?
  • (nitpicking) The overall resolution (pixel-wise) of the figures in the paper is on the lower side and could be increased to help the readability of the text in the figures.

error at line 533 of TSErrors.py

Hi,

I believe the definition of MARE is not correct.

The calculation should use (are) rather than (ae); that is, the absolute values of the relative errors should be taken into consideration, not the plain absolute errors.
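
For clarity, here is a sketch of the intended definition (my own illustration, not the library's code):

import numpy as np

def mare(true, predicted):
    # mean absolute relative error: take the absolute value of the
    # relative errors, not of the plain errors
    true, predicted = np.asarray(true), np.asarray(predicted)
    return float(np.mean(np.abs((true - predicted) / true)))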

Bests
