sepandhaghighi / pycm
Multi-class confusion matrix library in Python
Home Page: http://pycm.io
License: MIT License
I want to be able to add pycm as a dependency with conda.
Currently it seems to be available with pip, but I would like to install pycm with the following command:
conda install pycm
Thanks!
Recommend the most relevant parameters depending on whether the dataset is imbalanced and whether the classification is binary.
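A first step toward such recommendations is detecting the two properties automatically. The sketch below is a minimal illustration, not pycm's behavior: `describe_task`, `recommend_metrics`, and the `imbalance_ratio` threshold (default 3, meaning the largest class has more than three times the samples of the smallest) are hypothetical, and the metric lists are placeholders showing the shape of a lookup table rather than vetted advice.

```python
from collections import Counter

def describe_task(y_true, imbalance_ratio=3.0):
    """Return (binary, imbalanced) for a vector of actual labels.

    imbalance_ratio is an assumed threshold: the data counts as imbalanced
    when the largest class is more than imbalance_ratio times the smallest.
    """
    counts = Counter(y_true)
    binary = len(counts) == 2
    imbalanced = max(counts.values()) > imbalance_ratio * min(counts.values())
    return binary, imbalanced

# Placeholder table keyed by (binary, imbalanced); the actual choice of
# metrics is a design decision for the library authors.
RECOMMENDED = {
    (True, False): ["accuracy", "F1", "MCC"],
    (True, True): ["F1", "MCC", "Cohen's kappa"],
    (False, False): ["overall accuracy", "macro-averaged F1", "Cohen's kappa"],
    (False, True): ["macro-averaged F1", "multi-class MCC", "Cohen's kappa"],
}

def recommend_metrics(y_true, imbalance_ratio=3.0):
    """Look up the placeholder metric list for this labeling task."""
    return RECOMMENDED[describe_task(y_true, imbalance_ratio)]
```

For example, `describe_task([0] * 10 + [1])` reports a binary, imbalanced task because class 0 has ten times as many samples as class 1.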
The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.
It would be useful if there was a method to rename the classes.
In many workflows it is common to encode class names as integers. However, when working with final results it is most useful to convert these integers back into the string class names.
It would be useful to do this after the construction of the ConfusionMatrix, because it is more efficient to remap the labels on the matrix rows and columns than it is to rename each item in the actual and predicted vectors (which can be quite large).
Something like ConfusionMatrix.rename(mapping) or ConfusionMatrix.relabel(mapping) might be a good method signature, where mapping is a dictionary of old names to new names. It would also be useful if this method "did the right thing" when multiple old names mapped to a single new name (i.e. sum the counts in the corresponding matrix cells).
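The requested behavior can be sketched on a matrix stored as a nested dictionary `{actual: {predicted: count}}`. This is a standalone illustration, not pycm's implementation: `relabel` is a hypothetical function, and the nested-dict layout is an assumption. Its cost depends only on the number of classes (roughly K² cells), not on the length of the actual and predicted vectors, which is the efficiency argument made above.

```python
from collections import defaultdict

def relabel(matrix, mapping):
    """Return a new confusion matrix with classes renamed via `mapping`.

    matrix:  {actual_label: {predicted_label: count}}
    mapping: {old_label: new_label}; labels not in mapping keep their name.
    When several old labels map to the same new label, their counts are
    summed into the merged row and column.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for actual, row in matrix.items():
        new_actual = mapping.get(actual, actual)
        for predicted, count in row.items():
            merged[new_actual][mapping.get(predicted, predicted)] += count
    # Convert back to plain dicts so the result compares and prints cleanly.
    return {actual: dict(row) for actual, row in merged.items()}

m = {0: {0: 3, 1: 1, 2: 0},
     1: {0: 2, 1: 4, 2: 1},
     2: {0: 0, 1: 1, 2: 5}}
print(relabel(m, {0: "cat", 1: "dog", 2: "dog"}))
# {'cat': {'cat': 3, 'dog': 1}, 'dog': {'cat': 2, 'dog': 11}}
```

Merging classes 1 and 2 into "dog" sums their four cells (4 + 1 + 1 + 5 = 11), and the total count stays at 17.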
Evaluation Measures for Models Assessment over Imbalanced Data Sets
The zero_one_loss function computes the sum or the average of the 0-1 classification loss
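The 0-1 loss charges 1 for each misclassified sample and 0 for each correct one, so its average equals 1 minus overall accuracy (the off-diagonal sum of the confusion matrix divided by the total). A minimal sketch follows; the `normalize` flag mirrors the parameter of the same name in scikit-learn's `zero_one_loss`, but this function is a standalone illustration and not pycm code.

```python
def zero_one_loss(y_true, y_pred, normalize=True):
    """Sum or average of the 0-1 classification loss.

    normalize=True returns the fraction of misclassified samples;
    normalize=False returns the number of misclassified samples.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true) if normalize else errors

print(zero_one_loss([1, 2, 3, 4], [1, 2, 0, 0]))                   # 0.5
print(zero_one_loss([1, 2, 3, 4], [1, 2, 0, 0], normalize=False))  # 2
```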
The support is the number of occurrences of each class in y_true
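Support can be read off the actual vector directly. In a confusion matrix whose rows are actual classes, it also equals each row sum. The helper below is hypothetical. Its optional `classes` argument (an assumption) reports 0 for labels that never occur. When `classes` is omitted, the labels are sorted, so they must be mutually comparable.

```python
from collections import Counter

def support(y_true, classes=None):
    """Number of occurrences of each class in y_true."""
    counts = Counter(y_true)
    if classes is None:
        classes = sorted(counts)
    return {c: counts.get(c, 0) for c in classes}

print(support(["a", "b", "a", "c", "a"]))    # {'a': 3, 'b': 1, 'c': 1}
print(support([1, 1, 2], classes=[1, 2, 3]))  # {1: 2, 2: 1, 3: 0}
```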
Ballabio, D., Grisoni, F. and Todeschini, R. (2018). Multivariate comparison of classification performance measures. Chemometrics and Intelligent Laboratory Systems, 174, pp.33-44.
Gorodkin, J. (2004). Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry, 28, pp.367–374.
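Gorodkin's R_K generalizes the Matthews correlation coefficient to K classes. For a K x K confusion matrix with s the total number of samples, c the trace (correct predictions), t_k the number of samples actually in class k, and p_k the number predicted as class k:

R_K = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))

A minimal sketch on a list-of-lists matrix, with rows as actual classes. Returning 0.0 when the denominator vanishes (for example when every prediction falls in one class) is an assumed convention.

```python
import math

def rk_coefficient(matrix):
    """Gorodkin's R_K for a K x K confusion matrix (rows actual, columns predicted)."""
    k = len(matrix)
    s = sum(sum(row) for row in matrix)                          # total samples
    c = sum(matrix[i][i] for i in range(k))                      # correct predictions
    t = [sum(row) for row in matrix]                             # actual count per class
    p = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                            (s * s - sum(tk * tk for tk in t)))
    return numerator / denominator if denominator else 0.0
```

For K = 2 this reduces to the usual binary MCC. For example, `[[5, 1], [2, 4]]` gives 36 / sqrt(70 * 72), which equals (5*4 - 2*1) / sqrt(7*6*6*5), about 0.507.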
One last issue @sepandhaghighi
You define confusion matrices in your README.md and paper.md, but you don't state who the main users of the library will be, i.e., your 'target audience', as the JOSS guidelines require.
Please add some language specifying which users you think will find pycm most useful, e.g. pycm is the swiss-army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models.
Make sure to add this to both the README.md and the paper.md.
Thanks! That's the last "major-minor" revision :)
I would have expected matrix and normalized matrix to return the confusion matrix as a numpy array or a pandas DataFrame. However instead it prints a string representation.
It might be a clearer design if these functions were renamed to print_matrix / print_normalized_matrix, and matrix / normalized_matrix were added to return a value that a user can work with.
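Returning a value mostly comes down to fixing a label order and laying the counts out as rows. A sketch under the assumption that the matrix is held as a nested dict `{actual: {predicted: count}}`. `matrix_to_rows` is a hypothetical helper, not pycm's API. It returns plain lists, which `numpy.array(rows)` or `pandas.DataFrame(rows, index=labels, columns=labels)` can wrap.

```python
def matrix_to_rows(matrix, classes=None):
    """Convert {actual: {predicted: count}} to (labels, rows).

    rows[i][j] is the count for actual class labels[i] predicted as labels[j];
    missing cells are treated as 0. By default, labels are sorted.
    """
    if classes is None:
        labels = set(matrix)
        for row in matrix.values():
            labels.update(row)
        classes = sorted(labels)
    rows = [[matrix.get(a, {}).get(p, 0) for p in classes] for a in classes]
    return list(classes), rows

labels, rows = matrix_to_rows({"a": {"a": 2, "b": 1}, "b": {"b": 3}})
print(labels, rows)  # ['a', 'b'] [[2, 1], [0, 3]]
```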
They are used almost everywhere, so users should not have to convert them. They are also faster and more memory-efficient, which matters with large lists. At the moment the library supports only lists.
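An inexpensive first step toward accepting more input types is to normalize whatever the user passes at the library boundary. This sketch assumes one-dimensional input. `as_label_list` is a hypothetical helper. It relies on duck typing: NumPy arrays and pandas Series both provide `.tolist()`, which also turns NumPy scalar types into plain Python values. This still copies the data, so it fixes convenience rather than memory use. Avoiding the copy would require computing on the arrays directly.

```python
def as_label_list(vector):
    """Accept a list, tuple, NumPy array, pandas Series, or other iterable of labels."""
    if isinstance(vector, (str, bytes)):
        raise TypeError("expected a sequence of labels, not a single string")
    if hasattr(vector, "tolist"):  # NumPy arrays, pandas Series
        return vector.tolist()
    return list(vector)            # lists, tuples, ranges, generators
```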
Hi again @sepandhaghighi, one more minor issue:
the examples in the README are great for demonstrating functionality, but I think examples demonstrating actual use cases would be helpful for people.
Again, there are many metrics included in the package; I would say pick three to five metrics that you think will be the ones most frequently used by all users, and then provide some more real-world examples of using pycm with those metrics.
E.g., using it in conjunction with scikit-learn to compare two or three different classifiers.
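The shape of such a comparison can be sketched without any dependency. The helpers below are hypothetical and written from scratch: accuracy, and macro-averaged F1, which is the unweighted mean over classes of 2TP / (2TP + FP + FN). In a real example, each prediction vector would come from a fitted scikit-learn classifier's `predict` method, and the toy vectors here are made up purely for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def compare_models(y_true, predictions):
    """predictions: {model_name: y_pred} -> {model_name: (accuracy, macro_f1)}."""
    return {name: (accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
            for name, y_pred in predictions.items()}

y_true = [0, 0, 1, 1, 2, 2]
print(compare_models(y_true, {"model_a": [0, 0, 1, 1, 2, 2],
                              "model_b": [0, 1, 1, 1, 2, 0]}))
```

Reporting a class-balanced metric next to accuracy shows when a model scores well only because it favors the majority class.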
This is a place where a simple visualization would really help and make the concepts more intuitive.
I have often adapted this code from the scikit-learn docs to my own needs when I have to visualize a confusion matrix:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py
If you do not want to add scikit-learn as a dependency, you might just add a data sub-package with outputs from some models you have trained, and the same goes for matplotlib or whatever you use to visualize: for now you could just ship the plots as static .png files in your library.
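The scikit-learn example linked above can optionally normalize each row by its sum before plotting, so each cell shows the fraction of an actual class assigned to each predicted class. A dependency-free sketch of that step follows. `normalize_rows` and `render` are hypothetical helpers, and the plain-text table stands in for a heatmap. With matplotlib available, the same normalized rows could be passed to `matplotlib.pyplot.imshow`.

```python
def normalize_rows(rows):
    """Divide each row by its sum; rows that sum to zero stay all zeros."""
    result = []
    for row in rows:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result

def render(labels, rows, digits=2):
    """Plain-text table of a matrix, as a stand-in for a plotted heatmap."""
    cells = [[f"{v:.{digits}f}" for v in row] for row in rows]
    width = max(len(str(x)) for x in list(labels) + [c for row in cells for c in row])
    lines = [" " * width + " " + " ".join(str(lab).rjust(width) for lab in labels)]
    for lab, row in zip(labels, cells):
        lines.append(str(lab).rjust(width) + " " + " ".join(c.rjust(width) for c in row))
    return "\n".join(lines)

print(render(["a", "b"], normalize_rows([[3, 1], [2, 2]])))
```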
Maybe there was a particular problem you were facing that inspired you to develop the package and you have true data + predictions from those models that you could include as example datasets?
I really think adding these sorts of examples will help you recruit users to whom it's not immediately obvious why they shouldn't just use the basic confusion_matrix functionality in scikit-learn.
Hi @sepandhaghighi I'm reviewing pycm
for JOSS.
First, you all have done a great job and the library is well put-together. I'm sorry I haven't started the review sooner. I think the library as-is is very close to "accept". There are just a couple of minor revisions I think might be helpful.
Of those minor revisions, the one I would most like to see is a thorough documentation of the API.
At the bare minimum, the user should be able to type help(ConfusionMatrix) (or ConfusionMatrix? in IPython) and get some sort of useful help. As you probably know, the way to fix this would be to add a docstring to the ConfusionMatrix class. This docstring should state the acceptable types for y_true and y_pred: are only lists of ints and np.ndarrays of ints valid? Including briefer versions of the examples from the README.md (without the entire output of the print(cm) statement) would probably also be helpful.
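Here is an example of the kind of class docstring being requested, written in NumPy-style sections. The class is a hypothetical stand-in and is not pycm's `ConfusionMatrix`. Its parameter names follow the review, and the types it documents are illustrative. Once the docstring exists, `help(ConfusionMatrixSketch)` and `ConfusionMatrixSketch?` in IPython display it, and the `Examples` section can be checked with `doctest`.

```python
class ConfusionMatrixSketch:
    """Multi-class confusion matrix (hypothetical sketch, not pycm's class).

    Parameters
    ----------
    y_true : list of int or str
        Actual class labels, one per sample.
    y_pred : list of int or str
        Predicted class labels, same length as y_true.

    Examples
    --------
    >>> cm = ConfusionMatrixSketch([0, 1, 1], [0, 1, 0])
    >>> cm.table[1][0]
    1
    """

    def __init__(self, y_true, y_pred):
        if len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred must have the same length")
        classes = sorted(set(y_true) | set(y_pred))
        # table[actual][predicted] = number of samples with that pair
        self.table = {a: {p: 0 for p in classes} for a in classes}
        for t, p in zip(y_true, y_pred):
            self.table[t][p] += 1
```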
I think it would also be helpful to state in plain English on your README.md which data types are acceptable. A potential user will want to know whether pycm can accept their data in its present state or whether they will have to convert it to the right format.
I will need to finish this review tomorrow but I wanted to get it started for you as soon as possible. I hope my review will be helpful and we can quickly get your library into JOSS so more people can appreciate it.
List of labels to index the matrix
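A `labels` argument like this fixes the row and column order of the matrix and can include classes that never appear in the data. A minimal sketch follows. `confusion_table` is hypothetical. Silently skipping samples whose actual or predicted label is not in `labels` is an assumed policy chosen for this illustration.

```python
def confusion_table(y_true, y_pred, labels=None):
    """Build {actual: {predicted: count}} indexed (and ordered) by `labels`."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    table = {a: {p: 0 for p in labels} for a in labels}
    for t, p in zip(y_true, y_pred):
        if t in table and p in table:  # assumed policy: skip unlisted labels
            table[t][p] += 1
    return table

t = confusion_table([1, 2, 2], [1, 2, 1], labels=[2, 1, 3])
print(list(t))  # [2, 1, 3]  (order follows labels; class 3 is all zeros)
```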
Add the ability to compare two confusion matrices (CMs) using appropriate parameters.
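A simple comparison scores both matrices with the same set of metrics and reports which one wins each metric. The sketch below is a hypothetical design, not pycm's. It assumes nested-dict matrices `{actual: {predicted: count}}` and metrics for which higher is better. It uses overall accuracy and Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_e = sum_k t_k*p_k / n^2 is the agreement expected by chance.

```python
def _total(matrix):
    return sum(sum(row.values()) for row in matrix.values())

def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (actual == predicted)."""
    correct = sum(row.get(actual, 0) for actual, row in matrix.items())
    return correct / _total(matrix)

def cohen_kappa(matrix):
    total = _total(matrix)
    p_o = overall_accuracy(matrix)
    classes = set(matrix) | {p for row in matrix.values() for p in row}
    actual = {c: sum(matrix.get(c, {}).values()) for c in classes}
    predicted = {c: sum(row.get(c, 0) for row in matrix.values()) for c in classes}
    p_e = sum(actual[c] * predicted[c] for c in classes) / total ** 2
    # Kappa is undefined when p_e == 1; returning 0.0 there is an assumption.
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 0.0

def compare_matrices(cm1, cm2, metrics=None):
    """Return {metric: (score1, score2, winner)}; winner is 1, 2, or 0 for a tie."""
    if metrics is None:
        metrics = {"accuracy": overall_accuracy, "kappa": cohen_kappa}
    result = {}
    for name, fn in metrics.items():
        s1, s2 = fn(cm1), fn(cm2)
        result[name] = (s1, s2, 1 if s1 > s2 else 2 if s2 > s1 else 0)
    return result

cm1 = {0: {0: 5, 1: 1}, 1: {0: 2, 1: 4}}
cm2 = {0: {0: 6, 1: 0}, 1: {0: 0, 1: 6}}
print(compare_matrices(cm1, cm2))
# {'accuracy': (0.75, 1.0, 2), 'kappa': (0.5, 1.0, 2)}
```

Passing a different `metrics` dict lets the caller pick parameters suited to the data, for example class-balanced ones when the data is imbalanced.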
Hey @sepandhaghighi thanks for working on the issues.
One very minor one: I think the JOSS guidelines ask for you to put any relevant DOIs in the paper.md file.
See the example paper.md:
https://raw.githubusercontent.com/arfon/fidgit/master/paper/paper.md
In a somewhat meta fashion, Fidgit is publishing itself to figshare with DOI 'https://doi.org/10.6084/m9.figshare.828487' [@figshare_archive].
As far as I can tell, your library DOI is not currently in your paper.md.
Please add that.