sepandhaghighi / pycm Goto Github PK

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Add method to rename classes

It would be useful if there was a method to rename the classes.

In many workflows it is common to encode class names as integers. However, when working with final results it is most useful to convert these integers back into the string class names.

It would be useful to do this after the construction of the ConfusionMatrix, because it is more efficient to remap the labels on the matrix rows and columns than it is to rename each item in the actual and predicted vectors (which can be quite large).

Something like ConfusionMatrix.rename(mapping) or ConfusionMatrix.relabel(mapping) might be a good method signature where mapping is a dictionary of old names to new names. It would also be useful if this method "did the right thing" when multiple old names mapped to a single new name (i.e. sum the counts in the corresponding matrix cells).
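
A minimal sketch of the requested behaviour, assuming the matrix is stored as a nested dict of counts (`matrix[actual][predicted]`). The function name `relabel_matrix` is hypothetical and is not part of pycm; it only illustrates how counts could be summed when several old names map to one new name.

```python
from collections import defaultdict


def relabel_matrix(matrix, mapping):
    """Return a new confusion matrix with classes renamed per `mapping`.

    `matrix` is a nested dict: matrix[actual][predicted] -> count.
    Classes missing from `mapping` keep their old name. When several old
    names map to the same new name, their counts are summed.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for actual, row in matrix.items():
        new_actual = mapping.get(actual, actual)
        for predicted, count in row.items():
            new_predicted = mapping.get(predicted, predicted)
            merged[new_actual][new_predicted] += count
    # Convert back to plain dicts so the result has no default behaviour.
    return {actual: dict(row) for actual, row in merged.items()}


# Integer-encoded classes; 1 and 2 are merged into a single "dog" class.
matrix = {
    0: {0: 5, 1: 1, 2: 0},
    1: {0: 2, 1: 3, 2: 1},
    2: {0: 0, 1: 2, 2: 4},
}
renamed = relabel_matrix(matrix, {0: "cat", 1: "dog", 2: "dog"})
```

Only the K x K matrix is touched, so the cost does not depend on the length of the actual and predicted vectors.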

Add DP (Discriminator Power)

Evaluation Measures for Models Assessment over Imbalanced Data Sets

Add zero-one classification loss

The zero_one_loss function computes the sum or the average of the 0-1 classification loss
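
As a rough illustration of the metric, assuming plain Python sequences as input: the helper below counts misclassified samples and optionally divides by the number of samples. The `normalize` flag is an assumption chosen for this sketch, not a confirmed pycm parameter.

```python
def zero_one_loss(actual, predicted, normalize=True):
    """Return the 0-1 loss between two equal-length label sequences.

    With normalize=True the result is the fraction of misclassified
    samples; with normalize=False it is the number of misclassified samples.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("input vectors must not be empty")
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual) if normalize else errors


fraction_wrong = zero_one_loss([1, 2, 3, 4], [2, 2, 3, 4])
count_wrong = zero_one_loss([1, 2, 3, 4], [2, 2, 3, 4], normalize=False)
```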

Add RCI (Relative Classifier Information)

Matrix print bug in long class name

Add CHANGELOG.md

Add support parameter

The support is the number of occurrences of each class in y_true
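
A short sketch of that definition, assuming the true labels are available as a plain sequence; `class_support` is a hypothetical helper name.

```python
from collections import Counter


def class_support(y_true):
    """Return a dict mapping each class to its number of occurrences in y_true."""
    return dict(Counter(y_true))


support = class_support(["a", "b", "a", "c", "a"])
```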

Key error in some parameters

Chi-Squared
Conditional Entropy
Cramer-V
Joint Entropy
Lambda A
Lambda B
Mutual Information
Phi-Squared

Add OSX to travis config

One-vs-All matrix print

Add Block Diagram

Add AUC/AUNU/AUNP

Ballabio, D., Grisoni, F. and Todeschini, R. (2018). Multivariate comparison of classification performance measures. Chemometrics and Intelligent Laboratory Systems, 174, pp.33-44.

Add online help function

Add Modified Confusion Entropy

Add Overall MCC

Gorodkin J (2004) Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry 28: 367–374.

statement of need in README and paper.md

One last issue @sepandhaghighi

You define confusion matrices in your README.md and paper.md but you don't state who the main users of the library will be, i.e., your 'target audience' as the JOSS guidelines state.

Please add some language specifying which users you think will find pycm most useful, e.g. "pycm is the Swiss Army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models".

Make sure to add this to both the README.md and the paper.md.
Thanks! That's the last "major-minor" revision :)

Add dlnd/slnd

Ballabio, D., Grisoni, F. and Todeschini, R. (2018). Multivariate comparison of classification performance measures. Chemometrics and Intelligent Laboratory Systems, 174, pp.33-44.

Names of matrix and normalized_matrix methods are confusing.

I would have expected matrix and normalized_matrix to return the confusion matrix as a numpy array or a pandas DataFrame. Instead, they print a string representation.

The design might be clearer if these functions were renamed to print_matrix / print_normalized_matrix, and matrix / normalized_matrix were added to return a value that a user can work with.
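
One way the proposed split could look, sketched with plain Python lists instead of numpy or pandas so that it runs without extra dependencies. All three function names are hypothetical, and the nested-dict input (`matrix[actual][predicted]`) is an assumption about how the counts are stored.

```python
def matrix_as_rows(matrix, classes=None):
    """Return the counts as a list of rows (actual class x predicted class)."""
    if classes is None:
        classes = sorted(matrix)
    return [[matrix[a].get(p, 0) for p in classes] for a in classes]


def normalized_rows(matrix, classes=None):
    """Return the row-normalized matrix; rows with no samples stay all zero."""
    result = []
    for row in matrix_as_rows(matrix, classes):
        total = sum(row)
        result.append([count / total if total else 0.0 for count in row])
    return result


def print_matrix(matrix, classes=None):
    """Print a tab-separated text view; the data itself comes from matrix_as_rows."""
    if classes is None:
        classes = sorted(matrix)
    print("Predict\t" + "\t".join(str(c) for c in classes))
    for name, row in zip(classes, matrix_as_rows(matrix, classes)):
        print(str(name) + "\t" + "\t".join(str(count) for count in row))


counts = {0: {0: 3, 1: 1}, 1: {0: 2, 1: 2}}
rows = matrix_as_rows(counts)
normalized = normalized_rows(counts)
```

Keeping the printing function as a thin wrapper over the value-returning one means both views always agree.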

Support numpy arrays

NumPy arrays are used almost everywhere, so supporting them directly would remove the need to convert. They are also faster and more memory-efficient than lists, which matters for large inputs. The library currently supports only lists:

pycm/pycm/pycm_obj.py

Line 13 in c04dd6c

if not isinstance(actual_vector,list) or not isinstance(predict_vector,list):
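
A possible relaxation of that check, sketched under the assumption that the rest of the library keeps working on lists. The helper name `to_list` is hypothetical. It relies on duck typing (`numpy.ndarray` exposes a `tolist()` method), so the example runs without NumPy installed; `array.array` from the standard library stands in for an array type here.

```python
from array import array


def to_list(vector, name="vector"):
    """Return `vector` as a list, accepting lists, tuples and array-like objects."""
    if isinstance(vector, list):
        return vector
    if isinstance(vector, tuple):
        return list(vector)
    # numpy.ndarray and array.array both provide tolist().
    tolist = getattr(vector, "tolist", None)
    if callable(tolist):
        return list(tolist())
    raise TypeError(name + " must be a list, tuple or array-like object")


actual_vector = to_list(array("i", [0, 1, 1, 2]), "actual_vector")
predict_vector = to_list((0, 1, 2, 2), "predict_vector")
```

Note that this still copies the data once; it only moves the conversion inside the library rather than avoiding it.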

Modify Document

Add Table Of Contents
Cite Reference

example usages

Hi again @sepandhaghighi, one more minor issue:

the examples in the README are great for demonstrating functionality, but I think examples demonstrating actual use cases would be helpful for people.

Again, there are many metrics included in the package; I would say pick three to five metrics that you think will be used most frequently, and then provide some more real-world examples of using pycm with those metrics.
E.g., using it in conjunction with scikit-learn to compare two or three different classifiers.

This would be a place that a simple visualization would really be helpful and make the concepts more intuitive.
I have often adapted this code from the scikit-learn docs to my own needs when I have to visualize a confusion matrix:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py
If you do not want to add scikit-learn as a dependency, you could add a data sub-package with outputs from some models you have already trained. The same goes for matplotlib or whatever you use to visualize: for now, you could just include the plots as static .png files in your library.

Maybe there was a particular problem you were facing that inspired you to develop the package and you have true data + predictions from those models that you could include as example datasets?

I really think adding these sorts of examples will help you recruit users to whom it's not immediately obvious why they shouldn't just use the basic confusion_matrix functionality in scikit-learn.

Add CBA (Class Balance Accuracy)

document API

Hi @sepandhaghighi I'm reviewing pycm for JOSS.
First, you all have done a great job and the library is well put-together. I'm sorry I haven't started the review sooner. I think the library as-is is very close to "accept". There are just a couple of minor revisions I think might be helpful.

Of those minor revisions, the one I would most like to see is a thorough documentation of the API.

At the bare minimum, the user should be able to type help(ConfusionMatrix) (or ConfusionMatrix? in IPython) and get some sort of useful help. As you probably know, the way to fix this would be to add a docstring to the ConfusionMatrix class. This docstring should state what the acceptable types for y_true and y_pred are: are only lists of ints and np.ndarrays of ints valid? Including briefer versions of the examples from the README.md (without the entire output of the print(cm) statement) would probably also be helpful.

I think it would also be helpful to state in plain English in your README.md which data types are acceptable. A potential user will want to know if pycm can accept their data in its present state or if they're going to have to convert it somehow to the right format.
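
To make the request concrete, here is a sketch of what such a docstring could look like. The class below is a simplified stand-in written for this example, not pycm's actual implementation: the parameter names follow the `actual_vector` / `predict_vector` check quoted in the numpy issue, and the `table` and `classes` attributes are assumptions.

```python
class ConfusionMatrix:
    """Confusion matrix built from two equal-length label vectors.

    Parameters
    ----------
    actual_vector : list
        True class labels, one entry per sample.
    predict_vector : list
        Predicted class labels, same length as actual_vector.

    Examples
    --------
    >>> cm = ConfusionMatrix([0, 1, 1], [0, 1, 0])
    >>> cm.table[1][0]
    1
    """

    def __init__(self, actual_vector, predict_vector):
        if not isinstance(actual_vector, list) or not isinstance(predict_vector, list):
            raise TypeError("actual_vector and predict_vector must be lists")
        if len(actual_vector) != len(predict_vector):
            raise ValueError("input vectors must have the same length")
        self.classes = sorted(set(actual_vector) | set(predict_vector))
        # table[actual][predicted] -> number of samples
        self.table = {a: {p: 0 for p in self.classes} for a in self.classes}
        for a, p in zip(actual_vector, predict_vector):
            self.table[a][p] += 1


cm = ConfusionMatrix([0, 1, 1], [0, 1, 0])
```

With a docstring in place, `help(ConfusionMatrix)` shows the accepted types and a short example without the user having to open the README.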

I will need to finish this review tomorrow but I wanted to get it started for you as soon as possible. I hope my review will be helpful and we can quickly get your library into JOSS so more people can appreciate it.

In a somewhat meta fashion, Fidgit is publishing itself to figshare with DOI 'https://doi.org/10.6084/m9.figshare.828487' [@figshare_archive].

As far as I can tell, your library DOI is not in your paper.md currently.
Please add that.


pycm's Issues
