
dtreeviz's People

Contributors

4skynet, alex-amc, ashafix, borjagarciag, demetrio92, djcordhose, f-rosato, h4dr1en, h4nyu, jlubbersgeo, kmori1229, mepland, oegedijk, parrt, praneet460, qrsforever, simonturintech, stepanworkv, thomsentner, tirkarthi, tlapusan, windisch

dtreeviz's Issues

viz.view()

I am running this in JupyterLab, but nothing is shown. I have installed all the dependencies. Thanks for your help!
[Screenshot]

No title option for plot

Is it possible to add a title to the dtreeviz plot? Something like plt.title("This is a plot").
I have searched the documentation on GitHub but didn't find any parameter for this.

outlines in leaf bar charts

Hi @tlapusan, the leaf graphs look closer to the other style now, but I'd prefer that you remove the background grid, since it differs from the rest. If you still want it, you could make it an option controlled by a parameter.

Also, note that I have an outline around the bars, as it looks a lot better:

[Screenshot]

See those grey outlines? I think the code that does that is the following (e.g., in class_split_viz()):

        hist, bins, barcontainers = ax.hist(X_hist,
                                            color=X_colors,
                                            align='mid',
                                            histtype=histtype,
                                            bins=bins,
                                            label=class_names)
        # Alter appearance of each bar
        for patch in barcontainers:
            for rect in patch.patches:
                rect.set_linewidth(.5)
                rect.set_edgecolor(colors['rect_edge'])
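A minimal, self-contained sketch of the same outlining idea outside dtreeviz, assuming only NumPy and Matplotlib. `outlined_class_hist` and its `edgecolor`, `linewidth` and `show_grid` parameters are hypothetical names (the snippet above takes the edge color from `colors['rect_edge']`); `show_grid` just illustrates the suggestion to make the background grid optional.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def outlined_class_hist(ax, X_hist, colors, class_names, bins=10,
                        edgecolor="grey", linewidth=0.5, show_grid=False):
    """Draw one histogram per class and give every bar a thin outline.

    X_hist must be a list with one array per class (at least two), so that
    ax.hist returns a list of BarContainer objects, one per class.
    """
    _, _, barcontainers = ax.hist(X_hist, color=colors, align="mid",
                                  histtype="bar", bins=bins, label=class_names)
    for container in barcontainers:      # one BarContainer per class
        for rect in container.patches:   # one Rectangle per bin
            rect.set_linewidth(linewidth)
            rect.set_edgecolor(edgecolor)
    ax.grid(show_grid)                   # background grid off unless requested
    return barcontainers


rng = np.random.default_rng(0)
fig, ax = plt.subplots()
containers = outlined_class_hist(
    ax,
    [rng.normal(0.0, 1.0, 50), rng.normal(2.0, 1.0, 50)],
    colors=["tab:blue", "tab:orange"],
    class_names=["class 0", "class 1"],
)
```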

consider limiting the scope of the library

Hey! I came here from the http://explained.ai/decision-tree-viz/index.html article (which is a great write-up 👍). One thing bothered me, so here is an unsolicited suggestion.

In the README you have:

This is the start of a python machine learning library to augment scikit-learn. At the moment, all we have is functionality for decision tree visualization and model interpretation.

From a user's perspective: why not have a package just for tree visualization and interpretation, and have the planned "python machine learning library to augment scikit-learn" depend on a tree visualization & inspection package, if needed?

The task of inspecting trees looks quite specific, do you really need to add solutions for unrelated problems (some ML algorithms?) to the same package? Obviously, I don't know what you've planned for the whole library, so please excuse me if that doesn't make any sense :)

A few use cases, for having a dedicated tree viz library:

  • tree visualization has a few dependencies, which other code may not need; it can be the other way around as well - people who want tree visualization may have to install unrelated dependencies. It is solvable, but needs care.

  • https://github.com/TeamHG-Memex/eli5 is not developed actively right now, but we'd absolutely consider making your tree visualization a default, which implies having it as a dependency (probably an optional one); depending on a general-purpose ML library just for its visualization features is less nice than depending on a visualization/inspection library.

  • by having separate packages you may get different release schedules, different contributors, etc. In a large package some code usually gets outdated and deprecated over time. If deprecated code is a separate package, one can just leave it as-is - no need to remove it from an all-in-one library, and no need to maintain it if there is no motivation.

Plot in a for loop not shown in iPython notebook?

How can you plot a decision tree several times within a for loop such as

for i in range(2):
    viz = dtreeviz(clf,
                   iris['data'],
                   iris['target'],
                   target_name='',
                   feature_names=np.array(iris['feature_names']),
                   scale=2,
                   class_names={0: 'setosa', 1: 'versicolor', 2: 'virginica'})
    viz

The Jupyter notebook doesn't show any output. Is there some way to show the plot inline several times, or is this a limitation of your package?
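Jupyter only renders the value of the last expression in a cell automatically; a bare `viz` inside a loop body is evaluated and then discarded. Passing each object to `IPython.display.display()` renders it explicitly. The sketch assumes the object exposes an IPython rich-display hook such as `_repr_svg_`; whether a given dtreeviz version's result object does is an assumption here, so `SvgStandIn` is a hypothetical placeholder that keeps the example self-contained.

```python
from IPython.display import display


class SvgStandIn:
    """Hypothetical stand-in for a dtreeviz result object."""

    def __init__(self, label):
        self.label = label

    def _repr_svg_(self):
        # IPython calls this hook to render the object as inline SVG
        return ('<svg xmlns="http://www.w3.org/2000/svg" width="160" height="24">'
                f'<text x="4" y="16">{self.label}</text></svg>')

    def __repr__(self):
        return f"SvgStandIn({self.label!r})"


shown = []
for i in range(2):
    viz = SvgStandIn(f"tree {i}")
    display(viz)        # rendered on every iteration, not just the last one
    shown.append(viz)
```

In a notebook, the `SvgStandIn(...)` call would be replaced by the `dtreeviz(...)` call from the question.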

Issue with import

I do:
from dtreeviz.trees import *
and get:
NameError: name 'rtreeviz_univar' is not defined
when executing:

t = rtreeviz_univar(ax,
                    X_train.WGT, y_train,
                    max_depth=2,
                    feature_name='Vehicle Weight',
                    target_name='MPG',
                    fontsize=14)

If I do:

from dtreeviz import trees
'dtreeviz' in dir(trees)

it returns True, but

'rtreeviz_univar' in dir(trees)

returns False.
Strange.
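One plausible explanation is that the installed dtreeviz release predates `rtreeviz_univar`. Since `dir(trees)` also lacks the name, the problem is not just `import *` filtering through `__all__`; the function is absent from the installed module. A small stdlib-only helper (hypothetical, not part of dtreeviz; needs Python 3.8+ for `importlib.metadata`) reports whether a name exists and which distribution version is installed:

```python
import importlib
from importlib import metadata


def report_symbol(module_name, symbol, dist_name=None):
    """Return (has_symbol, installed_version) for a module and attribute name.

    dist_name defaults to the top-level package name; installed_version is
    None when no installed distribution with that name is found.
    """
    module = importlib.import_module(module_name)
    dist = dist_name or module_name.split(".")[0]
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        version = None
    return hasattr(module, symbol), version


# e.g. report_symbol("dtreeviz.trees", "rtreeviz_univar")
```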

show_just_path_prediction

I started to work on showing just the prediction path (based on what we discussed in a previous issue).

Right now, it looks like this:
[Screenshot]

Am I on the right track? :) Do you think we also need to show the neighboring nodes? I don't have a strong argument for them, but I like how they look (they somehow show the 'opposite' prediction path).
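For reference, the nodes on one sample's path, plus the neighboring child that was not taken at each split, can be recovered from a fitted scikit-learn tree with `decision_path()` and the `tree_.children_left` / `tree_.children_right` arrays. `prediction_path` is a hypothetical helper sketched here, not the implementation in the screenshot.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def prediction_path(clf, x):
    """Return (path_node_ids, sibling_node_ids) for a single sample x."""
    x = np.asarray(x).reshape(1, -1)
    indicator = clf.decision_path(x)  # sparse CSR matrix, one row per sample
    path = indicator.indices[indicator.indptr[0]:indicator.indptr[1]].tolist()
    left = clf.tree_.children_left
    right = clf.tree_.children_right
    siblings = []
    for parent, child in zip(path, path[1:]):
        # The child we did NOT descend into is the "opposite" node at this split
        other = right[parent] if child == left[parent] else left[parent]
        siblings.append(int(other))
    return path, siblings


iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
path, siblings = prediction_path(clf, iris.data[0])
```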

label on leaf node

Would it be possible to put the name of the predicted class on a leaf node instead of just the color?
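In scikit-learn, the class a leaf predicts is the argmax of that node's row in `tree_.value`, mapped through `classes_`, so a text label could be derived from it. A minimal sketch with a hypothetical helper `leaf_class_labels`, assuming integer targets 0..k-1 and a caller-supplied `class_names` list:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def leaf_class_labels(clf, class_names):
    """Map each leaf node id to the name of the class it predicts."""
    tree = clf.tree_
    labels = {}
    for node_id in range(tree.node_count):
        if tree.children_left[node_id] == -1:                # -1 marks a leaf
            class_index = int(np.argmax(tree.value[node_id, 0]))
            # Assumes classes_ holds integer labels usable as list indices
            labels[node_id] = class_names[int(clf.classes_[class_index])]
    return labels


iris = load_iris()
names = ["setosa", "versicolor", "virginica"]
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
labels = leaf_class_labels(clf, names)
```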

white space in feature names doesn't work

See the screenshots below for the code and the visualization result.

I think the issue may be that my feature names contain spaces. Could this be causing the problem?

I'm running this on macOS Mojave, JupyterLab, Anaconda 5.2.

An example feature name is '12M Realized Volatility_Amean'.
[Screenshot: delimiter error]

[Screenshot: viz result]
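Until the root cause is confirmed, one possible workaround, assuming the spaces (or other punctuation) are what trips up the rendering, is to pass sanitized copies of the names. `safe_feature_names` is a hypothetical helper that turns each run of characters outside `[A-Za-z0-9_]` into a single underscore.

```python
import re


def safe_feature_names(names):
    """Replace each run of characters other than letters, digits and '_' with '_'."""
    return [re.sub(r"[^A-Za-z0-9_]+", "_", str(name)) for name in names]


cleaned = safe_feature_names(['12M Realized Volatility_Amean'])
```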

Bars at data limits not plotted

Hi

I have an issue where the bars at the limits of the data range are not being plotted; this is particularly problematic when the feature cardinality is 2. It only happens for particular input values, but I have not yet been able to find what distinguishes the values that are plotted correctly from those that are not.

A reproducible example:

Correct result

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from dtreeviz.trees import dtreeviz

x = np.random.choice([0,1], size=(100,2))
y = np.random.choice([0,1], size=100)

dtreeviz(
    tree_model=DecisionTreeClassifier(max_depth=1).fit(x, y),
    X_train=x,
    y_train=y,
    feature_names=['a','b'],
    target_name='y',
    class_names=[1,0]
)

[Image: dtreeviz_correct]

Incorrect result

x = np.random.choice([-1,1], size=(100,2))
y = np.random.choice([0,1], size=100)

dtreeviz(
    tree_model=DecisionTreeClassifier(max_depth=1).fit(x, y),
    X_train=x,
    y_train=y,
    feature_names=['a','b'],
    target_name='y',
    class_names=[1,0]
)

[Image: dtreeviz_incorrect]

Thank you for the library!
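I don't know how dtreeviz chooses its bin edges internally, so this is only a hedged illustration of one property worth checking: NumPy's `np.histogram` treats every bin as half-open except the last, which also includes its right edge, so edges built with `np.linspace` over `[min, max]` keep both extreme values. The data mirrors the failing example (values in {-1, 1}).

```python
import numpy as np

x = np.random.default_rng(0).choice([-1, 1], size=100)

# Edges spanning exactly [min, max]; the last bin includes its right edge.
edges = np.linspace(x.min(), x.max(), num=11)
counts, _ = np.histogram(x, bins=edges)
```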

viz for leaf samples distribution

I would like to propose a new visualization type for leaf samples: a histogram.
It is useful when you want to see the general distribution of leaf samples at a glance, instead of individual leaf samples.
It is especially useful for a big tree with many leaves, where the bar plot visualization is hard to interpret.
In terms of implementation, this would add another value ('hist') to the display_type parameter of the viz_leaf_samples function.
The new visualization would look like this:

[Image: hist]

  • when we want to include only a subset of leaf samples (e.g., to exclude outlier leaves, like the leaf with ±350 samples in the visualization above):

[Image: hist_filter]

@parrt what do you think about it?
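The proposed 'hist' display could be prototyped by counting training samples per leaf with scikit-learn's `apply()` and plotting those counts, with an optional range to drop outlier leaves. `leaf_sample_counts`, `plot_leaf_size_hist` and its `count_range` argument are hypothetical names, not the eventual `viz_leaf_samples` API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor


def leaf_sample_counts(model, X):
    """Number of rows of X that land in each leaf, keyed by leaf node id."""
    ids, counts = np.unique(model.apply(X), return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))


def plot_leaf_size_hist(ax, counts, bins=20, count_range=None):
    """Histogram of leaf sizes; count_range=(lo, hi) keeps only leaves within it."""
    values = np.array(list(counts.values()))
    if count_range is not None:
        lo, hi = count_range
        values = values[(values >= lo) & (values <= hi)]
    ax.hist(values, bins=bins)
    ax.set_xlabel("samples per leaf")
    ax.set_ylabel("number of leaves")
    return values


X, y = load_diabetes(return_X_y=True)
model = DecisionTreeRegressor(min_samples_leaf=5, random_state=0).fit(X, y)
counts = leaf_sample_counts(model, X)
fig, (ax_all, ax_filtered) = plt.subplots(1, 2)
plot_leaf_size_hist(ax_all, counts)
kept = plot_leaf_size_hist(ax_filtered, counts, count_range=(0, 30))
```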

TypeError: can't multiply sequence by non-int of type 'float'

Thanks for the code and tutorial. I found it very useful.
I am getting a TypeError with my dataset. I set up this simple dataset and still get the same error:

TypeError: can't multiply sequence by non-int of type 'float'

It may be caused by feature_names, because in your examples you always used datasets that already have feature_names set up. Can you help me with that?

import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from dtreeviz.trees import dtreeviz

Things = {'Feature01': [3, 4, 5, 0],
          'Feature02': [4, 5, 6, 0],
          'Feature03': [1, 2, 3, 8],
          'Target01': ['5', '5', '6', '6']}
df = pd.DataFrame(Things,
                  columns=['Feature01', 'Feature02',
                           'Feature03', 'Target01'])
y_variable = df[['Target01']]
X_variable = df[['Feature01', 'Feature02', 'Feature03']]

regressor = DecisionTreeRegressor(max_depth=2)
regressor.fit(X_variable, y_variable)

dtreeviz(regressor,
         X_variable,
         y_variable,
         target_name='Target01',
         feature_names=['Feature01', 'Feature02', 'Feature03'])  # or df.columns[0:3]
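The error text suggests that a string is being multiplied by a float somewhere, which points at the target rather than `feature_names`: `Target01` holds the strings '5' and '6', and `df[['Target01']]` is a one-column DataFrame rather than a 1-D Series. scikit-learn converts the target to floats internally when fitting a regression tree, so the fit succeeds, but that conversion isn't guaranteed to reach the raw `y_variable` handed to dtreeviz. That diagnosis is an assumption; a sketch of preparing a numeric, 1-D target before fitting and visualizing:

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

things = {'Feature01': [3, 4, 5, 0],
          'Feature02': [4, 5, 6, 0],
          'Feature03': [1, 2, 3, 8],
          'Target01': ['5', '5', '6', '6']}
df = pd.DataFrame(things)

feature_names = ['Feature01', 'Feature02', 'Feature03']
X = df[feature_names]
# Single brackets give a 1-D Series; pd.to_numeric turns '5' into 5.
y = pd.to_numeric(df['Target01'])

regressor = DecisionTreeRegressor(max_depth=2).fit(X, y)
# then, e.g.: dtreeviz(regressor, X, y, target_name='Target01', feature_names=feature_names)
```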

IndexError: index 0 is out of bounds for axis 0 with size 0

I trained a scikit-learn CART model (using scikit-learn's default parameters) on the attached data (data.txt, where the "Target" column contains the labels to predict and the remaining columns contain the feature values).

For model fitting, I used:
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight
clf = DecisionTreeClassifier(random_state=374564)
clf.fit(data.drop(['Target'], axis=1).values,
        data['Target'].values,
        sample_weight=compute_sample_weight("balanced", data['Target'].values))

I launched dtreeviz with the following command:
viz = dtreeviz(clf,
               data.drop(['Target'], axis=1).values,
               data['Target'].values,
               target_name='pred',
               feature_names=feature_names,
               class_names=["Class1", "Class2"])

Then I got the following error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bomane/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py", line 373, in dtreeviz
    filename=f"{tmp}/leaf{node.id}_{getpid()}.svg")
  File "/home/bomane/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py", line 543, in class_leaf_viz
    draw_piechart(node.class_counts(), size=size, colors=colors, filename=filename, label=f"n={node.nsamples()}")
  File "/home/bomane/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py", line 728, in draw_piechart
    i = np.nonzero(counts)[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0

Can you please tell me what the problem could be here?
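The last frame takes the first nonzero class count with `np.nonzero(counts)[0][0]`, which can only fail when every entry of `counts` is zero for that leaf. Why the counts come out all-zero here (an interaction with `sample_weight` is one candidate) is an open question that the traceback does not settle; the snippet just reproduces the failure mode and shows a guarded variant. `first_nonzero_class` is a hypothetical helper.

```python
import numpy as np


def first_nonzero_class(counts):
    """Index of the first class with a nonzero count, or None if all are zero."""
    nonzero = np.flatnonzero(counts)
    return int(nonzero[0]) if nonzero.size else None


try:
    np.nonzero(np.array([0, 0]))[0][0]   # same expression as in draw_piechart
    failed = False
except IndexError:
    failed = True
```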

OSX: Display error in JupyterLab (Warning: No loadimage plugin for "svg:cairo")

Platform: Mac OS Mojave
Using Anaconda
Python version: 3.7.1
Installed graphviz with brew install graphviz --with-librsvg --with-pango

If a .dot file contains digraph G { A -> B }, dot -Tsvg:cairo test.dot > test.svg produces the plot successfully.

In JupyterLab, it displays this:
[Screenshot]

When installing graphviz, brew tells me that icu4c is keg-only, which means it was not symlinked into /usr/local; however, dot works, and the terminal tells me graphviz 2.40.1 is already installed and up-to-date.

compare to shap?

I'm new to this project but have previously used SHAP (https://github.com/slundberg/shap), so any documentation comparing/contrasting this toolset with SHAP would be interesting.

My quick take is that dtreeviz is primarily about visualization, whereas SHAP is primarily about numerical estimates of feature importance (plus visualizing these estimates). Is there any overlap? To the extent that they don't overlap, a demo showing how to use dtreeviz and SHAP in complementary ways to explain a model more fully could be useful.

Warning: No loadimage plugin for "svg:cairo"

Hello there,

I am trying to plot the iris dataset on Ubuntu 18.04 (Bionic) with the graphviz package installed. However, I am getting svg:cairo warnings, and the figure contains only arrows and empty boxes. These are the warnings I am getting:

Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"

The figure looks like this:
[Screenshot]

Getting "TypeError: 'module' object is not callable" while calling dtreeviz method

While running this code:

viz = dtreeviz(clf,
                   X_train,
                   y_train,
                   target_name='target_name',
                   feature_names=feature_names,
                   orientation=orientation,
                   class_names=["setosa",
                                "versicolor",
                                "virginica"],  # 0,1,2 targets
                   fancy=fancy,
                   X=X,
                   label_fontsize=label_fontsize,
                   ticks_fontsize=ticks_fontsize,
                   fontname=fontname)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-23-95e7c1fb5cfe> in <module>()
      3 #regr = regr.fit(X_train, y_train)
      4 
----> 5 viz = viz_tree(clf,X_train,y_train)
      6 viz.save("GermanCredit.svg") # suffix determines the generated image format
      7 viz.view()             # pop up window to display image

<ipython-input-22-c33325585be5> in viz_tree(clf, X_train, y_train, orientation, fancy, pickX, label_fontsize, ticks_fontsize, fontname)
     29                    label_fontsize=label_fontsize,
     30                    ticks_fontsize=ticks_fontsize,
---> 31                    fontname=fontname)
     32 
     33     return viz

TypeError: 'module' object is not callable
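This error usually means the name being called is bound to a module rather than a function, for example after `import dtreeviz` followed by `dtreeviz(...)`. The standard-library `json` module reproduces the same pattern without needing dtreeviz installed:

```python
import json  # binds the name `json` to the module, not to a function

try:
    json({"a": 1})  # calling a module object raises TypeError
except TypeError as exc:
    message = str(exc)

from json import dumps  # bind the function itself

text = dumps({"a": 1})
```

With the package layout shown in the tracebacks in this thread (`dtreeviz/trees.py`), the analogous import is `from dtreeviz.trees import dtreeviz`.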

PMML Support

Does this library have support for loading trees saved in PMML files? If not, would there be any interest in supporting it?

Tools like IBM's SPSS can export their decision trees as PMML XML. If this library supported it, enterprise customers might no longer need to purchase SPSS just to view their trees. It would also make it easier to distribute a tree once it has been created.

Example leaf size not proportional to number of samples

OS: Windows 7 Enterprise
Python: 3.6.5
Running on Jupyter Notebook

I am able to recreate the tree visualization for the iris data set as given in README.md

[Image: iris]

However, when I run it on my own dataset, I find that one of the leaves ("Fault B") does not seem to be sized in proportion to its number of samples (35), as the others are.

[Image: mydata]

This is an unwieldy dataset. I tried to reproduce the issue after applying PCA to reduce its dimensionality, but the resulting decision tree looks fine (i.e., the leaves appear appropriately sized).

[Image: afterpca]

Very curious!

Unable to get dot installed properly on Mac Mojave (10.14.4)

Hi,

I'm following the instructions on your GitHub page to get dtreeviz working for a school project, but dot doesn't seem to be updated properly. I've run your instructions multiple times. I also found this thread and tried everything in it:

#33

I've followed all the instructions to install on github page:

(including uninstalling python-graphviz and graphviz and brew uninstall packages and downloading their package and doing a make install)

https://github.com/parrt/dtreeviz

I'm on a MacBook Pro 13" 2018 Mojave (10.14.4)

I was wondering if you might be able to help.

brew upgrade pango librsvg
Error: pango 1.42.4_1 already installed
Error: librsvg 2.44.13 already installed

My system is still finding dot at /usr/local/bin/dot.

There is also no ../Cellar/graphviz/2.40.1/bin/dot directory.

I also saw a bunch of warnings when I ran make, but didn't find any errors:

✔ /tmp/graphviz-2.40.1
20:34 $ which dot
/usr/local/bin/dot
✔ /tmp/graphviz-2.40.1
20:34 $ dot -Tsvg:cairo
Format: "svg:cairo" not recognized. Use one of: svg:svg:core
✘-1 /tmp/graphviz-2.40.1
20:35 $ lw -l
-bash: lw: command not found
✘-127 /tmp/graphviz-2.40.1
20:35 $ ls -l ../Cellar/graphviz/2.40.1/bin/dot
ls: ../Cellar/graphviz/2.40.1/bin/dot: No such file or directory
✘-1 /tmp/graphviz-2.40.1

here is a history of my commands for the install:

 589  brew uninstall graphviz
 590  brew reinstall pango librsvg --build-from-source # even if already there, please reinstall
 591  brew reinstall cairo --build-from-source
 592  brew install graphviz --build-from-source
 593  brew info graphviz
 594  #./configure --includedir=/usr/local/include/graphviz --with-pangocairo=yes --with-rsvg=yes
 595  history
 596  make uninstall
 597  ./configure --includedir=/usr/local/include/graphviz --with-pangocairo=yes --with-rsvg=yes
 598  rm -rf /usr/local/lib/graphviz
 599  ./configure --includedir=/usr/local/include/graphviz --with-pangocairo=yes --with-rsvg=yes | tee make.log 2>&1
 600  #make -j 8 | tee
 601  mv make.log configure.log
 602  make -j 8 | tee make.log 2>&1
 603  make install | tee install.log 2>&1
 604  history
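The terminal output above (`Format: "svg:cairo" not recognized. Use one of: svg:svg:core`) shows that this dot build lists only the core SVG renderer, so the cairo-based one is missing. A hedged, stdlib-only sketch for checking this from Python: `parse_formats` and `dot_formats` are hypothetical helpers, and triggering the format list with a deliberately invalid `-T` value relies on the behavior visible in that output.

```python
import shutil
import subprocess


def parse_formats(message):
    """Extract format names from Graphviz's 'Use one of: ...' error text."""
    marker = "Use one of:"
    if marker not in message:
        return []
    return message.split(marker, 1)[1].split()


def dot_formats(dot="dot"):
    """Return the output formats the local dot reports, or None if dot isn't on PATH."""
    if shutil.which(dot) is None:
        return None
    # An invalid -T value makes dot print the list of valid formats to stderr.
    result = subprocess.run([dot, "-Tno_such_format"], input="",
                            capture_output=True, text=True)
    return parse_formats(result.stderr + result.stdout)


formats = dot_formats()
has_svg_cairo = formats is not None and any(
    f == "svg:cairo" or f.startswith("svg:cairo:") for f in formats)
```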

set the range of histogram

Hi! This is a very cool project and very useful to me.
However, I have an issue with the style of the histograms.

If the data features have a wide value range, the plotted histogram becomes unbalanced, like this:
[Screenshot]

I think being able to specify the range of the histogram would help solve this issue.
Do you have any thoughts on this?

I will look at dtreeviz/trees.py, but I'm not sure whether I can implement this myself.
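NumPy's `np.histogram` (like Matplotlib's `hist`) already accepts a `range=(lo, hi)` argument, so one option is to let users pass a range, or default to a percentile window, so that a few extreme values don't squash everything else into one bar. `clipped_hist` and the 1st to 99th percentile default are hypothetical choices for this sketch, not how dtreeviz picks its bins.

```python
import numpy as np


def clipped_hist(values, bins=10, value_range=None, percentiles=(1, 99)):
    """Histogram over value_range, defaulting to a percentile window of the data.

    Values outside the range are not counted.
    """
    values = np.asarray(values, dtype=float)
    if value_range is None:
        value_range = tuple(np.percentile(values, percentiles))
    return np.histogram(values, bins=bins, range=value_range)


rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 1000), [1e4]])  # one extreme outlier
counts, edges = clipped_hist(data)
fixed_counts, fixed_edges = clipped_hist(data, value_range=(-5, 5))
```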

MacOS: Error: invalid option: --with-librsvg

First off, thank you for creating this library; it looks great to use.
But sadly, I can't seem to install graphviz. I followed the instructions from issue #23, but I am confused about what I am supposed to do:
1.) I have xcode-select installed
2.) I ran "sudo xcodebuild -license" from the command line (I don't understand why; I guess it just confirms that I have it.)
3.) I ran "brew uninstall graphviz"
4.) Then, finally, I ran "brew install graphviz --with-librsvg --with-pango" from the command line

Then I get the following error:
Error: invalid option: --with-librsvg
[Screenshot]

Thank you in advance for any help.

dtreeviz fails with IndexError for n_classes >= 11

Apparently, the dtreeviz function in dtreeviz/trees.py fails in the following block of code:

> /home/macermak/.local/lib/python3.6/site-packages/dtreeviz/trees.py(320)dtreeviz()
    318 
    319     n_classes = shadow_tree.nclasses()
--> 320     color_values = color_blind_friendly_colors[n_classes]
    321 
    322     # Fix the mapping from target value to color for entire tree

resulting in:

IndexError: list index out of range

My model has 15 classes and I believe the length of color_blind_friendly_colors (which is 11) is causing the issue.
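Assuming, as the failing line suggests, that `color_blind_friendly_colors[k]` holds the palette for `k` classes, one hedged fix is to fall back to generated colors when `k` is past the end of that list. `class_colors` is a hypothetical helper; evenly spaced HSV hues keep the colors distinct, but they are not guaranteed to be colorblind-friendly.

```python
import colorsys


def class_colors(n_classes, palettes):
    """Use palettes[n_classes] when available, else generate n distinct hex colors."""
    if 0 <= n_classes < len(palettes) and palettes[n_classes]:
        return list(palettes[n_classes])
    colors = []
    for i in range(n_classes):
        # Evenly spaced hues with fixed saturation and value
        r, g, b = colorsys.hsv_to_rgb(i / n_classes, 0.65, 0.9)
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


palettes = [None, None, ["#aaaaaa", "#bbbbbb"]]  # toy stand-in for the real palette list
fifteen = class_colors(15, palettes)
```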

problems of empty data in the output fig

Your visualization and interpretation project is very interesting and meaningful. I had a problem when I exported the figure: the data and diagrams in the output figure were empty, with only the framework shown. I tried to find the cause but failed. Has anyone else had this problem? Could you please help me solve it? Thank you!

Windows 10 Installation Bug - Graphviz path wants str or os.PathLike object not WindowsPath

To install I ran:
pip install dtreeviz in the Anaconda Prompt (not as admin).
Then I downloaded and installed graphviz-2.38.msi from here: https://graphviz.gitlab.io/_pages/Download/Download_windows.html

When running:
(base) C:\Users\Will>where dot
I get:
C:\Program Files (x86)\Graphviz2.38\bin\dot.exe

When running:
(base) C:\Users\Will>dot -V
I get:
dot - graphviz version 2.38.0 (20140413.2041)

When running:

import os
import subprocess
proc = subprocess.Popen(['dot','-V'])
print( os.getenv('Path') )

The output contains the path:
C:\Program Files (x86)\Graphviz2.38\bin\

When running:

import graphviz.backend as be
cmd = ["dot", "-V"]
stdout, stderr = be.run(cmd, capture_output=True, check=True, quiet=False)
print( stderr )

Output:

dot - graphviz version 2.38.0 (20140413.2041)
b'dot - graphviz version 2.38.0 (20140413.2041)\r\n'

The problem is I get an error when running the first boston regression example:

from sklearn.datasets import *
from sklearn import tree
from dtreeviz.trees import *
regr = tree.DecisionTreeRegressor(max_depth=2)
boston = load_boston()
regr.fit(boston.data, boston.target)

viz = dtreeviz(regr,
               boston.data,
               boston.target,
               target_name='price',
               feature_names=boston.feature_names)
              
viz.view()  

Throws this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-216c2df6378b> in <module>()
      9                feature_names=boston.feature_names)
     10 
---> 11 viz.view()

C:\Users\Will\Anaconda3\Lib\site-packages\dtreeviz\trees.py in view(self)
     64         tmp = tempfile.gettempdir()
     65         svgfilename = f"{tmp}/DTreeViz_{getpid()}.svg"
---> 66         self.save(svgfilename)
     67         view(svgfilename)
     68 

C:\Users\Will\Anaconda3\Lib\site-packages\dtreeviz\trees.py in save(self, filename)
     78 
     79         g = graphviz.Source(self.dot, format='svg')
---> 80         dotfilename = g.save(directory=path.parent, filename=path.stem)
     81 
     82         if PLATFORM=='darwin':

C:\Users\Will\Anaconda3\Lib\site-packages\graphviz\files.py in save(self, filename, directory)
    145             self.directory = directory
    146 
--> 147         filepath = self.filepath
    148         tools.mkdirs(filepath)
    149 

C:\Users\Will\Anaconda3\Lib\site-packages\graphviz\files.py in filepath(self)
    129     @property
    130     def filepath(self):
--> 131         return os.path.join(self.directory, self.filename)
    132 
    133     def save(self, filename=None, directory=None):

C:\Users\Will\Anaconda3\lib\ntpath.py in join(path, *paths)
     74 # Join two (or more) paths.
     75 def join(path, *paths):
---> 76     path = os.fspath(path)
     77     if isinstance(path, bytes):
     78         sep = b'\\'

TypeError: expected str, bytes or os.PathLike object, not WindowsPath

It looks like it still objects to the path being passed in as a WindowsPath rather than a string?
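A workaround that sidesteps the question is to hand graphviz plain strings, i.e. change the `save()` line shown in the traceback, `g.save(directory=path.parent, filename=path.stem)`, to pass `str(path.parent)`. `save_args` below is a hypothetical helper illustrating that conversion; it is not part of dtreeviz or graphviz.

```python
import os
from pathlib import Path


def save_args(filename):
    """Split an output filename into (directory, stem) as plain strings.

    graphviz's Source.save() joins these with os.path.join, so passing str
    instead of Path objects avoids any Path/os.fspath incompatibility.
    """
    path = Path(filename)
    return str(path.parent), path.stem


directory, stem = save_args(os.path.join("out", "DTreeViz_1234.svg"))
joined = os.path.join(directory, stem)
```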

Warning: No loadimage plugin for "svg:cairo" in Mac

I have been trying to use dtreeviz to visualise iris data using my Mac.

from sklearn.datasets import *
from sklearn import tree
from dtreeviz.trees import *

iris = load_iris()  # loaded in an earlier part of my code

classifier = tree.DecisionTreeClassifier(max_depth=2)  # limit depth of tree
classifier.fit(iris.data, iris.target)

viz = dtreeviz(classifier,
               iris.data,
               iris.target,
               target_name='variety',
               feature_names=iris.feature_names,
               class_names=["setosa", "versicolor", "virginica"]  # need class_names for classifier
               )
viz.view()

Despite installing graphviz with:

brew install graphviz --with-librsvg --with-app --with-pango

I get the following output:

[Screenshot]

The solutions proposed in #4 did not work for me either. After downloading the zipped file from that thread and running dot -Tsvg t.dot > t.svg, I could view the combined file. However, this did not work for me after deleting the contents of the folder III from the current directory.

Also, I noticed that only some of the nodes' individual plots get stored in "/private/var/folders/gs/r3f6yn8n4zj570qsj4s9rvnm0000gp/T/DTreeViz_19418". These included leaf1, leaf3, leaf4, node0, node2, legend, and two other files named DTreeViz.

Large svg files being generated for datasets having more than a million records

I am trying to visualize a decision tree from one of the random forest regression models I have built, and I have noticed that the SVG file is around 1.3 GB. I have reduced the depth of the model and switched to the non-fancy visualization (fancy=False), but it still generates an SVG file of more than 250 MB, so rendering fails. These huge file sizes are caused by the histograms generated at the leaf nodes. Is there any way to get a simple output with the following fields:

  1. Number of records
  2. Prediction

I have gone through the code and noticed that I need to change the function regr_leaf_node in order to suppress the histogram. Please let me know what would be the next steps.

Thanks,
Vinay
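Until a non-histogram leaf style exists, the two requested fields can be read straight from a fitted scikit-learn tree: `tree_.n_node_samples` holds the training rows per node and `tree_.value` the prediction. `leaf_summary` is a hypothetical helper. For a random forest you would pass one of `rf.estimators_`; with bootstrapping enabled, each estimator's counts reflect its own in-bag rows rather than the full training set.

```python
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor


def leaf_summary(tree_model):
    """List (leaf_id, n_samples, prediction) for every leaf of a fitted regressor."""
    t = tree_model.tree_
    rows = []
    for node_id in range(t.node_count):
        if t.children_left[node_id] == -1:  # -1 marks a leaf
            rows.append((node_id, int(t.n_node_samples[node_id]),
                         float(t.value[node_id, 0, 0])))
    return rows


X, y = load_diabetes(return_X_y=True)
regr = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
summary = leaf_summary(regr)
for leaf_id, n, pred in summary:
    print(f"leaf {leaf_id}: n={n}, prediction={pred:.1f}")
```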

make size of plots configurable

It would be great to have plot sizes configurable on all platforms, even for SVGs. Some of the plots come out quite tiny and hard to read by default.

list index out of range for regressor with y in {0,1}

Hi, thank you for your library; I think it's very useful for showing results, and I especially like rtreeviz_bivar_3D. I was trying to run code with 2 variables and 1 binary output, but I get this error. I'm not sure whether it is because of the data type:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-217-d956e21d7290> in <module>
     47                       dist=12,
     48 
---> 49                       show={'splits','title'})
     50 
     51 

~/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py in rtreeviz_bivar_3D(ax, X_train, y_train, max_depth, feature_names, target_name, fontsize, ticks_fontsize, fontname, azim, elev, dist, show, colors, n_colors_in_map)
    234 
    235     rt = tree.DecisionTreeRegressor(max_depth=max_depth)
--> 236     rt.fit(X_train, y_train)
    237 
    238     y_lim = np.min(y_train), np.max(y_train)

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
   1155             sample_weight=sample_weight,
   1156             check_input=check_input,
-> 1157             X_idx_sorted=X_idx_sorted)
   1158         return self
   1159 

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    248         if len(y) != n_samples:
    249             raise ValueError("Number of labels=%d does not match "
--> 250                              "number of samples=%d" % (len(y), n_samples))
    251         if not 0 <= self.min_weight_fraction_leaf <= 0.5:
    252             raise ValueError("min_weight_fraction_leaf must in [0, 0.5]")

ValueError: Number of labels=1 does not match number of samples=21
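The message `Number of labels=1 does not match number of samples=21` suggests that `y_train` reached scikit-learn as a single row of shape `(1, 21)` rather than as 21 values (the exact wording varies between scikit-learn versions). Flattening the target first is a likely fix; `as_1d_target` is a hypothetical helper, and the random data only mimics the 2-feature, binary-target setup described above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def as_1d_target(y):
    """Flatten a (1, n) or (n, 1) target into shape (n,); leave other shapes alone."""
    y = np.asarray(y)
    if y.ndim == 2 and 1 in y.shape:
        y = y.ravel()
    return y


rng = np.random.default_rng(0)
X_train = rng.random((21, 2))                          # 2 features, 21 samples
y_row = rng.integers(0, 2, size=21).reshape(1, 21)     # binary target stored as one row
y_train = as_1d_target(y_row)
model = DecisionTreeRegressor(max_depth=2).fit(X_train, y_train)
```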

Make colors configurable

Hi!

Could you make the colors configurable? I like the idea of being colorblind-friendly; unfortunately on some projectors the palette colors are indistinguishable.

Thanks and cheers

TypeError: unhashable type: 'numpy.ndarray'

I am having an issue with visualizing decision trees with dtreeviz. I am running this on Ubuntu 18.04 using an anaconda environment using Python 3.6.0 and Jupyter Notebook.

I used the following example with some changes.

regr = tree.DecisionTreeRegressor(max_depth=2)
boston = load_boston()
regr.fit(boston.data, boston.target)

viz = dtreeviz(regr,
               boston.data,
               boston.target,
               target_name='price',
               feature_names=boston.feature_names)
              
viz.view()              

I tried to use the same code on my nutrient database.

from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import export_graphviz

dtree = DecisionTreeRegressor()
dtree.fit(tree_X, tree_y)

from dtreeviz.trees import dtreeviz

viz = dtreeviz(dtree,
               tree_X,
               tree_y,
               target_name='Iron_(mg)',
               feature_names=['Fiber_TD_(g)', 'Carbohydrt_(g)'])
              
viz.view()  

It returned the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-82-60a9d000cb22> in <module>
      5                tree_y,
      6                target_name='Iron',
----> 7                feature_names=np.array(['Fiber_TD_(g)', 'Carbohydrt_(g)'], dtype='<U17'))
      8 
      9 viz.view()

~/anaconda3/envs/playground/lib/python3.6/site-packages/dtreeviz/trees.py in dtreeviz(tree_model, X_train, y_train, feature_names, target_name, class_names, precision, orientation, show_root_edge_labels, show_node_labels, fancy, histtype, highlight_path, X, max_X_features_LR, max_X_features_TD)
    672 
    673     shadow_tree = ShadowDecTree(tree_model, X_train, y_train,
--> 674                                 feature_names=feature_names, class_names=class_names)
    675 
    676     if X is not None:

~/anaconda3/envs/playground/lib/python3.6/site-packages/dtreeviz/shadow.py in __init__(self, tree_model, X_train, y_train, feature_names, class_names)
     68         self.unique_target_values = np.unique(y_train)
     69         self.node_to_samples = ShadowDecTree.node_samples(tree_model, X_train)
---> 70         self.class_weights = compute_class_weight(tree_model.class_weight, self.unique_target_values, self.y_train)
     71 
     72         tree = tree_model.tree_

~/anaconda3/envs/playground/lib/python3.6/site-packages/sklearn/utils/class_weight.py in compute_class_weight(class_weight, classes, y)
     39     from ..preprocessing import LabelEncoder
     40 
---> 41     if set(y) - set(classes):
     42         raise ValueError("classes should include all valid labels that can "
     43                          "be in y")

TypeError: unhashable type: 'numpy.ndarray'

I thought it was because it wanted me to pass in a numpy array instead of a list, but after checking the dtreeviz signature in my notebook, the argument is expected to be a list.

I tried it anyway by converting the list to a numpy array:

viz = dtreeviz(dtree,
               tree_X,
               tree_y,
               target_name='Iron_(mg)',
               feature_names=np.array(['Fiber_TD_(g)', 'Carbohydrt_(g)']))
              
viz.view()  

But I still got the same error.

Then I tried to match the data type of the Boston dataset's feature names:

from sklearn.datasets import load_boston
boston = load_boston()
boston.feature_names

# Output
array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'],
      dtype='<U7')
viz = dtreeviz(dtree,
               tree_X,
               tree_y,
               target_name='Iron_(mg)',
               feature_names=np.array(['Fiber_TD_(g)', 'Carbohydrt_(g)'], dtype='<U7'))
              
viz.view()

But I still get the same error every time. I'm not really sure what is happening. Is there something I am missing?
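The last frame, `set(y) - set(classes)` inside `compute_class_weight`, points at the target rather than at `feature_names`: iterating a 2-D array yields row arrays, and NumPy arrays are unhashable. If `tree_y` is an `(n, 1)` array (or a one-column DataFrame that ends up converted to one; its construction isn't shown, so this is an assumption), passing a 1-D target such as `tree_y.ravel()` or a single-bracket column should avoid the error. A minimal reproduction:

```python
import numpy as np

y_2d = np.array([[1.5], [2.0], [2.0]])   # shape (3, 1), like a one-column target

try:
    set(y_2d)            # iterates over rows, each a length-1 ndarray
    error = None
except TypeError as exc:
    error = str(exc)

y_1d = y_2d.ravel()      # shape (3,): iterates over scalars
unique_values = set(y_1d)
```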

Tree spine view

Perhaps a view of the tree that shows only the path from the root to the leaf for a specific test vector would be useful.

Discover data patterns by investigating leaf training samples

Generally speaking, trained ML models can be used both to make predictions and to better understand our data.
So far, we have visualisations for the histogram of classes in classifier leaves and split plots for regressor leaves, but no way to find out more about the training samples reaching those leaves.
I think it would be useful to have a method that returns the training examples in a leaf, and maybe some general stats about them.

Below, I have attached a screenshot of get_node_samples() from my library. I think it could be integrated into dtreeviz fairly easily, given that there is already built-in functionality to get the samples in a leaf.

[Screenshot: get_node_samples() output, 2019-11-18]
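Since scikit-learn's `apply` returns the id of the leaf each sample lands in, grouping the training rows by leaf takes only a few lines. This is a sketch with hypothetical helper names (`leaf_samples`, `leaf_summary`); it is not the `get_node_samples()` from the screenshot, nor an existing dtreeviz function:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def leaf_samples(clf, X):
    """Map each leaf node id to the row indices of the samples in X that reach it."""
    leaf_ids = clf.apply(X)
    return {int(leaf): np.flatnonzero(leaf_ids == leaf) for leaf in np.unique(leaf_ids)}


def leaf_summary(clf, X, y):
    """Per-leaf sample count and class counts for the samples in X."""
    summary = {}
    for leaf, idx in leaf_samples(clf, X).items():
        classes, counts = np.unique(y[idx], return_counts=True)
        summary[leaf] = {
            "n_samples": len(idx),
            "class_counts": dict(zip(classes.tolist(), counts.tolist())),
        }
    return summary


X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
for leaf, stats in leaf_summary(clf, X, y).items():
    print(leaf, stats)
```

The indices from `leaf_samples` can then be used to slice the original training DataFrame for further inspection.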

UnicodeDecodeError under Japanese environment

Hello!

I encountered a UnicodeDecodeError under Windows 10 (Japanese version).

I think encoding='UTF-8' needs to be added when opening an SVG file at line 51 of utils.py.

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-20-f25891910720> in <module>()
      6                class_names=['Safe', 'Bad'])
      7 
----> 8 viz.view()

c:\users\takahiro_endo\venv64\lib\site-packages\dtreeviz\trees.py in view(self)
     64         tmp = tempfile.gettempdir()
     65         svgfilename = f"{tmp}/DTreeViz_{getpid()}.svg"
---> 66         self.save(svgfilename)
     67         view(svgfilename)
     68 

c:\users\takahiro_endo\venv64\lib\site-packages\dtreeviz\trees.py in save(self, filename)
    101             with open(filename, encoding='UTF-8') as f:
    102                 svg = f.read()
--> 103             svg = inline_svg_images(svg)
    104             with open(filename, "w", encoding='UTF-8') as f:
    105                 f.write(svg)

c:\users\takahiro_endo\venv64\lib\site-packages\dtreeviz\utils.py in inline_svg_images(svg)
     50         svgfilename = img.attrib["{http://www.w3.org/1999/xlink}href"]
     51         with open(svgfilename) as f:
---> 52             imgsvg = f.read()
     53         imgroot = ET.fromstring(imgsvg)
     54         for k,v in img.attrib.items(): # copy IMAGE tag attributes to svg from image file

UnicodeDecodeError: 'cp932' codec can't decode byte 0x92 in position 1079: illegal multibyte sequence
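A minimal sketch of the proposed fix, assuming the image files being inlined are UTF-8 (which is what `save()` itself assumes when it reads and writes with `encoding='UTF-8'`). The helper name `read_svg` is hypothetical:

```python
import os
import tempfile


def read_svg(path):
    """Read an SVG file as UTF-8 instead of the platform default encoding."""
    # Without encoding=..., open() uses the locale's preferred encoding
    # (cp932 on Japanese Windows, per the traceback above).
    with open(path, encoding="UTF-8") as f:
        return f.read()


svg = '<svg xmlns="http://www.w3.org/2000/svg"><text>Iron (mg) café 鉄</text></svg>'
path = os.path.join(tempfile.mkdtemp(), "node.svg")
with open(path, "w", encoding="UTF-8") as f:
    f.write(svg)

print(read_svg(path) == svg)
```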

enormous repo size (~400 MB)

Hey - interesting project. I just git cloned it to check some things out and realized it was taking a lot longer than expected, especially given the number of files I had already seen in the GitHub UI. So after cloning:

du -h .

gives

 48K	./.git/hooks
8.0K	./.git/info
  0B	./.git/logs/refs/heads
  0B	./.git/logs/refs/remotes/origin
  0B	./.git/logs/refs/remotes
  0B	./.git/logs/refs
  0B	./.git/logs
4.0K	./.git/objects/info
339M	./.git/objects/pack
339M	./.git/objects
  0B	./.git/refs/heads
4.0K	./.git/refs/remotes/origin
4.0K	./.git/refs/remotes
  0B	./.git/refs/tags
4.0K	./.git/refs
340M	./.git
 20K	./.idea
 36K	./animl/viz
 48K	./animl
5.5M	./notebooks
8.0K	./testing/bin
188K	./testing/data
 64M	./testing/samples
 66M	./testing
412M	.

A lot of it looks like duplicates of these images in the history:

git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| gnumfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest \
| tail -n 20

(from https://stackoverflow.com/a/42544963)
gives

387fa734f114  4.9MiB testing/samples/fires-TD-4.svg
1eb65f5cc60b  4.9MiB testing/samples/fires-TD-4.svg
4b5db9e74df0  4.9MiB testing/samples/fires-TD-4.svg
d89306ae2202  5.5MiB notebooks/examples.ipynb
a114a526f20e  5.6MiB testing/playground.ipynb
1689888cef83  6.9MiB testing/samples/sweets-TD-2.svg
54a0623f219b  6.9MiB testing/samples/sweets-LR-2-X.svg
72b5602a4d49  7.0MiB testing/samples/sweets-TD-2.svg
4fbf5680d31f  7.0MiB testing/samples/sweets-LR-2-X.svg
32367bcf5ae5  7.1MiB testing/samples/sweets-TD-2.svg
74a48fb74d54  7.2MiB testing/samples/sweets-LR-2-X.svg
eb1fd3844434  9.3MiB testing/samples/sweets-LR-3.svg
12c0992eac36  9.4MiB testing/samples/sweets-TD-3-X.svg
090c453c6736  9.5MiB testing/samples/sweets-TD-3-X.svg
cef390169f37  9.5MiB testing/samples/sweets-LR-3.svg
ba34b922a1c3  9.6MiB testing/samples/sweets-TD-3-X.svg
0aeb5d2e2c35  9.9MiB testing/samples/sweets-LR-3.svg
c0e22ce1ad0b   12MiB testing/samples/sweets-TD-4.svg
92e68cee1463   12MiB testing/samples/sweets-TD-4.svg
5f83e5ce4ca6   12MiB testing/samples/sweets-TD-4.svg
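The pipeline above depends on `gnumfmt` (GNU `numfmt` as installed by Homebrew on macOS). A rough Python equivalent that only needs `git` on the PATH could look like this; the function names are hypothetical, and the size formatting only approximates `numfmt`'s output:

```python
import subprocess


def parse_batch_check(text):
    """Parse `git cat-file --batch-check` lines into sorted (size, short_sha, path) blobs."""
    blobs = []
    for line in text.splitlines():
        kind, sha, size, *rest = line.split(" ", 3)  # maxsplit keeps spaces in paths
        if kind == "blob":
            blobs.append((int(size), sha[:12], rest[0] if rest else ""))
    return sorted(blobs)


def iec(n):
    """Format a byte count with IEC units, roughly like `numfmt --to=iec-i --suffix=B`."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024 or unit == "GiB":
            return f"{n}B" if unit == "B" else f"{n:.1f}{unit}"
        n /= 1024


def largest_blobs(repo=".", top=20):
    """Return the `top` largest blobs anywhere in the repository's history."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    batch = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(batch)[-top:]


# Example, from inside a clone:
#   for size, sha, path in largest_blobs():
#       print(f"{sha}  {iec(size):>7}  {path}")
```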

Maybe one of these answers could help you out?
https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository

Donut charts in Graphviz

Hi @parrt,

Following your suggestion via email (20 July 2019):

donuts do look cool; maybe create an issue in repo so I can look at this when I have time?

here's an extract from my email to you:

I’m sending you this email in case:

  • You’re using Graphviz to create pie charts (I gather you might be using matplotlib for these nodes, not Graphviz) and
  • You’d prefer donuts

Here’s a CodePen I created yesterday, “Donut charts in d3-graphviz”, which overlays filled circles on the pies to make them look like donuts.

Some background on this technique/hack, in a d3-graphviz GitHub issue: “image node attribute that refers to dynamically generated SVG (data URL)?”.
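If the leaf pie charts are drawn with matplotlib rather than Graphviz (the email only guesses this), a donut doesn't need the overlay trick: giving each wedge a finite radial width through `wedgeprops` leaves a hole in the middle. A minimal sketch; `donut` and `ring_width` are hypothetical names:

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def donut(ax, counts, ring_width=0.4):
    """Draw `counts` as a donut: a pie whose wedges have a finite radial width."""
    wedges, _ = ax.pie(counts, wedgeprops={"width": ring_width})
    ax.set_aspect("equal")
    return wedges


fig, ax = plt.subplots(figsize=(2, 2))
wedges = donut(ax, [30, 15, 5])
buf = io.BytesIO()
fig.savefig(buf, format="svg")  # SVG output, as dtreeviz works with SVG files
```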

Set graph size

My previous question probably wasn't clear; what I meant to ask is how to set the size of each graph generated by the library.

Can't get viz.view() to work

Potentially related to #19

I installed dot using your command brew install graphviz --with-librsvg --with-app --with-pango and also ran pip install graphviz. I am having trouble with the viz.view() command when trying to run your example code. I get the following error:

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-21-4930bdb0a4e7> in <module>()
     12                X=X)  # need to give single observation for prediction
     13 
---> 14 viz.view()

/anaconda3/lib/python3.6/site-packages/dtreeviz/trees.py in view(self)
     66         tmp = tempfile.gettempdir()
     67         svgfilename = f"{tmp}/DTreeViz_{getpid()}.svg"
---> 68         self.save(svgfilename)
     69         view(svgfilename)
     70 

/anaconda3/lib/python3.6/site-packages/dtreeviz/trees.py in save(self, filename)
     89             cmd = ["dot", f"-T{format}:cairo", "-o", filename, dotfilename]
     90             # print(' '.join(cmd))
---> 91             stdout, stderr = run(cmd, capture_output=True, check=True, quiet=False)
     92 
     93         else:

/anaconda3/lib/python3.6/site-packages/graphviz/backend.py in run(cmd, input, capture_output, check, quiet, **kwargs)
    157         stderr_write_bytes(err, flush=True)
    158     if check and proc.returncode:
--> 159         raise CalledProcessError(proc.returncode, cmd, output=out, stderr=err)
    160 
    161     return out, err

CalledProcessError: Command '['dot', '-Tsvg:cairo', '-o', '/var/folders/13/x7py_p6115gdj9y7_bybd8xc4gmyvs/T/DTreeViz_84409.svg', '/var/folders/13/x7py_p6115gdj9y7_bybd8xc4gmyvs/T/DTreeViz_84409']' returned non-zero exit status 1.

I am running Python 3.6.5::Anaconda in a Jupyter Notebook. I also tried this in Jupyter Lab and got the same error. Thanks for your help!
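The `CalledProcessError` only reports that `dot` exited with status 1; the actual reason goes to `dot`'s stderr, which the traceback does not include. One possible cause (an assumption, not confirmed in this issue) is a Graphviz build without the cairo renderer that `-Tsvg:cairo` requires. This hypothetical diagnostic renders a tiny graph with the same format flag and returns `dot`'s error message:

```python
import shutil
import subprocess


def check_dot_format(fmt="svg:cairo"):
    """Render a tiny graph with `dot -T<fmt>`; return (ok, message)."""
    dot = shutil.which("dot")
    if dot is None:
        return False, "Graphviz 'dot' executable not found on PATH"
    proc = subprocess.run(
        [dot, f"-T{fmt}"],
        input="digraph { a -> b }",
        capture_output=True, text=True, timeout=30,
    )
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, "ok"


print(check_dot_format("svg"))        # plain SVG renderer
print(check_dot_format("svg:cairo"))  # the renderer used in the failing command
```

If plain `svg` succeeds but `svg:cairo` fails, that would point at the Graphviz installation rather than at dtreeviz.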
