parrt / dtreeviz
A Python library for decision tree visualization and model interpretation.
License: MIT License
In new examples the trees are displayed using the default sklearn not dtreeviz ;) This is for @tlapusan :)
Is it possible to add a title to the dtreeviz plot? Something like plt.title("This is a plot").
I have searched the documentation on GitHub but didn't find any hint for this parameter.
hi @tlapusan, The leaf graphs are looking closer to the other style now but I think I'd prefer if you remove the background grid as it's different. If you still wanted, you could put an option as a parameter I guess.
Also, note that I have an outline around the bars, as it looks a lot better:
See those grey outlines? I think the code that does that is the following (such as in class_split_viz()):
hist, bins, barcontainers = ax.hist(X_hist,
                                    color=X_colors,
                                    align='mid',
                                    histtype=histtype,
                                    bins=bins,
                                    label=class_names)
# Alter appearance of each bar
for patch in barcontainers:
    for rect in patch.patches:
        rect.set_linewidth(.5)
        rect.set_edgecolor(colors['rect_edge'])
Hey! I came here from the http://explained.ai/decision-tree-viz/index.html article (which is a great write-up 👍). One thing bothered me => unsolicited suggestion.
In the README you have:
This is the start of a python machine learning library to augment scikit-learn. At the moment, all we have is functionality for decision tree visualization and model interpretation.
From a user's perspective: why not have a package just for tree visualization and interpretation, and have this planned "python machine learning library to augment scikit-learn" depend on a tree visualization & inspection package, if needed?
The task of inspecting trees looks quite specific, do you really need to add solutions for unrelated problems (some ML algorithms?) to the same package? Obviously, I don't know what you've planned for the whole library, so please excuse me if that doesn't make any sense :)
A few use cases, for having a dedicated tree viz library:
tree visualization has a few dependencies, which other code may not need; it can be the other way around as well - people who want tree visualization may have to install unrelated dependencies. It is solvable, but needs care.
https://github.com/TeamHG-Memex/eli5 is not developed actively right now, but we'd absolutely consider making your tree visualization a default, which implies having it as a dependency (probably an optional one); depending on a general-purpose ML library just for its visualization features is less nice than depending on a visualization/inspection library.
by having separate packages you may get different release schedules, different contributors, etc. In a large package some code usually gets outdated and deprecated over time. If deprecated code is a separate package, one can just leave it as-is - no need to remove it from an all-in-one library, and no need to maintain it if there is no motivation.
How to make the image size bigger?
I just read the story where you shared the thoughts and lessons learned when implementing this nice library (thanks for that, great read indeed). This issue is part of the future work that is mentioned in the article. I hope it's still relevant? I'll try to submit a PR soon;-)
How can you plot a decision tree several times within a for loop, such as:
for i in range(2):
    viz = dtreeviz(clf,
                   iris['data'],
                   iris['target'],
                   target_name='',
                   feature_names=np.array(iris['feature_names']),
                   scale=2,
                   class_names={0:'setosa',1:'versicolor',2:'virginica'})
    viz
The Jupyter notebook doesn't show any output. Is there a way to display the plot in place several times, or is this a limitation of your package?
I do:
from dtreeviz.trees import *
I get:
NameError: name 'rtreeviz_univar' is not defined
when executing
t = rtreeviz_univar(ax,
                    X_train.WGT, y_train,
                    max_depth=2,
                    feature_name='Vehicle Weight',
                    target_name='MPG',
                    fontsize=14)
If I do:
from dtreeviz import trees
'dtreeviz' in dir(trees)
returns True
'rtreeviz_univar' in dir(trees)
returns False
Strange
I started to work on showing just the prediction path (based on what we discussed in a previous issue)
Right now, it looks like this :
Am I on the right track? :) Do you think we also need to show the neighborhood nodes? I don't have a strong argument for them, but I like how they look (they show, in a way, the 'opposite' prediction path).
Would it be possible to put the name of the predicted class on a leaf node instead of just the color?
Hi
I have an issue where the bars at the limits of the data range are not being plotted. This is particularly problematic when the feature cardinality is 2. It only happens for particular input values, but I have not yet been able to find what distinguishes the values that are plotted successfully from those that are not.
A reproducible example:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from dtreeviz.trees import dtreeviz

x = np.random.choice([0,1], size=(100,2))
y = np.random.choice([0,1], size=100)
dtreeviz(
    tree_model=DecisionTreeClassifier(max_depth=1).fit(x, y),
    X_train=x,
    y_train=y,
    feature_names=['a','b'],
    target_name='y',
    class_names=[1,0]
)

x = np.random.choice([-1,1], size=(100,2))
y = np.random.choice([0,1], size=100)
dtreeviz(
    tree_model=DecisionTreeClassifier(max_depth=1).fit(x, y),
    X_train=x,
    y_train=y,
    feature_names=['a','b'],
    target_name='y',
    class_names=[1,0]
)
Thank you for the library!
I would like to propose a new visualization type for leaf samples: a histogram.
It is useful when you want to easily see the general distribution of leaf samples, instead of individual leaf samples.
It is even more useful when there is a big tree with a lot of leaves and the bar plot visualization is hard to interpret.
For the implementation, this will add another value ('hist') for the display_type parameter of the viz_leaf_samples function.
The new visualization will look like this:
@parrt what do you think about it?
Thanks for the code and tutorial. I found it very useful.
I am getting a TypeError with my dataset. I set up this simple dataset and still get the same error.
TypeError: can't multiply sequence by non-int of type 'float'
It may be caused by feature_names, because in your examples you always used datasets that already have feature_names set up. Can you help me with that?
import pandas as pd
from sklearn import preprocessing, tree
from sklearn.tree import DecisionTreeRegressor
from dtreeviz.trees import dtreeviz

Things = {'Feature01': [3,4,5,0],
          'Feature02': [4,5,6,0],
          'Feature03': [1,2,3,8],
          'Target01': ['5','5','6','6']}
df = pd.DataFrame(Things,
                  columns=['Feature01', 'Feature02',
                           'Feature03', 'Target01'])
y_variable = df[['Target01']]
X_variable = df[['Feature01', 'Feature02', 'Feature03']]
regressor = DecisionTreeRegressor(max_depth=2)
regressor.fit(X_variable, y_variable)
dtreeviz(regressor,
         X_variable,
         y_variable,
         target_name='Target01',
         feature_names=['Feature01', 'Feature02', 'Feature03']  # df.columns[0:3]
         )
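One plausible reading of the error above, not confirmed in the thread: the target column holds strings ('5', '6'), and Python raises exactly this TypeError when a string is multiplied by a float. The sketch below reproduces the message in isolation and shows a numeric conversion that would sidestep it. The variable names are illustrative only.

```python
# Target values as they appear in the example above: strings, not numbers.
targets = ['5', '5', '6', '6']

# Multiplying a str by a float raises the same TypeError reported above.
try:
    targets[0] * 0.5
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# Converting the targets to floats before fitting and plotting avoids it.
numeric_targets = [float(t) for t in targets]
print(numeric_targets)
```

With pandas, the equivalent conversion would be along the lines of df['Target01'].astype(float) before calling fit.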
I trained a Sklearn CART model (using default parameters provided by Scikit-learn) on the attached data (data.txt, where the column "Target" contains the labels to predict, the rest of the columns containing the values associated with features).
For model fitting, I used:
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight
clf = DecisionTreeClassifier(random_state = 374564)
clf.fit(data.drop(['Target'], axis = 1).values, data['Target'].values, sample_weight = compute_sample_weight("balanced", data['Target'].values))
I launched dtreeviz with the following command:
viz = dtreeviz(clf, data.drop(['Target'], axis = 1).values, data['Target'].values, target_name='pred', feature_names=feature_names, class_names=["Class1", "Class2"])
Then I got the following error message:
Traceback (most recent call last):
File "", line 1, in
File "/home/bomane/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py", line 373, in dtreeviz
filename=f"{tmp}/leaf{node.id}_{getpid()}.svg")
File "/home/bomane/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py", line 543, in class_leaf_viz
draw_piechart(node.class_counts(), size=size, colors=colors, filename=filename, label=f"n={node.nsamples()}")
File "/home/bomane/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py", line 728, in draw_piechart
i = np.nonzero(counts)[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0
Can you please tell me what could be the problem here?
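The last frame of the traceback fails on np.nonzero(counts)[0][0], which raises this IndexError whenever every entry of counts is zero. A minimal sketch of that failure and a guard follows. Note that first_nonzero_index is a hypothetical helper, not part of dtreeviz, and why a leaf would end up with all-zero class counts here is not established in the thread.

```python
import numpy as np

def first_nonzero_index(counts):
    """Return the index of the first nonzero count, or None if all are zero.

    Hypothetical helper for illustration; not part of dtreeviz.
    """
    nonzero = np.nonzero(counts)[0]
    if nonzero.size == 0:
        return None
    return int(nonzero[0])

# An all-zero count vector reproduces the IndexError from the traceback.
empty_leaf = np.array([0, 0])
try:
    np.nonzero(empty_leaf)[0][0]
    raised = False
except IndexError:
    raised = True

print(raised)
print(first_nonzero_index(empty_leaf))
print(first_nonzero_index(np.array([0, 7])))
```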
Platform: macOS Mojave
Using Anaconda
Python -v: 3.7.1
I used brew install graphviz --with-librsvg --with-pango to install graphviz.
If a .dot file contains digraph G { A -> B }, then dot -Tsvg:cairo test.dot > test.svg produces the plot successfully.
In JupyterLab, it displays this:
When installing graphviz, it tells me that icu4c is keg-only, which means it was not symlinked into /usr/local. However, dot works, and the terminal tells me graphviz 2.40.1 is already installed and up-to-date.
I'm new to this project but have previously used SHAP (https://github.com/slundberg/shap), so any documentation comparing/contrasting this toolset with SHAP would be interesting.
My quick take is that dtreeviz is primarily about visualization, whereas SHAP is primarily about numerical estimates of feature importance (plus visualizing these estimates). Is there any overlap? To the extent that they don't overlap, a demo that shows how to use dtreeviz and SHAP in complementary ways to more-fully explain a model could be useful.
Hello there,
I am trying to plot the iris dataset on Ubuntu 18.04 (Bionic) with the graphviz package installed. However, I am getting svg:cairo warnings and there are only arrows with empty boxes. The warnings that I am getting:
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Warning: No loadimage plugin for "svg:cairo"
Can I set fontname (and if so, how), or let matplotlib.rcParams set it?
Lines 473 to 474 in 0905383
While running the code:
viz = dtreeviz(clf,
               X_train,
               y_train,
               target_name='target_name',
               feature_names=feature_names,
               orientation=orientation,
               class_names=["setosa",
                            "versicolor",
                            "virginica"],  # 0,1,2 targets
               fancy=fancy,
               X=X,
               label_fontsize=label_fontsize,
               ticks_fontsize=ticks_fontsize,
               fontname=fontname)
I get the error below:
TypeError Traceback (most recent call last)
<ipython-input-23-95e7c1fb5cfe> in <module>()
3 #regr = regr.fit(X_train, y_train)
4
----> 5 viz = viz_tree(clf,X_train,y_train)
6 viz.save("GermanCredit.svg") # suffix determines the generated image format
7 viz.view() # pop up window to display image
<ipython-input-22-c33325585be5> in viz_tree(clf, X_train, y_train, orientation, fancy, pickX, label_fontsize, ticks_fontsize, fontname)
29 label_fontsize=label_fontsize,
30 ticks_fontsize=ticks_fontsize,
---> 31 fontname=fontname)
32
33 return viz
TypeError: 'module' object is not callable
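The message 'module' object is not callable is what Python raises when a module, rather than a function inside it, is called. One possible cause here (an assumption, since the import lines are not shown) is that the name dtreeviz was bound to the package via import dtreeviz instead of to the function via from dtreeviz.trees import dtreeviz. The sketch reproduces the same error with a standard-library module.

```python
import math

# Calling the module itself fails in the same way as in the traceback above.
try:
    math(2)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# Calling a function that lives inside the module works as expected.
root = math.sqrt(4)
print(root)
```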
Does this library have support for loading trees saved in PMML files? If not, would there be any interest in supporting it?
Tools like IBM's SPSS can export their decision trees in PMML XML. To have this library support it would potentially mean that enterprise customers no longer need to purchase SPSS in order to view their tree. It would also allow for easier distribution of a tree once created.
OS: Windows 7 Enterprise
Python: 3.6.5
Running on Jupyter Notebook
I am able to recreate the tree visualization for the iris data set as given in README.md.
However, when I run it on my own data set, I find the peculiarity that one of the leaves ("Fault B") does not seem to be sized in proportion to its number of samples (35) as the others are.
This is an unwieldy data set, and I tried to reproduce the issue after applying PCA to reduce the dimensionality, but the resulting decision tree looks good (i.e. the leaves appear appropriately sized).
Very curious!
Hi,
I'm following the instructions on your GitHub page to get dtreeviz working for a school project, but it doesn't seem to be updating dot properly. I've run your instructions multiple times. I also found this thread and tried everything in there as well:
I've followed all the instructions to install on the GitHub page
(including uninstalling python-graphviz and graphviz, running brew uninstall on the packages, downloading their package, and doing a make install):
https://github.com/parrt/dtreeviz
I'm on a MacBook Pro 13" 2018 Mojave (10.14.4)
Was wondering if you might be able to help?
brew upgrade pango librsvg
Error: pango 1.42.4_1 already installed
Error: librsvg 2.44.13 already installed
My system is still finding dot at /usr/local/bin/dot.
It also doesn't know anything about a ../Cellar/graphviz/2.40.1/bin/dot directory.
I also saw a bunch of warnings when I ran make, but didn't find any errors:
✔ /tmp/graphviz-2.40.1
20:34 $ which dot
/usr/local/bin/dot
✔ /tmp/graphviz-2.40.1
20:34 $ dot -Tsvg:cairo
Format: "svg:cairo" not recognized. Use one of: svg:svg:core
✘-1 /tmp/graphviz-2.40.1
20:35 $ lw -l
-bash: lw: command not found
✘-127 /tmp/graphviz-2.40.1
20:35 $ ls -l ../Cellar/graphviz/2.40.1/bin/dot
ls: ../Cellar/graphviz/2.40.1/bin/dot: No such file or directory
✘-1 /tmp/graphviz-2.40.1
Here is the history of my commands for the install:
589 brew uninstall graphviz
590 brew reinstall pango librsvg --build-from-source # even if already there, please reinstall
591 brew reinstall cairo --build-from-source
592 brew install graphviz --build-from-source
593 brew info graphviz
594 #./configure --includedir=/usr/local/include/graphviz --with-pangocairo=yes --with-rsvg=yes
595 history
596 make uninstall
597 ./configure --includedir=/usr/local/include/graphviz --with-pangocairo=yes --with-rsvg=yes
598 rm -rf /usr/local/lib/graphviz
599 ./configure --includedir=/usr/local/include/graphviz --with-pangocairo=yes --with-rsvg=yes | tee make.log 2>&1
600 #make -j 8 | tee
601 mv make.log configure.log
602 make -j 8 | tee make.log 2>&1
603 make install | tee install.log 2>&1
604 history
Hi! This is a very cool project and useful for me.
But I have an issue about the style of histograms.
If the data features have a wide value range, the plotted histogram will be unbalanced like this:
I think if we can specify the range of the histogram, it will be useful to solve this issue.
Do you have an idea about this?
I will check dtreeviz/trees.py, but I'm not sure whether I can implement the function.
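As a rough illustration of the idea, NumPy's histogram accepts a range argument, and matplotlib's hist takes a parameter of the same name. The sketch clips the binning range to the 1st and 99th percentiles so that a few extreme values don't squeeze the bulk of the data into one bar. The percentile choice and variable names are assumptions, the data is synthetic, and this is not dtreeviz's API.

```python
import numpy as np

rng = np.random.default_rng(0)
# Mostly small values plus a few extreme outliers (synthetic data).
values = np.concatenate([rng.normal(10, 2, size=1000), [500.0, 800.0, 1000.0]])

# Default binning spans min..max, so the outliers stretch the bins.
full_counts, full_edges = np.histogram(values, bins=10)

# Restricting the range keeps the bins focused on the bulk of the data.
lo, hi = np.percentile(values, [1, 99])
clipped_counts, clipped_edges = np.histogram(values, bins=10, range=(lo, hi))

print(full_edges[0], full_edges[-1])
print(clipped_edges[0], clipped_edges[-1])
```

Values outside the chosen range are simply left out of the counts, so a plot built this way should say so in its caption or axis label.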
First off, thank you for creating this library; it looks incredible to use.
But sadly I can't seem to install graphviz. Following the instructions for issue #23, I am confused about what I am supposed to do:
1.) I have xcode-select installed
2.) I ran "sudo xcodebuild -license" from the command line (I don't understand why... just to confirm I got it, I guess.)
3.) I ran "brew uninstall graphviz"
4.) Then finally I ran "brew install graphviz --with-librsvg --with-pango" on the command line
Then I get the following error:
Error: invalid option: --with-librsvg
Thank you in advance for any help.
Apparently dtreeviz in dtreeviz.trees.py fails on the following block of code:
> /home/macermak/.local/lib/python3.6/site-packages/dtreeviz/trees.py(320)dtreeviz()
318
319 n_classes = shadow_tree.nclasses()
--> 320 color_values = color_blind_friendly_colors[n_classes]
321
322 # Fix the mapping from target value to color for entire tree
resulting in
IndexError: list index out of range
My model has 15 classes and I believe the length of color_blind_friendly_colors (which is 11) is causing the issue.
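A minimal sketch of one way to avoid the out-of-range lookup: fall back to generated colors when the number of classes exceeds what a fixed palette table covers. The names colors_for and palettes are hypothetical, the evenly spaced hues are a stand-in technique (not a color-blind-friendly palette), and none of this is dtreeviz's actual code.

```python
import colorsys

def colors_for(n_classes, palettes=None):
    """Return n_classes hex colors.

    `palettes` is a hypothetical list indexed by class count, mirroring the
    lookup in the traceback; when it has no entry, hues are generated instead.
    """
    if palettes is not None and n_classes < len(palettes) and palettes[n_classes]:
        return list(palettes[n_classes])
    colors = []
    for i in range(n_classes):
        # Evenly spaced hues at fixed saturation and value.
        r, g, b = colorsys.hsv_to_rgb(i / n_classes, 0.6, 0.9)
        colors.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

palette = colors_for(15)
print(len(palette), palette[:3])
```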
I wonder if a split plot per leaf would look ok; seems like it'd be useful, @tlapusan. Here is a sample split plot. Maybe one y range / axis on left and then tight grouping of split plots?
Your visualization and interpretation project is very interesting and meaningful. I had a problem when I output the figure for visualization: the diagrams in the output figure were empty, with only a frame. I tried to find the problem but failed. Has anyone else had this problem? Could you please help me solve it? Thank you!
To install I ran:
pip install dtreeviz on anaconda prompt (Not as admin).
Then I downloaded and installed graphviz-2.38.msi from here: https://graphviz.gitlab.io/_pages/Download/Download_windows.html
When running:
(base) C:\Users\Will>where dot
I get:
C:\Program Files (x86)\Graphviz2.38\bin\dot.exe
When running:
(base) C:\Users\Will>dot -V
I get:
dot - graphviz version 2.38.0 (20140413.2041)
When running:
import os
import subprocess
proc = subprocess.Popen(['dot','-V'])
print( os.getenv('Path') )
The output contains the path:
C:\Program Files (x86)\Graphviz2.38\bin\
When running:
import graphviz.backend as be
cmd = ["dot", "-V"]
stdout, stderr = be.run(cmd, capture_output=True, check=True, quiet=False)
print( stderr )
Output:
dot - graphviz version 2.38.0 (20140413.2041)
b'dot - graphviz version 2.38.0 (20140413.2041)\r\n'
The problem is I get an error when running the first boston regression example:
from sklearn.datasets import *
from sklearn import tree
from dtreeviz.trees import *

regr = tree.DecisionTreeRegressor(max_depth=2)
boston = load_boston()
regr.fit(boston.data, boston.target)
viz = dtreeviz(regr,
               boston.data,
               boston.target,
               target_name='price',
               feature_names=boston.feature_names)
viz.view()
Throws this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-216c2df6378b> in <module>()
9 feature_names=boston.feature_names)
10
---> 11 viz.view()
C:\Users\Will\Anaconda3\Lib\site-packages\dtreeviz\trees.py in view(self)
64 tmp = tempfile.gettempdir()
65 svgfilename = f"{tmp}/DTreeViz_{getpid()}.svg"
---> 66 self.save(svgfilename)
67 view(svgfilename)
68
C:\Users\Will\Anaconda3\Lib\site-packages\dtreeviz\trees.py in save(self, filename)
78
79 g = graphviz.Source(self.dot, format='svg')
---> 80 dotfilename = g.save(directory=path.parent, filename=path.stem)
81
82 if PLATFORM=='darwin':
C:\Users\Will\Anaconda3\Lib\site-packages\graphviz\files.py in save(self, filename, directory)
145 self.directory = directory
146
--> 147 filepath = self.filepath
148 tools.mkdirs(filepath)
149
C:\Users\Will\Anaconda3\Lib\site-packages\graphviz\files.py in filepath(self)
129 @property
130 def filepath(self):
--> 131 return os.path.join(self.directory, self.filename)
132
133 def save(self, filename=None, directory=None):
C:\Users\Will\Anaconda3\lib\ntpath.py in join(path, *paths)
74 # Join two (or more) paths.
75 def join(path, *paths):
---> 76 path = os.fspath(path)
77 if isinstance(path, bytes):
78 sep = b'\\'
TypeError: expected str, bytes or os.PathLike object, not WindowsPath
It looks like it still doesn't like the fact a path variable is being passed in as a WindowsPath rather than a string?
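One workaround to consider, sketched under the assumption that the failing call only needs a string directory: convert the pathlib object to str before handing it to code that calls os.path.join. The file name below is made up for the example.

```python
import os
import tempfile
from pathlib import Path

# Build a path the way the traceback does, using pathlib.
path = Path(tempfile.gettempdir()) / "DTreeViz_example.svg"

# Passing plain strings avoids relying on os.fspath() accepting the Path type.
directory = str(path.parent)
joined = os.path.join(directory, path.stem)

print(joined)
```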
I have been trying to use dtreeviz to visualise iris data using my Mac.
# Iris data was loaded in an earlier part of the code
from sklearn.datasets import *
from sklearn import tree
from dtreeviz.trees import *

classifier = tree.DecisionTreeClassifier(max_depth=2)  # limit depth of tree
classifier.fit(iris.data, iris.target)
viz = dtreeviz(classifier,
               iris.data,
               iris.target,
               target_name='variety',
               feature_names=iris.feature_names,
               class_names=["setosa", "versicolor", "virginica"]  # need class_names for classifier
               )
viz.view()
Despite installing graphviz as:
brew install graphviz --with-librsvg --with-app --with-pango
I get the following output.
Solutions proposed in #4 did not work for me either. On downloading the zipped file in the thread and running dot -Tsvg t.dot > t.svg, I could view the combined file. However, this did not work for me after deleting the contents of the folder III from the current directory.
Also, I noticed that the individual plots of only some of the nodes get stored in the location "/private/var/folders/gs/r3f6yn8n4zj570qsj4s9rvnm0000gp/T/DTreeViz_19418". These included leaf1, leaf3, leaf4, node0, node2, legend and two other files named DTreeViz.
I am trying to visualize a decision tree for one of the random forest regression models I have built, and I have noticed that the SVG file size is around 1.3 GB. I have reduced the depth of the model and changed the visualization to non-fancy (fancy=False), but it still generates an SVG file of more than 250 MB, which causes rendering to fail. These huge file sizes are caused by the histograms generated at the leaf nodes. Is there any way to have a simple output with the following fields:
I have gone through the code and noticed that I need to change the function regr_leaf_node in order to suppress the histogram. Please let me know what would be the next steps.
Thanks,
Vinay
It would be great to have the sizes of plots configurable on all platforms, even as SVGs. Some of the plots come out quite tiny and hard to read by default.
Hi, thank you for your library. I think it's very useful for showing results, and I especially like rtreeviz_bivar_3D. I was trying to run code with 2 variables and 1 binary output, but I get this error. I'm not sure whether it is caused by the data type:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-217-d956e21d7290> in <module>
47 dist=12,
48
---> 49 show={'splits','title'})
50
51
~/anaconda3/lib/python3.7/site-packages/dtreeviz/trees.py in rtreeviz_bivar_3D(ax, X_train, y_train, max_depth, feature_names, target_name, fontsize, ticks_fontsize, fontname, azim, elev, dist, show, colors, n_colors_in_map)
234
235 rt = tree.DecisionTreeRegressor(max_depth=max_depth)
--> 236 rt.fit(X_train, y_train)
237
238 y_lim = np.min(y_train), np.max(y_train)
~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
1155 sample_weight=sample_weight,
1156 check_input=check_input,
-> 1157 X_idx_sorted=X_idx_sorted)
1158 return self
1159
~/anaconda3/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
248 if len(y) != n_samples:
249 raise ValueError("Number of labels=%d does not match "
--> 250 "number of samples=%d" % (len(y), n_samples))
251 if not 0 <= self.min_weight_fraction_leaf <= 0.5:
252 raise ValueError("min_weight_fraction_leaf must in [0, 0.5]")
ValueError: Number of labels=1 does not match number of samples=21
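The final message says scikit-learn saw one label for 21 samples, which is what len(y) reports for an array of shape (1, 21). A small sketch, assuming that is the shape of y_train here (the thread does not show it):

```python
import numpy as np

# 21 samples with 2 features, and a target accidentally stored as one row.
X_train = np.zeros((21, 2))
y_train = np.zeros((1, 21))

print(len(y_train), X_train.shape[0])  # label count vs. sample count

# Flattening the target to 1-D gives one label per sample.
y_flat = y_train.ravel()
print(len(y_flat))
```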
This might be relevant to your interests possibly:
scikit-learn/scikit-learn#9251
btw, animl is impossible to google ;)
Hi!
Could you make the colors configurable? I like the idea of being colorblind-friendly; unfortunately on some projectors the palette colors are indistinguishable.
Thanks and cheers
I am having an issue with visualizing decision trees with dtreeviz. I am running this on Ubuntu 18.04 in an Anaconda environment with Python 3.6.0 and Jupyter Notebook.
I used the following example with some changes.
regr = tree.DecisionTreeRegressor(max_depth=2)
boston = load_boston()
regr.fit(boston.data, boston.target)
viz = dtreeviz(regr,
               boston.data,
               boston.target,
               target_name='price',
               feature_names=boston.feature_names)
viz.view()
I tried to use the same code on my nutrient database.
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import export_graphviz
from dtreeviz.trees import dtreeviz

dtree = DecisionTreeRegressor()
dtree.fit(tree_X, tree_y)
viz = dtreeviz(dtree,
               tree_X,
               tree_y,
               target_name='Iron_(mg)',
               feature_names=['Fiber_TD_(g)', 'Carbohydrt_(g)'])
viz.view()
It returned the following error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-82-60a9d000cb22> in <module>
5 tree_y,
6 target_name='Iron',
----> 7 feature_names=np.array(['Fiber_TD_(g)', 'Carbohydrt_(g)'], dtype='<U17'))
8
9 viz.view()
~/anaconda3/envs/playground/lib/python3.6/site-packages/dtreeviz/trees.py in dtreeviz(tree_model, X_train, y_train, feature_names, target_name, class_names, precision, orientation, show_root_edge_labels, show_node_labels, fancy, histtype, highlight_path, X, max_X_features_LR, max_X_features_TD)
672
673 shadow_tree = ShadowDecTree(tree_model, X_train, y_train,
--> 674 feature_names=feature_names, class_names=class_names)
675
676 if X is not None:
~/anaconda3/envs/playground/lib/python3.6/site-packages/dtreeviz/shadow.py in __init__(self, tree_model, X_train, y_train, feature_names, class_names)
68 self.unique_target_values = np.unique(y_train)
69 self.node_to_samples = ShadowDecTree.node_samples(tree_model, X_train)
---> 70 self.class_weights = compute_class_weight(tree_model.class_weight, self.unique_target_values, self.y_train)
71
72 tree = tree_model.tree_
~/anaconda3/envs/playground/lib/python3.6/site-packages/sklearn/utils/class_weight.py in compute_class_weight(class_weight, classes, y)
39 from ..preprocessing import LabelEncoder
40
---> 41 if set(y) - set(classes):
42 raise ValueError("classes should include all valid labels that can "
43 "be in y")
TypeError: unhashable type: 'numpy.ndarray'
I thought it was because it wanted me to put in a NumPy array instead of a list. But after checking dtreeviz in my notebook, I saw that it requires a list for the argument.
But I tried it anyway by converting the list to a NumPy array.
viz = dtreeviz(dtree,
               tree_X,
               tree_y,
               target_name='Iron_(mg)',
               feature_names=np.array(['Fiber_TD_(g)', 'Carbohydrt_(g)']))
viz.view()
But I still got the same error.
Then I tried to match the dtype of the Boston dataset's feature names.
from sklearn.datasets import load_boston
boston = load_boston()
boston.feature_names
# Output
array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'],
      dtype='<U7')
viz = dtreeviz(dtree,
               tree_X,
               tree_y,
               target_name='Iron_(mg)',
               feature_names=np.array(['Fiber_TD_(g)', 'Carbohydrt_(g)'], dtype='<U7'))
viz.view()
But I still got the same error every time. I'm not really sure what happened. I was wondering if there was something I was missing?
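The traceback points at set(y) inside compute_class_weight, so the culprit may be the shape of tree_y rather than feature_names. Assuming tree_y is a 2-D array with one column (the thread does not show it), iterating over it yields small arrays, which are unhashable. A sketch of the failure and a flattening fix, with made-up values:

```python
import numpy as np

# A column-shaped target, e.g. what df[['col']].values would produce.
tree_y = np.array([[1.2], [0.4], [3.1]])

# Each element of a 2-D array is itself an ndarray, which cannot go in a set.
try:
    set(tree_y)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# A 1-D target iterates over hashable scalars.
flat_y = tree_y.ravel()
unique_values = set(flat_y)
print(len(unique_values))
```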
colors.ipynb notebook are missing
Perhaps a view of the tree that showed only the path from root to leaf for a specific test vector would be useful.
What would I need to do to change the colors of the classes? Maybe provide a simple example (in a notebook).
Can we use CatBoost, XGBoost, and LightGBM?
Generally speaking, ML trained models can be used to both make predictions and/or to better understand our data.
Until now, we have created visualisations for histogram of classes in leaves, split plots for the regressor leaves, but we don't have a way to find out more about training samples reaching those leaves.
I think it will be useful to have a method to return the leaf training examples and maybe some general stats about them.
Below, I have attached a screenshot of get_node_samples() from my library. I guess it could be integrated into dtreeviz fairly easily, considering that there is already built-in functionality to take the samples from a leaf.
Hello!
I encountered a UnicodeDecodeError under Windows 10 (Japanese version).
I think encoding='UTF-8' needs to be added when opening an SVG file at line 51 of utils.py.
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-20-f25891910720> in <module>()
6 class_names=['Safe', 'Bad'])
7
----> 8 viz.view()
c:\users\takahiro_endo\venv64\lib\site-packages\dtreeviz\trees.py in view(self)
64 tmp = tempfile.gettempdir()
65 svgfilename = f"{tmp}/DTreeViz_{getpid()}.svg"
---> 66 self.save(svgfilename)
67 view(svgfilename)
68
c:\users\takahiro_endo\venv64\lib\site-packages\dtreeviz\trees.py in save(self, filename)
101 with open(filename, encoding='UTF-8') as f:
102 svg = f.read()
--> 103 svg = inline_svg_images(svg)
104 with open(filename, "w", encoding='UTF-8') as f:
105 f.write(svg)
c:\users\takahiro_endo\venv64\lib\site-packages\dtreeviz\utils.py in inline_svg_images(svg)
50 svgfilename = img.attrib["{http://www.w3.org/1999/xlink}href"]
51 with open(svgfilename) as f:
---> 52 imgsvg = f.read()
53 imgroot = ET.fromstring(imgsvg)
54 for k,v in img.attrib.items(): # copy IMAGE tag attributes to svg from image file
UnicodeDecodeError: 'cp932' codec can't decode byte 0x92 in position 1079: illegal multibyte sequence
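The suggestion above amounts to passing an explicit encoding instead of relying on the platform default (cp932 on this Windows setup). A self-contained sketch with a temporary file; the SVG content is made up.

```python
import os
import tempfile

svg_text = '<svg xmlns="http://www.w3.org/2000/svg"><text>≤ 0.5 ’</text></svg>'

# Write and read with an explicit encoding so the result does not depend on
# the locale's default codec.
fd, svg_path = tempfile.mkstemp(suffix=".svg")
os.close(fd)
try:
    with open(svg_path, "w", encoding="UTF-8") as f:
        f.write(svg_text)
    with open(svg_path, encoding="UTF-8") as f:
        round_tripped = f.read()
finally:
    os.remove(svg_path)

print(round_tripped == svg_text)
```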
Hey - interesting project. Just git cloned it to check out some things and realized it was taking a lot longer than expected, especially given I had already seen the number of files in the github UI. So after cloning:
du -h .
gives
48K ./.git/hooks
8.0K ./.git/info
0B ./.git/logs/refs/heads
0B ./.git/logs/refs/remotes/origin
0B ./.git/logs/refs/remotes
0B ./.git/logs/refs
0B ./.git/logs
4.0K ./.git/objects/info
339M ./.git/objects/pack
339M ./.git/objects
0B ./.git/refs/heads
4.0K ./.git/refs/remotes/origin
4.0K ./.git/refs/remotes
0B ./.git/refs/tags
4.0K ./.git/refs
340M ./.git
20K ./.idea
36K ./animl/viz
48K ./animl
5.5M ./notebooks
8.0K ./testing/bin
188K ./testing/data
64M ./testing/samples
66M ./testing
412M .
A lot of it looks like duplicates of these images in the history:
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| gnumfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest \
| tail -n 20
(from https://stackoverflow.com/a/42544963)
gives
387fa734f114 4.9MiB testing/samples/fires-TD-4.svg
1eb65f5cc60b 4.9MiB testing/samples/fires-TD-4.svg
4b5db9e74df0 4.9MiB testing/samples/fires-TD-4.svg
d89306ae2202 5.5MiB notebooks/examples.ipynb
a114a526f20e 5.6MiB testing/playground.ipynb
1689888cef83 6.9MiB testing/samples/sweets-TD-2.svg
54a0623f219b 6.9MiB testing/samples/sweets-LR-2-X.svg
72b5602a4d49 7.0MiB testing/samples/sweets-TD-2.svg
4fbf5680d31f 7.0MiB testing/samples/sweets-LR-2-X.svg
32367bcf5ae5 7.1MiB testing/samples/sweets-TD-2.svg
74a48fb74d54 7.2MiB testing/samples/sweets-LR-2-X.svg
eb1fd3844434 9.3MiB testing/samples/sweets-LR-3.svg
12c0992eac36 9.4MiB testing/samples/sweets-TD-3-X.svg
090c453c6736 9.5MiB testing/samples/sweets-TD-3-X.svg
cef390169f37 9.5MiB testing/samples/sweets-LR-3.svg
ba34b922a1c3 9.6MiB testing/samples/sweets-TD-3-X.svg
0aeb5d2e2c35 9.9MiB testing/samples/sweets-LR-3.svg
c0e22ce1ad0b 12MiB testing/samples/sweets-TD-4.svg
92e68cee1463 12MiB testing/samples/sweets-TD-4.svg
5f83e5ce4ca6 12MiB testing/samples/sweets-TD-4.svg
Maybe one of these answers could help you out?
https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository
Hi @parrt,
Following your suggestion via email (20 July 2019):
donuts do look cool; maybe create an issue in repo so I can look at this when I have time?
here's an extract from my email to you:
I’m sending you this email in case:
- You’re using Graphviz to create pie charts (I gather you might be using matplotlib for this—for these nodes—not Graphviz) and
- You’d prefer donuts
Here’s a code pen I created yesterday, “Donut charts in d3-graphviz”, which overlays filled circles on the pies to make them like donuts.
Some background on this technique/hack, in a d3-graphviz GitHub issue: “image node attribute that refers to dynamically generated SVG (data URL)?”.
My previous question was not clear, I guess. What I meant to ask is: how do I set the size of each graph generated by the library?
Potentially related to #19
I installed dot using your command brew install graphviz --with-librsvg --with-app --with-pango and have also run pip install graphviz. I am having trouble with the viz.view() command when trying to run your example code. I am getting the following error:
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-21-4930bdb0a4e7> in <module>()
12 X=X) # need to give single observation for prediction
13
---> 14 viz.view()
/anaconda3/lib/python3.6/site-packages/dtreeviz/trees.py in view(self)
66 tmp = tempfile.gettempdir()
67 svgfilename = f"{tmp}/DTreeViz_{getpid()}.svg"
---> 68 self.save(svgfilename)
69 view(svgfilename)
70
/anaconda3/lib/python3.6/site-packages/dtreeviz/trees.py in save(self, filename)
89 cmd = ["dot", f"-T{format}:cairo", "-o", filename, dotfilename]
90 # print(' '.join(cmd))
---> 91 stdout, stderr = run(cmd, capture_output=True, check=True, quiet=False)
92
93 else:
/anaconda3/lib/python3.6/site-packages/graphviz/backend.py in run(cmd, input, capture_output, check, quiet, **kwargs)
157 stderr_write_bytes(err, flush=True)
158 if check and proc.returncode:
--> 159 raise CalledProcessError(proc.returncode, cmd, output=out, stderr=err)
160
161 return out, err
CalledProcessError: Command '['dot', '-Tsvg:cairo', '-o', '/var/folders/13/x7py_p6115gdj9y7_bybd8xc4gmyvs/T/DTreeViz_84409.svg', '/var/folders/13/x7py_p6115gdj9y7_bybd8xc4gmyvs/T/DTreeViz_84409']' returned non-zero exit status 1.
I am running Python 3.6.5::Anaconda in a Jupyter Notebook. I also tried this in Jupyter Lab and got the same error. Thanks for your help!