Comments (24)

parrt avatar parrt commented on May 14, 2024

Ok, I think this was because compute_class_weight() was called even for regressors, but it only makes sense for classifiers. Also, it sounds like your tree_y may not have been a simple 1D array but a 2D one.
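
A minimal sketch of the kind of guard described above. This is not the actual dtreeviz patch; the helper name `needs_class_weights` is hypothetical, and the only library call it relies on is scikit-learn's `is_classifier`.

```python
from sklearn.base import is_classifier
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


def needs_class_weights(tree_model):
    """Hypothetical helper: class weights only make sense for classifiers."""
    return is_classifier(tree_model)


# A caller could skip compute_class_weight() whenever this returns False.
print(needs_class_weights(DecisionTreeClassifier()))  # True
print(needs_class_weights(DecisionTreeRegressor()))   # False
```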

from dtreeviz.

parrt avatar parrt commented on May 14, 2024

It wants a list, not an array, for sure, but I'm not sure why your list doesn't work. What type is boston.feature_names?

grilhami avatar grilhami commented on May 14, 2024

@parrt When I print it in a Jupyter notebook, the output looks something like this:

array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'],
      dtype='<U7')

It has a dtype of <U7. I tried to match my array to the same data type, but it still spits out the same error.
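
For reference, a small sketch of turning a NumPy array of feature names into a plain Python list. The three names are a shortened stand-in for the array printed above, and nothing here assumes the feature names are the actual cause of the error.

```python
import numpy as np

# Shortened stand-in for the feature-name array shown above.
feature_names = np.array(['CRIM', 'ZN', 'INDUS'])

# tolist() yields a plain list of Python str objects rather than NumPy strings.
names_as_list = feature_names.tolist()
print(type(names_as_list).__name__, names_as_list)
```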

parrt avatar parrt commented on May 14, 2024

@grilhami Hmm...it can't be the features as this works:

>>> import numpy as np
>>> np.array(['a','b','c'])
array(['a', 'b', 'c'], dtype='<U1')
>>> x = np.array(['a','b','c'])
>>> set(x)
{'a', 'c', 'b'}
>>> 

It must be the class weights passed to the tree model. Can you print regr.class_weight for me?

JSAN730 avatar JSAN730 commented on May 14, 2024

I'm getting the same issue... has this been fixed?

JSAN730 avatar JSAN730 commented on May 14, 2024

regr.class_weight is None for me

parrt avatar parrt commented on May 14, 2024

@JSAN730 are you using dtreeviz 0.3.3?

JSAN730 avatar JSAN730 commented on May 14, 2024

Yup, I'm using dtreeviz 0.3.3

parrt avatar parrt commented on May 14, 2024

Damn. ok, if you send me a small test.py and a sample dataset (or pull from sklearn), I'll track it down

JSAN730 avatar JSAN730 commented on May 14, 2024

I was assuming that feature_names was just a list of the column names for the data with predictor variables. I'll send you the test.py and sample dataset. Below is my code:

features = list(vf_ulta_per_data)

viz = dtreeviz(regr,
               vf_ulta_per_data,
               vf_ulta_per_target,
               target_name='VF % ULTA 2018',  # this name will be displayed at the leaf node
               feature_names=features)
viz

parrt avatar parrt commented on May 14, 2024

Hmm...yeah, that should work. let's figure this out. you can email me at [email protected]

parrt avatar parrt commented on May 14, 2024

I'm getting a different error than was reported. Is this what you see @JSAN730 ?

ValueError                                Traceback (most recent call last)
<ipython-input-1-c47ac4c24103> in <module>
     27                vf_ulta_per_target,
     28                target_name= 'VF % ULTA 2018',  # this name will be displayed at the leaf node
---> 29                feature_names=features
     30               )
     31 viz

~/anaconda3/lib/python3.7/site-packages/dtreeviz-0.3.2-py3.7.egg/dtreeviz/trees.py in dtreeviz(tree_model, X_train, y_train, feature_names, target_name, class_names, precision, orientation, show_root_edge_labels, show_node_labels, fancy, histtype, highlight_path, X, max_X_features_LR, max_X_features_TD)
    681 
    682     shadow_tree = ShadowDecTree(tree_model, X_train, y_train,
--> 683                                 feature_names=feature_names, class_names=class_names)
    684 
    685     if X is not None:

~/anaconda3/lib/python3.7/site-packages/dtreeviz-0.3.2-py3.7.egg/dtreeviz/shadow.py in __init__(self, tree_model, X_train, y_train, feature_names, class_names)
     68         self.unique_target_values = np.unique(y_train)
     69         self.node_to_samples = ShadowDecTree.node_samples(tree_model, X_train)
---> 70         self.class_weights = compute_class_weight(tree_model.class_weight, self.unique_target_values, self.y_train)
     71 
     72         tree = tree_model.tree_

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/class_weight.py in compute_class_weight(class_weight, classes, y)
     40 
     41     if set(y) - set(classes):
---> 42         raise ValueError("classes should include all valid labels that can "
     43                          "be in y")
     44     if class_weight is None or len(class_weight) == 0:

ValueError: classes should include all valid labels that can be in y
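
One way the `set(y) - set(classes)` check in the traceback can fail for a regression target, offered as an assumption rather than a confirmed diagnosis of this report: iterating a pandas DataFrame yields its column labels, not its values. The column name `sales` and the numbers below are made up for illustration.

```python
import numpy as np
import pandas as pd

y_frame = pd.DataFrame({'sales': [1.5, 2.0, 3.5]})  # one-column DataFrame (2D)
y_array = np.array(y_frame['sales'])                 # plain 1D array of the same values

# np.unique flattens its input, so this holds the numeric values in both cases.
classes = np.unique(y_frame)

# set(DataFrame) iterates column labels, so the label is "missing" from classes.
leftover_frame = set(y_frame) - set(classes)
# set(1D array) iterates the values themselves, so nothing is left over.
leftover_array = set(y_array) - set(classes)

print(leftover_frame)  # {'sales'}
print(leftover_array)  # set()
```

A non-empty difference is exactly the condition that raises the ValueError at line 42 of the traceback.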

JSAN730 avatar JSAN730 commented on May 14, 2024

Yes, I'm getting that same message.

ValueError: classes should include all valid labels that can be in y

parrt avatar parrt commented on May 14, 2024

Ah. Well, isn't that supposed to be the same?

[Screenshot: Screen Shot 2019-04-07 at 9 26 09 AM]

JSAN730 avatar JSAN730 commented on May 14, 2024

how should I code it if the target is a continuous variable (like sales) and I'm using the other columns in the dataset to create a regression tree as predictor variables?

parrt avatar parrt commented on May 14, 2024

If continuous, then it's regression, so you can't use class_weight. Maybe you want sample_weight?
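
A minimal sketch of per-row weighting for a regression tree, assuming scikit-learn's `DecisionTreeRegressor`. The data and weights are synthetic and purely illustrative; `sample_weight` is passed to `fit()`, not to the constructor.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((20, 3))                 # synthetic predictor columns
y = X[:, 0] * 10 + rng.random(20)       # synthetic continuous target
weights = np.linspace(1.0, 2.0, 20)     # made-up per-row weights

regr = DecisionTreeRegressor(max_depth=2, random_state=0)
regr.fit(X, y, sample_weight=weights)   # weights apply to rows, not classes

print(regr.predict(X[:3]))
```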

JSAN730 avatar JSAN730 commented on May 14, 2024

That's correct, I'm trying to run a regression tree. I'm new to Python; I'll try to find where I can replace class_weights with sample_weights. I thought tree.DecisionTreeRegressor() took care of that.

parrt avatar parrt commented on May 14, 2024

The term "class" is meant for classifiers so that arg is definitely not the one you want. :)

lucas817 avatar lucas817 commented on May 14, 2024

I'm having the same issue (ValueError: classes should include all valid labels that can be in y). I don't fully understand the previous discussion: is this a bug that will be fixed, or is there a way to make the function not try to use class weights on a regression tree? Because as it is, the only regression trees that can be visualised are the ones from built-in datasets like the boston one, which would be a shame.

parrt avatar parrt commented on May 14, 2024

I wonder if you are mixing up regression versus classifier. Classifiers are the only ones that need to pass in the target class names. There are no such things for a regressor, which predicts a numeric value.

lucas817 avatar lucas817 commented on May 14, 2024

I know what the difference between regression and classification is; I'm a statistician. My point is that the error appears every time, whether you pass a regression model or a classifier; it makes no difference. The only way I've gotten this to work is with built-in datasets like iris and boston. Is this package only meant to be used with those, or can it also be used with pandas dataframes (and if so, how)?

parrt avatar parrt commented on May 14, 2024

Oh! Sorry. I've used it on all sorts of datasets. Can you send me a failing example? I'll track it down! [email protected] It must be a weird data type issue with the list of feature names.

oh, so if you pass nothing or None, you also get an error? weird

JSAN730 avatar JSAN730 commented on May 14, 2024

@lucas817

For me, I changed the target dataset to a numpy array and it worked:

penetrationtarget = np.array(penetrationbeddf_c['vf_penetration'])

parrt avatar parrt commented on May 14, 2024

Oh! Yeah, it has to be a pandas Series or a numpy array, I think. What type are you passing in? I can update.
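
A sketch of how a caller might normalize the target before passing it in, assuming the goal is a flat 1D array. The helper name `to_1d_target` is hypothetical and not part of dtreeviz; the column name mirrors the one in the snippet above and the values are made up.

```python
import numpy as np
import pandas as pd


def to_1d_target(y):
    """Hypothetical helper: coerce a target to a flat 1D numpy array."""
    arr = np.asarray(y)
    # A single-column 2D input (e.g. a one-column DataFrame) is flattened.
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr.ravel()
    if arr.ndim != 1:
        raise ValueError(f"expected a 1D target, got shape {arr.shape}")
    return arr


df = pd.DataFrame({'vf_penetration': [0.1, 0.4, 0.2]})
print(to_1d_target(df).shape)                    # one-column DataFrame
print(to_1d_target(df['vf_penetration']).shape)  # Series
print(to_1d_target([0.1, 0.4, 0.2]).shape)       # plain list
```

All three calls return an array of shape (3,); a target with several columns raises ValueError instead of being silently flattened.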
