Comments (13)

roland-KA commented on May 25, 2024

It seems to me that the renaming of wrap's named parameter for feature names, which @sylvaticus introduced with commit 86a003f, is causing some confusion. In DecisionTree.jl the named parameter for this purpose is called featurenames.

In BetaML it somehow became feature_names and then, with the above-mentioned commit, features_names (with an additional s). But the documentation for wrap still says featurenames, and the example above uses feature_names. That is, the InfoNode created by wrap in the example has the list of names in an attribute called feature_names, but printnode is looking for an attribute called features_names.

So we have every possible combination and a bit of chaos 😀.

My suggestion is to go back to featurenames, in order to be consistent with DecisionTree.jl (and the documentation).

from betaml.jl.

sylvaticus commented on May 25, 2024

Sorry, I missed the original comment notification. I'll look into this tomorrow.

sylvaticus commented on May 25, 2024

Thanks @ablaom for reporting and @roland-KA for the thorough investigation into the cause of the issue. I followed your suggestion and reset it to "featurenames". The fix should be in the newly released v0.9.6.

roland-KA commented on May 25, 2024

As this issue shows, it is quite easy to run into trouble when using wrap. I'm thinking about adding a parameter check to each wrap implementation that verifies that only the keywords featurenames and classnames are used. It could throw an ArgumentError if any other keyword is passed.
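
A minimal sketch of such a check, assuming the extra information reaches wrap as a NamedTuple. The helper name check_wrap_keywords and the constant ALLOWED_WRAP_KEYS are hypothetical and are not part of BetaML.jl or DecisionTree.jl:

```julia
# Hypothetical helper for illustration; not part of BetaML.jl or DecisionTree.jl.
const ALLOWED_WRAP_KEYS = (:featurenames, :classnames)

# Throw an ArgumentError if `info` contains any key other than
# :featurenames or :classnames; otherwise return `info` unchanged.
function check_wrap_keywords(info::NamedTuple)
    unknown = [k for k in keys(info) if k ∉ ALLOWED_WRAP_KEYS]
    if !isempty(unknown)
        unknown_str = join(unknown, ", ")
        allowed_str = join(ALLOWED_WRAP_KEYS, ", ")
        throw(ArgumentError(
            "unsupported keyword(s): " * unknown_str *
            "; allowed keywords are: " * allowed_str))
    end
    return info
end

# Accepted: only known keywords are used.
check_wrap_keywords((featurenames = ["dim1", "dim2"],))

# Rejected: a misspelled keyword raises an ArgumentError instead of being
# silently ignored.
try
    check_wrap_keywords((feature_names = ["dim1", "dim2"],))
catch err
    println("caught: ", err)
end
```

A wrap method could call such a helper on its second argument before building the wrapped node, so a misspelling like feature_names fails at wrap time rather than at print time.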

@ablaom, @sylvaticus, what's your opinion on this?

sylvaticus commented on May 25, 2024

To extend @roland-KA's answer, this works:

julia> using BetaML

julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];

julia> y = 2 .* X[:,1] .- X[:,2] .+ 3;

julia> mod = DecisionTreeEstimator(max_depth=10)
DecisionTreeEstimator - A Decision Tree model (unfitted)

julia> ŷ = fit!(mod,X,y);

julia> hcat(y,ŷ)
6×2 Matrix{Float64}:
   4.1    3.4
 -16.5  -17.45
 -13.8  -13.8
 -18.4  -17.45
 -27.2  -27.2
   2.7    3.4

julia> println(mod)
DecisionTreeEstimator - A Decision Tree regressor (fitted on 6 records)
Dict{String, Any}("job_is_regression" => 1, "fitted_records" => 6, "max_reached_depth" => 4, "avg_depth" => 3.25, "xndims" => 2)
*** Printing Decision Tree: ***

1. BetaML.Trees.Question{Float64}(2, 18.0)
--> True :
                1.2. BetaML.Trees.Question{Float64}(2, 31.0)
                --> True :  -27.2
                --> False:
                        1.2.3. BetaML.Trees.Question{Float64}(2, 20.5)
                        --> True :  -17.450000000000003
                        --> False:  -13.8
--> False:  3.3999999999999995


julia> wmod = wrap(mod,featurenames=["dim1","dim2"])
A wrapped Decision Tree

julia> import AbstractTrees:print_tree

julia> print_tree(wmod)
dim2 >= 18.0?
├─ dim2 >= 31.0?
│  ├─ -27.2
│  │  
│  └─ dim2 >= 20.5?
│     ├─ -17.450000000000003
│     │  
│     └─ -13.8
│        
└─ 3.3999999999999995

(I updated the docstring to mention print_tree.)

ablaom commented on May 25, 2024

@roland-KA Thanks for looking into this and for the diagnosis.

ablaom commented on May 25, 2024

Sounds like a good idea.

ablaom commented on May 25, 2024

Mmm. I'm still pretty confused. Now I don't get any nice print out at all, just this:

julia> wrapped_tree = Trees.wrap(raw_tree, (featurenames=DF.names(X),))
A wrapped Decision Tree

Same if I use feature_names.

sylvaticus commented on May 25, 2024

I understood that the wrap function was intended for plotting only, not for printing.
The decision tree is already printed in full (but without feature names) when the DecisionTreeEstimator is explicitly printed, but I may have misunderstood the need. If the tree needs to be printed rather than plotted, perhaps it is better for now if I add another parameter, featurenames, directly to the estimator constructor. What do you think?

roland-KA commented on May 25, 2024

@sylvaticus you are right, the wrap function was intended for plotting only. But the plot recipe also uses AbstractTrees.printnode (which is implemented together with each wrap version), and the AbstractTrees.print_tree function is based on printnode. So it is also possible to print a text-based version of the tree using print_tree.
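
To illustrate the relation between printnode and print_tree, here is a minimal sketch using a hypothetical toy node type (ToyNode is an assumption for illustration only; the node types created by wrap in BetaML differ). It only relies on AbstractTrees.children, AbstractTrees.printnode and print_tree:

```julia
import AbstractTrees
using AbstractTrees: print_tree

# Hypothetical toy node type for illustration; not BetaML's actual type.
struct ToyNode
    feature::Int                  # index of the feature tested at this node
    threshold::Float64            # split threshold
    yes::Any                      # branch (or leaf value) if the test holds
    no::Any                       # branch (or leaf value) otherwise
    featurenames::Vector{String}  # names used when printing the node
end

# The children of a node are its two branches. Plain numbers have no
# `children` method of their own, so AbstractTrees treats them as leaves.
AbstractTrees.children(n::ToyNode) = (n.yes, n.no)

# printnode defines the text for a single node; print_tree calls it for
# every node while walking the tree.
AbstractTrees.printnode(io::IO, n::ToyNode) =
    print(io, n.featurenames[n.feature], " >= ", n.threshold, "?")

featnames = ["dim1", "dim2"]
tree = ToyNode(2, 18.0, ToyNode(2, 31.0, -27.2, -13.8, featnames), 3.4, featnames)

print_tree(tree)   # goes through printnode, so feature names are shown
println(tree)      # default show; printnode is not involved here
```

The last two lines show the difference: only print_tree goes through printnode, which is why displaying a wrapped tree directly does not show the feature names.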

roland-KA commented on May 25, 2024

Mmm. I'm still pretty confused. Now I don't get any nice print out at all, just this:

julia> wrapped_tree = Trees.wrap(raw_tree, (featurenames=DF.names(X),))
A wrapped Decision Tree

Same if I use feature_names.

@ablaom How did you print the text-based version? Using AbstractTrees.print_tree?

show doesn't use the wrap-logic; so just printing the wrapped_tree won't show the feature names.

ablaom commented on May 25, 2024

@sylvaticus @roland-KA Thanks for the detailed explanations. I must have been sloppy with my first post and dropped the print_tree. I apologise for not checking this more carefully - very bad form.

roland-KA commented on May 25, 2024

No problem, we are here to clarify and explain things 🤓
