Comments (4)

guillermo-navas-palencia commented on May 27, 2024

Hi, thanks for all the details.

It might be that, given the data distribution, the pre-binning algorithm (CART) considers 2.5 to be the best split to maximize Gini/IV. For a binary target, you can reduce the presence of dominating bins (such as bin 0) by trying different values of the parameter gamma: http://gnpalencia.org/optbinning/tutorials/tutorial_binary.html#Reduction-of-dominating-bins. Other approaches you might try are:

  • Pass your splits using option user_splits.
  • Use a smaller min_prebin_size and a larger max_n_prebins. Then, set min_bin_size=0.05 and max_bin_size=0.4.
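As a rough sketch of the two suggestions above, the settings could be passed to `OptimalBinning` along these lines. The variable name `"x"`, the explicit split points, the values `0.01` and `50`, and the helper `fit_binning` are illustrative assumptions, not values from this thread; only `min_bin_size=0.05` and `max_bin_size=0.4` come from the suggestion itself.

```python
# Option 1: supply the split points directly (hypothetical values).
user_split_params = {
    "name": "x",  # hypothetical variable name
    "dtype": "numerical",
    "user_splits": [1.5, 2.5, 3.5, 4.5, 5.5],
}

# Option 2: allow finer pre-bins, then bound the size of the final bins.
fine_prebin_params = {
    "name": "x",
    "dtype": "numerical",
    "min_prebin_size": 0.01,  # assumed value, smaller than the 0.05 default
    "max_n_prebins": 50,      # assumed value, larger than the default
    "min_bin_size": 0.05,
    "max_bin_size": 0.4,
}


def fit_binning(x, y, params):
    """Fit OptimalBinning with the given keyword arguments and return the binning table."""
    # Imported lazily so the parameter dictionaries above can be inspected
    # even when optbinning is not installed.
    from optbinning import OptimalBinning

    optb = OptimalBinning(**params)
    optb.fit(x, y)
    return optb.binning_table.build()
```

With `x` and `y` holding the feature and the binary target, `fit_binning(x, y, fine_prebin_params)` would return the binning table for the second option.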

from optbinning.

guillermo-navas-palencia commented on May 27, 2024

Hi,

Setting a small max_bin_size might produce no bins if prebins are very heterogeneous. In addition, the parameter min_prebin_size (default value = 0.05) must be <= max_bin_size (default value is None). Could you please provide a reproducible example?

Thanks

nic9lif3 commented on May 27, 2024

Hi @guillermo-navas-palencia, I have a column with a distribution like this:

Value Ratio
2 0.266009
3 0.191488
1 0.175111
4 0.165533
5 0.0925422
6 0.044826
7 0.0193812
8 0.00901054
9 0.00504364
10 0.00187011
11 0.00136008
12 0.000850051
13 0.000453361
14 0.00028335

When I use OptimalBinning with the default parameters, the binning table I get is:

        Bin           Count  Count (%)  Non-event  Event  Event rate  WoE                     IV           JS
0       [-inf, 2.50)  7784   0.44112    7672       112    0.0143885   -0.015700910587103323   0.000109578  1.36971e-05
1       [2.50, 3.50)  3379   0.191488   3331       48     0.0142054   -0.0027078285694939197  1.4059e-06   1.75738e-07
2       [3.50, 4.50)  2921   0.165533   2892       29     0.00992811  0.359873097436898       0.0180819    0.00224811
3       [4.50, 5.50)  1633   0.0925422  1610       23     0.0140845   0.005960586194076356    3.27839e-06  4.09798e-07
4       [5.50, inf)   1466   0.0830783  1445       21     0.0143247   -0.011192493032173623   1.04642e-05  1.30801e-06
5       Special       0      0          0          0      0           0.0                     0            0
6       Missing       463    0.0262382  446        17     0.0367171   -0.9754290478914349     0.041321     0.00496963
Totals                17646  1          17396      250    0.0141675                           0.0595276    0.00723334
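The Count (%) column above can be cross-checked against the value distribution using only the standard library. The sketch below sums the listed ratios per interval; `bin_shares` is a hypothetical helper, and the extra split at 1.5 is an assumption added purely to show how values 1 and 2 would separate.

```python
from bisect import bisect_right

# Share of each value, copied from the distribution listed above.
value_share = {
    2: 0.266009, 3: 0.191488, 1: 0.175111, 4: 0.165533,
    5: 0.0925422, 6: 0.044826, 7: 0.0193812, 8: 0.00901054,
    9: 0.00504364, 10: 0.00187011, 11: 0.00136008,
    12: 0.000850051, 13: 0.000453361, 14: 0.00028335,
}


def bin_shares(splits, shares):
    """Sum the value shares falling into each interval defined by `splits`.

    Intervals are [-inf, s0), [s0, s1), ..., [s_last, inf).
    """
    totals = [0.0] * (len(splits) + 1)
    for value, share in shares.items():
        # bisect_right places a value equal to a split into the bin on its right.
        totals[bisect_right(splits, value)] += share
    return totals


default_splits = [2.5, 3.5, 4.5, 5.5]     # splits reported in the table above
finer_splits = [1.5, 2.5, 3.5, 4.5, 5.5]  # hypothetical extra split at 1.5

print([round(s, 4) for s in bin_shares(default_splits, value_share)])
print([round(s, 4) for s in bin_shares(finer_splits, value_share)])
```

The first bin of the default splits is simply the sum of the shares for values 1 and 2, which is why it is so much larger than the others.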

I feel that bin 0 contains too many records compared with the others, so I set max_bin_size to 0.4, hoping that the algorithm would split bin 0 into smaller bins. But the result is not what I expected:

        Bin           Count  Count (%)  Non-event  Event  Event rate  WoE                     IV           JS
0       [-inf, inf)   17183  0.973762   16950      233    0.0135599   0.04445000338761318     0.00188299   0.000235354
1       Special       0      0          0          0      0           0.0                     0            0
2       Missing       463    0.0262382  446        17     0.0367171   -0.9754290478914349     0.041321     0.00496963
Totals                17646  1          17396      250    0.0141675                           0.043204     0.00520499

If I set max_bin_size=0.5, the result is similar to the first result.
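One possible reading of these results, assuming the optimizer only merges pre-bins and never splits one: if the default pre-binning already yields [-inf, 2.50) as a single pre-bin, no solution can satisfy max_bin_size=0.4, whereas 0.5 leaves room for it. The helper below is a hypothetical illustration of that check, not part of optbinning, and it assumes the limit is compared against the Count (%) shares of the first table.

```python
def oversized_bins(shares, max_bin_size):
    """Return indices of bins whose share of records exceeds max_bin_size."""
    return [i for i, share in enumerate(shares) if share > max_bin_size]


# Count (%) of the regular (non-special, non-missing) bins in the first table.
default_bin_shares = [0.44112, 0.191488, 0.165533, 0.0925422, 0.0830783]

for limit in (0.4, 0.5):
    too_big = oversized_bins(default_bin_shares, limit)
    status = "infeasible" if too_big else "feasible"
    print(f"max_bin_size={limit}: {status}, oversized bins: {too_big}")
```

Under that assumption, only finer pre-bins (or user-provided splits) would let a 0.4 limit be met.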

Thanks.

nic9lif3 commented on May 27, 2024

Thank you very much @guillermo-navas-palencia for your suggestion. I decreased min_prebin_size and increased max_n_prebins, and max_bin_size now has the expected effect.
