Comments (4)
Hi, thanks for all the details.
It might be that, given the data distribution, the pre-binning algorithm (CART) considers 2.5 the best split to maximize Gini/IV. For a binary target, you can try different values of the parameter `gamma` to reduce the presence of dominating bins (such as bin 0); see http://gnpalencia.org/optbinning/tutorials/tutorial_binary.html#Reduction-of-dominating-bins. Other approaches that you might try are:
- Pass your own splits using the option `user_splits`.
- Use a smaller `min_prebin_size` and a larger `max_n_prebins`. Then set `min_bin_size=0.05` and `max_bin_size=0.4`.
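As a rough illustration of the second approach, here is a minimal sketch. The concrete values for `max_n_prebins` and `min_prebin_size`, the variable name `"feature"`, and the helper names `suggested_params` and `fit_binning` are assumptions made for this example and are not tuned for any dataset. The check reflects the constraint mentioned later in this thread that `min_prebin_size` must be <= `max_bin_size`.

```python
def suggested_params(max_bin_size=0.4):
    """Build illustrative OptimalBinning keyword arguments (hypothetical helper)."""
    params = {
        "name": "feature",        # hypothetical variable name
        "dtype": "numerical",
        "max_n_prebins": 50,      # assumed value, larger than the default of 20
        "min_prebin_size": 0.01,  # assumed value, smaller than the default of 0.05
        "min_bin_size": 0.05,     # value suggested in the comment above
        "max_bin_size": max_bin_size,
    }
    # Constraint from the thread: min_prebin_size must be <= max_bin_size.
    if params["min_prebin_size"] > params["max_bin_size"]:
        raise ValueError("min_prebin_size must be <= max_bin_size")
    return params


def fit_binning(x, y, **overrides):
    """Fit OptimalBinning with the sketch parameters; extra keywords override them."""
    # Imported lazily so the parameter helper works without optbinning installed.
    from optbinning import OptimalBinning

    params = {**suggested_params(), **overrides}
    optb = OptimalBinning(**params)
    optb.fit(x, y)
    return optb
```

With optbinning installed, `optb = fit_binning(x, y)` followed by `optb.status` and `optb.binning_table.build()` shows whether a solution was found and what the bins look like. The other options mentioned above can be passed as overrides, for example `fit_binning(x, y, gamma=0.1)` or `fit_binning(x, y, user_splits=[1.5, 2.5, 3.5])`; those values are placeholders too.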
Hi,
Setting a small max_bin_size might produce no bins if prebins are very heterogeneous. In addition, the parameter min_prebin_size (default value = 0.05) must be <= max_bin_size (default value is None). Could you please provide a reproducible example?
Thanks
Hi @guillermo-navas-palencia, I have a column with a distribution like this:
Value | Ratio |
---|---|
2 | 0.266009 |
3 | 0.191488 |
1 | 0.175111 |
4 | 0.165533 |
5 | 0.0925422 |
6 | 0.044826 |
7 | 0.0193812 |
8 | 0.00901054 |
9 | 0.00504364 |
10 | 0.00187011 |
11 | 0.00136008 |
12 | 0.000850051 |
13 | 0.000453361 |
14 | 0.00028335 |
When I use OptimalBinning with the default parameters, the binning table I get is:
| | Bin | Count | Count (%) | Non-event | Event | Event rate | WoE | IV | JS |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [-inf, 2.50) | 7784 | 0.44112 | 7672 | 112 | 0.0143885 | -0.015700910587103323 | 0.000109578 | 1.36971e-05 |
| 1 | [2.50, 3.50) | 3379 | 0.191488 | 3331 | 48 | 0.0142054 | -0.0027078285694939197 | 1.4059e-06 | 1.75738e-07 |
| 2 | [3.50, 4.50) | 2921 | 0.165533 | 2892 | 29 | 0.00992811 | 0.359873097436898 | 0.0180819 | 0.00224811 |
| 3 | [4.50, 5.50) | 1633 | 0.0925422 | 1610 | 23 | 0.0140845 | 0.005960586194076356 | 3.27839e-06 | 4.09798e-07 |
| 4 | [5.50, inf) | 1466 | 0.0830783 | 1445 | 21 | 0.0143247 | -0.011192493032173623 | 1.04642e-05 | 1.30801e-06 |
| 5 | Special | 0 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 6 | Missing | 463 | 0.0262382 | 446 | 17 | 0.0367171 | -0.9754290478914349 | 0.041321 | 0.00496963 |
| Totals | | 17646 | 1 | 17396 | 250 | 0.0141675 | | 0.0595276 | 0.00723334 |
I feel that bin 0 contains too large a share of the data compared with the other bins, so I set max_bin_size to 0.4, hoping that the algorithm would split bin 0 into smaller bins. But the result is not what I expected:
| | Bin | Count | Count (%) | Non-event | Event | Event rate | WoE | IV | JS |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [-inf, inf) | 17183 | 0.973762 | 16950 | 233 | 0.0135599 | 0.04445000338761318 | 0.00188299 | 0.000235354 |
| 1 | Special | 0 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 |
| 2 | Missing | 463 | 0.0262382 | 446 | 17 | 0.0367171 | -0.9754290478914349 | 0.041321 | 0.00496963 |
| Totals | | 17646 | 1 | 17396 | 250 | 0.0141675 | | 0.043204 | 0.00520499 |
If I set max_bin_size=0.5, the result is similar to the first result.
Thanks.
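For readers checking the numbers, the WoE and IV columns of the first binning table in this comment can be reproduced from the Non-event and Event counts alone. The sketch below assumes WoE is the log of the bin's share of non-events over its share of events, and IV is the difference of those shares times WoE. This convention is inferred from the table values, not quoted from the library. Treating empty bins as WoE = 0 and IV = 0 is also an assumption, chosen to match the Special row.

```python
import math

# (bin label, non-event count, event count), copied from the first binning table
bins = [
    ("[-inf, 2.50)", 7672, 112),
    ("[2.50, 3.50)", 3331, 48),
    ("[3.50, 4.50)", 2892, 29),
    ("[4.50, 5.50)", 1610, 23),
    ("[5.50, inf)", 1445, 21),
    ("Special", 0, 0),
    ("Missing", 446, 17),
]

total_nonevent = sum(ne for _, ne, _ in bins)
total_event = sum(e for _, _, e in bins)


def woe_iv(nonevent, event):
    """Return (WoE, IV) for one bin; bins with a zero count get (0, 0) by assumption."""
    if nonevent == 0 or event == 0:
        return 0.0, 0.0
    p_nonevent = nonevent / total_nonevent
    p_event = event / total_event
    woe = math.log(p_nonevent / p_event)
    return woe, (p_nonevent - p_event) * woe


results = {label: woe_iv(ne, e) for label, ne, e in bins}
total_iv = sum(iv for _, iv in results.values())

for label, (woe, iv) in results.items():
    print(f"{label:>14}  WoE={woe: .6f}  IV={iv:.6e}")
print(f"Total IV = {total_iv:.7f}")
```

Under these assumptions the Missing bin accounts for most of the total IV, while bin 0 contributes very little despite holding about 44% of the records.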
Thank you very much @guillermo-navas-palencia for your suggestion. I decreased `min_prebin_size` and increased `max_n_prebins`, and now `max_bin_size` takes effect as expected.