
alvanli commented on May 26, 2024

Thank you for the always very detailed explanations! I learned a lot reading these tickets 🤗


paulbkoch commented on May 26, 2024

Hi @alvanli -- Thank you for this question because it's both interesting and something we haven't talked about or documented in detail yet. It could also be important in some circumstances.

EBMs are by default no longer a purely cyclic algorithm as described in the original GA2M papers. If you prefer to restore the cyclic nature of EBMs, you can do this easily by setting greediness or greedy_ratio to zero. The current default algorithm is a hybrid that sits somewhere in between fully greedy and fully cyclic. I've been calling it "semi-greedy" boosting.
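For concreteness, here is how those settings look in code (a sketch assuming a recent version of interpret, where greedy_ratio is exposed as a constructor parameter):

```python
from interpret.glassbox import ExplainableBoostingClassifier

# Default: the semi-greedy hybrid described above.
ebm_default = ExplainableBoostingClassifier()

# Restore the original purely cyclic GA2M-style boosting.
ebm_cyclic = ExplainableBoostingClassifier(greedy_ratio=0.0)
```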

The greediness parameter has been available for some time, but only recently became the default. I'll start chronologically by describing the original greediness parameter. That will be good background for what greedy_ratio is now.

In the original cyclic algorithm, the definition of a round was a complete sweep of all the features, so if you had 100 features, you'd make 100 boosting steps per round. During each boosting step we'd calculate the per-feature update, but we'd also get, essentially for free, the gain calculation prior to the update. The interesting thing is, since our learning rate is so low, we make small incremental steps and the gain calculation remains relatively valid after applying the boosting update. We can use this by putting the slightly stale gain calculations into a priority queue and then greedily picking which feature to boost next. Generally, assuming correlations aren't too bad, boosting one feature doesn't affect the gain of other features too much, but even so we'd prefer to periodically resweep all the features to refresh their gain calculations. We originally did this by intermixing cyclic rounds with greedy rounds. The cyclic rounds handle the issue of refreshing the gains, and the greedy rounds allow the algorithm to focus on the most important features.
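The intermixing described above can be sketched with a toy priority queue. This is an illustration only, not interpret's internals; `boost_feature` is a hypothetical stand-in that returns a random "gain" instead of a real boosting update:

```python
import heapq
import random

random.seed(0)
N_FEATURES = 5

def boost_feature(f):
    """Stand-in for one boosting step on feature f.  Alongside the
    update we get, essentially for free, a gain estimate that stays
    roughly valid because the learning rate is low."""
    return random.random()

def cyclic_round(gains):
    # Visit every feature once; each step also refreshes that
    # feature's (otherwise slowly staling) gain estimate.
    for f in range(N_FEATURES):
        gains[f] = boost_feature(f)

def greedy_round(gains, steps):
    # Repeatedly boost whichever feature currently has the highest
    # (slightly stale) gain, updating its gain after each step.
    heap = [(-g, f) for f, g in gains.items()]
    heapq.heapify(heap)
    for _ in range(steps):
        _, f = heapq.heappop(heap)
        gains[f] = boost_feature(f)
        heapq.heappush(heap, (-gains[f], f))

gains = {f: 0.0 for f in range(N_FEATURES)}
for _ in range(3):  # alternate cyclic (refresh) and greedy (focus) rounds
    cyclic_round(gains)
    greedy_round(gains, steps=N_FEATURES)
```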

Why was this algorithm change desirable? Well, the main issue with purely cyclic boosting was that it tended to overfit some features and underfit others. The new algorithm seems better at finding the right amount of boosting to do on each feature. It appears, although we need more time to digest this, that some of the jaggedness in EBM graphs was due to late-stage overfitting of features that would otherwise be smoother.

The original semi-greedy algorithm kept the number of boosting steps per round fixed. The greediness parameter defined how often to execute a greedy round vs a cyclic round. A good value used to be 0.5, which would alternate between greedy and cyclic rounds. Sometimes you might want 0.66, which would do 1 cyclic round for every 2 greedy rounds.

In the 0.6.0 release we made 2 slight changes to the algorithm. It's clear that cyclic rounds need as many boosting steps as there are features, since every feature needs to be visited during the round. The greedy rounds, however, do not need to obey the same restriction. If there are 100 features, we could allow 90 greedy boosting steps, for example. The greedy_ratio parameter sets the ratio of the number of boosting steps in the greedy rounds to the number in the cyclic rounds. So, for example, a value of 1.0 is exactly equivalent to the old greediness value of 0.5, since with a ratio of 1.0 you would get 100 greedy boosting steps intermixed with every 100 cyclic steps. The default greedy_ratio is now 1.5, which means that if there are 100 features there will be 150 greedy boosting steps between cyclic rounds.
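The step accounting is easy to check with a little arithmetic (a sketch; the real implementation may round differently):

```python
def steps_per_round(n_features, greedy_ratio):
    """Steps in one cyclic round and the following greedy round."""
    cyclic_steps = n_features  # every feature must be visited
    greedy_steps = int(round(greedy_ratio * n_features))
    return cyclic_steps, greedy_steps

print(steps_per_round(100, 1.0))  # (100, 100): old greediness=0.5
print(steps_per_round(100, 1.5))  # (100, 150): current default
```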

The algorithm described above tends to work well on many datasets, but let's say for whatever reason you don't want any cyclic rounds at all. You can set greedy_ratio to some huge number that makes essentially all boosting greedy, but then the algorithm is no longer periodically refreshing the gain calculations through cyclic rounds, which means the gains for features that aren't being greedily chosen get more and more stale. For boosting, it takes almost the same amount of work to calculate the gain as it does to calculate the update. To refresh the gain calculations we need to pay this cost, but instead of applying the update, we could just do the calculations and then throw the update away. This is what the cyclic_progress parameter is for. The easiest way to think about it is as a boolean that indicates whether the cyclic rounds apply their updates or not. If the updates are not applied and greedy_ratio is set to a low value, the algorithm becomes a lot closer to what XGBoost does, while still retaining the additive nature of EBMs.
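One way to picture cyclic_progress is as a flag on the cyclic round that controls whether the computed update is applied or discarded. A toy sketch with hypothetical names, not interpret's internals:

```python
import random

random.seed(1)

model = {}  # feature -> accumulated additive term (toy)

def compute_update_and_gain(f):
    # Stand-in for the real per-feature boosting computation; the
    # gain comes essentially for free alongside the update.
    update = random.uniform(-0.1, 0.1)
    gain = abs(update)
    return update, gain

def cyclic_round(features, gains, cyclic_progress=True):
    for f in features:
        update, gain = compute_update_and_gain(f)
        gains[f] = gain                          # gains always refreshed
        if cyclic_progress:                      # ...but applying the
            model[f] = model.get(f, 0.0) + update  # update is optional

gains = {}
cyclic_round(range(4), gains, cyclic_progress=False)  # refresh-only pass
```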

Generally, the current defaults with greedy_ratio set to 1.5 seem to do well on many datasets, and avoid the enormous computational cost that updating the gains on every boosting step would otherwise require.


paulbkoch commented on May 26, 2024

Hi @alvanli -- On the other questions about max_leaves:

The max_leaves parameter is currently only honored when boosting mains. For pairs we make one cut in one dimension and then make cuts on each side of that primary cut in the other dimension. At some point we'll implement the obvious improvement that makes pair cuts more flexible and allows multiple cuts in each dimension.

FAST is currently even more restricted in that it chooses one cut in each dimension, with the two cuts intersecting in a cross. At some point we'll add options for more complex tree growth, but the current FAST algorithm has the benefit of being FAST and easier to implement.
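The two pair templates can be contrasted by enumerating the regions they produce (a toy representation, not the actual data structures):

```python
def pair_regions(c1, left_c2, right_c2):
    """Boosted-pair template: one cut c1 on dim 1, then an independent
    dim-2 cut on each side of it -> 4 regions."""
    return [
        ("d1 < c1",  f"d2 < {left_c2}"),  ("d1 < c1",  f"d2 >= {left_c2}"),
        ("d1 >= c1", f"d2 < {right_c2}"), ("d1 >= c1", f"d2 >= {right_c2}"),
    ]

def fast_regions(c1, c2):
    """FAST template: the dim-2 cut is shared by both sides, so the
    two cuts intersect in a cross."""
    return pair_regions(c1, c2, c2)
```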

For mains, it appears the best value for max_leaves on many/most datasets is 3; 2 is actually pretty bad, and 4 tends to be a bit worse than 3 but generally close, so I added it as a hyperparameter to get feedback, and I could see it being better on some datasets. Incidentally, I think this property is one of the main advantages that EBMs have over XGBoost and similar algorithms, since XGBoost tends to grow trees to depth 6, which can cause overfitting on the leaf splits. The other benefit that EBMs have over XGBoost is that EBMs consider pair splits jointly, which would be cost prohibitive for unrestricted trees.
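For intuition, max_leaves=3 on a main effect means choosing at most two cut points per boosting step. A brute-force sketch (not interpret's histogram-based implementation) that picks the pair of cuts minimizing squared error:

```python
from itertools import combinations

def sse(ys):
    """Sum of squared errors around the mean (0 for an empty slice)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_three_leaf_split(ys):
    """Try every pair of cut positions (max_leaves=3 means at most
    two cuts) and keep the pair minimizing total SSE."""
    n = len(ys)
    best = (sse(ys), ())  # fallback: no split at all
    for i, j in combinations(range(1, n), 2):
        cost = sse(ys[:i]) + sse(ys[i:j]) + sse(ys[j:])
        if cost < best[0]:
            best = (cost, (i, j))
    return best

cost, cuts = best_three_leaf_split([1, 1, 1, 5, 5, 9, 9, 9])
```

On this toy input the best two cuts fall at indices 3 and 5, separating the three constant runs with zero error.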

