Comments (4)
OMG, I am so stupid. This problem is caused by the user not setting the parameter non_negative=TRUE. For a monotonic spline (bs=2), this parameter must be set. I think it would be best to raise an error if a user does not set non_negative=TRUE when bs=2. After setting non_negative=TRUE in the customer's code, I got the following:
from h2o-3.
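The error proposed in the comment above could look roughly like the following on the R side. This is only a sketch of the idea: `check_monotone_args` is a hypothetical helper, not an existing h2o function, and the real check would presumably live in the GAM parameter validation inside H2O itself.

```r
# Hypothetical helper (not part of the h2o package): fail fast when a
# monotone spline (bs = 2) is requested without non_negative = TRUE.
check_monotone_args <- function(bs, non_negative = FALSE) {
  if (any(bs == 2) && !isTRUE(non_negative)) {
    stop("non_negative must be TRUE when any gam column uses bs = 2",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Usage: the first call passes silently, the second raises the error,
# which is caught here so the script keeps running.
check_monotone_args(bs = c(2), non_negative = TRUE)
msg <- tryCatch(
  check_monotone_args(bs = c(2), non_negative = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Failing early like this would have turned the silent non-monotone fit into an explicit message pointing at the missing parameter.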
In addition, I did a gridsearch using the user's knot_ids with this code:
library(h2o)  # assumes an H2O cluster is already running (h2o.init())

plotPredOrig <- function() {
  h2o.removeAll()
  h2o_data <- h2o.importFile("/Users/wendycwong/h2o-3/smalldata/gam_test/gh_16125_gam_monotone.csv")
  # build the GAM model
  # create the knots frame from the user's knot locations
  knots1 <- c(0, 15000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000, 200000)
  frame_Knots1 <- as.h2o(knots1)
  # specify the knots array
  numKnots <- c(length(knots1))
  trainR <- as.data.frame(h2o_data)
  hyper_parameters <- list()
  hyper_parameters$scale <- c(0, 0.01, 0.1, 1)
  hyper_parameters$spline_orders <- c(3, 4, 5, 6)
  hyper_parameters$gam_columns <- c("sum_insured")
  hyper_parameters$alpha <- c(0.1, 0.5, 0.9)
  hyper_parameters$lambda <- c(0, 0.01, 0.1, 1, 5)
  gam_grid <- h2o.grid("gam", grid_id = "gam_grid_id", x = "sum_insured", y = "y", bs = c(2),
                       family = "poisson", link = "Log", training_frame = h2o_data,
                       seed = 12345, non_negative = TRUE, knot_ids = c(h2o.keyof(frame_Knots1)),
                       splines_non_negative = c(TRUE), hyper_params = hyper_parameters)
  gridModel <- h2o.getGrid("gam_grid_id")
  browser()  # pause to inspect the grid
  modelIDs <- gam_grid@model_ids
  for (ind in modelIDs) {
    print(ind)
    bestModel <- h2o.getModel(ind)
    # predict once and reuse the result for plotting
    predBest <- h2o.predict(bestModel, h2o_data)
    predBestR <- as.data.frame(predBest)
    plot(trainR$sum_insured, trainR$y)
    lines(trainR$sum_insured, predBestR$predict, type = "l", lwd = 5, col = 3)
    browser()  # pause to inspect each plot
  }
}
I found the following settings give me good results:
a. alpha = 0.9, lambda=0.1, spline_orders = 4, scale = 0, bs=2
b. alpha = 0.9, lambda=0.1, spline_orders = 5, scale = 0, bs=2
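The plots above were judged by eye at each browser() pause. A numeric check could complement that. The sketch below uses only base R, and `is_non_decreasing` is a hypothetical helper. It assumes a "good" monotone fit means predictions that never decrease as sum_insured grows, which is my reading of the goal rather than something stated in the thread.

```r
# Hypothetical helper: TRUE when predictions never decrease as x increases.
# tol absorbs tiny floating-point wiggles.
is_non_decreasing <- function(x, pred, tol = 1e-8) {
  ord <- order(x)                 # sort predictions by the predictor
  all(diff(pred[ord]) >= -tol)    # every step must be >= 0 (within tol)
}

# Toy data standing in for trainR$sum_insured and predBestR$predict.
x    <- c(3, 1, 2)
good <- c(0.9, 0.1, 0.5)   # increases with x once sorted
bad  <- c(0.9, 0.6, 0.5)   # dips between x = 1 and x = 2

print(is_non_decreasing(x, good))  # TRUE
print(is_non_decreasing(x, bad))   # FALSE
```

Inside the loop, a call such as `is_non_decreasing(trainR$sum_insured, predBestR$predict)` could be printed next to each model ID to flag fits that are not monotone.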
I also did a gridsearch without using the user's knots, letting num_knots vary instead:
plotPred <- function() {
  h2o.removeAll()
  h2o_data <- h2o.importFile("/Users/wendycwong/h2o-3/smalldata/gam_test/gh_16125_gam_monotone.csv")
  # build the GAM model
  # create frame knots (not passed to the grid in this run)
  knots1 <- c(0, 15000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000, 200000)
  frame_Knots1 <- as.h2o(knots1)
  # specify the knots array
  numKnots <- c(length(knots1))
  trainR <- as.data.frame(h2o_data)
  hyper_parameters <- list()
  hyper_parameters$scale <- c(0, 0.01, 0.1, 1)
  hyper_parameters$num_knots <- c(12, 50, 100, 150)
  hyper_parameters$spline_orders <- c(3, 4, 5, 6)
  hyper_parameters$lambda <- c(0.01, 0.1, 1, 5)
  hyper_parameters$gam_columns <- c("sum_insured")
  gam_grid <- h2o.grid("gam", grid_id = "gam_grid_id", x = "sum_insured", y = "y", bs = c(2),
                       family = "poisson", link = "Log", training_frame = h2o_data,
                       seed = 12345, non_negative = TRUE, splines_non_negative = c(TRUE),
                       hyper_params = hyper_parameters)
  gridModel <- h2o.getGrid("gam_grid_id")
  browser()  # pause to inspect the grid
  modelIDs <- gam_grid@model_ids
  for (ind in modelIDs) {
    print(ind)
    bestModel <- h2o.getModel(ind)
    # predict once and reuse the result for plotting
    predBest <- h2o.predict(bestModel, h2o_data)
    predBestR <- as.data.frame(predBest)
    plot(trainR$sum_insured, trainR$y)
    lines(trainR$sum_insured, predBestR$predict, type = "l", lwd = 5, col = 3)
    browser()  # pause to inspect each plot
  }
}
Again, the following settings give good results:
a. alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 3, scale=0, bs=2
b. alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 4, scale=0, bs=2
c. alpha = 0.5, lambda = 0.1, num_knots=12, spline_orders = 4, scale=0, bs=2
d. alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 5, scale=0, bs=2
e. alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 6, scale=0, bs=2
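For a sense of how many models sit behind these shortlists, the grid sizes can be worked out in base R. This is a sketch that assumes the default Cartesian search strategy and that every combination builds successfully. The two lists simply restate the hyper-parameter values from the two functions above.

```r
# Hyper-parameter values from the grid that used the user's knot_ids.
grid_with_knots <- list(
  scale         = c(0, 0.01, 0.1, 1),
  spline_orders = c(3, 4, 5, 6),
  gam_columns   = c("sum_insured"),
  alpha         = c(0.1, 0.5, 0.9),
  lambda        = c(0, 0.01, 0.1, 1, 5)
)

# Hyper-parameter values from the grid that varied num_knots instead.
grid_without_knots <- list(
  scale         = c(0, 0.01, 0.1, 1),
  num_knots     = c(12, 50, 100, 150),
  spline_orders = c(3, 4, 5, 6),
  lambda        = c(0.01, 0.1, 1, 5),
  gam_columns   = c("sum_insured")
)

# A Cartesian grid has one model per combination of values.
n_with_knots    <- prod(lengths(grid_with_knots))     # 4 * 4 * 1 * 3 * 5
n_without_knots <- prod(lengths(grid_without_knots))  # 4 * 4 * 4 * 4 * 1
print(c(with_knots = n_with_knots, without_knots = n_without_knots))
```

Under those assumptions the first grid covers 240 combinations and the second 256, so each browser() loop steps through a few hundred plots.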
Here is a plot of the best gridsearch model, setup d: alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 5, scale=0, bs=2. This is from the second gridsearch, which did not use the user-specified knot_ids.
The gridsearch did produce a better model.