
Comments (4)

wendycwong commented on June 1, 2024

OMG, I am so STUPID. This problem is caused by the user not setting the parameter non_negative=TRUE. For a monotonic spline (bs=2), this parameter must be set. I think it is best to raise an error if a user does not set non_negative=TRUE when bs=2. After setting non_negative=TRUE in the customer code, I got the following:

image
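Until such an error exists in the library, the same check could be done on the client side before calling h2o.gam or h2o.grid. This is only a sketch: check_monotone_args is a hypothetical helper, not part of the h2o package, and it assumes bs is passed as a vector with one spline type per GAM column.

```r
# Hypothetical client-side guard (not part of the h2o package).
# bs: vector of spline types, one per GAM column (2 = monotone spline).
# non_negative: the value that will be passed on to h2o.gam / h2o.grid.
check_monotone_args <- function(bs, non_negative = FALSE) {
  if (any(bs == 2) && !isTRUE(non_negative)) {
    stop("bs = 2 (monotone spline) requires non_negative = TRUE")
  }
  invisible(TRUE)
}
```

Calling check_monotone_args(bs = c(2), non_negative = TRUE) returns quietly, while non_negative = FALSE stops with the message above.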

from h2o-3.

wendycwong commented on June 1, 2024

In addition, I did a grid search using the user's knot_ids with this code:

plotPredOrig <- function() {
  h2o.removeAll()
  h2o_data <- h2o.importFile("/Users/wendycwong/h2o-3/smalldata/gam_test/gh_16125_gam_monotone.csv")

  # build the GAM model
  # create frame knots
  knots1 <- c(0, 15000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000, 200000)
  frame_Knots1 <- as.h2o(knots1)

  # specify the knots array
  numKnots <- c(length(knots1))
  trainR <- as.data.frame(h2o_data)

  hyper_parameters <- list()
  hyper_parameters$scale <- c(0, 0.01, 0.1, 1)
  hyper_parameters$spline_orders <- c(3, 4, 5, 6)
  hyper_parameters$gam_columns <- c("sum_insured")
  hyper_parameters$alpha <- c(0.1, 0.5, 0.9)
  hyper_parameters$lambda <- c(0, 0.01, 0.1, 1, 5)

  gam_grid <- h2o.grid("gam", grid_id = "gam_grid_id", x = "sum_insured", y = "y", bs = c(2), family = "poisson", link = "Log", training_frame = h2o_data,
                       seed = 12345, non_negative = TRUE, knot_ids = c(h2o.keyof(frame_Knots1)), splines_non_negative = c(TRUE),
                       hyper_params = hyper_parameters)
  gridModel <- h2o.getGrid("gam_grid_id")
  browser()
  modelIDs <- gam_grid@model_ids
  for (ind in modelIDs) {
    print(ind)
    bestModel <- h2o.getModel(ind)
    predBest <- h2o.predict(bestModel, h2o_data)
    predBestR <- as.data.frame(h2o.predict(bestModel, h2o_data))
    plot(trainR$sum_insured, trainR$y)
    lines(trainR$sum_insured, predBestR$predict, type = "l", lwd = 5, col = 3)
    browser()
  }
}

I found the following settings give me good results:
a. alpha = 0.9, lambda=0.1, spline_orders = 4, scale = 0, bs=2
b. alpha = 0.9, lambda=0.1, spline_orders = 5, scale = 0, bs=2
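For a sense of how many models the first grid search covers, the hyper-parameter lists can be multiplied out in plain R. This is a rough sketch that assumes the default Cartesian search (no search_criteria is passed in the call above) and that each listed value counts as one candidate setting; it does not need a running H2O cluster.

```r
# Same hyper-parameter values as in the first grid search above.
hyper_parameters <- list(
  scale = c(0, 0.01, 0.1, 1),
  spline_orders = c(3, 4, 5, 6),
  gam_columns = c("sum_insured"),
  alpha = c(0.1, 0.5, 0.9),
  lambda = c(0, 0.01, 0.1, 1, 5)
)

# Number of combinations in a full Cartesian grid: 4 * 4 * 1 * 3 * 5.
n_models <- prod(lengths(hyper_parameters))
print(n_models)  # 240

# One row per combination, handy for checking which settings get visited.
combos <- expand.grid(hyper_parameters, stringsAsFactors = FALSE)
```

With 240 candidate models, stepping through every plot with browser() is slow, so it may help to narrow the lists first.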

I also did a grid search without using the user's knots:

plotPred <- function() {
  h2o.removeAll()
  h2o_data <- h2o.importFile("/Users/wendycwong/h2o-3/smalldata/gam_test/gh_16125_gam_monotone.csv")

  # build the GAM model
  # create frame knots
  knots1 <- c(0, 15000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000, 200000)
  frame_Knots1 <- as.h2o(knots1)

  # specify the knots array
  numKnots <- c(length(knots1))
  trainR <- as.data.frame(h2o_data)

  hyper_parameters <- list()
  hyper_parameters$scale <- c(0, 0.01, 0.1, 1)
  hyper_parameters$num_knots <- c(12, 50, 100, 150)
  hyper_parameters$spline_orders <- c(3, 4, 5, 6)
  hyper_parameters$lambda <- c(0.01, 0.1, 1, 5)
  hyper_parameters$gam_columns <- c("sum_insured")

  gam_grid <- h2o.grid("gam", grid_id = "gam_grid_id", x = "sum_insured", y = "y", bs = c(2), family = "poisson", link = "Log", training_frame = h2o_data,
                       seed = 12345, non_negative = TRUE, splines_non_negative = c(TRUE),
                       hyper_params = hyper_parameters)
  gridModel <- h2o.getGrid("gam_grid_id")
  browser()
  modelIDs <- gam_grid@model_ids
  for (ind in modelIDs) {
    print(ind)
    bestModel <- h2o.getModel(ind)
    predBest <- h2o.predict(bestModel, h2o_data)
    predBestR <- as.data.frame(h2o.predict(bestModel, h2o_data))
    plot(trainR$sum_insured, trainR$y)
    lines(trainR$sum_insured, predBestR$predict, type = "l", lwd = 5, col = 3)
    browser()
  }
}

Again, the following settings give good results:
a. alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 3, scale=0, bs=2
b. alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 4, scale=0, bs=2
c. alpha = 0.5, lambda = 0.1, num_knots=12, spline_orders = 4, scale=0, bs=2
d. alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 5, scale=0, bs=2
e. alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 6, scale=0, bs=2
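The loops above judge each model by eye from the plot. As a complement, a numeric check can confirm that the predictions never decrease as sum_insured grows. This is a sketch under the assumption that the monotone fit is meant to be non-decreasing; is_non_decreasing is a hypothetical helper written in base R, not an h2o function.

```r
# Hypothetical helper (not part of h2o): TRUE when `pred` never decreases
# as `x` increases, up to a small numerical tolerance `tol`.
is_non_decreasing <- function(x, pred, tol = 1e-8) {
  stopifnot(length(x) == length(pred))
  ord <- order(x)                 # sort predictions by the predictor
  all(diff(pred[ord]) >= -tol)    # every step must be >= 0 (within tol)
}
```

Inside the loop it could be called as is_non_decreasing(trainR$sum_insured, predBestR$predict). The helper sorts by the predictor first, so it does not depend on the row order of the data.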


wendycwong commented on June 1, 2024

Here is a plot of the best grid search model, setup d: alpha = 0.5, lambda = 0.01, num_knots=12, spline_orders = 5, scale=0, bs=2. This is from the second grid search, which did not use the user-specified knot_ids.

image


wendycwong commented on June 1, 2024

The grid search did produce a better model.

