
Comments (5)

mlandry22 commented on September 22, 2024

A few ways I was playing around with to cut up Expected based on its most prevalent values.

library(data.table)  # fread() and the data.table syntax below

train<-fread("input/train.csv")
# roll up to one row per Id: mean Expected, number of readings, and count of missing Ref values
rollup<-train[,.(Expected=mean(Expected),.N,naCount=sum(is.na(Ref))),Id]
# Ids where every Ref reading is NA carry no radar signal, so set them aside
remove<-rollup[N==naCount,]
keep<-rollup[N>naCount,]
# frequency of Expected at three granularities: exact value, 0.1mm, and 1mm
common0<-keep[,.N,Expected]
common1<-keep[,.N,.(Expected=round(Expected,1))]
common2<-keep[,.N,.(Expected=round(Expected))]
# take the 50 most common values at each granularity, sorted ascending for use as breaks
vals0<-common0[order(-N),][1:50,][order(Expected),]
vals1<-common1[order(-N),][1:50,][order(Expected),]
vals2<-common2[order(-N),][1:50,][order(Expected),]
# cut Expected at the common values; each label names the minimum of its bucket
cuts0<-cut(keep[,Expected],breaks=c(0,vals0[,Expected],Inf),labels=round(c(0,vals0[,Expected]),2))
cuts1<-cut(keep[,Expected],breaks=c(-1,vals1[,Expected],Inf),labels=c(-1,vals1[,Expected]))
cuts2<-cut(keep[,Expected],breaks=c(-1,vals2[,Expected],Inf),labels=c(-1,vals2[,Expected]))

# coarser variant: only the top 20 values at 0.1mm granularity, with "x"-prefixed labels
vals3<-common1[order(-N),][1:20,][order(Expected),]
cuts3<-cut(keep[,Expected],breaks=c(-1,vals3[,Expected],Inf),labels=paste0("x",c(-1,vals3[,Expected])))
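
A quick sanity check on how the buckets come out (a sketch; names follow the code above). Sparsely populated buckets will be hard for any classifier to learn:

# how many rows land in each bucket, as counts and proportions
table(cuts3)
round(prop.table(table(cuts3)),3)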


JohnM-TX commented on September 22, 2024

This sounds interesting. If I follow (which may not be the case), then it's similar to what I've been reading about precipitation estimates. Depending on the level of precipitation, the best way to estimate can be quite different: for light precip you might use Ref, for medium precip you might use a different function with Kdp, and so on (realizing that this is an example and these may not be the real functions). Anyway, one thing I haven't tried yet is breaking the data into two or more sets and modeling each separately, but I think it's promising.
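
For what it's worth, a minimal sketch of that split-and-model idea, assuming the competition's Ref and Kdp columns; the 20 dBZ threshold is made up for illustration, and lm() is just a stand-in for whichever model each regime would actually get:

library(data.table)
train<-fread("input/train.csv")
# one row per Id; Expected is constant within an Id, so take the first value
byId<-train[,.(Ref=mean(Ref,na.rm=TRUE),Kdp=mean(Kdp,na.rm=TRUE),Expected=Expected[1]),Id]
byId<-byId[!is.nan(Ref),]    # drop Ids with no Ref readings at all
light<-byId[Ref<20,]         # hypothetical "light precip" regime
heavy<-byId[Ref>=20,]        # hypothetical "heavier precip" regime
# separate stand-in models per regime: light leans on Ref alone, heavier also brings in Kdp
mLight<-lm(Expected~Ref,data=light)
mHeavy<-lm(Expected~Ref+Kdp,data=heavy)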


mlandry22 commented on September 22, 2024

You're right; technically they would get different models. The first version literally had different models to estimate the chance of Light, Medium, Heavy, and all the 1mm values between them, and they definitely took different variables into account.
This is a similar idea, but I was about to set it up slightly differently, and I'm glad you made that comment because I should try both ways before seeing this through.
The difference is that in Rain 1, each mm level had its own binary classification model estimating that probability independently, and since that setup was directly associated with the loss metric, you used each of those probabilities directly. This one I am setting up as multinomial classification rather than binary, so the model will try to learn probabilities for each specific bucket relative to the others.
Launching the first one....now.
Will try to get quick feedback for us so we know whether it will be worth adding to the overall last-week strategy or not.
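
A minimal sketch of what a multinomial setup like this could look like; the thread doesn't show the actual training call, so this assumes h2o's GBM, and dt, feats, and bucket are hypothetical stand-ins for the engineered feature table:

library(h2o)
h2o.init()
# dt: hypothetical data.table with one row per Id, predictor columns named in
# `feats`, and a factor target `bucket` built from the cuts above
hex<-as.h2o(dt)
hex$bucket<-h2o.asfactor(hex$bucket)  # a factor response makes the GBM multinomial
fit<-h2o.gbm(x=feats,y="bucket",training_frame=hex,ntrees=500)
# predictions carry one probability column per bucket, plus the most likely class
probs<-h2o.predict(fit,hex)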


mlandry22 commented on September 22, 2024

It's early, but it's interesting to see how the GBM is solving the problem.
First, it seems my way of setting up buckets missed, as some of them are not populated enough. I'm not too concerned yet, though; 20 buckets is a bit high for a first pass.

Aside from x14, this is in sorted order. Read each bucket label as the minimum of that bucket: x0 contains the readings between 0.0mm and 0.1mm, and x4.3 the readings between 4.3mm and 14mm.

So what it appears to be doing is guessing the mode (x0.1) as the default and then finding ways to guess the second most popular bucket, which is x4.3. Error in all the other buckets is around 99%. Truthfully, I'm far more interested in the probabilities, in hopes that a mini-stacking step can figure out the best absolute-error guess given the suite of 20 probabilities.
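
For that last step, one way to turn the bucket probabilities into a single absolute-error guess: under MAE the optimal point estimate for a discrete distribution is its weighted median. A sketch (the bucket values and probabilities here are made up):

# smallest bucket value at which the cumulative probability reaches 0.5
best_mae_guess<-function(vals,probs){
  o<-order(vals)
  vals<-vals[o]; probs<-probs[o]/sum(probs)
  vals[which(cumsum(probs)>=0.5)[1]]
}
best_mae_guess(c(0.1,1.0,4.3),c(0.6,0.3,0.1))  # returns 0.1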

This was after only 33 trees, and the error is still well within the steep-descent part of the curve, so there is plenty of room to go. Validation error is still nearly identical to training error (shown).

[image: per-bucket error table from the GBM after 33 trees]


JohnM-TX commented on September 22, 2024

I don't know if domain knowledge is useful for you, but I found these articles informative:
http://www.nwas.org/jom/articles/2013/2013-JOM19/2013-JOM19.pdf
http://www.nwas.org/jom/articles/2013/2013-JOM20/2013-JOM20.pdf
http://www.nwas.org/jom/articles/2013/2013-JOM21/2013-JOM21.pdf

There is a chart in the first article showing viable ranges of variables:
[image: chart from the first article showing viable ranges of the variables]

