
Comments (78)

mlandry22 commented on June 21, 2024

I think for a default strategy, we can try this:
John and Thakur get 1/day, and if I am paying attention as the deadline rolls around (it will soon be 4pm Pacific, a good time on weekdays) I will claim one.

I think that default strategy will work pretty well because my plan for this competition is to have a large variety of attempts available, so I'll probably try and post what the holdout scores are as I create them, and then we can decide if we ever want to use one. I won't be able to submit everything I generate anyway, so I will allow you guys to submit what you want.

So we can discuss/post here to try and plan better, but unless it's clear that we have something interesting going on, you each can have one a day, I think.

ThakurRajAnand commented on June 21, 2024

Sounds good to me.

JohnM-TX commented on June 21, 2024

Works for me. I'll try to give a heads-up if I'm not using one.

JohnM-TX commented on June 21, 2024

Lately I've been sending a submission a few hours after the previous day's deadline. So, I have already submitted for "today" and we have one remaining.

mlandry22 commented on June 21, 2024

I have had one ready for whenever there was a free slot. It's a semi-useless blend, but something I wanted to try. But I see you have since bumped up the performance of the XGB contribution, so I'll probably take the (deadline - 10 minutes) slot for some kind of blend there, unless either of you has something specific you want to test that will gain us more insight. If so, by all means, go ahead.

mlandry22 commented on June 21, 2024

Nice going, John. I put your best and my single best together at 70/30 and we bumped up 2 spots.

[screenshot: leaderboard, 2015-10-28 4:57 PM]

mlandry22 commented on June 21, 2024

We are "leading the league" in at least one category: most submissions!
In fact, we are probably just under the max allowable for teaming up. The competition has been running since 9/17, so the maximum number of submissions is around 82 or 84, and we are at 79.

JohnM-TX commented on June 21, 2024

Glad we made it under the limit and popped up a couple spots. I have some new developments that should be good. Will tie them into the main thread and post code in the next 24 hours.

JohnM-TX commented on June 21, 2024

Didn't get us a better position tonight, but improved the XGB score to 23.7687, just 0.0015 off our current best.
[screenshot: leaderboard]

mlandry22 commented on June 21, 2024

Nice. That makes it our leading single model, so surely we'll improve on any combination. But I will resist the urge to do such a thing until we're next out of ideas. I'll try and have something ready from R, in case Thakur is still working on his by the deadline tomorrow. Thakur, if you have something, that 2nd one is all yours.

John, are you using some sort of validation set to gauge local improvement as well? No problem if you aren't. We'll want to eventually, but it's still plenty early.

ThakurRajAnand commented on June 21, 2024

I will set up sklearn GBM + Spearmint to run, and I will submit whatever best parameters Spearmint is able to find before the deadline. Since I will do 5-fold, I might be able to report CV scores as well.

Any suggestions for doing proper validation, or anything you have found that matches closely with the LB?

JohnM-TX commented on June 21, 2024

Yes, I have a basic validation set in use. I carved out 1/5 for validation and use the other 4/5 for training (keeping ID integrity). It seems to be directionally correct and is in the neighborhood of agreement with the public LB (low 20s).

I find that if I run CV on xgboost, those scores are much lower (between 2 and 3). I think it's because outliers are removed before running the model, and a big part of the MAE comes from those mysterious outliers.

ThakurRajAnand commented on June 21, 2024

@johnm914 I am still in the office and won't reach home for another hour. Not sure if I will be able to generate a submission on time. Feel free to use the submission slot if you have something ready.

ThakurRajAnand commented on June 21, 2024

Anyone have anything good to submit? I just started playing around and can try submitting an H2O NN if you guys don't have something good to test.

mlandry22 commented on June 21, 2024

I think it's all yours, Thakur. John got his in last night, so take your shot. I will let you know when I have something very useful. My scripting is working, but I need to deploy it to a system where I'm comfortable letting it go for hours on end. So far it's on my everyday laptop. Step by step. Will have something I like when it's all done.

ThakurRajAnand commented on June 21, 2024

LB score: 23.82631

I would say not bad at all, since the NN was very simple. Below are the parameters I used.

model <- h2o.deeplearning(
  x = 2:22,
  y = 1,
  training_frame = x1,
  activation = "RectifierWithDropout",
  hidden = c(10, 10),
  input_dropout_ratio = 0.1,
  hidden_dropout_ratio = c(0.1, 0.1),
  loss = "Absolute",
  epochs = 10
)

I will save the code for each of my submissions in a new file, and keep the naming convention of the submission and code file the same, e.g., T0001.R and T0001.csv.

mlandry22 commented on June 21, 2024

Awesome! I'll make sure to add some h2o.deeplearning configs to the batch that runs all weekend so we can see if that configuration can be beat. Don't let that stop you from doing the same. But that's a perfect use of what I want, and it will be nice to let some H2O servers run all weekend.

JohnM-TX commented on June 21, 2024

Guess who's in the top 10, boys? Us!!
[screenshot: leaderboard]
Code is deposited in the folder. I'll be taking a break until Sunday night for family time and won't be submitting for a few days. Plus, it's time for me to take a break from the terminal and view the problem from a distance for a short while. Have a great weekend!

mlandry22 commented on June 21, 2024

Awesome!!
Ok, I said awesome twice. But adding deep learning to the mix and whatever John just did to get us up 12 spots is great. Will take a look at what it was.

ThakurRajAnand commented on June 21, 2024

WOW

mlandry22 commented on June 21, 2024

Have a great weekend, John.
I've seen the code and now I see the following notes in your pair of submissions:

Fri, 30 Oct 2015 04:16:50
more features 95/5 blend with sample non-tuned model
xgb-10-29-4.csv     
23.74293
...
Thu, 29 Oct 2015 03:47:52
added new features 78/22 blend with sample
xgb-10-27-4.csv     
23.76872

So this is looking great. We have an R gbm, XGBoost GBM, and H2O deep learning, all closing in on similar independent scores, so we are in good shape for some nice variance in our models. At some point, we'll probably want to look into stacking these: learning weights, rather than the guess-and-check simple average method we're doing (which is fine for now).
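
For when we get there, here's a minimal sketch of the learning-weights idea (pred_xgb, pred_gbm, pred_dl, and actual are placeholder holdout-set vectors):

preds <- cbind(pred_xgb, pred_gbm, pred_dl)
mae_of_weights <- function(w) {
  w <- exp(w) / sum(exp(w))           # softmax keeps weights positive and summing to 1
  mean(abs(preds %*% w - actual))     # MAE of the weighted blend on the holdout
}
opt <- optim(par = rep(0, ncol(preds)), fn = mae_of_weights)
weights <- exp(opt$par) / sum(exp(opt$par))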

JohnM-TX commented on June 21, 2024

Hi guys - I made a new submission tonight with better local CV, but it did not do well on the LB. It was the same XGB model, with one bug fix to a calculated field, and blended 85/15 or so with all zeros (essentially scaling down the values). I thought it would do better, but no.

I'll look back to features and see if there are gains to be made there unless there is another avenue we need to explore.

JohnM-TX commented on June 21, 2024

Did some work on features, cutpoints, etc. and got a 0.10-0.15 gain locally, but nothing on the LB. Finally got some small gain with an ensemble of the two best xgb models. I'm thinking this branch of development is about played out, and I will spend my next effort trying to identify some of those outliers that contribute the bulk of the MAE.

JohnM-TX commented on June 21, 2024

Hi guys. Do either of you have plans for the second submission today? If not, I'll use it, but no problem waiting til later either.


ThakurRajAnand commented on June 21, 2024

No plan from my side

mlandry22 commented on June 21, 2024

Go ahead, John. Thanks for pushing us forward. I'll catch up once we get our conference over next week :-)

JohnM-TX commented on June 21, 2024

No problem. Unfortunately, my last two submissions were bricks! But I might be onto something. Will post it as an issue, and maybe you guys can build on it / course-correct when things settle down.


ThakurRajAnand commented on June 21, 2024

Cool. Please post it whenever you are ready. I recently left my job and was busy doing knowledge transfer, but now I am completely free. I will go through all the features you and Mark have created and will start running some models.

mlandry22 commented on June 21, 2024

Trying deep learning out now. Bigger models than the one Thakur listed here are not working at all, so I'm going back and starting with just what you had running. It could also be that my features aren't good as they are.

JohnM-TX commented on June 21, 2024

I tried more tinkering with features but couldn't get any improvements. Also ran through with an H2O RF model, but it's still in the same range. I'll go back to looking for outliers to see if we have any luck there. In this strangely-placed post
https://www.kaggle.com/c/how-much-did-it-rain-ii/forums/t/17317/just-curious-what-s-the-mouse-over-number-on-scores
the comp admin seems to indicate that it's not possible to find the outliers in the test data.

JohnM-TX commented on June 21, 2024

So what do we want as an overall strategy for the last week? I made some (very) slight progress with the outlier detection tonight and can pursue that. I've tried more feature engineering with rainfall prediction, but hit many dead ends. One of the main challenges I've had is flaky validation: I'm doing a simple leave-out of 20% of the data, which sometimes works and sometimes doesn't. Open to suggestions...

Anyway, how should we proceed? I think with just some ensembling and simple tweaks we can move up a few spots, and who knows, maybe we pull it together and jump back into the top 10?

mlandry22 commented on June 21, 2024

The logistic regression isn't off to a great start. It bottomed out shortly after. I'm trying to figure out if it can still be informative, particularly when stacked with something like nlopt for MAE. But let's not hold our breath.

Therefore, back to John's question about the overall strategy. And I like the two concepts mentioned.

For flaky validation, I can provide some horsepower to run 5-fold, rather than a single 20%. We can compare validation scores across each and see how that works out.

That will also give us some ensembling direction. And with what we have, that's probably a good idea, though I know it didn't work out too well on the first attempt (again, I have had that same experience before, too).

Other ideas?

As it stands, I think perhaps we can share a list of folds so that we all use the same validation sets? That way we can all start from the same first fold, if that's all we want to use at first, and keep everything consistent.

JohnM-TX commented on June 21, 2024

Yes, I agree sharing validation sets would be a good move. Some interesting traffic today on the forum: https://www.kaggle.com/c/how-much-did-it-rain-ii/forums/t/16680/cross-validating
Several others have had trouble with classification carrying over from train to test. There's something different about those days that throws us off. For outlier detection, I tried to stay away from "rainfall-related" variables and look at "site-specific" variables like radardist, sd(Zdr) and such, but still no luck.

Thakur - any insight from outlier detection or other pursuits? I could use some inspiration!

mlandry22 commented on June 21, 2024

Ok, so John, I'm going to use this as a basis for creating the folds, so that scores should be on par with what you're using, and then we can get 4 other sets to start using:

# create an interim validation set
idnumsv <- unique(trraw[, Id])                             # all Ids in the raw training data
validx <- sample(1:length(idnumsv), length(idnumsv)/5)     # hold out 1/5 of the Ids
valraw <- trraw[Id %in% validx, ]                          # validation rows, keeping Id integrity
trraw <- trraw[!Id %in% validx, ]                          # remaining rows stay in training
val <- collapsify(valraw)                                  # collapse to one row per Id
val <- val[!is.na(val[, wref]), ]                          # drop rows where the wref column is NA
setDF(val)

That comes with a seed set up top which should ensure the sampling is consistent.
I'll do that and post a 2-column table of IDs and folds. Then I'll try and adapt our best code and run them all on 5-fold.

Speaking of, I'll try and catch up with everything to answer this question myself, but our leading models are something like

  • XGBoost, as posted by John
  • DL as posted by Thakur
  • R/GBM as posted by Mark (slightly broken, btw, forgot to make an x2 from x)

Others I've overlooked as I was distracted for a few weeks?

ThakurRajAnand commented on June 21, 2024

Sorry for the delay, but I am back. I have been a bit busy with some other stuff. I will pick up the features created by you and John, start improving the NN, and try to optimize sklearn's GBM directly on MAE.

Since I was away, I will be happy if you guys assign me something.

ThakurRajAnand commented on June 21, 2024

@johnm914 Instead of trying to find the outliers directly, did you try generating classification CV probabilities and using them as features in the regression model? I am going to give auto-encoders a try for outlier detection after I leave the office today.

JohnM-TX commented on June 21, 2024

I briefly went down the path with k-means but, looking back, probably didn't do it right. I drew up a schematic earlier today that I can try with classification feeding into regression. In theory it should work, but it looks like many others have tried and failed. Probably because they're not as smart as us, right? Joking, of course. I'll give it a go in the next 24 hours and see anyway. Brain is wearing out this evening - midnight my time...


JohnM-TX commented on June 21, 2024

Here is my latest on features that might predict outliers, which to my way of thinking are different from features that predict rainfall. I tried to focus on features that would be specific to a site - radar calibration, geography, local interference, timing intervals, etc.

# collapse per-reading rows to one row per Id, with site-specific summary stats
collapsify2 <- function(dt) { 
  dt[, .(expected = mean(Expected, na.rm = T)
    , bigflag = mean(bigflag, na.rm = T)
    , negflag = sum(negflag, na.rm = T)
    , records = .N    
    , timemean = mean(timespans, na.rm = T)
    , timesum = sum(timespans, na.rm = T)
    , timemin = min(timespans, na.rm = T)
    , timemax = max(timespans, na.rm = T)
    , timesd = sd(timespans,na.rm = T)
    , minssum = sum(minutes_past, na.rm = T)
    , minsmax = max(minutes_past, na.rm = T)
    , minssd = sd(minutes_past, na.rm = T)
    , zdrmax = max(Zdr, na.rm = T)
    , zdrmin = min(Zdr, na.rm = T)
    , zdrsd = sd(Zdr, na.rm = T)
    , kapsd = sd(Kdp*radardist_km, na.rm = T)
    , rhosd = sd(RhoHV, na.rm = T)
    , rhomin = min(RhoHV, na.rm = T)
    , rd = mean(radardist_km, na.rm = T)
    , refcdivrd = max((RefComposite-Ref)/radardist_km, na.rm = T) 
    , c1 = max(Zdr/Ref, na.rm = T)
    , c2 = max(RefComposite/Ref, na.rm = T)
    , c3missratio = sum(is.na(RhoHV))/.N
    , refmissratio = sum(is.na(Ref))/.N
    , refcmissratio = sum(is.na(RefComposite))/.N
  ), Id]
}

Here's what I got for feature importance. As mentioned, this did well locally but not with the test set.

[screenshot: feature importance plot]

mlandry22 commented on June 21, 2024

John, on that same idea, we might try to compute peculiar single readings, like if rhomin is way out of range and everything else is in the normal range. That's a hard feature for trees to find, despite what people claim about interactions.
We'd probably want normalized values of each reading from the pre-collapsed set: (reading - colMean)/colSD.
And then something like the max value per ID divided by the mean value per ID (to accent the outlier). There are a couple of things to look for, and playing around with the numbers can probably yield a few such features.

Things you don't want this to flag:

  • IDs where most values are extremely low/high all the time
  • IDs where most values are extremely low/high for one or two timepoints

Things you would want this to flag:

  • IDs where one value is extremely low/high all the time
  • IDs where one value is extremely low/high one or two timepoints

Likely an important part of making this work efficiently is to make a single flag for all features; either:

  • scan all columns per ID and just store the max of outliers found
  • calculate them independently, but also create a feature that evaluates whether any of the independent ones are flagged
  • sum up the independents; 0/1/2/3 can be found just as easily as 0/1 (sketched below)

Doing either the second or third is easiest for debugging reasons, and it is very easy to ignore the independent ones. But that single feature is useful because, if this doesn't happen often, the trees can use the overall one with a lot more power than spreading it out amongst all the features.
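
A rough sketch of the summed-flags idea, assuming t already has per-reading normalized columns (zRef, zRhoHV, zZdr, as in the scaling code a couple comments down) and an arbitrary |z| > 3 cutoff to tune:

library(data.table)
thr <- 3
flags <- t[, .(refFlag = as.integer(max(abs(zRef), na.rm = T) > thr)
  , rhoFlag = as.integer(max(abs(zRhoHV), na.rm = T) > thr)
  , zdrFlag = as.integer(max(abs(zZdr), na.rm = T) > thr)
), Id]
flags[, sumFlag := refFlag + rhoFlag + zdrFlag]   # 0/1/2/3, the single overall feature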

Another thing that seems silly but worthwhile, if it's there, is whether they have internally inconsistent numbers: if the 10th percentile is greater than the 50th, or the 50th greater than the 90th. Not likely the case, but if it happens, it might be interesting to look at.

mlandry22 commented on June 21, 2024

...I think this verifies that no silly mistakes are out there.

library(data.table)
t <- fread("train.csv")
# each line counts rows where a lower percentile exceeds a higher one; all should be 0
t[!is.na(Ref_5x5_10th) & !is.na(Ref_5x5_50th),sum(Ref_5x5_10th>Ref_5x5_50th)]
t[!is.na(Ref_5x5_90th) & !is.na(Ref_5x5_50th),sum(Ref_5x5_50th>Ref_5x5_90th)]
t[!is.na(RefComposite_5x5_10th) & !is.na(RefComposite_5x5_50th),sum(RefComposite_5x5_10th>RefComposite_5x5_50th)]
t[!is.na(RefComposite_5x5_90th) & !is.na(RefComposite_5x5_50th),sum(RefComposite_5x5_50th>RefComposite_5x5_90th)]
t[!is.na(RhoHV_5x5_10th) & !is.na(RhoHV_5x5_50th),sum(RhoHV_5x5_10th>RhoHV_5x5_50th)]
t[!is.na(RhoHV_5x5_90th) & !is.na(RhoHV_5x5_50th),sum(RhoHV_5x5_50th>RhoHV_5x5_90th)]
t[!is.na(Zdr_5x5_10th) & !is.na(Zdr_5x5_50th),sum(Zdr_5x5_10th>Zdr_5x5_50th)]
t[!is.na(Zdr_5x5_90th) & !is.na(Zdr_5x5_50th),sum(Zdr_5x5_50th>Zdr_5x5_90th)]
t[!is.na(Kdp_5x5_10th) & !is.na(Kdp_5x5_50th),sum(Kdp_5x5_10th>Kdp_5x5_50th)]
t[!is.na(Kdp_5x5_90th) & !is.na(Kdp_5x5_50th),sum(Kdp_5x5_50th>Kdp_5x5_90th)]

mlandry22 commented on June 21, 2024

And then here is a simple way to get started with looking for the weird stuff:

library(data.table)

t<-fread("train.csv")
t[,zRef:=minutes_past*0]
t[,zRefComposite:=minutes_past*0]
t[,zRhoHV:=minutes_past*0]
t[,zZdr:=minutes_past*0]
t[,zKdp:=minutes_past*0]

t[!is.na(Ref),zRef:=scale(Ref, center = TRUE, scale = TRUE)[,1]]
t[!is.na(RefComposite),zRefComposite:=scale(RefComposite, center = TRUE, scale = TRUE)[,1]]
t[!is.na(RhoHV),zRhoHV:=scale(RhoHV, center = TRUE, scale = TRUE)[,1]]
t[!is.na(Zdr),zZdr:=scale(Zdr, center = TRUE, scale = TRUE)[,1]]
t[!is.na(Kdp),zKdp:=scale(Kdp, center = TRUE, scale = TRUE)[,1]]

t[,maxZ:=pmax(zRef,zRefComposite,zRhoHV,zZdr,zKdp)]
t[,meanZ:=(zRef+zRefComposite+zRhoHV+zZdr+zKdp)/5]
t[,meanNonZeroZ:=(zRef+zRefComposite+zRhoHV+zZdr+zKdp)/(1+ifelse(zRef==0,0,1)+ifelse(zRefComposite==0,0,1)+ifelse(zRhoHV==0,0,1)+ifelse(zZdr==0,0,1)+ifelse(zKdp==0,0,1))]
t[,maxAbsZ:=pmax(abs(zRef),abs(zRefComposite),abs(zRhoHV),abs(zZdr),abs(zKdp))]
t[,meanAbsZ:=(abs(zRef)+abs(zRefComposite)+abs(zRhoHV)+abs(zZdr)+abs(zKdp))/5]
t[,meanAbsNonZeroZ:=(abs(zRef)+abs(zRefComposite)+abs(zRhoHV)+abs(zZdr)+abs(zKdp))/(1+ifelse(zRef==0,0,1)+ifelse(zRefComposite==0,0,1)+ifelse(zRhoHV==0,0,1)+ifelse(zZdr==0,0,1)+ifelse(zKdp==0,0,1))]
# ratio of each reading's largest |z| to its mean |z|; the definition below is assumed from the name
t[,ratioMaxAbs_meanAbs:=maxAbsZ/meanAbsZ]
t2<-t[,.(maxRatio=max(ratioMaxAbs_meanAbs),Expected=mean(Expected)),Id]
t2[,.(median=median(Expected),mean=mean(Expected),meanCapped=mean(pmin(Expected,50)),.N),round(maxRatio,1)][order(round)]

But doing all that doesn't seem convincing at first:

   round    median        mean meanCapped      N
 1:   0.0 0.7620004 326.2158641 12.6169167 410096
 2:   1.3 0.5080003   0.7257147  0.7257147      7
 3:   1.4 0.5080003  87.5228872  8.8886983    109
 4:   1.5 0.5080003 172.9487308  7.5987424   3490
 5:   1.6 0.7620004  63.0267645  4.9785835  25942
 6:   1.7 1.0160005  56.0237884  4.9542465  16917
 7:   1.8 1.0160005  47.6335496  4.8691934  15032
 8:   1.9 1.0160005  48.6061699  5.0389237  10765
 9:   2.0 0.7620004  75.2349484  5.7090180  36310
10:   2.1 1.0160005  41.9078220  4.9771725  18475
11:   2.2 1.0160005  41.8893130  4.8625946  19920
12:   2.3 1.0160005  40.0369639  4.7955911  19967
13:   2.4 0.7620004  39.2674235  4.3239085  19865
14:   2.5 1.0160005  37.2729321  4.5524655  34591
15:   2.6 1.0750006  27.6358265  4.5071043  37173
16:   2.7 1.2700007  23.6880710  4.6437057  48376
17:   2.8 1.2700007  23.7996848  4.7349457  44136
18:   2.9 1.2700007  24.2110894  4.6691191  66051
19:   3.0 1.2700007  18.1062962  4.4207018  37647
20:   3.1 1.2700007  20.9359522  4.2279012  32788
21:   3.2 1.2700007  19.5954574  4.2232663  31539
22:   3.3 1.2700007  18.2001923  4.1939458  30618
23:   3.4 1.2700007  15.8351303  3.8901064  29082
24:   3.5 1.2700007  17.1309199  3.9404645  27212
25:   3.6 1.2700007  16.4194614  3.8531756  24567
26:   3.7 1.0160005  18.8579310  3.9361624  22868
27:   3.8 1.2700007  12.7421878  3.8746778  19899
28:   3.9 1.0160005  12.8914775  3.7394227  17744
29:   4.0 1.2700007   9.4873132  3.7121392  14757
30:   4.1 1.2700007  14.7490870  3.7254200  12581
31:   4.2 1.2700007  12.2639715  3.5607097  10848
32:   4.3 1.2700007  14.3822789  3.6191568   9120
33:   4.4 1.2700007  18.9320663  4.0976536   7676
34:   4.5 1.2650007  18.3044052  3.7352882   6296
35:   4.6 1.2700007  14.6096468  3.6498686   4871
36:   4.7 1.0160005  16.4558360  3.7568519   3906
37:   4.8 1.2700007  16.3944929  3.8032243   2927
38:   4.9 1.2700007  12.0857711  4.0340287   2205
39:   5.0 1.2700007   5.3350087  3.4537775   1540
40:   5.1 1.2700007  17.3513730  4.2926483   1088
41:   5.2 1.0750006  38.6896190  4.4416716    761
42:   5.3 1.2700007  23.8237763  4.3680016    537
43:   5.4 1.0160005   3.8097636  3.0975536    344
44:   5.5 1.5240008   7.0909679  4.4143251    167
45:   5.6 1.0160005  92.3979459  5.2137943     87
46:   5.7 0.6350003   1.3112734  1.3112734     44
47:   5.8 3.5560020   3.9793354  3.9793354      3
48:   5.9 0.2540001   0.2540001  0.2540001      1
    round    median        mean meanCapped      N

mlandry22 commented on June 21, 2024

Not an elegant way of getting it done, but here is a way to get folds that should be consistent with what John has been doing. John's data should check out with fold 0.

I'll try and get the CSV zipped and posted, but might email it. Here is R code to get it.

library(data.table)
library(readr)
library(dplyr)
set.seed(333)

trraw <- fread("train.csv", select = selection)  # 'selection' is the column subset defined elsewhere in the script
idnumsh <- unique(trraw[, Id])

s<-sample(1:length(idnumsh))             # random permutation of Id positions
s2<-floor(s/((1+length(idnumsh))/5))     # map the permutation to folds 0-4
foldTable<-as.data.frame(idnumsh)
foldTable$fold<-s2
colnames(foldTable)[1]<-"Id"
write.csv(foldTable,"foldTable.csv",row.names=F)
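
And a quick example of using it, where fold 0 should line up with John's current holdout:

foldTable <- fread("foldTable.csv")
trraw <- merge(trraw, foldTable, by = "Id")   # attach fold 0-4 to each row
val <- trraw[fold == 0]                       # John's holdout
tr  <- trraw[fold != 0]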

mlandry22 commented on June 21, 2024

I'm running my code through those five folds now. I won't have an opportunity to submit by today's deadline. But I ought to have all five models done with plenty of time for tomorrow's.

JohnM-TX commented on June 21, 2024

Cool. I may have something in time. Setting up my outlier detector and classifier now.

mlandry22 commented on June 21, 2024

It's nice that you've been pushing the best single model. I was going to get an ensemble set up in case we didn't have two for the deadline.

Is either of you using the Marshall-Palmer directly? It's likely not a bad feature, but I don't think I'm using it.

mlandry22 commented on June 21, 2024

Well, I went ahead and tried the ensemble, given that it seems like we have either 0 or 1 other submissions today. It helped, but not much. As expected, mostly.

[screenshot: leaderboard, 2015-12-02 1:32 PM]

JohnM-TX commented on June 21, 2024

At least it's some progress. I ran the classifier based on flags along the lines of what Mark described. It worked very well on the validation set but as before, it did not do any good on the test set.

I'll finally leave this avenue and go back to trying probability matching. I didn't get any boost fitting to a standard gamma distribution, but an ensemble approach might work. The woman who wrote this paper seems to think it has merit.
Ebert_2001_PME.pdf

JohnM-TX commented on June 21, 2024

Mark - I used rates calculated by Marshall-Palmer as a feature for xgboost and it was ranked #1 for feature importance. I used this modified version after trying different parameters in a direct calculation of MAE on the training data.

ratemm  = ((10^(Ref/10))/170) ^ (1/2)   
precipmm = sum(timespans * ratemm, na.rm = T)  # grouped by Id

where timespans is the fraction of the hour from the previous reading to the current reading associated with the Ref value.
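
Put together as data.table code, it's roughly this (dt standing in for the raw per-reading table, with timespans already computed):

library(data.table)
dt[, ratemm := ((10^(Ref/10))/170)^(1/2)]    # modified Marshall-Palmer rate, mm/hr
mp <- dt[, .(precipmm = sum(timespans * ratemm, na.rm = T)), Id]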

mlandry22 commented on June 21, 2024

OK, got the folds run last night. But I might have underestimated the variance in what we all aim to predict as well. The ID code I sent should be universal, so no exclusions. But John capped at 50, I think I saw; I had previously capped at 70. So we'll be covering slightly different spaces. I suppose if we do any stacking, we'll only do it on those where we have full predictions, and that's likely the best anyway.

So that said, here is a statement regarding the variance of my folds, which is fairly large, I think:

1: 0.2035
2: 0.2067
3: 0.2001
4: 0.2031
5: 0.2014

So it's shifting at the third decimal place, which isn't too bad--we just bumped our score up by an amount similar to the variance here, and it was only worth one spot.

Tough to know what to make of it, but I can provide predictions for the full holdout set I was using and then the test set.

mlandry22 commented on June 21, 2024

For a little inspiration, the amount we moved on our last submission is almost 1/4 of what we need to get 10th, as it stands. So it's certainly achievable. Now we just need some clue of how to do it 4.5 more times ;-)

mlandry22 commented on June 21, 2024

I'm currently struggling a bit with the difference between what my GBM models are doing vs the median, compared to how the public leaderboard shows those same stats.
My GBMs take a median of about 2.6 down to 0.2.
The public leaderboard takes a median of 24.17106 down to 23.73444.

Yes, many outliers were removed, but the basic movement should be fairly stable. Removing outliers does take the count of records down, so moving 2.4 units will get diluted when we add a bunch of immovable points back. Still, I don't quite get why 2.4 units on training becomes 0.44 on the public board.

mlandry22 commented on June 21, 2024

Ha, it's because I trained on the log of the target. Yay for diagnostics, even if it takes me three hours to understand.

JohnM-TX commented on June 21, 2024

Tried some probability matching without success. The result basically "downgraded" the values of ~900 in the blended model (which I think came from the RF) to the 30-50 range or so. Since it did worse, maybe that means the RF model or blend had successfully identified some of the outliers?

To Mark's earlier point on our score gap, it doesn't seem like it should be so hard to get a little 0.02 bump.

mlandry22 commented on June 21, 2024

This is the wrong time to try new things out, but with the Marshall-Palmer being deterministic, it seems possible to do this, so I am going to try:

  • Don't aggregate readings until the end.
  • Apply the Marshall-Palmer across all points of an ID.
  • Turn that into % of ID
  • Create Partial-Expected: multiply Expected * %
  • Learn a model for partial-expected
  • Try adding those up first, but probably a better way is to have something simple aggregate those, perhaps by having each observation bucketed by time and the model's predictions sorted to fit that matrix.

It's easy to do the straightforward one, at least. Perhaps that will help with some diversity.
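
A sketch of the straightforward version, assuming dt is the per-reading table with an mp column for the Marshall-Palmer rate, and pred_partial is whatever a model later predicts per reading:

library(data.table)
dt[, mpShare := mp / sum(mp, na.rm = T), Id]   # each reading's share of its ID's MP total
dt[, partialExpected := Expected * mpShare]    # per-reading target to learn
# the simple aggregation: add the per-reading predictions back up per ID
pred <- dt[, .(expectedHat = sum(pred_partial, na.rm = T)), Id]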

JohnM-TX commented on June 21, 2024

Might as well try it. I'm going to try one more thing with probability matching and see if there are any gains.

So with 6 more submissions remaining, any priority on how to use them?

mlandry22 commented on June 21, 2024

I kept screwing up my folds just on my own code, but I did finally get those out. I want to run both of your code as well, but I am really making a lot of mistakes getting through this.
I did not do the MP full-data thing yet.

mlandry22 commented on June 21, 2024

I should also run John's features through my R code. If I'm reading the XGBoost code right, the only use of MAE is in printing the output.
feval is the final evaluation, so the part that outputs and controls early stopping.
objective = "reg:linear" is the part that calculates the gradient that affects what the trees alter, and that's not an absolute loss (which isn't trivial to implement either).

R's gbm does compute the gradient of MAE, so with the same features, I should be able to get the R model closer to what XGBoost does. It will lose some accuracy due to not column-sampling on each tree, but it would seem I could get closer than what we have now.
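
For reference, reporting MAE during training is just a custom feval (dtrain/dval here are assumed xgb.DMatrix objects; argument names shift a bit between xgboost versions):

library(xgboost)
maeEval <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  list(metric = "MAE", value = mean(abs(preds - labels)))  # printed and used for early stopping
}
bst <- xgb.train(params = list(objective = "reg:linear", eta = 0.05),
                 data = dtrain, nrounds = 2000,
                 watchlist = list(val = dval),
                 feval = maeEval, maximize = FALSE,
                 early_stopping_rounds = 50)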

mlandry22 commented on June 21, 2024

Am running my best R gbm settings against John's data (as per the one uploaded here, at least) right now. John, what sort of range should I expect for MAE? It seems the median on this is about 2.1703 on the training set. If what I have does well, I'd like to submit it to the leaderboard in the morning.
If one of these four things happens, I won't need the 5th-to-last submission (our second "today"):

  • Not sure what a good CV value is
  • CV value is not good
  • Slow creating test set (only train/validation on the code, as I see)
  • Other things get in the way before I can submit (heading to a remote cabin tomorrow)

So if I am not heard from by about noon Pacific (deadline - 4 hours), assume I am not getting anything in and somebody can take that second submission.

mlandry22 commented on June 21, 2024

Finished that, but the results are a little suspicious. Low tree counts, like 100-400, perform best, escalating to the worst at 1000 trees, at which point they come back down until stopping at 1300. But 1300 is worse than the simpler models on the validation set.
That's not very reassuring. Holdout scores are around 22.95-22.98, against a median of 23.34. If that drop from the median is mirrored in the test set, it would be good for around the range of the best public script, so not great.
Worse, the test set as I calculated it looks quite a bit different from the train set. It's possible that's because the full-NA values are removed, so there may be nothing to worry about. But it seems like there isn't a compelling reason to submit this today. I'll try to package it up and email it, in case somebody is able to submit on the condition we have nothing better to use at the deadline.
Mark

JohnM-TX commented on June 21, 2024

Just now seeing this. I've recently been seeing MAEs from 22.1 to 22.8 on the validation set, which has the Ref=NA rows removed. The range has not corresponded to the test set closely enough for me to use it reliably. In other words, I would change something, see a drop of 0.1, submit, and then see a rise of 0.1 or more.

If you're still able to email something I can move it forward.

mlandry22 commented on June 21, 2024

Sent it, but it's outside that range, and it just doesn't seem right that it's more accurate on the soft side. A strange curve I can't remember seeing before. I blended the high end and the low end of trees.

mlandry22 commented on June 21, 2024

So again, if anybody has anything else, use it. Else at least we have one for today. Will be working on this offline tonight hopefully.

JohnM-TX commented on June 21, 2024

It 'only' got 23.77. I noticed the min value is 0.254, which is higher than in most of our models, and the max is ~25, compared to 35 for the xgboost. There's also a spike between 0.64 and 0.9, with about half of all values in that range. I don't know if that's good or not, just more than I would expect.

JohnM-TX commented on June 21, 2024

Not much at all, but it got us another 0.004 and 2 places. I started with xgbens-11-04, which upon inspection had a dip in the pd curve that probably shouldn't be there. So I fit it to a gamma distribution, except at the tail end, where the gamma started really clipping, I kept the original values. Then I blended it at 1:2 with our previous best submission - 90-04-03-03 (which probably uses the same xgb).
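
The reshaping was roughly this kind of quantile mapping, if it helps anyone reproduce it (p is the raw prediction vector, p_best the previous best submission; the 0.99 tail cutoff is approximate):

library(MASS)
fit <- fitdistr(p, "gamma")                    # fit shape/rate to the predictions
q <- rank(p) / (length(p) + 1)                 # empirical quantile of each prediction
pGamma <- qgamma(q, shape = fit$estimate["shape"], rate = fit$estimate["rate"])
pOut <- ifelse(q > 0.99, p, pGamma)            # keep originals where the gamma clips the tail
submission <- (2 * p_best + pOut) / 3          # the 1:2 blend with our previous best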

mlandry22 commented on June 21, 2024

23.77 model

Yes, it was intentionally conservative, both on the high and low end. Locally, using a floor of 0.254 beat out 0.01 on both models and the blend I used, so I opted for the conservative model, especially being worried about the volatility of the model given the odd complexity shape.

And about the 0.64-0.9, I'm not sure why that would be. I noticed that your XGBoost submission has the highest of just about everything on a summary: mean, median, quartiles. MAE isn't something you can clearly shift around to reach an optimal number, so it likely isn't the case that if we just shifted my and Thakur's models first, we'd see better performance. And there aren't enough submissions to tinker in that way either. But chances are good the XGBoost values are decent ones; I just can't quite think of a way to take advantage of that this late.

Gamma model

Very interesting. I noticed the notion of it being a gamma distribution, and really ought to see what H2O can do with that, as it can fit a gamma distribution.
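
If I do, a minimal call would be along these lines (x1 as the training frame from Thakur's script; gamma needs a strictly positive target):

model <- h2o.gbm(x = 2:22, y = 1, training_frame = x1,
                 distribution = "gamma",
                 ntrees = 500, max_depth = 5, learn_rate = 0.05)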

Last submission for today

Since the model with the highest complexity was starting to improve its accuracy, I am adding 300 more trees to see what happens. If it gets better, at least I'd have something new to try.
I might also pair that with trying to aim for the most common values near each prediction.
As usual, if anybody else has anything better, go ahead and take the submission.

JohnM-TX commented on June 21, 2024

Nothing else here.


mlandry22 commented on June 21, 2024

Well the model was better. But...it got in 11 seconds too late. So it's 23.76221, which isn't too bad, but a costly submission deadline mistake.

mlandry22 commented on June 21, 2024

Costly, costly, as this is the deadline. So our final best answer is all we have left. Thoughts?

JohnM-TX commented on June 21, 2024

No great ideas here. I suppose it would either be another ensemble for maybe a 0.005 gain, or something bolder (I don't know what) that has a shot. Maybe run a long, deep model on h2o gbm and dilute it slightly with our current best?


mlandry22 commented on June 21, 2024

Well, I have an interesting gamma model. It appears to test pretty well locally. And something about it seems correct, but that's probably the part of me that hopes it's our 10-spot jump, rather than the logical side that thinks something done on the last day is unlikely to be too useful.

Nonetheless, I think that I am going to put a heavy emphasis on this one, just in case.

I'll line up the distributions, and a handful of points and see if I can make heads or tails of it. Starting that now. Plan is a 5-way blend, and we'll get to see the score before we choose it blindly, of course.

mlandry22 commented on June 21, 2024

Here are the six summaries:

summary(eAll[,3:8])
      XGB                gamma                DL                RF              rGbm1            rGbm2        
 Min.   :   0.1718   Min.   : 0.04505   Min.   : 0.2383   Min.   : 0.3600   Min.   : 0.010   Min.   : 0.2540  
 1st Qu.:   1.0682   1st Qu.: 0.66955   1st Qu.: 0.6901   1st Qu.: 0.8407   1st Qu.: 0.813   1st Qu.: 0.7748  
 Median :   1.8392   Median : 1.09184   Median : 0.8700   Median : 1.1379   Median : 1.436   Median : 0.8460  
 Mean   :   2.1688   Mean   : 1.81431   Mean   : 1.4045   Mean   : 1.5196   Mean   : 1.600   Mean   : 1.3383  
 3rd Qu.:   2.4303   3rd Qu.: 2.20905   3rd Qu.: 1.2667   3rd Qu.: 1.4932   3rd Qu.: 1.706   3rd Qu.: 1.2100  
 Max.   :1100.0000   Max.   :39.41440   Max.   :66.3600   Max.   :33.5551   Max.   :29.287   Max.   :29.6695

mlandry22 commented on June 21, 2024

It wasn't a terrible idea. Who knows why, but we bumped up a couple spots. Just a couple, but I guess it's something. I'll just choose the top two scores, I suppose.

[screenshot: leaderboard, 2015-12-07 3:53 PM]

mlandry22 commented on June 21, 2024

Ouch. We fell more than anybody else in our range, from 27th down to 45th. We still got a top 10% out of it. Sorry about that. Well, it isn't what we were hoping for, to be sure. But I believe it's John's best finish, so we can be happy there. And it's certainly fair to say John did most of the work, so nice job, John!

They've been closing contests fairly quickly lately, but I'll submit some files after the deadline to see what the culprit might have been. If nothing else, I'll see the private scores for the 6 models I used in the ensemble and post those.

JohnM-TX commented on June 21, 2024

Thanks for the experience, guys! I was psyched to see the small rise up two spots, and likewise disappointed to see the drop, but as Mark mentioned, top 10% is a first for me and I hope to make that the bar going forward. Hope to work with you both again sometime!


mlandry22 commented on June 21, 2024

It looks like the predicted outliers are doing most of the damage. The main reason that final model helped was that I dialed down the XGBoost contribution to the model.

But, it's more than that. When doing a submission where I cap the XGBoost model at 40 and keep the gamma down (it wasn't a good model), we would have gotten 34th.
Oh well. Should have, could have, would have. But, what we did isn't bad. At least it doesn't seem that a super model was within our grasp, so we can be content with that.

John, if you want to see how your models did before they turn over the leaderboard, you can see using this:
https://www.kaggle.com/c/how-much-did-it-rain-ii/leaderboard?submissionId=2269690
Only the person who submitted can see it, so I can't see the XGBoost individual models or Thakur's models.

mlandry22 commented on June 21, 2024

Well, time to put this one to rest. Sorry it took me so long to get going at the end. The models I ran at the end did OK, but the public/private feedback was misleading, so we wouldn't have known which was which. It is frustrating dealing with such variance across local vs public vs private. Learn and move on, right? Not a bad finish at all.
Good work; I think we benefited from everybody, so that is nice.

I think I will have to dial down the teams for a little while, though. I became quite unreliable for a big chunk of this competition, and that was also the case for the Deloitte one and the Rossmann one. Essentially, once H2O World geared up, I became unavailable and have had trouble getting back into it. Which I'm fine with, but I feel guilty being part of a team. So when I get back to doing team ones (hopefully when AutoML is really working well for Kaggle), I'll try and reach out to see if we can do another. Good luck to both of you in the current round!
Thanks, again!
Mark

p.s. I'll probably either remove this repository or make it public. It only takes one 'no' vote to keep it private, so if you don't want this repository made public, let me know. I'll probably do one more round of inquiries before I really go through with either one.

mlandry22 commented on June 21, 2024

If you didn't see it, the winner posted a fantastic write-up: recurrent neural networks, complete with a great and easy-to-digest explanation. Code coming soon, too.

http://simaaron.github.io/Estimating-rainfall-from-weather-radar-readings-using-recurrent-neural-networks/

mlandry22 commented on June 21, 2024

Besides the novelty of his feature handling (pivoting and gap-filling, rather than aggregating), this is an interesting thing to note, complete with rationale (the 20-day/10-day split):

I began by splitting off 20% of the training set into a stratified (with respect to the number of radar observations) validation holdout set. I soon began to distrust my setup as some models were severely overfitting on the public leaderboard despite improving local validation scores. By the end of the competition I was training my models using the entire training set and relying on the very limited number of public test submissions (two per day) to validate the models, which is exactly what one is often discouraged from doing! Due to the nature of the training and test sets in this competition (see above), I believe it was the right thing to do.
