
laurae's Introduction

Laurae2 R-package

The sequel to Laurae R-package.

Each function has at least one corresponding vignette with an example; look it up using help_me("my_function_name").

Installation

Building vignettes can be computationally expensive. Install without vignettes using the following:

devtools::install_github("Laurae2/Laurae2")

If you want to build the vignettes for significantly better help pages:

devtools::install_github("Laurae2/Laurae2", build_vignettes = TRUE)

Prerequisite installation:

install.packages("devtools")
install.packages(c("knitr", "rmarkdown", "mlrMBO", "lhs", "smoof", "ParamHelpers", "animation"))

xgboost installation: commit dmlc/xgboost@017acf5 currently seems best, as it includes gblinear improvements. Make sure to use the right compiler below:

devtools::install_github("Laurae2/xgbdl")

# gcc
xgbdl::xgb.dl(compiler = "gcc", commit = "017acf5", use_avx = FALSE, use_gpu = FALSE)

# Visual Studio 2015, use AVX if you wish to
xgbdl::xgb.dl(compiler = "Visual Studio 14 2015 Win64", commit = "017acf5", use_avx = FALSE, use_gpu = FALSE)

# Visual Studio 2017, use AVX if you wish to
xgbdl::xgb.dl(compiler = "Visual Studio 15 2017 Win64", commit = "017acf5", use_avx = FALSE, use_gpu = FALSE)

What can it do?

  • Bayesian Optimization (time-limited, iteration-limited, initialization-limited)
  • Create data.frame from [R,C] matrix-like format
  • Create data.table from [R,C] matrix-like format
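As a quick illustration of the matrix-to-data.frame conversion the package wraps, here is a minimal base-R sketch (illustration only, not Laurae2's own implementation):

```r
# Minimal base-R sketch of converting an [R,C] matrix-like object
# into a data.frame (illustration only; Laurae2's helpers add options).
m <- matrix(1:6, nrow = 2, ncol = 3)  # 2 rows, 3 columns
df <- as.data.frame(m)                # columns become V1, V2, V3
dim(df)  # 2 3
```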

Package requirements:

  • knitr
  • rmarkdown
  • mlrMBO
  • lhs
  • smoof
  • ParamHelpers
  • animation
  • xgboost


laurae's Issues

Error in read.dcf(path)

I am having trouble installing Laurae2/Laurae in RStudio (version 3.5.2). I type into R:
devtools::install_github("Laurae2/Laurae")

And receive the following error:
Error in read.dcf(path) :
Found continuation line starting ' modeling. ...' at begin of record.

I have found another user having trouble with a different package, but am unsure how to apply the solution to my case: psoerensen/qgg#3

Cheers!

Regression

Is it possible to use cascade forest for regression?

daForest

Hello Laurae, this is not an issue per se, but a question/suggestion: is there a way to recreate the daForest function using your cascade & MG scanning algorithms? Thank you

daForest.pdf

lgbm.cv example

I'm having trouble figuring out the example in ?lgbm.cv. It looks like it's on the housing price dataset, but I'm not 100% sure. When I try to run it, I get the following error:

Error in outputs[["Models"]][[i]][["Validation"]] : 
  subscript out of bounds

Do you have a working example that runs on your machine I could try out, to make sure my installation is working?

validation_data=NULL

Hi, I tried setting validation_data = NULL, but after 1 layer it stops with this message:
Error in alloc.col(dt1, length(colnames(dt1)) + length(cols)) :
  alloccol has been passed a NULL dt
Could you help me? Thanks

lightgbm installation problems

I have followed your instructions to install LightGBM and generated the exe and DLL successfully. What's next?
I refer to the instructions:

"If you are using a precompiled dll/lib locally, you can move the dll/lib into the LightGBM root folder, modify the 2nd line of LightGBM/R-package/src/install.libs.R (change use_precompile <- FALSE to use_precompile <- TRUE), and install the R-package as usual."

But it failed.

install("C:\Users\szkxpc056\LightGBM\R-package")
Installing lightgbm
"D:/software/R-331.1/bin/x64/R" --no-site-file --no-environ
--no-save --no-restore --quiet CMD INSTALL
"C:/Users/szkxpc056/LightGBM/R-package"
--library="D:/software/R-3.3.1/library" --install-tests
installing source package 'lightgbm' ...
** libs
*** arch - i386
D:/software/Rtools/mingw_32/bin/g++ -std=c++0x -I"D:/software/R-33
1.1/include" -DNDEBUG -I../..//include -I -I -I../compute/include -DUSE_SOCKET -DUSE_GPU=1 -I"d:/Compiler/gcc-4.9.3/local330/include" -fopenmp -pthread -std=c++11 -O2 -Wall -mtune=core2 -c lightgbm-all.cpp -o lightgbm-all.o
In file included from ../../src/treelearner/parallel_tree_learner.h:7:0,
from ../../src/treelearner/data_parallel_tree_learner.cpp:1,
from lightgbm-all.cpp:30:
../../src/treelearner/serial_tree_learner.h:24:45: fatal error: boost/align/aligned_allocator.hpp: No such file or directory
#include <boost/align/aligned_allocator.hpp>
compilation terminated.
make: *** [lightgbm-all.o] Error 1
Warning: running command 'make -f "Makevars.win" -f "D:/software/R-331.1/etc/i386/Makeconf" -f "D:/software/R-331.1/share/make/winshlib.mk" CXX='$(CXX1X) $(CXX1XSTD)' CXXFLAGS='$(CXX1XFLAGS)' CXXPICFLAGS='$(CXX1XPICFLAGS)' SHLIB_LDFLAGS='$(SHLIB_CXX1XLDFLAGS)' SHLIB_LD='$(SHLIB_CXX1XLD)' SHLIB="lightgbm.dll" ' had status 2
ERROR: compilation failed for package 'lightgbm'
removing 'D:/software/R-3.3.1/library/lightgbm'
Error: Command failed (1)

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_1.13.2
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.1 tools_3.3.1 withr_1.0.2 curl_2.6 memoise_1.1.0
[7] git2r_0.18.0 digest_0.6.12

Training & testing

Hello Laurae, thanks for your response earlier about my question about emulating daForest.

This time I have a question sort of related to validation_data=NULL #6, in that I want to make sure I understand how to properly do training & testing to try to avoid overfitting. I tried running CascadeForest and got excellent results on training and held out validation data (where I knew the labels), but when I applied the model to test data (exclusive of my train & validation data, and where I did not know the labels but the contest website gave me my score), the model did not perform very well. So, I believe I am overfitting.

Basically, I trained CascadeForest using d_train & d_valid like this:

CascadeForest(training_data = d_train,
validation_data = d_valid,
training_labels = labels_train,
validation_labels = labels_valid, ...)

Where: d_train & labels_train = predictor columns & known labels (65% of my total training data)
d_valid & labels_valid = predictor columns & known labels, exclusive of d_train (the other 35% of my total training data).

My AUC was something like 0.96 when I then predicted on d_train and also when I predicted on d_valid. So, that made me happy, and I then applied the predict function to d_test, which is exclusive of d_train and d_valid, and where I don't know the true labels, but I submitted my predictions to the contest website and got a 0.75 AUC, not nearly as good as 0.96.

So that made me think I should use cross-validation in CascadeForest, like this:

CascadeForest(training_data = d_alltrain,
validation_data = NULL,
training_labels = labels_alltrain,
validation_labels = NULL, ...)

Where: d_alltrain is all my training data (= 65% + 35% = 100%), and labels_alltrain is all my known labels for all my training data.

But I got the error noted in validation_data=NULL #6. I have not yet tried the fix you suggested to make the code work for cross-validation, but is this the proper way to do cross-validation? And if I then get a good AUC (from cross-validation on d_alltrain) and apply the model to d_test, is that the proper way to avoid overfitting, so I can hope for a better score?

Thank you very much.
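The cross-validation setup described above can be sketched with plain base-R fold indices (a generic illustration, not CascadeForest-specific code; n = 100 is an assumed row count):

```r
# Generic k-fold cross-validation indices in base R
# (illustration only, not CascadeForest-specific code).
set.seed(42)
n <- 100                                 # assumed number of training rows
k <- 5
folds <- split(sample(n), rep(1:k, length.out = n))
# Each folds[[i]] holds the row indices of one held-out validation fold;
# train on the remaining rows and average the k validation scores.
lengths(folds)  # 20 20 20 20 20
```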

get.max_acc fails to calculate correct accuracy on edge case

Hi Laurae,
Congrats with this superb library.
With few observations, xgboost sometimes returns the same probability for every observation, and identical probabilities with different labels make get.max_acc report the inverse of the accuracy; the reported max threshold is also incorrect if we assume positive observations are those > threshold (not >=).

To reproduce it:
get.max_acc(rep(0.502,8),c(0,1,1,1,1,1,1,1))

It reports an accuracy of 0.125 at a 0.502 threshold.

Thanks for your time
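The reported edge case can be reproduced with a minimal base-R check (acc_at is a hypothetical helper for illustration, not the package's get.max_acc):

```r
# Hypothetical helper: accuracy when predicting positive for preds > threshold.
acc_at <- function(preds, labels, threshold) {
  mean(as.integer(preds > threshold) == labels)
}
preds  <- rep(0.502, 8)
labels <- c(0, 1, 1, 1, 1, 1, 1, 1)
acc_at(preds, labels, 0.502)  # all predicted negative: 1/8 = 0.125
acc_at(preds, labels, 0.501)  # all predicted positive: 7/8 = 0.875
```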

Error when using xgb.max_f1 as evaluation metric

Hi Laurae,

Thanks for the very nice and useful package! I'm having some trouble using xgb.max_f1 as the evaluation metric for xgb.train. The error comes from sum(labels): "invalid 'type' (closure) of argument". This happens because labels is not defined, so R resolves it to the labels function from the "base" package. Could it be related to the version of the xgboost package I'm using (i.e., 1.3.2.1)? I checked the xgb.train function code and no labels object is returned by the function.

Best regards,
Paulo.
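For reference, the F1 computation itself can be written against an explicit label vector; in a custom xgboost R metric the labels are typically retrieved with getinfo(dtrain, "label") rather than a bare labels object. A hedged sketch with a hypothetical helper (not Laurae2's xgb.max_f1):

```r
# Sketch of F1 at a fixed threshold from plain vectors (hypothetical helper;
# inside an xgboost custom metric, use labels <- getinfo(dtrain, "label")).
f1_at <- function(preds, labels, threshold = 0.5) {
  pred_pos <- preds > threshold
  tp <- sum(pred_pos & labels == 1)   # true positives
  fp <- sum(pred_pos & labels == 0)   # false positives
  fn <- sum(!pred_pos & labels == 1)  # false negatives
  if (tp == 0) return(0)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}
f1_at(c(0.9, 0.8, 0.2, 0.6), c(1, 1, 0, 0))  # tp=2, fp=1, fn=0 -> 0.8
```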

Installation failed: Timeout was reached

I am trying to install the Laurae2/sparsity package so I can convert a sparse matrix to SVMLight format; however, I keep getting the error message "Installation failed: Timeout was reached".
I am behind no proxy and have all dependent packages installed.
I've also tweaked options(download.file.method = "") with various methods, but to no avail. I searched the web and tried everything, still with no success. Is the package still available for download and installation? I'm using R version:

platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 4.0
year 2017
month 04
day 21
svn rev 72570
language R
version.string R version 3.4.0 (2017-04-21)
nickname You Stupid Darkness
