Comments (14)
Oh man... I can't believe this was a year ago!
from onlinestats.jl.
This is an interesting idea. I don't think I've seen anything that tries to automate model selection like this. It would be an easy and powerful tool, especially since many online algorithms can be designed to be self-tuning. I'm intrigued. Let's talk more.
Hi! Just here to drop some links
I've seen many graphics which essentially create a decision tree given lots of high-level information about a data problem, and point you at the right solution type (linear regression vs logistic regression vs dimensionality reduction vs SVM vs random forests vs ???)
The most famous one is probably from scikit-learn
... an ensemble framework which could choose lots of candidate models for you and drop/average/vote on the best predictions. In the online setting, ensembles could be relatively cheap, even for large datasets (especially if the online algorithm allows for parallel fitting)
There is an interesting reference python implementation concerning automatic ensemble building
Thanks for the links. I'm starting to work on ensembles in my package OnlineAI.jl, which extends OnlineStats. I'll certainly use this as a reference.
What is your position on callback functions? (This question goes to both of you, for OnlineAI and OnlineStats.) You two seem to be doing a very good job and are very active, so I would really love to use your work where it makes sense. I do have the design restriction that I require callback functions, ideally with support for early stopping. As far as I can tell, OnlineStats offers this if I use the low-level API via the update! methods.
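To make the requirement concrete, here is a minimal sketch of the kind of hook I have in mind, assuming a single-observation update! method as in the low-level OnlineStats API (the function name fit_each! and the callback signature are hypothetical, not part of either package):

```julia
# Hypothetical sketch: stream observations into a stat one at a time,
# invoking a user callback that can request early stopping.
function fit_each!(stat, xs; callback = (stat, i) -> true)
    for (i, x) in enumerate(xs)
        update!(stat, x)            # low-level single-observation update
        callback(stat, i) || break  # callback returns false => stop early
    end
    return stat
end
```

The callback receives the stat after each update, so it could, for example, evaluate a validation criterion every k observations and stop the stream once it stops improving.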
Background: I am working on a supervised learning front end (somewhat inspired by scikit-learn and caret, among others), where I am also working on data abstractions for file streaming and in-memory datasets in various forms. I am currently investigating which libraries to use as back ends for specific tasks. Deterministic optimization seems pretty much settled (pending some PRs and issues here and there) on Optim.jl for low-level access, plus Regression.jl. Where I am unsure is stochastic optimization. There is SGDOptim.jl, but it is not (visibly, at least) being actively worked on. I am also considering Mocha.jl, but it comes with a lot of baggage. Your two projects seem very promising in that regard.
What are your thoughts on this?
You should look through the source in
https://github.com/tbreloff/OnlineAI.jl/tree/master/src/nnet. I'm working on a bunch of things that you might be interested in, including various ways to split and sample static datasets, various stochastic gradient algorithms, and lots of cool (and easy to use!) neural net stuff... Dropout, regularization, flexible cost functions and activations, and even a normalization technique that I haven't seen anywhere else which I converted into an online algorithm (google "Batch Normalization"). In my opinion, it's much easier to use than something like Mocha.jl, and opens up streaming or parallel algorithms for big data sets. Not to mention you can combine and leverage all of OnlineStats, including the cool "stream" macro I made.
As for your questions on callbacks... My thought is that the functionality of nnet/solver.jl will end up embedded in the update function, and things like early stopping could be accomplished by setting certain flags and occasionally triggering callbacks to check against a validation set. I'm still actively thinking through the design, and my goal is something that should cover your needs.
I am absolutely interested in the neural net stuff. I will look into the code in close detail.
Concerning callbacks: I do have some time before I get to include stochastic optimization, so don't feel rushed.
Something that troubles me at first glance: do I see correctly that you use matrix rows to denote observations? I know this is the usual convention in textbooks, but as far as I know, in Julia using columns for observations is better for performance because of the column-major array memory layout.
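For context, a small illustration of why one-observation-per-column is the cache-friendly layout in Julia (the function is just for demonstration, not from either package):

```julia
# Julia Arrays are column-major: the elements of one column are contiguous
# in memory, so storing observations as columns makes per-observation
# access cache-friendly.
X = randn(5, 1000)    # 5 features, 1000 observations (one per column)

function feature_sums(X)
    d, n = size(X)
    s = zeros(d)
    for j in 1:n       # outer loop over observations (columns)
        for i in 1:d   # inner loop walks contiguous memory
            s[i] += X[i, j]
        end
    end
    return s
end
```

With rows as observations, the inner loop would instead stride through memory with a gap of size(X, 1) between consecutive reads, which defeats the CPU cache for wide datasets.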
Yes I think Josh and I were both more concerned with getting the code correct... I made the decision early on that I could live with the performance implications of row-based matrices. I'm holding out hope that we'll have performant row-based array storage in Julia at some point (even if I have to implement it myself), because no matter how hard I try I find column-based storage annoying to use.
Also remember that you can update one point at a time by looping over the columns of a column-based matrix... You just lose the short helper function which does the loop for you.
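A minimal sketch of that pattern, assuming a stat with a single-observation update! method as in the low-level API:

```julia
# Walk the columns of a column-based data matrix and update one
# observation at a time, without copying any data.
X = randn(3, 100)        # 3 features, 100 observations as columns
for j in 1:size(X, 2)
    x = view(X, :, j)    # no-copy view of observation j
    # update!(stat, x)   # single-observation update (low-level API)
end
```

This is exactly the loop the helper function would otherwise do for you, so nothing is lost except a little convenience.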
because no matter how hard I try I find column-based storage annoying to use
I absolutely agree on that.
However, it does make it somewhat harder to interface with the library from the column-based format (which I use), though looping through the columns should probably do the trick for me, as you just described.
I have seen the TransposeView{T}, which seems like a good way to internally pretend the indexing is row-based. Maybe that could be a way to get the column-based performance without sacrificing code clarity. Or what is this type for?
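A minimal sketch of what such a view type might look like (in current Julia syntax; this is my guess at the idea, not OnlineAI's actual implementation):

```julia
# Hypothetical minimal transpose view: indexed as (row, col) but reading
# the underlying matrix as (col, row), so no data is ever copied.
struct TransposeView{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(tv::TransposeView) = reverse(size(tv.data))
Base.getindex(tv::TransposeView, i::Int, j::Int) = tv.data[j, i]
Base.setindex!(tv::TransposeView, v, i::Int, j::Int) = (tv.data[j, i] = v; v)
```

Implementing the AbstractMatrix interface (size plus scalar getindex/setindex!) is enough for generic code to treat the view like an ordinary matrix with rows as observations, while the memory stays column-major underneath.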
TransposeView may work for this (or at least be the beginning of an implementation). I made it so that I could create "tied matrices" in stacked autoencoders... Essentially the weight matrix from one layer is the transpose of the weight matrix from a previous layer. This was straightforward since the layers now share the same underlying matrix.
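The tied-weights idea can be illustrated with Julia's built-in lazy transpose (lazy in current Julia versions), which likewise shares storage with the parent matrix rather than copying:

```julia
W  = [1.0 2.0; 3.0 4.0]
Wt = transpose(W)   # lazy view: shares storage with W, no copy
Wt[1, 2] = 99.0     # writes through to W[2, 1]
# W and Wt now stay in sync automatically, which is exactly the
# behavior wanted for tied weight matrices in a stacked autoencoder
```

Updating the weights through either layer's view updates the shared matrix, so the tie is maintained for free.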
I've been traveling...Tom seems to have your questions well covered, but I'll chime in here. I'd love to stay updated with what you're working on and what you'd like to see in OnlineStats. My next OnlineStats project is variance components models, but I'm happy to work on things people are actually using.
This is definitely JuliaML material.
Is this essentially the birthplace of @tbreloff's vision for JuliaML? It's part of history now.