Comments (14)
Oh man... I can't believe this was a year ago!
from onlinestats.jl.
This is an interesting idea. I don't think I've seen anything that tries to automate model selection like this. It would be an easy and powerful tool, especially since many online algorithms can be designed to be self-tuning. I'm intrigued. Let's talk more.
Hi! Just here to drop some links
I've seen many graphics which essentially create a decision tree given lots of high-level information about a data problem, and point you at the right solution type (linear regression vs logistic regression vs dimensionality reduction vs SVM vs random forests vs ???)
The most famous one is probably from scikit-learn
... an ensemble framework which could choose lots of candidate models for you and drop/average/vote on the best predictions. In the online setting, ensembles could be relatively cheap, even for large datasets (especially if the online algorithm allows for parallel fitting)
There is an interesting reference python implementation concerning automatic ensemble building
Thanks for the links. I'm starting to work on ensembles in my package OnlineAI.jl, which extends OnlineStats. I'll certainly use this as a reference.
What is your position on callback functions? (This question goes to both of you, for OnlineAI and OnlineStats.) You two seem to be doing a very good job and are very active, so I would really love to use your work where it makes sense. I do have the design restriction that I require callback functions, ideally with support for early stopping. As far as I can tell, OnlineStats offers this if I use the low-level API via the update! methods.
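To make the requirement concrete, here is a minimal sketch of the kind of hook I have in mind, assuming a single-observation update! method as in the low-level OnlineStats API (the function name fit_each! and the callback signature are hypothetical, not part of either package):

```julia
# Hypothetical sketch: stream observations into a stat one at a time,
# invoking a user callback that can request early stopping.
function fit_each!(stat, xs; callback = (stat, i) -> true)
    for (i, x) in enumerate(xs)
        update!(stat, x)            # low-level single-observation update
        callback(stat, i) || break  # callback returns false => stop early
    end
    return stat
end
```

The callback receives the stat after each update, so it could, for example, evaluate a validation criterion every k observations and stop the stream once it stops improving.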
Background: I am working on a supervised learning front end (somewhat inspired by scikit-learn and caret, among others), where I am also working on data abstractions for file streaming and in-memory datasets in various forms. I am currently investigating which libraries to use as back ends for specific tasks. Deterministic optimization seems pretty much settled (pending some PRs and issues here and there) on Optim.jl for low-level access, plus Regression.jl. Where I am unsure is stochastic optimization. There is SGDOptim.jl, but it is not (visibly, at least) being actively worked on. I am also considering Mocha.jl, but it comes with a lot of baggage. Your two projects seem very promising in that regard.
What are your thoughts on this?
You should look through the source in
https://github.com/tbreloff/OnlineAI.jl/tree/master/src/nnet. I'm working on a bunch of things that you might be interested in, including various ways to split and sample static datasets, various stochastic gradient algorithms, and lots of cool (and easy to use!) neural net stuff... Dropout, regularization, flexible cost functions and activations, and even a normalization technique that I haven't seen anywhere else which I converted into an online algorithm (google "Batch Normalization"). In my opinion, it's much easier to use than something like Mocha.jl, and opens up streaming or parallel algorithms for big data sets. Not to mention you can combine and leverage all of OnlineStats, including the cool "stream" macro I made.
As for your questions on callbacks... My thought is that the functionality of nnet/solver.jl will end up embedded in the update function, and things like early stopping could be accomplished by setting certain flags and occasionally triggering callbacks to check against a validation set. I'm still actively thinking through the design, and my goal is something that should cover your needs.
I am absolutely interested in the neural net stuff. I will look into the code in close detail.
Concerning callbacks: I do have some time before I get to include stochastic optimization, so don't feel rushed.
Something that troubles me at first glance: do I see correctly that you use matrix rows to denote observations? I know this is the usual convention in textbooks, but as far as I know, in Julia using columns for observations is better for performance because of the column-major array memory layout.
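For context, a small illustration of why one-observation-per-column is the cache-friendly layout in Julia (the function is just for demonstration, not from either package):

```julia
# Julia Arrays are column-major: the elements of one column are contiguous
# in memory, so storing observations as columns makes per-observation
# access cache-friendly.
X = randn(5, 1000)    # 5 features, 1000 observations (one per column)

function feature_sums(X)
    d, n = size(X)
    s = zeros(d)
    for j in 1:n       # outer loop over observations (columns)
        for i in 1:d   # inner loop walks contiguous memory
            s[i] += X[i, j]
        end
    end
    return s
end
```

With rows as observations, the inner loop would instead stride through memory with a gap of size(X, 1) between consecutive reads, which defeats the CPU cache for wide datasets.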
Yes I think Josh and I were both more concerned with getting the code correct... I made the decision early on that I could live with the performance implications of row-based matrices. I'm holding out hope that we'll have performant row-based array storage in Julia at some point (even if I have to implement it myself), because no matter how hard I try I find column-based storage annoying to use.
Also remember that you can update one point at a time by looping over the columns of a column-based matrix... You just lose the short helper function which does the loop for you.
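A minimal sketch of that pattern, assuming a stat with a single-observation update! method as in the low-level API:

```julia
# Walk the columns of a column-based data matrix and update one
# observation at a time, without copying any data.
X = randn(3, 100)        # 3 features, 100 observations as columns
for j in 1:size(X, 2)
    x = view(X, :, j)    # no-copy view of observation j
    # update!(stat, x)   # single-observation update (low-level API)
end
```

This is exactly the loop the helper function would otherwise do for you, so nothing is lost except a little convenience.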
because no matter how hard I try I find column-based storage annoying to use
I absolutely agree on that.
However, it does make it somewhat harder to interface with the library from the column-based format (which I use), though looping through the columns should probably do the trick for me, as you just described.
I have seen the TransposeView{T}, which seems like a good way to internally pretend the indexing is row-based. Maybe that could be a way to get the column-based performance without sacrificing code clarity. Or what is this type for?
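A minimal sketch of what such a view type might look like (in current Julia syntax; this is my guess at the idea, not OnlineAI's actual implementation):

```julia
# Hypothetical minimal transpose view: indexed as (row, col) but reading
# the underlying matrix as (col, row), so no data is ever copied.
struct TransposeView{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(tv::TransposeView) = reverse(size(tv.data))
Base.getindex(tv::TransposeView, i::Int, j::Int) = tv.data[j, i]
Base.setindex!(tv::TransposeView, v, i::Int, j::Int) = (tv.data[j, i] = v; v)
```

Implementing the AbstractMatrix interface (size plus scalar getindex/setindex!) is enough for generic code to treat the view like an ordinary matrix with rows as observations, while the memory stays column-major underneath.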
TransposeView may work for this (or at least be the beginning of an implementation). I made it so that I could create "tied matrices" in stacked autoencoders... Essentially the weight matrix from one layer is the transpose of the weight matrix from a previous layer. This was straightforward since the layers now share the same underlying matrix.
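The tied-weights idea can be illustrated with Julia's built-in lazy transpose (lazy in current Julia versions), which likewise shares storage with the parent matrix rather than copying:

```julia
W  = [1.0 2.0; 3.0 4.0]
Wt = transpose(W)   # lazy view: shares storage with W, no copy
Wt[1, 2] = 99.0     # writes through to W[2, 1]
# W and Wt now stay in sync automatically, which is exactly the
# behavior wanted for tied weight matrices in a stacked autoencoder
```

Updating the weights through either layer's view updates the shared matrix, so the tie is maintained for free.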
I've been traveling...Tom seems to have your questions well covered, but I'll chime in here. I'd love to stay updated with what you're working on and what you'd like to see in OnlineStats. My next OnlineStats project is variance components models, but I'm happy to work on things people are actually using.
This is definitely JuliaML material.
Is this essentially the birthplace of @tbreloff's vision for JuliaML? It's part of history now.