barmpy's People

Contributors

dvbuntu

barmpy's Issues

Implement Posterior Mean Model

BARN models currently only return a single ensemble from the posterior distribution (i.e. a single MCMC replicate). BART, however, allows returning an average over multiple MCMC iterations. Doing such averaging means the final model approximates the expected value of the posterior distribution, not just a single sample from it. This may improve modeling results in some contexts, especially if the variance in the posterior is relatively large (measured by the model sigma estimate).

Practically, there are a few considerations. First, because successive MCMC iterations are correlated, we only want to sample every so many steps (anecdotally, the integrated autocorrelation time is about 7 steps, but that depends on the problem, ensemble size, and other parameters). From a computational perspective, we can save some effort when a model within the ensemble stays the same (i.e. declines to transition) between two samples in the average: in that case, we can simply double-weight that model. This requires some additional bookkeeping beyond saving every Kth ensemble separately.

The actual output should probably be saved as a new ensemble model (possibly a barmpy.barn.BARN object itself), just with num_nets*M total networks, where M is the number of posterior samples to average over. The final output should also divide by M to ensure it's an average; alternatively, we can divide the weights of the final NN layer by M and sum over the various ensembles.
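The thinning and double-weighting bookkeeping above might be sketched roughly as follows. This is a hedged toy sketch, not barmpy's actual API: `propose` stands in for one MCMC transition of the whole ensemble, and `predict_one` for a single ensemble's prediction.

```python
# Toy sketch of thinned posterior-mean sampling with double-weighting.
# `propose` and `predict_one` are hypothetical stand-ins, not barmpy API.

def posterior_mean_samples(state, propose, K=7, M=10):
    """Collect M samples, thinning by K MCMC steps between each.

    If the state declines to transition between two recorded samples,
    bump the previous sample's weight instead of storing a duplicate.
    """
    samples = []                     # list of [state, weight] pairs
    for _ in range(M):
        for _ in range(K):           # thin: skip K correlated steps
            state = propose(state)
        if samples and samples[-1][0] is state:
            samples[-1][1] += 1      # unchanged model: double-weight it
        else:
            samples.append([state, 1])
    return samples

def posterior_mean_predict(samples, predict_one, x):
    """Weighted average over posterior samples, divided by total weight M."""
    M = sum(w for _, w in samples)
    return sum(w * predict_one(s, x) for s, w in samples) / M
```

With real BARN ensembles, the weighted list could instead be flattened into one big ensemble of num_nets*M networks whose final-layer weights are scaled by 1/M, as described above.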

Develop a contribution guide and code of conduct

To better encourage and manage user-submitted contributions like new methods and custom callbacks, we should add both a contribution guide walking through the process as well as a code of conduct to set expectations.

The contribution guide can be a markdown file with a small example, say a custom callback. It should walk a user through the steps of integrating such a feature into barmpy. Namely:

  1. Fork/branch with a single new feature implemented and a unit test created (if applicable).
  2. Pull request created describing the contribution.
  3. Review by one of the barmpy maintainers.
  4. Possible revisions.
  5. Merging of complete features.
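The small example in the guide might look something like this. This is only a hypothetical sketch: the callback signature (a callable receiving the current error and returning whether to stop) is an assumption for illustration, not barmpy's actual callback interface.

```python
# Hypothetical custom callback for the contribution guide's worked example.
# The signature (error in, stop-flag out) is an assumption, not barmpy API.

class EarlyStop:
    """Stop MCMC iteration once error improvement stalls below `tol`."""

    def __init__(self, tol=1e-4):
        self.tol = tol
        self.last = float("inf")

    def __call__(self, error):
        improved = self.last - error > self.tol
        self.last = min(self.last, error)
        return not improved          # True means "stop iterating"
```

A guide entry like this would then show the matching unit test and where the callback hooks into the training loop.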

As a practical matter, features added by the primary developers (i.e. Dr. Van Boxel) will likely continue on the main branch directly for now.

The code of conduct can be a short statement, likely as part of the contribution guide, asserting how to engage with the barmpy community. We can pull some examples from https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-code-of-conduct-to-your-project, but the short answer will be treating people with respect, understanding that different opinions can exist, and keeping discussion within barmpy focused on the development of this project (i.e. not wider mathematical discussion, however fun that may be).

Implement BART in `barmpy` library

It'd be great to have a Python implementation of BART in barmpy! Note that BARTPy exists, but it hasn't been updated in several years. It would still serve as an excellent starting place.

This issue should also include some refactoring of barmpy.barn so that generic routines can be shared between BARN and BART. That will help with future features like BAR-Support Vector Machines and the like.
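One possible shape for that refactor is a shared base class holding the generic Metropolis-Hastings machinery, with the model-specific pieces (proposal and likelihood) as overrides. All names here are illustrative assumptions, not barmpy's actual classes, and the subclass below uses toy stand-ins rather than real networks or trees.

```python
# Hedged sketch of a BARN/BART refactor: generic MCMC loop in a base
# class, model-specific proposal/likelihood in subclasses.  Names and
# toy logic are assumptions, not barmpy's API.
import abc
import math
import random

class EnsembleSampler(abc.ABC):
    """Generic Metropolis-Hastings step, shared by BARN and BART."""

    @abc.abstractmethod
    def propose(self, member):
        """Propose a new ensemble member (e.g. grow/shrink a NN or tree)."""

    @abc.abstractmethod
    def log_likelihood(self, member):
        """Log evidence for one member given the data."""

    def step(self, member):
        cand = self.propose(member)
        log_a = self.log_likelihood(cand) - self.log_likelihood(member)
        if math.log(random.random()) < log_a:
            return cand              # accept transition
        return member                # reject, keep current member

class ToyBARN(EnsembleSampler):
    """Toy subclass: 'member' is just a neuron count."""

    def propose(self, member):
        return member + random.choice([-1, 1])   # add/remove one neuron

    def log_likelihood(self, member):
        return -abs(member - 3)      # toy target: prefer 3 neurons
```

A BART subclass would override the same two methods with tree grow/prune proposals, leaving `step` untouched.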

Port `barmpy` to PyMC

PyMC is a Python library for fitting Bayesian models with Markov chain Monte Carlo (MCMC). BARN essentially fits that mold, so it would be instructive and potentially useful to port barmpy to that ecosystem. PyMC takes a different approach from sklearn, however, so there may be a bit of a learning curve. Some good first steps:

  1. Understand PyMC-BART.
  2. Port BARN to PyMC, using PyMC-BART as a starting point.
  3. Extract only needed MCMC components from PyMC to be used in BARN, keeping sklearn compatibility.

Implement Bayesian Additive Regression Support Vector Machines

BART and BARN exist, but Support Vector Machines (SVMs) are another machine learning method that might be useful to ensemble this way, giving us Bayesian Additive Regression SVMs (BARS).

BARS will most likely define its state space as the hyperparameters of the kernel (e.g. $d$ in the polynomial kernel, $(\langle x_i, x_j\rangle +1)^d$). What's required then is a transition probability, $T(d,d')$ (say, 50-50 on $d \pm 1$), a prior probability, $P(d)$ (perhaps some kind of discrete distribution in this case), and the evidence likelihood. This last one is tricky: if we train the SVMs the way we train NNs in BARN, then the learned parameters are not part of the MCMC state space. So we might as well approximate the integral over them with the maximum likelihood estimate, just as in BARN (perhaps using the exact same logic).
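A single MCMC step over the degree might look like this hedged sketch. The exponential prior is a toy assumption, and `log_evidence` is a hypothetical stand-in: in practice it would come from fitting an SVM with the candidate degree and taking the maximum-likelihood approximation discussed above.

```python
# Toy sketch of one Metropolis-Hastings step on the polynomial degree d.
# The prior and the `log_evidence` callable are assumptions, not barmpy code.
import math
import random

def log_prior(d, lam=1.0):
    """Toy discrete prior favoring small degrees: P(d) proportional to exp(-lam*d)."""
    return -lam * d

def mh_step_degree(d, log_evidence):
    """One MH step on kernel degree d, proposing 50-50 on d +- 1."""
    d_new = d + random.choice([-1, 1])
    if d_new < 1:
        return d                     # reject invalid degree outright
    log_a = (log_prior(d_new) + log_evidence(d_new)
             - log_prior(d) - log_evidence(d))
    if math.log(random.random()) < log_a:
        return d_new                 # accept proposed degree
    return d                         # reject, keep current degree
```

Since the 50-50 proposal is symmetric, the $T(d,d')$ terms cancel in the acceptance ratio, leaving only the prior and evidence terms.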

Practically, we can use SVMs from sklearn. We'll need an extra argument for the kernel, and the kernel choice can affect the prior and transition function as needed. So maybe start with a polynomial kernel, then try a Gaussian kernel, slowly adding more kernels and generalizing as we go.

After implementing, we should do some extensive analysis of how well BARS does on benchmark data. Is it better than BARN? Maybe it's faster? This could make a great paper.
