bayes-scala's People

Contributors

bertranddechoux, danielkorzekwa, francisdb, gitter-badger

bayes-scala's Issues

CanonicalGaussianTest#constructor_linear_gaussian fails

Command line to reproduce the issue:

mvn -Dtest=CanonicalGaussianTest#constructor_linear_gaussian test
Results :

Failed tests:   constructor_linear_gaussian(dk.bayes.math.gaussian.CanonicalGaussianTest): expected:<-1.828012123484645[4]> but was:<-1.828012123484645[1]>

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0

The test is concise so there is an obvious fix.

  def apply(a: Matrix, b: Double, v: Double): CanonicalGaussian = apply(a,Matrix(b),Matrix(v))

Is there a reason for not doing so? Performance? Better numeric stability?
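
The two values in the failure differ only in the last printed digit (…454 versus …451), which is consistent with floating-point rounding that depends on the order of operations. A minimal sketch, assuming that is the cause, of comparing doubles with a tolerance instead of exactly. `ApproxCompare`, `approxEqual` and the default tolerance are illustrative names and values, not part of bayes-scala:

```scala
object ApproxCompare {
  /** True when a and b differ by at most `tol` in absolute terms. */
  def approxEqual(a: Double, b: Double, tol: Double = 1e-12): Boolean =
    math.abs(a - b) <= tol

  def main(args: Array[String]): Unit = {
    // The two values reported by the failing test.
    val expected = -1.8280121234846454
    val actual   = -1.8280121234846451
    println(expected == actual)            // exact comparison
    println(approxEqual(expected, actual)) // tolerance-based comparison
  }
}
```

In a JUnit test the same idea is available through the three-argument `assertEquals(expected, actual, delta)`.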

Multi task Gaussian process regression

Background:

Predicting sales across different items/stores with a multi-task GP model that only shares parameters is fairly limited. Gaussian Process Regression Networks (GPRN) should help model correlations between items/stores.

Test data: Walmart umbrella competition.

To read:

Multi output GP - shared covariance function on input-dependent features

Correlated outputs via a linear combination of GPs (fixed W coefficient matrix for combining multiple GPs)

GPRN - Correlations between multiple outputs are adaptive: every cell of the W matrix used for mixing multiple GPs is itself a Gaussian process.

Convolved GPs - Convolving kernel functions for multiple outputs with Gaussian white noise allows for obtaining a positive semidefinite covariance matrix

Variational inference + Stochastic variational inference

Other multi output algorithms for non GP models
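
For orientation, the difference between the fixed-W linear combination and GPRN listed above can be written as follows. This is a sketch of the formulations as they are commonly stated, with observation noise omitted; the symbols Q, k_q and k_w are notation introduced here, not taken from the issue:

```latex
% Fixed mixing matrix W, independent latent GPs f_q ~ GP(0, k_q), q = 1..Q:
y(x) = W f(x), \qquad
\operatorname{cov}\big(y_i(x),\, y_j(x')\big) = \sum_{q=1}^{Q} W_{iq} W_{jq}\, k_q(x, x')

% GPRN: each mixing weight is itself a GP over the input x:
y(x) = W(x) f(x), \qquad W_{iq}(\cdot) \sim \mathcal{GP}(0, k_w)
```

With a fixed W the correlation between outputs i and j is the same for every input, whereas in GPRN it changes with x because W does.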

Publish bayes-scala on the Maven central repository

Currently the only way to add bayes-scala as a dependency is to copy the source or a compiled .jar into a project's lib directory, am I correct? It would be really nice if you could just add it to libraryDependencies or to a Maven build configuration file.
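
For illustration only, this is roughly what the requested sbt setup would look like once an artifact is published. The organisation and version below are placeholders chosen for the example, not confirmed coordinates:

```scala
// build.sbt
// "org.example" and "0.0.0" are hypothetical placeholders, not real published coordinates.
libraryDependencies += "org.example" %% "bayes-scala" % "0.0.0"
```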

Update scalalogging

Since scalalogging is no longer being developed (last commit 7 months ago) and is not available for Scala 2.11.x, I suggest you drop it and migrate to plain slf4j.

A more scala-esque interface to the classes

I'm browsing through the examples and wondering why the data structures are built in an imperative manner as opposed to declarative? Is there a particular reason for this to be the case?

Type of features

Hi,
I see that you can define the graphical model based on tables of probabilities of variables (CPD) for Bayes-Scala. Is it possible to define factors with discriminative features, with arbitrary feature functions? For example a factor with the following potential functions:
111
and the overall distribution will be of the following form:
222

In fact, I am not sure, implementation-wise, how different this is from your examples (like when you have the factors as proper probability distributions)

Thanks
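
Since the two formulas in the post did not survive, the following is only a sketch under an assumption: that the question is about log-linear potentials of the form exp(Σ w_k · f_k(x)). Under that assumption, a feature-based factor over discrete variables can be tabulated into one value per assignment. The feature functions, weights and names are hypothetical, and whether bayes-scala's `Factor` accepts unnormalised values is not confirmed here:

```scala
object FeatureFactor {
  // Hypothetical feature functions over two binary variables (values 0 or 1).
  val features: Seq[(Int, Int) => Double] = Seq(
    (a, b) => if (a == b) 1.0 else 0.0, // fires when both variables agree
    (a, _) => if (a == 1) 1.0 else 0.0  // fires when the first variable is 1
  )

  // Hypothetical weights, one per feature.
  val weights: Seq[Double] = Seq(1.5, -0.5)

  /** Unnormalised potential exp(sum_k w_k * f_k(a, b)). */
  def potential(a: Int, b: Int): Double =
    math.exp(features.zip(weights).map { case (f, w) => w * f(a, b) }.sum)

  /** Potentials for every assignment, with the last variable varying fastest. */
  def table: Array[Double] =
    (for (a <- 0 to 1; b <- 0 to 1) yield potential(a, b)).toArray
}
```

The resulting array has one entry per joint assignment, which is the same shape as the probability tables in the examples; the entries simply no longer sum to one.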

Sepsets containing a single variable

I was having an issue with the common benchmarking Bayesian network "B" and so decided to test this with a much smaller network, commonly used when learning what a Bayesian network is (Grass Wet).

[image: Grass Wet network]

The code I am using to represent this network is shown below:

  var loopyBP: LoopyBP = _

  def loadNetwork(): Unit = {
    // Binary variables: Var(id, number of states)
    val rain = Var(1, 2)
    val sprinkler = Var(2, 2)
    val grasswet = Var(3, 2)

    val rainFac = Factor(Array(rain), Array(0.3, 0.7))
    val sprinklerFac = Factor(Array(sprinkler, rain), Array(0.01, 0.99, 0.7, 0.3))
    val grasswetFac = Factor(Array(grasswet, sprinkler, rain), Array(1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0))

    // One cluster per factor
    val clusterGraph = GenericClusterGraph()
    clusterGraph.addCluster(1, rainFac)
    clusterGraph.addCluster(2, sprinklerFac)
    clusterGraph.addCluster(3, grasswetFac)

    clusterGraph.addEdges((1, 2), (1, 3), (2, 3))

    loopyBP = LoopyBP(clusterGraph)
    loopyBP.calibrate()
  }

However, when adding the edges, the following error is raised:

"requirement failed: Sepset must contain single variable only"

It comes from this code:

  private def calcSepsetVariable(cluster1: Cluster, cluster2: Cluster): Var = {
    val intersectVariables = cluster1.getFactor().getVariables().intersect(cluster2.getFactor().getVariables())
    require(intersectVariables.size == 1, "Sepset must contain single variable only")
    val intersectVariable = intersectVariables.head

    intersectVariable
  }

As far as I can tell, this check rejects the network as invalid, even though it should be fine as a Bayesian network.

I am wondering if there is some sort of mistake I am making or if Bayes-Scala does not yet support this?
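
The `require` in `calcSepsetVariable` looks at the intersection of the variables of the two clusters joined by an edge. A minimal sketch using plain Scala collections (no bayes-scala types; `SepsetCheck` and its members are illustrative names) that applies the same test to the clusters and edges from the post:

```scala
object SepsetCheck {
  // Variable ids taken from the post: 1 = rain, 2 = sprinkler, 3 = grasswet.
  val clusterScopes: Map[Int, Set[Int]] = Map(
    1 -> Set(1),       // rainFac
    2 -> Set(2, 1),    // sprinklerFac
    3 -> Set(3, 2, 1)  // grasswetFac
  )

  val edges: Seq[(Int, Int)] = Seq((1, 2), (1, 3), (2, 3))

  /** Variables shared by the two clusters joined by an edge. */
  def sepset(edge: (Int, Int)): Set[Int] =
    clusterScopes(edge._1) intersect clusterScopes(edge._2)

  /** Edges whose sepset does not contain exactly one variable. */
  def offendingEdges: Seq[(Int, Int)] = edges.filter(e => sepset(e).size != 1)

  def main(args: Array[String]): Unit =
    offendingEdges.foreach(e => println(s"edge $e shares variables ${sepset(e)}"))
}
```

Only the edge (2, 3) is flagged: the sprinkler and grasswet clusters share both sprinkler and rain, so the single-variable requirement fails for that edge. The check concerns how the cluster graph is wired, not whether the underlying Bayesian network is valid.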

truncate function of Gaussian does not seem to work

I use the truncate function of Gaussian in bayes-scala, but it does not seem to work.
The code:

  val truncGaussian = dk.bayes.math.gaussian.Gaussian(0.5, 1).truncate(0.0, true)
  for (i <- 0 until 300000) {
    val d = truncGaussian.draw
    println(d)
  }

I plotted a histogram of the samples drawn from truncGaussian, and it looks like a Gaussian distribution with no truncation.
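
For comparison, here is a minimal sketch of sampling from a lower-truncated Gaussian by rejection, using only the Scala standard library. It assumes that `Gaussian(0.5, 1)` means mean 0.5 and variance 1 and that the intent is to keep only values above 0.0. `TruncatedSampling` and `drawLowerTruncated` are illustrative names and do not describe how bayes-scala implements `truncate`:

```scala
import scala.util.Random

object TruncatedSampling {
  /** Draws from N(mean, variance) restricted to values >= lower, by rejection. */
  def drawLowerTruncated(mean: Double, variance: Double, lower: Double, rng: Random): Double = {
    val sd = math.sqrt(variance)
    var x = mean + sd * rng.nextGaussian()
    while (x < lower) x = mean + sd * rng.nextGaussian() // reject and redraw
    x
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val samples = Seq.fill(1000)(drawLowerTruncated(0.5, 1.0, 0.0, rng))
    println(samples.min) // smallest sample; never below the truncation point
  }
}
```

A histogram of such samples has a hard edge at the truncation point, which is the shape the post expected to see. Rejection sampling is simple but becomes slow when the truncation point lies far in the tail.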

Package structure refactor

New structure:
dk.bayes.alg - low-level Bayesian algorithms
dk.bayes.dsl - high-level language for creating Bayesian networks
dk.bayes.math - various math utilities and classes

Replace ejml with breeze

Why?

  • Simplify the code: use a single matrix library instead of two.
  • Breeze with native extensions seems to be faster than EJML.
  • Breeze is a Scala lib, EJML is a Java lib.
  • Promote Breeze, it's a nice lib.

Implement new factor graph

Requirements:

  • computes variable marginals and factor-to-variable messages only
  • allows direct access to variables and factors in order to implement custom message-passing algorithms

Use cases for testing:

  • Gaussian Process Classification
  • Hierarchical Gaussian Process Classification

How to use bayes-scala as a library

Hi, Daniel!

First of all, congratulations on your project. I am learning Scala because I have to use it for my Master's thesis. I wonder, is it possible to use bayes-scala as a .jar library via Maven or sbt?

Thanks!
Iván

Bayesian Network Issue

Hi there,

I'm currently trying to create a few Bayesian networks for some benchmarking I am doing. However I have run into an issue that I have been unable to resolve. My setup is described below.

I create my variables like so:

    //Create variables
    alcoholism = Var(1, 2)
    vh_amn = Var(2, 2)
    hepatotoxic = Var(3, 2)
    THepatitis = Var(4, 2)
    hospital = Var(5, 2)
    ...

Then create the factors:

    gallstonesFac = Factor(Array(gallstones), Array(0.15307582, 0.84692418))
    choledocholithotomyFac = Factor(Array(gallstones, choledocholithotomy), Array(0.03716216, 0.96283784, 0.71028037, 0.28971963))
    ...

Create a cluster graph:

    //Create ClusterGraph
    clusterGraph = GenericClusterGraph()
    clusterGraph.addCluster(1, alcoholismFac)
    clusterGraph.addCluster(2, vh_amnFac)
    ...

Then finally add the edges and propagate the values with the calibrate function:

    //Add edges between clusters in a cluster graph
    clusterGraph.addEdges((3, 4), ...)

    //Calibrate cluster graph
    loopyBP = LoopyBP(clusterGraph)
    loopyBP.calibrate()

However, I hit an issue when creating the factors that I do not understand.

When I create this Factor:

ChHepatitisFac = Factor(Array(transfusion, ChHepatitis, injections, vh_amn), Array(0.13095238...

I get the error "java.lang.IllegalArgumentException: requirement failed: Number of potential values must equal to a product of variable dimensions".

So I traced through the code to find where this issue is coming from. Copying the same code to a local version for debugging, I ended up with this:

    val stepSizes: Array[Int] = calcStepSizes(variables)
    val dimProduct = stepSizes(0) * variables(0).dim
    require(dimProduct == values.size, "Number of potential values must equal to a product of variable dimensions")

    def getVariables(): Array[Var] = variables
    def getValues(): Array[Double] = values

  def calcStepSizes(variables: Array[Var]): Array[Int] = {
    val varNum = variables.size                 // number of variables in the factor
    val stepSizes = new Array[Int](varNum)      // one step size per variable

    if (varNum == 1) stepSizes(0) = 1           // a single variable always has step size 1
    else {
      var i = varNum - 1                        // walk the variables from last to first
      var product = 1                           // running product of the dimensions seen so far
      while (i >= 0) {
        stepSizes(i) = product                  // step size = product of the dimensions to the right
        product *= variables(i).dim             // fold in this variable's number of states
        i -= 1
      }
    }

    stepSizes                                   // return the computed step sizes
  }

From what I understand, a step size is computed for each variable, and the check then compares the number of values supplied with the first step size multiplied by the number of states of the first variable passed into the Factor.

However, I do not understand why this check exists. What is it validating? The network I am trying to create in Bayes-Scala is taken from an example shipped with GeNIe (Hepar II) and has already been converted and tested in multiple other programs, so I know the data and variables should match up. When I test a smaller and simpler network (Asia), this works fine.

If I do not change the number of variables and do not change the amount of data, then the only other two possible changes are to change the number of states (which I may have misunderstood as to what that value is) and to change which is the first variable passed into the factor.

I have tried passing in each variable as the first into the factor and none succeed.

By "number of states" I mean the values a variable can take, for example "Present, Absent".

If you can advise on what I may be doing wrong or if there is a limitation with Bayes-Scala that would be great. I was unsure where the best place to post about this issue would be.

Thanks,
Harry
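
Read on its own, the quoted check says that the number of potential values must equal the product of the dimensions of all the variables in the factor: the first step size is the product of every dimension after the first, so multiplying it by the first dimension gives the full product. A minimal standalone sketch of that arithmetic (plain Scala, no bayes-scala types; `FactorSize` and its members are illustrative names):

```scala
object FactorSize {
  /** Step size of each variable: the product of the dimensions to its right. */
  def stepSizes(dims: Array[Int]): Array[Int] =
    dims.scanRight(1)(_ * _).tail

  /** Number of potential values a factor over these dimensions needs. */
  def expectedValues(dims: Array[Int]): Int = dims.product

  def main(args: Array[String]): Unit = {
    val dims = Array(2, 2, 2, 2) // four binary variables
    println(stepSizes(dims).mkString(", "))
    println(stepSizes(dims)(0) * dims(0) == expectedValues(dims))
  }
}
```

For four binary variables the factor needs 2 · 2 · 2 · 2 = 16 values. If one of the four variables actually has three states in the source network, the table would hold 24 values, and a factor declared with `Var(id, 2)` for that variable would fail this check regardless of the order in which the variables are passed.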
