danielkorzekwa / bayes-scala

Bayesian Networks in Scala

License: Other

Scala 100.00%

bayes-scala's Issues

CanonicalGaussianTest#constructor_linear_gaussian fails

Command line to reproduce the issue:

mvn -Dtest=CanonicalGaussianTest#constructor_linear_gaussian test

Results :

Failed tests:   constructor_linear_gaussian(dk.bayes.math.gaussian.CanonicalGaussianTest): expected:<-1.828012123484645[4]> but was:<-1.828012123484645[1]>

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0

The test is concise, so the fix looks straightforward:

  def apply(a: Matrix, b: Double, v: Double): CanonicalGaussian = apply(a,Matrix(b),Matrix(v))

Is there a reason for not doing so? Performance? Better numerical stability?
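
The expected and actual values in the failure above differ only in the last printed digit, which is typical of floating-point rounding when the same quantity is computed along two code paths. Independently of the constructor change proposed above, a test can compare such values with a tolerance. The following is a minimal sketch; `ApproxEquals`, `closeTo` and the default `relTol` are illustrative names and values, not part of bayes-scala or its test suite.

```scala
object ApproxEquals {
  // Relative-tolerance comparison of two doubles.
  // relTol is an illustrative default, not a value taken from the project.
  def closeTo(expected: Double, actual: Double, relTol: Double = 1e-12): Boolean = {
    val scale = math.max(math.abs(expected), math.abs(actual))
    // Fall back to an absolute tolerance of relTol for values near zero.
    math.abs(expected - actual) <= relTol * math.max(scale, 1.0)
  }
}
```

With the two values from the report, `ApproxEquals.closeTo(-1.8280121234846454, -1.8280121234846451)` holds, while a strict equality assertion does not.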

Graphical visualization of Categoricals using Spark Notebook

Hi Daniel,

I've finally managed to create a visualization for Bayesian Networks constructed from Categoricals.
Check out the README of the Gist here:
https://gist.github.com/nightscape/c2fcccac859b3ae34c99#file-readme-md

Could you check if it runs on your machine?
If so we can think about how to maybe integrate this into bayes-scala :)

Best and thanks again for your help!
Martin

Set up codacy code review and add button

https://www.codacy.com

It's free for open source projects

Implement automated variational inference

Nguyen et al. Automated Variational Inference for Gaussian Process Models, 2014

Multi task Gaussian process regression

Background:

Predicting sales across different items/stores using a multi-task GP model with shared parameters is fairly limited. Gaussian Process Regression Networks (GPRN) should help in modelling correlations between items/stores.

Test data: Walmart umbrella competition.

To read:

Multi output GP - shared covariance function on input-dependent features

Correlated outputs via a linear combination of GPs (fixed W coefficient matrix for combining multiple GPs)

GPRN - Correlations between multiple outputs are adaptive. Every cell of the W matrix used for mixing multiple GPs is itself a Gaussian process.

Convolved GPs - Convolving kernel functions for multiple outputs with Gaussian white noise allows for obtaining a positive semidefinite covariance matrix

Variational inference + Stochastic variational inference

Other multi output algorithms for non GP models

Implementation and Application of the Curds and Whey Algorithm to Regression Problems

Non-snapshot release to maven central

As we actually plan to start using bayes-scala in production we would like to see a fixed release.

Publish bayes-scala on the Maven central repository

Currently the only way of adding bayes-scala as a dependency is to copy the source or a compiled .jar into a project's lib directory. Am I correct? It would be really nice if you could just add it to libraryDependencies or to a Maven build configuration file.

Update scalalogging

Since scalalogging is no longer developed (last commit 7 months ago) and is not available for Scala 2.11.x, I suggest dropping it and migrating to plain slf4j.

EM for continuous bayesian network

Hi,

I'm working on a continuous Bayesian network with hidden variables. Can EM in bayes-scala handle this case?

Thanks!
Kaiyang

A (more) fluent API?

I have started a proof of concept for a more fluent API, and I have already rewritten the test networks using it: SprinklerBN, StudentBN and even TennisDBN.

It is a first draft and does not support everything (e.g. it handles only discrete factors), but I wanted to get some early feedback. I clearly need to understand bayes-scala better before claiming it is even an improvement.

What's your opinion?

Do not force a logging implementation

The library should declare slf4j-log4j12 in test scope only, so that users can choose their own logging framework (slf4j binding).

A more scala-esque interface to the classes

I'm browsing through the examples and wondering why the data structures are built in an imperative manner rather than a declarative one. Is there a particular reason for this?

Type of features

Hi,
I see that you can define the graphical model based on tables of probabilities of variables (CPD) for Bayes-Scala. Is it possible to define factors with discriminative features, with arbitrary feature functions? For example a factor with the following potential functions:

and the overall distribution will be of the following form:

In fact, I am not sure, implementation-wise, how different this is from your examples (like when you have the factors as proper probability distributions)

Thanks

Sepsets containing a single variable

I was having an issue with the common benchmarking Bayesian network "B" and so decided to test this with a much smaller network, commonly used when learning what a Bayesian network is (Grass Wet).

The code I am using to represent this network is shown below:

  var loopyBP: LoopyBP = _

  def loadNetwork() {
    var rain = Var(1, 2)
    var sprinkler = Var(2, 2)
    var grasswet = Var(3, 2)

    var rainFac = Factor(Array(rain), Array(0.3, 0.7))
    var sprinklerFac = Factor(Array(sprinkler, rain), Array(0.01, 0.99, 0.7, 0.3))
    var grasswetFac = Factor(Array(grasswet, sprinkler, rain), Array(1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0))

    var clusterGraph = GenericClusterGraph()
    clusterGraph.addCluster(1, rainFac)
    clusterGraph.addCluster(2, sprinklerFac)
    clusterGraph.addCluster(3, grasswetFac)

    clusterGraph.addEdges((1, 2), (1, 3), (2, 3))

    loopyBP = LoopyBP(clusterGraph)
    loopyBP.calibrate()
  }

However when adding the edges the following error is obtained:

"requirement failed: Sepset must contain single variable only"

Which can be found within this code:

  private def calcSepsetVariable(cluster1: Cluster, cluster2: Cluster): Var = {
    val intersectVariables = cluster1.getFactor().getVariables().intersect(cluster2.getFactor().getVariables())
    require(intersectVariables.size == 1, "Sepset must contain single variable only")
    val intersectVariable = intersectVariables.head

    intersectVariable
  }

As far as I can tell, this check rejects the network as invalid, even though it should be fine as a Bayesian network.

Am I making a mistake, or does Bayes-Scala not yet support this?
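
To see which edge trips the check, the variable intersection can be computed for each pair of clusters in the example above. The sketch below uses hypothetical stand-in types (`V`, `Cluster`) rather than the library's `Var` and `Cluster` classes, and only reproduces the `intersect` step from `calcSepsetVariable`.

```scala
object SepsetCheck {
  // Hypothetical stand-ins for the library's variable and cluster types.
  final case class V(id: Int, dim: Int)
  final case class Cluster(id: Int, variables: Seq[V])

  // Variables shared by two clusters (the candidate sepset).
  def sepset(a: Cluster, b: Cluster): Seq[V] = a.variables.intersect(b.variables)

  val rain = V(1, 2)
  val sprinkler = V(2, 2)
  val grassWet = V(3, 2)

  // Same cluster scopes as the factors in the issue.
  val clusters: Map[Int, Cluster] = Seq(
    Cluster(1, Seq(rain)),
    Cluster(2, Seq(sprinkler, rain)),
    Cluster(3, Seq(grassWet, sprinkler, rain))
  ).map(c => c.id -> c).toMap

  val edges: Seq[(Int, Int)] = Seq((1, 2), (1, 3), (2, 3))

  // Number of shared variables for every edge.
  def sepsetSizes: Map[(Int, Int), Int] =
    edges.map { case (i, j) => (i, j) -> sepset(clusters(i), clusters(j)).size }.toMap
}
```

Edges (1, 2) and (1, 3) share only `rain`, but edge (2, 3) shares both `sprinkler` and `rain`, so the `require(intersectVariables.size == 1, ...)` check fails on that edge.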

deploy bayes-scala for Scala 2.11 to snapshot maven repo

Continuation of #12
We are on Scala 2.11, and the cross-build is already set up, so I suppose this would not be too hard?

Unsetting the value on a Categorical

Currently setValue takes an Int and wraps it in a Some, so there is no way to go back to None.
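
One possible shape for the missing operation is sketched below. `ObservableVar` and `clearValue` are hypothetical names used for illustration only; they are not the library's `Categorical` API.

```scala
object CategoricalValueSketch {
  // Hypothetical stand-in for a variable holding an optional observed value.
  final class ObservableVar {
    private var value: Option[Int] = None

    // Mirrors the behaviour described in the issue: the Int is wrapped in Some.
    def setValue(v: Int): Unit = { value = Some(v) }

    // The operation the issue asks for: return to the unobserved state.
    def clearValue(): Unit = { value = None }

    def getValue: Option[Int] = value
  }
}
```

An alternative with the same effect would be a setter that accepts an `Option[Int]` directly, at the cost of changing the existing signature.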

truncate function of Gaussian does not seem to work

I use the truncate function of Gaussian in bayes-scala, but it does not seem to work.
The code:

  val truncGaussian = dk.bayes.math.gaussian.Gaussian(0.5, 1).truncate(0.0, true)
  for (i <- 0 until 300000) {
    val d = truncGaussian.draw
    println(d)
  }

I plotted a histogram of the samples drawn from truncGaussian, and it looks like a Gaussian distribution with no truncation.
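
For comparison, samples that are guaranteed to respect a lower bound can be drawn by rejection sampling. This is a minimal sketch that does not use bayes-scala at all; `drawAbove` is a hypothetical helper, and it assumes that the intent of `truncate(0.0, true)` is to keep only values above 0.0 and that the second argument of `Gaussian(0.5, 1)` is the variance.

```scala
import scala.util.Random

object TruncatedGaussianSketch {
  // Draws from N(mean, variance) restricted to x > lower by rejection sampling:
  // sample from the untruncated Gaussian and retry until the bound is satisfied.
  def drawAbove(mean: Double, variance: Double, lower: Double, rng: Random): Double = {
    val sd = math.sqrt(variance)
    var x = mean + sd * rng.nextGaussian()
    while (x <= lower) x = mean + sd * rng.nextGaussian()
    x
  }
}
```

A histogram of `drawAbove(0.5, 1.0, 0.0, rng)` has no mass at or below 0.0, which gives a reference shape to compare against. Rejection sampling is only practical when the bound is not far into the upper tail, since otherwise most draws are discarded.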

replace isIdentical/hasUncountable with isClosy/any from breeze

look here: scalanlp/breeze#460

replace inv(K) with invchol(cholesky(K).t) for Hermitian psd matrices (covariance matrix)

so that numerical stability is improved.
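
The idea behind the proposal can be illustrated without any matrix library: factor K = L Lᵀ once, then solve K x = b by forward and backward substitution instead of forming inv(K) explicitly. The sketch below is plain Scala written for illustration; it is not the breeze API and not the code used in bayes-scala, and `CholeskySolveSketch` is a hypothetical name.

```scala
object CholeskySolveSketch {
  type Mat = Array[Array[Double]]

  // Lower-triangular L with K = L * L^T; K must be symmetric positive definite.
  def cholesky(k: Mat): Mat = {
    val n = k.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(p => l(i)(p) * l(j)(p)).sum
      if (i == j) {
        val d = k(i)(i) - s
        require(d > 0.0, "matrix is not positive definite")
        l(i)(j) = math.sqrt(d)
      } else {
        l(i)(j) = (k(i)(j) - s) / l(j)(j)
      }
    }
    l
  }

  // Solves K x = b via L y = b (forward) and L^T x = y (backward).
  def solve(k: Mat, b: Array[Double]): Array[Double] = {
    val l = cholesky(k)
    val n = b.length
    val y = new Array[Double](n)
    for (i <- 0 until n) {
      val s = (0 until i).map(p => l(i)(p) * y(p)).sum
      y(i) = (b(i) - s) / l(i)(i)
    }
    val x = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(p => l(p)(i) * x(p)).sum
      x(i) = (y(i) - s) / l(i)(i)
    }
    x
  }
}
```

For covariance matrices, the `require` in `cholesky` also acts as an early check that the matrix is positive definite, which a general-purpose inverse would not report.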

Use sqDist from bayes-scala-gp for covariance functions

performance improvement

Package structure refactor

New structure:
dk.bayes.alg - low-level Bayesian algorithms
dk.bayes.dsl - high-level language for creating Bayesian networks
dk.bayes.math - various math utilities, classes, ...

Replace ejml with breeze

Why?

Simplify the code by using a single matrix library instead of two.
Breeze with native extensions seems to be faster than EJML.
Breeze is a Scala lib, EJML is a Java lib.
Promote breeze, it's a nice lib.

Implement new factor graph

Requirements:

compute variable marginals and factor-to-variable messages only
allow direct access to variables and factors in order to implement custom message-passing algorithms

Use cases for testing:

Gaussian Process Classification
Hierarchical Gaussian Process Classification

how to use bayes-scala as library

Hi, Daniel!

First of all, congratulations on your project. I have been learning Scala because I have to use it for my Master's thesis. I wonder, is it possible to use "bayes-scala" as a "jar" library via Maven or sbt?

Thanks!
Iván

Scala 2.12 cross build

Would you mind releasing a cross-built version for Scala 2.12?
http://www.scala-sbt.org/0.13/docs/Cross-Build.html

Bayesian Network Issue

Hi there,

I'm currently trying to create a few Bayesian networks for some benchmarking I am doing. However I have run into an issue that I have been unable to resolve. My setup is described below.

I create my variables like so:

    //Create variables
    alcoholism = Var(1, 2)
    vh_amn = Var(2, 2)
    hepatotoxic = Var(3, 2)
    THepatitis = Var(4, 2)
    hospital = Var(5, 2)
    ...

Then create the factors:

    gallstonesFac = Factor(Array(gallstones), Array(0.15307582, 0.84692418))
    choledocholithotomyFac = Factor(Array(gallstones, choledocholithotomy), Array(0.03716216, 0.96283784, 0.71028037, 0.28971963))
    ...

Create a cluster graph:

    //Create ClusterGraph
    clusterGraph = GenericClusterGraph()
    clusterGraph.addCluster(1, alcoholismFac)
    clusterGraph.addCluster(2, vh_amnFac)
    ...

Then finally add the edges and propagate the values with the calibrate function:

    //Add edges between clusters in a cluster graph
    clusterGraph.addEdges((3, 4), ...)

    //Calibrate cluster graph
    loopyBP = LoopyBP(clusterGraph)
    loopyBP.calibrate()

However when creating the factors there is an issue I am having that I do not understand.

When I create this Factor:

ChHepatitisFac = Factor(Array(transfusion, ChHepatitis, injections, vh_amn), Array(0.13095238...

I get the error "java.lang.IllegalArgumentException: requirement failed: Number of potential values must equal to a product of variable dimensions".

So I traced through the code to find where this issue is coming from. Copying the same code to a local version for debugging, I ended up with this:

    val stepSizes: Array[Int] = calcStepSizes(variables)
    val dimProduct = stepSizes(0) * variables(0).dim
    require(dimProduct == values.size, "Number of potential values must equal to a product of variable dimensions")

    def getVariables(): Array[Var] = variables
    def getValues(): Array[Double] = values

  def calcStepSizes(variables: Array[Var]): Array[Int] = {  // Pass in all variables, returns array

    val varNum = variables.size                             // number of vars

    val stepSizes = new Array[Int](varNum)                  // return value equal to size of above

    if (varNum == 1) stepSizes(0) = 1                       // if only 1 variable, set return value to 1
    else {                                                  // else

      var i = varNum - 1                                    // i = number of vars - 1
      var product = 1                                       // p = 1
      while (i >= 0) {                                      // while i >= 0, looping through vars backwards
        stepSizes(i) = product                              // return value (i) = p
        product *= variables(i).dim                         // p = p * (number of states inside variable(i))
        i -= 1                                              // i = i - 1
      }

    }

    stepSizes                                               // return the computed step sizes
  }

From what I understand, a value is computed from the variables' dimensions, and the check requires the number of supplied values to equal that value multiplied by the number of states of the first variable passed into the Factor.
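
Read this way, `stepSizes(0) * variables(0).dim` is simply the product of all variable dimensions, which is what the error message states. The sketch below re-implements the calculation with a hypothetical stand-in type `V` (not the library's `Var`) to make that visible.

```scala
object FactorSizeSketch {
  // Hypothetical stand-in for the library's Var(id, dim).
  final case class V(id: Int, dim: Int)

  // Stride of each variable in a flat value table where the last variable
  // changes fastest; equivalent to the backwards while loop quoted above.
  def stepSizes(vars: Seq[V]): Seq[Int] =
    vars.scanRight(1)((v, acc) => acc * v.dim).tail

  // Number of values a factor over these variables must supply.
  def expectedValueCount(vars: Seq[V]): Int = vars.map(_.dim).product
}
```

For example, a factor over four variables needs 2 * 2 * 2 * 2 = 16 values if all four were declared with two states. If the supplied array has a different length, either one of the variables was declared with a different `dim` than the source network uses, or the value table is incomplete.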

However, I do not understand why this check exists. What is it validating? The network I am trying to create in Bayes-Scala is taken from an example shipped with GeNIe (Hepar II), and it has already been converted and tested in multiple other programs, so I know the data and variables should match up. When I test a smaller and simpler network (Asia), this works fine.

If I do not change the number of variables and do not change the amount of data, then the only other two possible changes are to change the number of states (which I may have misunderstood as to what that value is) and to change which is the first variable passed into the factor.

I have tried passing in each variable as the first into the factor and none succeed.

By the number of states I mean the number of values a variable can take, for example "Present, Absent".

If you can advise on what I may be doing wrong or if there is a limitation with Bayes-Scala that would be great. I was unsure where the best place to post about this issue would be.

Thanks,
Harry