danielkorzekwa / bayes-scala
Bayesian Networks in Scala
License: Other
Command line to reproduce the issue:
mvn -Dtest=CanonicalGaussianTest#constructor_linear_gaussian test
Results :
Failed tests: constructor_linear_gaussian(dk.bayes.math.gaussian.CanonicalGaussianTest): expected:<-1.828012123484645[4]> but was:<-1.828012123484645[1]>
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
The test is concise so there is an obvious fix.
def apply(a: Matrix, b: Double, v: Double): CanonicalGaussian = apply(a,Matrix(b),Matrix(v))
Is there a reason for not doing so? Performance? Better numerical stability?
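The two values in the failure reported above differ only in the last printed digit, which is consistent with a floating-point rounding difference. A minimal sketch of a tolerance-based comparison, independent of bayes-scala; the helper name `approxEqual` and the default tolerance are illustrative assumptions, not part of the project's test suite:

```scala
object ToleranceCheck {
  // Relative comparison: true when the two values agree to within relTol
  // of the larger magnitude. The default tolerance is an arbitrary choice.
  def approxEqual(expected: Double, actual: Double, relTol: Double = 1e-12): Boolean =
    math.abs(expected - actual) <= relTol * math.max(math.abs(expected), math.abs(actual))

  def main(args: Array[String]): Unit = {
    // Values copied from the test failure message.
    val expected = -1.8280121234846454
    val actual   = -1.8280121234846451
    println(expected == actual)            // exact comparison
    println(approxEqual(expected, actual)) // tolerance-based comparison
  }
}
```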
Hi Daniel,
I've finally managed to create a visualization for Bayesian Networks constructed from Categoricals.
Check out the README of the Gist here:
https://gist.github.com/nightscape/c2fcccac859b3ae34c99#file-readme-md
Could you check if it runs on your machine?
If so we can think about how to maybe integrate this into bayes-scala :)
Best and thanks again for your help!
Martin
It's free for open source projects
Nguyen et al. Automated Variational Inference for Gaussian Process Models, 2014
Background:
Predicting sales across different items/stores using a multi-task GP model with shared parameters is fairly limited. Gaussian Process Regression Networks (GPRN) should help in modelling correlations between items/stores.
Test data: Walmart umbrella competition.
To read:
Multi output GP - shared covariance function on input-dependent features
Correlated outputs via a linear combination of GPs (fixed W coefficient matrix for combining multiple GPs)
GPRN - Correlations between multiple outputs are adaptive. Every cell of the W matrix for mixing multiple GPs is a Gaussian Process itself.
Convolved GPs - Convolving kernel functions for multiple outputs with Gaussian white noise allows for obtaining a positive semidefinite covariance matrix
Variational inference + Stochastic variational inference
Other multi output algorithms for non GP models
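The difference between the fixed-W and GPRN entries in the list above can be written compactly. This is a simplified sketch with noise terms omitted; the symbols y, W, f, k_f and k_w are notation chosen here, not taken from the issue:

```latex
% Fixed mixing: outputs are a linear combination of latent GPs
\mathbf{y}(x) = W\,\mathbf{f}(x), \qquad f_j \sim \mathcal{GP}(0, k_f)

% GPRN: each mixing weight is itself a GP, so output correlations vary with x
\mathbf{y}(x) = W(x)\,\mathbf{f}(x), \qquad W_{ij} \sim \mathcal{GP}(0, k_w)
```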
As we actually plan to start using bayes-scala in production we would like to see a fixed release.
Currently the only way of adding bayes-scala as a dependency is copying the source or a compiled .jar to a project's lib directory, am I correct? It would be really nice if you could just add it to libraryDependencies or a Maven build configuration file.
Since scalalogging is no longer being developed (last commit 7 months ago) and is not available for Scala 2.11.x, I suggest you drop it and migrate to plain slf4j.
Hi,
I'm working on a continuous Bayesian network with hidden variables. Can EM in bayes-scala handle this case?
Thanks!
Kaiyang
I have started a proof of concept for a more fluent API. And I have already rewritten the test networks using this API : SprinklerBN, StudentBN and even TennisDBN.
It is a first draft and it does not support everything (e.g. only discrete factors), but I wanted to get some early feedback. I clearly need to understand bayes-scala better before claiming it is even an improvement.
What's your opinion?
The library should have slf4j-log4j12 in test scope so that users can choose their own logging framework (slf4j binding).
I'm browsing through the examples and wondering why the data structures are built in an imperative manner as opposed to declarative? Is there a particular reason for this to be the case?
Hi,
I see that you can define the graphical model based on tables of probabilities of variables (CPD) for Bayes-Scala. Is it possible to define factors with discriminative features, with arbitrary feature functions? For example a factor with the following potential functions:
and the overall distribution will be of the following form:
In fact, I am not sure, implementation-wise, how different this is from your examples (like when you have the factors as proper probability distributions)
Thanks
I was having an issue with the common benchmarking Bayesian network "B" and so decided to test this with a much smaller network, commonly used when learning what a Bayesian network is (Grass Wet).
The code I am using to represent this network is shown below:
var loopyBP: LoopyBP = _
def loadNetwork() {
  var rain = Var(1, 2)
  var sprinkler = Var(2, 2)
  var grasswet = Var(3, 2)
  var rainFac = Factor(Array(rain), Array(0.3, 0.7))
  var sprinklerFac = Factor(Array(sprinkler, rain), Array(0.01, 0.99, 0.7, 0.3))
  var grasswetFac = Factor(Array(grasswet, sprinkler, rain), Array(1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0))
  var clusterGraph = GenericClusterGraph()
  clusterGraph.addCluster(1, rainFac)
  clusterGraph.addCluster(2, sprinklerFac)
  clusterGraph.addCluster(3, grasswetFac)
  clusterGraph.addEdges((1, 2), (1, 3), (2, 3))
  loopyBP = LoopyBP(clusterGraph)
  loopyBP.calibrate()
}
However when adding the edges the following error is obtained:
"requirement failed: Sepset must contain single variable only"
Which can be found within this code:
private def calcSepsetVariable(cluster1: Cluster, cluster2: Cluster): Var = {
  val intersectVariables = cluster1.getFactor().getVariables().intersect(cluster2.getFactor().getVariables())
  require(intersectVariables.size == 1, "Sepset must contain single variable only")
  val intersectVariable = intersectVariables.head
  intersectVariable
}
As far as I can tell, this check rejects the network as invalid, even though as a Bayesian network it should be fine.
I am wondering if there is some sort of mistake I am making or if Bayes-Scala does not yet support this?
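To see which edge triggers the check, the intersection that `calcSepsetVariable` computes can be reproduced with plain sets. This is a minimal sketch that does not use bayes-scala; the object and method names are hypothetical, and the variable ids are copied from the `Var(...)` calls in the example:

```scala
object SepsetCheck {
  // Variable ids per cluster, copied from the Grass Wet example:
  // rain = 1, sprinkler = 2, grasswet = 3.
  val clusters: Map[Int, Set[Int]] = Map(
    1 -> Set(1),       // rainFac(rain)
    2 -> Set(2, 1),    // sprinklerFac(sprinkler, rain)
    3 -> Set(3, 2, 1)  // grasswetFac(grasswet, sprinkler, rain)
  )

  // Variables shared by two clusters, i.e. the candidate sepset.
  def sepset(c1: Int, c2: Int): Set[Int] = clusters(c1).intersect(clusters(c2))

  def main(args: Array[String]): Unit =
    for ((a, b) <- Seq((1, 2), (1, 3), (2, 3)))
      println(s"edge ($a, $b): shared variables = ${sepset(a, b)}")
}
```

The edge (2, 3) shares two variables (sprinkler and rain), which is what fails the `require`; the other two edges share only rain.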
Continuation of #12
We are on scala 2.11 and the cross-build is already set up so I suppose this would not be too hard?
Currently setValue takes an Int and wraps it in a Some, so there is no way to go back to None.
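One possible shape for the change is an overload that accepts an `Option[Int]`. This is a sketch only: `EvidenceVar` is a hypothetical stand-in, not the bayes-scala class, and the overload is an assumption about what a fix might look like:

```scala
// Hypothetical stand-in for a variable holding optional evidence;
// not the bayes-scala class.
class EvidenceVar {
  private var value: Option[Int] = None

  // Existing style described in the issue: always wraps the value in Some.
  def setValue(v: Int): Unit = { value = Some(v) }

  // Assumed addition: passing None clears a previously set value.
  def setValue(v: Option[Int]): Unit = { value = v }

  def getValue: Option[Int] = value
}
```

With this overload, `setValue(None)` resets the variable while existing `setValue(1)` calls keep working.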
I use the truncate function of Gaussian in bayes-scala, but it does not seem to work.
the code:
val truncGaussian = dk.bayes.math.gaussian.Gaussian(0.5, 1).truncate(0.0, true)
for (i <- 0 until 300000) {
  val d = truncGaussian.draw
  println(d)
}
I plotted a histogram of the samples drawn from truncGaussian, and it is a Gaussian distribution with no truncation.
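For comparison, the behaviour one would expect from a truncated draw can be sketched with simple rejection sampling. This is not the bayes-scala or breeze implementation; it assumes that `Gaussian(0.5, 1)` means mean 0.5 and variance 1, and that `truncate(0.0, true)` is meant to keep only values above 0. The object and method names are hypothetical:

```scala
import scala.util.Random

object TruncatedGaussianSketch {
  // Draw from N(mean, variance) restricted to values above `lower`
  // by redrawing until a sample falls in the allowed region.
  def drawLowerTruncated(mean: Double, variance: Double, lower: Double, rng: Random): Double = {
    val sd = math.sqrt(variance)
    var x = mean + sd * rng.nextGaussian()
    while (x <= lower) x = mean + sd * rng.nextGaussian()
    x
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val samples = Seq.fill(10000)(drawLowerTruncated(0.5, 1.0, 0.0, rng))
    // Every sample lies above the truncation point by construction.
    println(s"min sample = ${samples.min}")
  }
}
```

A histogram of these samples has no mass at or below 0, which is the shape the draws from `truncGaussian` would need to match.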
look here: scalanlp/breeze#460
so that numerical stability is improved.
performance improvement
New structure:
dk.bayes.alg - low level bayesian algorithms
dk.bayes.dsl - high level language for creating bayesian networks
dk.bayes.math - Various math utils, classes,...
Why?
Requirements:
Use cases for testing:
Hi, Daniel!
First of all, congratulations on your project. I am learning Scala because I have to use it for my Master's thesis. I wonder, is it possible to use "bayes-scala" as a "jar" library via Maven or sbt?
Thanks!
Iván
Would you mind releasing a cross-version build for Scala 2.12?
http://www.scala-sbt.org/0.13/docs/Cross-Build.html
Hi there,
I'm currently trying to create a few Bayesian networks for some benchmarking I am doing. However I have run into an issue that I have been unable to resolve. My setup is described below.
I create my variables like so:
//Create variables
alcoholism = Var(1, 2)
vh_amn = Var(2, 2)
hepatotoxic = Var(3, 2)
THepatitis = Var(4, 2)
hospital = Var(5, 2)
...
Then create the factors:
gallstonesFac = Factor(Array(gallstones), Array(0.15307582, 0.84692418))
choledocholithotomyFac = Factor(Array(gallstones, choledocholithotomy), Array(0.03716216, 0.96283784, 0.71028037, 0.28971963))
...
Create a cluster graph:
//Create ClusterGraph
clusterGraph = GenericClusterGraph()
clusterGraph.addCluster(1, alcoholismFac)
clusterGraph.addCluster(2, vh_amnFac)
...
Then finally add the edges and propagate the values with the calibrate function:
//Add edges between clusters in a cluster graph
clusterGraph.addEdges((3, 4), ...)
//Calibrate cluster graph
loopyBP = LoopyBP(clusterGraph)
loopyBP.calibrate()
However when creating the factors there is an issue I am having that I do not understand.
When I create this Factor:
ChHepatitisFac = Factor(Array(transfusion, ChHepatitis, injections, vh_amn), Array(0.13095238...
I get the error "java.lang.IllegalArgumentException: requirement failed: Number of potential values must equal to a product of variable dimensions".
So I traced through the code to find where this issue is coming from. Copying the same code to a local version for debugging, I ended up with this:
val stepSizes: Array[Int] = calcStepSizes(variables)
val dimProduct = stepSizes(0) * variables(0).dim
require(dimProduct == values.size, "Number of potential values must equal to a product of variable dimensions")
def getVariables(): Array[Var] = variables
def getValues(): Array[Double] = values
def calcStepSizes(variables: Array[Var]): Array[Int] = { // Pass in all variables, returns array
  val varNum = variables.size                    // number of vars
  val stepSizes = new Array[Int](varNum)         // return value equal to size of above
  if (varNum == 1) stepSizes(0) = 1              // if only 1 variable, set return value to 1
  else {                                         // else
    var i = varNum - 1                           // i = number of vars - 1
    var product = 1                              // p = 1
    while (i >= 0) {                             // while i >= 0, looping through vars backwards
      stepSizes(i) = product                     // return value (i) = p
      product *= variables(i).dim                // p = p * (number of states inside variable(i))
      i -= 1                                     // i = i - 1
    }
  }
  stepSizes                                      // return the computed step sizes
}
From what I understand a value is produced based on the number of variables and it tries to match the amount of data with this value produced multiplied by the number of states in the first Variable passed into the Factor.
However, I do not understand what this code is for. What is it validating? The network I am trying to create in Bayes-Scala is taken from an example shipped with GeNIe (Hepar II), and has already been converted and tested in multiple other programs, so I know the data and variables should match up. When I test a smaller and simpler network (Asia) this works fine.
If I do not change the number of variables and do not change the amount of data, then the only other two possible changes are to change the number of states (which I may have misunderstood as to what that value is) and to change which is the first variable passed into the factor.
I have tried passing in each variable as the first into the factor and none succeed.
By the number of states I mean the possible values of a variable, such as "Present, Absent".
If you can advise on what I may be doing wrong or if there is a limitation with Bayes-Scala that would be great. I was unsure where the best place to post about this issue would be.
Thanks,
Harry
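The check in the traced code compares the length of the value array with the product of the variables' dimensions, that is, one value per joint assignment. A minimal sketch of that arithmetic, independent of bayes-scala (the object and method names are hypothetical):

```scala
object FactorSizeCheck {
  // Number of potential values a factor needs: one per joint assignment,
  // i.e. the product of the variables' dimensions.
  def expectedValues(dims: Seq[Int]): Int = dims.product

  // Step sizes equivalent to the loop in calcStepSizes: the last variable
  // varies fastest, so its step is 1.
  def stepSizes(dims: Seq[Int]): Seq[Int] = dims.scanRight(1)(_ * _).tail

  def main(args: Array[String]): Unit = {
    // Four binary variables, as declared with Var(id, 2) in the issue.
    val dims = Seq(2, 2, 2, 2)
    println(s"step sizes = ${stepSizes(dims)}")
    println(s"expected number of values = ${expectedValues(dims)}")
  }
}
```

Since `stepSizes(0) * dims(0)` equals the full product, the order of the variables does not change the required count. If any variable in the factor has more than two states in the source network, its `Var` needs that dimension and the value array must grow to match.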
netlib speeds up breeze for big md and mm operations only.
If you want people to use this, you should release the project on Maven Central.
I also suggest a cross-Scala-version sbt build for 2.10 and 2.11.
this guide should help:
http://www.scala-sbt.org/0.13/docs/Using-Sonatype.html
or
http://www.cakesolutions.net/teamblogs/publishing-artefacts-to-oss-sonatype-nexus-using-sbt-and-travis-ci
Migrate a private Walmart repo to bayes-scala and write a tutorial page.
https://www.kaggle.com/c/walmart-recruiting-sales-in-stormy-weather
Do it after the competition is completed.