renatogeh / gospn

A free, open-source inference and learning library for Sum-Product Networks (SPN)

License: BSD 3-Clause "New" or "Revised" License

Go 95.26% Shell 4.06% C 0.67%

spn deep-learning graph pgm statistics probability golang image-reconstruction natural-language-processing classification

gospn's Issues

Should the project have its own website?

I believe that the project should have its own website, preferably a simple one built with github pages. I think it's a good way to register activities, improvements and research on the field.

I would be happy to make this happen if approved :)

Implement language modeling

Description: Implement a language modeling SPN according to the paper Language Modeling with Sum-Product Networks (Cheng et al) INTERSPEECH 2014.

File: models/language.go

References:

http://spn.cs.washington.edu/papers/is14.pdf

OPTICS (Ordering Points To Identify the Clustering Structure) is a clustering algorithm similar to DBSCAN. DBSCAN's major weakness is density tuning. OPTICS attempts to address this issue by ordering points and choosing the best epsilon.

We currently have an incomplete OPTICS implementation at utils/cluster/optics.go. LearnSPN relies heavily on both clustering and variable independence, and having OPTICS should increase its performance.

This isn't a priority though, as plenty of other, more interesting structure learning algorithms have sprung up recently.

Fix Gens-Domingos complexity analysis

Description: My Gens-Domingos complexity analysis is wrong. It currently only takes into account the current iteration of clustering-independencies. It omits the recursive calls to each child it creates.

File: doc/analysis/analysis.tex

Add a GaussianMixture node

Description: A mixture of k Gaussians can be represented as a sum node with k Gaussians as children. Learning the weights would then be done with EM.

File: spn/gausmix.go

Improve digits dataset documentation

Description: Our custom digits dataset has no proper documentation. We should add more information about the dataset, following the example in [1]. Another idea is to move the dataset to a separate repository and reference that repository in the GoSPN docs. In that case, we should also remove all other datasets (i.e. caltech and olivetti) from this repository.

File: data/digits/README.md

References:

[1] https://www.eol.ucar.edu/projects/trex/dm/documents/data_doc.html

Implement the Vergari-diMauro structure learning algorithm

Description: Implement the Vergari-diMauro structure learning algorithm from the paper Simplifying, Regularizing and Strengthening Sum-Product Network Structure Learning. ECMLPKDD 2015.

File: learn/vergari.go

References:

http://www.di.uniba.it/~vergari/papers/Simplifying,%20Regularizing%20and%20Strengthening%20Sum-Product%20Network%20Structure%20Learning.pdf

Add CSV dataset support

Description: GoSPN only supports the custom .data format and the .arff dataset format. We plan on adding .csv support.

File: io/csv.go

Implement DeriveWeights

Description: DeriveWeights should compute the derivative dS/dW, where W is the multiset of weights in S. It should allow the user to choose a data structure to perform the graph search (e.g. DFS with a stack, BFS with a queue). It also should use Storer as a DP table.

File: learn/derive.go
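
The sketch below illustrates the weight derivative on a tiny network. It relies on the standard identity dS/dw_ij = dS/dS_i * S_j for a sum node i with child j, walks the graph with an explicit stack (DFS), and keeps values and derivatives in maps that stand in for a DP table. The Node type and this DeriveWeights signature are hypothetical, and the code assumes a tree-shaped SPN so that a node's derivative is final by the time it is popped; a DAG would need a topological order instead.

```go
package main

import "fmt"

// Node is a hypothetical minimal SPN node (not GoSPN's API).
// Weights present: sum node. Children but no weights: product node. Else: leaf.
type Node struct {
	Value    float64 // leaf value under the current evidence
	Children []*Node
	Weights  []float64
}

// eval computes node values bottom-up, memoizing them in table.
func eval(n *Node, table map[*Node]float64) float64 {
	if v, ok := table[n]; ok {
		return v
	}
	v := n.Value
	switch {
	case len(n.Weights) > 0:
		v = 0
		for j, c := range n.Children {
			v += n.Weights[j] * eval(c, table)
		}
	case len(n.Children) > 0:
		v = 1
		for _, c := range n.Children {
			v *= eval(c, table)
		}
	}
	table[n] = v
	return v
}

// DeriveWeights returns dS/dw_ij for each sum node i, indexed by child j.
// Assumes a tree-shaped SPN (every node has exactly one parent).
func DeriveWeights(root *Node) map[*Node][]float64 {
	val := map[*Node]float64{}
	eval(root, val)
	der := map[*Node]float64{root: 1} // dS/dS_root = 1
	grad := map[*Node][]float64{}
	stack := []*Node{root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		isSum := len(n.Weights) > 0
		if isSum {
			grad[n] = make([]float64, len(n.Children))
		}
		for j, c := range n.Children {
			if isSum {
				grad[n][j] = der[n] * val[c]   // dS/dw_ij = dS/dS_i * S_j
				der[c] = n.Weights[j] * der[n] // dS/dS_j = w_ij * dS/dS_i
			} else {
				p := 1.0 // product of the siblings' values
				for k, s := range n.Children {
					if k != j {
						p *= val[s]
					}
				}
				der[c] = der[n] * p
			}
			stack = append(stack, c)
		}
	}
	return grad
}

func main() {
	inner := &Node{Weights: []float64{0.6, 0.4}, Children: []*Node{{Value: 0.5}, {Value: 0.25}}}
	p1 := &Node{Children: []*Node{{Value: 0.5}, inner}}
	p2 := &Node{Children: []*Node{{Value: 0.2}, {Value: 0.4}}}
	root := &Node{Weights: []float64{0.3, 0.7}, Children: []*Node{p1, p2}}
	g := DeriveWeights(root)
	fmt.Println("root weight gradients:", g[root])
	fmt.Println("inner weight gradients:", g[inner])
}
```

Swapping the slice-based stack for a queue would turn the traversal into BFS without changing the result on a tree.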

Test coverage

GoSPN currently has little to no test coverage across its modules. Go has a native unit test library, which we should use.

Though marked as good first issue, we obviously do not expect a single PR with all changes. Instead, please send PRs addressing one checkbox at a time.

The following is our goal coverage list:

app module:
- Classification: should achieve at least a certain percentage of accuracy in an artificial dataset (e.g. DigitsX) (app/image.go)
- Completion: should not pass a certain mean square error threshold of the original image's crop (app/image.go)
common module:
- HSV to RGB (common/color.go)
- ApproxEqual (common/equal.go)
- Queue tests (common/queue.go)
  - Enqueue
  - Dequeue
  - Stress test
- Stack tests (common/stack.go)
  - Enqueue
  - Dequeue
  - Stress test
conc module:
- SingleQueue tests on race condition (conc/queue.go)
data module:
- Dataset manipulation (data/manipulate.go)
  - Cascade Rounding
  - ExtractLabels
  - Partition
  - PartitionByLabels
  - Split
  - Copy
  - Shuffle
  - Join
  - SubtractLabel
  - SubtractVariable
  - Identical
  - MergeLabel
  - Divide
- Dataset download test
io module:
- ARFF dataset (io/arff.go)
  - ARFFToData
  - ParseArff
- HTTP (io/http.go)
  - DownloadFromURL
- Image I/O (io/images.go)
  - SplitHalf
- DATA dataset (io/input.go)
  - ParseData
  - ParseDataNL
  - ParseEvidence
  - ParsePartitionedData
- NPY dataset (io/npy.go)
  - ReadBalanced
  - ReadAll
  - Read
  - Reset
- Output (io/output.go)
  - DrawGraphTools
  - DrawGraph
- SPN I/O (io/spn.go)
  - SaveSPN
  - LoadSPN
learn module:
- SPN derivation (learn/derive.go)
  - DeriveSPN
  - DeriveWeights
  - DeriveWeightsBatch
  - DeriveApplyWeights
  - DeriveHard
- Discriminative gradient descent (learn/discriminative.go)
  - DiscriminativeGD
  - DiscriminativeHardGD
  - DiscriminativeBGD
  - DiscriminativeHardBGD
  - applyDGD
  - storeDGD
  - applyDGDFrom
  - applyHDGD
- Generative gradient descent (learn/generative.go)
  - GenerativeGD
  - GenerativeHardGD
  - GenerativeBGD
  - GenerativeHardBGD
  - applyFastGD
  - applyGD
  - applyFastHGD
  - applyHGD
- Variable and scope (learn/variable.go)
  - GobEncode
  - GobDecode
  - ExtractInstance
  - CompleteDataToMatrix
  - DataToMatrix
  - MatrixToData
  - ReflectScope
  - CopyScope
  - DataToVarData
- Dennis-Ventura clustering structure learning
- Gens-Domingos LearnSPN structure learning
- Poon-Domingos dense architecture
score module:
- Confusion matrix
- Score registration
spn module:
- Gaussian node (spn/gaussian.go)
- Product node (spn/product.go)
- Sum node (spn/sum.go)
- Indicator node (spn/indicator.go)
- Multinomial node (spn/multinom.go)
- Breadth and depth first search (spn/search.go)
- SPN serialization (spn/serial.go)
- Dynamic programming storing (spn/storer.go)
- Topological sorting (spn/topo.go)
- Exact inference bottom-up pass (spn/utils.go)
  - Inference
  - InferenceY
  - StoreInference
- Max-product algorithm (spn/utils.go)
  - StoreMAP
  - TraceMAP
- NormalizeSPN (spn/utils.go)
- ComputeHeight (spn/utils.go)
- ComputeScope (spn/utils.go)
- Complete (spn/utils.go)
- Decomposable (spn/utils.go)
sys module:
- RandComb (sys/rand.go)
utils module:
- Logarithmic operations (utils/log.go)
  - LogSumLog
  - LogSum
  - LogProd
  - LogSumPair
  - Trim
  - LogSumExp
  - LogSumExpPair
- Statistic functions (utils/stats.go)
  - Mean
  - StdDev
  - MuSigma
  - PartitionQuantiles
- UnionFind (utils/unionfind.go)
- Clustering algorithms
  - Auxiliary clustering functions (utils/cluster.go)
  - DBSCAN
  - K-means
  - K-medoid
  - K-mode
  - OPTICS
- Variable independence tests
  - Chi-square Pearson test
  - G-test
  - Independence graph

Implement the Dennis-Ventura structure learning algorithm

Description: Implement the Dennis-Ventura structure learning algorithm from the paper Learning the Architecture of Sum-Product Networks Using Clustering on Variables NIPS 25.

File: learn/dv.go

References:

http://papers.nips.cc/paper/4544-learning-the-architecture-of-sum-product-networks-using-clustering-on-variables.pdf

got NaN value

hi @RenatoGeh

I tried to run go main.go

I got the following error (sample):
...
Creating new leaf...
Sample size: 321, scope size: 1
Creating new leaf...
Sample size: 321, scope size: 1
Creating new leaf...
Testing instance 0. Should be classified as 8.
Pr(X=0|E) = antilog(NaN) = NaN
Pr(X=1|E) = antilog(NaN) = NaN
Pr(X=2|E) = antilog(NaN) = NaN
Pr(X=3|E) = antilog(NaN) = NaN
Pr(X=4|E) = antilog(NaN) = NaN
Pr(X=5|E) = antilog(NaN) = NaN
Pr(X=6|E) = antilog(NaN) = NaN
Pr(X=7|E) = antilog(NaN) = NaN
Pr(X=8|E) = antilog(NaN) = NaN
Pr(X=9|E) = antilog(NaN) = NaN
Pr(X=10|E) = antilog(NaN) = NaN
Pr(X=11|E) = antilog(NaN) = NaN
Pr(X=12|E) = antilog(NaN) = NaN
Pr(X=13|E) = antilog(NaN) = NaN
Pr(X=14|E) = antilog(NaN) = NaN
...

Is it OK to get NaN values?

Thanks
hcr

Implement generative learning

Description: Implement generative parameter learning.

File: learn/generative.go

References:

Tasks:

Gradient descent
Hard expectation-maximization

Implement discriminative learning

Description: Implement discriminative parameter learning.

File: learn/discriminative.go

References:

Implement the Poon-Domingos SPN structure

Description: Implement the Poon-Domingos SPN structure for local dependency as described in [1] and [2]. Rename learn/poon.go to model/poon.go, since it is not a learning algorithm but a model for images and any other dataset with local dependencies.

File: model/poon.go

References:

Implement hard and soft EM clustering for the Gens-Domingos algorithm.

Description: Our Gens implementation currently clusters instances with the DBSCAN and alternatively k-means clustering algorithms. We wish to add the EM clustering as cited in [1]. Additionally, we intend on implementing both hard and soft EM, as explained in the following quote extracted from [1]:

For soft EM, where instances can be fractionally assigned to clusters, T needs to be extended with a weight for each instance, and each instance is passed to each cluster it has nonzero weight in. However, this is considerably less efficient than hard EM, where each instance is wholly assigned to its most probable cluster, and we will use the latter method.

DBSCAN and k-means function like hard EM, assigning each instance to its most probable cluster.

Files: learn/gens.go, utils/cluster/em.go

References:

[1] http://spn.cs.washington.edu/papers/slspn.pdf

Why use DBSCAN?

Your work is really awesome!

EM is used in LearnSPN, but DBSCAN is used in this project. I wonder why. Does it have better performance?

Thank you.

Add continuous variables support

Description: Add continuous variable support for both dataset parsing and Gens-Domingos.

Files: io/arff.go, io/csv.go, learn/gens.go

TensorFlow integration

We seek to integrate TensorFlow with the goal of implementing RAT-SPNs as proposed in [Peharz et al 2018]. However, as of yet, only Python is covered by the API stability guarantees, meaning it's the only language with the API to build models. Having said that, the Go API is still capable of training the model and running inference.

This means that if we were to add TF integration, we would have to add a Python layer for model creation. Once the model has been created, the user would then be able to export it to GoSPN for learning, running inference or whatever else GoSPN may offer.

This is obviously not ideal, but until Google provides a full API for C++, C or Go, this is the best we can do. Suggestions on how to deal with this are welcome.

References

[Peharz et al 2018] - https://arxiv.org/pdf/1806.01910.pdf

Implement Fisher's Exact Test

Fisher's exact test is a contingency table test of variable independence, well suited to small datasets. GoSPN currently implements Pearson's Chi-Square test and the G-test, both of which rely on approximations that are quite poor for small sample sizes. Since LearnSPN recursively splits the data at each step, potentially shrinking samples at an exponential rate, an exact test is preferred.

There is, as of yet (and as far as I know), no implementation of the Fisher exact test for the general m x n contingency table in Go, C or C++. I have found implementations in R and Python, though I'm not sure the latter is correctly implemented (we should definitely implement unit tests here).

The test itself should be placed at utils/indep/fisher.go, and its unit test under utils/indep/fisher_test.go.

Implement DeriveSPN

Description: DeriveSPN should compute the derivative dS/dS_i of each node in an SPN. It should allow the user to choose a data structure to perform the graph search (e.g. DFS with a stack, BFS with a queue). It also should use Storer as a DP table.

File: learn/derive.go
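
A minimal sketch of the node derivatives, under stated assumptions: the Node type and this DeriveSPN signature are hypothetical, and instead of a user-selected stack or queue the code uses a DFS post-order as a topological order, so that shared children in a DAG receive contributions from all parents before being expanded. The maps val and der play the role of the DP table. The rules applied are the usual ones: for a sum node i with child j, dS/dS_j gains w_ij * dS/dS_i; for a product node, dS/dS_j gains dS/dS_i times the product of the other children's values.

```go
package main

import "fmt"

// Kind distinguishes the node types of this toy SPN.
type Kind int

const (
	Leaf Kind = iota
	Sum
	Product
)

// Node is a hypothetical minimal SPN node (not GoSPN's spn.SPN interface).
type Node struct {
	Kind     Kind
	Value    float64 // leaf value under the current evidence
	Children []*Node
	Weights  []float64 // sum nodes only, aligned with Children
}

// topo returns all nodes reachable from root, children before parents.
func topo(root *Node) []*Node {
	var order []*Node
	seen := map[*Node]bool{}
	var visit func(n *Node)
	visit = func(n *Node) {
		if seen[n] {
			return
		}
		seen[n] = true
		for _, c := range n.Children {
			visit(c)
		}
		order = append(order, n)
	}
	visit(root)
	return order
}

// DeriveSPN returns the value S_i and the derivative dS/dS_i of every node.
func DeriveSPN(root *Node) (val, der map[*Node]float64) {
	order := topo(root)
	val = make(map[*Node]float64, len(order))
	der = make(map[*Node]float64, len(order))
	// Bottom-up pass: evaluate every node.
	for _, n := range order {
		switch n.Kind {
		case Leaf:
			val[n] = n.Value
		case Sum:
			for j, c := range n.Children {
				val[n] += n.Weights[j] * val[c]
			}
		case Product:
			val[n] = 1
			for _, c := range n.Children {
				val[n] *= val[c]
			}
		}
	}
	// Top-down pass: parents are processed before their children.
	der[root] = 1
	for i := len(order) - 1; i >= 0; i-- {
		n := order[i]
		for j, c := range n.Children {
			if n.Kind == Sum {
				der[c] += n.Weights[j] * der[n]
				continue
			}
			p := 1.0 // product of the siblings' values
			for k, s := range n.Children {
				if k != j {
					p *= val[s]
				}
			}
			der[c] += der[n] * p
		}
	}
	return val, der
}

func main() {
	a1 := &Node{Kind: Leaf, Value: 0.5}
	a2 := &Node{Kind: Leaf, Value: 0.2}
	b1 := &Node{Kind: Leaf, Value: 0.4} // shared by both products
	p1 := &Node{Kind: Product, Children: []*Node{a1, b1}}
	p2 := &Node{Kind: Product, Children: []*Node{a2, b1}}
	root := &Node{Kind: Sum, Children: []*Node{p1, p2}, Weights: []float64{0.3, 0.7}}
	val, der := DeriveSPN(root)
	fmt.Printf("S = %.3f, dS/db1 = %.3f\n", val[root], der[b1])
}
```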
