datamicroscopes / irm
Infinite relational model (IRM) for datamicroscopes
License: BSD 3-Clause "New" or "Revised" License
Could you recommend some introductory reading on the IRM, so that I can understand it well enough to use your code?
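As a starting point, the generative story behind the IRM for a single binary relation fits in a few lines of NumPy. This is only an illustrative sketch and not code from microscopes-irm: the function names `sample_crp` and `sample_irm` and the parameters `alpha`, `a`, and `b` are made up here. It assumes one domain of `n` entities, a Chinese restaurant process prior over clusters, and a Beta-Bernoulli link probability for each pair of clusters.

```python
import numpy as np


def sample_crp(n, alpha, rng):
    """Draw cluster labels for n entities from a Chinese restaurant process."""
    assignments = np.zeros(n, dtype=int)
    counts = [1]  # the first entity opens the first cluster
    for i in range(1, n):
        # Existing clusters are weighted by size, a new cluster by alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments[i] = k
    return assignments


def sample_irm(n, alpha=1.0, a=1.0, b=1.0, seed=0):
    """Sample (z, eta, x): cluster labels, block link probabilities, relation."""
    rng = np.random.default_rng(seed)
    z = sample_crp(n, alpha, rng)
    k = z.max() + 1
    # One Beta(a, b) link probability per ordered pair of clusters.
    eta = rng.beta(a, b, size=(k, k))
    # x[i, j] ~ Bernoulli(eta[z[i], z[j]])
    probs = eta[z[:, None], z[None, :]]
    x = rng.random((n, n)) < probs
    return z, eta, x


z, eta, x = sample_irm(20)
```

Inference runs this story in reverse: given an observed `x`, the sampler searches for cluster labels `z` that make the blocks of `x` as homogeneous as possible.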
Hi,
After training the model, how can I sample from the posterior? I want to train the model and then draw a partition (an assignment of rows/columns to clusters) from the posterior. Is this possible?
Thanks
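For context, each state an MCMC chain visits after burn-in is itself one (correlated) sample of the partition from the posterior, so "sampling a partition" amounts to saving the cluster labels at some iteration. The sketch below does not show the microscopes-irm call for extracting those labels. It assumes you have already collected one label vector per saved sample, and shows one common way to summarize them: the fraction of samples in which each pair of entities shares a cluster. The function name `coassignment_matrix` and the toy `samples` are hypothetical.

```python
import numpy as np


def coassignment_matrix(samples):
    """Fraction of samples in which entities i and j share a cluster.

    samples: array-like of shape (num_samples, num_entities), one cluster
    label per entity per sample. Labels only need to be consistent within
    a sample, not across samples.
    """
    samples = np.asarray(samples)
    same = samples[:, :, None] == samples[:, None, :]
    return same.mean(axis=0)


# Toy label vectors standing in for saved sampler states.
samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
zmat = coassignment_matrix(samples)
```

Because the matrix only compares labels within each sample, it is unaffected by label switching between samples.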
@stephentu not sure what would be causing this - i get an error from the version i installed from conda
Windows installation gives error
Microsoft Windows [Version 10.0.17763.615]
(c) 2018 Microsoft Corporation. All rights reserved.
C:\WINDOWS\system32>conda install -c datamicroscopes -c distributions microscopes-irm
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
Current channels:
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
C:\WINDOWS\system32>
When running against the current Conda builds of irm, this script segfaults in a call to `delete_group`:
# Note: this script uses Python 2 syntax (print statements, xrange).
import pickle
import time
import itertools as it
import numpy as np
from multiprocessing import cpu_count

# ###We've made a set of utilities especially for this dataset, `enron_utils`. We'll include these as well.
#
# `enron_crawler.py` downloads the data and preprocesses it as suggested by [Ishiguro et al. 2012](http://www.kecl.ntt.co.jp/as/members/ishiguro/open/2012AISTATS.pdf). The results of the script have been stored in `results.p`.
#
# ##Let's load the data and make a binary matrix to represent email communication between individuals
#
# In this matrix, $X_{i,j} = 1$ if and only if person$_{i}$ sent an email to person$_{j}$

# In[2]:
import enron_utils

# Pickle files should be opened in binary mode.
with open('results.p', 'rb') as fp:
    communications = pickle.load(fp)


def allnames(o):
    # Yield each sender together with that sender's receivers.
    for k, v in o:
        yield [k] + list(v)


names = set(it.chain.from_iterable(allnames(communications)))
names = sorted(list(names))
namemap = {name: idx for idx, name in enumerate(names)}
N = len(names)

# X_{ij} = 1 iff person i sent person j an email
communications_relation = np.zeros((N, N), dtype=np.bool)
for sender, receivers in communications:
    sender_id = namemap[sender]
    for receiver in receivers:
        receiver_id = namemap[receiver]
        communications_relation[sender_id, receiver_id] = True

print "%d names in the corpus" % N

# ##Let's visualize the communication matrix

# ##Now, let's learn the underlying clusters using the Infinite Relational Model
#
# Let's import the necessary functions from datamicroscopes
#
# ##There are 5 steps necessary in inferring a model with datamicroscopes:
# 1. define the model
# 2. load the data
# 3. initialize the model
# 4. define the runners (MCMC chains)
# 5. run the runners

# In[5]:
from microscopes.common.rng import rng
from microscopes.common.relation.dataview import numpy_dataview
from microscopes.models import bb as beta_bernoulli
from microscopes.irm.definition import model_definition
from microscopes.irm import model, runner, query
from microscopes.kernels import parallel
from microscopes.common.query import groups, zmatrix_heuristic_block_ordering, zmatrix_reorder

defn = model_definition([N], [((0, 0), beta_bernoulli)])
views = [numpy_dataview(communications_relation)]
prng = rng()

nchains = 1
latents = [model.initialize(defn, views, r=prng, cluster_hps=[{'alpha':1}]) for _ in xrange(nchains)]
kc = runner.default_assign_kernel_config(defn)
print kc
r = runner.runner(defn, views, latents[0], kc)

# ##From here, we can finally run the sampler (a single iteration here, niters=1)

# In[ ]:
start = time.time()
print start
r.run(r=prng, niters=1)
print "inference took {} seconds".format(time.time() - start)
Here's the output from lldb:
(irm-test)4dmac:examples 4d$ lldb -- `which python` enron_test.py
(lldb) target create "/Users/4d/anaconda/envs/irm-test/bin/python"
Current executable set to '/Users/4d/anaconda/envs/irm-test/bin/python' (x86_64).
(lldb) settings set -- target.run-args "enron_test.py"
(lldb) r
Process 22921 launched: '/Users/4d/anaconda/envs/irm-test/bin/python' (x86_64)
115 names in the corpus
[('assign', [0])]
1436202306.76
Process 22921 stopped
* thread #1: tid = 0x923b8, 0x0000000104df9e3c libmicroscopes_irm.dylib`microscopes::irm::state<4l>::delete_group(unsigned long, unsigned long) + 28, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x18)
frame #0: 0x0000000104df9e3c libmicroscopes_irm.dylib`microscopes::irm::state<4l>::delete_group(unsigned long, unsigned long) + 28
libmicroscopes_irm.dylib`microscopes::irm::state<4l>::delete_group:
-> 0x104df9e3c <+28>: movq 0x18(%rdi), %rax
0x104df9e40 <+32>: leaq (%rsi,%rsi,2), %rcx
0x104df9e44 <+36>: movq (%rax,%rcx,8), %rdx
0x104df9e48 <+40>: movq 0x8(%rax,%rcx,8), %rax
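For anyone trying to reproduce the data-preparation half of the script on a current Python 3 and NumPy setup, independently of microscopes-irm, the matrix construction can be sketched as below. This is a sketch under assumptions: `build_relation` and the toy `(sender, receivers)` pairs are hypothetical stand-ins for the contents of `results.p`. The builtin `bool` is used as the dtype because the `np.bool` alias was removed in NumPy 1.24.

```python
import itertools as it

import numpy as np


def build_relation(communications):
    """Build a boolean sender-by-receiver matrix from (sender, receivers) pairs."""
    names = sorted(
        set(it.chain.from_iterable([s] + list(rs) for s, rs in communications))
    )
    namemap = {name: idx for idx, name in enumerate(names)}
    relation = np.zeros((len(names), len(names)), dtype=bool)
    for sender, receivers in communications:
        for receiver in receivers:
            # relation[i, j] is True iff person i sent person j an email
            relation[namemap[sender], namemap[receiver]] = True
    return names, relation


# Hypothetical toy data in the same shape the script iterates over.
toy = [("alice", ["bob", "carol"]), ("bob", ["alice"])]
names, relation = build_relation(toy)
```

Checking the shape and dtype of this matrix before handing it to the sampler helps rule out the input data as a cause of a crash.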