
Comments (11)

viveksck commented on June 12, 2024

Now that I think of it: is it possible to deterministically (force-set) the value of several other categorical nodes based on the value of a parent? For example, if categorical node p takes value 1, then I need children c_0 through c_10 to also take value 1, and if p takes value 0, then all its children should just take value 0. I checked the Deterministic node documentation, but I don't see such support yet.

If I can do that, then I can broadcast my entity assignment to each word and proceed.

jluttine commented on June 12, 2024

I quickly looked into this, but the model is a bit complex, so I'd need to spend a bit more time thinking about this. Anyway, the broadcasting error is raised because the plates of the two nodes are incompatible.

I like to denote the overall shape of a node in two parts: the plates and the dimensionality/range/etc. of the variable. For instance, entity_assignments has plates (numDocuments, numEntities) and range (numPersonas), that is, each element is an integer in the range [0, numPersonas). Similarly, psi has plates (numPersonas, numRoles) and dimensionality (numTopics), that is, each variable is a numTopics-dimensional vector from a Dirichlet distribution.

Now, at least gated_plate=-2 correctly matches the range of entity_assignments with the second-to-last plate axis of psi, so that's fine. However, the plates of entity_assignments, (numDocuments, numEntities), should be broadcastable to the remaining plates of psi, (numRoles,), but they aren't. So there's a mismatch. Are the plates incorrect for either of the nodes, are you combining the wrong nodes, or would you want the resulting plates to be (numDocuments, numEntities, numRoles)? I should take a more careful look to figure out how to write the model, but at least this is one obvious error.

With complex models like this, I typically like to write the plates and "dimensionality" of the node as a comment above the node definition so I can keep track of them easily. Then, usually you are doing things the right way if all the plates and dimensionalities match nicely. ;)
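
For example, writing that convention for the two nodes discussed above could look roughly like this (the sizes and the uniform parent of entity_assignments are just placeholders for illustration, not your actual model):

import bayespy as bp
import numpy as np

numDocuments, numEntities, numPersonas, numRoles, numTopics = 5, 3, 4, 2, 10

# (numDocuments, numEntities) x (numPersonas)
entity_assignments = bp.nodes.Categorical(
    np.ones(numPersonas) / numPersonas,  # placeholder parent, not your real prior
    plates=(numDocuments, numEntities)
)

# (numPersonas, numRoles) x (numTopics)
psi = bp.nodes.Dirichlet(
    np.ones(numTopics),
    plates=(numPersonas, numRoles)
)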

I can try to take a more careful look some other day but I wanted to at least write this in case it helps you to solve the issues.

About the deterministic mapping: I'm not sure why you would need to map 0->0 and 1->1 deterministically to a child node; why can't you use the parent node directly in such a case? Anyway, the Take node provides one way of defining deterministic mappings, but I don't think it's needed here.
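
Just for completeness, here is a rough (untested) sketch of how Take could broadcast per-character assignments to per-word plates; the names and numbers below are made up for illustration:

import bayespy as bp
import numpy as np

numCharacters = 5
numPersonas = 3

# (numCharacters) x (numPersonas): one persona assignment per character
personas_of_characters = bp.nodes.Categorical(
    np.ones(numPersonas) / numPersonas,
    plates=(numCharacters,)
)

# Known character index of each of 10 words (made-up data)
word_characters = np.array([0, 0, 1, 2, 2, 3, 3, 3, 4, 4])

# (10,) x (numPersonas): each word deterministically gets the persona node of its character
personas_per_word = bp.nodes.Take(personas_of_characters, word_characters)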

I hope this helps at least a bit. I'll get back to you another day if you want.

viveksck commented on June 12, 2024

Thanks for the note. I will get back to you in a day or two.

viveksck commented on June 12, 2024

@jluttine Thanks a lot for your comments. Here is my scenario:

I have topics that can be indexed by 2 dimensions (instead of just one).

So for each word w in my corpus, there is a topic assignment determined by 2 indices: (a) row index and (b) column index

So to specify this in my model, I need to gate on both dimensions. I was hoping to do this by first gating on rows and then nesting that to gate on columns.

Here is a small snippet

import bayespy
word_topic_dist = bayespy.nodes.Dirichlet([1,1,1,1], plates=(4,3)) # There are 12 topics indexed by a 4x3 matrix. Each topic is a distribution over 4 words

row_assignments = bayespy.nodes.Categorical([0.2, 0.1, 0.2, 0.5], plates=(1000,)) # The row assignments to index the topics for each of the 1000 words in my corpus

col_assignments = bayespy.nodes.Categorical([0.2, 0.1, 0.7], plates=(1000,)) # The column assignments to index the topics for each of the 1000 words in my corpus

# So word *i* must be drawn from topic indexed (row_assignments[i], col_assignments[i])  
# How can I achieve this using BayesPy (I am thinking using Gates is the right approach) ? 
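
Roughly, what I was imagining (though I'm not sure this is the right way to make the plates line up) was to give the column assignments an extra unit plate and then nest two gates, something like:

col_assignments = bayespy.nodes.Categorical([0.2, 0.1, 0.7], plates=(1000, 1)) # extra unit plate so the plates broadcast

words = bayespy.nodes.Categorical(
    bayespy.nodes.Gate(
        row_assignments,                                      # select the row
        bayespy.nodes.Gate(col_assignments, word_topic_dist)  # select the column
    )
)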



jluttine commented on June 12, 2024

Sorry for the huge delay. I've been very busy during the last few weeks. I took a look again, and it takes quite a lot of time for me to parse the exact model definition. Many words but few mathematical formulas. :) If you have time to write the exact model definition as conditional probability distributions, or as generative formulas which describe how each variable is generated from the others, and what the input data looks like, that would help a lot. Implementing that in BayesPy is then rather simple, hopefully. But in any case, I'll continue on this tonight or some other day. Cheers!

jluttine commented on June 12, 2024

FYI, I'm reading section 4.1 of the paper; that probably gives enough information.

jluttine commented on June 12, 2024

Note: This message was written as a Jupyter Notebook; you can download it or run it interactively.

Ok, I have now sketched an implementation of the Dirichlet Persona Model. You should double-check that this is what you wanted; I'm not absolutely sure. I think I made at least one minor change: the persona distribution is global, not document/movie-specific.

Anyway, define the configuration

import bayespy as bp
import numpy as np

numTopics = 10      # number of topics 
numPersonas = 4     # protagonist, villain, ...
numRoles = 3        # agent verb, patient verb, attribute
sizeVocabulary = 50 # size of vocabulary
#numDocuments = 8   # number of documents (not used now)
numCharacters = 15  # total number of characters
sizeCorpus = 10000  # size of the dataset
# Generate random dataset from the model
# Data are a set of tuples (word, role, character)
# So, each "datapoint" has a word-index, role-index and character-index.
data_characters = bp.nodes.Categorical(
    np.ones(numCharacters) / numCharacters,
    plates=(sizeCorpus,)
).random()
data_roles = bp.nodes.Categorical(
    np.ones(numRoles) / numRoles,
    plates=(sizeCorpus,)
).random()
data_personas = bp.nodes.Categorical(
    np.ones(numPersonas) / numPersonas,
    plates=(numCharacters,)
).random()
data_topic_dist = bp.nodes.Dirichlet(
    np.ones(numTopics),
    plates=(numPersonas, numRoles)
).random()
data_topics = bp.nodes.Categorical(
    data_topic_dist[data_personas[data_characters], data_roles]
).random()
data_word_dist = bp.nodes.Dirichlet(
    np.ones(sizeVocabulary) / sizeVocabulary,
    plates=(numTopics,)
).random()
data_words = bp.nodes.Categorical(
    data_word_dist[data_topics],
    plates=(sizeCorpus,)
).random()

Below is the model:

# Word distribution for each topic
# (numTopics) x (sizeVocabulary)
word_dist_in_topics = bp.nodes.Dirichlet(
    np.ones(sizeVocabulary),
    plates=(numTopics,)
)

# Topic distribution for each role and persona
# (numPersonas, numRoles) x (numTopics)
topic_dist_in_personas_and_roles = bp.nodes.Dirichlet(
    np.ones(numTopics),
    plates=(numPersonas, numRoles)
)

# Persona distribution (make this document specific?)
persona_dist = bp.nodes.Dirichlet(
    np.ones(numPersonas)
)

# Persona assignments of the characters
# (numCharacters) x (numPersonas)
personas_of_characters = bp.nodes.Categorical(
    persona_dist,
    plates=(numCharacters,)
)

# Persona assignments for each data point (i.e., each word in the corpus)
# (sizeCorpus) x (numPersonas)
personas = bp.nodes.Gate(
    data_characters,
    personas_of_characters
)

# Topic assignment for each data point (i.e., each word in the corpus)
# (sizeCorpus) x (numTopics)
topics = bp.nodes.Categorical(
    bp.nodes.Gate(
        personas,
        bp.nodes.Gate(
            data_roles[:,None], # add a unit plate axis so the role indices broadcast against the persona plate
            topic_dist_in_personas_and_roles
        )
    )
)

# Words in the corpus
# (sizeCorpus) x (sizeVocabulary)
words = bp.nodes.Categorical(
    bp.nodes.Gate(
        topics,
        word_dist_in_topics
    )
)

Create the VB object, initialize some nodes randomly, and observe the data. Note that the characters and roles data were used as "inputs" in the above model.

Q = bp.inference.VB(
    words,
    word_dist_in_topics,
    topics,
    topic_dist_in_personas_and_roles,
    personas_of_characters,
    persona_dist,
)
topics.initialize_from_random()
personas_of_characters.initialize_from_random()
topic_dist_in_personas_and_roles.initialize_from_random()
persona_dist.initialize_from_random()
words.observe(data_words)

Run inference:

Q.update(repeat=1000)

You can visualize the posterior of the nodes for instance as:

%matplotlib notebook
bp.plot.plt.figure(); bp.plot.hinton(personas_of_characters)
bp.plot.plt.figure(); bp.plot.hinton(word_dist_in_topics)
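
If you want the raw numbers instead of the Hinton diagrams, you can also read the posterior probabilities from the moments of a node, roughly like this (assuming the first moment of a Categorical node is its probability vector):

# (numCharacters, numPersonas) array of posterior persona probabilities
persona_probs = personas_of_characters.get_moments()[0]
print(persona_probs)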

I hope this helps!

viveksck commented on June 12, 2024

@jluttine: Thanks a ton! This is awesome :) It's pretty much the model I had in mind (except that the persona distribution is document-specific, but I think I can get that working). I was having trouble defining the topics node (where you have a nested gate). Nice trick to use data_roles to index the roles (I had modeled roles as a probabilistic variable, which I had to set to observed, and that complicated my model). Thanks a ton! I will definitely be using your wonderful package for my research!

jluttine commented on June 12, 2024

In the paper, the document-specific persona distributions seem to share the concentration parameters of the Dirichlet distributions. I think that sharing these parameters would currently require a custom node which implements an estimation algorithm. I'll make a separate issue about that feature request and see what I can do about it. Anyway, I don't think it makes much of a difference whether you use document-specific persona distributions that share the concentration parameters or use the same persona distribution for all documents. It might make a significant difference if you have a relatively large number of personas in each document AND the distributions of those personas are very different in each document. If I understood correctly, this is probably not the case.

In any case, don't use document-specific persona distributions without sharing the concentration parameters; you'll probably end up getting very little information about the persona distribution in each document.

jluttine commented on June 12, 2024

For your information, the develop branch now supports learning the concentration parameter of a Dirichlet distribution. The node is called DirichletConcentration and it takes one mandatory argument: the dimensionality of the probability vector, that is, a single integer. This node can then be used as a parent node for Dirichlet nodes in order to learn models where many Dirichlet variables share a common unknown concentration parameter.
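
For example, a rough (untested) sketch of document-specific persona distributions sharing a learned concentration parameter could look something like this; the document indices and the Gate wiring are just my placeholders:

import bayespy as bp
import numpy as np

numPersonas = 4
numDocuments = 8
numCharacters = 15

# Shared, unknown concentration parameter of a numPersonas-dimensional Dirichlet
persona_concentration = bp.nodes.DirichletConcentration(numPersonas)

# Document-specific persona distributions sharing the concentration parameter
# (numDocuments) x (numPersonas)
persona_dist = bp.nodes.Dirichlet(
    persona_concentration,
    plates=(numDocuments,)
)

# Known document index of each character (placeholder data)
character_documents = np.random.randint(numDocuments, size=numCharacters)

# Persona assignment of each character, drawn from its document's persona distribution
# (numCharacters) x (numPersonas)
personas_of_characters = bp.nodes.Categorical(
    bp.nodes.Gate(character_documents, persona_dist)
)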

viveksck commented on June 12, 2024

Thanks a ton! I will give it a shot.
