iffsid / mmvae
Multimodal Mixture-of-Experts VAE
License: GNU General Public License v3.0
Thanks for sharing your model implementation. I installed all dependencies as specified in the README, but something is wrong. The function embed_umap() passes an argument called transform_seed. It seems that transform_seed was first introduced in umap-learn==0.3.0 and is not available in umap-learn==0.1.1. Installing umap-learn==0.3.0 does not help, because it triggers an error caused by a dependency on joblib in sklearn. So sklearn must be an older version, but I am not sure which one. I managed to run python main.py --model mnist_svhn by replacing UMAP with TSNE in the function embed_umap.
Another issue is that the code seems to run very slowly, even though it runs on a GPU. Any idea what might be happening? Could it be the function embed_umap(), which is triggered at each epoch?
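One possible workaround for the version mismatch described above, short of swapping in TSNE, is to pass only the keyword arguments that the installed constructor accepts. The sketch below is a generic helper and not code from this repository: supported_kwargs, OldReducer and NewReducer are hypothetical names used for illustration, and whether the plots remain acceptable without transform_seed is an assumption.

```python
import inspect


def supported_kwargs(factory, **kwargs):
    """Return only the keyword arguments that `factory` accepts by name."""
    params = inspect.signature(factory).parameters
    # A **kwargs parameter means the callable accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class OldReducer:
    """Hypothetical stand-in for a release without transform_seed."""

    def __init__(self, n_neighbors=15, metric="euclidean"):
        self.n_neighbors = n_neighbors
        self.metric = metric


class NewReducer(OldReducer):
    """Hypothetical stand-in for a release that accepts transform_seed."""

    def __init__(self, n_neighbors=15, metric="euclidean", transform_seed=42):
        super().__init__(n_neighbors, metric)
        self.transform_seed = transform_seed


wanted = dict(n_neighbors=40, metric="euclidean", transform_seed=0)

old_kwargs = supported_kwargs(OldReducer, **wanted)  # transform_seed dropped
new_kwargs = supported_kwargs(NewReducer, **wanted)  # everything kept

old_model = OldReducer(**old_kwargs)
new_model = NewReducer(**new_kwargs)
```

With the real library, the same helper would be called with the imported reducer class in place of the stand-ins, so the call site works with either release.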
Hi,
In the dreg function, why do you detach the encoder params using the following line of code?
qz_x = model.qz_x(*[p.detach() for p in model.qz_x_params]) # stop-grad for \phi
Does it mean that the encoder parameters are not updated during training?
Thanks,
VR
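A toy calculation may help with the question above. Detaching the parameters of q(z|x) removes only the direct dependence of the log-density on the encoder outputs; if z itself is drawn with the reparameterisation trick (z = mu + sigma * eps), a gradient can still reach the encoder through z. The sketch below checks this with finite differences for a 1-D Gaussian. It is an illustration under those assumptions, not the repository's code, and all names in it are hypothetical.

```python
import math


def log_normal(z, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2


def log_q_full(mu, sigma, eps):
    """log q(z) where mu enters both the sample z and the density."""
    z = mu + sigma * eps  # reparameterised sample
    return log_normal(z, mu, sigma)


def log_q_stopgrad(mu, sigma, eps, mu_const, sigma_const):
    """log q(z) with the density parameters held constant ("detached")."""
    z = mu + sigma * eps  # mu still influences the sample
    return log_normal(z, mu_const, sigma_const)


def d_dmu(f, mu, h=1e-5):
    """Central finite-difference derivative of f at mu."""
    return (f(mu + h) - f(mu - h)) / (2 * h)


mu0, sigma0, eps = 0.3, 1.5, 0.8

# Without the stop-gradient, the two paths cancel exactly for mu.
grad_full = d_dmu(lambda m: log_q_full(m, sigma0, eps), mu0)

# With the stop-gradient, only the path through z remains: -eps / sigma.
grad_stop = d_dmu(lambda m: log_q_stopgrad(m, sigma0, eps, mu0, sigma0), mu0)
```

In this toy case the detached version still produces a non-zero gradient with respect to mu, so a stop-gradient on the distribution parameters does not by itself freeze the encoder.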
Hello, thanks for sharing the code.
This is not an issue, but rather a question about where the Mixture-of-Experts operation happens in the mmvae.py code. I thought it would be in forward(), but I cannot clearly identify it in these lines:
def forward(self, x, K=1):
    qz_xs, zss = [], []
    # initialise cross-modal matrix
    px_zs = [[None for _ in range(len(self.vaes))] for _ in range(len(self.vaes))]
    for m, vae in enumerate(self.vaes):
        qz_x, px_z, zs = vae(x[m], K=K)
        qz_xs.append(qz_x)
        zss.append(zs)
        px_zs[m][m] = px_z  # fill-in diagonal
    for e, zs in enumerate(zss):
        for d, vae in enumerate(self.vaes):
            if e != d:  # fill-in off-diagonal
                px_zs[e][d] = vae.px_z(*vae.dec(zs))
    return qz_xs, px_zs, zss
It seems that the qz_xs are never mixed, and the px_zs are computed only from the posterior distributions of single modalities. Don't you need a mixture of experts here to combine the posterior distributions qz_xs?
Thanks,
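For context on this question: the MMVAE paper defines the joint posterior as a uniform mixture of the unimodal posteriors, q(z | x_1..M) = (1/M) * sum_m q_m(z | x_m). One reading consistent with the forward() quoted above is that it returns one set of samples per expert, and the mixing would then happen wherever the joint density is evaluated at those samples. That reading is not confirmed by the quoted snippet. The sketch below shows how such a mixture log-density can be computed from per-expert log-densities with a log-mean-exp; the Gaussian experts and all function names are assumptions made for illustration.

```python
import math


def log_normal(z, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def moe_log_density(z, experts):
    """log of the uniform mixture (1/M) * sum_m q_m(z) over (mu, sigma) experts."""
    return log_mean_exp([log_normal(z, mu, sigma) for mu, sigma in experts])


# Two hypothetical unimodal posteriors, one per modality.
experts = [(-1.0, 0.5), (2.0, 1.0)]

# A sample drawn from one expert is scored under the whole mixture.
z = 0.3
log_q_joint = moe_log_density(z, experts)
```

Under this reading, each expert contributes its own samples and no single combined distribution object needs to be built inside forward().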
Hi! Thanks for sharing this great project!
I trained the model with your suggested settings and also evaluated the trained model you provided, but in neither case can I reproduce the results reported in Table 2 and Table 4, especially joint coherence.
Can you give any hints?
Thanks.
Hi,
Thanks for sharing the code.
This is not an issue, but rather a query.
I was wondering how you came up with the scale of each individual modality's likelihood (e.g., 0.75 for MNIST). Also, how should I choose that scale for a new dataset?
Thanks,
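To make the role of such a scale concrete, suppose the 0.75 is the fixed scale b of a Laplace likelihood (an assumption; the thread does not say which distribution or parameter is meant). The log-density is then log p(x) = -log(2b) - |x - mu| / b, so a smaller b penalises a given reconstruction error more heavily. The sketch below only illustrates that relationship and is not a recipe for picking the value; the function names are hypothetical.

```python
import math


def laplace_log_prob(x, loc, scale):
    """Log-density of a Laplace(loc, scale) distribution at x."""
    return -math.log(2.0 * scale) - abs(x - loc) / scale


def error_penalty(error, scale):
    """Log-likelihood lost, relative to a perfect reconstruction, for an absolute error."""
    return laplace_log_prob(0.0, 0.0, scale) - laplace_log_prob(error, 0.0, scale)


# Penalty for the same per-pixel error of 0.3 under three candidate scales.
penalties = {b: error_penalty(0.3, b) for b in (0.5, 0.75, 1.0)}

for b, p in penalties.items():
    print(f"scale={b}: penalty={p:.3f}")
```

Because the penalty equals error / b, the scale effectively sets how strongly reconstruction error is weighted against the other terms of the objective.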
Hi, I can't find the cub.vocab file for the Caltech-UCSD Birds (CUB) dataset. Can you upload it? Thanks!
python main.py --model cubISft
modelC = getattr(models, 'VAE_{}'.format(args.model))
model = modelC(args).to(device)
Thanks for sharing the code. Can you please tell me how many IWAE samples you used for training the model?
Also, did you try training the model with the vanilla IWAE estimator? If so, can you please share the likelihoods you got from it and from the DReG estimator?
Thanks a lot.
Hi! Thanks for sharing this amazing project.
I'm having a problem unzipping the pre-trained models you uploaded.
Could you check it?
Thanks.
Sorry, problem solved. Thanks.
Dear authors,
I'm following your work and found that the link to the cleaned-up version of the Caltech-UCSD Birds (CUB) dataset has expired. Could you please share a new link to that dataset?
Thank you!
What is the difference between "mmvae_cub_images_sentences_ft" and "mmvae_cub_images_sentences"?