deepgenerativemodels / notes
Course notes
License: MIT License
I have been watching this course on YouTube, and it's really great. However, I can't find any homework assignments on the site. Would it be possible for you to share them?
Thanks
In https://deepgenerativemodels.github.io/notes/vae/, a paragraph states that

As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood $\log p(x)$ over $\mathcal{D}$.

This isn't mentioned anywhere in the rest of the course notes. It would be useful for the learner to add a proof of this equivalence, or at least a reference to it.
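For reference, a minimal sketch of the equivalence being asked for, writing $\hat{p}_{\mathcal{D}}$ for the empirical distribution of the dataset $\mathcal{D}$ (a symbol not used in the notes themselves):

$$
D_{KL}(\hat{p}_{\mathcal{D}} \,\|\, p_\theta) = \mathbb{E}_{x \sim \hat{p}_{\mathcal{D}}}\left[\log \hat{p}_{\mathcal{D}}(x)\right] - \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \log p_\theta(x)
$$

The first term does not depend on $\theta$, so minimizing the KL divergence over $\theta$ is the same as maximizing the average (equivalently, the sum of the) log-likelihood $\log p_\theta(x)$ over $\mathcal{D}$.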
The compiled HTML that is live at https://deepgenerativemodels.github.io/notes/introduction/ still contains errors like "our goal is to learn the paraemeters" (a misspelling of "parameters"), but this is not present in the markdown. We should recompile the markdown to update the site.
Reading the following paragraphs in the VAE notes:

Next, we introduce a variational family $\mathcal{Q}$ of distributions that approximate the true, but intractable posterior $p(z \mid x)$. Further, henceforth we will assume a parametric setting where any distribution in the model family $\mathcal{P}_{x,z}$ is specified via a set of parameters $\theta \in \Theta$ and distributions in the variational family $\mathcal{Q}$ are specified via a set of parameters $\lambda \in \Lambda$.

Given $\mathcal{P}_{x,z}$ and $\mathcal{Q}$, we note that the following relationships hold true for any $x$ and all variational distributions $q_\lambda(z) \in \mathcal{Q}$:

If $q_\lambda(z)$ is intended to approximate the distribution $p(z \mid x)$, then I'm confused as to why $q_\lambda(z)$ doesn't include $x$. Should it be $q_\lambda(z \mid x)$, or is it actually approximating the distribution $p(z)$?
Apologies if this sounds like an ignorant question--my understanding of probability notation isn't too sharp.
This is from the Autoregressive Models chapter:

To see why, let us consider the conditional for the last dimension, given by $p(x_n \mid x_{\lt n})$. In order to fully specify this conditional, we need to specify a probability for $2^{n-1}$ configurations of the variables $x_1, x_2, \ldots, x_{n-1}$. Since the probabilities should sum to $1$, the total number of parameters for specifying this conditional is given by $2^{n-1} - 1$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.

Shouldn't it be $2^{n-1}$? The sum-to-one constraint applies within each conditional distribution over $x_n$, so each of the $2^{n-1}$ configurations of $x_1, \ldots, x_{n-1}$ still contributes one free parameter (say, $p(x_n = 1 \mid x_{\lt n})$).
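A quick sanity check of that count in code (a toy sketch for binary variables; the variable names are mine, not from the notes):

```python
from itertools import product

n = 4  # number of binary variables

# Tabular representation of p(x_n = 1 | x_1, ..., x_{n-1}):
# one free parameter per configuration of the conditioning variables.
# p(x_n = 0 | ...) is fixed by the sum-to-one constraint, so it adds
# no extra parameters -- the constraint acts per configuration, not globally.
table = {config: 0.5 for config in product([0, 1], repeat=n - 1)}

print(len(table))    # 8
print(2 ** (n - 1))  # 8, i.e. 2^(n-1) parameters, not 2^(n-1) - 1
```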
In the variational auto-encoder chapter, the notes give an expression for the gradient of the ELBO with respect to the encoder parameters $\lambda$. However, the term inside the expectation also depends on the parameter $\lambda$. According to formula (4) in "Gradient Estimation Using Stochastic Computation Graphs" (NIPS 2015), the gradient of such an expectation has two terms, and when estimating the encoder's gradient, the first gradient term is missing from the expression in the notes.
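To illustrate the two-term structure numerically, here is a toy sketch (my own example, not the model from the notes): with $q_\lambda(z) = \mathcal{N}(\lambda, 1)$ and an integrand $f(z, \lambda) = \lambda z$ that depends on $\lambda$ both through the sampling distribution and directly, the true gradient of $\mathbb{E}_{q_\lambda}[f]$ is $2\lambda$, and dropping either term gives a biased estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: q_lambda(z) = N(lam, 1), f(z, lam) = lam * z.
# E_q[f] = lam**2, so the true gradient w.r.t. lam is 2 * lam.
lam = 1.5
z = rng.normal(loc=lam, scale=1.0, size=1_000_000)

score = z - lam              # grad_lam log q_lambda(z)
f = lam * z                  # integrand; note it also depends on lam
df_dlam = z                  # direct dependence of f on lam

term_score = np.mean(f * score)   # E[f * grad_lam log q]  (score-function term)
term_direct = np.mean(df_dlam)    # E[grad_lam f]          (direct term)

print(term_score + term_direct)   # ~3.0 = 2 * lam: both terms kept, unbiased
print(term_direct)                # ~1.5: score-function term dropped, biased
```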
In the "Black-Box Variational Inference" section of the VAE notes:
We first do per-sample optimization of $q$ by iteratively applying the update

$\lambda^{(i)} \leftarrow \lambda^{(i)} + \tilde{\nabla}_\lambda \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$
We then perform a single update step based on the mini-batch

$\theta \leftarrow \theta + \tilde{\nabla}_\theta \sum_i \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$
If I understood correctly, $x^{(i)}$ is the $i$th sample from a batch $B$ of the dataset $\mathcal{D}$, and $\lambda$ is a vector of parameters of the distribution $q_\lambda(z)$. What is $\lambda^{(i)}$?
Is it the $i$th parameter of $\lambda$? That would imply that the length of $B$ is equal to the dimension of $\lambda$--if so, it's unclear to me why they would be equal.
Another possibility is that $\lambda^{(i)}$ is the $i$th update to $\lambda$. If so, perhaps it would be better rewritten like this:
$\lambda^{(i+1)} \leftarrow \lambda^{(i)} + \tilde{\nabla}_\lambda \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$
But if that's the case, then it's unclear to me why it appears in the $\theta$ update:
$\theta \leftarrow \theta + \tilde{\nabla}_\theta \sum_i \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$
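For what it's worth, my best reading is that there is one $\lambda^{(i)}$ per data point: $\lambda^{(i)}$ parameterizes $q_{\lambda^{(i)}}(z)$, the approximate posterior for $x^{(i)}$, and the inner loop optimizes each of them before the single $\theta$ step. A toy sketch of that reading (the model, step sizes, and names here are mine, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (not the one in the notes):
#   p_theta(z) = N(theta, 1),  p(x | z) = N(z, 1),  q_lambda(z) = N(lambda, 1).
def log_joint(x, z, theta):
    return -0.5 * (z - theta) ** 2 - 0.5 * (x - z) ** 2  # up to additive constants

def log_q(z, lam):
    return -0.5 * (z - lam) ** 2  # up to additive constants

def grad_lambda(x, theta, lam, n=500):
    # Score-function estimate of grad_lambda ELBO(x; theta, lambda).
    z = rng.normal(lam, 1.0, size=n)
    f = log_joint(x, z, theta) - log_q(z, lam)
    return np.mean(f * (z - lam))

def grad_theta(x, theta, lam, n=500):
    # Monte Carlo estimate of grad_theta ELBO(x; theta, lambda).
    z = rng.normal(lam, 1.0, size=n)
    return np.mean(z - theta)

batch = [0.5, -1.0, 2.0]          # mini-batch x^(1), x^(2), x^(3)
theta = 0.0
lam = [0.0 for _ in batch]        # one lambda^(i) PER data point x^(i)

step = 0.05
for _ in range(200):              # inner loop: per-sample optimization of lambda^(i)
    for i, x in enumerate(batch):
        lam[i] += step * grad_lambda(x, theta, lam[i])

# single outer step on theta, reusing the optimized per-sample lambda^(i)
theta += step * sum(grad_theta(x, theta, lam[i]) for i, x in enumerate(batch))
print(lam, theta)
```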
Apologies if I've missed something obvious here. Also, thanks for the notes--they've been very helpful!