
cs228-notes's People

Contributors

20dbm, aditya-grover, amrege, avati, awchen, bindra41048, c-j-cundy, chrisyeh96, elaineyale, ermonste, ethnyc, hmishfaq, jiamings, jmswong, jygrinberg, kandluis, kuleshov, liaoruowang, lmartak, makaimann, mmistele, mr-easy, parovicm, rafflesintown, raunakkmr, reviewanon, scottfleming, shengjiazhao, stephenbates19, yenchenlin


cs228-notes's Issues

Coherence in JT Algorithm Lecture Notes

Hi! In the Junction Tree Algorithm lecture notes, I noted some inconsistencies w.r.t. factor graphs. In order of appearance, I quote:

  • First: "Sum-product message passing can also be applied to factor trees with a slight modification. Recall that a factor graph is a bipartite graph with edges going between variables and factors, with an edge signifying a factor depends on a variable." but factor trees are not mentioned before in the lecture notes.
  • Later, in note 5: "Arbitrary potentials can be handled using an algorithm called LBP on factor graphs. We will include this material at some point in the future."

Reading the notes and working out what factor trees/graphs are is doable, but the phrasing/order is confusing. I'll try submitting a pull request later in the quarter.
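
For anyone else who stumbles on this before the notes are updated, a minimal Python sketch (with made-up variable and factor names) of the bipartite structure the quoted definition describes:

    # A factor graph is bipartite: one set of nodes for variables, one for factors,
    # with an edge whenever a factor depends on a variable.
    variables = ["A", "B", "C"]
    factors = {
        "f1": ["A", "B"],   # f1(A, B) depends on A and B
        "f2": ["B", "C"],   # f2(B, C) depends on B and C
    }

    # Edges of the bipartite graph: (factor, variable) pairs.
    edges = [(f, v) for f, scope in factors.items() for v in scope]
    print(edges)  # [('f1', 'A'), ('f1', 'B'), ('f2', 'B'), ('f2', 'C')]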

Thank you!

Possible confusing explanations in sections 2.5 and 2.6.

In section 2.5, you do not explain what the symbol $I$ is supposed to be. It might be necessary to explain it, as not everyone is familiar with the indicator function.

In section 2.6, you describe the Bernoulli random variable as follows

one if a coin with heads probability p comes up heads, zero otherwise.

Specifically, the expression "heads probability" is a little confusing. It would be clearer to be more explicit, e.g. "with probability p of getting heads".

It could be formulated as follows (or something like that):

this random variable takes the value 1 if the outcome is heads (x = 1) and the value 0 if the outcome is tails (x = 0). In other words, the probability of taking the value "heads" (1) is p and the probability of taking the value "tails" (0) is 1 - p.

At the very least, I would rephrase these explanations to make them less ambiguous and confusing.

I think that the explanations of the other random variables are also quite poor. For example, the geometric random variable is closely related to the binomial random variable, but you do not explain this. See: https://stats.stackexchange.com/q/263141/82135. In general, I think you should spend 2-3 sentences to better motivate the use of these particular random variables. This would make your notes even more useful!
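
As a concrete illustration of the intended meaning, a minimal Python sketch (my own, not from the notes) of sampling a Bernoulli and a geometric random variable:

    import random

    def bernoulli(p):
        """Returns 1 (heads) with probability p, 0 (tails) with probability 1 - p."""
        return 1 if random.random() < p else 0

    def geometric(p):
        """Number of Bernoulli(p) trials until the first success (one common convention)."""
        trials = 1
        while bernoulli(p) == 0:
            trials += 1
        return trials

    print(bernoulli(0.3))  # 0 or 1
    print(geometric(0.3))  # e.g. 4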

Possible typo in Sampling Method notes

Should that read "if $p(e \mid z)$ is not too far from uniform..." instead of $p(z \mid e)$? If so, I can open a PR; I just wanted to verify first.

where $$w_e(z) = p(e, z)/q(z)$$. Unlike rejection sampling, this will use all the examples; if $$p(z \mid e)$$ is not too far from uniform, this will converge to the true probability after only a very small number of samples.
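
For context, a minimal Python sketch (the toy distributions are made up for illustration, not taken from the notes) of the importance-weighted estimate the quoted passage describes:

    import random

    # Toy model: z in {0, 1}, evidence e is fixed.
    # p(e, z) for the two values of z (unnormalized posterior over z).
    p_e_z = {0: 0.10, 1: 0.30}

    # Proposal distribution q(z).
    q = {0: 0.5, 1: 0.5}

    def sample_q():
        return 0 if random.random() < q[0] else 1

    # Normalized importance sampling estimate of p(z = 1 | e).
    num, den = 0.0, 0.0
    for _ in range(10000):
        z = sample_q()
        w = p_e_z[z] / q[z]          # w_e(z) = p(e, z) / q(z)
        num += w * (z == 1)
        den += w
    print(num / den)                 # should be close to 0.3 / 0.4 = 0.75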

Typo in language models application

Knowing the probability distribution can also help us model natural langauge utterances. In this case, we want to construct a probability distribution $p(x)$ over sequences of words or characters $x$ that assigns high probability to proper (English) sentences. This distribution can be learned from a variety of sources, such as Wikipedia articles.

It should be language utterances instead of langauge utterances.
The typo also exists in file preliminaries/applications/index.md.

Discrepancy in front-end and back-end text for VE

If you look at the front end of the VE chapter under "An illustrative example" (as far as I can tell), you see in the first line "...simplicity that we a \n given a chain..."; but the text in the source clearly has "....we are given a chain....".

Is it just me or is there something going on here?

Bayesian Learning MLE Confidence

Currently the notes on Bayesian Learning imply that the MLE estimate does not have improved confidence bounds with more data. However, I believe the error of the MLE from the true parameter is asymptotically normal, with covariance given by the inverse Fisher information scaled by 1/n. The confidence bounds therefore shrink at a rate of n^(-1/2) as we get more data.
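
For reference, the standard asymptotic normality statement for the MLE (under the usual regularity conditions, stated here for completeness) is

$$\sqrt{n}\,(\hat\theta_{\mathrm{MLE}} - \theta^*) \xrightarrow{d} \mathcal{N}\big(0,\; I(\theta^*)^{-1}\big),$$

where $I(\theta^*)$ is the Fisher information of a single observation, so the width of the resulting confidence intervals shrinks like $n^{-1/2}$.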

Make citable?

Thanks so much for posting these, they are awesome!

I would like to cite some parts of this in my thesis and was wondering if you might consider assigning them a DOI for this purpose. It's pretty easy to do with GitHub repos on Zenodo.

Inconsistency between definition of ELBO in sections "Auto-encoding variational Bayes" and "The variational lower bound"

In the section Auto-encoding variational Bayes, you state

Recall that in variational inference, we are interested in maximizing the ...

However, in the section The variational lower bound, you define the expectation in a different way. Specifically, $\tilde p$ and $q$ are functions of $x$ alone, i.e. $\mathbb{E}_q[\log \tilde p(x) - \log q(x)]$, whereas in the section Auto-encoding variational Bayes the distributions are functions of $x$ and $z$ (an unobserved variable) and, further, $q$ is a conditional probability: $\mathbb{E}_q[\log p(x, z) - \log q(z \mid x)]$. So, I think you should explain these differences.
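
For readers comparing the two sections, the second form is (as far as I can tell) just the first form applied to the latent-variable model; writing it out explicitly (my paraphrase, not a quote from the notes):

$$\mathbb{E}_{q(z \mid x)}\big[\log p(x, z) - \log q(z \mid x)\big] \le \log p(x),$$

which is the generic bound $\mathbb{E}_{q}[\log \tilde p - \log q]$ with the unnormalized distribution $\tilde p(z) = p(x, z)$ for fixed $x$ (whose normalization constant is $p(x)$) and the variational distribution $q(z \mid x)$ over the unobserved $z$.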

Typo: At the end of the variational auto-encoder section, the variable x is erroneously used instead of z

At the end of the Variational Auto-encoder section (https://ermongroup.github.io/cs228-notes/extras/vae/), the following is written:

A variational auto-encoder uses the AEVB algorithm to learn a specific model p using a particular encoder q. The model p is parametrized as
p(x∣z)=N(x;μ(z),diag(σ(x))^2)
p(z)=N(z;0,I),
where μ(z),σ(z) are parametrized by a neural network (typically, two dense hidden layers of 500 units each).

For p(x∣z), you erroneously wrote σ(x) instead of σ(z) for the covariance matrix, which is probably a typo.
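
For concreteness, a minimal NumPy sketch of the corrected generative model; the mu/sigma functions below are placeholder linear maps, not the neural networks from the notes:

    import numpy as np

    rng = np.random.default_rng(0)
    latent_dim, data_dim = 2, 4

    # Placeholder "decoder"; in a real VAE these would be neural networks.
    W_mu = rng.normal(size=(data_dim, latent_dim))
    W_sigma = rng.normal(size=(data_dim, latent_dim))

    def mu(z):
        return W_mu @ z

    def sigma(z):
        return np.exp(0.1 * W_sigma @ z)  # keep standard deviations positive

    # p(z) = N(0, I)
    z = rng.normal(size=latent_dim)

    # p(x | z) = N(x; mu(z), diag(sigma(z))^2)  -- note sigma(z), not sigma(x)
    x = mu(z) + sigma(z) * rng.normal(size=data_dim)
    print(x)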

preliminaries/introduction "Thus, it will best" should be "Thus, it will be best"

A bird’s eye overview of the course

Our discussion of graphical models will be divided into three major parts: representation (how to specify a model), inference (how to ask the model questions), and learning (how to fit a model to real-world data). These three themes will also be closely linked: to derive efficient inference and learning algorithms, the model will need to be adequately represented; furthermore, learning models will require inference as a subroutine. Thus, it will best to always keep the three tasks in mind, rather than focusing on them in isolation . . .

explain the chain rule?

In Bayesian Networks (the first meaty chapter),

Recall that by the chain rule, we can write any probability $p$ as...

Search Probability review for definition of the chain rule - nothing.
Search any earlier chapter for mention of the chain rule - nothing.
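
For reference, the chain rule being invoked is the standard identity

$$p(x_1, x_2, \ldots, x_n) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_1, \ldots, x_{n-1}),$$

which holds for any joint distribution and any ordering of the variables; a one-line pointer to this (or to the Probability Review) would resolve the confusion.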

Clarification in Belief Propagation notes

In the section Max-product message passing,

The key observation is that the sum and max operators both distribute over products.

I think this holds true only because the factors are always non-negative, and it might not hold in general. This should be mentioned in the notes.
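
A one-line example of the failure when a factor can be negative (my own example, not from the notes): for $a \ge 0$ we have $\max(ab, ac) = a \max(b, c)$, but for $a = -1$, $b = 1$, $c = 2$,

$$\max(ab, ac) = \max(-1, -2) = -1 \;\neq\; a \max(b, c) = -2,$$

so the distributivity used by max-product really does rely on the factors being non-negative.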

Usage of the term "potentials" in discussion of Markov random fields

I have a comment regarding the following equation here
$$\tilde p(A,B,C,D) = \phi(A,B)\phi(B,C)\phi(C,D)\phi(D,A), $$
and the rest of the section where $\phi()$ are referred to as "potentials".

In the language of physics, the potential would refer to the log likelihood, since potentials add when different probability factors multiply (just like the energies of different non-interacting subsystems add up).

It's possible that the jargon got scrambled when moving from one community to another. If the document's usage is consistent with conventions in the field, then it might be worth adding a clarifying footnote where $\phi$ is introduced.
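
For concreteness, the correspondence being pointed at can be written as (standard Gibbs/energy notation, not notation from the notes)

$$\tilde p = \prod_k \phi_k = \exp\Big(-\sum_k E_k\Big), \qquad E_k = -\log \phi_k,$$

so the quantities that add (the $E_k$, the potentials/energies in the physics sense) are the negative logs of the factors that the notes call potentials.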

Section "Introducing evidence" is unclear

The section Introducing evidence is quite unclear. For example, you say

P(X, Y, E) is a probability distribution

However, in the equation above that statement there is no term P(X, Y, E).

You say that sometimes we are interested in computing the posterior given the evidence. So, what does X have to do with this?

The following statement

We can compute this probability by performing variable elimination once on ...

is also not clear. Can you explain it further?

The last paragraph is also quite unclear. What do you mean by scope?
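
For other readers of this thread, here is my understanding of the computation being described (my paraphrase, not a quote from the notes): with query variables $Y$, evidence $E = e$, and remaining variables $X$,

$$P(Y \mid E = e) = \frac{P(Y, E = e)}{P(E = e)} = \frac{\sum_{x} P(X = x, Y, E = e)}{\sum_{x, y} P(X = x, Y = y, E = e)},$$

so $X$ enters only because it must be marginalized out, which is exactly what variable elimination does; a sentence along these lines in the notes would help.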

Some more practical examples

Would it be possible to show some examples and code? You could include some links. Someone like me, or others, could then try writing code based on the examples. Would reinforcement learning be a good example?

Generative vs discriminative models

I've been reading your notes on PGMs, and also some other resources, particularly [1]. In the introduction of [1], they mention the difference between generative and discriminative models, i.e. modelling the joint distribution p(y, x) as opposed to p(y|x). You mention the same difference in the section The difficulties of probabilistic modeling.
The confusing part is in Real-World Applications / Probabilistic Models of Images, where you write

one of the reasons why generative models are powerful lie in the fact that they have many fewer parameters than the amount of data that they are trained with,

which seems to contradict the section The difficulties of probabilistic modeling. The confusing part (in boldface in the original post) is that the sentence above says generative models have many fewer parameters than the amount of data they are trained on, which appears inconsistent with the argument made in that earlier section.

Am I interpreting something wrong, or is the terminology ambiguous?
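
For what it's worth, the distinction the two sections are drawing can be summarized as (my summary, not a quote)

$$p(y, x) = p(y \mid x)\, p(x),$$

where a generative model specifies the full joint $p(y, x)$, and hence also a model of the inputs $p(x)$, while a discriminative model specifies only the conditional $p(y \mid x)$.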

[1] An Introduction to Conditional Random Fields

Add a side note that links to a resource that proves that the probability distribution defined on a Bayesian network is valid

At the end of the section where you formally describe Bayesian networks, you state

It is not hard to see that a probability represented by a Bayesian network will be valid: clearly, it will be non-negative and one can show using an induction argument (and using the fact that the CPDs are valid probabilities) that the sum over all variable assignments will be one. Conversely, we can also show by counter-example that when G contains cycles, its associated probability may not sum to one.

I think you should have a side note that links to a paper or a resource that proves these statements.
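
For readers who want the argument spelled out, a sketch of the inductive step (my sketch, assuming the variables are ordered so that $x_n$ has no children):

$$\sum_{x_1, \ldots, x_n} \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}(x_i)) = \sum_{x_1, \ldots, x_{n-1}} \Big( \prod_{i=1}^{n-1} p(x_i \mid \mathrm{pa}(x_i)) \Big) \sum_{x_n} p(x_n \mid \mathrm{pa}(x_n)) = \sum_{x_1, \ldots, x_{n-1}} \prod_{i=1}^{n-1} p(x_i \mid \mathrm{pa}(x_i)),$$

since the inner sum is 1 because each CPD is a valid probability; iterating removes one variable at a time until the total is 1. When $G$ has a directed cycle, there is no ordering in which every variable can be summed out after its children, so this argument no longer goes through.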
