yousuketakada / prml_errata
More PRML Errata
Home Page: https://yousuketakada.github.io/prml_errata/
License: Other
In Section 10.1.2, by minimizing KL(q||p) with the factorized approximation, we find the unique solution

q(z) = q_1^\star(z_1) q_2^\star(z_2) = \mathcal{N}(z_1 \mid \mu_1, \Lambda_{11}^{-1}) \mathcal{N}(z_2 \mid \mu_2, \Lambda_{22}^{-1})

as the author shows in Exercise 10.2.
Minimizing KL(p||q) instead, via (10.17) together with the Gaussian marginal (2.98), gives a factorized Gaussian of the same form with the same means (though with the marginal variances \Sigma_{11} and \Sigma_{22}).
So the solution q(z) is, in general, not a spherical Gaussian and cannot look like Figure 10.2.
The correct contour plot of the solution is shown by the purple lines in panel (a) of the following figure:
https://drive.google.com/file/d/1cSnMA-_hheAmnCBLKrp551BcszI15vpq/view?usp=sharing
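To make this concrete, here is a minimal numerical sketch of mine (the precision matrix below is an arbitrary choice for illustration) comparing the variances of the two factorized solutions:

```python
# For a correlated 2-D Gaussian p(z) = N(z | mu, Lambda^{-1}), compare the
# factorized solutions from minimizing KL(q||p) (variances Lambda_ii^{-1},
# Exercise 10.2) and KL(p||q) (the marginal variances Sigma_ii, via (10.17)
# and (2.98)).
import numpy as np

Lam = np.array([[2.0, 1.2],
                [1.2, 1.0]])        # precision matrix with correlation
Sigma = np.linalg.inv(Lam)          # covariance of p(z)

var_kl_qp = 1.0 / np.diag(Lam)      # KL(q||p): q_i = N(z_i | mu_i, Lambda_ii^{-1})
var_kl_pq = np.diag(Sigma)          # KL(p||q): q_i = p(z_i) = N(z_i | mu_i, Sigma_ii)

print("KL(q||p) variances:", var_kl_qp)   # more compact than the marginals
print("KL(p||q) variances:", var_kl_pq)
# Neither solution is spherical unless the two variances happen to coincide.
```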
I think that Figure 10.2 originates from MacKay (2003) (https://www.inference.org.uk/itprnn/book.pdf), page 436; the related problem is Exercise 33.5 (p. 434), in which the solution is restricted to a spherical Gaussian.
If my understanding is correct, Figure 10.2 is not relevant to the discussion in Section 10.1.2.
Please check my reasoning and let me know your comments.
Thank you.
Hi yousuketakada,
According to the official errata (third printing) at https://www.microsoft.com/en-us/research/wp-content/uploads/2016/05/prml-errata-3rd-20110921.pdf, on page 101, a = 1 + beta/2 should read a = (1 + beta)/2.
However, I think this fix is wrong and that the original, a = 1 + beta/2, is correct.
In (2.153), lambda^{beta/2} should be lambda^{a-1} by the definition of the gamma distribution, so a - 1 = beta/2, which gives a = 1 + beta/2.
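For reference, the gamma density (2.146) that this identification relies on is
Gam(lambda | a, b) = (1 / Gamma(a)) b^a lambda^{a-1} exp(-b lambda).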
Is my understanding correct?
Hi
I found a simple typo in your PRML errata.
In equation (148) on page 35, \bm{\phi}_N should be \bm{\phi}_M.
Please check; I am also still waiting on my previous pull request.
Thank you.
In (3.57) on p. 156, we have p(t | \mathbf{t}, \alpha, \beta). I believe there should be an x in the conditioning too, as there is in (3.58).
Although the Bernoulli distribution (2.2) is well-defined for \mu \in [0, 1], ...
If ...
In fact, some authors adopt the restriction 0 < \mu < 1.
Also, in the context of Bayesian inference in which we regard \mu as a random variable, ...
Similar discussions also apply to other discrete distributions, i.e., the binomial, the multinoulli, and the multinomial.
For the prior distributions, i.e., the beta and the Dirichlet, the domain of \mu ...
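Parts of the inline math in this note appear to have been lost, but if I am reading it correctly, the point about the endpoints can be illustrated with a quick check of mine (not from the errata itself):

```python
# The Bernoulli pmf Bern(x|mu) = mu^x (1-mu)^(1-x) is well-defined for any
# mu in [0, 1], but its log, x*ln(mu) + (1-x)*ln(1-mu), is not: at mu = 0
# with x = 1 (or mu = 1 with x = 0) it involves ln(0).  This is one reason
# some authors restrict the parameter to 0 < mu < 1.
import math

def bern(x: int, mu: float) -> float:
    return mu**x * (1.0 - mu)**(1 - x)

print(bern(1, 0.0))        # 0.0 -- the pmf itself is fine at the endpoint
try:
    print(math.log(bern(1, 0.0)))
except ValueError as err:  # log(0) raises: the log-likelihood is undefined
    print("log-likelihood undefined:", err)
```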
The link to the support page should be
https://www.microsoft.com/en-us/research/people/cmbishop/prml-book/
Possible errors have been found at the following locations:
Page 174, Exercise 3.4, Line -4
Page 238, Equation (5.34)
Page 248, Equations (5.75) and (5.76)
Page 289, Equation (5.208)
Page 307, Equation (6.62)
Page 314, Equation (6.75)
Page 563, Equation (12.7)
We also need to add a mention of ``Mathematical Notation'' for PRML (pp. xi--xii) regarding the first correction to these errors.
Split lengthy paragraphs into shorter ones and introduce paragraph headers where appropriate to help the reader better understand the organization of the text.
Paragraph headers are also useful for introducing important concepts, e.g., big O notation, score function, etc.
Show the normalization (B.79) as well as the expectations (B.80) and (B.81) of the Wishart distribution (B.78) in a self-contained manner.
We have already pointed out that an appropriate citation, e.g., Anderson (2003), is needed for the Wishart distribution because it has been introduced without proof. However, most multivariate statistics textbooks, including Anderson (2003), motivate the Wishart distribution differently from PRML; they typically introduce it as the distribution of the scatter matrix.
A derivation along those lines is indirect for our purpose (we are mainly interested in conjugacy). I would rather show the normalization (B.79) as well as the expectations (B.80) and (B.81) directly, just as we have done for the gamma distribution (2.146).
We derive the normalization of the Wishart distribution by means of the Cholesky decomposition and the associated Jacobian. We also introduce the multivariate gamma function, which simplifies the form of the normalization constant (B.79). The expectations (B.80) and (B.81) are then obtained from the fact that the expectation of the score function vanishes.
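As a quick numerical cross-check of (B.80) and (B.81), here is a Monte Carlo sketch of mine using SciPy (the scale matrix W and the degrees of freedom nu are arbitrary choices for illustration):

```python
# Monte Carlo check of the Wishart expectations E[Lambda] = nu * W    (B.80)
# and E[ln|Lambda|] = sum_i psi((nu+1-i)/2) + D ln 2 + ln|W|          (B.81).
import numpy as np
from scipy.special import digamma
from scipy.stats import wishart

rng = np.random.default_rng(0)
D, nu = 3, 7.0
W = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])    # scale matrix (symmetric positive definite)

samples = wishart(df=nu, scale=W).rvs(size=100_000, random_state=rng)

print(np.abs(samples.mean(axis=0) - nu * W).max())    # (B.80): small, O(n^-1/2)

logdet_mean = np.linalg.slogdet(samples)[1].mean()
expected = (digamma((nu + 1 - np.arange(1, D + 1)) / 2).sum()
            + D * np.log(2) + np.linalg.slogdet(W)[1])
print(logdet_mean, expected)                          # (B.81): nearly equal
```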
You say that using a very broad prior distribution leads to insufficient regularization and thus to overfitting. I am guessing you have the MAP estimator in mind, but that is not a Bayesian thing to do. A Bayesian would produce a predictive distribution (Section 3.3.2) or, if required to produce a point prediction for every input, something like the predictive mean or the predictive median, depending on what the ultimate loss function is.
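For concreteness, here is a minimal sketch of what "produce a predictive distribution" means for the Bayesian linear model of Section 3.3.2 (my own illustration; the polynomial basis and the values of alpha and beta are assumptions):

```python
# Posterior predictive for Bayesian linear regression, (3.58)-(3.59):
# p(t | x, t, alpha, beta) = N(t | m_N^T phi(x), 1/beta + phi(x)^T S_N phi(x)).
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 25.0                      # prior precision, noise precision

x = rng.uniform(0, 1, size=20)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=20)

def phi(x):                                  # polynomial basis functions, M = 4
    return np.power.outer(x, np.arange(4))

Phi = phi(x)                                 # design matrix
S_N = np.linalg.inv(alpha * np.eye(4) + beta * Phi.T @ Phi)    # (3.54)
m_N = beta * S_N @ Phi.T @ t                                   # (3.53)

x_new = np.array([0.25])
mean = phi(x_new) @ m_N                                        # predictive mean
var = 1 / beta + np.einsum('ij,jk,ik->i', phi(x_new), S_N, phi(x_new))  # (3.59)
print(mean, np.sqrt(var))    # a point prediction plus its predictive uncertainty
```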
On page 701, second paragraph, "A matrix A is said to be positive definite, denoted by A > 0, if w^T A w > 0 for all non-zero values of the vector w. Equivalently, a positive definite matrix has \lambda_i > 0 for all of its eigenvalues ..." contains an error, as follows.
The condition "Equivalently, a positive definite matrix has \lambda_i > 0 for all of its eigenvalues ..." holds only when A is a *symmetric* positive definite matrix; not every positive definite matrix (in the sense of the quadratic-form condition above) is symmetric. For example,
\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
is not symmetric but is positive definite, and its eigenvalues 1 \pm i are not positive real numbers.
(As David Mitra shows in https://math.stackexchange.com/questions/83134/does-non-symmetric-positive-definite-matrix-have-positive-eigenvalues)
So my suggested correction is: "Equivalently, a positive definite matrix" --> "If A is a symmetric, positive definite matrix".
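A quick numerical check of this counterexample (a sketch of mine):

```python
# The matrix A below satisfies w^T A w > 0 for all nonzero w (its symmetric
# part (A + A^T)/2 is the identity), yet its eigenvalues are 1 +/- i,
# which are not positive real numbers.
import numpy as np

A = np.array([[1.0, 1.0],
              [-1.0, 1.0]])

print(np.linalg.eigvals(A))      # [1.+1.j  1.-1.j]
print((A + A.T) / 2)             # identity, so w^T A w = ||w||^2 > 0

rng = np.random.default_rng(0)
w = rng.normal(size=(1000, 2))   # random test vectors
print((np.einsum('ij,jk,ik->i', w, A, w) > 0).all())    # True
```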
Take the determinants of the two block diagonalizations used to show the general push-through identity [a generalized version of (C.5)] and the Woodbury identity (C.7). Both determinants equal det(M), so they can be equated; setting A and D to identity matrices (possibly of different dimensionalities) then gives (C.14) after some reparameterization.
Note that we cannot simply take the determinants of both sides of (C.6) and then cancel the det(A) factor, because A is neither necessarily square nor necessarily nonsingular.
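Explicitly, writing M = \begin{pmatrix} I_N & -B \\ C & I_M \end{pmatrix} with B of size N x M and C of size M x N (the sign convention here is my own choice), the two factorizations give
\det M = \det(I_M) \det(I_N + B C) = \det(I_N) \det(I_M + C B),
hence \det(I_N + B C) = \det(I_M + C B), which is (C.14) up to reparameterization.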
Figure 5.10 shows that the sum-of-squares error is smallest at M = 4, which accords with Figure 5.9. However, M = 3 may be the better choice because there is no great improvement from M = 3 to M = 4.
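If it helps the discussion, here is a rough sketch of mine (not from the book) of how one might reproduce the flavor of Figure 5.10 with scikit-learn; the data, seeds, and network settings are all assumptions:

```python
# Fit small two-layer networks with M hidden units to noisy sinusoidal data
# and compare the final sum-of-squares training error as M varies.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(50, 1))
t = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, size=50)

for M in (1, 2, 3, 4, 6, 10):
    net = MLPRegressor(hidden_layer_sizes=(M,), activation='tanh',
                       solver='lbfgs', max_iter=5000, random_state=0)
    net.fit(x, t)
    sse = 0.5 * np.sum((net.predict(x) - t) ** 2)   # sum-of-squares error
    print(f"M = {M:2d}: SSE = {sse:.4f}")
```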