
all-of-statistics's People

Contributors

a-wei-william, ang-gr, ayenks, icestephanie, jakobsrepository, lukeeeeee, pluveto, telmo-correa, zufchan


all-of-statistics's Issues

Exercise 11.9.10

There is an error in the presented standard error of the MLE $\hat{\lambda}=\overline{X}$. The correct expression would be: $\hat{SE}(\hat{\lambda}) = \sqrt{1/I_n(\hat{\lambda})}=\sqrt{\frac{\overline{X}}{n}}$.

This implies changes to the Wald statistic and the simulation, of course.
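For reference, here is the Fisher information calculation behind this corrected standard error, assuming the $X_i$ are IID $Poisson(\lambda)$ as in the exercise:

$$I_n(\lambda) = -n\,\mathbb{E}\left[\frac{\partial^2}{\partial\lambda^2}\log f(X;\lambda)\right] = -n\,\mathbb{E}\left[-\frac{X}{\lambda^2}\right] = \frac{n}{\lambda}, \qquad \hat{SE}(\hat{\lambda}) = \sqrt{\frac{1}{I_n(\hat{\lambda})}} = \sqrt{\frac{\overline{X}}{n}}.$$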

Exercise 10.13.7

The solution presented for parts a) through c) is not general: it already assumes that $X_1\sim Binomial(n_1,p_1)$ and $X_2\sim Binomial(n_2,p_2)$ each consist of only one data point, which we only learn in part d).

Do not confuse the number of data points of $X_1$ and $X_2$ (let us call them $N_1$ and $N_2$) with the number of trials contained in each $X_i$, which is given in the exercise as $n_i$ (i.e., $n_i$ is the number of $Bernoulli(p_i)$ repetitions).

The MLE of $p_i$ from $X_i \sim Binomial(n_i,p_i)$ consisting of $N_i$ datapoints comes from maximizing
$$l_{N_i}(p_i)= \log\left(L_{N_i}(p_i)\right)=\log(p_i) \sum_{j=1}^{N_i} X_{i,j} + \log(1-p_i) \sum_{j=1}^{N_i} \left(n_i-X_{i,j}\right) +\sum_{j=1}^{N_i} \log\binom{n_i}{X_{i,j}}$$

  • Taking the derivative of $l_{N_i}(p_i)$ with respect to $p_i$ and setting it to zero (the step is written out after this list) leads to the estimator
    $$\hat{p_i}= \frac{1}{n_i N_i} \sum_{j=1}^{N_i} X_{i,j}={\overline{X_i} \over n_i}$$

  • The MLE of $\psi=p_1-p_2$ would then be
    $$\hat{\psi} = \hat{p_1} - \hat{p_2}= {\overline{X_1} \over n_1} - {\overline{X_2} \over n_2}$$

  • The inverse of the Fisher information matrix, whose diagonal entries are the estimated variances of $\hat{p_1}$ and $\hat{p_2}$, would be

$$I^{-1}(\hat{p_1},\hat{p_2})=\begin{bmatrix} \frac{\hat{p_1} \left(1- \hat{p_1}\right)}{N_1 n_1} & 0 \\ 0 & \frac{\hat{p_2} \left(1- \hat{p_2}\right)}{N_2 n_2} \end{bmatrix}$$

  • As part d) reveals that we only have one data point for each of $X_1$ and $X_2$, we get $N_1=1$ and $N_2=1$, which leads to the correct expression shown in this solution.
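For completeness, the derivative step referenced in the list above, written out:

$$\frac{\partial l_{N_i}(p_i)}{\partial p_i} = \frac{1}{p_i}\sum_{j=1}^{N_i} X_{i,j} - \frac{1}{1-p_i}\sum_{j=1}^{N_i}\left(n_i-X_{i,j}\right) = 0 \;\;\Longrightarrow\;\; \hat{p_i} = \frac{1}{n_i N_i}\sum_{j=1}^{N_i} X_{i,j}.$$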

Exercise 8.5.8 Standard Error mistake

Hi!

First of all thanks for this repo! It's been a big help for self-studying the book.
In question 8.5.8 I think se = r.std() should be replaced by se = r.sem() or math.sqrt(r.var() / len(r)). The std computes the standard deviation of the values themselves, whereas we want the standard error of the mean estimate, which is calculated using:
$$\hat{\mathrm{se}} = \frac{\hat{\sigma}}{\sqrt{n}}$$
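A quick illustration of the difference (a minimal sketch, assuming r is a pandas Series of the values, as it appears to be in the notebook):

```
import numpy as np
import pandas as pd

# stand-in data; in the notebook r comes from the exercise itself
r = pd.Series(np.random.default_rng(0).normal(size=100))

print(r.std())                    # standard deviation of the values themselves
print(r.sem())                    # standard error of the mean
print(np.sqrt(r.var() / len(r)))  # same quantity as r.sem()
```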

Minor issues with Exercise 9.6.6

There is an issue with the variance of the true sampling distribution, which should be $\mathbb{V}\left(\overline{X_n}\right)=1/n$ (not $1/n^2$). Thus, we will have $\overline{X_n}=\frac{1}{n} \sum X_i \sim \frac{1}{n} N(n \mu,n) = N(\mu, 1/n)$.

This change will produce a less concentrated distribution than the one in the currently presented solution (as the scale should be $1/ \sqrt{n}$).
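For completeness, the corrected variance follows directly from independence, assuming (as the exercise setup implies) that $\mathbb{V}(X_i)=1$:

$$\mathbb{V}\left(\overline{X_n}\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathbb{V}(X_i) = \frac{n}{n^2} = \frac{1}{n}.$$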

Exercise 10.13.6

  • On part b), the correct 95% CI would be $\Phi(\overline{X}) \pm z_{.025} \frac{\phi(\overline{X})}{\sqrt{n}}$ (the delta-method step is sketched after this list).
  • On part e), as correctly expressed here, the true value of $\psi$ is $1-F_X(0)$. What I think is the correct procedure for the rest of the exercise is the following: by the WLLN, $\overline{X}$ converges in probability to $\mu_X=\mathbb{E}(X)$ (the true mean of the RVs $X_i$). According to this, $\hat{\psi}$ tends in probability to $\Phi(\mu_X)$. Finally, in general we have that $1-F_X(0)\neq \Phi(\mu_X)$.
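The part b) interval above is just the delta method with $g(\mu)=\Phi(\mu)$; assuming the model has known unit variance, so that $\hat{\mathrm{se}}(\overline{X})=1/\sqrt{n}$:

$$\hat{\mathrm{se}}(\hat{\psi}) = \left|g'(\overline{X})\right| \hat{\mathrm{se}}(\overline{X}) = \frac{\phi(\overline{X})}{\sqrt{n}}.$$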

Minor corrections on Exercise 6.8.13

The current version of the solution uses $\mathbb{P}(Z_i>x/n)=\int_0^{x/n} f(u) \,{\rm d}u$, $F(0)=1$, and $F'(0)= -\lambda$. None of these statements is correct.

The correct statements are:

  • $\mathbb{P}(Z_i>x/n)=\int_0^{\infty} f(u) \,{\rm d}u - \int_0^{x/n} f(u) \,{\rm d}u= 1 - \int_0^{x/n} f(u) \,{\rm d}u = 1 - F(x/n)$.

  • $F(0) = \int_0^0 f(u) \rm{d}u = 0$.

  • $F'(0) = \lim_{x \to 0} f(x) = \lambda$.

All these changes lead us to the same expected result.
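To make that explicit, here is how the corrected statements feed into the limit, assuming (as I recall the exercise) that the quantity of interest is $n\min_i Z_i$ and the expected result is the $Exp(\lambda)$ limit:

$$\mathbb{P}\left(n\min_i Z_i > x\right) = \prod_{i=1}^{n}\mathbb{P}\left(Z_i > \frac{x}{n}\right) = \left(1-F\left(\frac{x}{n}\right)\right)^n \approx \left(1-\frac{\lambda x}{n}\right)^n \longrightarrow e^{-\lambda x},$$

using $F(x/n)\approx F(0)+F'(0)\,x/n = \lambda x /n$.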

Issues with Exercise 5.6.4 b) (Chapter 5: Inequalities)

As pointed out in the statement of the question, $\mathbb{P}(p \in C_n) \geq 1- \alpha = 0.95$ for $\alpha=0.05$. According to this, the coverage should always be above 0.95, which is not what we see in the simulation provided here.

Honestly, I do not understand the simulation provided, since I am not familiar with "tqdm" and maybe other things. If it helps, what I did was to generate 100 Bernoulli(p) random vectors of length N = 10,000, compute the 100 cumulative means (each of length N = 10,000), and average the coverage over the 100 repetitions. My caveat: I am not sure I really need to average over a given number of random vectors (which I arbitrarily decided to be 100).

Here is my code:

```
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import bernoulli

np.random.seed(1)

N = 10000
p = 0.4
alpha = 0.05
nn = np.arange(1, N + 1)

# 100 Bernoulli(p) random vectors of length N; the cumulative means estimate p after n observations
XX_Cum = [np.cumsum(bernoulli.rvs(p, size=N)) / nn for i in range(100)]

def epsilon(n, alpha):
    # Hoeffding half-width: sqrt(log(2 / alpha) / (2 n))
    return np.sqrt(1 / (2 * n) * np.log(2 / alpha))

# Counter[j][i - 1] is True when the interval around the j-th estimate covers p after i observations
Counter = np.array([[np.abs(XX_Cum[j][i - 1] - p) <= epsilon(i, alpha)
                     for i in range(1, N + 1)] for j in range(100)])

# empirical coverage at each sample size, averaged over the 100 repetitions
Coverage = [np.count_nonzero(Counter.transpose()[i]) / 100 for i in range(len(Counter.transpose()))]

plt.plot(nn, Coverage, label='Coverage')
plt.plot(nn, [1 - alpha for i in range(len(Counter.transpose()))], label="Hoeffding's lower bound")
plt.xlabel('N')
plt.ylabel(r'$P(p \in C_n)$')
plt.legend()
plt.show()
```

(figure: simulated coverage vs. N, compared with the 0.95 Hoeffding lower bound)

Mistake in risk estimation

Hello! Thank you for the great work! There seems to be a mistake in Chapter 21 - Nonparametric Curve Estimation, in the function j_hat_kde: as mentioned in the book and in your text, $K^{(2)}$ is $N(0,2)$, so the function should look like the version below. The docstring also contains a mistake (h is simply a bandwidth): it says the dataset is rescaled to [0, 1], but I did not find any evidence for such a transformation in the book, and the code of this function does not contain it either. Please could you double-check this:

```
from itertools import product

import numpy as np
from scipy.stats import norm

def j_hat_kde(X, h):
    r"""
    Calculate the approximated estimated KDE risk J_hat for a N(0, 1) Gaussian kernel

      \hat{J}(h) = \frac{1}{hn^2}\sum_{i, j} K^* \left( \frac{X_i - X_j}{h} \right) + \frac{2}{nh} K(0)

    where:
      n is the dataset size
      h is the bandwidth
      K^* is K^{(2)}(x) - 2 K(x), and K^{(2)} is the convolved kernel, K^{(2)}(z) = \int K(z - y) K(y) dy
      K is the original kernel
    """
    n = len(X)
    Kstar_args = np.array([X.iloc[i] - X.iloc[j] for i, j in product(range(n), range(n))]) / h
    # K^{(2)} is N(0, 2), i.e. variance 2; scipy's `scale` is a standard deviation, hence sqrt(2)
    sum_value = np.sum(norm.pdf(Kstar_args, loc=0, scale=np.sqrt(2)) - 2 * norm.pdf(Kstar_args, loc=0, scale=1))
    return sum_value / (h * n * n) + 2 * norm.pdf(0) / (n * h)
```
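For what it's worth, a minimal sketch of how this corrected function could be used to pick the bandwidth by minimizing the estimated risk over a grid; the data, the grid range, and the assumption that X is a pandas Series are all mine:

```
import numpy as np
import pandas as pd

# hypothetical data; in the notebook X comes from the chapter's dataset
X = pd.Series(np.random.default_rng(0).normal(size=200))

hs = np.linspace(0.05, 1.0, 40)        # candidate bandwidths (assumed range)
risks = [j_hat_kde(X, h) for h in hs]  # estimated risk for each candidate
h_best = hs[int(np.argmin(risks))]     # bandwidth with the smallest estimated risk
print(h_best)
```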

Proof 5.4

There is a typo: $Y-i$ should be $Y_i$.

Exercise 9.6.1

It looks to me like you are picking a random lsat and a random gpa independently and making them a pair for the bootstrap.
I think we want to pick a random pair directly instead.

e.g.
lsat = [1,2]
gpa = [3,4]

We want to pick randomly from (1,3) and (2,4).
So (1,4) would be impossible.
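A minimal sketch of what I mean, resampling row indices so that each (lsat, gpa) pair stays together (using the toy values above):

```
import numpy as np

rng = np.random.default_rng(0)
lsat = np.array([1, 2])
gpa = np.array([3, 4])

n = len(lsat)
idx = rng.integers(0, n, size=n)           # resample row indices with replacement
lsat_star, gpa_star = lsat[idx], gpa[idx]  # each resampled row keeps its original pair
print(list(zip(lsat_star, gpa_star)))      # only (1, 3) and (2, 4) can appear
```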

Exercise 4.7.13

"**Solution**. We have $X = C U_1 + (1 - C)U_2$, where $U \\sim \\text{Bernoulli}(1/2)$, $U_1 \\sim \\text{Uniform}(0, 1)$ and $U_2 \\sim \\text{Uniform}(0, 2)$ are all independent."

Is $U \sim \text{Bernoulli}(1/2)$ meant to be $C \sim \text{Bernoulli}(1/2)$?
Also, should $U_2$ be $U_2 \sim \text{Uniform}(3, 4)$?

Little issue with Exercise 6.8.8

The proposed solution is algebraically correct on the left-hand side of the inequality in the first probability, but $\sigma$ stands for the standard deviation of the Poisson($\lambda=1$), which is $\sqrt{\lambda} = 1$. This changes the rest of the solution as follows:

$\mathbb{P}\left(Y<90\right)=\mathbb{P}\left(\overline{X}_n<{90 \over n}\right)=\mathbb{P}\left(\sqrt{n}\frac{\left(\overline{X}_n-\mu\right)}{\sigma}<\sqrt{n}\frac{\left({90 \over n}-\mu\right)}{\sigma} \right) \approx \mathbb{P}\left( Z<\sqrt{100}\left({90 \over 100}-1\right) \right)= \Phi(-1)$.
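A quick numerical check of this, assuming (as the numbers above suggest) that $Y$ is the sum of $n=100$ IID Poisson(1) variables, so that $Y \sim Poisson(100)$:

```
from scipy.stats import norm, poisson

print(norm.cdf(-1))          # CLT approximation Phi(-1), about 0.1587
print(poisson.cdf(89, 100))  # exact P(Y < 90) = P(Y <= 89) for Y ~ Poisson(100)
```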

Issue with exercise 8.5.2

In the Cauchy runs used to test the nonparametric confidence band, the CDF that the code compares against is from the normal distribution, not the Cauchy distribution as it should be. This leads to many more instances of the actual CDF falling outside the confidence band.
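A minimal sketch of the kind of change I mean, with made-up variable names (I have not checked how the notebook actually structures this comparison):

```
import numpy as np
from scipy.stats import cauchy, norm

grid = np.linspace(-15, 15, 500)

# what (I believe) the current code compares the band against:
reference_cdf_wrong = norm.cdf(grid)

# what the Cauchy runs should compare the band against:
reference_cdf = cauchy.cdf(grid)
```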

Issues with Exercise 9.6.5

In sections a) and b) (and by extension in the rest of the exercise) there are some issues with the meaning of $\mathbb{E}\left(\overline{X_n^\star}\vert X_1, ... ,X_n\right )$ and $\mathbb{V}\left(\overline{X_n^\star} \vert X_1, ... ,X_n\right)$. As these expressions are conditioned on random variables, the outcomes should also be random variables, not numbers (as they are presented in this solution). My solution is the following:

$$\mathbb{E}\left(\overline{X_n^\star}\vert X_1, ... ,X_n\right ) = \frac{1}{n} \sum_i \mathbb{E}\left(X_i^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \sum_i \frac{1}{n}\sum_j X_j = \frac{1}{n} \sum_i \overline{X_n} = \overline{X_n} $$

$$\begin{eqnarray} \mathbb{V}\left(\overline{X_n^\star} \vert X_1, ... ,X_n\right) &=& \frac{1}{n^2} \sum_i \mathbb{V}\left(X_i^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \mathbb{V}\left(X_1^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \left[ \mathbb{E}\left({X_1^\star}^2 \vert X_1, ... ,X_n\right ) - \mathbb{E}^2\left({X_1^\star} \vert X_1, ... ,X_n\right) \right] \\ &=& \frac{1}{n} \left[ \frac{1}{n} \sum_i X_i^2 - \overline{X_n}^2 \right]= \frac{1}{n^2} \sum_i \left(X_i - \overline{X_n} \right)^2 = \frac{1}{n} \hat{\sigma}^2 = \frac{n-1}{n^2} S_n^2 \end{eqnarray}$$

As I said above, these changes have some effect on the rest of the solution. However, the modifications I propose lead to the same final results in c) and d).
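A quick simulation check of the conditional variance formula above; the observed sample is arbitrary, and the check just resamples from it while holding it fixed:

```
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=50)  # the observed sample we condition on
n = len(X)

# bootstrap means, conditional on the observed X_1, ..., X_n
boot_means = np.array([rng.choice(X, n, replace=True).mean() for _ in range(100_000)])

print(boot_means.var())                # simulated V(Xbar* | X_1, ..., X_n)
print((n - 1) / n**2 * X.var(ddof=1))  # (n - 1) S_n^2 / n^2 from the derivation above
```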

Issues with Exercise 9.6.2

1- I don't think that the "fourth method" the exercise asks for is the parametric bootstrap, but rather the jackknife confidence interval that is explained in the appendix of the chapter.

2- The second part of the exercise asks for "the true coverage", i.e., the fraction of intervals that contain the true value of the statistic (in this case the skewness parameter). My way of solving this is the following (excuse me if my code is not very fluent):

```
import numpy as np
from scipy.stats import norm
from tqdm.notebook import tqdm

np.random.seed(1)

def skew(x):
    mean = np.mean(x)
    sd = np.std(x)
    return np.mean((x - mean)**3) / sd**3

B = 10000
def T_boot(x):
    t_boot = np.empty(B)
    n_x = len(x)
    for i in tqdm(range(B)):
        x_star = np.random.choice(x, n_x, replace=True)
        t_boot[i] = skew(x_star)
    return t_boot, np.std(t_boot)

# Creation of 1000 random vectors of n=50 datapoints
Y2 = norm.rvs(loc=0, scale=1, size=(1000, 50))
X2 = np.exp(Y2)

# Mapping each random vector with the bootstrap to produce the confidence intervals
t_bootV = list(map(T_boot, X2))

# Generating a list with the confidence intervals from every random vector
# Normal 95% confidence intervals
z_95 = norm.ppf(0.975)
NCI = np.array(list(map(lambda x, y: (skew(x) - z_95*y, skew(x) + z_95*y), X2, list(zip(*t_bootV))[1])))

# Percentile 95% confidence intervals
PerCI = np.array(list(map(lambda x: (np.quantile(x, 0.025), np.quantile(x, 0.975)), list(zip(*t_bootV))[0])))

# Pivotal 95% confidence intervals
PivCI = np.array(list(map(lambda x, y: (2*skew(x) - np.quantile(y, 0.975), 2*skew(x) - np.quantile(y, 0.025)), X2, list(zip(*t_bootV))[0])))
```

According to theory:

  • $E(X)=E(e^{Y})= \int_{-\infty}^\infty e^{y} {\rm{d}}F_Y(y)=e^{1/2}$.
  • $\sigma_X^2=V(X)=E[X-E(X)]^2=E[e^{Y}-E(e^Y)]^2= \int_{-\infty}^\infty \left(e^{y} - e^{1/2}\right)^2 {\rm{d}}F_Y(y)=e(e-1)$.
  • $E[X-E(X)]^3=E[e^{Y}-E(e^Y)]^3 = \int_{-\infty}^\infty \left(e^{y} - e^{1/2}\right)^3 {\rm{d}}F_Y(y) = e^{3/2} (2-3 e +e^3)$.

So the skewness parameter of $X=e^Y$ is $\theta=T(F)=\int (x-\mu)^{3} {\rm d}F(x) /\sigma_X^3=\frac{E[X-E(X)]^3}{\left(E[X-E(X)]^2\right)^{3/2}}=\frac{e^{3/2} (2-3 e +e^3)}{e^{3/2}(e-1)^{3/2}}= \frac{(2-3 e +e^3)}{(e-1)^{3/2}} \approx 6.18$.
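For reference, the numerator factors, which gives the familiar closed form for the lognormal skewness:

$$2-3e+e^3 = (e-1)^2(e+2) \quad\Longrightarrow\quad \theta = \frac{(e-1)^2(e+2)}{(e-1)^{3/2}} = (e+2)\sqrt{e-1}\approx 6.18.$$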
This will be the value we expect to fall within our confidence intervals (not only for each method, but also for each simulated random vector). The "true coverage" estimation continues as follows:

```
# The true parameter:
skewness_f = (2 - 3*np.exp(1) + np.exp(1)**3) / (np.exp(1) - 1)**(3/2)

# Comparison with the Normal 95% Confidence Intervals
cov_NCI = np.count_nonzero((NCI.T[1] >= skewness_f) & (skewness_f >= NCI.T[0])) / len(NCI.T[0])

# Comparison with the Percentile 95% Confidence Intervals
cov_PerCI = np.count_nonzero((PerCI.T[1] >= skewness_f) & (skewness_f >= PerCI.T[0])) / len(PerCI.T[0])

# Comparison with the Pivotal 95% Confidence Intervals
cov_PivCI = np.count_nonzero((PivCI.T[1] >= skewness_f) & (skewness_f >= PivCI.T[0])) / len(PivCI.T[0])

print('The true coverage for the simulated Normal confidence intervals is: %.3f' % cov_NCI)
print('The true coverage for the simulated Percentile confidence intervals is: %.3f' % cov_PerCI)
print('The true coverage for the simulated Pivotal confidence intervals is: %.3f' % cov_PivCI)
```

The output of the simulation:
(simulation output for n = 50)
This means that our 50 data points are not enough for our bootstrap 95% confidence intervals to reach the expected coverage...

With $n=1000$ datapoints the coverage gets better:
(simulation output for n = 1000)

With $n=100000$ datapoints the coverage gets closer to the expected 95%.
(simulation output for n = 100000)

Assuming that my procedure is correct, I only need more data points (which is more computationally expensive, which is why I do not continue reporting results)...

I would appreciate some feedback to compare against.

Important issues with Exercise 12.11.2

As correctly shown at the beginning of part b), $\mu \lvert X^n \sim Normal\left(\overline{X} , \frac{\sigma^2}{n}\right)$, with $\sigma=1$. In part d), what is expected is the posterior of $\theta= e^{\mu}$, i.e., we want $\theta \lvert X^n$. It is important to notice that in the posterior analysis we treat the parameters as random variables and the data as constants, so the solution presented in part d) is completely incorrect. The correct procedure is the following:

  • $\theta$'s CDF:

$$\begin{eqnarray} F_{\Theta}\left(\theta \vert X^n \right) &=& \mathbb{P}\left(\Theta \leq \theta \vert X^n \right) = \mathbb{P}\left(e^{\mu} \leq \theta \lvert X^n \right) = \mathbb{P}\left(\mu \leq \log(\theta) \vert X^n \right) \\ &=& \mathbb{P}\left(\left(\mu-\overline{X}\right) \frac{\sqrt{n}}{\sigma} \leq \left(\log(\theta)-\overline{X}\right) \frac{\sqrt{n}}{\sigma} \bigg\vert X^n \right) \\ &\approx& \mathbb{P}\left( Z \leq \left(\log(\theta)-\overline{X}\right) \frac{\sqrt{n}}{\sigma} \bigg\vert X^n \right)= \Phi\left[\left(\log(\theta)-\overline{X}\right) \frac{\sqrt{n}}{\sigma}\right] \end{eqnarray}$$

  • $\theta$'s PDF comes from taking the derivative with respect to $\theta$:
    $f_\Theta(\theta \vert X^n)= \phi\left[\left(\log(\theta)-\overline{X}\right) \frac{\sqrt{n}}{\sigma}\right] \frac{\sqrt{n}}{\sigma} \frac{1}{\theta}$

  • Notice that this solution depends on the sample size, as is expected for a posterior; this applies both to $\mu \lvert X^n$ and $\theta\lvert X^n$ (a quick simulation check is sketched below).

  • I also suggest respecting the symbol definitions in the preface in order to avoid confusion.
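As mentioned above, a quick Monte Carlo sanity check of the derived posterior density; $\sigma=1$ is given, but the values of $n$ and $\overline{X}$ here are made up purely for the check:

```
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
n, sigma, Xbar = 100, 1.0, 0.3  # sigma is given; n and Xbar are illustrative only

# sample from the posterior mu | X^n ~ N(Xbar, sigma^2 / n), then transform to theta = exp(mu)
mu_post = rng.normal(Xbar, sigma / np.sqrt(n), size=200_000)
theta_post = np.exp(mu_post)

# the density f_Theta(theta | X^n) derived above
theta = np.linspace(theta_post.min(), theta_post.max(), 400)
f_theta = norm.pdf((np.log(theta) - Xbar) * np.sqrt(n) / sigma) * np.sqrt(n) / (sigma * theta)

plt.hist(theta_post, bins=100, density=True, alpha=0.4, label=r'simulated posterior of $\theta$')
plt.plot(theta, f_theta, label=r'derived $f_\Theta(\theta \mid X^n)$')
plt.legend()
plt.show()
```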

Small Correction

Exercise 2.10.1.

$\lim_{n\to\infty}\mathbb{P}(A_n)=1-\mathbb{P}(A)$ (3)

RHS should be $\mathbb{P}(A)$

14.9.5

$\beta_1$ and $\beta_2$ are not independent; the covariance needs to be included.
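For whatever linear combination the exercise needs, the general variance formula with the covariance term is (writing $\hat\beta_1$, $\hat\beta_2$ for the estimates):

$$\mathbb{V}\left(a\hat\beta_1 + b\hat\beta_2\right) = a^2\,\mathbb{V}(\hat\beta_1) + b^2\,\mathbb{V}(\hat\beta_2) + 2ab\,\mathrm{Cov}(\hat\beta_1,\hat\beta_2).$$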

Issues in Chapter03, solution to exercise 3.14.18 (e)

https://github.com/telmo-correa/all-of-statistics/blob/master/Chapter%2003%20-%20Random%20Variables.ipynb
Hi, there might be an issue with Chapter 03, in the solution to exercise 3.14.18 (e).
In equation (13), it shows $1-\mathbb{P}(|X|<c)=1-\mathbb{P}(-c<X<-c)$; it should be $1-\mathbb{P}(-c<X<c)$.
Then in equation (14), it should be $1-\mathbb{P}\left(\frac{-c-3}{4}<Z<\frac{c-3}{4}\right)$, because the standardized interval is not symmetric.
So finally I got $\mathbb{P}(|X|>|x|)=\Phi\left(\frac{x+3}{4}\right)-\Phi\left(\frac{x-3}{4}\right)=0.005$, and by brute force I found that the solution for $|x|$ is 9.5464.
But honestly I don't know if I am right.
