
all-of-statistics's People

Contributors

a-wei-william, ang-gr, ayenks, icestephanie, jakobsrepository, lukeeeeee, pluveto, telmo-correa, zufchan


all-of-statistics's Issues

Exercise 11.9.10

There is an error in the presented standard error of the MLE $\hat{\lambda}=\overline{X}$. The correct expression would be: $\hat{SE}(\hat{\lambda}) = \sqrt{1/I_n(\hat{\lambda})}=\sqrt{\frac{\overline{X}}{n}}$.

This implies changes to the Wald statistic and the simulation, of course.
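For reference, here is the Fisher information calculation behind this corrected standard error, assuming the $X_i$ are IID $Poisson(\lambda)$ as in the exercise:

$$I_n(\lambda) = -n\,\mathbb{E}\left[\frac{\partial^2}{\partial\lambda^2}\log f(X;\lambda)\right] = -n\,\mathbb{E}\left[-\frac{X}{\lambda^2}\right] = \frac{n}{\lambda}, \qquad \hat{SE}(\hat{\lambda}) = \sqrt{\frac{1}{I_n(\hat{\lambda})}} = \sqrt{\frac{\overline{X}}{n}}.$$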

Exercise 10.13.7

The solution presented for parts a) through c) is not general: it already assumes that $X_1\sim Binomial(n_1,p_1)$ and $X_2\sim Binomial(n_2,p_2)$ each consist of only one data point, which we only learn in part d).

Do not confuse the number of data points of $X_1$ and $X_2$ (let us call them $N_1$ and $N_2$) with the number of trials contained in each $X_i$, which is given in the exercise as $n_i$ (i.e., $n_i$ is the number of $Bernoulli(p_i)$ repetitions).

The MLE of $p_i$ from $X_i \sim Binomial(n_i,p_i)$ consisting of $N_i$ datapoints comes from maximizing
$$l_{N_i}(p_i)= \log\left(L_{N_i}(p_i)\right)=\log(p_i) \sum_{j=1}^{N_i} X_{i,j} + \log(1-p_i) \sum_{j=1}^{N_i} \left(n_i-X_{i,j}\right) +\sum_{j=1}^{N_i} \log\binom{n_i}{X_{i,j}}$$

  • Taking the derivative of $l_{N_i}(p_i)$ with respect to $p_i$ and setting it to zero (the step is written out after this list) leads to the estimator
    $$\hat{p_i}= \frac{1}{n_i N_i} \sum_{j=1}^{N_i} X_{i,j}={\overline{X_i} \over n_i}$$

  • The MLE of $\psi=p_1-p_2$ would then be
    $$\hat{\psi} = \hat{p_1} - \hat{p_2}= {\overline{X_1} \over n_1} - {\overline{X_2} \over n_2}$$

  • The inverse of the Fisher information matrix, whose diagonal entries are the estimated variances of $\hat{p_1}$ and $\hat{p_2}$, would be

$$I^{-1}(\hat{p_1},\hat{p_2})=\begin{bmatrix} \frac{\hat{p_1} \left(1- \hat{p_1}\right)}{N_1 n_1} & 0 \\ 0 & \frac{\hat{p_2} \left(1- \hat{p_2}\right)}{N_2 n_2} \end{bmatrix}$$

  • As part d) reveals that we only have one data point for each of $X_1$ and $X_2$, we get $N_1=1$ and $N_2=1$, which leads to the correct expression shown in this solution.
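For completeness, the derivative step referenced in the list above, written out:

$$\frac{\partial l_{N_i}(p_i)}{\partial p_i} = \frac{1}{p_i}\sum_{j=1}^{N_i} X_{i,j} - \frac{1}{1-p_i}\sum_{j=1}^{N_i}\left(n_i-X_{i,j}\right) = 0 \;\;\Longrightarrow\;\; \hat{p_i} = \frac{1}{n_i N_i}\sum_{j=1}^{N_i} X_{i,j}.$$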

Exercise 8.5.8 Standard Error mistake

Hi!

First of all thanks for this repo! It's been a big help for self-studying the book.
In question 8.5.8 I think se = r.std() should be replaced by se = r.sem() or math.sqrt(r.var() / len(r)). The std computes the standard deviation of the values themselves, whereas we want the standard error of the mean estimate, which is calculated using:
$$\hat{\mathrm{se}} = \frac{\hat{\sigma}}{\sqrt{n}}$$
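A quick illustration of the difference (a minimal sketch, assuming r is a pandas Series of the values, as it appears to be in the notebook):

```
import numpy as np
import pandas as pd

# stand-in data; in the notebook r comes from the exercise itself
r = pd.Series(np.random.default_rng(0).normal(size=100))

print(r.std())                    # standard deviation of the values themselves
print(r.sem())                    # standard error of the mean
print(np.sqrt(r.var() / len(r)))  # same quantity as r.sem()
```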

Minor issues with Exercise 9.6.6

There is an issue with the variance of the true sampling distribution, which should be $\mathbb{V}\left(\overline{X_n}\right)=1/n$ (not $1/n^2$). Thus, we will have $\overline{X_n}=\frac{1}{n} \sum X_i \sim \frac{1}{n} N(n \mu,n) = N(\mu, 1/n)$.

This change will produce a less concentrated distribution than the one in the currently presented solution (as the scale should be $1/ \sqrt{n}$).
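For completeness, the corrected variance follows directly from independence, assuming (as the exercise setup implies) that $\mathbb{V}(X_i)=1$:

$$\mathbb{V}\left(\overline{X_n}\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathbb{V}(X_i) = \frac{n}{n^2} = \frac{1}{n}.$$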

Exercise 10.13.6

  • On part b), the correct 95% CI would be $\Phi(\overline{X}) \pm z_{.025} \frac{\phi(\overline{X})}{\sqrt{n}}$ (the delta-method step is sketched after this list).
  • On part e), as correctly expressed here, the true value of $\psi$ is $1-F_X(0)$. What I think is the correct procedure for the rest of the exercise is the following: by the WLLN, $\overline{X}$ converges in probability to $\mu_X=\mathbb{E}(X)$ (the true mean of the RVs $X_i$). According to this, $\hat{\psi}$ tends in probability to $\Phi(\mu_X)$. Finally, in general we have that $1-F_X(0)\neq \Phi(\mu_X)$.
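The part b) interval above is just the delta method with $g(\mu)=\Phi(\mu)$; assuming the model has known unit variance, so that $\hat{\mathrm{se}}(\overline{X})=1/\sqrt{n}$:

$$\hat{\mathrm{se}}(\hat{\psi}) = \left|g'(\overline{X})\right| \hat{\mathrm{se}}(\overline{X}) = \frac{\phi(\overline{X})}{\sqrt{n}}.$$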

Minor corrections on Exercise 6.8.13

The current version of the solution uses $\mathbb{P}(Z_i>x/n)=\int_0^{x/n} f(u) \,{\rm d}u$, $F(0)=1$, and $F'(0)= -\lambda$. None of these statements is correct.

The correct statements are:

  • $\mathbb{P}(Z_i>x/n)=\int_0^{\infty} f(u) \,{\rm d}u - \int_0^{x/n} f(u) \,{\rm d}u= 1 - \int_0^{x/n} f(u) \,{\rm d}u = 1 - F(x/n)$.

  • $F(0) = \int_0^0 f(u) \rm{d}u = 0$.

  • $F'(0) = \lim_{x \to 0} f(x) = \lambda$.

All these changes lead us to the same expected result.
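To make that explicit, here is how the corrected statements feed into the limit, assuming (as I recall the exercise) that the quantity of interest is $n\min_i Z_i$ and the expected result is the $Exp(\lambda)$ limit:

$$\mathbb{P}\left(n\min_i Z_i > x\right) = \prod_{i=1}^{n}\mathbb{P}\left(Z_i > \frac{x}{n}\right) = \left(1-F\left(\frac{x}{n}\right)\right)^n \approx \left(1-\frac{\lambda x}{n}\right)^n \longrightarrow e^{-\lambda x},$$

using $F(x/n)\approx F(0)+F'(0)\,x/n = \lambda x /n$.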

Issues with Exercise 5.6.4 b) (Chapter 5: Inequalities)

As pointed out in the statement of the question, $\mathbb{P}(p \in C_n) \geq 1- \alpha = 0.95$ for $\alpha=0.05$. According to this, the coverage should always be above 0.95, which is not what we see in the simulation provided here.

Honestly, I do not understand the simulation provided, since I am not familiar with "tqdm" and maybe other things. If it helps, what I did was to generate 100 Bernoulli(p) random vectors of length N = 10,000, compute the 100 cumulative means (each of length N = 10,000), and average the coverage over the 100 repetitions. My caveat: I am not sure I really need to average over a given number of random vectors (which I arbitrarily decided to be 100).

Here is my code:

```
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import bernoulli

np.random.seed(1)

N = 10000
p = 0.4
alpha = 0.05
nn = np.arange(1, N + 1)

# 100 Bernoulli(p) random vectors of length N; the cumulative means estimate p after n observations
XX_Cum = [np.cumsum(bernoulli.rvs(p, size=N)) / nn for i in range(100)]

def epsilon(n, alpha):
    # Hoeffding half-width: sqrt(log(2 / alpha) / (2 n))
    return np.sqrt(1 / (2 * n) * np.log(2 / alpha))

# Counter[j][i - 1] is True when the interval around the j-th estimate covers p after i observations
Counter = np.array([[np.abs(XX_Cum[j][i - 1] - p) <= epsilon(i, alpha)
                     for i in range(1, N + 1)] for j in range(100)])

# empirical coverage at each sample size, averaged over the 100 repetitions
Coverage = [np.count_nonzero(Counter.transpose()[i]) / 100 for i in range(len(Counter.transpose()))]

plt.plot(nn, Coverage, label='Coverage')
plt.plot(nn, [1 - alpha for i in range(len(Counter.transpose()))], label="Hoeffding's lower bound")
plt.xlabel('N')
plt.ylabel(r'$P(p \in C_n)$')
plt.legend()
plt.show()
```

(figure: simulated coverage vs. N, compared with the 0.95 Hoeffding lower bound)

Mistake in risk estimation

Hello! Thank you for the great work! There seems to be a mistake in Chapter 21 - Nonparametric Curve Estimation, in the function j_hat_kde: as mentioned in the book and in your text, $K^{(2)}$ is $N(0,2)$, so the function should look like the version below. The docstring also contains a mistake (h is simply a bandwidth): it says the dataset is rescaled to [0, 1], but I did not find any evidence for such a transformation in the book, and the code of this function does not contain it either. Please could you double-check this:

```
from itertools import product

import numpy as np
from scipy.stats import norm

def j_hat_kde(X, h):
    r"""
    Calculate the approximated estimated KDE risk J_hat for a N(0, 1) Gaussian kernel

      \hat{J}(h) = \frac{1}{hn^2}\sum_{i, j} K^* \left( \frac{X_i - X_j}{h} \right) + \frac{2}{nh} K(0)

    where:
      n is the dataset size
      h is the bandwidth
      K^* is K^{(2)}(x) - 2 K(x), and K^{(2)} is the convolved kernel, K^{(2)}(z) = \int K(z - y) K(y) dy
      K is the original kernel
    """
    n = len(X)
    Kstar_args = np.array([X.iloc[i] - X.iloc[j] for i, j in product(range(n), range(n))]) / h
    # K^{(2)} is N(0, 2), i.e. variance 2; scipy's `scale` is a standard deviation, hence sqrt(2)
    sum_value = np.sum(norm.pdf(Kstar_args, loc=0, scale=np.sqrt(2)) - 2 * norm.pdf(Kstar_args, loc=0, scale=1))
    return sum_value / (h * n * n) + 2 * norm.pdf(0) / (n * h)
```
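For what it's worth, a minimal sketch of how this corrected function could be used to pick the bandwidth by minimizing the estimated risk over a grid; the data, the grid range, and the assumption that X is a pandas Series are all mine:

```
import numpy as np
import pandas as pd

# hypothetical data; in the notebook X comes from the chapter's dataset
X = pd.Series(np.random.default_rng(0).normal(size=200))

hs = np.linspace(0.05, 1.0, 40)        # candidate bandwidths (assumed range)
risks = [j_hat_kde(X, h) for h in hs]  # estimated risk for each candidate
h_best = hs[int(np.argmin(risks))]     # bandwidth with the smallest estimated risk
print(h_best)
```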

Proof 5.4

There is a typo: $Y-i$ should be $Y_i$.

Exercise 9.6.1

It looks to me like you are picking a random lsat and a random gpa independently and making them a pair for the bootstrap.
I think we want to pick a random pair directly instead.

e.g.
lsat = [1,2]
gpa = [3,4]

We want to pick randomly from (1,3) and (2,4).
So (1,4) would be impossible.
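A minimal sketch of what I mean, resampling row indices so that each (lsat, gpa) pair stays together (using the toy values above):

```
import numpy as np

rng = np.random.default_rng(0)
lsat = np.array([1, 2])
gpa = np.array([3, 4])

n = len(lsat)
idx = rng.integers(0, n, size=n)           # resample row indices with replacement
lsat_star, gpa_star = lsat[idx], gpa[idx]  # each resampled row keeps its original pair
print(list(zip(lsat_star, gpa_star)))      # only (1, 3) and (2, 4) can appear
```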

Exercise 4.7.13

"**Solution**. We have $X = C U_1 + (1 - C)U_2$, where $U \\sim \\text{Bernoulli}(1/2)$, $U_1 \\sim \\text{Uniform}(0, 1)$ and $U_2 \\sim \\text{Uniform}(0, 2)$ are all independent."

Is $U \sim \text{Bernoulli}(1/2)$ meant to be $C \sim \text{Bernoulli}(1/2)$?
Also, should $U_2$ be $U_2 \sim \text{Uniform}(3, 4)$?

Little issue with Exercise 6.8.8

The proposed solution is algebraically correct on the left-hand side of the inequality in the first probability, but $\sigma$ stands for the standard deviation of the Poisson($\lambda=1$), which is $\sqrt{\lambda} = 1$. This changes the rest of the solution as follows:

$\mathbb{P}\left(Y<90\right)=\mathbb{P}\left(\overline{X}_n<{90 \over n}\right)=\mathbb{P}\left(\sqrt{n}\frac{\left(\overline{X}_n-\mu\right)}{\sigma}<\sqrt{n}\frac{\left({90 \over n}-\mu\right)}{\sigma} \right) \approx \mathbb{P}\left( Z<\sqrt{100}\left({90 \over 100}-1\right) \right)= \Phi(-1)$.
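A quick numerical check of this, assuming (as the numbers above suggest) that $Y$ is the sum of $n=100$ IID Poisson(1) variables, so that $Y \sim Poisson(100)$:

```
from scipy.stats import norm, poisson

print(norm.cdf(-1))          # CLT approximation Phi(-1), about 0.1587
print(poisson.cdf(89, 100))  # exact P(Y < 90) = P(Y <= 89) for Y ~ Poisson(100)
```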

Issue with exercise 8.5.2

In the Cauchy runs used to test the nonparametric confidence band, the CDF that the code compares against is from the normal distribution, not the Cauchy distribution as it should be. This leads to many more instances of the actual CDF falling outside the confidence band.
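A minimal sketch of the kind of change I mean, with made-up variable names (I have not checked how the notebook actually structures this comparison):

```
import numpy as np
from scipy.stats import cauchy, norm

grid = np.linspace(-15, 15, 500)

# what (I believe) the current code compares the band against:
reference_cdf_wrong = norm.cdf(grid)

# what the Cauchy runs should compare the band against:
reference_cdf = cauchy.cdf(grid)
```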

Issues with Exercise 9.6.5

In sections a) and b) (and by extension in the rest of the exercise) there are some issues with the meaning of $\mathbb{E}\left(\overline{X_n^\star}\vert X_1, ... ,X_n\right )$ and $\mathbb{V}\left(\overline{X_n^\star} \vert X_1, ... ,X_n\right)$. As these expressions are conditioned on random variables, the outcomes should also be random variables, not numbers (as they are presented in this solution). My solution is the following:

$$\mathbb{E}\left(\overline{X_n^\star}\vert X_1, ... ,X_n\right ) = \frac{1}{n} \sum_i \mathbb{E}\left(X_i^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \sum_i \frac{1}{n}\sum_j X_j = \frac{1}{n} \sum_i \overline{X_n} = \overline{X_n} $$

$$\begin{eqnarray} \mathbb{V}\left(\overline{X_n^\star} \vert X_1, ... ,X_n\right) &=& \frac{1}{n^2} \sum_i \mathbb{V}\left(X_i^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \mathbb{V}\left(X_1^\star \vert X_1, ... ,X_n\right ) = \frac{1}{n} \left[ \mathbb{E}\left({X_1^\star}^2 \vert X_1, ... ,X_n\right ) - \mathbb{E}^2\left({X_1^\star} \vert X_1, ... ,X_n\right) \right] \\ &=& \frac{1}{n} \left[ \frac{1}{n} \sum_i X_i^2 - \overline{X_n}^2 \right]= \frac{1}{n^2} \sum_i \left(X_i - \overline{X_n} \right)^2 = \frac{1}{n} \hat{\sigma}^2 = \frac{n-1}{n^2} S_n^2 \end{eqnarray}$$

As I said above, these changes have some effect on the rest of the solution. However, the modifications I propose lead to the same final results in c) and d).
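A quick simulation check of the conditional variance formula above; the observed sample is arbitrary, and the check just resamples from it while holding it fixed:

```
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=50)  # the observed sample we condition on
n = len(X)

# bootstrap means, conditional on the observed X_1, ..., X_n
boot_means = np.array([rng.choice(X, n, replace=True).mean() for _ in range(100_000)])

print(boot_means.var())                # simulated V(Xbar* | X_1, ..., X_n)
print((n - 1) / n**2 * X.var(ddof=1))  # (n - 1) S_n^2 / n^2 from the derivation above
```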

Issues with Exercise 9.6.2

1- I don't think that the "fourth method" the exercise asks for is the parametric bootstrap, but rather the jackknife confidence interval that is explained in the appendix of the chapter.

2- The second part of the exercise asks for "the true coverage", i.e., the fraction of intervals that contain the true value of the statistic (in this case the skewness parameter). My way of solving this is the following (excuse me if my code is not very fluent):

```
import numpy as np
from scipy.stats import norm
from tqdm.notebook import tqdm

np.random.seed(1)

def skew(x):
    mean = np.mean(x)
    sd = np.std(x)
    return np.mean((x - mean)**3) / sd**3

B = 10000
def T_boot(x):
    t_boot = np.empty(B)
    n_x = len(x)
    for i in tqdm(range(B)):
        x_star = np.random.choice(x, n_x, replace=True)
        t_boot[i] = skew(x_star)
    return t_boot, np.std(t_boot)

# Creation of 1000 random vectors of n=50 datapoints
Y2 = norm.rvs(loc=0, scale=1, size=(1000, 50))
X2 = np.exp(Y2)

# Mapping each random vector with the bootstrap to produce the confidence intervals
t_bootV = list(map(T_boot, X2))

# Generating a list with the confidence intervals from every random vector
# Normal 95% confidence intervals
z_95 = norm.ppf(0.975)
NCI = np.array(list(map(lambda x, y: (skew(x) - z_95*y, skew(x) + z_95*y), X2, list(zip(*t_bootV))[1])))

# Percentile 95% confidence intervals
PerCI = np.array(list(map(lambda x: (np.quantile(x, 0.025), np.quantile(x, 0.975)), list(zip(*t_bootV))[0])))

# Pivotal 95% confidence intervals
PivCI = np.array(list(map(lambda x, y: (2*skew(x) - np.quantile(y, 0.975), 2*skew(x) - np.quantile(y, 0.025)), X2, list(zip(*t_bootV))[0])))
```

According to theory:

  • $E(X)=E(e^{Y})= \int_{-\infty}^\infty e^{y} {\rm{d}}F_Y(y)=e^{1/2}$.
  • $\sigma_X^2=V(X)=E[X-E(X)]^2=E[e^{Y}-E(e^Y)]^2= \int_{-\infty}^\infty \left(e^{y} - e^{1/2}\right)^2 {\rm{d}}F_Y(y)=e(e-1)$.
  • $E[X-E(X)]^3=E[e^{Y}-E(e^Y)]^3 = \int_{-\infty}^\infty \left(e^{y} - e^{1/2}\right)^3 {\rm{d}}F_Y(y) = e^{3/2} (2-3 e +e^3)$.

So the skewness parameter of $X=e^Y$ is $\theta=T(F)=\int (x-\mu)^{3} {\rm d}F(x) /\sigma_X^3=\frac{E[X-E(X)]^3}{\left(E[X-E(X)]^2\right)^{3/2}}=\frac{e^{3/2} (2-3 e +e^3)}{e^{3/2}(e-1)^{3/2}}= \frac{(2-3 e +e^3)}{(e-1)^{3/2}} \approx 6.18$.
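For reference, the numerator factors, which gives the familiar closed form for the lognormal skewness:

$$2-3e+e^3 = (e-1)^2(e+2) \quad\Longrightarrow\quad \theta = \frac{(e-1)^2(e+2)}{(e-1)^{3/2}} = (e+2)\sqrt{e-1}\approx 6.18.$$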
This will be the value we expect to fall within our confidence intervals (not only for each method, but also for each simulated random vector). The "true coverage" estimation continues as follows:

```
# The true parameter:
skewness_f = (2 - 3*np.exp(1) + np.exp(1)**3) / (np.exp(1) - 1)**(3/2)

# Comparison with the Normal 95% Confidence Intervals
cov_NCI = np.count_nonzero((NCI.T[1] >= skewness_f) & (skewness_f >= NCI.T[0])) / len(NCI.T[0])

# Comparison with the Percentile 95% Confidence Intervals
cov_PerCI = np.count_nonzero((PerCI.T[1] >= skewness_f) & (skewness_f >= PerCI.T[0])) / len(PerCI.T[0])

# Comparison with the Pivotal 95% Confidence Intervals
cov_PivCI = np.count_nonzero((PivCI.T[1] >= skewness_f) & (skewness_f >= PivCI.T[0])) / len(PivCI.T[0])

print('The true coverage for the simulated Normal confidence intervals is: %.3f' % cov_NCI)
print('The true coverage for the simulated Percentile confidence intervals is: %.3f' % cov_PerCI)
print('The true coverage for the simulated Pivotal confidence intervals is: %.3f' % cov_PivCI)
```

The output of the simulation:
(simulation output for n = 50)
This means that our 50 data points are not enough for our bootstrap 95% confidence intervals to reach the expected coverage...

With $n=1000$ datapoints the coverage gets better:
(simulation output for n = 1000)

With $n=100000$ datapoints the coverage gets closer to the expected 95%.
(simulation output for n = 100000)

Assuming that my procedure is correct, I only need more data points (which is more computationally expensive, which is why I do not continue reporting results)...

I would appreciate some feedback to compare against.

Important issues with Exercise 12.11.2

As correctly shown at the beginning of part b), $\mu \lvert X^n \sim Normal\left(\overline{X} , \frac{\sigma^2}{n}\right)$, with $\sigma=1$. In part d), what is expected is the posterior of $\theta= e^{\mu}$, i.e., we want $\theta \lvert X^n$. It is important to notice that in the posterior analysis we treat the parameters as random variables and the data as constants, so the solution presented in part d) is completely incorrect. The correct procedure is the following:

  • $\theta$'s CDF:

$$\begin{eqnarray} F_{\Theta}\left(\theta \vert X^n \right) &=& \mathbb{P}\left(\Theta \leq \theta \vert X^n \right) = \mathbb{P}\left(e^{\mu} \leq \theta \lvert X^n \right) = \mathbb{P}\left(\mu \leq \log(\theta) \vert X^n \right) \\ &=& \mathbb{P}\left(\left(\mu-\overline{X}\right) \frac{\sqrt{n}}{\sigma} \leq \left(\log(\theta)-\overline{X}\right) \frac{\sqrt{n}}{\sigma} \bigg\vert X^n \right) \\ &\approx& \mathbb{P}\left( Z \leq \left(\log(\theta)-\overline{X}\right) \frac{\sqrt{n}}{\sigma} \bigg\vert X^n \right)= \Phi\left[\left(\log(\theta)-\overline{X}\right) \frac{\sqrt{n}}{\sigma}\right] \end{eqnarray}$$

  • $\theta$'s PDF comes from taking the derivative with respect to $\theta$:
    $f_\Theta(\theta \vert X^n)= \phi\left[\left(\log(\theta)-\overline{X}\right) \frac{\sqrt{n}}{\sigma}\right] \frac{\sqrt{n}}{\sigma} \frac{1}{\theta}$

  • Notice that this solution depends on the sample size, as is expected for a posterior; this applies both to $\mu \lvert X^n$ and $\theta\lvert X^n$ (a quick simulation check is sketched below).

  • I also suggest respecting the symbol definitions in the preface in order to avoid confusion.
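As mentioned above, a quick Monte Carlo sanity check of the derived posterior density; $\sigma=1$ is given, but the values of $n$ and $\overline{X}$ here are made up purely for the check:

```
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
n, sigma, Xbar = 100, 1.0, 0.3  # sigma is given; n and Xbar are illustrative only

# sample from the posterior mu | X^n ~ N(Xbar, sigma^2 / n), then transform to theta = exp(mu)
mu_post = rng.normal(Xbar, sigma / np.sqrt(n), size=200_000)
theta_post = np.exp(mu_post)

# the density f_Theta(theta | X^n) derived above
theta = np.linspace(theta_post.min(), theta_post.max(), 400)
f_theta = norm.pdf((np.log(theta) - Xbar) * np.sqrt(n) / sigma) * np.sqrt(n) / (sigma * theta)

plt.hist(theta_post, bins=100, density=True, alpha=0.4, label=r'simulated posterior of $\theta$')
plt.plot(theta, f_theta, label=r'derived $f_\Theta(\theta \mid X^n)$')
plt.legend()
plt.show()
```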

Small Correction

Exercise 2.10.1.

$\lim_{n\to\infty}\mathbb{P}(A_n)=1-\mathbb{P}(A)$ (3)

RHS should be $\mathbb{P}(A)$

14.9.5

$\beta_1$ and $\beta_2$ are not independent; the covariance needs to be included.
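For whatever linear combination the exercise needs, the general variance formula with the covariance term is (writing $\hat\beta_1$, $\hat\beta_2$ for the estimates):

$$\mathbb{V}\left(a\hat\beta_1 + b\hat\beta_2\right) = a^2\,\mathbb{V}(\hat\beta_1) + b^2\,\mathbb{V}(\hat\beta_2) + 2ab\,\mathrm{Cov}(\hat\beta_1,\hat\beta_2).$$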

Issues in Chapter03, solution to exercise 3.14.18 (e)

https://github.com/telmo-correa/all-of-statistics/blob/master/Chapter%2003%20-%20Random%20Variables.ipynb
Hi, there might be an issue with Chapter 03, in the solution to exercise 3.14.18 (e).
In equation (13), it shows $1-\mathbb{P}(|X|<c)=1-\mathbb{P}(-c<X<-c)$; it should be $1-\mathbb{P}(-c<X<c)$.
Then in equation (14), it should be $1-\mathbb{P}\left(\frac{-c-3}{4}<Z<\frac{c-3}{4}\right)$, because the standardized interval is not symmetric.
So finally I got $\mathbb{P}(|X|>|x|)=\Phi\left(\frac{x+3}{4}\right)-\Phi\left(\frac{x-3}{4}\right)=0.005$, and by brute force I found that the solution for $|x|$ is 9.5464.
But honestly I don't know if I am right.
