Statistics: Sampling Distribution Theory

Statistical analysis requires us to obtain a proper sample from a population of interest whose characteristics can be measured. The sampling distribution of a statistic is the probability distribution of that statistic over all possible samples of the same size drawn from the population.

Let $N$ denote the population size and $n$ denote the sample size. When $N$ is large, it is impractical to list all possible samples. Thus, we need to infer the properties of the population from a sample.
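For a very small population the list of all samples can still be written out, which makes the definition of a sampling distribution concrete. The following is a minimal sketch: the population values and the sample size are made up for illustration, and samples are drawn without replacement with order ignored.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population (N = 5); the values are made up for illustration.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Enumerate every sample of size n drawn without replacement (order ignored).
samples = list(combinations(population, n))

# Sampling distribution of the sample mean: each sample is equally likely.
counts = Counter(Fraction(sum(s), n) for s in samples)
sampling_dist = {
    mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())
}

for mean, prob in sampling_dist.items():
    print(f"mean = {float(mean):4.1f}  probability = {prob}")
```

With $N=5$ and $n=2$ there are only 10 samples, but the count grows combinatorially with $N$, which is why enumeration stops being an option for realistic populations.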

The Law of Large Numbers

If $x_i, i=1,2,\dots,n$ are independent and identically distributed (i.i.d.) with mean $\mu$, then the sample mean $\bar x=\frac{1}{n}\sum_{i=1}^n x_i$ approaches $\mu$ as $n \to \infty$, regardless of the distribution of $x_i$.

Here “approaches” means that $\bar x$ is a consistent estimator of $\mu$, which states that: TBC…
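The statement of the LLN can be illustrated numerically. The sketch below assumes fair six-sided die rolls (so $\mu=3.5$); the helper name `mean_error` and the seed are arbitrary choices for this example.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def mean_error(n):
    """Absolute gap between the mean of n fair-die rolls and mu = 3.5."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return abs(sum(rolls) / n - 3.5)

# The gap tends to shrink as the sample size grows.
for n in (10, 1_000, 100_000):
    print(n, mean_error(n))
```

The gap is random for each run, so it need not decrease monotonically, but for large $n$ it is small with high probability.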

The Central Limit Theorem

If $x_i, i=1,2,\dots,n$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, then the distribution of

$$z=\frac{\bar x-\mu}{\sigma/\sqrt{n}}$$

approaches that of $\mathcal{N}(0,1)$ as $n\to\infty$.
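A quick simulation can make the CLT tangible. This is a sketch under assumptions not in the text: the population is taken to be Exponential with rate 1 (clearly non-normal, with $\mu=\sigma=1$), and $n=50$ with 5,000 replications are arbitrary choices.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible

def standardized_mean(n, rate=1.0):
    """Draw n Exponential(rate) values and return z = (xbar - mu) / (sigma / sqrt(n))."""
    mu = sigma = 1.0 / rate  # Exponential(rate) has mean and std both equal to 1/rate
    xbar = sum(random.expovariate(rate) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

z_values = [standardized_mean(n=50) for _ in range(5_000)]

# For N(0,1), about 95% of the mass lies in [-1.96, 1.96].
coverage = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(f"share of |z| <= 1.96: {coverage:.3f}")
```

If the CLT approximation is good at this $n$, the printed share should be close to 0.95 even though the underlying population is heavily skewed.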

Sampling Distributions of Sample Means

Mean of Sample Means

For random sampling with replacement, we have

$$
\begin{aligned} \mathbb{E}[\bar{x}] &= \mathbb{E}\left[\frac{x_1 + \dots + x_n}{n}\right] \\ &= \frac{\mathbb{E}[x_1] + \dots + \mathbb{E}[x_n]}{n} = \mu \end{aligned}
$$

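Here each $x_i$ has the same distribution as the population random variable $X$, so $\mathbb{E}[x_i]=\mu$.

For random sampling without replacement, we also have $\mathbb{E}[\bar{x}]=\mu$, because each $x_i$ is still marginally distributed as $X$ and linearity of expectation does not require independence.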

Variance of Sample Means

For sampling with replacement, we have $Var(\bar{x})=\frac{\sigma^2}{n}$;
for sampling without replacement, we have $Var(\bar{x})=\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$.

Here, the factor $\frac{N-n}{N-1}$ is called the finite population correction factor.
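The mean and variance formulas for $\bar x$ can be checked exactly on a toy population by enumerating all ordered samples. This is an illustrative sketch: the population values are hypothetical, $\sigma^2$ is the population variance with divisor $N$, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations, product

population = [1, 3, 4, 8]  # hypothetical toy population
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance (divisor N)

def mean_and_var_of_xbar(samples):
    """Exact mean and variance of x-bar over equally likely ordered samples."""
    means = [Fraction(sum(s), n) for s in samples]
    m = sum(means) / len(means)
    v = sum((xb - m) ** 2 for xb in means) / len(means)
    return m, v

# product(...) lists ordered samples with replacement,
# permutations(...) lists ordered samples without replacement.
with_repl = mean_and_var_of_xbar(list(product(population, repeat=n)))
without_repl = mean_and_var_of_xbar(list(permutations(population, n)))

fpc = Fraction(N - n, N - 1)  # finite population correction factor
print("with replacement:   ", with_repl, "theory:", (mu, sigma2 / n))
print("without replacement:", without_repl, "theory:", (mu, sigma2 / n * fpc))
```

In both schemes the enumerated mean equals $\mu$, and the enumerated variances match $\frac{\sigma^2}{n}$ and $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ respectively.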

We denote $\sigma_{\bar{x}}=\sqrt{Var(\bar{x})}$, the standard error of the sample mean. If the population follows a normal distribution and the $x_i$ are drawn independently, then $\bar{x}$ follows the normal distribution $\mathcal{N}(\mu, \frac{\sigma^2}{n})$ exactly, which is equivalent to

$$z=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0,1)$$

Sampling Distribution of Sample Proportion

Let $x_1,\dots,x_n$ be i.i.d. Bernoulli($p$) indicators, so that $X:=\sum_{i=1}^n x_i\sim \text{Binomial}(n,p)$. The sample proportion is $\hat p=\frac{X}{n}$. Then we have

$$
\begin{aligned} \mathbb{E}[\hat p] &= p \\ \sigma_{\hat p} &= \sqrt{\frac{p(1-p)}{n}} \end{aligned}
$$

As $n\to\infty$, $\hat p$ approaches $p$ by the LLN, and by the CLT the distribution of

$$z=\frac{\hat p-p}{\sigma_{\hat p}}$$

approaches that of $\mathcal{N}(0,1)$.
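The moments of $\hat p$ and the quality of the normal approximation can be checked directly from the binomial probabilities. The values $n=200$ and $p=0.3$ below are hypothetical, and the helper names are made up for this sketch.

```python
import math

def binomial_pmf(n, p):
    """Probabilities P(X = k) for k = 0..n with X ~ Binomial(n, p)."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def proportion_moments(n, p):
    """Exact mean and standard deviation of p-hat = X / n."""
    pmf = binomial_pmf(n, p)
    mean = sum(k / n * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, math.sqrt(var)

def prob_within(n, p, z=1.96):
    """Exact P(|p-hat - p| <= z * sigma_p-hat) under the binomial model."""
    sd = math.sqrt(p * (1 - p) / n)
    pmf = binomial_pmf(n, p)
    return sum(w for k, w in enumerate(pmf) if abs(k / n - p) <= z * sd)

n, p = 200, 0.3  # hypothetical values
mean, sd = proportion_moments(n, p)
print(mean, sd, math.sqrt(p * (1 - p) / n))
print(prob_within(n, p))  # compare with roughly 0.95 for N(0,1)
```

Because $\hat p$ is discrete, the exact probability only approximates the normal value; the agreement generally improves as $n$ grows and when $p$ is not too close to 0 or 1.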

Sampling Distribution of Sample Variance

Recall that $s^2=\frac{\sum_i (x_i-\bar x)^2}{n-1}$ is a natural estimator of $\sigma^2$, and that it is unbiased: $\mathbb{E}[s^2]=\sigma^2$. If, in addition, the population r.v. $X$ is normally distributed, then $Var(s^2)=\frac{2\sigma^4}{n-1}$ and $\chi^2=\frac{(n-1)s^2}{\sigma^2}\sim \chi_{n-1}^2$.

What is the $\chi^2$ distribution?

If $Z_1,\dots,Z_v$ are i.i.d. with $Z_i\sim\mathcal{N}(0,1)$, then $X=\sum_{i=1}^v Z_i^2$ follows the $\chi_v^2$ distribution with $v$ degrees of freedom.

If $\chi^2$ follows the $\chi^2_v$ distribution, then $\mathbb{E}[\chi^2]=v$ and $Var(\chi^2)=2v$.

Thus, with this knowledge of the chi-square distribution, for a normal population we know $\mathbb{E}[\frac{(n-1)s^2}{\sigma^2}]=n-1$, and hence $\mathbb{E}[s^2]=\sigma^2$.
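The two chi-square moments used here can be checked with a short derivation. It is a sketch that relies on the independence of the $Z_i$ and on the standard fact $\mathbb{E}[Z^4]=3$ for $Z\sim\mathcal{N}(0,1)$, which is an assumption not stated in the text:

$$
\begin{aligned}
\mathbb{E}[Z_i^2] &= Var(Z_i) + (\mathbb{E}[Z_i])^2 = 1 \\
Var(Z_i^2) &= \mathbb{E}[Z_i^4] - (\mathbb{E}[Z_i^2])^2 = 3 - 1 = 2 \\
\mathbb{E}[\chi^2] &= \sum_{i=1}^v \mathbb{E}[Z_i^2] = v \\
Var(\chi^2) &= \sum_{i=1}^v Var(Z_i^2) = 2v
\end{aligned}
$$

Applying $Var(\chi^2)=2v$ with $v=n-1$ to $\frac{(n-1)s^2}{\sigma^2}$ gives $\frac{(n-1)^2}{\sigma^4}Var(s^2)=2(n-1)$, that is, $Var(s^2)=\frac{2\sigma^4}{n-1}$ for a normal population.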

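The conclusion holds even if $x_i$ does not follow a normal distribution. Note that $\sum_i (x_i-\bar x)^2=\sum_i (x_i-\mu)^2 - n(\bar x-\mu)^2$. Since $\mathbb{E}[(x_i-\mu)^2]=\sigma^2$ and $\mathbb{E}[(\bar x-\mu)^2]=Var(\bar x)=\frac{\sigma^2}{n}$ for i.i.d. sampling, we get $\mathbb{E}[\sum_i (x_i-\bar x)^2]=n\sigma^2-\sigma^2=(n-1)\sigma^2$, and thus $\mathbb{E}[s^2]=\sigma^2$.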
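Both the decomposition and the unbiasedness claim can be verified exactly on a small non-normal population by listing every ordered sample drawn with replacement. The population below is hypothetical, and the function names are made up for this sketch.

```python
from fractions import Fraction
from itertools import product

population = [0, 0, 1, 5]  # hypothetical, clearly non-normal
N, n = len(population), 3

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sum_sq_about_xbar(sample):
    """Left-hand side: sum of (x_i - xbar)^2."""
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample)

def decomposition(sample):
    """Right-hand side: sum of (x_i - mu)^2 minus n * (xbar - mu)^2."""
    xbar = Fraction(sum(sample), n)
    return sum((x - mu) ** 2 for x in sample) - n * (xbar - mu) ** 2

samples = list(product(population, repeat=n))  # all N**n ordered samples

# The identity holds sample by sample, not just on average.
identity_holds = all(sum_sq_about_xbar(s) == decomposition(s) for s in samples)

# Averaging s^2 over all equally likely samples gives E[s^2].
expected_s2 = sum(sum_sq_about_xbar(s) for s in samples) / len(samples) / (n - 1)
print(identity_holds, expected_s2, sigma2)
```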

So why is it a $\chi_{n-1}^2$ distribution? Why is one degree of freedom lost?

The reason is that the $n$ deviations $x_i-\bar x$ have only $n-1$ degrees of freedom: they always sum to zero, so if we know $x_i-\bar x$ for $i=1,2,\dots,n-1$, we immediately know $x_n-\bar x$.

The without-replacement case. Write $S^2:=\frac{N}{N-1}\sigma^2$. In random sampling without replacement, $\mathbb{E}[s^2]=S^2=\frac{N}{N-1}\sigma^2$. Since $Var(\bar x)=\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}=\frac{S^2}{n}\cdot\frac{N-n}{N}$, an unbiased estimator of $Var(\bar x)$ is $\frac{s^2}{n}\cdot\frac{N-n}{N}$.
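The without-replacement claims can also be checked by exact enumeration. In this sketch the population is hypothetical, $\sigma^2$ uses divisor $N$, and `S2` denotes $\frac{N}{N-1}\sigma^2$ (the population variance with divisor $N-1$), which is how $S^2$ is interpreted here.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 3, 4, 8, 9]  # hypothetical toy population
N, n = len(population), 3

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # divisor N
S2 = sigma2 * Fraction(N, N - 1)                     # divisor N - 1

def xbar_and_s2(sample):
    """Sample mean and sample variance (divisor n - 1) as exact fractions."""
    xbar = Fraction(sum(sample), n)
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return xbar, s2

# Every unordered sample of size n is equally likely without replacement.
stats = [xbar_and_s2(s) for s in combinations(population, n)]
count = len(stats)

expected_s2 = sum(s2 for _, s2 in stats) / count
var_xbar = sum((xbar - mu) ** 2 for xbar, _ in stats) / count
expected_estimator = sum(s2 / n * Fraction(N - n, N) for _, s2 in stats) / count

print(expected_s2, S2)               # E[s^2] versus N/(N-1) * sigma^2
print(expected_estimator, var_xbar)  # E[estimator] versus Var(xbar)
```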

In random sampling with replacement where $X$ is not normally distributed, $Var(s^2)=\frac{\mu_4}{n}-\frac{n-3}{n(n-1)}\sigma^4$, where $\mu_4=\mathbb{E}[(X-\mu)^4]$ is the fourth central moment of $X$.
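The general variance formula for $s^2$ can be compared against an exact enumeration. This is a sketch with a hypothetical skewed population and $n=4$ (chosen so that the $\frac{n-3}{n(n-1)}$ term is nonzero); `mu4` is the fourth central moment of the population.

```python
from fractions import Fraction
from itertools import product

population = [0, 0, 1, 5]  # hypothetical, skewed population
N, n = len(population), 4

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance
mu4 = sum((x - mu) ** 4 for x in population) / N     # fourth central moment

def sample_variance(sample):
    """Sample variance with divisor n - 1, as an exact fraction."""
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)

# All N**n ordered samples with replacement are equally likely.
s2_values = [sample_variance(s) for s in product(population, repeat=n)]
mean_s2 = sum(s2_values) / len(s2_values)
var_s2 = sum((v - mean_s2) ** 2 for v in s2_values) / len(s2_values)

formula = mu4 / n - Fraction(n - 3, n * (n - 1)) * sigma2 ** 2
print(var_s2, formula)
```

For a normal population $\mu_4=3\sigma^4$, and substituting this into the formula recovers $Var(s^2)=\frac{2\sigma^4}{n-1}$.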