Statistical analysis requires us to obtain a proper sample from a population of interest whose characteristics can be measured. The sampling distribution of a statistic is the probability distribution of that statistic over all possible samples of the same size drawn from the population.

Let $N$ denote the population size and $n$ denote the sample size. When $N$ is large, it is impractical to list all possible samples. Thus, we need to analyze the population through a sample.

The Law of Large Numbers

If $x_i, i=1,2,\dots,n$ are independent and identically distributed (i.i.d.) with mean $\mu$, then $\bar x$ approaches $\mu$ as $n \to \infty$, regardless of the distribution of $x_i$.

Here “approaches” means that $\bar x$ is consistent for $\mu$, which states that: TBC…
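As a numerical illustration of the Law of Large Numbers, the following minimal sketch tracks the sample mean of i.i.d. exponential draws as $n$ grows. The distribution, the seed, the checkpoints, and the helper name `running_means` are all assumptions made for this example, not part of the statement above.

```python
import random

def running_means(draw, n_max, checkpoints):
    """Return {n: mean of the first n draws} for each n in checkpoints."""
    total = 0.0
    out = {}
    for n in range(1, n_max + 1):
        total += draw()
        if n in checkpoints:
            out[n] = total / n
    return out

rng = random.Random(0)   # fixed seed, chosen arbitrarily
mu = 2.0                 # assumed population mean for this example

def draw():
    # expovariate takes the rate, so rate 1/mu gives mean mu
    return rng.expovariate(1 / mu)

means = running_means(draw, 100_000, {10, 1_000, 100_000})
for n, m in sorted(means.items()):
    print(n, round(m, 4), "abs error:", round(abs(m - mu), 4))
```

The absolute error should typically shrink as $n$ increases, although any single run can fluctuate.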

The Central Limit Theorem

If $x_i, i=1,2,\dots,n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then the distribution of

$$z=\frac{\bar x-\mu}{\sigma/\sqrt{n}}$$

approaches that of $\mathcal{N}(0,1)$ as $n\to\infty$.
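The Central Limit Theorem can be checked roughly by simulation. The sketch below standardizes sample means of a skewed (exponential) population and measures how often $|z| \le 1.96$, which would be about 95% under $\mathcal{N}(0,1)$. The choice of distribution, `n`, `reps`, and the seed are assumptions for illustration only.

```python
import math
import random

def standardized_mean(rng, n, mu, sigma):
    """Draw n i.i.d. exponential values with mean mu and return
    z = (xbar - mu) / (sigma / sqrt(n))."""
    xbar = sum(rng.expovariate(1 / mu) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

rng = random.Random(1)   # arbitrary fixed seed
mu = sigma = 1.0         # an exponential with mean 1 also has standard deviation 1
n, reps = 200, 5_000

zs = [standardized_mean(rng, n, mu, sigma) for _ in range(reps)]
coverage = sum(abs(z) <= 1.96 for z in zs) / reps
print("fraction with |z| <= 1.96:", coverage)
```

With a small `n` the skewness of the exponential distribution is still visible in $z$, so the fraction moves toward 0.95 only as `n` grows.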

Sampling Distributions of Sample Means

Mean of Sample Means

For random sampling with replacement, we have

$$\begin{aligned} \mathbb{E}[\bar{x}] &= \mathbb{E}\left[\frac{x_1 + \dots + x_n}{n}\right] \\ &= \frac{\mathbb{E}[x_1] + \dots + \mathbb{E}[x_n]}{n} = \mu \end{aligned}$$

Here each $x_i$ has the same distribution as $X$.

For random sampling without replacement, we also have $\mathbb{E}[\bar{x}]=\mu$.

Variance of Sample Means

  • With replacement, we have $\mathrm{Var}(\bar{x})=\frac{\sigma^2}{n}$;
  • Without replacement, we have $\mathrm{Var}(\bar{x})=\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$.

Here, the factor $\frac{N-n}{N-1}$ is called the finite population correction factor.
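Both the mean and the variance formulas can be verified exactly on a tiny population by enumerating every possible sample. The following sketch uses a hypothetical population of five values and $n=2$; it relies on exact rational arithmetic, so the comparisons are equalities rather than approximations. Here $\sigma^2$ is the population variance with divisor $N$.

```python
from fractions import Fraction
from itertools import combinations, product

def mean_and_var(values):
    """Exact mean and variance of a list of equally likely values."""
    m = sum(values, Fraction(0)) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

population = [Fraction(x) for x in (1, 2, 4, 7, 11)]   # hypothetical, N = 5
N, n = len(population), 2
mu, sigma2 = mean_and_var(population)

# With replacement: all N**n ordered samples are equally likely.
with_repl = [sum(s) / n for s in product(population, repeat=n)]
# Without replacement: all C(N, n) unordered samples are equally likely.
without_repl = [sum(s) / n for s in combinations(population, n)]

m_with, v_with = mean_and_var(with_repl)
m_without, v_without = mean_and_var(without_repl)

print(m_with == mu, v_with == sigma2 / n)
print(m_without == mu, v_without == sigma2 / n * Fraction(N - n, N - 1))
```

Enumeration is only feasible for very small $N$ and $n$, which is why the closed-form results matter in practice.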

We denote $\sigma_{\bar{x}}=\sqrt{\mathrm{Var}(\bar{x})}$. If the population follows the normal distribution, then $\bar{x}$ follows a normal distribution $\mathcal{N}(\mu, \frac{\sigma^2}{n})$, which is equivalent to

$$z=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0,1)$$


Sampling Distribution of Sample Proportion

Let $X:=\sum_{i=1}^n x_i\sim \text{Binomial}(n,p)$, where the $x_i$ are i.i.d. Bernoulli($p$) indicators. The sample proportion is $\hat p=\frac{X}{n}$. Then we have

$$\begin{aligned} \mathbb{E}[\hat p] &= p \\ \sigma_{\hat p} &= \sqrt{\frac{p(1-p)}{n}} \end{aligned}$$

As $n\to\infty$, $\hat p$ approaches $p$ by the LLN, and by the CLT the standardized proportion is approximately standard normal:

$$z=\frac{\hat p-p}{\sigma_{\hat p}} \;\dot\sim\; \mathcal{N}(0,1)$$
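The mean and standard deviation of $\hat p$ can be confirmed directly from the binomial probability mass function. This is a small sketch with assumed values $n=20$ and $p=3/10$; it uses exact fractions, so the moments match the formulas exactly.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Exact pmf of Binomial(n, p), indexed by k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p = 20, Fraction(3, 10)   # hypothetical sample size and success probability
pmf = binomial_pmf(n, p)

# Moments of p_hat = X / n computed directly from the pmf.
mean_p_hat = sum(Fraction(k, n) * w for k, w in enumerate(pmf))
var_p_hat = sum((Fraction(k, n) - mean_p_hat) ** 2 * w for k, w in enumerate(pmf))

print(mean_p_hat == p)
print(var_p_hat == p * (1 - p) / n)   # variance, i.e. the square of sigma_p_hat
```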


Sampling Distribution of Sample Variance

Recall that $s^2=\frac{\sum_i (x_i-\bar x)^2}{n-1}$ is a natural estimator of $\sigma^2$, and that it is unbiased: $\mathbb{E}[s^2]=\sigma^2$. If the population r.v. $X$ is normally distributed, then $\mathrm{Var}(s^2)=\frac{2\sigma^4}{n-1}$ and $\chi^2=\frac{(n-1)s^2}{\sigma^2}\sim \chi_{n-1}^2$.

What is χ2\chi^2 distribution

If $Z_1,\dots,Z_v$ are i.i.d. with $Z_i\sim\mathcal{N}(0,1)$, then $X=\sum_{i=1}^v Z_i^2$ follows the $\chi_v^2$ distribution with $v$ degrees of freedom.

Suppose $\chi^2$ follows the $\chi^2_v$ distribution. Then $\mathbb{E}[\chi^2]=v$ and $\mathrm{Var}(\chi^2)=2v$.
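The definition suggests a direct simulation: sum $v$ squared standard normal draws and compare the empirical mean and variance with $v$ and $2v$. The values of `v`, `reps`, and the seed below are arbitrary choices for this sketch.

```python
import random

def chi_square_draw(rng, v):
    """One draw of X = Z_1^2 + ... + Z_v^2 with Z_i i.i.d. N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(v))

rng = random.Random(2)   # arbitrary fixed seed
v, reps = 5, 50_000
draws = [chi_square_draw(rng, v) for _ in range(reps)]

sample_mean = sum(draws) / reps
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (reps - 1)

print("sample mean:", sample_mean, "theory:", v)
print("sample variance:", sample_var, "theory:", 2 * v)
```

The agreement is only approximate because it is a Monte Carlo estimate.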

Thus, using the mean of the chi-square distribution, we know $\mathbb{E}\left[\frac{(n-1)s^2}{\sigma^2}\right]=n-1$, and hence $\mathbb{E}[s^2]=\sigma^2$.

The conclusion $\mathbb{E}[s^2]=\sigma^2$ holds even if $x_i$ does not follow a normal distribution. Note that $\sum_i (x_i-\bar x)^2=\sum_i (x_i-\mu)^2 - n(\bar x-\mu)^2$. Since $\mathbb{E}[\sum_i (x_i-\mu)^2]=n\sigma^2$ and $\mathbb{E}[n(\bar x-\mu)^2]=n\,\mathrm{Var}(\bar x)=\sigma^2$ for i.i.d. $x_i$, we get $\mathbb{E}[\sum_i (x_i-\bar x)^2]=(n-1)\sigma^2$, and thus $\mathbb{E}[s^2]=\sigma^2$.
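To see the unbiasedness of $s^2$ without any normality assumption, one can enumerate every with-replacement sample from a small non-normal population and average $s^2$ exactly. The population values and $n=3$ below are hypothetical choices for this sketch.

```python
from fractions import Fraction
from itertools import product

def sample_variance(sample):
    """Sample variance s^2 with divisor n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)

population = [Fraction(x) for x in (0, 1, 5)]   # hypothetical, clearly non-normal
N, n = len(population), 3
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance, divisor N

# Each ordered sample of size n drawn with replacement has probability 1 / N**n.
samples = list(product(population, repeat=n))
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)

print(expected_s2 == sigma2)
```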

So why is it a $\chi_{n-1}^2$ distribution? Why is there a loss of one degree of freedom?

The reason is that the $n$ deviations $x_i-\bar x$ have only $n-1$ degrees of freedom: they always sum to zero, so if we know $x_i-\bar x$ for $i=1,2,\dots,n-1$, we immediately know $x_n-\bar x$.

Without replacement cases. In random sampling without replacement, $\mathbb{E}[s^2]=\frac{N}{N-1}\sigma^2$. Writing $S^2:=\frac{N}{N-1}\sigma^2$, we have $\mathrm{Var}(\bar x)=\frac{S^2}{n}\cdot\frac{N-n}{N}$, and an unbiased estimator of it is $\frac{s^2}{n}\cdot\frac{N-n}{N}$.
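These without-replacement results can also be checked by exhaustive enumeration. The sketch below assumes a hypothetical population of six values and $n=3$, and computes everything in exact fractions. The name `S2` stands for $\frac{N}{N-1}\sigma^2$, the population variance with divisor $N-1$.

```python
from fractions import Fraction
from itertools import combinations

population = [Fraction(x) for x in (2, 3, 5, 8, 13, 21)]   # hypothetical, N = 6
N, n = len(population), 3
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N   # divisor N
S2 = Fraction(N, N - 1) * sigma2                      # divisor N - 1

def s2(sample):
    """Sample variance with divisor n - 1."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / (len(sample) - 1)

samples = list(combinations(population, n))           # equally likely samples
mean_s2 = sum(s2(s) for s in samples) / len(samples)
var_xbar = sum((sum(s) / n - mu) ** 2 for s in samples) / len(samples)
mean_estimator = sum(s2(s) / n * Fraction(N - n, N) for s in samples) / len(samples)

print(mean_s2 == S2)
print(var_xbar == S2 / n * Fraction(N - n, N))
print(mean_estimator == var_xbar)
```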

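In random sampling with replacement where $X$ is not normally distributed, $\mathrm{Var}(s^2)=\frac{\mu_4}{n}-\frac{n-3}{n(n-1)}\sigma^4$, where $\mu_4=\mathbb{E}[(X-\mu)^4]$ is the fourth central moment. For a normal population $\mu_4=3\sigma^4$, which recovers $\mathrm{Var}(s^2)=\frac{2\sigma^4}{n-1}$.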
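The general variance formula for $s^2$ can be checked exactly on a small discrete population. The following sketch assumes a hypothetical three-value population and $n=4$, enumerates all $3^4$ with-replacement samples, and compares the exact variance of $s^2$ with the formula.

```python
from fractions import Fraction
from itertools import product

population = [Fraction(x) for x in (0, 1, 5)]   # hypothetical non-normal population
N, n = len(population), 4
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N
mu4 = sum((x - mu) ** 4 for x in population) / N   # fourth central moment

def s2(sample):
    """Sample variance with divisor n - 1."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / (len(sample) - 1)

values = [s2(s) for s in product(population, repeat=n)]   # equally likely samples
mean_s2 = sum(values) / len(values)
var_s2 = sum((v - mean_s2) ** 2 for v in values) / len(values)

formula = mu4 / n - Fraction(n - 3, n * (n - 1)) * sigma2**2
print(var_s2 == formula)
```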