Statistical analysis requires us to obtain a proper sample from a population of interest whose characteristics can be measured. The sampling distribution of a statistic is the probability distribution obtained from all possible samples of the same size drawn from the population.
Let $N$ denote the population size and $n$ the sample size. When $N$ is large, it is impractical to list all possible samples. Thus, we need to infer properties of the population from a sample.
The Law of Large Numbers
If $x_i$, $i=1,2,\dots,n$, are independent and identically distributed (i.i.d.) with mean $\mu$, then $\bar{x}$ approaches $\mu$ as $n\to\infty$, regardless of the distribution of $x_i$.
Here “approaches” means that $\bar{x}$ is a consistent estimator of $\mu$, which states that: TBC…
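The convergence of $\bar{x}$ to $\mu$ can be seen in a small simulation. The following is a minimal sketch, assuming an Exponential population with mean 2; the function name `running_mean_error`, the seed, and the sample sizes are illustrative choices, not part of the theorem.

```python
import random

def running_mean_error(n_max, mu=2.0, seed=0):
    """Return |x_bar - mu| after n_max i.i.d. Exponential draws with mean mu."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_max):
        total += rng.expovariate(1.0 / mu)  # Exponential distribution with mean mu
    return abs(total / n_max - mu)

# The error typically shrinks as the sample size grows.
for n in (10, 1_000, 100_000):
    print(n, running_mean_error(n))
```

The population here is strongly skewed, which illustrates that the statement does not depend on the shape of the distribution of $x_i$.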
The Central Limit Theorem
If $x_i$, $i=1,2,\dots,n$, are i.i.d. with mean $\mu$ and variance $\sigma^2$, then the distribution of
$$z=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$
approaches that of $N(0,1)$ as $n\to\infty$.
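A simulation can make the theorem concrete. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1), draws many samples of size 50, and standardizes each sample mean as $z=(\bar{x}-\mu)/(\sigma/\sqrt{n})$. The function name and all numeric settings are illustrative assumptions.

```python
import math
import random
import statistics

def standardized_means(n, reps, seed=0):
    """Draw `reps` samples of size n from Exponential(1) and standardize each sample mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    zs = []
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        zs.append((x_bar - mu) / (sigma / math.sqrt(n)))
    return zs

zs = standardized_means(n=50, reps=5_000)
# Mean and standard deviation of z; N(0,1) has 0 and 1.
print(statistics.mean(zs), statistics.pstdev(zs))
# Fraction of z values within 1.96 of zero; N(0,1) gives about 0.95.
print(sum(abs(z) <= 1.96 for z in zs) / len(zs))
```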
Sampling Distributions of Sample Means
Mean of Sample Means
For random sampling with replacement, we have
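$$E[\bar{x}]=E\left[\frac{x_1+\cdots+x_n}{n}\right]=\frac{E[x_1]+\cdots+E[x_n]}{n}=\mu$$
Here each $x_i$ has the same distribution as the population random variable $X$.
For random sampling without replacement, we also have $E[\bar{x}]=\mu$.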
Variance of Sample Means
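With replacement, we have $\mathrm{Var}(\bar{x})=\frac{\sigma^2}{n}$;
without replacement, we have $\mathrm{Var}(\bar{x})=\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$.
Here, the factor $\frac{N-n}{N-1}$ is called the finite population correction factor.
We denote $\sigma_{\bar{x}}=\sqrt{\mathrm{Var}(\bar{x})}$. If the population follows a normal distribution, then $\bar{x}$ follows the normal distribution $N\left(\mu,\frac{\sigma^2}{n}\right)$, which is equivalent to
$$z=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)$$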
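For a small population, the mean and variance of $\bar{x}$ can be checked exactly by enumerating every possible sample. The sketch below uses a hypothetical population of five values and exact rational arithmetic; the population values, `n = 3`, and the helper name are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

population = [Fraction(v) for v in (2, 3, 5, 8, 12)]  # hypothetical population, N = 5
N, n = len(population), 3

mu = sum(population) / N
sigma2 = sum((v - mu) ** 2 for v in population) / N  # population variance (divisor N)

def mean_and_var_of_sample_means(samples):
    """Exact mean and variance of x_bar over a list of equally likely samples."""
    means = [sum(s) / len(s) for s in samples]
    m = sum(means) / len(means)
    v = sum((xb - m) ** 2 for xb in means) / len(means)
    return m, v

# With replacement: all N**n ordered samples are equally likely.
m_wr, v_wr = mean_and_var_of_sample_means(list(product(population, repeat=n)))
# Without replacement: all C(N, n) unordered samples are equally likely.
m_wor, v_wor = mean_and_var_of_sample_means(list(combinations(population, n)))

print(m_wr == mu, v_wr == sigma2 / n)
print(m_wor == mu, v_wor == sigma2 / n * Fraction(N - n, N - 1))
```

Because the arithmetic is exact, the comparisons test the formulas themselves rather than a simulated approximation.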
Sampling Distribution of Sample Proportion
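Let $X:=\sum_{i=1}^{n}x_i\sim\mathrm{Binomial}(n,p)$, where the $x_i$ are i.i.d. Bernoulli($p$). The sample proportion is $\hat{p}=\frac{X}{n}$. Then we have
$$E[\hat{p}]=p,\qquad \sigma_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}$$
As $n\to\infty$, $\hat{p}$ approaches $p$ by the LLN, and by the CLT the distribution of
$$z=\frac{\hat{p}-p}{\sigma_{\hat{p}}}$$
approaches $N(0,1)$.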
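The mean and standard deviation of the sample proportion can be verified directly from the binomial probability mass function. This is a minimal sketch; the function name `proportion_moments` and the values `n = 40`, `p = 0.3` are illustrative assumptions.

```python
import math

def proportion_moments(n, p):
    """Mean and standard deviation of p_hat = X / n, computed from the Binomial(n, p) pmf."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * pk for k, pk in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, math.sqrt(var)

n, p = 40, 0.3  # illustrative values
mean, sd = proportion_moments(n, p)
# Compare with the closed forms p and sqrt(p(1-p)/n).
print(mean, sd, math.sqrt(p * (1 - p) / n))
```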
Sampling Distribution of Sample Variance
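Recall that $s^2=\frac{\sum_i (x_i-\bar{x})^2}{n-1}$ is a natural estimator of $\sigma^2$, and we have $E[s^2]=\sigma^2$. If the population r.v. $X$ is normally distributed, then $\mathrm{Var}(s^2)=\frac{2\sigma^4}{n-1}$ and $\chi^2=\frac{(n-1)s^2}{\sigma^2}\sim\chi^2_{n-1}$.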
What is the $\chi^2$ distribution
If $Z_1,\dots,Z_v$ are i.i.d. with $Z_i\sim N(0,1)$, then $X=\sum_{i=1}^{v}Z_i^2$ follows the $\chi^2_v$ distribution with $v$ degrees of freedom.
Suppose $\chi^2$ follows the $\chi^2_v$ distribution; then $E[\chi^2]=v$ and $\mathrm{Var}(\chi^2)=2v$.
Thus, knowing the chi-square distribution, we have $E\left[\frac{(n-1)s^2}{\sigma^2}\right]=n-1$, and therefore $E[s^2]=\sigma^2$.
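The chi-square behaviour of the scaled sample variance can be checked by simulation. The sketch below assumes a normal population with $\mu=10$ and $\sigma=2$ and samples of size 6, so the quantity $(n-1)s^2/\sigma^2$ should have mean close to $n-1=5$ and variance close to $2(n-1)=10$. All names and settings are illustrative.

```python
import random
import statistics

def scaled_sample_variances(n, reps, mu=10.0, sigma=2.0, seed=0):
    """Simulate (n - 1) * s^2 / sigma^2 for `reps` normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance with divisor n - 1
        out.append((n - 1) * s2 / sigma**2)
    return out

n = 6
q = scaled_sample_variances(n, reps=20_000)
# Compare with the chi-square moments: mean n - 1 and variance 2(n - 1).
print(statistics.mean(q), statistics.pvariance(q))
```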
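The conclusion holds even if $x_i$ does not follow a normal distribution. Note that $\sum_i (x_i-\bar{x})^2=\sum_i (x_i-\mu)^2-n(\bar{x}-\mu)^2$. For i.i.d. $x_i$ we have $E\left[\sum_i (x_i-\mu)^2\right]=n\sigma^2$ and $E\left[n(\bar{x}-\mu)^2\right]=n\,\mathrm{Var}(\bar{x})=\sigma^2$, so $E\left[\sum_i (x_i-\bar{x})^2\right]=(n-1)\sigma^2$, and thus $E[s^2]=\sigma^2$.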
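Both the sum-of-squares identity and the unbiasedness of $s^2$ can be confirmed exactly on a small non-normal population. The sketch below assumes a hypothetical skewed population sampled with replacement and enumerates every ordered sample of size 3; the values and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

values = [Fraction(v) for v in (0, 1, 1, 7)]  # hypothetical skewed population
n = 3
mu = sum(values) / len(values)
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

def sum_sq_about(sample, center):
    """Sum of squared deviations of the sample from `center`."""
    return sum((x - center) ** 2 for x in sample)

samples = list(product(values, repeat=n))  # all equally likely ordered samples

# Identity: sum (x_i - x_bar)^2 = sum (x_i - mu)^2 - n (x_bar - mu)^2, for every sample.
identity_holds = all(
    sum_sq_about(s, sum(s) / n) == sum_sq_about(s, mu) - n * (sum(s) / n - mu) ** 2
    for s in samples
)

# Exact expectation of s^2 over all samples.
expected_s2 = sum(sum_sq_about(s, sum(s) / n) / (n - 1) for s in samples) / len(samples)
print(identity_holds, expected_s2 == sigma2)
```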
So why is it a $\chi^2_{n-1}$ distribution? Why is one degree of freedom lost?
The reason is that the $n$ deviations $x_i-\bar{x}$ have only $n-1$ degrees of freedom: they always sum to zero, so if we know $x_i-\bar{x}$ for $i=1,2,\dots,n-1$, we immediately know $x_n-\bar{x}$.
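Without-replacement case. In random sampling without replacement, $E[s^2]=\frac{N}{N-1}\sigma^2$. Writing $S^2=\frac{N}{N-1}\sigma^2$, we have $\mathrm{Var}(\bar{x})=\frac{S^2}{n}\cdot\frac{N-n}{N}$, and an unbiased estimator of it is $\frac{s^2}{n}\cdot\frac{N-n}{N}$.
In random sampling with replacement, when $X$ is not normally distributed, $\mathrm{Var}(s^2)=\frac{\mu_4}{n}-\frac{n-3}{n(n-1)}\sigma^4$, where $\mu_4=E[(X-\mu)^4]$ is the fourth central moment.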
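The without-replacement results can also be checked by exhaustive enumeration. The sketch below assumes a hypothetical population of five values and samples of size 3, and compares $E[s^2]$ with $\frac{N}{N-1}\sigma^2$ and the expectation of $\frac{s^2}{n}\cdot\frac{N-n}{N}$ with the exact $\mathrm{Var}(\bar{x})$. Names and values are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = [Fraction(v) for v in (2, 3, 5, 8, 12)]  # hypothetical population
N, n = len(population), 3
mu = sum(population) / N
sigma2 = sum((v - mu) ** 2 for v in population) / N  # divisor N

def sample_variance(sample):
    """Sample variance s^2 with divisor n - 1."""
    xb = sum(sample) / len(sample)
    return sum((x - xb) ** 2 for x in sample) / (len(sample) - 1)

samples = list(combinations(population, n))  # all equally likely samples
means = [sum(s) / n for s in samples]
var_xbar = sum((m - mu) ** 2 for m in means) / len(samples)  # exact Var(x_bar)

expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)
expected_estimator = (
    sum(sample_variance(s) / n * Fraction(N - n, N) for s in samples) / len(samples)
)

print(expected_s2 == Fraction(N, N - 1) * sigma2)
print(expected_estimator == var_xbar)
```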
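The variance of $s^2$ for a non-normal population can be verified exactly in the with-replacement case, using $\mathrm{Var}(s^2)=\frac{\mu_4}{n}-\frac{n-3}{n(n-1)}\sigma^4$ with $\mu_4$ the fourth central moment. The sketch below assumes a hypothetical skewed population and samples of size 4; all names and values are illustrative.

```python
from fractions import Fraction
from itertools import product

values = [Fraction(v) for v in (0, 1, 1, 7)]  # hypothetical non-normal population
n = 4
mu = sum(values) / len(values)
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)
mu4 = sum((v - mu) ** 4 for v in values) / len(values)  # fourth central moment

def sample_variance(sample):
    """Sample variance s^2 with divisor n - 1."""
    xb = sum(sample) / len(sample)
    return sum((x - xb) ** 2 for x in sample) / (len(sample) - 1)

# All ordered samples drawn with replacement are equally likely.
s2_values = [sample_variance(s) for s in product(values, repeat=n)]
mean_s2 = sum(s2_values) / len(s2_values)
var_s2 = sum((s - mean_s2) ** 2 for s in s2_values) / len(s2_values)

formula = mu4 / n - Fraction(n - 3, n * (n - 1)) * sigma2**2
print(var_s2 == formula)
```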