Mean

  • Population mean: $\mu=\frac{\sum_{i=1}^N x_i}{N}$
  • Sample mean: $\bar{x}=\frac{\sum_{i=1}^n x_i}{n}$

Median

Mode

Commonly used for categorical data. One mode = unimodal; two modes = bimodal; more than two modes = multimodal.

Percentiles and Quartiles

Measures that indicate the location, or position, of a value relative to the entire data set; well suited to describing large data sets.

The $p$-th percentile is the value such that approximately $p\%$ of the observations are $\le$ it. In the sorted data it sits at position $\frac{p}{100}(n+1)$.
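The position rule above can be sketched in a few lines of Python. The helper name `percentile`, the linear interpolation between neighboring values when the position is not an integer, and the clamping of positions that fall outside the data are all assumptions made for illustration; the rule itself only gives the position.

```python
def percentile(data, p):
    """p-th percentile using the (n + 1) position rule (illustrative helper)."""
    xs = sorted(data)
    n = len(xs)
    pos = p / 100 * (n + 1)              # 1-based position in the sorted data
    pos = max(1.0, min(float(n), pos))   # assumption: clamp to the range 1..n
    lower = int(pos)                     # integer part of the position
    frac = pos - lower                   # fractional part of the position
    if lower >= n:
        return xs[-1]
    # Assumption: interpolate linearly between the two neighboring values.
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


data = [15, 20, 35, 40, 50, 55, 60]
q1 = percentile(data, 25)   # position 0.25 * 8 = 2
q2 = percentile(data, 50)   # position 0.50 * 8 = 4
q3 = percentile(data, 75)   # position 0.75 * 8 = 6
```

With seven observations the three quartile positions are whole numbers, so no interpolation is needed; an even-sized data set such as `[1, 2, 3, 4]` puts the median at position 2.5, halfway between the second and third values.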

Quartiles:

  • $Q_1$: 25th percentile
  • $Q_2$: 50th percentile
  • $Q_3$: 75th percentile

Five-Number Summary

minimum, first quartile, median, third quartile, maximum

Range

$\max-\min$

sensitive to outliers

Interquartile Range

$IQR=Q_3-Q_1$

Box-and-Whisker Plot

The inner box spans $Q_1$ to $Q_3$, the line dividing the inner box marks the median, and the two whiskers extend from $Q_1$ to the minimum and from $Q_3$ to the maximum.

For a boxplot of real-valued data, the upper whisker ends at $\min(V_{\max}, Q_3+1.5\times IQR)$ and the lower whisker ends at $\max(Q_1-1.5\times IQR, V_{\min})$.
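As a rough sketch, the five-number summary, the IQR, and the two whisker ends can be computed with Python's standard `statistics` module. The function name `box_stats` and the sample data are made up for illustration. The quartiles come from `statistics.quantiles`, whose default `"exclusive"` method interpolates at the $\frac{p}{100}(n+1)$ position; other tools may use a different quartile convention and give slightly different numbers.

```python
import statistics


def box_stats(data):
    """Five-number summary plus whisker ends (illustrative helper)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    v_min, v_max = min(data), max(data)
    return {
        "five_number": (v_min, q1, q2, q3, v_max),
        "iqr": iqr,
        # Whisker ends follow the min/max formulas in the text.
        "upper_whisker": min(v_max, q3 + 1.5 * iqr),
        "lower_whisker": max(q1 - 1.5 * iqr, v_min),
    }


data = [15, 20, 35, 40, 50, 55, 60, 120]
stats = box_stats(data)
```

In this data set the value 120 lies above $Q_3+1.5\times IQR$, so the upper whisker stops short of the maximum, while the lower whisker reaches the minimum.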


Variance

Population variance

$\sigma^2=\frac{\sum_{i=1}^N (x_i-\mu)^2}{N}=\frac{\sum_{i=1}^N x_i^2}{N}-\mu^2$

Sample variance

$s^2=\frac{\sum_{i=1}^n (x_i-\bar{x})^2}{n-1}=\frac{\sum_{i=1}^n x_i^2 -n\bar{x}^2}{n-1}$
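A quick numerical check, on made-up data, that the definitional and shortcut forms of each variance formula agree. The variable names are illustrative; `statistics.pvariance` and `statistics.variance` are the standard-library functions for the population and sample variance.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Population variance: definition vs. shortcut (mean of squares minus mean squared).
pop_var_def = sum((x - mean) ** 2 for x in data) / n
pop_var_short = sum(x * x for x in data) / n - mean ** 2

# Sample variance: same numerators, but divided by n - 1.
samp_var_def = sum((x - mean) ** 2 for x in data) / (n - 1)
samp_var_short = (sum(x * x for x in data) - n * mean ** 2) / (n - 1)

# Cross-check against the standard library.
lib_pop = statistics.pvariance(data)
lib_samp = statistics.variance(data)
```

The shortcut forms need only the sum and the sum of squares, which is convenient for hand calculation, though they can lose precision in floating point when the mean is large relative to the spread.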

Coefficient of Variation (CV)

measure of relative dispersion that expresses the standard deviation as a percentage of the mean

population CV = $\frac{\sigma}{\mu}\times 100\%$, sample CV = $\frac{s}{\bar{x}}\times 100\%$
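Because the CV is unit-free, it allows the spread of variables measured in different units to be compared. A minimal sketch with invented height and weight data (the helper name `sample_cv` is hypothetical):

```python
import statistics


def sample_cv(data):
    """Sample coefficient of variation, in percent (illustrative helper)."""
    return statistics.stdev(data) / statistics.mean(data) * 100


heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_height = sample_cv(heights_cm)
cv_weight = sample_cv(weights_kg)
```

Here the weights vary far more relative to their mean than the heights do, even though the two standard deviations are in different units and cannot be compared directly.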

Empirical Rule

For approximately bell-shaped (normal) data:

  • About $0.68$ of the data fall within $[\mu\pm \sigma]$
  • About $0.95$ of the data fall within $[\mu\pm 2\sigma]$
  • About $0.997$ of the data fall within $[\mu\pm 3\sigma]$
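The three proportions can be checked by simulation. The sketch below draws values from a normal distribution with arbitrarily chosen parameters ($\mu=50$, $\sigma=10$, seed 0, all assumptions for illustration) and counts the share falling within one, two, and three standard deviations of the mean.

```python
import random

random.seed(0)                      # fixed seed so the run is repeatable
mu, sigma = 50.0, 10.0              # arbitrary illustrative parameters
sample = [random.gauss(mu, sigma) for _ in range(100_000)]


def share_within(data, k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)


shares = {k: share_within(sample, k) for k in (1, 2, 3)}
```

The simulated shares should land close to 0.68, 0.95, and 0.997. For strongly skewed or heavy-tailed data the rule does not apply.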

z-Score

measures the location or position of a value relative to the mean of the distribution: it is a standardized value that indicates the number of standard deviations a value is from the mean

Population $z_i=\frac{x_i-\mu}{\sigma}$, sample $z_i=\frac{x_i-\bar{x}}{s}$
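A small sketch of the sample z-score on made-up data; the function name `z_scores` is illustrative. Standardized values always have mean 0 and standard deviation 1, which makes a convenient sanity check.

```python
import statistics


def z_scores(data):
    """Sample z-score of every value in data (illustrative helper)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
    return [(x - mean) / s for x in data]


data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)
```

A positive z-score means the value lies above the mean, a negative one below it; the largest value here, 9, is a little under two sample standard deviations above the mean of 5.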

Skewness of Distribution

$\text{skewness}=\frac{1}{n}\frac{\sum_{i=1}^n (x_i-\bar{x})^3}{s^3}$
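The formula translates directly into code. This sketch uses the sample standard deviation $s$ in the denominator, as written above; statistical packages often apply a different small-sample correction, so their values may differ slightly. The function name and data sets are invented for illustration.

```python
import statistics


def skewness(data):
    """Skewness as (1/n) * sum((x - mean)^3) / s^3 (illustrative helper)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation
    return sum((x - mean) ** 3 for x in data) / (n * s ** 3)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 12]     # one large value pulls the tail right

skew_sym = skewness(symmetric)
skew_right = skewness(right_skewed)
```

A symmetric data set gives zero, a long right tail gives a positive value, and mirroring the data flips the sign.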

Covariance

Population covariance

$Cov(x,y)=\sigma_{x,y}=\frac{\sum_{i=1}^N (x_i-\mu_x)(y_i-\mu_y)}{N}$

Sample covariance

$s_{x,y}=\frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{n-1}$
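A minimal sketch of both covariances on invented paired data. The sample version here is an assumption about the intended convention: it centers on the sample means $\bar{x}$, $\bar{y}$ and divides by $n-1$, matching the sample variance. The function name `covariance` and its `sample` flag are hypothetical.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired data (illustrative helper).

    sample=True divides by n - 1 (assumed sample convention);
    sample=False divides by n (population formula).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_xy = covariance(xs, ys)                    # sample covariance
sigma_xy = covariance(xs, ys, sample=False)  # population covariance
```

The covariance of a variable with itself reduces to its variance, so `covariance(xs, xs)` returns the sample variance of `xs`.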

Correlation

free of units and provides both the direction and strength of a relationship

Population correlation

$\rho_{xy}=\frac{\sigma_{xy}}{\sigma_x\sigma_y}$

Sample correlation

$r_{xy}=\frac{s_{xy}}{s_xs_y}$

Rule of thumb: a relationship exists if $|r_{xy}|\ge \frac{2}{\sqrt{n}}$
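The sample correlation and the $\frac{2}{\sqrt{n}}$ rule of thumb can be sketched as follows. The function names and data are invented for illustration, and the sample covariance and standard deviations are assumed to use the $n-1$ divisor (the divisor cancels in the ratio, so $r_{xy}$ is the same either way as long as it is used consistently).

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation r_xy = s_xy / (s_x * s_y) (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


def relationship_exists(xs, ys):
    """Rule of thumb from the text: |r_xy| >= 2 / sqrt(n)."""
    return abs(sample_correlation(xs, ys)) >= 2 / math.sqrt(len(xs))


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 5, 4, 5, 7, 8, 9, 10, 12]

r = sample_correlation(xs, ys)
strong = relationship_exists(xs, ys)
weak = relationship_exists([1, 2, 3, 4, 5], [3, 1, 4, 1, 5])
```

The threshold shrinks as $n$ grows, so with few observations only a large $|r_{xy}|$ counts as evidence of a relationship; the five-point example has a positive correlation but still falls below its threshold.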