Linear Regression

Polynomial Regression

Each feature $x_i$ is mapped to $x_i^1, x_i^2, \dots, x_i^d$, possibly together with cross terms.
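A minimal sketch of this feature mapping, assuming plain Python lists as input. The helper name `polynomial_features` is hypothetical and not taken from any library. It emits every monomial of total degree 1 to `degree`, so both the pure powers and the cross terms are included.

```python
from itertools import combinations_with_replacement


def polynomial_features(x, degree):
    """Return all monomials of the entries of x with total degree 1..degree."""
    features = []
    for d in range(1, degree + 1):
        # Each index tuple (with repetition) selects the factors of one monomial.
        for idx in combinations_with_replacement(range(len(x)), d):
            term = 1.0
            for j in idx:
                term *= x[j]
            features.append(term)
    return features


# Two features, degree 2. Order: x1, x2, x1^2, x1*x2, x2^2
row = polynomial_features([2.0, 3.0], degree=2)
print(row)  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

A linear model fitted on these expanded features is still linear in its coefficients, which is why polynomial regression can reuse the linear regression machinery.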

Decision Tree (Random Forest) Regression

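Random Forest: can be viewed as an ensemble of multiple piecewise-linear functions (each tree is constant within each of its leaf regions), in contrast to a single global linear or polynomial function.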

Impurity for continuous variables

$$
I(t)=MSE(t)=\frac{1}{N_t}\sum_{i\in D_t}\Big( y^{(i)}-\hat y_t \Big)^2
$$

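where $N_t$ is the number of samples in node $t$, $D_t$ is the set of samples that belong to this node, $\hat y_t$ is the predicted target value (in fact the sample mean), and $y^{(i)}$ is the true target value of sample $i$: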

$$
\hat y_t=\frac{1}{N_t}\sum_{i\in D_t}y^{(i)}
$$
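The two formulas above can be checked with a short sketch. The function names `node_mean` and `node_mse` are hypothetical, and the target values are made up purely for illustration.

```python
def node_mean(targets):
    """Predicted value of a node: the sample mean of its targets (y_hat_t)."""
    return sum(targets) / len(targets)


def node_mse(targets):
    """Impurity I(t) of a node: mean squared deviation from the node mean."""
    y_hat = node_mean(targets)
    return sum((y - y_hat) ** 2 for y in targets) / len(targets)


# Illustrative targets of the N_t = 4 samples that fall into one node.
y_node = [1.0, 2.0, 3.0, 6.0]
print(node_mean(y_node))  # 3.0
print(node_mse(y_node))   # 3.5
```

Because $\hat y_t$ is the sample mean, this impurity is the within-node variance of the targets.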

With a random forest, each individual decision tree is grown using the MSE criterion, and the predicted target variable is calculated as the average prediction over all decision trees.
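The averaging step can be sketched as follows. This is only an illustration: the "trees" are hypothetical one-split stumps with hand-picked thresholds and leaf values, not trees fitted to data, and `make_stump` and `forest_predict` are made-up names.

```python
def make_stump(threshold, left_value, right_value):
    """Build a hypothetical one-split regression tree on a single feature."""
    def stump(x):
        # Each leaf returns a constant, standing in for the leaf's sample mean.
        return left_value if x <= threshold else right_value
    return stump


def forest_predict(trees, x):
    """Forest prediction: the average of the individual tree predictions."""
    predictions = [tree(x) for tree in trees]
    return sum(predictions) / len(predictions)


trees = [
    make_stump(1.0, 10.0, 20.0),
    make_stump(2.0, 12.0, 24.0),
    make_stump(3.0, 16.0, 28.0),
]

# At x = 2.5 the trees predict 20.0, 24.0 and 16.0.
print(forest_predict(trees, 2.5))  # 20.0
```

Each stump is piecewise constant, and their average is a step function with more, smaller steps than any single stump.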