Regression

Linear Regression

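Polynomial terms: map each feature $x_i$ to $x_i^1, x_i^2, \dots, x_i^d$, plus possible cross terms. The model is still linear in its coefficients, so it can be fit as an ordinary linear regression on the expanded features.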
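The feature mapping above can be sketched in a few lines of plain Python. This is only an illustration: the function name `polynomial_features` and the flag `include_cross_terms` are hypothetical, and the cross terms are assumed here to be pairwise products $x_i x_j$ only, which the notes do not specify.

```python
from itertools import combinations


def polynomial_features(x, degree, include_cross_terms=False):
    """Expand one sample x = [x_1, ..., x_n] into x_i^1, ..., x_i^degree per feature."""
    expanded = []
    for value in x:
        for power in range(1, degree + 1):
            expanded.append(value ** power)
    if include_cross_terms:
        # Assumption: "cross terms" means pairwise products x_i * x_j with i < j.
        for a, b in combinations(x, 2):
            expanded.append(a * b)
    return expanded


sample = [2.0, 3.0]
features = polynomial_features(sample, degree=3, include_cross_terms=True)
print(features)
```

A linear regression fitted on `features` instead of `sample` then learns one coefficient per expanded term.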

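Random Forest: can be viewed as an ensemble of multiple piecewise linear functions (one per decision tree), in contrast to a single global linear or polynomial model.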

Impurity for continuous variables

$$I(t)=MSE(t)=\frac{1}{N_t}\sum_{i\in D_t}\Big( y^{(i)}-\hat y_t \Big)^2$$

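where $N_t$ is the number of samples in node $t$, $D_t$ is the set of samples that fall into this node, $\hat y_t$ is the predicted target value (which is simply the sample mean), and $y^{(i)}$ is the true target value of sample $i$: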

$$\hat y_t=\frac{1}{N_t}\sum_{i\in D_t}y^{(i)}$$
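As a small worked example, the two formulas above can be evaluated directly for the targets in a single node. The helper names `node_mean` and `node_mse` and the sample values are made up for illustration.

```python
def node_mean(targets):
    """Prediction of a node: the sample mean of its targets (y_hat_t)."""
    return sum(targets) / len(targets)


def node_mse(targets):
    """Impurity of a node: mean squared deviation from the node mean (I(t))."""
    y_hat = node_mean(targets)
    return sum((y - y_hat) ** 2 for y in targets) / len(targets)


# Hypothetical targets y^(i) for the samples in D_t, so N_t = 4.
node_targets = [1.0, 2.0, 3.0, 6.0]
print(node_mean(node_targets))  # 3.0
print(node_mse(node_targets))   # 3.5
```

Here the squared deviations are 4, 1, 0 and 9, so the impurity is 14 / 4 = 3.5. A node whose targets are all equal has impurity 0.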

With a random forest, each individual decision tree is grown using the MSE criterion above, and the predicted target variable is calculated as the average prediction over all decision trees.
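The averaging step can be sketched as follows. This is a toy illustration, not a real random forest: each "tree" is a hypothetical one-split stump with hand-picked thresholds and leaf values, where an actual forest would grow its trees on bootstrap samples of the training data.

```python
def make_stump(threshold, left_value, right_value):
    """Return a toy one-split regression tree over a single feature x."""
    def predict(x):
        return left_value if x <= threshold else right_value
    return predict


def forest_predict(trees, x):
    """Forest prediction: the average of the individual tree predictions."""
    predictions = [tree(x) for tree in trees]
    return sum(predictions) / len(predictions)


# Three hypothetical trees with made-up thresholds and leaf means.
trees = [
    make_stump(1.0, 2.0, 5.0),
    make_stump(2.0, 3.0, 7.0),
    make_stump(3.0, 4.0, 9.0),
]

prediction = forest_predict(trees, 2.5)
print(prediction)
```

For `x = 2.5` the three stumps predict 5.0, 7.0 and 4.0, so the forest returns their mean, 16 / 3.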