2. a) Steps to fit a regression tree that predicts Y using the predictors X1, X2, X3:
1. For fitting a regression tree, we use recursive binary splitting. We start with the full predictor space R = {X1, ..., Xp}; for this question, R = {X1, X2, X3}. Partitioning the predictor space this way is top-down (it begins with the full region and successively splits it) and greedy (at each step the best split is made without regard to later steps).
2. For any j and s, we define the pair of half-planes:
R1(j, s) = {X | Xj < s}; R2(j, s) = {X | Xj ≥ s}
3. We seek the values of j and s that minimize the RSS, i.e. Σ_{i: xi ∈ R1(j,s)} (yi − ŷR1)² + Σ_{i: xi ∈ R2(j,s)} (yi − ŷR2)², where ŷR1 and ŷR2 are the mean responses of the training observations in R1(j, s) and R2(j, s). This is the top-down, greedy part: we find the single best partition of the data, the one giving the greatest reduction in RSS, and then apply the same process to each of the resulting parts. (A small code sketch of this split search follows these steps.)
4. We repeat steps 2 and 3 to split the data further by minimizing the RSS within each region, until a stopping criterion is reached (e.g. "no region contains more than 5 observations").
5. We obtain a final set of regions R1, ..., RJ, and predict the response for a test observation falling in region Rj as the mean of the training observations in Rj.
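As an illustration of step 3, here is a minimal Python/NumPy sketch of the greedy split search: it scans every predictor j and every observed cutpoint s and returns the (j, s) pair with the smallest total RSS. The function name best_split and the arrays X, y are illustrative assumptions, not part of the question.

import numpy as np

def best_split(X, y):
    # X: (n, p) array of predictors, y: (n,) array of responses.
    # Returns the (j, s) pair whose half-planes give the smallest total RSS.
    best_j, best_s, best_rss = None, None, np.inf
    n, p = X.shape
    for j in range(p):                    # loop over predictors X1, ..., Xp
        for s in np.unique(X[:, j]):      # candidate cutpoints
            left = y[X[:, j] < s]         # responses in R1(j, s) = {X | Xj < s}
            right = y[X[:, j] >= s]       # responses in R2(j, s) = {X | Xj >= s}
            if len(left) == 0 or len(right) == 0:
                continue
            # RSS of each region around its own mean prediction
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s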
#####################################################
Decision Tree: Full Algorithm:
1) Use recursive binary splitting (as described in part a) to grow a large tree T0 on the training data.
2) Pick a grid of values for the tuning parameter 'a' (from 0 up to some large value).
3) Apply cost complexity pruning to the large tree T0 from step 1 to obtain a sequence of best subtrees, one for each value of 'a' on the grid from step 2. By cost complexity pruning we mean that, instead of considering all possible subtrees, we consider a sequence of subtrees indexed by the tuning parameter 'a' > 0; each value of 'a' corresponds to the subtree that minimizes:
Σ_{m=1}^{|T|} Σ_{i: xi ∈ Rm} (yi − ŷRm)² + a·|T|
where |T| is the number of terminal nodes of subtree T, Rm is the region of the m-th terminal node, and ŷRm is the mean training response in Rm.
4) Use K-fold cross-validation to choose the value of 'a' (the subtree yielding the minimum CV error wins).
5) Return the subtree from step 3 that corresponds to the value of 'a' chosen in step 4.
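As a rough illustration of the full algorithm, here is a sketch using scikit-learn, where the ccp_alpha parameter plays the role of 'a'; the training arrays X_train and y_train are assumed to exist and are not part of the question.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Step 1: grow a large tree T0 on the training data.
big_tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0).fit(X_train, y_train)

# Step 2: scikit-learn can compute the grid of effective 'a' values directly.
alphas = big_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# Steps 3-4: for each 'a', fit the corresponding pruned subtree and estimate
# its error with K-fold cross-validation (K = 5 here).
cv_mse = [
    -cross_val_score(
        DecisionTreeRegressor(ccp_alpha=a, random_state=0),
        X_train, y_train, cv=5, scoring="neg_mean_squared_error",
    ).mean()
    for a in alphas
]

# Step 5: refit and return the subtree for the 'a' with the smallest CV error.
best_a = alphas[int(np.argmin(cv_mse))]
final_tree = DecisionTreeRegressor(ccp_alpha=best_a, random_state=0).fit(X_train, y_train)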
(b) Two ways to improve a regression tree model's performance are tree pruning and aggregating many decision trees (e.g. bagging, random forests). Aggregation in particular can vastly improve predictive performance; a brief sketch follows.
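For illustration only, a minimal scikit-learn sketch of the aggregation idea, again assuming hypothetical arrays X_train, y_train and X_test:

from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Bagging: average the predictions of many trees fit on bootstrap samples.
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=500, random_state=0)
bag.fit(X_train, y_train)

# Random forest: like bagging, but each split considers only a random subset
# of the predictors, which decorrelates the individual trees.
rf = RandomForestRegressor(n_estimators=500, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)

y_hat = rf.predict(X_test)   # prediction averaged over the 500 trees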