ECON 570 Problem Set 2

1 Decision Trees
A. Load the Boston house prices dataset with the load_boston function from the module sklearn.datasets (the function was removed in scikit-learn 1.2, so an earlier version is required). What is the target variable and what are the features?
B. Import the class DecisionTreeRegressor from the module sklearn.tree. Using 5-fold cross-validation, plot the training error and test error as you vary the parameter max_depth from 1 to 8 in predicting the target variable from the features. Use mean squared error as the evaluation metric.
C. What is the optimal max_depth?
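
The cross-validation loop in parts B and C can be sketched roughly as follows. This is a minimal illustration, not a reference solution: it assumes synthetic stand-in data from make_regression (because load_boston is absent from recent scikit-learn releases), and the names folds, train_mse, test_mse, best_depth, and plot_errors are hypothetical choices made for this sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_validate
from sklearn.tree import DecisionTreeRegressor

# ASSUMPTION: synthetic stand-in data with 13 features.
# Replace X, y with the Boston features and target when available.
X, y = make_regression(n_samples=300, n_features=13, noise=10.0, random_state=0)

# One fixed splitter, so every depth is scored on the same 5 folds.
folds = KFold(n_splits=5, shuffle=True, random_state=0)

depths = list(range(1, 9))
train_mse, test_mse = [], []
for depth in depths:
    scores = cross_validate(
        DecisionTreeRegressor(max_depth=depth, random_state=0),
        X, y,
        cv=folds,
        scoring="neg_mean_squared_error",
        return_train_score=True,
    )
    # scikit-learn reports negated MSE, so flip the sign before averaging.
    train_mse.append(-scores["train_score"].mean())
    test_mse.append(-scores["test_score"].mean())

# One reasonable reading of "optimal": the depth with the lowest mean test MSE.
best_depth = depths[int(np.argmin(test_mse))]


def plot_errors(depths, train_mse, test_mse):
    """Plot both error curves against max_depth (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.plot(depths, train_mse, marker="o", label="training MSE")
    ax.plot(depths, test_mse, marker="o", label="test MSE")
    ax.set_xlabel("max_depth")
    ax.set_ylabel("mean squared error")
    ax.legend()
    return fig
```

Calling plot_errors(depths, train_mse, test_mse) produces the figure for part B. Training error typically keeps falling as max_depth grows, while test error usually flattens or rises once the tree starts to overfit.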

2 Ensemble Estimators
Throughout the following exercises, continue using mean squared error as the evaluation metric. Furthermore, continue to use 5-fold cross-validation and make sure that the folds used in each part below are the same.
A. Construct a bagging estimator from the base estimator DecisionTreeRegressor. Consider two possible values for max_depth: 2 and 6. Let B be the number of trees aggregated. Plot the test error as B increases from 1 to 200, for both possible values of max_depth.
B. Use the class RandomForestRegressor from the module sklearn.ensemble and set the number of features to consider when looking for the best split to be the square root of the number of features. Consider again two possible values for the parameter max_depth: 2 and 6, and let B be the number of trees aggregated. Plot the test error as B increases from 1 to 200, for both possible values of max_depth.

C. Install the Python package xgboost and use the class XGBRegressor to again plot the test error as the number of trees aggregated increases from 1 to 200, for max_depth equal to 2 and 6.
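
One way to keep the folds identical across parts A to C is to build a single KFold splitter with a fixed seed and pass it to every estimator. The sketch below is only an illustration under stated assumptions: it uses synthetic stand-in data instead of the Boston dataset, a coarse grid of tree counts instead of the full range 1 to 200 (to keep the run short), and hypothetical helper names cv_mse and make_models. The xgboost import is treated as optional, so the sketch still runs if the package is not installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

try:  # xgboost is a separate install; skip it if unavailable
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None

# ASSUMPTION: synthetic stand-in data; swap in the Boston features and target.
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)

# A seeded splitter yields the same 5 folds every time it is used.
folds = KFold(n_splits=5, shuffle=True, random_state=0)


def cv_mse(model):
    """Mean test MSE of `model` over the shared folds."""
    scores = cross_val_score(model, X, y, cv=folds, scoring="neg_mean_squared_error")
    return -scores.mean()  # scores are negated MSE


def make_models(depth, n_trees):
    """Build the estimators from parts A to C for one (max_depth, B) pair."""
    models = {
        # The base estimator is passed positionally because its keyword name
        # differs between scikit-learn versions.
        "bagging": BaggingRegressor(
            DecisionTreeRegressor(max_depth=depth),
            n_estimators=n_trees,
            random_state=0,
        ),
        "random_forest": RandomForestRegressor(
            n_estimators=n_trees,
            max_depth=depth,
            max_features="sqrt",  # square root of the number of features
            random_state=0,
        ),
    }
    if XGBRegressor is not None:
        models["xgboost"] = XGBRegressor(
            n_estimators=n_trees, max_depth=depth, random_state=0
        )
    return models


# Coarse grid for speed; the exercise itself asks for B = 1, ..., 200.
tree_counts = [1, 10, 50]
results = {}
for depth in (2, 6):
    for n_trees in tree_counts:
        for name, model in make_models(depth, n_trees).items():
            results.setdefault((name, depth), []).append(cv_mse(model))
```

Each entry results[(name, depth)] is a list of mean test errors aligned with tree_counts, which can be plotted against B with one curve per value of max_depth.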
