Like Support Vector Machines (SVMs), trees are well suited to multiclass classification. Unlike SVMs, however, they need no special techniques such as one-vs-one or one-vs-all to handle multiclass problems: the majority-voting procedure used to assign classes to terminal nodes implicitly provides a kind of one-vs-all strategy by default.
As in the SVM tutorial, we will use the bfi dataset to
predict level of education by the Big-5 personality traits. However,
here we do not select a subset of observations that has balanced
educational levels. The reason is that trees are much better at handling
imbalanced data, as we will see below.
For simplicity, we treat education as a categorical
variable here, although it is actually an ordinal variable (i.e., 1 <
2 < 3 < 4 < 5).
Type ?psych::bfi into your console for more information on the
dataset. Note that the Big-5 traits agree,
conscientious, extra, neuro, and
open were created by averaging each participant's responses
to the five survey items per trait (e.g.,
A1-A5).
Read the data file module2-bfi-imbalanced.csv into R (assign it to a variable called “dat”).
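A minimal sketch, assuming the file sits in your working directory:

```r
# Read the data and inspect the column types before converting anything
dat <- read.csv("module2-bfi-imbalanced.csv")
str(dat)
```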
Transform all discrete variables to factors for the tree algorithm to work properly.
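One way to do this, assuming the discrete columns besides the identifier CASE are gender and education (adjust the vector if your file differs):

```r
# Convert the discrete variables to factors; the column names below
# are an assumption based on the psych::bfi dataset
discrete_vars <- c("gender", "education")
dat[discrete_vars] <- lapply(dat[discrete_vars], factor)
```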
Build a tree model to predict the target education
by all features. Make sure to set the learner’s keep_model
argument to TRUE, which is needed for task 4. (Hint: Avoid including the
identifier CASE in the feature set; Hint: Set the seed to
ensure reproducibility of your results, e.g., if your model has to
randomly break ties)
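A possible sketch using the mlr3 ecosystem, which is assumed here because the keep_model argument matches mlr3's classif.rpart learner; the variable names are illustrative:

```r
library(mlr3)

set.seed(1)  # reproducibility, e.g., for random tie-breaking

# Drop the identifier CASE before defining the classification task
task <- as_task_classif(dat[, setdiff(names(dat), "CASE")],
                        target = "education")

# keep_model = TRUE stores the underlying rpart model for later plotting
learner <- lrn("classif.rpart", keep_model = TRUE)
learner$train(task)
```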
Visualize your result from task 3 as a binary decision tree.
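With the rpart model stored via keep_model = TRUE, one common option is the rpart.plot package (an assumption; any rpart-compatible plotting function works):

```r
# learner$model is the fitted rpart object when keep_model = TRUE
rpart.plot::rpart.plot(learner$model)
```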
Prune your tree from task 3 by means of 10-fold cross-validation.
That is, choose the complexity penalty parameter cp
(between 0 and 0.05 in steps of 0.01) to remove unnecessary terminal
nodes and reduce overfitting. (Hint: Set the seed to ensure
reproducibility of your results)
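A tuning sketch with mlr3tuning, assuming the task and learner setup from above; a grid of resolution 6 over [0, 0.05] yields the required steps of 0.01:

```r
library(mlr3tuning)

set.seed(1)

# Grid search over cp in {0, 0.01, ..., 0.05} with 10-fold CV
instance <- tune(
  tuner      = tnr("grid_search", resolution = 6),
  task       = task,
  learner    = lrn("classif.rpart", keep_model = TRUE,
                   cp = to_tune(0, 0.05)),
  resampling = rsmp("cv", folds = 10),
  measure    = msr("classif.ce")
)
instance$result  # best cp value and its cross-validated error
```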
Visualize the final result (i.e., best model) of your tuning from task 5 as a tree. Would your pruned tree be able to predict all available class labels? In other words, are there any educational levels for which no combination of features would result in the tree making a corresponding prediction?
Because of the instability of a single tree, build an ensemble of
trees using the random forest approach and default tuning parameter
settings. Make sure to set the learner’s importance
argument to “permutation”, which is needed for task 8. (Hint: Set the
seed to ensure reproducibility of your results)
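A sketch assuming mlr3's ranger-based learner (classif.ranger from mlr3learners), which exposes the importance argument mentioned above:

```r
library(mlr3learners)  # provides the ranger-based random forest learner

set.seed(1)

# importance = "permutation" is required to extract feature importances later
rf_learner <- lrn("classif.ranger", importance = "permutation")
rf_learner$train(task)  # task as defined earlier
```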
Plot the feature importance of all features used in your random forest from task 7.
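One minimal way to plot the stored permutation importances, assuming the fitted ranger learner from the previous sketch:

```r
# Sort permutation importances and draw a simple horizontal bar chart
imp <- sort(rf_learner$importance(), decreasing = FALSE)
barplot(imp, horiz = TRUE, las = 1,
        xlab = "Permutation importance")
```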
Build a random forest and tune the hyperparameters
num.trees from 500 to 1500 in steps of 500 and
mtry from 2 to 5 in steps of 1. Again make sure to set the
learner’s importance argument to “permutation”, which is
needed for task 10. (Hint: Set the seed to ensure reproducibility of
your results)
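A tuning sketch, again assuming mlr3tuning; per-parameter grid resolutions of 3 and 4 produce num.trees in {500, 1000, 1500} and mtry in {2, 3, 4, 5}, and the 10-fold CV resampling is an assumption carried over from task 5:

```r
set.seed(1)

rf_tuned <- tune(
  tuner      = tnr("grid_search",
                   param_resolutions = c(num.trees = 3, mtry = 4)),
  task       = task,
  learner    = lrn("classif.ranger", importance = "permutation",
                   num.trees = to_tune(500, 1500),
                   mtry      = to_tune(2, 5)),
  resampling = rsmp("cv", folds = 10),
  measure    = msr("classif.ce")
)
rf_tuned$result  # best num.trees/mtry combination and its CV error
```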
Bonus: Plot the feature importance of the CV-tuned random forest from task 9 and compare the ranking to the feature importance plot of the untuned random forest fit with default hyperparameter settings from task 7. Are there any substantial differences between the two plots? Which ranking is more reliable?