Support Vector Machines
Support Vector Machines (SVMs) can also be applied to multiclass
classification tasks through techniques such as one-vs-one or
one-vs-all. In the one-vs-one strategy, the SVM constructs one binary
classifier for every pair of classes, each trained to distinguish
between the two classes of that pair. In the one-vs-all strategy, the
SVM constructs one classifier per class, trained to distinguish that
class from all other classes.
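As a minimal illustration (not part of the tasks below): `e1071::svm()`, the libsvm wrapper that also underlies `mlr3`'s SVM learner, applies the one-vs-one scheme automatically when the target has more than two classes. Here is a sketch on the built-in `iris` data:

```r
library(e1071)

# iris has 3 classes, so libsvm fits choose(3, 2) = 3 binary
# classifiers internally and predicts by majority vote among them.
fit  <- svm(Species ~ ., data = iris)  # default settings (RBF kernel, cost = 1)
pred <- predict(fit, iris)
mean(pred == iris$Species)             # in-sample accuracy
```

No extra code is needed to handle the multiclass case; the voting happens inside `predict()`.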
Description of data set
We use a version of the `bfi` dataset from class to
predict level of education from the Big-5 personality traits. A subset
of observations was selected from the original dataset so that the
educational levels are balanced. The reason is that classifiers often
struggle with imbalanced classes (e.g., the majority of
`education` values being 3).
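The balancing described above can be sketched in base R with made-up data (the course file is already balanced, so this is illustration only): downsample every class to the size of the smallest class.

```r
# Toy data with a heavily overrepresented level 3.
set.seed(1)
toy <- data.frame(education = sample(1:5, 500, replace = TRUE,
                                     prob = c(.1, .1, .5, .2, .1)))

# Keep n_min randomly chosen rows per educational level.
n_min <- min(table(toy$education))
keep  <- unlist(lapply(split(seq_len(nrow(toy)), toy$education),
                       function(rows) sample(rows, n_min)))
balanced <- toy[keep, , drop = FALSE]
table(balanced$education)  # every level now appears exactly n_min times
```

Downsampling discards data; oversampling the minority classes is an alternative when observations are scarce.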
For simplicity, we treat `education` as a categorical
variable here, although it is actually an ordinal variable (i.e., 1 <
2 < 3 < 4 < 5).
Type `?psych::bfi` into your console for more information on the
dataset. Note that the Big-5 traits `agree`,
`conscientious`, `extra`, `neuro`, and
`open` were created by averaging each participant’s responses
to the five survey items per trait (e.g.,
`A1`-`A5`).
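For illustration only, averaging five items into one trait score can be done with `rowMeans()`. Note that in the real `bfi` data several items are reverse-keyed and would have to be recoded before averaging, which this toy sketch ignores:

```r
# Two hypothetical participants' answers to the agreeableness items
# (assumes all items are already keyed in the same direction).
items <- data.frame(A1 = c(4, 2), A2 = c(5, 3), A3 = c(4, 2),
                    A4 = c(5, 3), A5 = c(4, 2))
items$agree <- rowMeans(items[, paste0("A", 1:5)])
items$agree  # 4.4 and 2.4
```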
Tasks
1. Read the data file modeul2-bfi.csv into R (assign it to a variable
   called “dat”).
2. Transform the `education` variable to a factor and assign the data set
   “dat” to an `mlr3` classification task called “tsk”, with
   `education` as the target and `agree` and
   `conscientious` as features.
3. Randomly split the dataset into 80% training and 20% testing data.
   (Hint: Set a seed to ensure reproducibility of your results.)
4. Use the training sample to build an SVM (with default settings) that
   predicts the target `education` from `agree` and
   `conscientious` as features.
5. Visualize the classifier with agreeableness on the x-axis and
   conscientiousness on the y-axis.
6. Now use the training sample to build an SVM (with default settings)
   with `education` as the target and all Big-5 traits as
   features.
7. Predict the educational levels of the observations in the training
   sample as well as in the held-out test sample. Also calculate the
   in-sample training classification error and compare it to the
   out-of-sample testing classification error. Why is the former likely
   (much) smaller than the latter?
8. Assess the expected out-of-sample performance of your learner from
   task 6 using 10-fold cross-validation (CV). Does CV improve the
   estimate of your model’s out-of-sample classification performance?
   (Hint: Set a seed to ensure reproducibility of your results.)
9. Bonus: Using 10-fold cross-validation, choose a value for the tuning
   parameter \(C\) (`cost`) from the set (1, 10, 50, 100). (Hint: Set a
   seed to ensure reproducibility of your results.)
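Since this is a worksheet, the solutions are left to you; the following is one possible (not definitive) starting sketch for the `mlr3` workflow behind tasks 1–4 and 7–8. The seed value and object names are illustrative, and the CSV is assumed to sit in the working directory:

```r
library(mlr3)
library(mlr3learners)  # registers "classif.svm" (wraps e1071::svm)

dat <- read.csv("modeul2-bfi.csv")      # file name as given in task 1
dat$education <- factor(dat$education)  # task 2: target must be a factor

tsk <- as_task_classif(dat, target = "education", id = "tsk")
tsk$select(c("agree", "conscientious")) # tasks 2 and 4 use two features

set.seed(123)                           # any fixed seed works
split <- partition(tsk, ratio = 0.8)    # 80/20 train/test row ids (task 3)

lrn <- lrn("classif.svm")               # default settings (task 4)
lrn$train(tsk, row_ids = split$train)

# Task 7: in-sample vs. out-of-sample classification error.
pred_train <- lrn$predict(tsk, row_ids = split$train)
pred_test  <- lrn$predict(tsk, row_ids = split$test)
pred_train$score(msr("classif.ce"))     # trained on these rows: optimistic
pred_test$score(msr("classif.ce"))      # held-out rows: honest estimate

# Task 8: 10-fold CV estimate of the expected out-of-sample error.
set.seed(123)
rr <- resample(tsk, lrn, rsmp("cv", folds = 10))
rr$aggregate(msr("classif.ce"))
```

For the bonus task, `mlr3tuning` can grid-search the `cost` parameter over (1, 10, 50, 100) with the same CV resampling; see that package's documentation for the tuning interface.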