Support Vector Machines (SVMs) can also be applied to multiclass classification tasks using techniques such as one-vs-one or one-vs-all. The one-vs-one strategy trains a binary classifier for every pair of classes (k(k-1)/2 classifiers for k classes), each distinguishing between its two classes. The one-vs-all strategy trains a single classifier per class, distinguishing that class from all other classes.
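A minimal sketch of the bookkeeping behind the two strategies (base R only; the class count of five matches the education levels used below):

```r
# One-vs-one trains one binary SVM per pair of classes;
# one-vs-all trains one binary SVM per class.
k <- 5            # number of classes (e.g., five education levels)
combn(k, 2)       # the class pairs, one binary classifier each
choose(k, 2)      # 10 classifiers for one-vs-one
k                 # 5 classifiers for one-vs-all
```

Note that e1071 (the SVM backend used by mlr3 below) handles the multiclass decomposition internally, so none of this bookkeeping is needed in practice.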
We use a version of the bfi dataset from class to predict the level of education from the Big-5 personality traits. For the data, a subset of observations with balanced educational levels was chosen from the original dataset, because classifiers often struggle with imbalanced classes (e.g., the majority of education values being 3 in the original data).
For simplicity, we treat education as a categorical
variable here, although it is actually an ordinal variable (i.e., 1 <
2 < 3 < 4 < 5).
Type ?psych::bfi into your console for more information on the
dataset. Note that the Big-5 traits agree,
conscientious, extra, neuro, and
open were created by averaging each participant’s responses
to the five survey items per trait (e.g.,
A1-A5).
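A hedged sketch of how such a trait score could be computed from the raw items. The column names A1-A5 follow psych::bfi, but this is not necessarily the exact preprocessing used for the course dataset (e.g., reverse-keyed items are assumed to be recoded already):

```r
# Sketch: average the five agreeableness items into one trait score.
# Toy data frame standing in for the raw psych::bfi items.
bfi <- data.frame(A1 = c(2, 4), A2 = c(5, 3), A3 = c(4, 4),
                  A4 = c(3, 5), A5 = c(4, 2))
bfi$agree <- rowMeans(bfi[, paste0("A", 1:5)], na.rm = TRUE)
bfi$agree
```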
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
dat <- read.csv('module2-bfi.csv', header = TRUE)
Create an mlr3 classification task called “tsk” with
education as target and agree and
conscientious as features.
dat$education <- factor(dat$education)
library(mlr3verse)
## Loading required package: mlr3
tsk <- as_task_classif(education ~ agree + conscientious, data = dat)
set.seed(42)
row_ids <- partition(tsk, ratio = 0.8)
row_ids
## $train
## [1] 2 3 4 5 6 8 9 10 12 14 15 16 17 18 20 21 22 23 24
## [20] 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40 41 42 43 44
## [39] 45 47 48 49 50 51 52 54 55 56 57 58 61 63 65 66 67 68 69
## [58] 70 71 72 74 76 78 79 80 81 83 84 87 88 89 91 92 93 94 95
## [77] 96 97 99 100
##
## $test
## [1] 1 7 11 13 19 31 46 53 59 60 62 64 73 75 77 82 85 86 90 98
##
## $validation
## integer(0)
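For classification tasks, partition() stratifies the split by the target by default, so both subsets retain the (balanced) class frequencies. A self-contained base-R sketch of the same idea on toy data:

```r
# Sketch: with a stratified split, class frequencies carry over
# to both subsets (toy data, not the bfi sample).
set.seed(1)
y <- factor(rep(1:5, each = 20))               # 5 balanced classes
idx <- unlist(lapply(split(seq_along(y), y),
                     function(i) sample(i, 16)))  # 80% per class
prop.table(table(y[idx]))                      # each class exactly 20%
prop.table(table(y[-idx]))                     # same in the held-out 20%
```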
Train an SVM learner to predict education with agree and
conscientious as features.
mdl = lrn("classif.svm")
mdl$train(tsk, row_ids = row_ids$train)
summary(mdl$model)
##
## Call:
## svm.default(x = data, y = task$truth(), probability = (self$predict_type ==
## "prob"))
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 80
##
## ( 15 19 16 15 15 )
##
##
## Number of Classes: 5
##
## Levels:
## 1 2 3 4 5
autoplot(mdl, task = tsk)
Now create a task with education as target and the full set of Big-5
traits as features.
tsk <- as_task_classif(education ~ agree + conscientious + extra + neuro + open, data = dat)
mdl$train(tsk, row_ids = row_ids$train)
summary(mdl$model)
##
## Call:
## svm.default(x = data, y = task$truth(), probability = (self$predict_type ==
## "prob"))
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 80
##
## ( 15 19 16 15 15 )
##
##
## Number of Classes: 5
##
## Levels:
## 1 2 3 4 5
mes <- msrs("classif.ce")
# In-sample performance:
pred <- mdl$predict(tsk, row_ids = row_ids$train)
pred$confusion
## truth
## response 1 2 3 4 5
## 1 9 0 0 1 0
## 2 1 15 3 2 3
## 3 2 2 10 1 2
## 4 2 1 2 10 1
## 5 1 1 1 1 9
pred$score(mes)
## classif.ce
## 0.3375
# Out-of-sample performance:
pred <- mdl$predict(tsk, row_ids = row_ids$test)
pred$confusion
## truth
## response 1 2 3 4 5
## 1 1 0 0 0 0
## 2 3 0 1 1 0
## 3 0 0 1 3 4
## 4 1 0 1 1 0
## 5 0 1 1 0 1
pred$score(mes)
## classif.ce
## 0.8
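The classification error is simply the off-diagonal share of the confusion matrix. Recomputing it by hand from the test-set confusion matrix above (rows = response, columns = truth):

```r
# The test-set confusion matrix from above, entered manually.
cm <- matrix(c(1, 0, 0, 0, 0,
               3, 0, 1, 1, 0,
               0, 0, 1, 3, 4,
               1, 0, 1, 1, 0,
               0, 1, 1, 0, 1), nrow = 5, byrow = TRUE)
1 - sum(diag(cm)) / sum(cm)   # 0.8, matching classif.ce above
```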
The in-sample training classification error is typically (much) smaller than the out-of-sample testing classification error due to overfitting the training data. Cross-validation (CV) helps to address this issue by partitioning the data into multiple subsets, allowing the model to be trained and evaluated on different combinations of training and validation sets, providing a more robust estimate of its performance on unseen data.
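To make the mechanics of CV concrete, here is a minimal base-R sketch of a k-fold loop on toy data with a deliberately trivial "classifier" (predict the majority class of the training folds); mlr3's resample() does all of this for real learners:

```r
# Sketch of 10-fold CV by hand (toy data, majority-class "classifier").
set.seed(42)
n <- 100; k <- 10
x <- rnorm(n)
y <- factor(rbinom(n, 1, plogis(x)))
folds <- sample(rep(1:k, length.out = n))   # assign each row to a fold
err <- sapply(1:k, function(f) {
  train <- folds != f                       # train on k-1 folds ...
  pred <- names(which.max(table(y[train]))) # ... fit the "model" ...
  mean(y[!train] != pred)                   # ... evaluate on the held-out fold
})
mean(err)                                   # CV estimate of the error
```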
# 10-fold CV:
set.seed(42)
tsk_cv <- as_task_classif(education ~ agree + conscientious + extra + neuro + open, data = dat[row_ids$train,])
cv <- rsmp("cv", folds = 10)
mdl_cv <- resample(learner = mdl, task = tsk_cv, resampling = cv)
## INFO [15:48:23.739] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 1/10)
## INFO [15:48:23.874] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 2/10)
## INFO [15:48:23.950] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 3/10)
## INFO [15:48:23.994] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 4/10)
## INFO [15:48:24.030] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 5/10)
## INFO [15:48:24.052] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 6/10)
## INFO [15:48:24.076] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 7/10)
## INFO [15:48:24.097] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 8/10)
## INFO [15:48:24.122] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 9/10)
## INFO [15:48:24.147] [mlr3] Applying learner 'classif.svm' on task 'dat[row_ids$train, ]' (iter 10/10)
mdl_cv$aggregate(mes)
## classif.ce
## 0.7875
The classification error derived from cross-validation should be much closer to the actual out-of-sample classification error (e.g., as observed in task 7).
Use 10-fold cross-validation to tune the SVM cost parameter (cost)
from the set (1, 10, 50, 100) based on the full dataset.
Also investigate the final (best) SVM by printing the summary of the
model. (Hint: Set the seed to ensure reproducibility of your
results.)
set.seed(42)
# Define set of cost parameter values to be tested
C_cv <- c(1, 10, 50, 100)
# Set up the conditions for the hyperparameter tuning
mdl_cv = auto_tuner(
learner = lrn("classif.svm", type = 'C-classification', cost = to_tune(levels = C_cv)),
resampling = rsmp("cv", folds = 10),
measure = msr("classif.ce"),
tuner = tnr("grid_search"),
terminator = trm("none")
)
# Actually tune the hyperparameter (i.e., cost) and fit the final model
invisible({capture.output({ #remove console output from html document
mdl_cv$train(tsk)
})})
# Print the output of the tuning
mdl_cv$archive %>%
as.data.table() %>%
select(cost, classif.ce) %>%
arrange(as.numeric(cost))
mdl_cv$tuning_result
# Final model:
summary(mdl_cv$learner$model)
##
## Call:
## svm.default(x = data, y = task$truth(), type = "C-classification",
## cost = 50, probability = (self$predict_type == "prob"))
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 50
##
## Number of Support Vectors: 97
##
## ( 20 20 20 17 20 )
##
##
## Number of Classes: 5
##
## Levels:
## 1 2 3 4 5
Note that it is not possible (in mlr3; and quite complex
in general) to plot classifiers using more than two features. Therefore,
we cannot plot the classification surface of the final (best) SVM model
here.
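With exactly two features, however, a decision surface can be drawn by predicting on a grid. A hedged sketch using e1071::svm directly on toy data (this is roughly what autoplot() did above, not the mlr3 implementation itself):

```r
# Sketch: decision surface of a 2-feature SVM via grid prediction.
library(e1071)
set.seed(42)
d <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
d$y <- factor(ifelse(d$x1 + d$x2 + rnorm(60, sd = 0.5) > 0, "a", "b"))
fit <- svm(y ~ x1 + x2, data = d)
grid <- expand.grid(x1 = seq(-3, 3, length.out = 100),
                    x2 = seq(-3, 3, length.out = 100))
grid$pred <- predict(fit, grid)           # class label at every grid point
plot(grid$x1, grid$x2, col = grid$pred, pch = 15, cex = 0.3)
points(d$x1, d$x2, col = d$y, pch = 19)   # overlay the training points
```

With more than two features, one would have to fix all remaining features at some value to get a 2D slice, which is why no such plot is shown for the full Big-5 model.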