Clustering

Description of data set

In this tutorial, we will use the dataset from Wulff et al. (2023), which is part of the moursetrap package. The dataset, as prepared in the following code chunk, contains the mouse movement trajectories of participants in a two-options forced-choice paradigm. The trajectories are normalized using the mt_length_normalize() function from the moursetrap package so that all trajectories consist of 50 points (default is 20) in a 2D space.

library(mousetrap)
## Warning: package 'mousetrap' was built under R version 4.4.2
## Welcome to mousetrap 3.2.3!
## Summary of recent changes: http://pascalkieslich.github.io/mousetrap/news/
## Forum for questions: https://forum.cogsci.nl/index.php?p=/categories/mousetrap
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
dat <- data(KH2017)

# Preprocess trajectory data
dat <- KH2017 %>% mt_length_normalize(n_points = 50)
dat <- dat$ln_trajectories
dat[1:5,1:5,'xpos']; dat[1:5,1:5,'ypos'] #examles
##        [,1]      [,2]       [,3]       [,4]       [,5]
## id0001    0 -18.06069 -38.967198 -57.753756 -76.540313
## id0002    0 -15.60052 -32.227659 -48.262305 -64.296952
## id0003    0 -10.02617 -17.001541 -21.124366 -23.588654
## id0004    0 -20.06305 -36.929651 -52.368759 -65.752596
## id0005    0   0.00000   1.080633   3.535698   5.587012
##        [,1]      [,2]      [,3]      [,4]      [,5]
## id0001    0  7.695611 13.135988  23.98456  34.83314
## id0002    0 11.244849 24.194179  37.87079  51.54740
## id0003    0 16.565420 39.008474  62.18148  85.59222
## id0004    0 -2.531525  6.049725  20.71095  37.44075
## id0005    0 40.034589 80.048229 119.99954 159.97921
dat2 <- data.frame(cbind(dat[,,'xpos'], dat[,,'ypos']))

We can use the mt_heatmap() function from the moursetrap package to visualize the trajectories. The resulting plot contains 1064 mouse movement trajectories of participants. In this tutorial, we want to cluster these trajectories to make sense of this kind of data (i.e., shed light on the processes of information integration and preference formation; Wulff et al., 2023).

mt_heatmap(dat, colors = c('white', 'black'), verbose = FALSE)

Tasks

  1. Cluster the trajectories from dat2 (i.e., treating the x- and y-coordinates as features) into 5 clusters by means of agglomerative hierarchical clustering using the agnes algorithm via mlr3.
k = 5

library(mlr3verse)
## Loading required package: mlr3
tsk = as_task_clust(dat2)

set.seed(42)
mdl = lrn("clust.agnes", stand = T, k = k)
mdl$train(tsk)
  1. Predict the cluster of each individual trajectory using the model from task 1 and use the predictions to calculate a frequency table for the relative proportions of instances in each cluster.
pred <- mdl$predict(tsk)
## Warning: Learner 'clust.agnes' doesn't predict on new data and predictions may
## not make sense on new data.
round(table(pred$partition)/length(pred$partition) * 100, 2)
## 
##     1     2     3     4     5 
## 95.68  0.38  3.10  0.75  0.09
  1. Redo task 1 and cluster the trajectories into 5 clusters using agglomerative hierarchical clustering and Ward’s method (i.e., set the learner’s method argument to “ward”). Also predict the clusters using the new model. Did the relative frequencies improve (in terms of a more balanced clustering)?
# Clustering
set.seed(42)
mdl = lrn("clust.agnes", stand = T, k = k, method = "ward")
mdl$train(tsk)

# Predictions
pred <- mdl$predict(tsk)
## Warning: Learner 'clust.agnes' doesn't predict on new data and predictions may
## not make sense on new data.
round(table(pred$partition)/length(pred$partition) * 100, 2)
## 
##     1     2     3     4     5 
## 26.88 21.33 30.26 17.95  3.57

The resulting clustering is more balanced and more closely resembles the frequency proportions as reported in Wulff et al. (2023, Figure 8; see also the slides for module 3). However, to reassure that our clustering distinguishes between the same movement patterns, we also have to visualize the clusters (see next task).

  1. Plot the trajectories according to the clustering from task 4. For instance, use a for-loop and the mt_heatmap() function from above to produce a separate heatmap for each cluster of movement trajectories.
par(mfrow = c(1,5))
for (j in 1:k) {
  cat(mt_heatmap(dat[pred$partition == j,,], colors = c('white', 'black'), verbose = FALSE))
}

It appears that our clustering distinguishes similar choice strategies as found in Wulff et al. (2023, Figure 8). However, not all five prototypes (straight, curved, cCoM = continuous change of mind, dCoM = discrete change of mind, and dCoM2 = discrete double change of mind) are unambiguously recovered.

  1. Redo task 1 and cluster the trajectories into 5 clusters using partitional (i.e., \(k\)-means) clustering. Then, redo task 2 and predict the clusters using the new model.
# Clustering
set.seed(42)
mdl = lrn("clust.kmeans", centers = k)
mdl$train(tsk)

# Predictions
pred <- mdl$predict(tsk)
round(table(pred$partition)/length(pred$partition) * 100, 2)
## 
##     1     2     3     4     5 
##  4.04 39.38 12.88 10.81 32.89
  1. Compare the results from task 5 to the results with the hierarchical clustering by redoing task 4, that is, visualize the trajectories of the partitional clustering. Do you spot any major (performance) differences?
par(mfrow = c(1,5))
for (j in 1:k) {
  cat(mt_heatmap(dat[pred$partition == j,,], colors = c('white', 'black'), verbose = FALSE))
}

Partitional clustering seems to perform slightly better in discriminating between all five prototype strategies as originally reported in Wulff et al. (2023; i.e., straight, curved, cCoM = continuous change of mind, dCoM = discrete change of mind, and dCoM2 = discrete double change of mind). This can also be confirmed by plotting the prototypes of each cluster (see next task).

  1. Bonus: Also plot the prototypes of your \(k\)-means clustering solution.
df <- array(NA, c(2, 50, 2), dimnames = list(NULL, NULL, c('xpos', 'ypos'))) #auxiliary data frame to submit the prototypes in the format needed for plotting with mt_heatmap()

par(mfrow = c(1,5))
for (j in 1:k) {
  df[1,,'xpos'] <- mdl$model$centers[j,paste0('X',1:50)]
  df[1,,'ypos'] <- mdl$model$centers[j,paste0('X',51:100)]
  
  mt_heatmap(df, colors = c('white', 'blue'), verbose = FALSE)
}

The prototypes are indeed very similar to the original prototypes reported in Wulff et al. (2023).

  1. Bonus: Prior to performing the clustering, do a principal component analysis (PCA). Then, redo task 6 (i.e., compare agglomerative hierarchicalc lustering using Ward’s method and \(k\)-means partitional clustering) with both clusterings specified using only the 5 principal components (PCs) as features that explain the highest amount of variance in the data. (Hint: You can select a subset of PCs that should be used for the modeling by adding a filter pipeline operation after the pca pipeline operation. Filter for “variance” using the flt() function and set a corresponding fraction for clustering according to the PCs that explain the highest amount of variance in the data)

Agnes clustering:

# Clustering
set.seed(42)
mdl <- as_learner(po("pca") %>>%
                    po("filter", filter = flt("variance"), filter.frac = 0.05) %>>%
                    lrn('clust.agnes', stand = T, k = k, method = "ward"))
mdl$train(tsk)

# Which PCs were actually used as features?
mdl$model$variance$features
## [1] "PC1" "PC2" "PC3" "PC4" "PC5"
# Predictions
pred <- mdl$predict(tsk)
## Warning: Learner 'clust.agnes' doesn't predict on new data and predictions may not make sense on new data.
## This happened PipeOp clust.agnes's $predict()
round(table(pred$partition)/length(pred$partition) * 100, 2)
## 
##     1     2     3     4     5 
## 54.51 24.44 15.04  5.26  0.75
# Plotting
par(mfrow = c(1,5))
for (j in 1:k) {
  cat(mt_heatmap(dat[pred$partition == j,,], colors = c('white', 'black'), verbose = FALSE))
}

\(k\)-means clustering:

# Clustering
set.seed(42)
mdl <- as_learner(po("pca") %>>% 
                    po("filter", filter = flt("variance"), filter.frac = 0.05) %>>%
                    lrn('clust.kmeans', centers = k))
mdl$train(tsk)
 
# Which PCs were actually used as features?
mdl$model$variance$features
## [1] "PC1" "PC2" "PC3" "PC4" "PC5"
# Predictions
pred <- mdl$predict(tsk)
round(table(pred$partition)/length(pred$partition) * 100, 2)
## 
##     1     2     3     4     5 
## 39.29 32.99 10.81 12.88  4.04
# Plotting
par(mfrow = c(1,5))
for (j in 1:k) {
  cat(mt_heatmap(dat[pred$partition == j,,], colors = c('white', 'black'), verbose = FALSE))
}

Hierarchical clustering performs worse with PCA compared to using the information contained in the original/full dataset. In contrast, for partitional \(k\)-means clustering, it seems that all the necessary information to discriminate between the five original prototype strategies is contained in the first five PCs. In other words, the partitional clustering works equally well with reduced information.