In this tutorial, we will use the dataset from Wulff et al. (2023),
which is part of the mousetrap package. The dataset, as
prepared in the following code chunk, contains the mouse movement
trajectories of participants in a two-option forced-choice paradigm.
The trajectories are length-normalized using the
mt_length_normalize() function from the
mousetrap package so that every trajectory consists of 50
points (the default is 20) in 2D space.
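To make the idea of length normalization concrete, here is a minimal base-R sketch. It assumes that length normalization means resampling each trajectory at points spaced equally along its travelled path. The helper length_normalize_sketch() is hypothetical and is not mousetrap's implementation of mt_length_normalize(), so check ?mt_length_normalize for the exact procedure.

```r
# Hypothetical helper illustrating length normalization: resample a
# trajectory at n_points positions equally spaced along its path length.
# Simplified sketch, not mousetrap's implementation; assumes the path has
# non-zero length (at least two distinct positions).
length_normalize_sketch <- function(x, y, n_points) {
  step_len <- sqrt(diff(x)^2 + diff(y)^2)  # distance between samples
  arc <- c(0, cumsum(step_len))            # cumulative path length
  # Drop samples where the cursor did not move, so arc is strictly increasing
  keep <- c(TRUE, step_len > 0)
  x <- x[keep]; y <- y[keep]; arc <- arc[keep]
  # Target positions: n_points equally spaced from start to end of the path
  target <- seq(0, arc[length(arc)], length.out = n_points)
  # Linear interpolation of each coordinate as a function of path length
  # (rule = 2 clamps to the end values if rounding pushes a target out of range)
  list(x = approx(arc, x, xout = target, rule = 2)$y,
       y = approx(arc, y, xout = target, rule = 2)$y)
}

# Toy L-shaped path: 10 units along y, then 10 units along x (length 20)
res <- length_normalize_sketch(x = c(0, 0, 10), y = c(0, 10, 10), n_points = 5)
res$x  # 0 0 0 5 10
res$y  # 0 5 10 10 10
```

Resampling by path length rather than by time means that pauses in the movement do not pile up many identical points, which is why the stationary samples are dropped before interpolating.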
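library(mousetrap)
library(tidyverse)
# Load the KH2017 example data shipped with mousetrap
data(KH2017)
# Preprocess trajectory data: resample each trajectory to 50 points
dat <- KH2017 %>% mt_length_normalize(n_points = 50)
# Keep only the array of length-normalized trajectories
# (dimensions: trials x points x variables such as xpos and ypos)
dat <- dat$ln_trajectories
dat[1:5, 1:5, 'xpos']; dat[1:5, 1:5, 'ypos'] # examples: first 5 points of 5 trajectories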
[,1] [,2] [,3] [,4] [,5]
id0001 0 -18.06069 -38.967198 -57.753756 -76.540313
id0002 0 -15.60052 -32.227659 -48.262305 -64.296952
id0003 0 -10.02617 -17.001541 -21.124366 -23.588654
id0004 0 -20.06305 -36.929651 -52.368759 -65.752596
id0005 0 0.00000 1.080633 3.535698 5.587012
[,1] [,2] [,3] [,4] [,5]
id0001 0 7.695611 13.135988 23.98456 34.83314
id0002 0 11.244849 24.194179 37.87079 51.54740
id0003 0 16.565420 39.008474 62.18148 85.59222
id0004 0 -2.531525 6.049725 20.71095 37.44075
id0005 0 40.034589 80.048229 119.99954 159.97921
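# Wide feature matrix: one row per trajectory, its 50 x-coordinates
# followed by its 50 y-coordinates (100 columns)
dat2 <- data.frame(cbind(dat[,,'xpos'], dat[,,'ypos']))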
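The layout of dat2 can be illustrated on a small array with the same structure as dat (trials x points x coordinates). The array toy and its values are made up purely for illustration; the point is that each trajectory becomes one row whose first columns hold its x-coordinates and whose remaining columns hold its y-coordinates.

```r
# Made-up array with the same layout as dat: trials x points x coordinates
n_trials <- 3
n_points <- 4
toy <- array(seq_len(n_trials * n_points * 2),
             dim = c(n_trials, n_points, 2),
             dimnames = list(paste0("id", 1:n_trials), NULL, c("xpos", "ypos")))

# Same construction as dat2: x-coordinates first, then y-coordinates
toy2 <- data.frame(cbind(toy[, , "xpos"], toy[, , "ypos"]))

dim(toy2)           # 3 8: one row per trajectory, 2 * n_points columns
unlist(toy2[1, ])   # x-values of trajectory id1, then its y-values
```

For the tutorial data, the same construction gives one row per trajectory and 2 x 50 = 100 feature columns.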
We can use the mt_heatmap() function from the
mousetrap package to visualize the trajectories. The
resulting plot shows all 1064 mouse movement trajectories of the
participants. In this tutorial, we cluster these trajectories to
make sense of this kind of data (i.e., to shed light on the processes of
information integration and preference formation; Wulff et al.,
2023).
mt_heatmap(dat, colors = c('white', 'black'), verbose = FALSE)
Cluster dat2 (i.e., treating the
x- and y-coordinates as features) into 5 clusters by means of
agglomerative hierarchical clustering using the agnes
algorithm.

Use a for-loop and the mt_heatmap()
function from above to produce a separate heatmap for each cluster of
movement trajectories.

(… add a filter pipeline operation after the
pca pipeline operation. You can filter for “variance” using
the flt() function and set a corresponding fraction so that only
the PCs that explain the highest amount of variance in the data
are used for the clustering.)

Agnes clustering:
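One possible way to carry out the agnes clustering and the per-cluster heatmaps is sketched below, using the cluster package directly rather than a pipeline; it is not necessarily the tutorial's own solution. Assumptions: feat and traj are random stand-ins for dat2 and dat so that the code runs on its own, and split_by_cluster() is a hypothetical helper. The agnes() tree is converted with as.hclust() and cut into five clusters with cutree(); the heatmap loop is left as a comment because it needs mousetrap and the real trajectory array.

```r
library(cluster)  # recommended package that provides agnes()

# Stand-in data so the sketch runs on its own (assumption: random numbers,
# not the tutorial data); in the tutorial, use dat2 and dat instead
set.seed(1)
n <- 60
feat <- matrix(rnorm(n * 10), nrow = n)
feat[1:30, ] <- feat[1:30, ] + 5                # shift half of the rows
traj <- array(rnorm(n * 5 * 2), dim = c(n, 5, 2),
              dimnames = list(NULL, NULL, c("xpos", "ypos")))

# Agglomerative hierarchical clustering (average linkage is agnes' default)
ag <- agnes(feat, metric = "euclidean", method = "average")

# Cut the resulting dendrogram into 5 clusters
cl <- cutree(as.hclust(ag), k = 5)
table(cl)

# Hypothetical helper: one sub-array of trajectories per cluster label
split_by_cluster <- function(traj, cl) {
  lapply(split(seq_along(cl), cl), function(idx) traj[idx, , , drop = FALSE])
}
parts <- split_by_cluster(traj, cl)

# With mousetrap loaded and the real array, one heatmap per cluster could
# then be drawn in a for-loop, e.g.:
# for (k in names(parts)) {
#   mt_heatmap(parts[[k]], colors = c("white", "black"), verbose = FALSE)
# }
```

Subsetting with drop = FALSE keeps the three-dimensional array structure even for a cluster that contains a single trajectory.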
\(k\)-means clustering:
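For the variant with PCA and a variance filter, a base-R analogue is sketched below. It is an assumption that prcomp() followed by selecting high-variance components mirrors what the pca and filter pipeline operations do; frac = 0.2 is a hypothetical value and feat is again random stand-in data rather than dat2. Because prcomp() already returns the components sorted by decreasing variance, keeping the fraction of PCs with the highest variance amounts to keeping the first few columns of the score matrix.

```r
# Stand-in feature matrix (assumption: random numbers; use dat2 in practice)
set.seed(2)
feat <- matrix(rnorm(80 * 20), nrow = 80)
feat[1:40, 1:5] <- feat[1:40, 1:5] + 4   # give half of the rows a shifted mean

# PCA: centre the features; no scaling, since in dat2 all columns are pixels
pca <- prcomp(feat, center = TRUE, scale. = FALSE)

# Keep the fraction `frac` of PCs with the highest variance (hypothetical value)
frac <- 0.2
pc_var <- pca$sdev^2
n_keep <- ceiling(frac * length(pc_var))
keep <- order(pc_var, decreasing = TRUE)[seq_len(n_keep)]
scores <- pca$x[, keep, drop = FALSE]

# k-means with 5 clusters on the retained PC scores;
# several random starts reduce the risk of a poor local optimum
km <- kmeans(scores, centers = 5, nstart = 25)
table(km$cluster)
```

In an mlr3pipelines setup, the filtering step would presumably be written as po("filter", filter = flt("variance"), filter.frac = ...) after po("pca"), but the exact pipeline used in the exercise is not shown here.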