This script shows how to use principal component analysis to extract a single component from multiple codings of the same item. Our running example will be annotations of the emotions generated by a set of images.

First, we read the data, which contains 20 annotations across 12 emotions for a total of 24 images.

codings <- read.csv("../data/image-codings.csv", stringsAsFactors = FALSE)
head(codings, n=6)
##   image afraid angry delighted disgusted happy joyful nervous prideful sad
## 1    a1      1     1         1         1     1      1       1        1   1
## 2    a1      4     3         2         1     2      2       4        3   4
## 3    a1      1     3         1         2     1      1       1        1   3
## 4    a1      3     3         2         3     2      3       4        3   2
## 5    a1      1     1         1         1     2      1       3        1   2
## 6    a1      1     1         1         1     1      1       1        1   1
##   scared surprised threatened
## 1      1         1          1
## 2      4         4          3
## 3      1         1          1
## 4      3         3          3
## 5      1         3          1
## 6      1         4          1

To simplify the analysis, we will now collapse all annotations for each image into a single average per emotion:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
agg <- codings %>% group_by(image) %>%
  summarise_all(mean)
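
To make the collapsing step concrete, here is a minimal sketch of the same idea in base R. The `toy` data frame and its values are made up for illustration and are not part of the image-codings data; `aggregate()` is used here only as one possible base-R counterpart to the dplyr pipeline.

```r
# Toy data (hypothetical): two images, three annotations each, two emotions
toy <- data.frame(
  image = c("a1", "a1", "a1", "a2", "a2", "a2"),
  happy = c(1, 2, 3, 4, 4, 4),
  sad   = c(3, 3, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Average every emotion column within each image (base R, no packages)
toy_agg <- aggregate(. ~ image, data = toy, FUN = mean)
toy_agg
# a1: happy = 2, sad = 3; a2: happy = 4, sad = 2
```

Each image ends up as a single row, which is the shape PCA expects: one observation per image and one column per emotion.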

This is how you can visualize the data using a heatmap:

mat <- agg %>% 
  select(-image) %>%  # keep only the numeric variables
  data.matrix() # convert to matrix
heatmap(t(mat)) # transpose to help visualization

Now let’s run PCA:

pca <- princomp(mat)
pca
## Call:
## princomp(x = mat)
## 
## Standard deviations:
##     Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7 
## 2.49326851 1.16157294 0.62108208 0.25598054 0.16138833 0.14581017 0.10794020 
##     Comp.8     Comp.9    Comp.10    Comp.11    Comp.12 
## 0.10451300 0.09274150 0.08568929 0.05998475 0.05296294 
## 
##  12  variables and  24 observations.
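
It may help to see where these standard deviations come from. The sketch below uses simulated data (the `toy` matrix is an assumption, not the image data) to check that the squared standard deviations reported by `princomp()` are the eigenvalues of the covariance matrix. Note that `princomp()` divides by n, whereas `cov()` divides by n - 1, so the covariance matrix is rescaled before comparing.

```r
set.seed(1)                                # toy data, for reproducibility only
toy <- matrix(rnorm(24 * 4), nrow = 24)    # 24 rows, 4 made-up variables
n <- nrow(toy)

toy_pca <- princomp(toy)

# princomp() uses divisor n; cov() uses n - 1, so rescale before comparing
cov_n <- cov(toy) * (n - 1) / n
eig <- eigen(cov_n, symmetric = TRUE)

# Squared standard deviations of the components vs. eigenvalues
all.equal(unname(toy_pca$sdev^2), eig$values)
```

Because the default is the covariance (not correlation) matrix, variables with larger variance weigh more heavily in the components.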

Exploring the loadings for the first two components:

# loadings for the first two components
round(pca$loadings[,1:2],2)
##            Comp.1 Comp.2
## afraid       0.32   0.22
## angry        0.28   0.18
## delighted   -0.30   0.45
## disgusted    0.26   0.18
## happy       -0.34   0.48
## joyful      -0.32   0.46
## nervous      0.33   0.21
## prideful    -0.10   0.07
## sad          0.27   0.01
## scared       0.35   0.24
## surprised    0.07   0.16
## threatened   0.36   0.33
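
Since the goal is to reduce many codings to a single component, the following sketch shows one way to pull out the first-component scores from a `princomp` fit. It runs on simulated data: the `latent` variable and the three toy columns are assumptions made for illustration, not the image data.

```r
set.seed(42)                              # toy data, for reproducibility only
latent <- rnorm(24)                       # one hidden "valence" score per image
toy <- cbind(
  happy  =  latent + rnorm(24, sd = 0.3),
  joyful =  latent + rnorm(24, sd = 0.3),
  sad    = -latent + rnorm(24, sd = 0.3)
)

toy_pca <- princomp(toy)
pc1 <- toy_pca$scores[, 1]                # one summary value per row (image)

# The scores are the centered data projected onto the loadings
manual <- scale(toy, center = toy_pca$center, scale = FALSE) %*%
  unclass(toy_pca$loadings)
all.equal(unname(manual[, 1]), unname(pc1))

# The sign of a component is arbitrary, so compare in absolute value
abs(cor(pc1, latent))
```

Because the sign is arbitrary, check the loadings before interpreting high scores as "more positive" or "more negative".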

We can use a scree plot to explore how many dimensions to keep:

screeplot(pca, 
          main="Screeplot: Relative importance of\neach different PC",
          las=2)
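
As a numeric complement to the scree plot, the share of variance per component can be computed from the standard deviations. The sketch below copies the values from the `princomp()` output printed earlier so that it runs on its own; on the fitted object, `summary(pca)` should report the same proportions.

```r
# Standard deviations copied from the princomp() output printed above
sdev <- c(2.49326851, 1.16157294, 0.62108208, 0.25598054, 0.16138833,
          0.14581017, 0.10794020, 0.10451300, 0.09274150, 0.08568929,
          0.05998475, 0.05296294)

prop_var <- sdev^2 / sum(sdev^2)   # share of total variance per component
cum_var  <- cumsum(prop_var)       # cumulative share

round(data.frame(component = seq_along(sdev), prop_var, cum_var), 3)
```

With these values, the first component accounts for roughly three quarters of the total variance, and the first two together for more than 90%.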