This script shows how to use principal component analysis to extract a single component from multiple codings of the same item. Our running example will be annotations of the emotions generated by a set of images.
First, we read the data, which contains 20 annotations across a series of 12 emotions for a total of 24 images.
codings <- read.csv("../data/image-codings.csv", stringsAsFactors = FALSE)
head(codings, n=6)
## image afraid angry delighted disgusted happy joyful nervous prideful sad
## 1 a1 1 1 1 1 1 1 1 1 1
## 2 a1 4 3 2 1 2 2 4 3 4
## 3 a1 1 3 1 2 1 1 1 1 3
## 4 a1 3 3 2 3 2 3 4 3 2
## 5 a1 1 1 1 1 2 1 3 1 2
## 6 a1 1 1 1 1 1 1 1 1 1
## scared surprised threatened
## 1 1 1 1
## 2 4 4 3
## 3 1 1 1
## 4 3 3 3
## 5 1 3 1
## 6 1 4 1
To simplify the analysis, we now collapse all annotations for each image into a single average per emotion:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
agg <- codings %>%
  group_by(image) %>%
  summarise(across(everything(), mean)) # one row per image, mean of each emotion
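As a cross-check on the collapse step, the same per-image averaging can be sketched in base R with `aggregate()`. The `toy` data frame below is made up for illustration and is not part of the original data; it only mirrors the layout of `codings` (one `image` column plus numeric emotion ratings).

```r
# Hypothetical toy data mirroring the layout of `codings`:
# two annotations for each of two images, rated on two emotions.
toy <- data.frame(
  image  = c("a1", "a1", "a2", "a2"),
  afraid = c(1, 3, 2, 4),
  happy  = c(4, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Average every rating column within each image (one row per image).
toy_agg <- aggregate(. ~ image, data = toy, FUN = mean)
toy_agg
```

Each image ends up with a single row, which is the shape PCA expects here: one observation per image, one column per emotion.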
This is how you can visualize the data using a heatmap:
mat <- agg %>%
  select(-image) %>% # keep only the numeric variables
  data.matrix()      # convert to matrix
heatmap(t(mat))      # transpose to help visualization
Now let’s run PCA:
pca <- princomp(mat)
pca
## Call:
## princomp(x = mat)
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## 2.49326851 1.16157294 0.62108208 0.25598054 0.16138833 0.14581017 0.10794020
## Comp.8 Comp.9 Comp.10 Comp.11 Comp.12
## 0.10451300 0.09274150 0.08568929 0.05998475 0.05296294
##
## 12 variables and 24 observations.
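By default `princomp()` works on the covariance matrix of the columns (computed with divisor n rather than n - 1), so the variables are centered but not standardized. A minimal sketch on simulated data, where `toy` and its column names are assumptions made purely for illustration, checks that the reported standard deviations are the square roots of that matrix's eigenvalues:

```r
# Simulated stand-in for `mat`: 24 "images" rated on 4 "emotions".
set.seed(1)
toy <- matrix(rnorm(24 * 4), nrow = 24, ncol = 4,
              dimnames = list(NULL, c("afraid", "angry", "happy", "joyful")))

toy_pca <- princomp(toy)

# Covariance matrix rescaled to use divisor n, as princomp() does.
n   <- nrow(toy)
eig <- eigen(cov(toy) * (n - 1) / n)

# The component standard deviations match the square-rooted eigenvalues.
all.equal(unname(toy_pca$sdev), sqrt(eig$values))
```

If the variables were measured on different scales, `princomp(mat, cor = TRUE)` would use the correlation matrix instead. Here the ratings appear to share one scale, so the covariance default seems reasonable.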
Exploring the loadings for the first two components:
# loadings for the first two components
round(pca$loadings[, 1:2], 2)
## Comp.1 Comp.2
## afraid 0.32 0.22
## angry 0.28 0.18
## delighted -0.30 0.45
## disgusted 0.26 0.18
## happy -0.34 0.48
## joyful -0.32 0.46
## nervous 0.33 0.21
## prideful -0.10 0.07
## sad 0.27 0.01
## scared 0.35 0.24
## surprised 0.07 0.16
## threatened 0.36 0.33
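In the output above, the negative emotions (afraid, scared, threatened and so on) load positively on the first component while happy, joyful and delighted load negatively, so it can be read as an overall valence dimension. Keep in mind that the sign of a principal component is arbitrary. To extract a single score per image, one would typically take the first column of `pca$scores`. The sketch below illustrates this on simulated data (`toy` is a hypothetical stand-in for `mat`) and verifies that the scores are the centered data projected onto the first loading vector:

```r
# Simulated stand-in for `mat`: 24 "images" rated on 4 "emotions".
set.seed(42)
toy <- matrix(rnorm(24 * 4), nrow = 24, ncol = 4,
              dimnames = list(NULL, c("afraid", "scared", "happy", "joyful")))

toy_pca <- princomp(toy)

# One score per row (image) on the first principal component.
first_pc <- toy_pca$scores[, 1]

# Same thing by hand: subtract the column means, then project
# onto the first column of the loadings matrix.
centered <- sweep(toy, 2, toy_pca$center)
manual   <- as.numeric(centered %*% toy_pca$loadings[, 1])

all.equal(unname(first_pc), manual)
```

With the real data, `pca$scores[, 1]` would give the corresponding single summary value for each of the 24 images.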
Using a scree plot to explore how many dimensions to keep:
screeplot(pca,
          main = "Screeplot: Relative importance of\neach different PC",
          las = 2)
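The bar heights in a scree plot are the component variances, so the same information can be expressed as the share of total variance each component explains. The sketch below recomputes those shares from the standard deviations printed by `pca` earlier; the vector is typed in by hand so that the block runs on its own, and `summary(pca)` reports the same proportions directly.

```r
# Standard deviations copied from the printed princomp output above.
sdev <- c(2.49326851, 1.16157294, 0.62108208, 0.25598054,
          0.16138833, 0.14581017, 0.10794020, 0.10451300,
          0.09274150, 0.08568929, 0.05998475, 0.05296294)

# Variance of each component as a proportion of the total variance.
prop_var <- sdev^2 / sum(sdev^2)

# Running total: variance explained by the first k components.
cum_var <- cumsum(prop_var)

round(cbind(prop_var, cum_var)[1:3, ], 3)
```

On these numbers the first component alone accounts for roughly 77% of the variance and the first two for about 93%, which is consistent with keeping one or two dimensions.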