Goal: See practical examples of using tidyverse packages and user-defined functions to analyze and visualize a psycholinguistic experiment.

Prerequisites: You should be familiar with the tidyverse package dplyr and psycholinguistic experiment analysis.

1 Introduction

This document uses tidyverse packages to analyze and visualize data from a real psycholinguistic experiment, the eye tracking while reading experiment RC. This guide also introduces user-defined functions, and provides examples of defining functions to facilitate data analysis and visualization.

2 `RC`

RC is an eye tracking while reading experiment conducted at the UCLA Language Processing Lab by Angelica Pan and supervised by Jesse Harris. Data was collected by an SR Eyelink 1000 Tower Mount and preprocessed with EyeDry.

2.1 Eye tracking while reading

In an eye tracking while reading experiment, subjects read text that is displayed on a computer screen. As they read, a high-speed camera records their eye movements. Eye tracking while reading is a popular method for studying sentence processing because certain eye movement patterns, such as long fixations on difficult sentence structures, have been linked to cognitive effort.

To learn more about eye tracking, read Eye Movements in Reading Words and Sentences (2007) by Charles Clifton Jr., Adrian Staub, and Keith Rayner.

2.2 Experimental Design

RC has a 2x2 Latin square design, crossing relative clause matrix position (pos) with relative clause extraction type (ext). The experiment tested the effects of pos and ext on relative clause processing difficulty.

Factors and levels:

pos: matrix subject (SUB) or matrix object (OBJ)
ext: subject-extracted (SRC) or object-extracted (ORC)

Conditions:

SUB-SRC: subject-extracted relative clause in matrix subject position
SUB-ORC: object-extracted relative clause in matrix subject position
OBJ-SRC: subject-extracted relative clause in matrix object position
OBJ-ORC: object-extracted relative clause in matrix object position
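
The four condition labels are every pairing of the two factor levels. As a minimal base R sketch (the names `design` and `cond_label` are hypothetical and not part of the original analysis), the crossing can be enumerated like this:

```r
# Hypothetical illustration: enumerate the 2x2 design.
# expand.grid() varies its first argument fastest, so `ext` is listed first.
design <- expand.grid(ext = c("SRC", "ORC"),
                      pos = c("SUB", "OBJ"),
                      stringsAsFactors = FALSE)

# Build the condition label from matrix position and extraction type
design$cond_label <- paste(design$pos, design$ext, sep = "-")
design$cond_label
# "SUB-SRC" "SUB-ORC" "OBJ-SRC" "OBJ-ORC"
```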

2.3 Experimental items

Experimental items are sentence quartets:

SUB-SRC: Unsurprisingly, the diplomat who attacked the senator earlier this afternoon avoided the reporter at the charity event.
SUB-ORC: Unsurprisingly, the diplomat who the senator attacked earlier this afternoon avoided the reporter at the charity event.
OBJ-SRC: Unsurprisingly, the diplomat avoided the reporter who attacked the senator earlier this afternoon at the charity event.
OBJ-ORC: Unsurprisingly, the diplomat avoided the reporter who the senator attacked earlier this afternoon at the charity event.

2.4 Regioning

During the experiment, subjects read experimental items as a single line of unbroken text displayed on a computer monitor.

Items are split into 7 regions for analysis:

intro: introductory region
msub: matrix subject region
RC: relative clause region (also called “region of interest” or “critical region”)
spillover: spillover region
mverb: matrix verb region
mobj: matrix object region
final: final region

The linear order of the regions differs across the SUB and OBJ levels.

Region order for SUB items:

intro¹ | msub² | RC³ | spillover⁴ | mverb⁵ | mobj⁶ | final⁷

Region order for OBJ items:

intro¹ | msub² | mverb³ | mobj⁴ | RC⁵ | spillover⁶ | final⁷
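
The two region orders can be written down as a small lookup, which makes the differing position of the RC region explicit. This is only an illustrative sketch; `region_order` and `rc_position` are hypothetical names.

```r
# Hypothetical lookup of region names by linear position, one vector per `pos` level
region_order <- list(
  SUB = c("intro", "msub", "RC", "spillover", "mverb", "mobj", "final"),
  OBJ = c("intro", "msub", "mverb", "mobj", "RC", "spillover", "final")
)

# Linear position of the critical RC region in each order
rc_position <- sapply(region_order, function(regions) which(regions == "RC"))
rc_position
# SUB = 3, OBJ = 5
```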

2.5 Preprocessing

EyeDry is a program that preprocesses eye tracking data. It extracts eye movement metrics from raw eye tracking data into <metric>.ixs files, which contain tabular data that can be read into an R data frame.

Eye tracking metrics mentioned in this guide:

first pass times (fp): the sum of all fixations in a region until the first fixation to the region’s left or right
right-bounded times (rb): the sum of all fixations in a region until the first fixation to the region’s right
second pass times (sp): the sum of all fixations in a region after the initial first pass fixations
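
To make the three definitions concrete, here is a simplified base R sketch that applies them to one made-up sequence of fixations. It is a literal reading of the definitions above, not EyeDry's implementation, and it ignores details such as skipped regions. The data frame `fix` and the function `region_times()` are hypothetical.

```r
# Hypothetical fixation sequence for one trial: region fixated and duration (ms)
fix <- data.frame(region = c(1, 2, 3, 3, 2, 3, 4),
                  dur    = c(200, 180, 250, 150, 120, 300, 220))

region_times <- function(fix, r) {
  idx  <- seq_len(nrow(fix))
  in_r <- fix$region == r
  if (!any(in_r)) return(c(fp = NA, rb = NA, sp = NA))

  first_entry <- which(in_r)[1]
  # First fixation outside the region (left or right) after entering it
  first_exit  <- which(idx > first_entry & !in_r)[1]
  # First fixation to the right of the region after entering it
  first_right <- which(idx > first_entry & fix$region > r)[1]
  # If the region is never left, treat the end of the trial as the exit
  if (is.na(first_exit))  first_exit  <- nrow(fix) + 1
  if (is.na(first_right)) first_right <- nrow(fix) + 1

  c(fp = sum(fix$dur[in_r & idx < first_exit]),
    rb = sum(fix$dur[in_r & idx < first_right]),
    sp = sum(fix$dur[in_r & idx >= first_exit]))
}

region_times(fix, 3)
# fp = 400, rb = 700, sp = 300
```

In this toy trial the reader fixates region 3 twice (250 + 150), regresses to region 2, returns to region 3 (300), and then moves right, so the regression separates first pass time from right-bounded time.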

EyeDry also detects if and where a subject blinked while reading a sentence during an experimental trial. Trials with too many blinks on a specified critical region are excluded because no eye movement information can be recorded when a subject blinks.

The critical region in EyeDry is specified by linear order. However, in RC the critical RC region is linearly the 3rd region for SUB items but the 5th region for OBJ items.

SUB items: Unsurprisingly, | the diplomat | who attacked the senator³ | avoided | the reporter | …
OBJ items: Unsurprisingly, | the diplomat | avoided | the reporter | who attacked the senator⁵ | …

To solve this issue, the RC data was split into two files, one with data from the SUB item trials and one with data from the OBJ item trials. EyeDry was run separately on each file.

In the RC experiment, the data for each eye movement metric is therefore contained in two files:

<metric>-rcs.ixs: eye movement metric data from SUB item trials
<metric>-rco.ixs: eye movement metric data from OBJ item trials.

During analysis, the corresponding <metric>-rcs.ixs and <metric>-rco.ixs files for each metric must be joined for a complete set of values.

2.6 First pass times

A first pass time is the sum of all fixations made in a given region from the first time the point of fixation enters the region until the first time the point of fixation leaves the region. To learn more about first pass times, read Eye Movements in Reading Words and Sentences.

First pass times are an “early” eye tracking metric, meaning that first pass fixations are associated with the initial stages of sentence processing.

Most examples in this guide use first pass times data.

3 Data manipulation

This section uses dplyr functions to manipulate first pass time data from the RC experiment. For a review of dplyr, read the Tidyverse Data Manipulation Quickstart.

3.1 `.ixs` files

The first pass (fp) time data is contained within fp-rcs.ixs and fp-rco.ixs. They are comma-delimited files in which the first row contains the column names.

The full .ixs files that appear in this guide are currently not available to the public.

First 8 lines of fp-rcs.ixs:

seq,subj,item,cond,region,datum
26,1,1,5,1,318
26,1,1,5,2,311
26,1,1,5,3,488
26,1,1,5,4,350
26,1,1,5,5,171
26,1,1,5,6,356
26,1,1,5,7,696

First 8 lines of fp-rco.ixs:

seq,subj,item,cond,region,datum
133,1,3,7,1,323
133,1,3,7,2,417
133,1,3,7,3,
133,1,3,7,4,216
133,1,3,7,5,527
133,1,3,7,6,218
133,1,3,7,7,

The columns are:

seq: sequence number
subj: subject number (40 subjects)
item: item number (16 experimental items)
cond: condition number (4 conditions)
region: region number (7 regions per experimental item)
datum: length of fixation in ms

Each trial (one subject reading one item) contributes 7 rows of data, one for each region.

3.2 Reading in data

Use readr::read_delim() to read in the .ixs files as a tibble:

# Read in first pass times for SUB items
fp_rcs <- read_delim("./fp-rcs.ixs", delim = ",", col_names = TRUE)

# Read in first pass times for OBJ items
fp_rco <- read_delim("./fp-rco.ixs", delim = ",", col_names = TRUE)

Print the newly created data frames:

# First pass times for SUB items
fp_rcs

# A tibble: 2,058 x 6
     seq  subj  item  cond region datum
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
 1    26     1     1     5      1   318
 2    26     1     1     5      2   311
 3    26     1     1     5      3   488
 4    26     1     1     5      4   350
 5    26     1     1     5      5   171
 6    26     1     1     5      6   356
 7    26     1     1     5      7   696
 8    27     1     2     6      1   470
 9    27     1     2     6      2   133
10    27     1     2     6      3   405
# … with 2,048 more rows

# First pass times for OBJ items
fp_rco

# A tibble: 2,093 x 6
     seq  subj  item  cond region datum
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
 1   133     1     3     7      1   323
 2   133     1     3     7      2   417
 3   133     1     3     7      3    NA
 4   133     1     3     7      4   216
 5   133     1     3     7      5   527
 6   133     1     3     7      6   218
 7   133     1     3     7      7    NA
 8    47     1     7     7      1   300
 9    47     1     7     7      2   205
10    47     1     7     7      3   184
# … with 2,083 more rows
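
Since the full .ixs files are not public, the reading step can be tried on a stand-in file. The sketch below writes a few of the sample lines shown earlier to a temporary file (the file and the name `sample_rco` are hypothetical) and confirms that an empty datum field is read as NA.

```r
library(readr)

# Hypothetical stand-in for fp-rco.ixs, built from the sample lines shown earlier
tmp <- tempfile(fileext = ".ixs")
writeLines(c("seq,subj,item,cond,region,datum",
             "133,1,3,7,1,323",
             "133,1,3,7,2,417",
             "133,1,3,7,3,",
             "133,1,3,7,4,216"), tmp)

sample_rco <- read_delim(tmp, delim = ",", col_names = TRUE)

# Empty fields are read as missing values
is.na(sample_rco$datum)
# FALSE FALSE  TRUE FALSE
```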

3.3 Creating columns

We will separately create 2 columns on the fp_rcs and fp_rco data frames before combining the data frames.

pos: identify whether a row is from a SUB or OBJ item
ext: identify whether a row is from a SRC or ORC item

3.3.1 `pos`

The pos column identifies whether a row is from a SUB or OBJ item.

Use mutate():

# All items in `fp-rcs.ixs` are SUB items
fp_rcs <- read_delim("./fp-rcs.ixs", delim = ",", col_names = TRUE) %>%
  mutate(pos = "SUB")

# All items in `fp-rco.ixs` are OBJ items
fp_rco <- read_delim("./fp-rco.ixs", delim = ",", col_names = TRUE) %>%
  mutate(pos = "OBJ")

3.3.2 `ext`

The ext column identifies whether a row is from a SRC or ORC item.

The value of a row’s ext column can be created from the cond column, which identifies the experimental condition of that row/item:

cond == 5: SUB-SRC
cond == 6: SUB-ORC
cond == 7: OBJ-SRC
cond == 8: OBJ-ORC

fp-rcs.ixs only contains SUB items (rows with a cond value of 5 or 6):

If cond == 5, then ext == SRC
Else, cond == 6 and ext == ORC

fp-rco.ixs only contains OBJ items (rows with a cond value of 7 or 8):

If cond == 7, then ext == SRC
Else, cond == 8 and ext == ORC

Use mutate() and dplyr::if_else():

# If `cond` is `5`, then the item is an SRC, else an ORC.
fp_rcs <- read_delim("./fp-rcs.ixs", delim = ",", col_names = TRUE) %>%
  mutate(pos = "SUB",
         ext = if_else(cond == 5, "SRC", "ORC"))

# If `cond` is `7`, then the item is an SRC, else an ORC.
fp_rco <- read_delim("./fp-rco.ixs", delim = ",", col_names = TRUE) %>%
  mutate(pos = "OBJ",
         ext = if_else(cond == 7, "SRC", "ORC"))
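
Because if_else() sends every value that is not the tested one to the "else" branch, an unexpected cond value would silently be labeled ORC. A cautious sketch, using made-up data in place of the real file (`toy_rcs` is a hypothetical name), checks the cond values before recoding:

```r
library(dplyr)

# Hypothetical stand-in for rows read from fp-rcs.ixs
toy_rcs <- tibble(cond = c(5, 6, 6, 5))

# Guard: every `cond` value must be one of the two expected SUB conditions
stopifnot(all(toy_rcs$cond %in% c(5, 6)))

toy_rcs <- toy_rcs %>%
  mutate(pos = "SUB",
         ext = if_else(cond == 5, "SRC", "ORC"))

toy_rcs$ext
# "SRC" "ORC" "ORC" "SRC"
```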

3.4 Combining data frames

Combine the fp_rcs and fp_rco data frames into a single data frame with dplyr::bind_rows():

fp <- bind_rows(fp_rcs, fp_rco)

fp

# A tibble: 4,151 x 8
     seq  subj  item  cond region datum pos   ext  
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <chr> <chr>
 1    26     1     1     5      1   318 SUB   SRC  
 2    26     1     1     5      2   311 SUB   SRC  
 3    26     1     1     5      3   488 SUB   SRC  
 4    26     1     1     5      4   350 SUB   SRC  
 5    26     1     1     5      5   171 SUB   SRC  
 6    26     1     1     5      6   356 SUB   SRC  
 7    26     1     1     5      7   696 SUB   SRC  
 8    27     1     2     6      1   470 SUB   ORC  
 9    27     1     2     6      2   133 SUB   ORC  
10    27     1     2     6      3   405 SUB   ORC  
# … with 4,141 more rows
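
The combined row count is the sum of the two inputs (2,058 + 2,093 = 4,151). The toy sketch below, with hypothetical data frames, shows the same check and that bind_rows() matches columns by name rather than by position:

```r
library(dplyr)

# Hypothetical miniature versions of the two data frames
toy_sub <- tibble(cond = c(5, 6), pos = "SUB")
toy_obj <- tibble(pos = "OBJ", cond = c(7, 8))  # columns in a different order

toy_all <- bind_rows(toy_sub, toy_obj)

# No rows are lost or duplicated by the combination
nrow(toy_all) == nrow(toy_sub) + nrow(toy_obj)

# Columns are aligned by name, following the order of the first data frame
names(toy_all)
# "cond" "pos"
```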

3.5 More manipulations

For manipulations that are independent of the SUB and OBJ items, it is more efficient to perform them on a combined fp data frame than to perform them twice on separate data frames:

Factors: coercing character vectors into factors
cond2: identify whether a row is from a SUB-SRC, SUB-ORC, OBJ-SRC, or OBJ-ORC item
region2: identify whether a row is the intro, msub, RC, spillover, mverb, mobj, or final region of an item

3.5.1 Coercing factors

The newly created pos and ext columns are character class vectors:

# Check the class of the `pos` column
class(fp$pos)

[1] "character"

# Check the class of the `ext` column
class(fp$ext)

[1] "character"

However, in this experiment, they should actually be factors with two levels.

Use mutate() and base::factor() to coerce the pos and ext columns into factors:

fp <- bind_rows(fp_rcs, fp_rco) %>% 
  # Coerce `pos` and `ext` columns into factors with the specified levels
  mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
         ext = factor(ext, levels = c("SRC", "ORC")))

# Check the class of the `pos` column
class(fp$pos)

[1] "factor"

# Check the class of the `ext` column
class(fp$ext)

[1] "factor"
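
Specifying the levels matters because factor() otherwise sorts them alphabetically, which would put OBJ before SUB in summaries and plots. A small base R illustration with a made-up vector:

```r
# Hypothetical character vector of matrix positions
pos_chr <- c("SUB", "OBJ", "SUB")

# Default: levels are sorted alphabetically
levels(factor(pos_chr))
# "OBJ" "SUB"

# Explicit levels keep the intended order
levels(factor(pos_chr, levels = c("SUB", "OBJ")))
# "SUB" "OBJ"
```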

3.5.2 `cond2`

The cond2 column identifies whether a row is from a SUB-SRC, SUB-ORC, OBJ-SRC, or OBJ-ORC item.

Concatenate the pos and ext columns with stringr::str_c(), and coerce cond2 into a factor for analysis:

fp <- bind_rows(fp_rcs, fp_rco) %>% 
  mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
         ext = factor(ext, levels = c("SRC", "ORC")),
         # Create `cond2` column
         cond2 = str_c(pos, ext, sep = "-"),
         # Coerce `cond2` into a factor
         cond2 = factor(cond2, levels = c("SUB-SRC", "SUB-ORC", "OBJ-SRC", "OBJ-ORC")))
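
As an aside, base R's interaction() can build a comparable factor in one step. This is an alternative technique, not the approach used in this guide; the sketch assumes pos and ext are already factors with the levels set above, and `cond2_alt` is a hypothetical name.

```r
# Hypothetical factors with the same levels as the `pos` and `ext` columns
pos <- factor(c("SUB", "SUB", "OBJ", "OBJ"), levels = c("SUB", "OBJ"))
ext <- factor(c("SRC", "ORC", "SRC", "ORC"), levels = c("SRC", "ORC"))

# lex.order = TRUE makes the first factor (pos) vary slowest in the level order
cond2_alt <- interaction(pos, ext, sep = "-", lex.order = TRUE)
levels(cond2_alt)
# "SUB-SRC" "SUB-ORC" "OBJ-SRC" "OBJ-ORC"
```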

3.5.3 `region2`

The region2 column identifies whether a row is the intro, msub, RC, spillover, mverb, mobj, or final region of an item.

The value of a row’s region2 column can be determined by the value of the region and pos columns, following the region orders.

If region == 1, then region2 == "intro"
Else if region == 2, then region2 == "msub"
Else if region == 3 and pos == "SUB", then region2 == "RC"
Else if region == 3 and pos == "OBJ", then region2 == "mverb"
etc.

dplyr::case_when() is the equivalent of nesting multiple dplyr::if_else() statements.

Use mutate() and case_when():

fp <- bind_rows(fp_rcs, fp_rco) %>% 
  mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
         ext = factor(ext, levels = c("SRC", "ORC")),
         cond2 = str_c(pos, ext, sep = "-"),
         cond2 = factor(cond2, levels = c("SUB-SRC", "SUB-ORC", "OBJ-SRC", "OBJ-ORC")),
         # Create `region2` column
         region2 = case_when(region == 1 ~ "intro",
                             region == 2 ~ "msub",
                             region == 3 & pos == "SUB" ~ "RC",
                             region == 3 & pos == "OBJ" ~ "mverb",
                             region == 4 & pos == "SUB" ~ "spillover",
                             region == 4 & pos == "OBJ" ~ "mobj",
                             region == 5 & pos == "SUB" ~ "mverb",
                             region == 5 & pos == "OBJ" ~ "RC",
                             region == 6 & pos == "SUB" ~ "mobj",
                             region == 6 & pos == "OBJ" ~ "spillover",
                             region == 7 ~ "final"))

fp

# A tibble: 4,151 x 10
     seq  subj  item  cond region datum pos   ext   cond2   region2  
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <fct> <fct> <fct>   <chr>    
 1    26     1     1     5      1   318 SUB   SRC   SUB-SRC intro    
 2    26     1     1     5      2   311 SUB   SRC   SUB-SRC msub     
 3    26     1     1     5      3   488 SUB   SRC   SUB-SRC RC       
 4    26     1     1     5      4   350 SUB   SRC   SUB-SRC spillover
 5    26     1     1     5      5   171 SUB   SRC   SUB-SRC mverb    
 6    26     1     1     5      6   356 SUB   SRC   SUB-SRC mobj     
 7    26     1     1     5      7   696 SUB   SRC   SUB-SRC final    
 8    27     1     2     6      1   470 SUB   ORC   SUB-ORC intro    
 9    27     1     2     6      2   133 SUB   ORC   SUB-ORC msub     
10    27     1     2     6      3   405 SUB   ORC   SUB-ORC RC       
# … with 4,141 more rows
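
A different way to express the same mapping is a lookup table joined on pos and region. This is a sketch of an alternative to case_when(), not the method used in this guide. The names `region_key` and `toy` are hypothetical, and pos is kept as a character column in the toy data for simplicity.

```r
library(dplyr)

# Hypothetical lookup table: one row per (pos, region) pair
region_key <- tibble(
  pos     = rep(c("SUB", "OBJ"), each = 7),
  region  = rep(c(1, 2, 3, 4, 5, 6, 7), times = 2),
  region2 = c("intro", "msub", "RC", "spillover", "mverb", "mobj", "final",
              "intro", "msub", "mverb", "mobj", "RC", "spillover", "final")
)

# Hypothetical rows standing in for the combined data
toy <- tibble(pos = c("SUB", "OBJ", "SUB", "OBJ"),
              region = c(3, 3, 5, 5))

toy_labeled <- toy %>%
  left_join(region_key, by = c("pos", "region"))

toy_labeled$region2
# "RC" "mverb" "mverb" "RC"
```

The lookup keeps the region orders in one table, at the cost of an extra object to maintain.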

4 Data analysis

This section provides examples of using dplyr functions to analyze first pass time data.

The .ixs files contain missing values in the datum column, so summary functions should include an na.rm = TRUE argument.
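
A short illustration with made-up values shows why the argument is needed: a single NA makes the result of mean() missing unless it is removed.

```r
# Hypothetical fixation times with one missing value
datum <- c(323, 417, NA, 216)

mean(datum)                # NA
mean(datum, na.rm = TRUE)  # mean of the three observed values
```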

4.1 Winsorizing

Winsorization is a method of reducing the effect of outliers by replacing extreme values with less extreme values.
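
The idea can be sketched in base R by clamping values to a lower and an upper quantile. This is only an illustration of the concept under that assumption; `winsorize_sketch()` is a hypothetical helper and is not guaranteed to match the output of any particular package.

```r
# Hypothetical helper: clamp values to the `trim` and `1 - trim` quantiles
winsorize_sketch <- function(x, trim = 0.1) {
  bounds <- quantile(x, probs = c(trim, 1 - trim), na.rm = TRUE)
  pmin(pmax(x, bounds[[1]]), bounds[[2]])
}

# Made-up fixation times with one very short and one very long value
x <- c(150, 200, 210, 220, 230, 240, 250, 260, 270, 1500)
winsorize_sketch(x)
```

The extreme values 150 and 1500 are pulled toward the rest of the distribution, while the middle values are unchanged.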

Install and load the psych package for the psych::winsor() function:

install.packages("psych")
library(psych)

First pass times should be winsorized by experimental condition and region, because the different conditions and regions differ in length and difficulty.

Use group_by(), mutate(), and winsor() to winsorize by group:

# Winsorize first pass times by condition and region
fp <- fp %>%  
  group_by(cond2, region2) %>% 
  mutate(win = winsor(datum, trim = 0.1, na.rm = TRUE)) %>% 
  ungroup()

fp

# A tibble: 4,151 x 11
     seq  subj  item  cond region datum pos   ext   cond2   region2     win
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <fct> <fct> <fct>   <chr>     <dbl>
 1    26     1     1     5      1   318 SUB   SRC   SUB-SRC intro      318 
 2    26     1     1     5      2   311 SUB   SRC   SUB-SRC msub       311 
 3    26     1     1     5      3   488 SUB   SRC   SUB-SRC RC         488 
 4    26     1     1     5      4   350 SUB   SRC   SUB-SRC spillover  350 
 5    26     1     1     5      5   171 SUB   SRC   SUB-SRC mverb      180.
 6    26     1     1     5      6   356 SUB   SRC   SUB-SRC mobj       356 
 7    26     1     1     5      7   696 SUB   SRC   SUB-SRC final      696 
 8    27     1     2     6      1   470 SUB   ORC   SUB-ORC intro      470 
 9    27     1     2     6      2   133 SUB   ORC   SUB-ORC msub       189 
10    27     1     2     6      3   405 SUB   ORC   SUB-ORC RC         405 
# … with 4,141 more rows

4.2 Means and standard errors

The RC experiment looks at the effects of pos and ext on relative clause processing difficulty. Increased processing difficulty is associated with longer first pass time durations.

We should look at the average first pass time (with standard errors) for each condition and region group.

Install and load the plotrix package for the plotrix::std.error() function:

install.packages("plotrix")
library(plotrix)

Use group_by(), summarize(), and std.error() to calculate mean first pass times with standard errors:

# Calculate mean first pass times by condition and region
fp %>% 
  group_by(region2, cond2) %>% 
  summarize(avg = mean(win, na.rm = T), ste = std.error(win, na.rm = T))

# A tibble: 28 x 4
# Groups:   region2 [7]
   region2 cond2     avg   ste
   <chr>   <fct>   <dbl> <dbl>
 1 final   SUB-SRC  585. 26.7 
 2 final   SUB-ORC  580. 23.2 
 3 final   OBJ-SRC  545. 22.1 
 4 final   OBJ-ORC  545. 22.2 
 5 intro   SUB-SRC  523. 19.7 
 6 intro   SUB-ORC  567. 20.5 
 7 intro   OBJ-SRC  522. 21.0 
 8 intro   OBJ-ORC  526. 19.7 
 9 mobj    SUB-SRC  308.  8.86
10 mobj    SUB-ORC  312. 10.0 
# … with 18 more rows

5 Data visualization

ggplot2 is a tidyverse package for creating graphics. This section uses ggplot2 to provide examples of tidyverse data visualization, but is not intended to be an introduction to ggplot2. To learn more about data visualization, read Chapter 3 “Data visualization” of R for Data Science, by Garrett Grolemund and Hadley Wickham.

5.1 RC region

The RC experiment has a 2x2 factorial design. Line graphs suit 2x2 factorial designs because each factor's main effect and any interaction (non-parallel lines) are visible at a glance.

To learn more about factorial design visualization, see Chapter 10.2 “Interpreting main effects and interactions” of Answering Questions with Data by Matthew Crump.

Create a line graph for the most important region, the critical RC region:

# Pick out first pass data from the RC region only, and drop rows with missing values
fp_RC <- fp %>% 
  filter(region2 == "RC") %>% 
  drop_na() 

# Create a line graph for `fp_RC` / the RC region
ggplot(fp_RC, aes(x = pos, y = win, group = ext, color = ext)) +
  labs(title = "First Pass Times", x = "Matrix position (pos)", 
       y = "Winsorized fixation time (ms)", color = "Extraction type (ext)", 
       subtitle = "RC region") +
  stat_summary(fun.y = mean, geom = "line") +
  stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +  
  theme(legend.position = "bottom", legend.box = "horizontal",
        legend.background = element_rect(fill = "grey90"),
        strip.background = element_rect(fill="grey90"))
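
An equivalent plot can be built by computing the cell means and standard errors first and then drawing them directly, which makes the plotted numbers easy to inspect. The sketch below uses simulated data in place of the real first pass times; `toy`, `cell_means`, and `p` are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the RC-region data (values are made up)
set.seed(1)
toy <- tibble(
  pos = factor(rep(c("SUB", "OBJ"), each = 20), levels = c("SUB", "OBJ")),
  ext = factor(rep(rep(c("SRC", "ORC"), each = 10), times = 2),
               levels = c("SRC", "ORC")),
  win = rnorm(40, mean = 500, sd = 50)
)

# One row per cell of the 2x2 design
cell_means <- toy %>%
  group_by(pos, ext) %>%
  summarize(avg = mean(win), se = sd(win) / sqrt(n())) %>%
  ungroup()

# Draw the precomputed means with error bars of one standard error
p <- ggplot(cell_means, aes(x = pos, y = avg, group = ext, color = ext)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = avg - se, ymax = avg + se), width = 0.25)
p
```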

5.2 All regions

In the RC experiment, the RC region is the most important. However, visualizing all regions can be helpful in interpreting main effects and interactions.

You can use ggplot2::facet_wrap() to create a facet, or subplot, for each region and display all facets in a single row.

There are two region orders, so there should be two facet orders:

SUB items: intro¹ | msub² | RC³ | spillover⁴ | mverb⁵ | mobj⁶ | final⁷
OBJ items: intro¹ | msub² | mverb³ | mobj⁴ | RC⁵ | spillover⁶ | final⁷

5.2.1 SUB item facet order

Create facets for all regions, following the SUB item region order:

# Create a copy of `fp` that coerces the `region2` column into a factor with levels. 
# Specifying the levels sets the facet order.
fp_sub <- fp %>%
  mutate(region2 = factor(region2, levels = c("intro", "msub", "RC", "spillover", "mverb", "mobj", "final"))) %>%
  drop_na()
  
# SUB item region order
ggplot(fp_sub, aes(x = pos, y = win, group = ext, color = ext)) +
  labs(title = "First Pass Times", x = "Matrix position (pos)", 
       y = "Fixation time (ms), winsorized", color = "Extraction type (ext)", 
       subtitle = "SUB item region order") +
  stat_summary(fun.y = mean, geom = "line") +
  stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +  
  theme(legend.position = "bottom", legend.box = "horizontal",
        legend.background = element_rect(fill = "grey90"),
        strip.background = element_rect(fill="grey90")) +
  # Create facet for each region and display all facets in a single row
  facet_wrap(~ region2, nrow = 1)

5.2.2 OBJ item facet order

Create facets for all regions, following the OBJ item region order:

# Create a copy of `fp` that coerces the `region2` column into a factor with levels. 
# Specifying the levels sets the facet order.
fp_obj <- fp %>%
  mutate(region2 = factor(region2, levels = c("intro", "msub", "mverb", "mobj", "RC", "spillover", "final"))) %>%
  drop_na()
  
# OBJ item region order
ggplot(fp_obj, aes(x = pos, y = win, group = ext, color = ext)) +
  labs(title = "First Pass Times", x = "Matrix position (pos)", 
       y = "Fixation time (ms), winsorized", color = "Extraction type (ext)", 
       subtitle = "OBJ item region order") +
  stat_summary(fun.y = mean, geom = "line") +
  stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +  
  theme(legend.position = "bottom", legend.box = "horizontal",
        legend.background = element_rect(fill = "grey90"),
        strip.background = element_rect(fill="grey90")) +
  # Create facet for each region and display all facets in a single row
  facet_wrap(~ region2, nrow = 1)

6 User-defined functions

You can define a function in R with the following syntax:

# Define `function_name()`
function_name <- function(argument1, argument2, ...){
  # some computation involving the arguments
}

# Call `function_name()`
function_name(argument1, argument2)
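
As a concrete instance of this syntax, here is a tiny function that converts fixation times from milliseconds to seconds. The function `ms_to_s()` is a hypothetical example and is not used elsewhere in this guide.

```r
# Define `ms_to_s()`: convert milliseconds to seconds
ms_to_s <- function(ms) {
  # The value of the last expression is returned
  ms / 1000
}

# Call `ms_to_s()` on a few fixation times
ms_to_s(c(318, 311, 488))
# 0.318 0.311 0.488
```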

User-defined functions (UDFs) allow you to reuse code easily. Writing and calling a UDF is more powerful than simply copying and pasting code for several reasons, including:

UDFs lower the chance of copy-and-paste errors, like forgetting to change a variable name in a copied code block.
UDFs update easily; changing the code in a UDF once is easier than changing the code many times in copied code blocks.
UDFs reduce overall code length and make R scripts easier to read.

To learn more about user-defined functions, read Chapter 19 “Functions” of R for Data Science, by Garrett Grolemund and Hadley Wickham.

6.1 UDFs and data manipulation

This section provides an example of defining a function to make data manipulation easier.

6.1.1 `combine_sub_obj()`

In Section 3 Data manipulation, we read in the fp-rcs.ixs and fp-rco.ixs files and performed various manipulations to create the first pass times fp data frame:

#### First pass times
# Read in `fp-rcs.ixs`
fp_rcs <- read_delim("./fp-rcs.ixs", delim = ",", col_names = TRUE) %>%
  mutate(pos = "SUB",
         ext = if_else(cond == 5, "SRC", "ORC"))

# Read in `fp-rco.ixs` 
fp_rco <- read_delim("./fp-rco.ixs", delim = ",", col_names = TRUE) %>%
  mutate(pos = "OBJ",
         ext = if_else(cond == 7, "SRC", "ORC"))

# Combine `fp_rcs` and `fp_rco` data frames
fp <- bind_rows(fp_rcs, fp_rco) %>% 
  mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
         ext = factor(ext, levels = c("SRC", "ORC")),
         cond2 = str_c(pos, ext, sep = "-"),
         cond2 = factor(cond2, levels = c("SUB-SRC", "SUB-ORC", "OBJ-SRC", "OBJ-ORC")),
         region2 = case_when(region == 1 ~ "intro",
                             region == 2 ~ "msub",
                             region == 3 & pos == "SUB" ~ "RC",
                             region == 3 & pos == "OBJ" ~ "mverb",
                             region == 4 & pos == "SUB" ~ "spillover",
                             region == 4 & pos == "OBJ" ~ "mobj",
                             region == 5 & pos == "SUB" ~ "mverb",
                             region == 5 & pos == "OBJ" ~ "RC",
                             region == 6 & pos == "SUB" ~ "mobj",
                             region == 6 & pos == "OBJ" ~ "spillover",
                             region == 7 ~ "final"))

First pass times are just one eye tracking metric. EyeDry calculates other eye tracking metrics, like right-bounded times and second pass times.

right-bounded times (rb): the sum of all fixations in a region until the first fixation to the region’s right
second pass times (sp): the sum of all fixations in a region after the initial first pass fixations

The data for each eye tracking metric is contained in a pair of <metric>-rcs.ixs and <metric>-rco.ixs files that must be read in and transformed like the first pass times data:

rb-rcs.ixs: right-bounded times data for SUB items
rb-rco.ixs: right-bounded times data for OBJ items
sp-rcs.ixs: second pass times data for SUB items
sp-rco.ixs: second pass times data for OBJ items

It would be redundant and error-prone to copy-and-paste the dplyr manipulations for every eye tracking metric.

Instead, define a function like combine_sub_obj():

# `combine_sub_obj()`: Read in, combine, and manipulate the `.ixs` files
combine_sub_obj <- function(sub, obj){
  rcs <- read_delim(sub, delim = ",", col_names = TRUE) %>% 
    mutate(pos = "SUB",
           ext = if_else(cond == 5, "SRC", "ORC"))
  
  rco <- read_delim(obj, delim = ",", col_names = TRUE) %>% 
    mutate(pos = "OBJ",
           ext = if_else(cond == 7, "SRC", "ORC"))
  
  bind_rows(rcs, rco) %>%
    mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
           ext = factor(ext, levels = c("SRC", "ORC")),
           cond2 = str_c(pos, ext, sep = "-"),
           cond2 = factor(cond2, levels = c("SUB-SRC", "SUB-ORC", "OBJ-SRC", "OBJ-ORC")),
           region2 = case_when(region == 1 ~ "intro",
                               region == 2 ~ "msub",
                               region == 3 & pos == "SUB" ~ "RC",
                               region == 3 & pos == "OBJ" ~ "mverb",
                               region == 4 & pos == "SUB" ~ "spillover",
                               region == 4 & pos == "OBJ" ~ "mobj",
                               region == 5 & pos == "SUB" ~ "mverb",
                               region == 5 & pos == "OBJ" ~ "RC",
                               region == 6 & pos == "SUB" ~ "mobj",
                               region == 6 & pos == "OBJ" ~ "spillover",
                               region == 7 ~ "final"))
}

Create right-bounded times data frame:

# Right-bounded times
rb <- combine_sub_obj("./rb-rcs.ixs", "./rb-rco.ixs")

rb

# A tibble: 4,151 x 10
     seq  subj  item  cond region datum pos   ext   cond2   region2  
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <fct> <fct> <fct>   <chr>    
 1    26     1     1     5      1   318 SUB   SRC   SUB-SRC intro    
 2    26     1     1     5      2   311 SUB   SRC   SUB-SRC msub     
 3    26     1     1     5      3   488 SUB   SRC   SUB-SRC RC       
 4    26     1     1     5      4   350 SUB   SRC   SUB-SRC spillover
 5    26     1     1     5      5   171 SUB   SRC   SUB-SRC mverb    
 6    26     1     1     5      6   356 SUB   SRC   SUB-SRC mobj     
 7    26     1     1     5      7  1271 SUB   SRC   SUB-SRC final    
 8    27     1     2     6      1   470 SUB   ORC   SUB-ORC intro    
 9    27     1     2     6      2   133 SUB   ORC   SUB-ORC msub     
10    27     1     2     6      3   405 SUB   ORC   SUB-ORC RC       
# … with 4,141 more rows

Create second pass times data frame:

# Second pass times
sp <- combine_sub_obj("./sp-rcs.ixs", "./sp-rco.ixs")

sp

# A tibble: 4,151 x 10
     seq  subj  item  cond region datum pos   ext   cond2   region2  
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <fct> <fct> <fct>   <chr>    
 1    26     1     1     5      1     0 SUB   SRC   SUB-SRC intro    
 2    26     1     1     5      2     0 SUB   SRC   SUB-SRC msub     
 3    26     1     1     5      3   456 SUB   SRC   SUB-SRC RC       
 4    26     1     1     5      4   436 SUB   SRC   SUB-SRC spillover
 5    26     1     1     5      5   466 SUB   SRC   SUB-SRC mverb    
 6    26     1     1     5      6   373 SUB   SRC   SUB-SRC mobj     
 7    26     1     1     5      7     0 SUB   SRC   SUB-SRC final    
 8    27     1     2     6      1     0 SUB   ORC   SUB-ORC intro    
 9    27     1     2     6      2   249 SUB   ORC   SUB-ORC msub     
10    27     1     2     6      3   944 SUB   ORC   SUB-ORC RC       
# … with 4,141 more rows
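
Because the file names follow the <metric>-rcs.ixs and <metric>-rco.ixs pattern, the two paths can themselves be generated from the metric abbreviation. The helper below is a hypothetical convenience and not part of the original analysis; it only builds the paths and does not read any files.

```r
# Hypothetical helper: build the SUB and OBJ file paths for a metric
ixs_paths <- function(metric, dir = ".") {
  c(sub = file.path(dir, paste0(metric, "-rcs.ixs")),
    obj = file.path(dir, paste0(metric, "-rco.ixs")))
}

paths <- ixs_paths("rb")
paths
# sub: "./rb-rcs.ixs", obj: "./rb-rco.ixs"

# The paths could then be passed on, for example:
# rb <- combine_sub_obj(paths[["sub"]], paths[["obj"]])
```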

6.2 UDFs and data analysis

This section provides examples of defining a function to make data analysis easier.

6.2.1 `win_datum()`

Section 4.1 Winsorizing described winsorizing first pass data:

fp <- fp %>%  
  group_by(cond2, region2) %>% 
  mutate(win = winsor(datum, trim = 0.1, na.rm = TRUE)) %>% 
  ungroup()

Define a function like win_datum() to automate this process:

# `win_datum()`: Winsorize the `datum` column by condition and region
win_datum <- function(df){
  df %>%
    group_by(cond2, region2) %>%
    mutate(win = winsor(datum, trim = 0.1, na.rm = TRUE)) %>%
    ungroup()
}

Winsorize right-bounded times:

# Winsorize right-bounded times by condition and region
rb <- rb %>% 
  win_datum()

rb

# A tibble: 4,151 x 11
     seq  subj  item  cond region datum pos   ext   cond2   region2     win
   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <fct> <fct> <fct>   <chr>     <dbl>
 1    26     1     1     5      1   318 SUB   SRC   SUB-SRC intro      318 
 2    26     1     1     5      2   311 SUB   SRC   SUB-SRC msub       311 
 3    26     1     1     5      3   488 SUB   SRC   SUB-SRC RC         488 
 4    26     1     1     5      4   350 SUB   SRC   SUB-SRC spillover  350 
 5    26     1     1     5      5   171 SUB   SRC   SUB-SRC mverb      187.
 6    26     1     1     5      6   356 SUB   SRC   SUB-SRC mobj       356 
 7    26     1     1     5      7  1271 SUB   SRC   SUB-SRC final     1271 
 8    27     1     2     6      1   470 SUB   ORC   SUB-ORC intro      470 
 9    27     1     2     6      2   133 SUB   ORC   SUB-ORC msub       189 
10    27     1     2     6      3   405 SUB   ORC   SUB-ORC RC         606.
# … with 4,141 more rows

Second pass times are generally not winsorized, so do not call win_datum() on the sp data frame.

6.2.2 `mean_summary()`

Section 4.2 Means and standard errors described calculating mean first pass times:

fp %>% 
  group_by(region2, cond2) %>% 
  summarize(avg = mean(win, na.rm = T), ste = std.error(win, na.rm = T))

# A tibble: 28 x 4
# Groups:   region2 [7]
   region2 cond2     avg   ste
   <chr>   <fct>   <dbl> <dbl>
 1 final   SUB-SRC  585. 26.7 
 2 final   SUB-ORC  580. 23.2 
 3 final   OBJ-SRC  545. 22.1 
 4 final   OBJ-ORC  545. 22.2 
 5 intro   SUB-SRC  523. 19.7 
 6 intro   SUB-ORC  567. 20.5 
 7 intro   OBJ-SRC  522. 21.0 
 8 intro   OBJ-ORC  526. 19.7 
 9 mobj    SUB-SRC  308.  8.86
10 mobj    SUB-ORC  312. 10.0 
# … with 18 more rows

Define a function like mean_summary() to automate this process:

# `mean_summary()`: Calculate the average `column` value by condition and region 
mean_summary <- function(df, column){
  column <- enquo(column)
  
  df %>% 
    group_by(cond2, region2) %>% 
    summarize(avg = mean(!!column, na.rm = T), ste = std.error(!!column, na.rm = T)) %>% 
    ungroup()
}

mean_summary() takes 2 arguments:

df: the data frame
column: the column to calculate the mean of (datum for unwinsorized data, win for winsorized data)
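
The enquo() and !! pair is what lets an unquoted column name be passed through to summarize(). Recent versions of dplyr (with rlang 0.4.0 or later) also accept the embrace operator `{{ }}` as shorthand for the same pattern. The sketch below is a variant written under that assumption; `mean_summary2()` and `toy` are hypothetical, and the standard error is computed with base R instead of plotrix.

```r
library(dplyr)

# Hypothetical variant of `mean_summary()` using the embrace operator
mean_summary2 <- function(df, column) {
  df %>%
    group_by(cond2, region2) %>%
    summarize(avg = mean({{ column }}, na.rm = TRUE),
              ste = sd({{ column }}, na.rm = TRUE) /
                sqrt(sum(!is.na({{ column }})))) %>%
    ungroup()
}

# Made-up data with one missing value
toy <- tibble(cond2 = c("SUB-SRC", "SUB-SRC", "SUB-ORC", "SUB-ORC"),
              region2 = "RC",
              datum = c(400, 500, 600, NA))

toy_summary <- mean_summary2(toy, datum)
toy_summary
```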

Calculate the mean right-bounded times:

# Average right-bounded times (winsorized)
rb %>% 
  mean_summary(win)

# A tibble: 28 x 4
   cond2   region2     avg   ste
   <fct>   <chr>     <dbl> <dbl>
 1 SUB-SRC final      838.  32.8
 2 SUB-SRC intro      531.  19.8
 3 SUB-SRC mobj       343.  11.2
 4 SUB-SRC msub       340.  11.5
 5 SUB-SRC mverb      341.  11.6
 6 SUB-SRC RC         833.  21.6
 7 SUB-SRC spillover  528.  17.9
 8 SUB-ORC final      810   32.2
 9 SUB-ORC intro      570.  20.4
10 SUB-ORC mobj       375.  14.8
# … with 18 more rows

Calculate the mean second pass times:

# Average second pass times (not winsorized)
sp %>% 
  mean_summary(datum)

# A tibble: 28 x 4
   cond2   region2     avg   ste
   <fct>   <chr>     <dbl> <dbl>
 1 SUB-SRC final      31.3  15.2
 2 SUB-SRC intro     201.   29.3
 3 SUB-SRC mobj      233.   26.1
 4 SUB-SRC msub      280.   28.0
 5 SUB-SRC mverb     326.   38.4
 6 SUB-SRC RC        769.   78.1
 7 SUB-SRC spillover 454.   50.8
 8 SUB-ORC final      33.0  13.6
 9 SUB-ORC intro     238.   36.6
10 SUB-ORC mobj      286.   35.1
# … with 18 more rows

6.3 UDFs and data visualization

This section provides examples of defining a function to make data visualization easier.

6.3.1 `linegraph_sub()`

Section 5.2.1 SUB item facet order described how to graph first pass times with the facets in SUB item region order:

ggplot(fp_sub, aes(x = pos, y = win, group = ext, color = ext)) +
  labs(title = "First Pass Times", x = "Matrix position (pos)", 
       y = "Fixation time (ms), winsorized", color = "Extraction type (ext)", 
       subtitle = "SUB item region order") +
  stat_summary(fun.y = mean, geom = "line") +
  stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +  
  theme(legend.position = "bottom", legend.box = "horizontal",
        legend.background = element_rect(fill = "grey90"),
        strip.background = element_rect(fill="grey90")) +
  facet_wrap( ~ region2, nrow = 1)

Define a function like linegraph_sub() to automate this process:

# `linegraph_sub()`: Create facets for all regions, following the SUB item order
linegraph_sub <- function(df, y.value, y.label, add.title){
  y.value <- enquo(y.value)
  
  df <- df %>% 
    mutate(region2 = factor(region2, levels = c("intro", "msub", "RC", "spillover", "mverb", "mobj", "final"))) %>% 
    drop_na()
  
  ggplot(df, aes(x = pos, y = !!y.value, group = ext, color = ext)) +
    labs(title = add.title, x = "Matrix position (pos)", 
         y = y.label, color = "Extraction type (ext)", 
         subtitle = "SUB Condition Order") +
    stat_summary(fun.y = mean, geom = "line") +
    stat_summary(fun.y = mean, geom = "point") +
    stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +  
    theme(legend.position = "bottom", legend.box = "horizontal",
          legend.background = element_rect(fill = "grey90"),
          strip.background = element_rect(fill="grey90")) +
    facet_wrap( ~ region2, nrow = 1)
}

linegraph_sub() takes 4 arguments:

df: the data frame
y.value: the y-value of the graph (datum for unwinsorized data, win for winsorized data)
y.label: the label for the y-value
add.title: the title of the graph
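
The facet order in linegraph_sub() comes from the levels argument of factor(): without explicit levels, character region names are ordered alphabetically, as in the summary tables above. The base R sketch below illustrates this on a few toy labels. The vectors region, region_ordered, and region_partial are hypothetical and exist only for this example.

```r
# Toy region labels (a subset of the RC region names)
region <- c("final", "intro", "RC", "msub", "intro")

# Explicit levels fix the order that facet_wrap() will follow
region_ordered <- factor(region, levels = c("intro", "msub", "RC", "final"))
levels(region_ordered)

# A label that is missing from `levels` becomes NA,
# so drop_na() would remove those rows from the data frame
region_partial <- factor(region, levels = c("intro", "msub", "RC"))
is.na(region_partial)
```

Because unlisted labels become NA and are then dropped, a misspelled region name in levels would silently remove that region from the graph.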

Graph right-bounded times:

# Graph of right-bounded times for all regions, SUB item order
linegraph_sub(rb, win, "Fixation time (ms), winsorized", "Right-bounded Times")

Graph second pass times:

# Graph of second pass times for all regions, SUB item order
linegraph_sub(sp, datum, "Fixation time (ms)", "Second Pass Times")
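
Because the last expression in linegraph_sub() is a ggplot() call, the function returns a plot object, and further layers can be added to the result with +. The sketch below shows the idea with a smaller stand-in function on made-up data. The names toy and toy_plot() are hypothetical, and the values are not from RC.

```r
library(ggplot2)

# Made-up condition means (not RC data)
toy <- data.frame(
  pos = c("SUB", "OBJ", "SUB", "OBJ"),
  ext = c("SRC", "SRC", "ORC", "ORC"),
  win = c(300, 320, 350, 330)
)

# toy_plot(): hypothetical stand-in for linegraph_sub(); returns a ggplot object
toy_plot <- function(df) {
  ggplot(df, aes(x = pos, y = win, group = ext, color = ext)) +
    geom_line() +
    geom_point()
}

# Add layers to the returned plot without editing the function
p <- toy_plot(toy) +
  labs(caption = "Toy data") +
  ylim(0, 400)
p
```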

6.3.2 `linegraph_obj()`

Section 5.2.2 OBJ item facet order described how to graph first pass times with the facets in OBJ item region order:

ggplot(fp_obj, aes(x = pos, y = win, group = ext, color = ext)) +
  labs(title = "First Pass Times", x = "Matrix position (pos)", 
       y = "Fixation time (ms), winsorized", color = "Extraction type (ext)", 
       subtitle = "OBJ item region order") +
  stat_summary(fun.y = mean, geom = "line") +
  stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +  
  theme(legend.position = "bottom", legend.box = "horizontal",
        legend.background = element_rect(fill = "grey90"),
        strip.background = element_rect(fill="grey90")) +
  facet_wrap( ~ region2, nrow = 1)

Define a function like linegraph_obj() to automate this process:

# `linegraph_obj()`: Create facets for all regions, following the OBJ item order
linegraph_obj <- function(df, y.value, y.label, add.title){
  y.value <- enquo(y.value)
  
  df <- df %>% 
    mutate(region2 = factor(region2, levels = c("intro", "msub", "mverb", "mobj", "RC", "spillover", "final"))) %>% 
    drop_na()
  
  ggplot(df, aes(x = pos, y = !!y.value, group = ext, color = ext)) +
    labs(title = add.title, x = "Matrix position (pos)", 
         y = y.label, color = "Extraction type (ext)", 
         subtitle = "OBJ Condition Order") +
    stat_summary(fun.y = mean, geom = "line") +
    stat_summary(fun.y = mean, geom = "point") +
    stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +  
    theme(legend.position = "bottom", legend.box = "horizontal",
          legend.background = element_rect(fill = "grey90"),
          strip.background = element_rect(fill="grey90")) +
    facet_wrap( ~ region2, nrow = 1)
}

linegraph_obj() takes 4 arguments:

df: the data frame
y.value: the y-value of the graph (datum for unwinsorized data, win for winsorized data)
y.label: the label for the y-value
add.title: the title of the graph