Goal: See practical examples of using tidyverse packages and user-defined functions to analyze and visualize a psycholinguistic experiment.
Prerequisites: You should be familiar with the tidyverse package dplyr and psycholinguistic experiment analysis.
This document uses tidyverse packages to analyze and visualize data from a real psycholinguistic experiment, the eye tracking while reading experiment RC. This guide also introduces user-defined functions and provides examples of defining functions to facilitate data analysis and visualization.
RC
RC is an eye tracking while reading experiment conducted at the UCLA Language Processing Lab by Angelica Pan and supervised by Jesse Harris. Data were collected with an SR Research EyeLink 1000 Tower Mount and preprocessed with EyeDry.
In an eye tracking while reading experiment, subjects read text that is displayed on a computer screen. As they read, a high speed camera records their eye movements. Eye tracking while reading is a popular method for studying sentence processing because certain eye movement patterns, such as long fixations on difficult sentence structures, have been linked to cognitive effort.
To learn more about eye tracking, read Eye Movements in Reading Words and Sentences (2007) by Charles Clifton Jr., Adrian Staub, and Keith Rayner.
RC has a 2x2 Latin square design, crossing relative clause matrix position (pos) with relative clause extraction type (ext). This experiment looked at the effects of pos and ext on relative clause processing difficulty.
Factors and levels:
pos: matrix subject (SUB) or matrix object (OBJ)
ext: subject-extracted (SRC) or object-extracted (ORC)
Conditions: SUB-SRC, SUB-ORC, OBJ-SRC, OBJ-ORC
Experimental items are sentence quartets:
During the experiment, subjects read experimental items as a single line of unbroken text displayed on a computer monitor.
Items are split into 7 regions for analysis:
The linear order of the regions differs across the SUB and OBJ levels.
Region order for SUB items: intro, msub, RC, spillover, mverb, mobj, final
Region order for OBJ items: intro, msub, mverb, mobj, RC, spillover, final
EyeDry is a program that preprocesses eye tracking data. It extracts eye movement metrics from raw eye tracking data into <metric>.ixs files, which contain tabular data that can be read into an R data frame.
Eye tracking metrics mentioned in this guide: first pass times, right-bounded times, and second pass times.
EyeDry also detects if and where a subject blinked while reading a sentence during an experimental trial. Trials with too many blinks on a specified critical region are excluded because no eye movement information can be recorded when a subject blinks.
The critical region in EyeDry is specified by linear order. However, in RC the critical RC region is linearly the 3rd region for SUB items, but linearly the 5th region for OBJ items.
To solve this issue, the RC data was split into two files, one with data from the SUB item trials and one with data from the OBJ item trials. EyeDry was run separately on each file.
In the RC experiment, the data for each eye movement metric is therefore contained in two files:
<metric>-rcs.ixs: eye movement metric data from SUB item trials
<metric>-rco.ixs: eye movement metric data from OBJ item trials
During analysis, the corresponding <metric>-rcs.ixs and <metric>-rco.ixs files for each metric must be combined for a complete set of values.
A first pass time is the sum of all fixations made in a given region from the first time the point of fixation enters the region until the first time the point of fixation leaves the region. To learn more about first pass times, read Eye Movements in Reading Words and Sentences.
First pass times are an “early” eye tracking metric, meaning that first pass fixations are associated with the initial stages of sentence processing.
Most examples in this guide use first pass times data.
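To make the definition concrete, here is a minimal base R sketch that computes a first pass time from a sequence of fixations. The function name `first_pass_time()` and the toy vectors are hypothetical and only illustrate the definition above; EyeDry's actual computation may apply additional criteria that this guide does not cover.

```r
# Hypothetical helper: sum the fixation durations from the first entry into
# `target` until the point of fixation first leaves it.
first_pass_time <- function(regions, durations, target) {
  in_target <- regions == target
  # The region was never fixated: no first pass time
  if (!any(in_target)) return(NA_real_)
  start <- which(in_target)[1]
  # First fixation after `start` that lands outside the target region
  exits <- which(!in_target & seq_along(regions) > start)
  end <- if (length(exits) == 0) length(regions) else exits[1] - 1
  sum(durations[start:end])
}

# Toy trial: region fixated and fixation duration (ms), in temporal order
regions   <- c(1, 2, 2, 3, 2, 4)
durations <- c(200, 150, 180, 220, 300, 250)

# Region 2 is entered at the 2nd fixation and left at the 4th,
# so the later re-fixation (300 ms) is not part of the first pass.
first_pass_time(regions, durations, 2)  # 150 + 180 = 330
```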
This section uses dplyr functions to manipulate first pass time data from the RC experiment. For a review of dplyr, read the Tidyverse Data Manipulation Quickstart.
.ixs files
The first pass (fp) time data is contained in fp-rcs.ixs and fp-rco.ixs. They are comma-delimited files in which the first row contains the column names.
The full .ixs files that appear in this guide are currently not available to the public.
First 8 lines of fp-rcs.ixs:
seq,subj,item,cond,region,datum
26,1,1,5,1,318
26,1,1,5,2,311
26,1,1,5,3,488
26,1,1,5,4,350
26,1,1,5,5,171
26,1,1,5,6,356
26,1,1,5,7,696
First 8 lines of fp-rco.ixs:
seq,subj,item,cond,region,datum
133,1,3,7,1,323
133,1,3,7,2,417
133,1,3,7,3,
133,1,3,7,4,216
133,1,3,7,5,527
133,1,3,7,6,218
133,1,3,7,7,
The columns are:
seq: sequence number
subj: subject number (40 subjects)
item: item number (16 experimental items)
cond: condition number (4 conditions)
region: region number (7 regions per experimental item)
datum: length of fixation in ms
Each experimental item is split into 7 rows of data, one for each region.
Use readr::read_delim() to read in the .ixs files as tibbles:
# Read in first pass times for SUB items
fp_rcs <- read_delim("./fp-rcs.ixs", delim = ",", col_names = TRUE)
# Read in first pass times for OBJ items
fp_rco <- read_delim("./fp-rco.ixs", delim = ",", col_names = TRUE)
Print the newly created data frames:
# First pass times for SUB items
fp_rcs
# A tibble: 2,058 x 6
seq subj item cond region datum
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 26 1 1 5 1 318
2 26 1 1 5 2 311
3 26 1 1 5 3 488
4 26 1 1 5 4 350
5 26 1 1 5 5 171
6 26 1 1 5 6 356
7 26 1 1 5 7 696
8 27 1 2 6 1 470
9 27 1 2 6 2 133
10 27 1 2 6 3 405
# … with 2,048 more rows
# First pass times for OBJ items
fp_rco
# A tibble: 2,093 x 6
seq subj item cond region datum
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 133 1 3 7 1 323
2 133 1 3 7 2 417
3 133 1 3 7 3 NA
4 133 1 3 7 4 216
5 133 1 3 7 5 527
6 133 1 3 7 6 218
7 133 1 3 7 7 NA
8 47 1 7 7 1 300
9 47 1 7 7 2 205
10 47 1 7 7 3 184
# … with 2,083 more rows
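The empty datum fields in fp-rco.ixs appear as NA in the tibble. The sketch below reproduces this on a small temporary file, since the full .ixs files are not public. The file contents are copied from the excerpt above, and the object names `tmp` and `toy` are hypothetical.

```r
library(readr)

# Write a few lines in the .ixs layout to a temporary file
tmp <- tempfile(fileext = ".ixs")
writeLines(c("seq,subj,item,cond,region,datum",
             "133,1,3,7,2,417",
             "133,1,3,7,3,",
             "133,1,3,7,4,216"),
           tmp)

toy <- read_delim(tmp, delim = ",", col_names = TRUE)

# Empty fields are read as missing values by default
is.na(toy$datum)
# Number of missing values in the `datum` column
sum(is.na(toy$datum))
```

Counting the missing values right after reading a file is a quick way to see how many region-level observations each metric lacks.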
We will separately create 2 columns on the fp_rcs and fp_rco data frames before combining the data frames:
pos: identify whether a row is from a SUB or OBJ item
ext: identify whether a row is from a SRC or ORC item
pos
The pos column identifies whether a row is from a SUB or OBJ item.
Use mutate():
# All items in `fp-rcs.ixs` are SUB items
fp_rcs <- read_delim("./fp-rcs.ixs", delim = ",", col_names = TRUE) %>%
mutate(pos = "SUB")
# All items in `fp-rco.ixs` are OBJ items
fp_rco <- read_delim("./fp-rco.ixs", delim = ",", col_names = TRUE) %>%
mutate(pos = "OBJ")
ext
The ext column identifies whether a row is from a SRC or ORC item.
The value of a row’s ext column can be derived from the cond column, which identifies the experimental condition of that row/item:
cond == 5: SUB-SRC
cond == 6: SUB-ORC
cond == 7: OBJ-SRC
cond == 8: OBJ-ORC
fp-rcs.ixs only contains SUB items (rows with a cond value of 5 or 6):
If cond == 5, then ext == SRC
If cond == 6, then ext == ORC
fp-rco.ixs only contains OBJ items (rows with a cond value of 7 or 8):
If cond == 7, then ext == SRC
If cond == 8, then ext == ORC
Use mutate() and dplyr::if_else():
# If `cond` is `5`, then the item is an SRC, else an ORC.
fp_rcs <- read_delim("./fp-rcs.ixs", delim = ",", col_names = TRUE) %>%
mutate(pos = "SUB",
ext = if_else(cond == 5, "SRC", "ORC"))
# If `cond` is `7`, then the item is an SRC, else an ORC.
fp_rco <- read_delim("./fp-rco.ixs", delim = ",", col_names = TRUE) %>%
mutate(pos = "OBJ",
ext = if_else(cond == 7, "SRC", "ORC"))
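Because if_else() sends every row that does not match the test to the "ORC" branch, it can be worth checking the derived columns against cond. The following is a minimal sketch on a hypothetical toy tibble (`toy` and `check` are illustrative names, not part of the RC data), using dplyr::count() to tabulate the combinations.

```r
library(dplyr)

# Toy stand-in for the SUB file: only `cond` values 5 and 6 are expected
toy <- tibble(cond = c(5, 5, 6, 6, 6)) %>%
  mutate(pos = "SUB",
         ext = if_else(cond == 5, "SRC", "ORC"))

# Each `cond` value should map to exactly one `pos`/`ext` combination
check <- toy %>%
  count(cond, pos, ext)
check
```

If an unexpected cond value were present in a file, it would show up here as an extra row labeled "ORC".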
Combine the fp_rcs and fp_rco data frames into a single data frame with dplyr::bind_rows():
fp <- bind_rows(fp_rcs, fp_rco)
fp
# A tibble: 4,151 x 8
seq subj item cond region datum pos ext
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 26 1 1 5 1 318 SUB SRC
2 26 1 1 5 2 311 SUB SRC
3 26 1 1 5 3 488 SUB SRC
4 26 1 1 5 4 350 SUB SRC
5 26 1 1 5 5 171 SUB SRC
6 26 1 1 5 6 356 SUB SRC
7 26 1 1 5 7 696 SUB SRC
8 27 1 2 6 1 470 SUB ORC
9 27 1 2 6 2 133 SUB ORC
10 27 1 2 6 3 405 SUB ORC
# … with 4,141 more rows
For manipulations that are independent of the SUB and OBJ items, it is more efficient to perform them once on a combined fp data frame than to perform them twice on separate data frames:
cond2: identify whether a row is from a SUB-SRC, SUB-ORC, OBJ-SRC, or OBJ-ORC item
region2: identify whether a row is the intro, msub, RC, spillover, mverb, mobj, or final region of an item
The newly created pos and ext columns are character vectors:
# Check the class of the `pos` column
class(fp$pos)
[1] "character"
# Check the class of the `ext` column
class(fp$ext)
[1] "character"
However, in this experiment, they should be factors with two levels each.
Use mutate() and base::factor() to coerce the pos and ext columns into factors:
fp <- bind_rows(fp_rcs, fp_rco) %>%
# Coerce `pos` and `ext` columns into factors with the specified levels
mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
ext = factor(ext, levels = c("SRC", "ORC")))
# Check the class of the `pos` column
class(fp$pos)
[1] "factor"
# Check the class of the `ext` column
class(fp$ext)
[1] "factor"
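Specifying the levels explicitly matters because factor() otherwise sorts the levels alphabetically, which would put OBJ before SUB. A small base R illustration, using a hypothetical toy vector:

```r
pos <- c("SUB", "OBJ", "SUB")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(pos))
default_levels   # "OBJ" "SUB"

# Explicit levels keep the order used throughout this guide
explicit_levels <- levels(factor(pos, levels = c("SUB", "OBJ")))
explicit_levels  # "SUB" "OBJ"
```

The level order determines, among other things, the order of categories on plot axes and in grouped summaries.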
cond2
The cond2 column identifies whether a row is from a SUB-SRC, SUB-ORC, OBJ-SRC, or OBJ-ORC item.
Concatenate the pos and ext columns with stringr::str_c(), and coerce cond2 into a factor for analysis:
fp <- bind_rows(fp_rcs, fp_rco) %>%
mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
ext = factor(ext, levels = c("SRC", "ORC")),
# Create `cond2` column
cond2 = str_c(pos, ext, sep = "-"),
# Coerce `cond2` into a factor
cond2 = factor(cond2, levels = c("SUB-SRC", "SUB-ORC", "OBJ-SRC", "OBJ-ORC")))
region2
The region2 column identifies whether a row is the intro, msub, RC, spillover, mverb, mobj, or final region of an item.
The value of a row’s region2 column can be determined from the values of the region and pos columns, following the region orders. For example:
If region == 1, then region2 == "intro"
If region == 2, then region2 == "msub"
If region == 3 and pos == "SUB", then region2 == "RC"
If region == 3 and pos == "OBJ", then region2 == "mverb"
dplyr::case_when() is the equivalent of nesting multiple dplyr::if_else() statements.
Use mutate() and case_when():
fp <- bind_rows(fp_rcs, fp_rco) %>%
mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
ext = factor(ext, levels = c("SRC", "ORC")),
cond2 = str_c(pos, ext, sep = "-"),
cond2 = factor(cond2, levels = c("SUB-SRC", "SUB-ORC", "OBJ-SRC", "OBJ-ORC")),
# Create `region2` column
region2 = case_when(region == 1 ~ "intro",
region == 2 ~ "msub",
region == 3 & pos == "SUB" ~ "RC",
region == 3 & pos == "OBJ" ~ "mverb",
region == 4 & pos == "SUB" ~ "spillover",
region == 4 & pos == "OBJ" ~ "mobj",
region == 5 & pos == "SUB" ~ "mverb",
region == 5 & pos == "OBJ" ~ "RC",
region == 6 & pos == "SUB" ~ "mobj",
region == 6 & pos == "OBJ" ~ "spillover",
region == 7 ~ "final"))
fp
# A tibble: 4,151 x 10
seq subj item cond region datum pos ext cond2 region2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <chr>
1 26 1 1 5 1 318 SUB SRC SUB-SRC intro
2 26 1 1 5 2 311 SUB SRC SUB-SRC msub
3 26 1 1 5 3 488 SUB SRC SUB-SRC RC
4 26 1 1 5 4 350 SUB SRC SUB-SRC spillover
5 26 1 1 5 5 171 SUB SRC SUB-SRC mverb
6 26 1 1 5 6 356 SUB SRC SUB-SRC mobj
7 26 1 1 5 7 696 SUB SRC SUB-SRC final
8 27 1 2 6 1 470 SUB ORC SUB-ORC intro
9 27 1 2 6 2 133 SUB ORC SUB-ORC msub
10 27 1 2 6 3 405 SUB ORC SUB-ORC RC
# … with 4,141 more rows
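One property of case_when() worth keeping in mind: a row that matches none of the conditions receives NA. The sketch below shows this on a hypothetical toy tibble with a deliberately out-of-range region number, and uses only a subset of the conditions above.

```r
library(dplyr)

# Toy data: region 8 does not exist in the RC experiment
toy <- tibble(region = c(1, 3, 3, 8),
              pos    = c("SUB", "SUB", "OBJ", "SUB")) %>%
  mutate(region2 = case_when(region == 1 ~ "intro",
                             region == 3 & pos == "SUB" ~ "RC",
                             region == 3 & pos == "OBJ" ~ "mverb"))

toy$region2               # "intro" "RC" "mverb" NA
# Number of rows that no condition matched
sum(is.na(toy$region2))
```

Counting the NA values in region2 after the real transformation is therefore a cheap way to confirm that every region and pos combination was covered.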
This section provides examples of using dplyr functions to analyze first pass time data.
The .ixs files contain missing values in the datum column, so summary functions should include an na.rm = TRUE argument.
Winsorization is a method of reducing the effect of outliers by replacing extreme values with less extreme values.
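To illustrate the idea, here is a minimal base R sketch of winsorizing at the 10th and 90th percentiles. The function `winsorize_sketch()` and the toy vector are hypothetical and only meant to show the principle; this guide itself uses psych::winsor(), whose handling of edge cases may differ.

```r
# Hypothetical helper: clamp values to the [trim, 1 - trim] quantile range
winsorize_sketch <- function(x, trim = 0.1) {
  q <- unname(quantile(x, probs = c(trim, 1 - trim), na.rm = TRUE))
  # Values below the lower quantile are raised to it,
  # values above the upper quantile are lowered to it; NA stays NA
  pmin(pmax(x, q[1]), q[2])
}

# Toy fixation times (ms) with one very short and one very long value
x <- c(100, 200, 210, 220, 230, 240, 250, 260, 270, 1500, NA)

winsorize_sketch(x)
# 190 200 210 220 230 240 250 260 270 393 NA
```

Unlike trimming, winsorizing keeps every observation. Only the most extreme values are pulled in toward the rest of the distribution.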
Install and load the psych package for the psych::winsor() function:
install.packages("psych")
library(psych)
First pass times should be winsorized by experimental condition and region, because the different conditions and regions differ in length and difficulty.
Use group_by(), mutate(), and winsor() to winsorize by group:
# Winsorize first pass times by condition and region
fp <- fp %>%
group_by(cond2, region2) %>%
mutate(win = winsor(datum, trim = 0.1, na.rm = TRUE)) %>%
ungroup()
fp
# A tibble: 4,151 x 11
seq subj item cond region datum pos ext cond2 region2 win
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <chr> <dbl>
1 26 1 1 5 1 318 SUB SRC SUB-SRC intro 318
2 26 1 1 5 2 311 SUB SRC SUB-SRC msub 311
3 26 1 1 5 3 488 SUB SRC SUB-SRC RC 488
4 26 1 1 5 4 350 SUB SRC SUB-SRC spillover 350
5 26 1 1 5 5 171 SUB SRC SUB-SRC mverb 180.
6 26 1 1 5 6 356 SUB SRC SUB-SRC mobj 356
7 26 1 1 5 7 696 SUB SRC SUB-SRC final 696
8 27 1 2 6 1 470 SUB ORC SUB-ORC intro 470
9 27 1 2 6 2 133 SUB ORC SUB-ORC msub 189
10 27 1 2 6 3 405 SUB ORC SUB-ORC RC 405
# … with 4,141 more rows
The RC experiment looks at the effects of pos and ext on relative clause processing difficulty. Increased processing difficulty is associated with longer first pass times.
We should look at the average first pass time (with standard errors) for each condition and region group.
Install and load the plotrix package for the plotrix::std.error() function:
install.packages("plotrix")
library(plotrix)
Use group_by(), summarize(), and std.error() to calculate mean first pass times with standard errors:
# Calculate mean first pass times by condition and region
fp %>%
  group_by(region2, cond2) %>%
  summarize(avg = mean(win, na.rm = TRUE), ste = std.error(win, na.rm = TRUE))
# A tibble: 28 x 4
# Groups: region2 [7]
region2 cond2 avg ste
<chr> <fct> <dbl> <dbl>
1 final SUB-SRC 585. 26.7
2 final SUB-ORC 580. 23.2
3 final OBJ-SRC 545. 22.1
4 final OBJ-ORC 545. 22.2
5 intro SUB-SRC 523. 19.7
6 intro SUB-ORC 567. 20.5
7 intro OBJ-SRC 522. 21.0
8 intro OBJ-ORC 526. 19.7
9 mobj SUB-SRC 308. 8.86
10 mobj SUB-ORC 312. 10.0
# … with 18 more rows
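For reference, the standard error of the mean is the sample standard deviation divided by the square root of the number of non-missing observations. The sketch below computes it in base R. The helper name `se()` and the toy vector are hypothetical, and the result is assumed, not verified here, to agree with plotrix::std.error() called with na.rm = TRUE.

```r
# Hypothetical helper: standard error of the mean, ignoring missing values
se <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Toy fixation times (ms) with one missing value
times <- c(318, 311, 488, NA, 350)

mean(times, na.rm = TRUE)
se(times)
```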
ggplot2 is a tidyverse package for creating graphics. This section uses ggplot2 to provide examples of tidyverse data visualization, but is not intended to be an introduction to ggplot2. To learn more about data visualization, read Chapter 3 “Data visualization” of R for Data Science, by Garrett Grolemund and Hadley Wickham.
The RC experiment has a 2x2 factorial design. Line graphs work well for visualizing 2x2 factorial designs.
To learn more about factorial design visualization, see Chapter 10.2 “Interpreting main effects and interactions” of Answering Questions with Data by Matthew Crump.
Create a line graph for the most important region, the critical RC region:
# Pick out first pass data from the RC region only, and drop rows with missing values
fp_RC <- fp %>%
filter(region2 == "RC") %>%
drop_na()
# Create a line graph for `fp_RC` / the RC region
ggplot(fp_RC, aes(x = pos, y = win, group = ext, color = ext)) +
labs(title = "First Pass Times", x = "Matrix position (pos)",
y = "Winsorized fixation time (ms)", color = "Extraction type (ext)",
subtitle = "RC region") +
# `fun` replaces the `fun.y` argument, which is deprecated as of ggplot2 3.3.0
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
theme(legend.position = "bottom", legend.box = "horizontal",
legend.background = element_rect(fill = "grey90"),
strip.background = element_rect(fill="grey90"))
In the RC experiment, the RC region is the most important. However, visualizing all regions can be helpful in interpreting main effects and interactions.
You can use ggplot2::facet_wrap() to create a facet, or subplot, for each region and display all facets in a single row.
There are two region orders, so there should be two facet orders:
Create facets for all regions, following the SUB item region order:
# Create a copy of `fp` that coerces the `region2` column into a factor with levels.
# Specifying the levels sets the facet order.
fp_sub <- fp %>%
mutate(region2 = factor(region2, levels = c("intro", "msub", "RC", "spillover", "mverb", "mobj", "final"))) %>%
drop_na()
# SUB item region order
ggplot(fp_sub, aes(x = pos, y = win, group = ext, color = ext)) +
labs(title = "First Pass Times", x = "Matrix position (pos)",
y = "Fixation time (ms), winsorized", color = "Extraction type (ext)",
subtitle = "SUB item region order") +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
theme(legend.position = "bottom", legend.box = "horizontal",
legend.background = element_rect(fill = "grey90"),
strip.background = element_rect(fill="grey90")) +
# Create facet for each region and display all facets in a single row
facet_wrap( ~ region2, nrow = 1)
Create facets for all regions, following the OBJ item region order:
# Create a copy of `fp` that coerces the `region2` column into a factor with levels.
# Specifying the levels sets the facet order.
fp_obj <- fp %>%
mutate(region2 = factor(region2, levels = c("intro", "msub", "mverb", "mobj", "RC", "spillover", "final"))) %>%
drop_na()
# OBJ item region order
ggplot(fp_obj, aes(x = pos, y = win, group = ext, color = ext)) +
labs(title = "First Pass Times", x = "Matrix position (pos)",
y = "Fixation time (ms), winsorized", color = "Extraction type (ext)",
subtitle = "OBJ item region order") +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
theme(legend.position = "bottom", legend.box = "horizontal",
legend.background = element_rect(fill = "grey90"),
strip.background = element_rect(fill="grey90")) +
# Create facet for each region and display all facets in a single row
facet_wrap( ~ region2, nrow = 1)
You can define a function in R with the following syntax:
# Define `function_name()`
function_name <- function(argument1, argument2, ...){
# some computation involving the arguments
}
# Call `function_name()`
function_name(argument1, argument2)
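As a concrete instance of this syntax, here is a small hypothetical function, `ms_to_s()`, that converts fixation times from milliseconds to seconds. It is not used elsewhere in this guide.

```r
# Define `ms_to_s()`: convert milliseconds to seconds
ms_to_s <- function(ms){
  # The value of the last evaluated expression is returned
  ms / 1000
}

# Call `ms_to_s()` on a vector of fixation times
ms_to_s(c(318, 311, 488))  # 0.318 0.311 0.488
```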
User-defined functions (UDFs) allow you to reuse code easily. Writing and calling a UDF is more powerful than copying and pasting code, because copied code is redundant and error-prone.
To learn more about user-defined functions, read Chapter 19 “Functions” of R for Data Science, by Garrett Grolemund and Hadley Wickham.
This section provides an example of defining a function to make data manipulation easier.
combine_sub_obj()
In Section 3 Data manipulation, we read in the fp-rcs.ixs and fp-rco.ixs files and performed various manipulations to create the first pass times fp data frame:
#### First pass times
# Read in `fp-rcs.ixs`
fp_rcs <- read_delim("./fp-rcs.ixs", delim = ",", col_names = TRUE) %>%
mutate(pos = "SUB",
ext = if_else(cond == 5, "SRC", "ORC"))
# Read in `fp-rco.ixs`
fp_rco <- read_delim("./fp-rco.ixs", delim = ",", col_names = TRUE) %>%
mutate(pos = "OBJ",
ext = if_else(cond == 7, "SRC", "ORC"))
# Combine `fp_rcs` and `fp_rco` data frames
fp <- bind_rows(fp_rcs, fp_rco) %>%
mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
ext = factor(ext, levels = c("SRC", "ORC")),
cond2 = str_c(pos, ext, sep = "-"),
cond2 = factor(cond2, levels = c("SUB-SRC", "SUB-ORC", "OBJ-SRC", "OBJ-ORC")),
region2 = case_when(region == 1 ~ "intro",
region == 2 ~ "msub",
region == 3 & pos == "SUB" ~ "RC",
region == 3 & pos == "OBJ" ~ "mverb",
region == 4 & pos == "SUB" ~ "spillover",
region == 4 & pos == "OBJ" ~ "mobj",
region == 5 & pos == "SUB" ~ "mverb",
region == 5 & pos == "OBJ" ~ "RC",
region == 6 & pos == "SUB" ~ "mobj",
region == 6 & pos == "OBJ" ~ "spillover",
region == 7 ~ "final"))
First pass times are just one eye tracking metric. EyeDry calculates other eye tracking metrics, like right-bounded times and second pass times.
The data for each eye tracking metric is contained in a pair of <metric>-rcs.ixs and <metric>-rco.ixs files that must be read in and transformed like the first pass times data:
rb-rcs.ixs: right-bounded times data for SUB items
rb-rco.ixs: right-bounded times data for OBJ items
sp-rcs.ixs: second pass times data for SUB items
sp-rco.ixs: second pass times data for OBJ items
It would be redundant and error-prone to copy-and-paste the dplyr manipulations for every eye tracking metric.
Instead, define a function like combine_sub_obj():
# `combine_sub_obj()`: Read in, combine, and manipulate the `.ixs` files
combine_sub_obj <- function(sub, obj){
rcs <- read_delim(sub, delim = ",", col_names = TRUE) %>%
mutate(pos = "SUB",
ext = if_else(cond == 5, "SRC", "ORC"))
rco <- read_delim(obj, delim = ",", col_names = TRUE) %>%
mutate(pos = "OBJ",
ext = if_else(cond == 7, "SRC", "ORC"))
bind_rows(rcs, rco) %>%
mutate(pos = factor(pos, levels = c("SUB", "OBJ")),
ext = factor(ext, levels = c("SRC", "ORC")),
cond2 = str_c(pos, ext, sep = "-"),
cond2 = factor(cond2, levels = c("SUB-SRC", "SUB-ORC", "OBJ-SRC", "OBJ-ORC")),
region2 = case_when(region == 1 ~ "intro",
region == 2 ~ "msub",
region == 3 & pos == "SUB" ~ "RC",
region == 3 & pos == "OBJ" ~ "mverb",
region == 4 & pos == "SUB" ~ "spillover",
region == 4 & pos == "OBJ" ~ "mobj",
region == 5 & pos == "SUB" ~ "mverb",
region == 5 & pos == "OBJ" ~ "RC",
region == 6 & pos == "SUB" ~ "mobj",
region == 6 & pos == "OBJ" ~ "spillover",
region == 7 ~ "final"))
}
Create right-bounded times data frame:
# Right-bounded times
rb <- combine_sub_obj("./rb-rcs.ixs", "./rb-rco.ixs")
rb
# A tibble: 4,151 x 10
seq subj item cond region datum pos ext cond2 region2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <chr>
1 26 1 1 5 1 318 SUB SRC SUB-SRC intro
2 26 1 1 5 2 311 SUB SRC SUB-SRC msub
3 26 1 1 5 3 488 SUB SRC SUB-SRC RC
4 26 1 1 5 4 350 SUB SRC SUB-SRC spillover
5 26 1 1 5 5 171 SUB SRC SUB-SRC mverb
6 26 1 1 5 6 356 SUB SRC SUB-SRC mobj
7 26 1 1 5 7 1271 SUB SRC SUB-SRC final
8 27 1 2 6 1 470 SUB ORC SUB-ORC intro
9 27 1 2 6 2 133 SUB ORC SUB-ORC msub
10 27 1 2 6 3 405 SUB ORC SUB-ORC RC
# … with 4,141 more rows
Create second pass times data frame:
# Second pass times
sp <- combine_sub_obj("./sp-rcs.ixs", "./sp-rco.ixs")
sp
# A tibble: 4,151 x 10
seq subj item cond region datum pos ext cond2 region2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <chr>
1 26 1 1 5 1 0 SUB SRC SUB-SRC intro
2 26 1 1 5 2 0 SUB SRC SUB-SRC msub
3 26 1 1 5 3 456 SUB SRC SUB-SRC RC
4 26 1 1 5 4 436 SUB SRC SUB-SRC spillover
5 26 1 1 5 5 466 SUB SRC SUB-SRC mverb
6 26 1 1 5 6 373 SUB SRC SUB-SRC mobj
7 26 1 1 5 7 0 SUB SRC SUB-SRC final
8 27 1 2 6 1 0 SUB ORC SUB-ORC intro
9 27 1 2 6 2 249 SUB ORC SUB-ORC msub
10 27 1 2 6 3 944 SUB ORC SUB-ORC RC
# … with 4,141 more rows
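Since the file names follow the <metric>-rcs.ixs / <metric>-rco.ixs pattern, the two paths passed to combine_sub_obj() could also be built from the metric abbreviation. The helper `ixs_paths()` below is a hypothetical sketch of that idea and is not part of the original analysis.

```r
# Hypothetical helper: build the SUB and OBJ file paths for one metric
ixs_paths <- function(metric, dir = "."){
  c(sub = file.path(dir, paste0(metric, "-rcs.ixs")),
    obj = file.path(dir, paste0(metric, "-rco.ixs")))
}

paths <- ixs_paths("rb")
paths

# With the files present, the paths could be passed on like this:
# rb <- combine_sub_obj(paths[["sub"]], paths[["obj"]])
```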
This section provides examples of defining a function to make data analysis easier.
win_datum()
Section 4.1 Winsorizing described winsorizing first pass data:
fp <- fp %>%
group_by(cond2, region2) %>%
mutate(win = winsor(datum, trim = 0.1, na.rm = TRUE)) %>%
ungroup()
Define a function like win_datum() to automate this process:
# `win_datum()`: Winsorize the `datum` column by condition and region
win_datum <- function(df){
  df %>%
    group_by(cond2, region2) %>%
    mutate(win = winsor(datum, trim = 0.1, na.rm = TRUE)) %>%
    ungroup()
}
Winsorize right-bounded times:
# Winsorize right-bounded times by condition and region
rb <- rb %>%
win_datum()
rb
# A tibble: 4,151 x 11
seq subj item cond region datum pos ext cond2 region2 win
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <chr> <dbl>
1 26 1 1 5 1 318 SUB SRC SUB-SRC intro 318
2 26 1 1 5 2 311 SUB SRC SUB-SRC msub 311
3 26 1 1 5 3 488 SUB SRC SUB-SRC RC 488
4 26 1 1 5 4 350 SUB SRC SUB-SRC spillover 350
5 26 1 1 5 5 171 SUB SRC SUB-SRC mverb 187.
6 26 1 1 5 6 356 SUB SRC SUB-SRC mobj 356
7 26 1 1 5 7 1271 SUB SRC SUB-SRC final 1271
8 27 1 2 6 1 470 SUB ORC SUB-ORC intro 470
9 27 1 2 6 2 133 SUB ORC SUB-ORC msub 189
10 27 1 2 6 3 405 SUB ORC SUB-ORC RC 606.
# … with 4,141 more rows
Second pass times are generally not winsorized, so do not call win_datum() on the sp data frame.
mean_summary()
Section 4.2 Means and standard errors described calculating mean first pass times:
fp %>%
group_by(region2, cond2) %>%
summarize(avg = mean(win, na.rm = T), ste = std.error(win, na.rm = T))
# A tibble: 28 x 4
# Groups: region2 [7]
region2 cond2 avg ste
<chr> <fct> <dbl> <dbl>
1 final SUB-SRC 585. 26.7
2 final SUB-ORC 580. 23.2
3 final OBJ-SRC 545. 22.1
4 final OBJ-ORC 545. 22.2
5 intro SUB-SRC 523. 19.7
6 intro SUB-ORC 567. 20.5
7 intro OBJ-SRC 522. 21.0
8 intro OBJ-ORC 526. 19.7
9 mobj SUB-SRC 308. 8.86
10 mobj SUB-ORC 312. 10.0
# … with 18 more rows
Define a function like mean_summary() to automate this process:
# `mean_summary()`: Calculate the mean and standard error of `column` by condition and region
mean_summary <- function(df, column){
  # Capture the unquoted column name so that it can be unquoted with `!!` below
  column <- enquo(column)
  df %>%
    group_by(cond2, region2) %>%
    summarize(avg = mean(!!column, na.rm = TRUE), ste = std.error(!!column, na.rm = TRUE)) %>%
    ungroup()
}
mean_summary() takes 2 arguments:
df: the data frame
column: the column to calculate the mean of (datum for unwinsorized data, win for winsorized data)
Calculate the mean right-bounded times:
# Average right-bounded times (winsorized)
rb %>%
mean_summary(win)
# A tibble: 28 x 4
cond2 region2 avg ste
<fct> <chr> <dbl> <dbl>
1 SUB-SRC final 838. 32.8
2 SUB-SRC intro 531. 19.8
3 SUB-SRC mobj 343. 11.2
4 SUB-SRC msub 340. 11.5
5 SUB-SRC mverb 341. 11.6
6 SUB-SRC RC 833. 21.6
7 SUB-SRC spillover 528. 17.9
8 SUB-ORC final 810 32.2
9 SUB-ORC intro 570. 20.4
10 SUB-ORC mobj 375. 14.8
# … with 18 more rows
Calculate the mean second pass times:
# Average second pass times (not winsorized)
sp %>%
mean_summary(datum)
# A tibble: 28 x 4
cond2 region2 avg ste
<fct> <chr> <dbl> <dbl>
1 SUB-SRC final 31.3 15.2
2 SUB-SRC intro 201. 29.3
3 SUB-SRC mobj 233. 26.1
4 SUB-SRC msub 280. 28.0
5 SUB-SRC mverb 326. 38.4
6 SUB-SRC RC 769. 78.1
7 SUB-SRC spillover 454. 50.8
8 SUB-ORC final 33.0 13.6
9 SUB-ORC intro 238. 36.6
10 SUB-ORC mobj 286. 35.1
# … with 18 more rows
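In more recent versions of the tidyverse, the enquo() and !! pair can be written more compactly with the embrace operator `{{ }}`. The sketch below is a hypothetical variant, `mean_summary2()`, run on toy data. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and rlang 0.4.0 or later, and it computes the standard error directly as sd / sqrt(n) so that it does not depend on plotrix.

```r
library(dplyr)

# Hypothetical variant of `mean_summary()` using the embrace operator
mean_summary2 <- function(df, column){
  df %>%
    group_by(cond2, region2) %>%
    summarize(avg = mean({{ column }}, na.rm = TRUE),
              # Standard error: sd divided by the square root of the non-missing count
              ste = sd({{ column }}, na.rm = TRUE) / sqrt(sum(!is.na({{ column }}))),
              .groups = "drop")
}

# Toy data standing in for a winsorized data frame
toy <- tibble(cond2   = c("SUB-SRC", "SUB-SRC", "SUB-ORC", "SUB-ORC"),
              region2 = "RC",
              win     = c(400, 500, 600, NA))

res <- mean_summary2(toy, win)
res
```

Both styles do the same job; `{{ }}` simply combines the capture and unquote steps in one place.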
This section provides examples of defining a function to make data visualization easier.
linegraph_sub()
Section 5.2.1 SUB item facet order described graphing first pass times, by SUB item region order:
ggplot(fp_sub, aes(x = pos, y = win, group = ext, color = ext)) +
labs(title = "First Pass Times", x = "Matrix position (pos)",
y = "Fixation time (ms), winsorized", color = "Extraction type (ext)",
subtitle = "SUB item region order") +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
theme(legend.position = "bottom", legend.box = "horizontal",
legend.background = element_rect(fill = "grey90"),
strip.background = element_rect(fill="grey90")) +
facet_wrap( ~ region2, nrow = 1)
Define a function like linegraph_sub() to automate this process:
# `linegraph_sub()`: Create facets for all regions, following the SUB item order
linegraph_sub <- function(df, y.value, y.label, add.title){
y.value <- enquo(y.value)
df <- df %>%
mutate(region2 = factor(region2, levels = c("intro", "msub", "RC", "spillover", "mverb", "mobj", "final"))) %>%
drop_na()
ggplot(df, aes(x = pos, y = !!y.value, group = ext, color = ext)) +
labs(title = add.title, x = "Matrix position (pos)",
y = y.label, color = "Extraction type (ext)",
subtitle = "SUB Condition Order") +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
theme(legend.position = "bottom", legend.box = "horizontal",
legend.background = element_rect(fill = "grey90"),
strip.background = element_rect(fill="grey90")) +
facet_wrap( ~ region2, nrow = 1)
}
linegraph_sub() takes 4 arguments:
df: the data frame
y.value: the y-value of the graph (datum for unwinsorized data, win for winsorized data)
y.label: the label for the y-value
add.title: the title of the graph
Graph right-bounded times:
# Graph of right-bounded times for all regions, SUB item order
linegraph_sub(rb, win, "Fixation time (ms), winsorized", "Right-bounded Times")
Graph second pass times:
# Graph of second pass times for all regions, SUB item order
linegraph_sub(sp, datum, "Fixation time (ms)", "Second Pass Times")
linegraph_obj()
Section 5.2.2 OBJ item facet order described graphing first pass times, by OBJ item region order:
ggplot(fp_obj, aes(x = pos, y = win, group = ext, color = ext)) +
labs(title = "First Pass Times", x = "Matrix position (pos)",
y = "Fixation time (ms), winsorized", color = "Extraction type (ext)",
subtitle = "OBJ item region order") +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
theme(legend.position = "bottom", legend.box = "horizontal",
legend.background = element_rect(fill = "grey90"),
strip.background = element_rect(fill="grey90")) +
facet_wrap( ~ region2, nrow = 1)
Define a function like linegraph_obj() to automate this process:
# `linegraph_obj()`: Create facets for all regions, following the OBJ item order
linegraph_obj <- function(df, y.value, y.label, add.title){
y.value <- enquo(y.value)
df <- df %>%
mutate(region2 = factor(region2, levels = c("intro", "msub", "mverb", "mobj", "RC", "spillover", "final"))) %>%
drop_na()
ggplot(df, aes(x = pos, y = !!y.value, group = ext, color = ext)) +
labs(title = add.title, x = "Matrix position (pos)",
y = y.label, color = "Extraction type (ext)",
subtitle = "OBJ Condition Order") +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
theme(legend.position = "bottom", legend.box = "horizontal",
legend.background = element_rect(fill = "grey90"),
strip.background = element_rect(fill="grey90")) +
facet_wrap( ~ region2, nrow = 1)
}
linegraph_obj() takes 4 arguments:
df: the data frame
y.value: the y-value of the graph (datum for unwinsorized data, win for winsorized data)
y.label: the label for the y-value
add.title: the title of the graph
Graph right-bounded times:
# Graph of right-bounded times for all regions, OBJ item order
linegraph_obj(rb, win, "Fixation time (ms), winsorized", "Right-bounded Times")
Graph second pass times:
# Graph of second pass times for all regions, OBJ item order
linegraph_obj(sp, datum, "Fixation time (ms)", "Second Pass Times")
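linegraph_sub() and linegraph_obj() differ only in the facet order and the subtitle, so the same idea can be taken one step further with a single function that takes the region order as an argument. The function `linegraph_regions()` below is a hypothetical sketch, not part of the original analysis. It assumes ggplot2 3.3.0 or later (for the `fun` argument), uses the embrace operator `{{ }}` in place of enquo() and !!, simplifies the theme settings, and is demonstrated on made-up toy data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: one function for both facet orders
linegraph_regions <- function(df, y.value, y.label, add.title, order = c("SUB", "OBJ")){
  order <- match.arg(order)
  # Pick the facet order that matches the requested item type
  region_levels <- switch(order,
    SUB = c("intro", "msub", "RC", "spillover", "mverb", "mobj", "final"),
    OBJ = c("intro", "msub", "mverb", "mobj", "RC", "spillover", "final"))
  df <- df %>%
    mutate(region2 = factor(region2, levels = region_levels)) %>%
    drop_na()
  ggplot(df, aes(x = pos, y = {{ y.value }}, group = ext, color = ext)) +
    labs(title = add.title, x = "Matrix position (pos)", y = y.label,
         color = "Extraction type (ext)",
         subtitle = paste(order, "item region order")) +
    stat_summary(fun = mean, geom = "line") +
    stat_summary(fun = mean, geom = "point") +
    stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
    theme(legend.position = "bottom") +
    facet_wrap(~ region2, nrow = 1)
}

# Made-up toy data with the columns the function expects
toy <- expand.grid(pos = c("SUB", "OBJ"), ext = c("SRC", "ORC"),
                   region2 = c("intro", "RC", "final"), rep = 1:3,
                   stringsAsFactors = FALSE)
toy$win <- 300 + 10 * seq_len(nrow(toy))

p <- linegraph_regions(toy, win, "Fixation time (ms), winsorized", "Toy Data", order = "OBJ")
```

Printing `p` draws the plot. Keeping two separate functions, as in the sections above, is equally valid and arguably easier to read for newcomers.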