Examining data
Collecting data
We’ll run the experiment to collect data that we can examine.
- Uncomment the DebugOff command (shown in the code below), since we are now ready to collect data.
- Delete any files that already exist in the experiment project page’s Results folder.
- Run the experiment at least twice.
- Save the results file as a CSV file named results.csv.
@// Type code below this line.
@
@// Remove command prefix
@PennController.ResetPrefix(null)
@
@// Turn off debugger
!DebugOff()
@
@// Control trial sequence
*Sequence("consent", "instructions", randomize("experimental-trial"), "completion_screen")
@
@// Instructions
@// code omitted in the interest of space
Reading in results
This section assumes prior knowledge of R.
We’ll use this sample results file, results.csv.
- Add one of the following code blocks to your R script to:
  - Create a user-defined function that reads in a PennController results file in CSV format.
  - Read in results.csv and save it as a data frame named results.
- Make sure that your R script and results.csv are in the same folder.
Base R version:
# Set working directory to source file location
# User-defined function to read in PCIbex Farm results files
read.pcibex <- function(filepath, auto.colnames=TRUE, fun.col=function(col,cols){cols[cols==col]<-paste(col,"Ibex",sep=".");return(cols)}) {
  n.cols <- max(count.fields(filepath,sep=",",quote=NULL),na.rm=TRUE)
  if (auto.colnames){
    cols <- c()
    con <- file(filepath, "r")
    while ( TRUE ) {
      line <- readLines(con, n = 1, warn=FALSE)
      if ( length(line) == 0) {
        break
      }
      m <- regmatches(line,regexec("^# (\\d+)\\. (.+)\\.$",line))[[1]]
      if (length(m) == 3) {
        index <- as.numeric(m[2])
        value <- m[3]
        if (index < length(cols)){
          cols <- c()
        }
        if (is.function(fun.col)){
          cols <- fun.col(value,cols)
        }
        cols[index] <- value
        if (index == n.cols){
          break
        }
      }
    }
    close(con)
    return(read.csv(filepath, comment.char="#", header=FALSE, col.names=cols))
  }
  else{
    return(read.csv(filepath, comment.char="#", header=FALSE, col.names=seq(1:n.cols)))
  }
}
# Read in results file
results <- read.pcibex("results.csv")
Tidyverse version:
# Set working directory to source file location
# Load tidyverse package
library(tidyverse)
# User-defined function to read in PCIbex Farm results files
read.pcibex <- function(filepath, auto.colnames=TRUE, fun.col=function(col,cols){cols[cols==col]<-paste(col,"Ibex",sep=".");return(cols)}) {
  n.cols <- max(count.fields(filepath,sep=",",quote=NULL),na.rm=TRUE)
  if (auto.colnames){
    cols <- c()
    con <- file(filepath, "r")
    while ( TRUE ) {
      line <- readLines(con, n = 1, warn=FALSE)
      if ( length(line) == 0) {
        break
      }
      m <- regmatches(line,regexec("^# (\\d+)\\. (.+)\\.$",line))[[1]]
      if (length(m) == 3) {
        index <- as.numeric(m[2])
        value <- m[3]
        if (index < length(cols)){
          cols <- c()
        }
        if (is.function(fun.col)){
          cols <- fun.col(value,cols)
        }
        cols[index] <- value
        if (index == n.cols){
          break
        }
      }
    }
    close(con)
    return(read_csv(filepath, comment="#", col_names=cols))
  }
  else{
    return(read_csv(filepath, comment="#", col_names=seq(1:n.cols)))
  }
}
# Read in results file
results <- read.pcibex("results.csv")
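Both versions of read.pcibex recover column names from comment lines in the results file, using the regular expression `^# (\d+)\. (.+)\.$`. The sketch below applies that same pattern to a few made-up lines to show what gets extracted. The sample lines and the helper name `parse_header` are illustrative assumptions, not copied from a real results file.

```r
# Same pattern as in read.pcibex: "# <index>. <column name>."
pattern <- "^# (\\d+)\\. (.+)\\.$"

# Hypothetical lines, for illustration only
sample_lines <- c(
  "# 1. Results reception time.",
  "# 2. Some column name.",
  "# Not a column comment",
  "1603397451,abc"
)

# Hypothetical helper: returns the index and name, or NULL if the line does not match
parse_header <- function(line) {
  m <- regmatches(line, regexec(pattern, line))[[1]]
  if (length(m) == 3) {
    list(index = as.numeric(m[2]), name = m[3])
  } else {
    NULL
  }
}

parsed <- lapply(sample_lines, parse_header)
str(parsed)
```

Only the first two lines match, so only they would contribute column names. Comment lines that do not follow the numbered pattern, and data rows, are skipped.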
If you’re using the tidyverse, you may see a warning like the following when you create the results tibble:
Warning: 8 parsing failures.
row col expected actual file
1 -- 17 columns 13 columns 'results.csv'
2 -- 17 columns 13 columns 'results.csv'
3 -- 17 columns 13 columns 'results.csv'
4 -- 17 columns 13 columns 'results.csv'
21 -- 17 columns 13 columns 'results.csv'
... ... .......... .......... .............
See problems(...) for more details.
Don’t worry! You can ignore this message. The readr::read_csv() function throws a warning because some rows have a different number of columns:
- The rows that log the "consent" and "instructions" trials have the default 13 columns.
- The rows that log the "experimental-trial" trials have the default 13 columns plus 4 columns added by the log method (group, item, condition, and ID).
If you’re using base R, the pre-installed utils::read.csv() function won’t throw such a warning.
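As a rough illustration of why base R stays quiet, the sketch below reads a toy CSV whose rows have different lengths. It assumes the default `fill = TRUE` behavior of read.csv. The data and column names are made up, and the real results file has 13 and 17 columns rather than 3 and 5.

```r
# Toy input: the first row has 3 fields, the second has 5
csv_text <- "1,2,3\n4,5,6,7,8"

# Supplying 5 column names tells read.csv to expect 5 columns;
# shorter rows are padded rather than reported as a problem
toy <- read.csv(text = csv_text, header = FALSE,
                col.names = c("a", "b", "c", "d", "e"))
print(toy)
```

In this numeric example the padded cells of the short row come out as NA, which is the same gap that read_csv warns about.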
Tidying and analyzing data (optional)
This section uses the tidyverse to transform and analyze data (prior knowledge of the tidyverse assumed). The code blocks in this section are suggestions that can be modified as desired.
If you’re using base R, you can skip ahead to Wrapping up.
Tidyverse functions are designed to work with tidy data, meaning that:
- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.
The results tibble is not tidy, because every "experimental-trial" trial is split into 4 rows:
- Trial start
- Information logged from the "side-by-side" Canvas
- Information logged from the "selection" Selector
- Trial end
Tidy the results tibble:
More details:
Add the following code block to your R script to:
- Keep only rows that log information about the "side-by-side" Canvas or "selection" Selector.
- Keep only the ID, group, item, condition, PennElementName, Value, and EventTime columns.
- Group by the ID and item variables.
- Create the event and selection columns, and coerce the EventTime column from a character vector to a double vector.
- Drop the PennElementName and Value columns (necessary for pivot_wider()).
- “Widen” the tibble. For a more in-depth explanation, see Transforming data in R.
- Save the tidied data as a new tibble named tidied_results.
tidied_results <- results %>%
  filter(PennElementName == "side-by-side" | PennElementName == "selection") %>%
  select(ID, group, item, condition, PennElementName, Value, EventTime) %>%
  group_by(ID, item) %>%
  mutate(event = case_when(PennElementName == "side-by-side" ~ "canvas_time",
                           PennElementName == "selection" ~ "selection_time"),
         # Within each ID/item group, record which image (if any) was selected
         selection = case_when("singular" %in% Value ~ "singular",
                               "plural" %in% Value ~ "plural",
                               TRUE ~ NA_character_),
         # "Never" means the event did not happen, so it becomes NA
         EventTime = if_else(EventTime == "Never", NA_real_, suppressWarnings(as.numeric(EventTime)))) %>%
  ungroup() %>%
  select(-PennElementName, -Value) %>%
  pivot_wider(names_from = event, values_from = EventTime)
Note: You may need to scroll to the right to see all the columns.
ID | group | item | condition | selection | canvas_time | selection_time |
---|---|---|---|---|---|---|
SOME_ID | B | 4 | plural | plural | 1603397451156 | 1603397453036 |
SOME_ID | B | 2 | plural | NA | 1603397454060 | NA |
SOME_ID | B | 3 | singular | singular | 1603397457722 | 1603397459321 |
SOME_ID | B | 1 | singular | singular | 1603397460332 | 1603397461856 |
ANOTHER_ID | B | 1 | singular | singular | 1603398704462 | 1603398706007 |
ANOTHER_ID | B | 4 | plural | plural | 1603398707019 | 1603398708549 |
ANOTHER_ID | B | 3 | singular | plural | 1603398709562 | 1603398711692 |
ANOTHER_ID | B | 2 | plural | plural | 1603398712705 | 1603398714189 |
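To see the widening step in isolation, here is a minimal sketch on a made-up data frame. The participant ID `P1` and the small time values are hypothetical. It also hints at why PennElementName and Value are dropped first: any extra column that differs between the two rows of a trial keeps pivot_wider() from merging them.

```r
library(tidyr)

# Toy long-format data: two rows per trial, one per event
long <- data.frame(
  ID = c("P1", "P1", "P1", "P1"),
  item = c(1, 1, 2, 2),
  event = c("canvas_time", "selection_time", "canvas_time", "selection_time"),
  EventTime = c(1000, 2500, 3000, NA)
)

# One row per ID/item, with one column per event
wide <- pivot_wider(long, names_from = event, values_from = EventTime)
print(wide)

# With an extra column that differs within a trial, rows are not merged
long_extra <- cbind(long, PennElementName = c("side-by-side", "selection",
                                              "side-by-side", "selection"))
not_merged <- pivot_wider(long_extra, names_from = event, values_from = EventTime)
print(not_merged)
```

The first result has one row per trial. The second keeps four rows, each with an NA in one of the two time columns.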
You can analyze the tidied data in a variety of ways, for example:
- Calculate reaction times and response accuracy.
- Calculate average reaction time by condition.
- Calculate average response accuracy by participant.
More details:
- Calculate reaction times and response accuracy:
  - Create the reaction_time column by subtracting the canvas_time value from the selection_time value. The resulting value is how long it took a participant to select an image once the images were printed to the screen.
  - Create the correct column by comparing the condition and selection columns. The resulting value is 1 if the participant selected the correct image, 0 if the participant selected the wrong image, and NA if the participant did not respond.
tidied_results <- tidied_results %>%
  mutate(reaction_time = selection_time - canvas_time,
         correct = if_else(condition == selection, 1, 0))
Result:
Note: You may need to scroll to the right to see all the columns.
ID | group | item | condition | selection | canvas_time | selection_time | reaction_time | correct |
---|---|---|---|---|---|---|---|---|
SOME_ID | B | 4 | plural | plural | 1603397451156 | 1603397453036 | 1880 | 1 |
SOME_ID | B | 2 | plural | NA | 1603397454060 | NA | NA | NA |
SOME_ID | B | 3 | singular | singular | 1603397457722 | 1603397459321 | 1599 | 1 |
SOME_ID | B | 1 | singular | singular | 1603397460332 | 1603397461856 | 1524 | 1 |
ANOTHER_ID | B | 1 | singular | singular | 1603398704462 | 1603398706007 | 1545 | 1 |
ANOTHER_ID | B | 4 | plural | plural | 1603398707019 | 1603398708549 | 1530 | 1 |
ANOTHER_ID | B | 3 | singular | plural | 1603398709562 | 1603398711692 | 2130 | 0 |
ANOTHER_ID | B | 2 | plural | plural | 1603398712705 | 1603398714189 | 1484 | 1 |
- Calculate the average reaction time by condition:
tidied_results %>%
  group_by(condition) %>%
  summarize(avg_rt = mean(reaction_time, na.rm = TRUE),
            n = sum(!is.na(reaction_time)))
Result:
condition | avg_rt | n |
---|---|---|
plural | 1631. | 3 |
singular | 1700. | 4 |
n is the number of items with a reaction time for a given condition, meaning that 1 item in the plural condition did not have a response.
- Calculate average response accuracy by participant:
tidied_results %>%
  group_by(ID) %>%
  summarize(accuracy = sum(correct, na.rm = TRUE) / sum(!is.na(correct)),
            answered = sum(!is.na(correct)) / n())
Result:
ID | accuracy | answered |
---|---|---|
ANOTHER_ID | 0.75 | 1 |
SOME_ID | 1 | 0.75 |
  - The ANOTHER_ID participant had 75% accuracy and 100% completeness, meaning that they responded correctly to 3 out of 4 items.
  - The SOME_ID participant had 100% accuracy and 75% completeness, meaning that they responded correctly to 3 out of 3 items, and did not respond to 1 item.
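For readers who skipped the tidyverse steps, the three calculations above can be approximated in base R. This is a sketch under assumptions, not part of the tutorial's script: the data frame `d` is retyped by hand from the tables above, and `tapply()` stands in for `group_by()` plus `summarize()`.

```r
# Hand-copied subset of the tidied columns shown in the tables above
d <- data.frame(
  ID = rep(c("SOME_ID", "ANOTHER_ID"), each = 4),
  condition = c("plural", "plural", "singular", "singular",
                "singular", "plural", "singular", "plural"),
  selection = c("plural", NA, "singular", "singular",
                "singular", "plural", "plural", "plural"),
  canvas_time = c(1603397451156, 1603397454060, 1603397457722, 1603397460332,
                  1603398704462, 1603398707019, 1603398709562, 1603398712705),
  selection_time = c(1603397453036, NA, 1603397459321, 1603397461856,
                     1603398706007, 1603398708549, 1603398711692, 1603398714189)
)

# Reaction time and correctness; a missing response propagates as NA
d$reaction_time <- d$selection_time - d$canvas_time
d$correct <- ifelse(d$condition == d$selection, 1, 0)

# Average reaction time by condition, ignoring unanswered items
avg_rt <- tapply(d$reaction_time, d$condition, mean, na.rm = TRUE)

# Accuracy over answered items, and share of items answered, by participant
accuracy <- tapply(d$correct, d$ID, mean, na.rm = TRUE)
answered <- tapply(!is.na(d$correct), d$ID, mean)

print(avg_rt)
print(accuracy)
print(answered)
```

Taking `mean(correct, na.rm = TRUE)` is the same as dividing the number of correct responses by the number of answered items, so unanswered items lower `answered` but not `accuracy`.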