How optimistic are you? Are you more or less optimistic than you were when you were 16? To study this, 50 subjects were given an optimism test at age 16. These same subjects were give the same test at ages 20 and 24. A total of 18 subjects completed the survey at all 3 ages. The data for these 18 subjects is displayed below:
# Load optimism data
optimism <- read.csv("http://www.bradthiessen.com/html5/data/opt.csv")
## Change subject and age to factor variables
optimism$age <-as.factor(optimism$age)
optimism$subject <-as.factor(optimism$subject)
## Summary of data
favstats(optimism ~ age, data=optimism)
## age min Q1 median Q3 max mean sd n missing
## 1 16 13 24.00 27 30.50 35 25.88889 6.578833 18 0
## 2 20 13 25.25 29 31.75 39 26.94444 7.487026 18 0
## 3 24 12 19.00 24 27.75 32 23.38889 5.922429 18 0
## Some summary plots
densityplot(~optimism | age, data=optimism, lwd=2,
main="Optimism by age", layout=c(1,3))
## Levene's test
leveneTest(optimism ~ age, data=optimism)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.1673 0.8464
## 51
Let’s pretend as though the data in this study come from 3 independent groups of subjects (and not the same subjects at 3 different ages). With this erroneous assumption, let’s conduct a one-way ANOVA:
# Summarize ("anova") the analysis ("aov")
anova(aov(optimism ~ age, data=optimism))
## Analysis of Variance Table
##
## Response: optimism
## Df Sum Sq Mean Sq F value Pr(>F)
## age 2 120.04 60.019 1.3396 0.271
## Residuals 51 2285.00 44.804
# Post-hoc tests with Holm's method
with(optimism, pairwise.t.test(optimism, age, p.adjust.method="holm"))
##
## Pairwise comparisons using t tests with pooled SD
##
## data: optimism and age
##
## 16 20
## 20 0.64 -
## 24 0.54 0.35
##
## P value adjustment method: holm
The data in this study did not come from 3 independent groups. The scores are from the same 18 subjects measured 3 different times (at ages 16, 20, and 24). This is called a repeated measures design.
Before we conduct an AxS (repeated measures) ANOVA, let’s calculate the mean optimism score for each of the 18 subjects in this study.
# Mean optimism for each subject
mean(optimism ~ subject, data=optimism)
## 1 2 3 4 5 6 7 8
## 35.33333 32.66667 31.00000 31.00000 30.00000 29.33333 29.00000 27.66667
## 9 10 11 12 13 14 15 16
## 27.33333 26.33333 25.66667 25.33333 24.00000 22.66667 17.00000 16.33333
## 17 18
## 13.66667 13.00000
We can display those means alongside our data.
The age effect is what we’re interested in estimating. When we analyzed the data with a one-way ANOVA, we treated all other sources of variation as unexplained error variance.
In an AxS design, we can partition this unexplained error variance further (by explaining some of it as individual subject differences).
We know some of the variation in optimism within and across the age groups is due to pre-existing individual differences. For example, subject #3 tends to be more optimistic than subject #18. If we can estimate these subject-effects, we can remove them from our unexplained error variance.
After estimating the age and subject effects, the only remaining variation would be due to the interaction between a subject and his or her age. Some people will become more optimistic over time, some will become less optimistic, and others will have other paths. This variation is still considered unexplained, because it includes other potential factors, too.
Let’s take a look at the paths of our subjects across the age groups:
# Profile plots (optimism by subject at each age)
interaction.plot(x.factor=optimism$age, trace.factor=optimism$subject,
response=optimism$optimism,
fun=mean, type="b", legend=F,
ylab = "Mean optimism score", xlab = "Age group")
# This plot looks a little nicer. It uses the ggplot2 package
library(ggplot2)
# Here's the syntax
ggplot(data = optimism, aes(x = age, y = optimism, group = subject)) +
ggtitle("Optimism by age") +
ylab("optimism") + geom_line()
To calculate an AxS ANOVA by hand, you could use the following formulas:
We can calculate everything in R:
# AxS ANOVA
# We specify the error term as "subjects nested within age groups"
axs <- aov(optimism ~ age + Error(subject/age), data=optimism)
# We have to use "summary" (and not "anova") to get the summary table
summary(axs)
##
## Error: subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 17 2182 128.3
##
## Error: subject:age
## Df Sum Sq Mean Sq F value Pr(>F)
## age 2 120.0 60.02 19.75 2.03e-06 ***
## Residuals 34 103.3 3.04
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here are the same results calculated by hand:
Suppose we’re interested in the the effectiveness of statistics textbooks. If we sample 3 classrooms and randomly assign them to use one of 3 different textbooks, our data could look like this:
Let’s look at a different dataset. In Iowa, local governments can be organized in several ways (e.g., mayor and council with all officers elected at large; mayor elected at large with council members elected by district; council members elected by district and mayor elected from within the council; variations of these organizational types with and without a hired city manager; etc.).
A political scientist wanted to investigate whether some organizations of local governments result in more content citizens than others. He identified 4 types of government (A1, A2, A3, A4) and interviewed citizens regarding their satisfaction with the local government’s responsiveness to the needs and concerns of citizens.
5 communities were identified within each type of government organization. Within each community, 50 heads of households were randomly selected for interviews. The data are summarized below:
This Groups within Treatments or (GwT ANOVA) design requires two separate analyses. First, we will begin under the hope that the 860 individuals in the study are independent observations. With this hope, we’ll test to see if we find significant dependencies within each community. If we find significant dependencies, we’ll need to re-run the analysis with only 20 independent observations.
Here are the formulas we’d use to run the first minor unit analysis:
Our data would produce the following summary table. Note that weighted averages are used in all calculations (averages weighted by the number of individuals in each community):
If that mean square ratio were not statistically significant, we could treat all 860 individuals as independent observations and run a one-way ANOVA (with SSE = SSw + SSgwa).
Since the mean square ratio in this scenario produces a p-value less than 0.05, we may conclude that individuals within communities are not independent. Then we’d have to run a major units analysis using unweighted means (ignoring the sample size within each community):
Here are the results from this dataset:
From this, we could conclude the types of governments are associated with different average levels of contentment.
When you’re ready to try the Your Turn exercises:
Sources:
This document is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license.