Let’s load the mosaic package:
require(mosaic)
Let’s first load the dataset, examine the first several rows, and generate some plots:
doctors <- read.csv(url("http://bradthiessen.com/html5/stats/m300/doctors.csv"))
head(doctors)
## weight time
## 1 average 15
## 2 average 15
## 3 average 45
## 4 average 40
## 5 average 45
## 6 average 20
favstats(time ~ weight, data=doctors)
## .group min Q1 median Q3 max mean sd n missing
## 1 average 15 25 30 40 50 31.36 9.864 33 0
## 2 obese 5 20 25 30 60 24.74 9.653 38 0
bwplot(time ~ weight, data = doctors)
xyplot(jitter(time) ~ weight, data = doctors, alpha = 0.6, cex = 1.4)
In our sample, doctors did report spending more time with average-weight patients. The observed standard deviations are similar, so we’ll probably be ok with our equal variances assumption. Let’s run a t-test:
# Two-sample t-test (pooled variances); alternative hypothesis: greater
t.test(time ~ weight, var.equal=TRUE, alternative="greater", data=doctors)
##
## Two Sample t-test
##
## data: time by weight
## t = 2.856, df = 69, p-value = 0.002831
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 2.758 Inf
## sample estimates:
## mean in group average mean in group obese
## 31.36 24.74
There’s our p-value of 0.002831 (which matches the Stata output from question #7 in Activity #22).
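By the way, if we wanted a formal check of the equal variances assumption (rather than just eyeballing the standard deviations in favstats), one option is an F test; here’s a sketch using base R’s var.test:
# F test for equality of the two group variances
# (this test is sensitive to non-normality, so treat it as a rough check)
var.test(time ~ weight, data = doctors)
If that test casts doubt on the assumption (or we just don’t want to rely on it), we can use the Welch test instead, which is exactly where we’re headed next.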
What would happen if we didn’t make the equal variances assumption?
# Welch two-sample t-test (no equal variances assumption); alternative: greater
t.test(time ~ weight, var.equal=FALSE, alternative="greater", data=doctors)
##
## Welch Two Sample t-test
##
## data: time by weight
## t = 2.852, df = 67.17, p-value = 0.002887
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 2.751 Inf
## sample estimates:
## mean in group average mean in group obese
## 31.36 24.74
Our p-value doesn’t change much, but notice the degrees of freedom are now 67.17 (hopefully, I explained this in class).
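In case a reminder helps: those Welch degrees of freedom come from the Satterthwaite approximation. Here’s a sketch that rebuilds them from the group summaries in the favstats output above (those values are rounded, so the result is only approximate):
# Satterthwaite approximation to the Welch degrees of freedom
s1 <- 9.864; n1 <- 33   # sd and n for the average-weight group (from favstats)
s2 <- 9.653; n2 <- 38   # sd and n for the obese group
se1 <- s1^2 / n1        # squared standard error contribution, group 1
se2 <- s2^2 / n2        # squared standard error contribution, group 2
(se1 + se2)^2 / (se1^2 / (n1 - 1) + se2^2 / (n2 - 1))   # roughly 67.2, matching the 67.17 above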
We can easily get a 90% confidence interval for the difference in means:
t.test(time ~ weight, var.equal=TRUE, conf.level=0.90, data=doctors)$conf.int
## [1] 2.758 10.495
## attr(,"conf.level")
## [1] 0.9
The CI is (2.758, 10.495).
# Find the mean of each group
mean(time ~ weight, data = doctors)
## average obese
## 31.36 24.74
# Create our test statistic (difference in means: obese minus average)
test.stat <- compareMean(time ~ weight, data = doctors)
# Randomly shuffle the "weight" groups 10,000 times and calculate our test statistic
rtest.stats = do(10000) * compareMean(time ~ shuffle(weight), data=doctors)
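If you want to see what shuffle() is actually doing, you can permute the group labels once and look at the result:
# shuffle() randomly permutes the weight labels, breaking any real link
# between group membership and time
head(shuffle(doctors$weight))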
Now that we’ve run 10,000 randomizations, let’s take a look at the distribution of mean differences:
histogram(~ result, xlim=c(-12,12), groups=result <= test.stat, data=rtest.stats, width=1)
ladd(panel.abline(v=test.stat))
prop(~ (result<= test.stat), data=rtest.stats)
## TRUE
## 0.0027
That last value, 0.0027, is our p-value. It’s similar to the p-value from our t-test (which provides evidence that our assumptions were reasonable).
Now let’s load the smile/leniency dataset, examine the first several rows, and generate some plots:
smile1 <- read.csv(url("http://bradthiessen.com/html5/stats/m300/smile1.csv"))
head(smile1)
## smile leniency
## 1 smile 7.0
## 2 smile 3.0
## 3 smile 6.0
## 4 smile 4.5
## 5 smile 3.5
## 6 smile 4.0
favstats(leniency ~ smile, data=smile1)
## .group min Q1 median Q3 max mean sd n missing
## 1 no 2.0 3.0 4 4.875 8 4.118 1.523 34 0
## 2 smile 2.5 3.5 5 6.000 9 5.064 1.659 102 0
bwplot(leniency ~ smile, data = smile1)
xyplot(jitter(leniency) ~ smile, data = smile1, alpha = 0.6, cex = 1.4)
From the sample statistics, it looks like our equal variances assumption is reasonable. Let’s run a t-test with that assumption:
# Two-sample t-test (pooled variances); alternative hypothesis: less
t.test(leniency ~ smile, var.equal=TRUE, alternative="less", data=smile1)
##
## Two Sample t-test
##
## data: leniency by smile
## t = -2.938, df = 134, p-value = 0.001946
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4127
## sample estimates:
## mean in group no mean in group smile
## 4.118 5.064
That small p-value tells us we would have been unlikely to observe data like this if the null hypothesis were true. [… but, again, would any reasonable person think this null hypothesis has any chance of being true? Why do we test against null hypotheses when we know they cannot be true? For answers to this, and other interesting questions, take STAT 301 or 305]
Let’s get a 90% confidence interval for the difference in means:
t.test(leniency ~ smile, var.equal=TRUE, conf.level=0.90, data=smile1)$conf.int
## [1] -1.4795 -0.4127
## attr(,"conf.level")
## [1] 0.9
The CI is (-1.479, -0.413).
# Find the mean of each group
mean(leniency ~ smile, data = smile1)
## no smile
## 4.118 5.064
# Create our test statistic (difference in means: smile minus no-smile)
test.stat <- compareMean(leniency ~ smile, data = smile1)
# Randomly shuffle the "smile" groups 10,000 times and calculate our test statistic
smileTestStats = do(10000) * compareMean(leniency ~ shuffle(smile), data=smile1)
Now that we’ve run 10,000 randomizations, let’s take a look at the distribution of mean differences:
histogram(~ result, xlim=c(-1.5,1.5), groups=result >= test.stat, data=smileTestStats, width=.1)
ladd(panel.abline(v=test.stat))
prop(~ (result>= test.stat), data=smileTestStats)
## TRUE
## 0.0015
The p-value, 0.0015, is similar to the p-value we obtained from our t-test.
We could also use a bootstrap method to estimate a confidence interval for the difference in means (between the smile and no-smile groups). To do this, we first have the computer resample our data (with replacement) 10,000 times, calculating our test statistic (the mean of the smile group minus the mean of the no-smile group) each time. Then, we find the 2.5th and 97.5th percentiles of those 10,000 statistics to get a 95% interval.
trials = do(10000) * compareMean(leniency~smile, data = resample(smile1))
with(trials, quantile(result, c(0.025, 0.975)))
## 2.5% 97.5%
## 0.3469 1.5420
Let’s compare that confidence interval to one we’d get from our parametric methods:
t.test(leniency ~ smile, var.equal=TRUE, conf.level=0.95, data=smile1)$conf.int
## [1] -1.5830 -0.3091
## attr(,"conf.level")
## [1] 0.95
Bootstrap Method: (0.3469, 1.5420). Parametric Method: (0.3091, 1.5830). Note that t.test prints (-1.5830, -0.3091) because it subtracts the means in the opposite order (no-smile minus smile); flipping the signs gives an interval directly comparable to the bootstrap one, and the two are quite similar.
Let’s do a quick power analysis for our two-sample t-test on this smile/leniency dataset. Note that power.t.test assumes two equal groups of size n, so using n = 68 (half our total sample of 136) is only a rough approximation of our actual 102/34 split:
power.t.test(n = 68, delta = 0.94608, sd = 1.6, sig.level = 0.05, alternative = "one.sided", type = "two.sample")$power
## [1] 0.9629
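We can also run power.t.test in reverse: leave out n, specify the power we want, and it solves for the required per-group sample size. A sketch, using the same assumed effect size and standard deviation:
# Solve for the per-group n needed for 80% power (same delta and sd as above)
power.t.test(power = 0.80, delta = 0.94608, sd = 1.6, sig.level = 0.05,
             alternative = "one.sided", type = "two.sample")$n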
Again, I won’t explain the BEST (Bayesian Estimation Supersedes the t-test) method here, but I will show some results:
# Load the package
require(BEST)
## Loading required package: BEST
## Loading required package: rjags
## Loading required package: coda
## Linked to JAGS 3.3.0
## Loaded modules: basemod,bugs
# Comparison of two groups:
# =========================
smile <- c(7, 3, 6, 4.5, 3.5, 4, 3, 3, 3.5, 4.5, 7, 5, 5, 7.5, 2.5, 5, 5.5, 5.5, 5, 4, 5, 6.5, 6.5, 7, 3.5, 5, 3.5, 9, 2.5, 8.5, 3.5, 4.5, 3.5, 4.5, 2.5, 5.5, 6.5, 3.5, 3, 3.5, 6, 5, 4, 4.5, 5, 5.5, 3.5, 6, 6.5, 3, 8, 6.5, 8, 6, 6, 3, 7, 8, 4, 3, 2.5, 8, 4.5, 5.5, 7.5, 6, 9, 6.5, 5.5, 4, 4, 5, 6, 3.5, 3.5, 3.5, 4, 5.5, 5.5, 4.5, 2.5, 5.5, 4.5, 3, 3.5, 8, 5, 7.5, 8, 4, 5.5, 6.5, 5, 4, 3, 5, 4, 4, 6, 8, 4.5, 5.5)
nosmile <- c(2, 4, 4, 3, 6, 4.5, 2, 6, 3, 3, 4.5, 8, 4, 5, 3.5, 4.5, 6.5, 3.5, 4.5, 4.5, 2.5, 2.5, 4.5, 2.5, 6, 6, 2, 4, 5.5, 4, 2.5, 2.5, 3, 6.5)
# Run an analysis, takes up to 2 min.
BESTout <- BESTmcmc(smile, nosmile)
## Setting up the JAGS model...
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 291
##
## Initializing model
##
## Burning in the MCMC chain...
## Sampling final MCMC chain...
# Look at the result:
BESTout
## MCMC fit results for BEST analysis:
## 100002 simulations saved.
## mean sd median HDIlo HDIup Rhat n.eff
## mu1 5.043 0.1685 5.043 4.708 5.371 1.000 57817
## mu2 4.095 0.2743 4.094 3.563 4.642 1.000 62423
## nu 48.014 32.4502 39.656 6.031 112.729 1.001 24656
## sigma1 1.644 0.1244 1.638 1.407 1.892 1.000 52073
## sigma2 1.546 0.2083 1.526 1.162 1.963 1.000 47170
##
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## 'Rhat' is the potential scale reduction factor (at convergence, Rhat=1).
## 'n.eff' is a crude measure of effective sample size.
plot(BESTout)
plot(BESTout, "sd")
plotPostPred(BESTout)
pairs(BESTout)
The output and plots show:

* There’s a 95% chance the difference in means is between 0.31 and 1.58. That’s similar to what we got from our parametric and bootstrap confidence intervals.
* There’s a 95% chance the difference in standard deviations between the groups is between -0.388 and 0.553. Since that interval contains zero, it may be reasonable to assume equal variances.
As explained in the activity, this smiles/leniency dataset actually contains 4 different smile groups:
smiles2 <- read.csv(url("http://bradthiessen.com/html5/stats/m300/smiles2.csv"))
favstats(leniency ~ expression, data=smiles2)
## .group min Q1 median Q3 max mean sd n missing
## 1 felt 2.5 3.500 4.75 5.875 9 4.912 1.681 34 0
## 2 feltfalse 2.5 3.625 5.50 6.500 9 5.368 1.827 34 0
## 3 miserable 2.5 4.000 4.75 5.500 8 4.912 1.454 34 0
## 4 neutral 2.0 3.000 4.00 4.875 8 4.118 1.523 34 0
bwplot(leniency ~ expression, data = smiles2)
The sample means differ (ranging from 4.1 to 5.4), but does that give us evidence that the population means differ?
As we discussed in class, running multiple t-tests would inflate the overall Type I error rate (alpha); the quick calculation below shows how fast that error rate grows. Instead, we can conduct an ANOVA (ANalysis Of VAriance).
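Here’s that back-of-the-envelope look at alpha inflation: with 4 groups there are choose(4, 2) = 6 pairwise comparisons, and if each test used alpha = 0.05 (and the tests were independent, which isn’t quite true), the chance of at least one false positive would be:
# Family-wise error rate for 6 independent tests, each at alpha = .05
1 - (1 - 0.05)^choose(4, 2)
## [1] 0.2649081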
ANOVA is one of the big topics in STAT 301, so I won’t try to explain it in any detail here. As I explained in class, ANOVA “works” by comparing the variation between groups (how spread out the group means are) to the variation within each group (the pooled variance of the groups).
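To make that concrete, here’s a sketch that rebuilds the F statistic by hand from the group means and standard deviations in the favstats output above (each group has n = 34):
# Rebuild the F statistic from the group summaries
means <- c(4.912, 5.368, 4.912, 4.118)   # group means from favstats
sds   <- c(1.681, 1.827, 1.454, 1.523)   # group sds from favstats
n <- 34; k <- 4                          # equal group sizes, 4 groups
ms.between <- n * sum((means - mean(means))^2) / (k - 1)   # between-group variation
ms.within  <- mean(sds^2)                # pooled variance (fine with equal n)
ms.between / ms.within                   # roughly 3.46, matching the F value below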
I’ll show you the output from an ANOVA here:
mod <- aov(leniency ~ expression, data = smiles2)
summary(mod)
## Df Sum Sq Mean Sq F value Pr(>F)
## expression 3 28 9.18 3.46 0.018 *
## Residuals 132 350 2.65
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The value in the top-right of the table (0.018) is our p-value. From it, we could conclude that the group means differ. To find which group means differ from one another, we could run post-hoc tests. Here’s an example of Tukey’s Honest Significant Differences (HSD) test:
smiles2 <- transform(smiles2, subgrp = factor(expression, levels=c("felt", "feltfalse", "miserable", "neutral"), labels=c("F", "FF", "M", "N")))
mod <- lm(leniency ~ subgrp, data=smiles2)
compare <- TukeyHSD(mod, "subgrp")
compare
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = x)
##
## $subgrp
## diff lwr upr p adj
## FF-F 4.559e-01 -0.5712 1.4830 0.6562
## M-F -8.882e-16 -1.0271 1.0271 1.0000
## N-F -7.941e-01 -1.8212 0.2330 0.1889
## M-FF -4.559e-01 -1.4830 0.5712 0.6562
## N-FF -1.250e+00 -2.2771 -0.2229 0.0102
## N-M -7.941e-01 -1.8212 0.2330 0.1889
This shows, among other things, that the only groups that differ significantly are Neutral and FeltFalse (adjusted p = 0.0102). You can see how they differ in the boxplot we created above.
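If you’d rather skip the aov/TukeyHSD route, an alternative sketch using base R is to run all pairwise t-tests with a multiple-comparison adjustment such as Holm’s:
# All pairwise t-tests with Holm's adjustment for multiple comparisons
pairwise.t.test(smiles2$leniency, smiles2$subgrp, p.adjust.method = "holm")
With this data, the Holm-adjusted p-values should lead to the same conclusion: only the Neutral/FeltFalse difference survives the adjustment.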