Remember to download the report template for this lab and open it in RStudio. You can download the template by clicking this link: http://bradthiessen.com/html5/stats/m300/lab3report.Rmd
I’m typing this on June 15, 2015, but I’m willing to bet that no one earned the 1,000,000 extra credit points during activity #7. Let’s see if we can simulate the 4- and 10-question quizzes and estimate the probability of getting a perfect score on each.
Each question in our quizzes had 4 possible answers: A, B, C, or D. Of those 4 answers, only one was correct. Your task, as a student in the class, was to choose the one correct answer from the 4 possible choices.
Using this logic, each quiz question had 4 possible outcomes: 1, 0, 0, 0. Either you chose the correct (1) answer or you chose one of the three incorrect (0, 0, 0) answers.
Let’s construct a vector containing the possible outcomes for each test question:
choices <- c(1, 0, 0, 0)
To simulate the 4-question quiz, we need to sample 4 of these possible answers (choices) with replacement. By sampling with replacement, we allow our simulated student to choose the correct answer multiple times.
Let’s simulate a single student taking the 4-question quiz:
set.seed(3141)
sample(choices, 4, replace=TRUE)
## [1] 0 0 1 0
Wait… what’s set.seed(3141)? Every time we run a randomized simulation, we get different results. That’s a good thing: we want a random sample. Unfortunately, it means we’re unable to replicate our results. Setting a random number seed ensures I get the same random sample each time I set that seed and then run the sample() command (if I run sample() again without resetting the seed, I’ll get a different sample).
I arbitrarily chose the value 3141 for the seed. When I set that seed and run the command sample(choices, 4, replace=TRUE), the output shows a simulated student who answered the 3rd question correctly but missed questions 1, 2, and 4.
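To see the effect of the seed for yourself, here is a minimal sketch in base R. The names first_run, second_run, and third_run are just illustrative; they don’t appear anywhere else in this lab.

```r
# Possible outcomes for one question: 1 = correct, 0 = incorrect
choices <- c(1, 0, 0, 0)

set.seed(3141)                                  # set the seed...
first_run <- sample(choices, 4, replace = TRUE)

set.seed(3141)                                  # ...reset it to the same value...
second_run <- sample(choices, 4, replace = TRUE)

# ...and the two samples match exactly
identical(first_run, second_run)

# Without resetting the seed, the next call continues the random stream,
# so it is free to produce a different sample
third_run <- sample(choices, 4, replace = TRUE)
third_run
```

The identical() call should return TRUE, because both samples were drawn immediately after setting the same seed.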
If we only care about the simulated student’s score on the quiz (and we don’t care about which questions the student answered correctly), we can ask R to give us the sum of his item scores. In this case, we should get a score of 1.
set.seed(3141) # This seed ensures I get that same simulated student
sum(sample(choices, 4, replace=TRUE))
## [1] 1
Now that we know how to simulate a single student, let’s simulate a class of 10,000 students.
quiz4 <- do(10000) * sum(sample(choices, 4, replace=TRUE))
We can then visualize the results and estimate the probability of getting a perfect score:
histogram(~result, # Plot the results
data=quiz4, # Stored in quiz4
type="count", # Show frequencies on the y-axis
width=1, # Make the bars have width = 1
col="grey", border="white", # Change the colors
xlab = "4-question quiz scores")
prop(~result == 4, data=quiz4)
## TRUE
## 0.0038
We can also estimate the probability of scoring at least 2 points on the 4-question quiz:
prop(~result >= 2, data=quiz4)
## TRUE
## 0.2606
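The do() and prop() commands used above are, as far as I know, provided by the mosaic package. If you’d like to see the same idea without them, here is a rough sketch using only base R. The name scores is my own choice, and because the random draws are generated differently, the estimates will likely differ a little from the ones shown above.

```r
# Possible outcomes for one question: 1 = correct, 0 = incorrect
choices <- c(1, 0, 0, 0)

set.seed(3141)  # arbitrary seed so the run can be repeated

# Simulate 10,000 students, each guessing on a 4-question quiz
scores <- replicate(10000, sum(sample(choices, 4, replace = TRUE)))

# How many simulated students earned each score?
table(scores)

# Estimated probability of a perfect score
mean(scores == 4)

# Estimated probability of scoring at least 2
mean(scores >= 2)
```

Taking the mean() of a TRUE/FALSE comparison gives the proportion of TRUE values, which is what prop() reports.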
As we learned in class, we can use the binomial distribution to model the probability a student gets any score by guessing on our quizzes.
To model the probability of getting a perfect score on the 4-question quiz, we’d use a binomial distribution with n = 4 trials (questions) and a probability of success p = 0.25 on each trial.
We can feed these parameters into the dbinom(x, size, prob) command:
dbinom(4, 4, 0.25)
## [1] 0.00390625
That gives us a probability similar to what we obtained from our simulation.
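If you’re curious where that number comes from, the sketch below works out the binomial formula, choose(n, k) * p^k * (1 - p)^(n - k), by hand and compares it to dbinom(). The variable names n, p, k, and by_hand are only for illustration.

```r
n <- 4     # number of questions
p <- 0.25  # chance of guessing any one question correctly
k <- 4     # score we care about (a perfect score)

# Binomial formula: (ways to get k correct) * P(k correct) * P(n - k incorrect)
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)
by_hand

# Compare with the built-in command
all.equal(by_hand, dbinom(k, n, p))
```

For a perfect score there is only one way to get every question right, so the formula reduces to 0.25^4 = 0.00390625.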
To calculate the probability of getting a score of 2 or greater on the quiz, we can use pbinom(). This command gives us lower-tail probabilities: P(X <= x). Since we want to know P(X >= 2), we’ll use the complement rule to calculate:
P(X >= 2) = 1 - P(X <= 1)
1 - pbinom(1, 4, 0.25)
## [1] 0.2617188
This answer is similar to our simulated probability.
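As a quick check on the complement rule, here is a small sketch that gets P(X >= 2) three different ways. The variable names are my own, and the lower.tail argument of pbinom() is one we haven’t used in class.

```r
# 1. Add up the probabilities of scoring exactly 2, 3, or 4
upper_by_sum <- sum(dbinom(2:4, 4, 0.25))

# 2. Complement rule: 1 - P(X <= 1)
upper_by_complement <- 1 - pbinom(1, 4, 0.25)

# 3. Ask pbinom() for the upper tail directly: P(X > 1), which equals P(X >= 2)
upper_by_tail <- pbinom(1, 4, 0.25, lower.tail = FALSE)

c(upper_by_sum, upper_by_complement, upper_by_tail)
```

All three should agree. Note that with lower.tail = FALSE, pbinom() returns P(X > x), so we still plug in 1 (not 2) to get the probability of scoring 2 or more.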
Use dbinom() to calculate the probability of getting a perfect score on the 10-question quiz.
Use pbinom() to calculate the probability of getting a score of at least 5 on the 10-question quiz.
We can also visualize the entire probability model by graphing the binomial distribution. For the 4-question quiz, we use:
plotDist('binom',
params=list(size=4, prob=0.25),
lw=5,
main="Binomial Probabilities with n=4, p=0.25")
The first line of code tells R we want to plot the binomial distribution.
The second line informs R that our distribution has size = 4 and prob = 0.25.
The third line, lw=5, sets the width of the lines in the plot. I think the default is too skinny, so I set the width equal to 5.
The last line specifies the title of the plot.
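If you’d rather see the numbers behind the plot, or draw something similar without plotDist(), here is a rough base-R sketch. The names scores and probs are illustrative, and the resulting plot will not look identical to the one above.

```r
# Every possible score on the 4-question quiz
scores <- 0:4

# Probability of each score when guessing
probs <- dbinom(scores, size = 4, prob = 0.25)

# Show the probability model as a table
data.frame(score = scores, probability = probs)

# Draw one vertical line per score (type = "h"), with thick lines (lwd = 5)
plot(scores, probs,
     type = "h", lwd = 5,
     xlab = "Score", ylab = "Probability",
     main = "Binomial Probabilities with n=4, p=0.25")
```

Because the table covers every possible score, the probabilities in it add up to 1.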
In our first lab (and in the first activity in this class), we used simulation to estimate the likelihood of dolphins choosing 15 correct switches out of 16 trials (assuming they had a 50% chance of choosing the correct switch on each trial).
Use pbinom() to calculate the likelihood of getting 15 or more correct switches out of 16 trials.
When you’ve finished typing all your answers to the exercises, you’re ready to publish your lab report. To do this, look at the top of your source panel (the upper-left pane) and locate the Knit HTML button.
Click that button and RStudio should begin working to create an .html file of your lab report. It may take a minute or two to compile everything, but the report should open automatically when finished.
Once the report opens, quickly look through it to see all your completed exercises. Assuming everything looks good, send that lab3report.html file to me via email or print it out and hand it in to me.
Feel free to browse around the websites for R and RStudio if you’re interested in learning more, or find more labs for practice at http://openintro.org.