Lab #3: Binomial & Sign Tests

Remember to download the report template for this lab and open it in RStudio. You can download the template by clicking this link: http://bradthiessen.com/html5/stats/m300/lab3report.Rmd

Simulating binomial random variables

No time to study

I’m typing this on June 15, 2015, but I’m willing to bet that no one earned the 1,000,000 extra credit points during activity #7. Let’s see if we can simulate the 4- and 10-question quizzes and estimate the probability of getting a perfect score on each.

Each question in our quizzes had 4 possible answers: A, B, C, or D. Of those 4 answers, only one was correct. Your task, as a student in the class, was to choose the one correct answer from the 4 possible choices.

Using this logic, each quiz question had 4 possible outcomes: 1, 0, 0, 0. Either you chose the correct (1) answer or you chose one of the three incorrect (0, 0, 0) answers.

Let’s construct a vector containing the possible answers for each quiz question:

choices <- c(1, 0, 0, 0)

To simulate the 4-question quiz, we need to sample 4 of these possible answers (choices) with replacement. By sampling with replacement, we allow our simulated student to choose the correct answer multiple times.
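To see why replacement matters, here is a small base-R sketch that uses only sample() and replicate(). Drawing all 4 choices *without* replacement just shuffles c(1, 0, 0, 0), so every simulated student would score exactly 1. Drawing *with* replacement treats each question as an independent 1-in-4 guess.

```r
choices <- c(1, 0, 0, 0)

# Without replacement: each draw removes an answer, so 4 draws return a
# shuffled copy of choices and the total is always 1
no_repl <- replicate(1000, sum(sample(choices, 4, replace = FALSE)))
table(no_repl)

# With replacement: each question is an independent 1-in-4 guess,
# so scores can range anywhere from 0 to 4
with_repl <- replicate(1000, sum(sample(choices, 4, replace = TRUE)))
table(with_repl)
```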

Let’s simulate a single student taking the 4-question quiz:

set.seed(3141)
sample(choices, 4, replace=TRUE)

## [1] 0 0 1 0

Wait… what’s set.seed(3141)? Every time we run a randomized simulation, we get different results. That’s a good thing, since we want a random sample. Unfortunately, it also means we can’t replicate our results. Setting a random number seed fixes the starting point of R’s random number generator, so running set.seed(3141) followed by the sample() command produces the same random sample every time. (If I run sample() again without resetting the seed, I’ll get a different sample.)
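Here is a quick base-R check of that idea. Resetting the seed before each call reproduces the sample exactly. Calling sample() again without a reset continues from wherever the generator left off. The variable names first, second, and third are just illustrative.

```r
choices <- c(1, 0, 0, 0)

set.seed(3141)
first <- sample(choices, 4, replace = TRUE)

set.seed(3141)                      # reset the generator to the same state
second <- sample(choices, 4, replace = TRUE)

identical(first, second)            # TRUE: same seed, same sample

# No reset here, so the generator has moved on and this is a new draw
third <- sample(choices, 4, replace = TRUE)
```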

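I arbitrarily chose the value 3141 for the seed. When I set that seed and run the command sample(choices, 4, replace=TRUE), the output shows a simulated student who answered the 3rd question correctly but missed questions 1, 2, and 4. (R 3.6.0 changed the default algorithm used by sample(), so newer versions of R may produce a different simulated student from this same seed.)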

If we only care about the simulated student’s score on the quiz (and not about which questions were answered correctly), we can ask R to sum the student’s item scores. In this case, we should get a score of 1.

set.seed(3141)  # This seed ensures I get that same simulated student
sum(sample(choices, 4, replace=TRUE))

## [1] 1

Now that we know how to simulate a single student, let’s simulate a class of 10,000 students.

library(mosaic)  # mosaic provides do(), prop(), and plotDist()
quiz4 <- do(10000) * sum(sample(choices, 4, replace=TRUE))
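If you want to check the mosaic results without mosaic, here is a base-R sketch that uses replicate() in place of do(). The name quiz4_scores is just an illustrative choice. Because the simulation is random, your estimate will land near, but not exactly on, the value reported below.

```r
choices <- c(1, 0, 0, 0)

set.seed(3141)
# replicate() repeats the expression 10,000 times and collects the scores
# in a numeric vector (a base-R stand-in for mosaic's do())
quiz4_scores <- replicate(10000, sum(sample(choices, 4, replace = TRUE)))

# Proportion of simulated students with a perfect score of 4
mean(quiz4_scores == 4)
```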

We can then visualize the results and estimate the probability of getting a perfect score:

histogram(~result,                  # Plot the results
          data=quiz4,               # Stored in quiz4 
          type="count",             # Show frequencies on the y-axis
          width=1,                  # Make the bars have width = 1
          col="grey", border="white", # Change the colors
          xlab = "4-question quiz scores")

prop(~result == 4, data=quiz4)

##   TRUE 
## 0.0038

Simulate the 10-question quiz and estimate the probability of getting a perfect score.

We can also estimate the probability of scoring at least 2 points on the 4-question quiz:

prop(~result >= 2, data=quiz4)

##   TRUE 
## 0.2606
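The same pattern works for a quiz of any length. The sketch below wraps it in a helper function. Note that simulate_quiz() and its arguments are hypothetical names made up for this sketch; they are not functions from mosaic or base R. The last line estimates the probability of every possible score, not just scores of 2 or more.

```r
# simulate_quiz() is a hypothetical helper, not part of mosaic or base R.
# It returns the scores of n_students guessers on a quiz with n_questions
# questions, each having n_choices answers with exactly one correct.
simulate_quiz <- function(n_questions, n_students = 10000, n_choices = 4) {
  choices <- c(1, rep(0, n_choices - 1))
  replicate(n_students, sum(sample(choices, n_questions, replace = TRUE)))
}

set.seed(3141)
scores <- simulate_quiz(n_questions = 4)

mean(scores >= 2)            # estimated P(score >= 2)
prop.table(table(scores))    # estimated probability of each observed score
```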

Estimate the probability of getting a score of at least 5 on the 10-question quiz.

Using the binomial distribution

As we learned in class, we can use the binomial distribution to model the probability that a student who guesses on every question earns any particular score on our quizzes.

To model the probability of getting a perfect score on the 4-question quiz, we’d use a binomial distribution with:

n = size = 4 trials
prob = 0.25 = P(correct answer on each question)
x = 4 = number of correct answers we want
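For reference, this is the standard binomial probability formula that dbinom() evaluates. Below it is the perfect-score case worked out with the parameters listed above (n = 4, p = 0.25, x = 4):

```latex
P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}

P(X = 4) = \binom{4}{4} (0.25)^{4} (0.75)^{0} = 1 \cdot \frac{1}{256} \cdot 1 = 0.00390625
```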

We can feed these parameters into the dbinom(x, size, prob) command:

dbinom(4, 4, 0.25)

## [1] 0.00390625

That gives us a probability similar to what we obtained from our simulation (0.0039 vs. 0.0038).
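As a sanity check, here is the binomial formula written out by hand with choose() for every possible score on the 4-question quiz. This sketch compares it to dbinom() and confirms that the probabilities add up to 1.

```r
n <- 4      # size: number of questions
p <- 0.25   # prob: chance of guessing one question correctly
x <- 0:n    # every possible score

# Binomial formula written out by hand with choose()
by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)

# Compare with R's built-in binomial probabilities
all.equal(by_hand, dbinom(x, n, p))   # TRUE
sum(by_hand)                          # the probabilities cover every score, so this is 1
```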

To calculate the probability of getting a score of 2 or greater on the quiz, we can use pbinom(). This command gives us lower-tail probabilities: P(X <= x). Since we want to know P(X >= 2), we’ll use the complement rule to calculate:

P(X >= 2) = 1 - P(X <= 1)

1 - pbinom(1, 4, 0.25)

## [1] 0.2617188

This answer is similar to our simulated probability (0.2617 vs. 0.2606).
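The complement rule isn’t the only route to this upper-tail probability. This sketch computes P(X >= 2) three ways: with the complement rule, with pbinom()’s lower.tail = FALSE argument (which returns P(X > q)), and by adding the individual dbinom() probabilities. All three should agree.

```r
# Three equivalent ways to get P(X >= 2) for n = 4, p = 0.25
complement <- 1 - pbinom(1, 4, 0.25)                   # complement rule
upper_tail <- pbinom(1, 4, 0.25, lower.tail = FALSE)   # P(X > 1) directly
summed_pmf <- sum(dbinom(2:4, 4, 0.25))                # P(2) + P(3) + P(4)

c(complement, upper_tail, summed_pmf)   # all approximately 0.2617
```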

Use dbinom() to calculate the probability of getting a perfect score on the 10-question quiz.
Use pbinom() to calculate the probability of getting a score of at least 5 on the 10-question quiz.

We can also visualize the entire probability model by graphing the binomial distribution. For the 4-question quiz, we use:

plotDist('binom', 
         params=list(size=4, prob=0.25), 
         lw=5,
         main="Binomial Probabilities with n=4, p=0.25")

The first line of code tells R we want to plot the binomial distribution.

The second line informs R that our distribution has size = 4 and prob = 0.25.

The third line, lw=5, sets the width of the lines in the plot. I think the default is too skinny, so I set the width equal to 5.

The last line specifies the title of the plot.
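If mosaic isn’t available, you can draw a similar spike plot with base R’s plot() and dbinom(). This is a sketch of an alternative under that assumption, not a description of how plotDist() works internally.

```r
x <- 0:4                                    # possible scores on the 4-question quiz
probs <- dbinom(x, size = 4, prob = 0.25)   # binomial probability of each score

# type = "h" draws a vertical line from 0 up to each probability;
# lwd = 5 makes those lines thicker
plot(x, probs,
     type = "h", lwd = 5,
     xlab = "Number correct", ylab = "Probability",
     main = "Binomial Probabilities with n=4, p=0.25")
```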

Plot the probability model for your 10-question quiz.

Scenario #2: Dolphins, revisited

In our first lab (and in the first activity in this class), we used simulation to estimate the likelihood of dolphins choosing 15 correct switches out of 16 trials (assuming they had a 50% chance of choosing the correct switch on each trial).

Use pbinom() to calculate the likelihood of getting 15 or more correct switches out of 16 trials.

Generating (publishing) your lab report

When you’ve finished typing all your answers to the exercises, you’re ready to publish your lab report. To do this, look at the top of your source panel (the upper-left pane) and locate the Knit HTML button.

Click that button and RStudio should begin working to create an .html file of your lab report. It may take a minute or two to compile everything, but the report should open automatically when finished.

Once the report opens, quickly look through it to see all your completed exercises. Assuming everything looks good, send that lab3report.html file to me via email or print it out and hand it in to me.

You’ve done it! Congratulations on finishing your 3rd lab!

Feel free to browse around the websites for R and RStudio if you’re interested in learning more, or find more labs for practice at http://openintro.org.

This lab, released under a Creative Commons Attribution-ShareAlike 3.0 Unported license, is a derivative of templates originally developed by Mark Hansen (UCLA Statistics) and adapted by Andrew Bray (Mt. Holyoke) & Mine Çetinkaya-Rundel (Duke).