Let’s load the Mosaic package:
require(mosaic)
Here’s the code to create the graph showing the running proportion of heads in 50,000 flips:
# First, specify how many coin tosses we want. We'll start with 1000
N = 1000
# Flip a coin N times and store the results in **coin**
coin <- do(N) * rflip()
# Add a column: *sum* = cumulative number of heads
coin <- data.frame(coin,
sum = cumsum(coin$prop))
# Add a column: *toss* = cumulative number of coin tosses
coin <- data.frame(coin,
toss =1:N)
# Add a column: *runprop* = running proportion of heads
coin <- data.frame(coin,
runprop = coin$sum/coin$toss)
Let’s look at the first several rows of our data frame, coin to see all the columns we added:
head(coin)
## n heads tails prop sum toss runprop
## 1 1 0 1 0 0 1 0.0000
## 2 1 1 0 1 1 2 0.5000
## 3 1 1 0 1 2 3 0.6667
## 4 1 0 1 0 2 4 0.5000
## 5 1 1 0 1 3 5 0.6000
## 6 1 1 0 1 4 6 0.6667
From this, you can see the final column contains our running proportion of heads at each toss. Let’s graph these results:
# Graph the results
xyplot(runprop~toss, data=coin, #Plot runprop as a function of toss
col="steelblue", alpha=0.7, type="o", #Sets the color & transparency
xlim=c(1,N), ylim=c(0.0,1.0), #Sets limits for axes
xlab="Toss", ylab="Proportion of Heads", #Labels axes
main=paste("Final Proportion of Heads = ", coin$runprop[N])) #Creates title
# Add a horizontal line at 0.50
plotFun(0.5~toss, add=TRUE, col="black", lwd=2, lty=3)
Rolling a die 5,000 times is like choosing 5,000 numbers between 1-6 (with replacement, of course). We can use the sample()
syntax:
roll1 <- sample(1:6, size=5000, replace=TRUE) #Rolls one die 5000 times
roll2 <- sample(1:6, size=5000, replace=TRUE) #Rolls another die 5000 times
We can now find the sum of the two dice in each of those 5,000 rolls:
sum <- roll1+roll2 #Finds the sum of the two dice
tally( ~sum) #Tallies the sum from each of the 5,000 rolls
##
## 2 3 4 5 6 7 8 9 10 11 12
## 128 296 415 522 701 791 716 569 409 299 154
histogram( ~ sum, width=1, main="Sums from 5,000 Rolls of Two Dice") #Creates a histogram
Finally, let’s count the number (and proportion) of those sums that are equal to 7:
tally( ~(sum==7)) #Tallies the sums that are equal to 7
##
## TRUE FALSE
## 791 4209
prop( ~(sum==7)) #Proportion of sums that are equal to 7
## TRUE
## 0.1582
That proportion represents our estimate of getting a sum of 7.
We’ll simulate tossing 3 coins 10,000 times:
ThreeCoins <- do(10000) * rflip(3) #Flips 3 coins 10,000 times
histogram( ~ heads, data=ThreeCoins, v=1.5, width=1) #Graphs outcomes
tally( ~heads, data=ThreeCoins) #Tallies outcomes
##
## 0 1 2 3
## 1212 3772 3775 1241
prop( ~(heads >= 2), data=ThreeCoins) #Counts proportion with at least 2 heads
## TRUE
## 0.5016
Let’s simulate 25,000 students randomly guessing on this true/false quiz. With random guessing, each question is like flipping a coin (heads = correct answer).
Here’s the estimated probability of getting at least one question correct:
Quiz <- do(25000) * rflip(10) #Flips 10 coins 25,000 times
histogram( ~ heads, data=Quiz, width=1, col="SteelBlue") #Graphs outcomes
tally( ~heads, data=Quiz) #Tallies outcomes
##
## 0 1 2 3 4 5 6 7 8 9 10
## 17 252 1138 2960 5033 6122 5151 2978 1088 236 25
prop( ~(heads >=1), data=Quiz) #Counts proportion with at least 1 correct (head)
## TRUE
## 0.9993
Here’s the estimated probability of getting a perfect score:
prop( ~(heads ==10), data=Quiz) #Counts proportion with 10 heads (correct answers)
## TRUE
## 0.001
There is a 1/104 = 0.0096 chance that they would have chosen “Bradley” at random. That’s like flipping a coin that has a 0.0096 chance of coming up heads. Likewise, choosing a middle name of Adam is like flipping a coin that has a 1/157 chance of coming up heads:
FirstName <- do(50000) * rflip(1, prob=1/104) #Flips an unfair coin 50,000 times
MiddleName <- do(50000) * rflip(1, prob=1/157) #Flips another unfair coin 50,000 times
Remember that, for the computer, HEADS=1 and TAILS=0. We can add the results from the two coins on each of the 10,000 trials. If the sum is 2, it means both coins came up heads (and the correct name was chosen):
FullName<- FirstName$heads + MiddleName$heads #Sums the coins
tally(~FullName) #Shows the number of trials with 0, 1, and 2 heads
##
## 0 1 2
## 49259 736 5
prop( ~(FullName ==2)) #Counts proportion with 2 heads (correct name)
## TRUE
## 1e-04
If we sample with replacement, we’re simply shuffling the letters ABCD. Let’s have the computer do this 10,000 times:
# Create a vector with the characters A, B, C, D
ABCD <-c ("A","B","C","D")
# Shuffle the letters 25,000 times and store the results in "Words"
Words <- do(25000) * shuffle(ABCD, replace=FALSE)
# This stores the first letter as V1, the second letter as V2, etc.
If we wanted to estimate the probability of getting the letters “ABCD” in order, we would use the following:
prop( ~(V1=="A" & V2=="B" & V3=="C" & V4=="D"), data=Words)
## TRUE
## 0.04176
If we want a list of all the different 4-letter “words,” we’d need to tally
# Combine the V1-V4 variables into a single 4-character string
Words2<- paste(Words$V1,Words$V2,Words$V3,Words$V4, sep="")
# Tally the number of different words
tally(~ Words2)
##
## ABCD ABDC ACBD ACDB ADBC ADCB BACD BADC BCAD BCDA BDAC BDCA CABD CADB CBAD
## 1044 1012 1038 1023 1038 1085 1067 1071 1021 1155 1015 958 1045 994 1049
## CBDA CDAB CDBA DABC DACB DBAC DBCA DCAB DCBA
## 1090 1043 1026 1007 995 1052 990 1080 1102
# Make the table easier to view as columns
cbind(duration.freq = table(Words2))
## duration.freq
## ABCD 1044
## ABDC 1012
## ACBD 1038
## ACDB 1023
## ADBC 1038
## ADCB 1085
## BACD 1067
## BADC 1071
## BCAD 1021
## BCDA 1155
## BDAC 1015
## BDCA 958
## CABD 1045
## CADB 994
## CBAD 1049
## CBDA 1090
## CDAB 1043
## CDBA 1026
## DABC 1007
## DACB 995
## DBAC 1052
## DBCA 990
## DCAB 1080
## DCBA 1102
From that table, we can count 24 different outcomes.
Let’s start with permutations. We can create a function in R that calculates the number of permutations given n = number of objects and x = number of objects selected:
perm = function(n, x) {
return(factorial(n) / factorial(n-x))
}
We can then call that function. As an example, suppose we want to replicate our answer to #14 (arranging the letters ABCD). We would set n=4 and x=4:
perm(n=4,x=4)
## [1] 24
To replicate #20 (choosing president, VP, and secretary from 6 people), we would set n=6 and x=3:
perm(n=6,x=3)
## [1] 120
Combinations are just as simple. First, we create a function with inputs n and x:
comb = function(n, x) {
return(factorial(n) / (factorial(x) * factorial(n-x)))
}
To replicate #23 (5-card hands from 52 cards), we would set n=52 and x=5:
comb(n=52,x=5)
## [1] 2598960
To replicate #24 (dividing 8 subjects into two equal-sized groups), we would set n=8 and x=4:
comb(n=8,x=4)
## [1] 70
We can use the combo function that we just wrote. First, let’s calculate the number of ways to choose 4 students out of 20:
comb(n=20,x=4)
## [1] 4845
Next, let’s calculate the number of ways to choose 4 students from 10 IE majors:
comb(n=10,x=4)
## [1] 210
Finally, we divide the two results to get our probability:
comb(n=10,x=4)/comb(n=20,x=4)
## [1] 0.04334
As we’ll learn in Activity #8, we could use the hypergeometric distribution to get this probability:
The function is: phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
where:
q = number of type 1 objects chosen
m = number of type 1 objects
n = number of type 2 objects
k = number of objects selected
To get our probability, we’d need:
phyper(4, 10, 10, 4, lower.tail = TRUE, log.p = FALSE)-phyper(3, 10, 10, 4, lower.tail = TRUE, log.p = FALSE)
## [1] 0.04334
One final way to estimate this probability would be via simulation. In this scenario, we have 20 students. Let’s create 10 engineering majors (who all happen to have the name “E”) and 10 other majors (who all have the name “X”).
Students <-c ("E","E","E","E","E","E","E","E","E","E","X","X","X","X","X","X","X","X","X","X")
Students
## [1] "E" "E" "E" "E" "E" "E" "E" "E" "E" "E" "X" "X" "X" "X" "X" "X" "X"
## [18] "X" "X" "X"
To choose 4 students, we simply sample 4 of these letters. We’ll do this 100,000 times:
Failures <- do(100000) * sample(Students, 4, replace=FALSE)
# This stores the first student as V1, the second student as V2, etc.
head(Failures)
## V1 V2 V3 V4
## 1 E X X X
## 2 X E E X
## 3 X X E E
## 4 E X X X
## 5 X E X X
## 6 E X X X
If we wanted to estimate the probability of choosing 4 engineering majors, we’d simply find the proportion of our 100,000 trials that yielded “E, E, E, E.”
prop( ~(V1=="E" & V2=="E" & V3=="E" & V4=="E"), data=Failures)
## TRUE
## 0.04229