# Load the tidyverse packages
library(tidyverse)
library(mosaic)
# Raise the penalty for scientific notation so results print as full decimals
options(scipen=999)
In this unit, we’ll learn how to construct probability models. Probability models display (through a list, plot, or formula) the possible outcomes of a random process and the probability associated with each outcome.
Probabilities, while useful, can be interpreted in many ways. Physical probabilities, such as those arising from coin tosses, card games, or radioactive atoms, might be interpreted and estimated using the relative frequency or classical approaches introduced below. Evidential probabilities, such as our belief in the fairness of a coin, can be estimated through a subjective approach.
Relative Frequency Approach: \(P(\textrm{win}) \approx \lim_{\textrm{trials}\rightarrow\infty}\frac{\textrm{wins}}{\textrm{trials}}\)
We estimate probability as the proportion of times an outcome occurs in a long series of repetitions.
# First, specify how many coin tosses we want. We want 24,000
N <- 24000
# Flip a coin N times and store the results in **coin**
coin <- Do(N) * nflip()
# Add a column: *sum* = cumulative number of heads
# Add a column: *toss* = cumulative number of coin tosses
# Add a column: *runprop* = running proportion of heads
coin <- coin %>%
  mutate(sum = cumsum(nflip),
         toss = 1:N,
         runprop = sum / toss)
# Let's look at some of the data
head(coin)
# # A tibble: 6 × 4
#   nflip   sum  toss   runprop
# * <dbl> <dbl> <int>     <dbl>
# 1     0     0     1 0.0000000
# 2     1     1     2 0.5000000
# 3     1     2     3 0.6666667
# 4     0     2     4 0.5000000
# 5     1     3     5 0.6000000
# 6     0     3     6 0.5000000
# Let's plot the results
ggplot(data = coin, aes(x = toss, y = runprop)) +
  geom_line(color = "steelblue", alpha = 0.7) +
  annotate("segment", x = 0, xend = 24000, y = 0.50, yend = 0.50,
           colour = "black") +
  labs(title = "Running proportion of heads in 24,000 coin tosses",
       subtitle = paste("Proportion after 100 trials = ", coin$runprop[100],
                        ". Proportion of heads after 24,000 trials = ",
                        coin$runprop[24000]))
Using this approach, how would you estimate the probability that the current vice president will become president?
How could we use this approach to estimate the probability of obtaining a sum of 7 if we roll two 6-sided dice?
Let’s simulate 5,000 rolls of two dice.
# Create the data. The outcome of each die is simply
# one sample (with replacement) from the numbers 1-6.
dice <- tibble(
  die1 = sample(1:6, size = 5000, replace = TRUE),
  die2 = sample(1:6, size = 5000, replace = TRUE),
  sum  = die1 + die2
)
# Let's see a table of those sums
table(dice$sum)
#
# 2 3 4 5 6 7 8 9 10 11 12
# 146 260 399 591 732 849 689 537 411 255 131
# A histogram of those sums
dice %>%
  ggplot(aes(x = sum)) +
  geom_histogram(binwidth = 1, fill = "lightblue", color = "white", alpha = 0.8) +
  labs(
    title = "5,000 replications of rolling two 6-sided dice",
    subtitle = paste("Proportion with sum equal to 7 = ", mean(dice$sum == 7)),
    x = "sum of the two dice"
  ) +
  scale_x_continuous(breaks = seq(-1, 13, 1), minor_breaks = NULL) +
  theme(
    axis.text.x = element_text(size = 11, color = "grey10"),
    legend.position = "none",
    panel.grid.major.y = element_line(colour = "white"),
    panel.grid.major.x = element_line(colour = "white", size = .15),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "grey93")
  )
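For comparison, the classical approach we develop below gives an exact answer: 6 of the 36 equally likely ordered outcomes of two dice (1-6, 2-5, 3-4, 4-3, 5-2, 6-1) sum to 7.
# 6 of the 36 equally likely ordered outcomes sum to 7
6 / 36
# [1] 0.1666667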
# Suppose a lazy nurse returns four babies (labeled A, B, C, D) to their
# parents at random. We can simulate this by shuffling the letters ABCD
# and counting how many land in the correct position.
babies <- c("A", "B", "C", "D")
shuffle(babies)
# [1] "B" "A" "D" "C"
shuffle(babies)
# [1] "C" "B" "A" "D"
# We'll replicate this process 10,000 times
lazynurse <- Do(10000) * shuffle(babies)
# Calculate the number of matches
lazynurse <- lazynurse %>%
  mutate(return = paste(V1, V2, V3, V4, sep = ""),
         firstmatch  = (V1 == "A"),
         secondmatch = (V2 == "B"),
         thirdmatch  = (V3 == "C"),
         fourthmatch = (V4 == "D"),
         matches = firstmatch + secondmatch + thirdmatch + fourthmatch) %>%
  select(return, matches)
# Histogram
lazynurse %>%
  ggplot(aes(x = matches)) +
  geom_histogram(binwidth = 1, fill = "lightblue", color = "white", alpha = 0.8) +
  labs(
    title = "10,000 replications of returning four babies at random",
    x = "number of correct matches"
  ) +
  scale_x_continuous(breaks = seq(-1, 5, 1), minor_breaks = NULL) +
  theme(
    axis.text.x = element_text(size = 11, color = "grey10"),
    legend.position = "none",
    panel.grid.major.y = element_line(colour = "white"),
    panel.grid.major.x = element_line(colour = "white", size = .15),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "grey93")
  ) +
  stat_bin(aes(y = (..count..) / sum(..count..),
               label = round((..count..) / sum(..count..), 2)),
           geom = "text", size = 4, binwidth = 1, vjust = -1.5)
With the relative frequency approach, the estimated probability will change each time we collect (or simulate) more trials. Perhaps we want an approach that gives us a single consistent estimate.
Classical / Theoretical Approach: \(P(\textrm{event}) = \frac{\textrm{number of outcomes that result in the event}}{\textrm{total number of possible outcomes}}\)
# Simulate tossing 3 coins 10,000 times
threecoins <- Do(10000) * nflip(3)
prop(~nflip >= 2, data=threecoins)
# TRUE
# 0.4981
# List all possible outcomes
# HHH
# HHT, HTH, THH
# TTH, THT, HTT
# TTT
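# 4 of these 8 equally likely outcomes contain at least 2 heads
4 / 8
# [1] 0.5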
• Rolling one 6-sided die 10 times: 60,466,176 outcomes
• Tossing a coin 20 times: 1,048,576 outcomes
• Dividing 20 students into 4 equal-sized groups (group ID matters): 11,732,745,024 outcomes
• Choosing 6 lottery numbers between 1-36 in any order with no repeats: 1,947,792 outcomes
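Each of these counts can be checked directly in R:
# One 6-sided die rolled 10 times: 6 options on each roll
6^10
# [1] 60466176
# A coin tossed 20 times: 2 options on each toss
2^20
# [1] 1048576
# 20 students split into 4 labeled groups of 5: 20! / (5!)^4
factorial(20) / factorial(5)^4
# [1] 11732745024
# 6 lottery numbers from 1-36, no repeats, order irrelevant
choose(36, 6)
# [1] 1947792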
To use this classical approach, we’re going to need to learn how to count.
# Number of options
12 * 20 * 4 * 6
# [1] 5760
# Probability it includes sea salt (1 of the 6 options in its category)
1 / 6
# [1] 0.1666667
# Simulate flipping 10 coins 10,000 times
quiz <- Do(10000) * nflip(10)
# Count proportion of quiz scores that equal 10
prop(~nflip == 10, data=quiz)
# TRUE
# 0.0018
# "Exact" calculation
1 / (2 ^ 10)
# [1] 0.0009765625
# Count proportion of quiz scores that are at least 1
prop(~nflip >=1, data=quiz)
# TRUE
# 0.999
# "Exact" calculation
1 - (1 / (2 ^ 10))
# [1] 0.9990234
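As a check, both "exact" values are binomial probabilities, which R's built-in dbinom() reproduces:
# P(all 10 correct guesses)
dbinom(10, size = 10, prob = 0.5)
# [1] 0.0009765625
# P(at least 1 correct guess)
1 - dbinom(0, size = 10, prob = 0.5)
# [1] 0.9990234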
# Number of available characters: 26 lower case + 26 upper case + 10 digits
possibilities <- (26 + 26 + 10)
# 4-character passwords, repeats allowed
possibilities^4
# [1] 14776336
# 4-character passwords with no repeated characters
62 * 61 * 60 * 59
# [1] 13388280
# 4-character passwords where each character differs from the one before it
62 * 61 * 61 * 61
# [1] 14072822
# Assuming the two events are independent, multiply their probabilities
(1 / 104) * (1 / 157)
# [1] 0.00006124449
# Let's load 1.83 million baby names from the Social Security Administration
library(babynames)
# Let's only look at boys born +/- 3 years from my birth year
babies <- babynames %>%
  filter(sex == "M", year < 1980, year > 1972)
# How many boys were born in these 7 years?
totalbabies <- sum(babies$n)
# How many boys were born with the name Bradley in these 7 years?
bradleys <- sum(babies$n[babies$name=="Bradley"])
# How many boys were born with the name Adam in these 7 years?
adams <- sum(babies$n[babies$name=="Adam"])
# Proportion of babies born with the name Bradley
bradprop <- bradleys / totalbabies
# Proportion of babies born with the name Adam
adamprop <- adams / totalbabies
# P(Bradley Adam)
bradprop * adamprop
# [1] 0.00002464961
# Assume Adam Bradley is just as likely as Bradley Adam
bradprop * adamprop / 2
# [1] 0.00001232481
# Number of ways to arrange 4 objects in order: 4!
4 * 3 * 2 * 1
# [1] 24
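# R's factorial() gives the same count
factorial(4)
# [1] 24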
# Probability a random arrangement of 10 objects lands in one specific order
1 / factorial(10)
# [1] 0.0000002755732
# I'll assume 25 students are in the class
factorial(25)
# [1] 15511210043330986055303168
# I'll assume 25 students are in the class
factorial(25) / factorial(25-5)
# [1] 6375600
25 * 24 * 23 * 22 * 21
# [1] 6375600
factorial(10) / factorial(10-4)
# [1] 5040
10*9*8*7
# [1] 5040
# We can also calculate permutations with this function:
perm <- function(n, x) {
  factorial(n) / factorial(n - x)
}
perm(10, 4)
# [1] 5040
# This assumes the order matters
perm(6, 3)
# [1] 120
# Order does not matter, so ABC is the same as:
# ACB, BAC, BCA, CAB, CBA
# In other words, six outcomes actually represent
# only one unique outcome
# We can divide our permutation by the number of ways
# we can rearrange our chosen objects
perm(6, 3) / factorial(3)
# [1] 20
# Or we can calculate a combination with choose()
choose(6,3)
# [1] 20
# Order does not matter, so we have a combination
choose(52,5)
# [1] 2598960
# Order does not matter, so we have a combination
# We're choosing 4 people for a group
choose(8,4)
# [1] 70
# To choose 3 groups, we choose
choose(12,4) * choose(8, 4) * choose(4, 4)
# [1] 34650
The answers to those questions are wrong. Why?
# Does the order of the groups matter? For example, suppose we choose
# Group 1: A, B, C, D
# Group 2: E, F, G, H
# Is that different from:
# Group 1: E, F, G, H
# Group 2: A, B, C, D
# If the group identifications do not matter
# (and those two scenarios are the same)
# then we need to divide by the number of ways
# we can rearrange groups
choose(8,4) / factorial(2)
# [1] 35
choose(12,4) * choose(8, 4) * choose(4, 4) / factorial(3)
# [1] 5775
# Arranging 11 letters where one letter appears once, two letters appear
# four times each, and one letter appears twice (e.g., MISSISSIPPI):
# choose the positions for each letter in turn
choose(11,1) * choose(10,4) * choose(6,4) * choose(2,2)
# [1] 34650
# The order of the letters matters (different orders represent different "words")
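# Equivalently, this is the multinomial coefficient 11! / (1! 4! 4! 2!)
factorial(11) / (factorial(1) * factorial(4) * factorial(4) * factorial(2))
# [1] 34650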
# We'll learn how to use the hypergeometric distribution to calculate this quickly
# For now, let's do it step-by-step
# Choose 4 students out of 20
total <- choose(20, 4)
# Choose 4 engineers out of 10
engineers <- choose(10,4)
# Choose no other majors
other <- choose(10,0)
# Event of interest = choosing 4 engineers (and 0 other majors)
engineers * other / total
# [1] 0.04334365
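# The hypergeometric function mentioned above gives the same answer directly
dhyper(4, m = 10, n = 10, k = 4)
# [1] 0.04334365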
# We can also simulate this experiment
# Create 10 engineering majors (1s) and 10 other majors (0s)
students <- c(rep(1,10), rep(0,10))
# Choose 4 of them 10,000 times
choice <- Do(10000) * sample(students, 4)
# Create the data frame
choice <- choice %>%
  mutate(engineers = V1 + V2 + V3 + V4)
# Calculate proportion
prop(~engineers == 4, data=choice)
# TRUE
# 0.0433
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license.