Let’s load the mosaic package:
require(mosaic)
We have a binomial distribution with p = 0.300, n = 25, and x ≤ 8:
# Binomial Distribution
plotDist('binom', params=list(size=25, prob=0.300), xlim=c(-1,26), main="Binomial Probabilities with n=25, p=0.30")
That’s the binomial distribution. We can also look at the cdf:
# Binomial CDF
plotDist('binom', kind="cdf", params=list(size=25, prob=0.300), xlim=c(-1,26), main="Binomial CDF with n=25, p=0.30")
# Calculate P(X≤8)
pbinom(8, 25, 0.30, lower.tail=TRUE, log.p=FALSE)
## [1] 0.6769
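As a quick check on what pbinom() returns, the sketch below adds up the individual probabilities P(X=0) through P(X=8) with dbinom() and compares the total with the cumulative value. The variable names are illustrative and not part of the original activity; both results should agree with the 0.6769 shown above.

```r
# Parameters from the example above: n = 25 at-bats, p = 0.300, cutoff x = 8
# (variable names are illustrative assumptions)
n_at_bats <- 25
p_hit <- 0.300
cutoff <- 8

# P(X <= 8) from the cumulative distribution function
p_cdf <- pbinom(cutoff, size = n_at_bats, prob = p_hit)

# The same probability as a sum of P(X = 0), ..., P(X = 8)
p_sum <- sum(dbinom(0:cutoff, size = n_at_bats, prob = p_hit))

round(c(cdf = p_cdf, sum = p_sum), 4)
```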
If we want to reduce this probability, we can either give the player more at-bats or lower our standard. Suppose we lower our standard so that the player only needs 7 or more hits to get a contract. The probability that we make a mistake is nearly cut in half:
# P(X≤6)
pbinom(6, 25, 0.30, lower.tail=TRUE, log.p=FALSE)
## [1] 0.3407
The problem with lowering our standard is that a player who cannot hit .300 may be able to get 7 hits in 25 at-bats. As an example, here’s the probability that a .250 hitter gets 7 or more hits in 25 at-bats:
# P(X≥7) = 1 - P(X≤6)
1-pbinom(6, 25, 0.250, lower.tail=TRUE, log.p=FALSE)
## [1] 0.4389
That gives us a 44% chance of making the mistake of signing a .250 hitter to a contract.
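To see the trade-off at a glance, here is a minimal sketch that tabulates both kinds of mistake for several cutoffs. The range of cutoffs (6 to 10 hits) and the column names are assumptions chosen for illustration; only the rows for 7 and 9 hits correspond to values computed above.

```r
# Hypothetical range of "hits needed for a contract" in 25 at-bats
cutoffs <- 6:10

# P(a true .300 hitter falls short of the cutoff) = P(X <= cutoff - 1), p = 0.300
miss_300 <- pbinom(cutoffs - 1, size = 25, prob = 0.300)

# P(a .250 hitter reaches the cutoff) = P(X >= cutoff), p = 0.250
sign_250 <- pbinom(cutoffs - 1, size = 25, prob = 0.250, lower.tail = FALSE)

tradeoff <- data.frame(hits_needed = cutoffs,
                       miss_300_hitter = round(miss_300, 4),
                       sign_250_hitter = round(sign_250, 4))
tradeoff
```

As the cutoff rises, the chance of signing a .250 hitter falls while the chance of missing a .300 hitter grows, so no cutoff removes both risks with only 25 at-bats.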
rbinom(n, size, prob)
Let’s generate 5,000 numbers from a binomial distribution with p=0.3 and n=12. To make sure our code will work, let’s first try to get 5 numbers:
rbinom(5, 12, .3)
## [1] 5 7 1 3 4
Each of those values represents the number of successes in 12 trials. It looks like the code works, so let’s get our 5,000 values:
randomBinomial <- rbinom(5000, 12, .3)
Let’s get a histogram of these values (in yellow) and compare it to the actual probabilities from the binomial distribution (in blue):
histogram(~randomBinomial, width=1, col = "lightyellow", border="yellow")
plotDist('binom', params=list(size=12, prob=0.300), xlim=c(-1,13), lwd=5, col="blue", add=TRUE)
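The same comparison can be made numerically. The sketch below uses only base R; the seed and the variable names are arbitrary choices for illustration, so the simulated column will differ from the random draws shown above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
sim <- rbinom(5000, size = 12, prob = 0.3)

# Proportion of simulated values equal to each possible outcome 0, 1, ..., 12
simulated <- as.numeric(table(factor(sim, levels = 0:12))) / length(sim)

# Exact binomial probabilities for the same outcomes
exact <- dbinom(0:12, size = 12, prob = 0.3)

comparison <- data.frame(x = 0:12,
                         simulated = round(simulated, 4),
                         exact = round(exact, 4))
comparison
```

With 5,000 draws, the simulated proportions should land close to the exact probabilities, which is what the overlaid plot shows visually.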
We know the professor has a .167 (74/444) chance of winning any game. We can call on the Geometric Distribution to calculate the probabilities of interest.
To see how to use the geometric distribution in R, we can ask for help: help(pgeom)
The help file shows lots of information, including these four commands:
dgeom(x, prob, log = FALSE)
pgeom(q, prob, lower.tail = TRUE, log.p = FALSE)
qgeom(p, prob, lower.tail = TRUE, log.p = FALSE)
rgeom(n, prob)
Let’s plot the Geometric Distribution with p=.167:
# Geometric Distribution with p=.167
plotDist('geom', params=list(prob=74/444), xlim=c(-1,21), main="Geometric Distribution with p=.167")
# Geometric CDF
plotDist('geom', kind="cdf", params=list(prob=74/444), xlim=c(-1,20), main="Geometric CDF")
To calculate the probability of the first win occurring on the 3rd trial, we would use:
# The first input is the number of losses (failures) BEFORE the first win
# So P(X=1) would be...
# dgeom(0, 74/444, log=FALSE)
# P(X=3) would be...
dgeom(2, 74/444, log=FALSE)
## [1] 0.1157
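Because dgeom() counts failures rather than trials, it helps to check the result against the formula P(X=k) = (1-p)^(k-1) p. This is a minimal sketch with illustrative variable names; both values should match the 0.1157 above.

```r
p_win <- 74/444   # probability of winning any single game, from the text
k <- 3            # trial on which the first win occurs

# Closed form: k - 1 losses followed by one win
by_hand <- (1 - p_win)^(k - 1) * p_win

# dgeom() counts failures before the first success, so pass k - 1
from_dgeom <- dgeom(k - 1, prob = p_win)

round(c(by_hand = by_hand, dgeom = from_dgeom), 4)
```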
Running this code in RStudio will give you an interactive plot of the Geometric Distribution:
require(manipulate)
manipulate(
plotDist('geom',
kind=type,
params=list(prob=P),
xlim=c(-1, N)),
N=slider(5,50, step=1, initial=10),
P=slider(0.1,1, step=0.1, initial=0.5),
type=picker("density"="density", "cdf"="cdf", "qq"="qq", "histogram"="histogram"))
rgeom(n, prob)
Let’s generate 5,000 numbers from a geometric distribution with p=0.3:
randomGeometric <- rgeom(5000, .3)
Let’s get a histogram of these values (in yellow) and compare it to the actual probabilities from the geometric distribution (in blue):
histogram(~randomGeometric, width=1, xlim=c(-1,20), col = "lightyellow", border="yellow")
plotDist('geom', params=list(prob=0.300), xlim=c(-1,20), lwd=5, col="blue", add=TRUE)
We need the Negative Binomial Distribution to answer this. To get help, I type: help(pnbinom)
That shows me two commands of interest: dnbinom(x, size, prob, mu, log = FALSE)
pnbinom(q, size, prob, mu, lower.tail = TRUE, log.p = FALSE)
Let’s plot the Negative Binomial Distribution with p=.167:
# NegBin Distribution with p=.167
# Note the x-axis represents the number of losses (failures) before the 3rd win
plotDist('nbinom', params=list(size=3, prob=74/444), xlim=c(-1,41), main="Negative Binomial Distribution with p=.167")
# Negative Binomial CDF
plotDist('nbinom', kind="cdf", params=list(size=3, prob=74/444), xlim=c(-1,41), main="Negative Binomial CDF with p=.167")
To calculate the probability of getting the 2nd win on the 3rd trial, we would use:
# x = (Total number of trials - number of successes desired)
# size = number of successes desired
# So P(2nd win on 3rd trial) would be...
# P(1 trial before 2 wins)
dnbinom(1, 2, 74/444, log=FALSE)
## [1] 0.0463
To calculate the probability of getting the 3rd win on the 4th trial, we would use:
dnbinom(1, size=3, prob=74/444, log=FALSE)
## [1] 0.01157
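Both results can be checked against the negative binomial formula P(r-th win on trial k) = C(k-1, r-1) p^r (1-p)^(k-r). The helper function below is hypothetical, written only for this check, and is not part of R or mosaic.

```r
p_win <- 74/444   # probability of winning any single game, from the text

# Hypothetical helper: P(the r-th win occurs on trial k)
prob_rth_win_on_trial <- function(k, r, p) {
  choose(k - 1, r - 1) * p^r * (1 - p)^(k - r)
}

# 2nd win on the 3rd trial, and 3rd win on the 4th trial
by_hand <- c(prob_rth_win_on_trial(3, 2, p_win),
             prob_rth_win_on_trial(4, 3, p_win))

# dnbinom() takes the number of failures, k - r, as its first argument
from_dnbinom <- c(dnbinom(3 - 2, size = 2, prob = p_win),
                  dnbinom(4 - 3, size = 3, prob = p_win))

round(rbind(by_hand, from_dnbinom), 5)
```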
rnbinom(n, size, prob)
Let’s generate 5,000 numbers from a negative binomial distribution looking for the 3rd win with p=0.3:
randomNegBin <- rnbinom(5000, 3, .3)
Let’s get a histogram of these values (in yellow) and compare it to the actual probabilities from the negative binomial distribution (in blue):
histogram(~randomNegBin, width=1, xlim=c(-1,23), col = "lightyellow", border="yellow")
plotDist('nbinom', params=list(size=3, prob=0.300), xlim=c(-1,23), lwd=5, col="blue", add=TRUE)
We’ve seen the syntax for the Hypergeometric Distribution in Activity #3. Recall that the function is: dhyper(x, m, n, k, log = FALSE)
where: x = number of type 1 objects chosen
m = number of type 1 objects
n = number of type 2 objects
k = number of objects selected
For this question, we would use:
dhyper(7, 18, 16, 12, log = FALSE)
## [1] 0.2535
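The arguments to dhyper() map directly onto the counting formula C(m, x) C(n, k-x) / C(m+n, k). The sketch below reuses the argument names from the definition above and should reproduce the 0.2535.

```r
x <- 7    # type 1 objects chosen
m <- 18   # type 1 objects available
n <- 16   # type 2 objects available
k <- 12   # objects selected in total

# Ways to pick x of type 1 and k - x of type 2, over all ways to pick k objects
by_hand <- choose(m, x) * choose(n, k - x) / choose(m + n, k)

from_dhyper <- dhyper(x, m, n, k)

round(c(by_hand = by_hand, dhyper = from_dhyper), 4)
```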
rhyper(nn, m, n, k)
Let’s generate 5,000 numbers from a hypergeometric distribution with:
20 objects of type 1
10 objects of type 2
8 objects chosen at random
randomHyper <- rhyper(5000, 20, 10, 8)
Let’s get a histogram of these values (in yellow) and compare it to the actual probabilities from the hypergeometric distribution (in blue). The values represent the number of objects of type 1 selected:
histogram(~randomHyper, width=1, xlim=c(-1,9), col = "lightyellow", border="yellow")
plotDist('hyper', params=list(m=20, n=10, k=8), xlim=c(-1,9), lwd=5, col="blue", add=TRUE)
We need the Poisson Distribution, so let’s see the commands: help(ppois)
dpois(x, lambda, log = FALSE)
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)
In our scenario, our expected value is 15 per year. We’re interested in a single week, so we can convert our expected value to find lambda = 15/52.
Let’s look at a plot of our Poisson Distribution:
# Poisson distribution with lambda=15/52
plotDist('pois', params=list(lambda=15/52), xlim=c(-1,6), main="Poisson Distribution with Lambda=15/52")
# Poisson CDF
plotDist('pois', kind="cdf", params=list(lambda=15/52), xlim=c(-1,6), main="Poisson CDF with Lambda=15/52")
To get our probability, we use:
dpois(x=0, lambda=15/52)
## [1] 0.7494
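For zero events the Poisson formula reduces to P(X=0) = e^(-lambda), so the rate conversion can be checked directly. The variable names below are illustrative; the yearly rate of 15 and the 52 weeks come from the text.

```r
events_per_year <- 15
weeks_per_year <- 52

# Scale the yearly rate down to a single week
lambda_week <- events_per_year / weeks_per_year

# P(X = 0) = exp(-lambda) * lambda^0 / 0! = exp(-lambda)
by_hand <- exp(-lambda_week)

from_dpois <- dpois(0, lambda = lambda_week)

round(c(by_hand = by_hand, dpois = from_dpois), 4)
```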
Let’s look at a plot of our Poisson Distribution with lambda=6:
# Poisson distribution with lambda=6
plotDist('pois', params=list(lambda=6), xlim=c(-1,16), main="Poisson Distribution with Lambda=6")
# Poisson CDF
plotDist('pois', kind="cdf", params=list(lambda=6), xlim=c(-1,16), main="Poisson CDF with Lambda=6")
To get our probability, we use:
1-ppois(q=5, lambda=6, lower.tail = TRUE, log.p = FALSE)
## [1] 0.5543
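The subtraction from 1 can also be handled by ppois() itself through lower.tail = FALSE, which returns P(X > q). A short sketch comparing the three equivalent routes (variable names are illustrative):

```r
# P(X >= 6) = 1 - P(X <= 5), written three ways
complement <- 1 - ppois(5, lambda = 6)
upper_tail <- ppois(5, lambda = 6, lower.tail = FALSE)  # P(X > 5) directly
from_sum <- 1 - sum(dpois(0:5, lambda = 6))

round(c(complement = complement, upper_tail = upper_tail, from_sum = from_sum), 4)
```

Note that q stays at 5 in every version: lower.tail = FALSE gives the probability strictly above q, so passing 6 would drop P(X=6) from the answer.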
Running this code in RStudio will give you an interactive plot of the Poisson Distribution:
require(manipulate)
manipulate(
plotDist('pois',
kind=type,
params=list(lambda=L),
xlim=c(-1, N)),
N=slider(5,25, step=1, initial=10),
L=slider(0,10, step=1, initial=5),
type=picker("density"="density", "cdf"="cdf", "qq"="qq", "histogram"="histogram"))
rpois(n, lambda)
Let’s generate 5,000 numbers from a Poisson distribution with lambda=20:
randomPoisson <- rpois(5000, 20)
Let’s get a histogram of these values (in yellow) and compare it to the actual probabilities from the Poisson distribution (in blue). The values represent the number of events observed in each simulated interval:
histogram(~randomPoisson, width=1, xlim=c(-1,35), col = "lightyellow", border="yellow")
plotDist('pois', params=list(lambda=20), xlim=c(-1,35), lwd=5, col="blue", add=TRUE)
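A Poisson distribution has mean and variance both equal to lambda, which gives another quick check on simulated values. The sketch below draws a fresh sample with an arbitrary seed, so its numbers will not match the sample plotted above exactly.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
sim <- rpois(5000, lambda = 20)

# Both summaries should be close to lambda = 20
sim_mean <- mean(sim)
sim_var <- var(sim)

round(c(mean = sim_mean, variance = sim_var), 2)
```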