Chapter 4 Understanding the Poisson Distribution
Before we proceed, let’s quickly recap what the Poisson distribution entails. The Poisson distribution describes the probability of a given number of events occurring in a fixed interval of time or space, assuming the events occur independently and at a constant average rate \(\lambda\). It is often used to model counts of rare events in a large population.
Example Scenarios
Phone Calls: Counting the number of phone calls received by a call center in an hour. The Poisson distribution helps us predict how many calls we might receive during a specific time period, such as getting exactly five calls in an hour.
Traffic Accidents: Recording the number of traffic accidents that occur at a particular intersection in a day. The Poisson distribution helps us estimate the probability of a certain number of accidents happening in a given timeframe, like having exactly two accidents in a day.
Typographical Errors: Counting the number of typographical errors in a book. The Poisson distribution can help us calculate the likelihood of having a specific number of errors in a given number of pages, such as having exactly ten errors in 100 pages.
Email Arrivals: Tracking the number of emails arriving in an inbox per minute. The Poisson distribution helps us understand the probability of receiving a certain number of emails in a short time span, like getting exactly three emails in a minute.
Poisson Distribution Example
Suppose we are interested in modeling the number of students visiting the library in an hour at the University of Gadau. Let’s assume that, on average, the library receives 10 students per hour.
Probability Mass Function
The probability mass function (PMF) gives the probability of observing exactly \(k\) events in a fixed interval of time or space.
Let’s calculate the probability of having exactly 5 students visit the library in an hour using the Poisson distribution.
Given: \[\begin{align*} \lambda &= 10 \quad \text{(average number of students per hour)}, \\ k &= 5 \quad \text{(number of students we're interested in)}. \end{align*}\]
The probability mass function (PMF) of the Poisson distribution is given by: \[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} \]
Plugging in the values: \[ P(X = 5) = \frac{e^{-10} \times 10^5}{5!} \]
\[ P(X = 5) = \frac{e^{-10} \times 100000}{120} \]
Calculating: \[ P(X = 5) \approx \frac{0.0000454 \times 100000}{120} \]
\[ P(X = 5) \approx \frac{4.54}{120} \]
\[ P(X = 5) \approx 0.0378 \]
Therefore, the probability of having exactly 5 students visit the library in an hour is approximately 0.0378.
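The arithmetic above can be checked by evaluating the PMF formula directly. The following is a minimal sketch in R; the variable names `manual_pmf` and `builtin_pmf` are illustrative and not part of the original example.

```r
# Evaluate the Poisson PMF by hand and compare with R's built-in dpois()
lambda <- 10  # average number of students per hour
k <- 5        # number of students we're interested in

# Direct use of the formula e^(-lambda) * lambda^k / k!
manual_pmf <- exp(-lambda) * lambda^k / factorial(k)

# Built-in Poisson PMF from the stats package
builtin_pmf <- dpois(k, lambda)

# The two values should agree up to floating-point error
all.equal(manual_pmf, builtin_pmf)
round(manual_pmf, 4)
```

Working from the formula is useful for checking understanding; in practice `dpois()` is the more convenient choice.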
Cumulative Distribution Function
The cumulative distribution function (CDF) of the Poisson distribution gives the probability of having up to a certain number of events occur.
Let’s calculate the probability of having up to 5 students visit the library in an hour.
Given: \[\begin{align*} \lambda &= 10 \quad \text{(average number of students per hour)}, \\ k &= 5 \quad \text{(number of students we're interested in)}. \end{align*}\]
The cumulative distribution function (CDF) of the Poisson distribution is given by: \[ P(X \leq k) = \sum_{i=0}^{k} \frac{e^{-\lambda} \lambda^i}{i!} \]
Plugging in the values: \[ P(X \leq 5) = \sum_{i=0}^{5} \frac{e^{-10} \times 10^i}{i!} \]
Calculating: \[ P(X \leq 5) = \frac{e^{-10} \times 10^0}{0!} + \frac{e^{-10} \times 10^1}{1!} + \frac{e^{-10} \times 10^2}{2!} + \frac{e^{-10} \times 10^3}{3!} + \frac{e^{-10} \times 10^4}{4!} + \frac{e^{-10} \times 10^5}{5!} \]
\[ P(X \leq 5) = e^{-10} + \frac{10e^{-10}}{1} + \frac{100e^{-10}}{2} + \frac{1000e^{-10}}{6} + \frac{10000e^{-10}}{24} + \frac{100000e^{-10}}{120} \]
\[ P(X \leq 5) \approx 0.0671 \]
Therefore, the probability of having up to 5 students visit the library in an hour is approximately 0.0671.
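The sum above can be reproduced term by term. This is a small sketch, assuming the same \(\lambda = 10\) and \(k = 5\); the names `terms`, `cdf_by_sum`, and `cdf_builtin` are chosen here for illustration.

```r
# Build the CDF at k = 5 by summing the individual PMF terms
lambda <- 10  # average number of students per hour
k <- 5        # upper limit of the sum

# P(X = 0), P(X = 1), ..., P(X = 5)
terms <- dpois(0:k, lambda)
names(terms) <- 0:k
terms

# Summing the terms gives P(X <= 5)
cdf_by_sum <- sum(terms)

# ppois() computes the same cumulative probability in one call
cdf_builtin <- ppois(k, lambda)

all.equal(cdf_by_sum, cdf_builtin)
```

Printing `terms` shows how much each count from 0 to 5 contributes to the total.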
# PMF and CDF in R
lambda <- 10 # Average number of students visiting the library per hour
k <- 5 # Number of students we're interested in
# Calculate PMF for exactly 5 students
pmf_5_students <- dpois(k, lambda)
pmf_5_students
## [1] 0.03783327
# Calculate CDF for up to 5 students
cdf_up_to_5_students <- ppois(k, lambda)
cdf_up_to_5_students
## [1] 0.06708596
4.1 Testing Large Samples for Poisson Distribution
4.1.1 Testing Large Samples
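When the sample is large, the Central Limit Theorem (CLT) justifies a normal approximation. The CLT states that the sampling distribution of the sample mean is approximately normal for large sample sizes, regardless of the shape of the population distribution (provided its variance is finite). For \(n\) independent Poisson(\(\lambda\)) observations, the sample mean \(\hat{\lambda}\) has mean \(\lambda\) and variance \(\lambda/n\), so \(\hat{\lambda}\) is approximately normal with standard deviation \(\sqrt{\lambda/n}\).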
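A short simulation can make the normal approximation concrete. The sketch below is illustrative only: the values \(\lambda = 5\), \(n = 1000\), the number of replications, and the seed are assumptions chosen for the demonstration.

```r
# Simulate the sampling distribution of the mean of Poisson counts
set.seed(42)    # arbitrary seed, used only for reproducibility
lambda <- 5     # assumed true Poisson mean
n <- 1000       # size of each simulated sample
reps <- 2000    # number of simulated samples

# Each replication draws n Poisson counts and records their mean
sample_means <- replicate(reps, mean(rpois(n, lambda)))

# The centre of the simulated means should be close to lambda
mean(sample_means)

# Their spread should be close to the theoretical value sqrt(lambda / n)
sd(sample_means)
sqrt(lambda / n)
```

A histogram of `sample_means` (for example with `hist(sample_means)`) should look roughly bell-shaped.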
4.1.2 Test Statistic
The test statistic for testing large samples for the Poisson distribution is given by:
\[ Z = \frac{\hat{\lambda} - \lambda_0}{\sqrt{\frac{\lambda_0}{n}}} \]
where:
- \(\hat{\lambda}\) is the sample mean,
- \(\lambda_0\) is the hypothesized mean under the null hypothesis,
- \(n\) is the sample size.
Example
Suppose we are interested in testing whether the average number of accidents at a particular intersection is different from 5 per day. We collect a large sample of accident data and find that the sample mean is \(\hat{\lambda} = 6\) accidents per day.
Let’s perform the hypothesis test:
- Null Hypothesis (\(H_0\)): \(\lambda = \lambda_0 = 5\)
- Alternative Hypothesis (\(H_1\)): \(\lambda \neq \lambda_0\)
Given \(n = 1000\), we can use the test statistic formula to calculate the value of \(Z\).
Given: \[\begin{align*} \hat{\lambda} &= 6 \\ \lambda_0 &= 5 \\ n &= 1000 \end{align*}\]
We can calculate the test statistic: \[ Z = \frac{6 - 5}{\sqrt{\frac{5}{1000}}} = \frac{1}{\sqrt{0.005}} \]
\[ Z \approx 14.14 \]
Conclusion
Since the sample size is large, we can use the standard normal distribution to find the critical values. At a significance level of \(\alpha = 0.05\), the critical values for a two-tailed test are \(\pm 1.96\).
Since \(|Z| = 14.14 > 1.96\), the test statistic falls in the rejection region, so we reject the null hypothesis.
Therefore, there is sufficient evidence at the 5% significance level to conclude that the average number of accidents per day at this intersection is different from 5.
# Define parameters
lambda_hat <- 6 # Sample mean
lambda_0 <- 5 # Hypothesized mean under null hypothesis
n <- 1000 # Sample size
# Calculate test statistic
Z <- (lambda_hat - lambda_0) / sqrt(lambda_0 / n)
Z
## [1] 14.14214
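The decision step can also be carried out in code. The following sketch wraps the large-sample test in a helper; `poisson_z_test` is a hypothetical function written for this illustration, not a function from base R or any package, and the two-sided p-value is an addition to the critical-value comparison used in the text.

```r
# Hypothetical helper: large-sample two-sided Z test for a Poisson mean
poisson_z_test <- function(lambda_hat, lambda_0, n, alpha = 0.05) {
  # Test statistic under H0: lambda = lambda_0
  z <- (lambda_hat - lambda_0) / sqrt(lambda_0 / n)
  # Two-sided critical value from the standard normal distribution
  critical <- qnorm(1 - alpha / 2)
  # Two-sided p-value
  p_value <- 2 * pnorm(-abs(z))
  list(z = z, critical = critical, p_value = p_value,
       reject = abs(z) > critical)
}

# Accident example: sample mean 6, hypothesized mean 5, n = 1000
result <- poisson_z_test(lambda_hat = 6, lambda_0 = 5, n = 1000)
result$z
result$critical
result$reject
```

Here `result$reject` is `TRUE` whenever \(|Z|\) exceeds the critical value, which corresponds to rejecting \(H_0\) at level `alpha`.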
4.2 Interval Estimation for Poisson Distribution
Suppose we have a sample from a population that follows a Poisson distribution with parameter \(\lambda\). We want to construct a confidence interval for the true value of \(\lambda\).
Method
For large samples, we can use the normal approximation to the Poisson distribution to construct a confidence interval for \(\lambda\). The confidence interval is given by:
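\[ \hat{\lambda} \pm z_{\alpha/2} \sqrt{\frac{\hat{\lambda}}{n}} \]
where:
- \(\hat{\lambda}\) is the sample mean,
- \(z_{\alpha/2}\) is the critical value of the standard normal distribution for confidence level \(1 - \alpha\),
- \(n\) is the sample size.
Example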
Suppose we have a sample of earthquake occurrences in a region over a certain period of time. We want to estimate the average number of earthquakes per month, denoted by \(\lambda\), using a confidence interval.
Sample Information
Given: \[\begin{align*} n &= 50 \quad \text{(sample size)}, \\ \text{Total number of earthquakes observed} &= 60. \end{align*}\]
Method
To construct a confidence interval for \(\lambda\), we’ll use the normal approximation to the Poisson distribution.
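The confidence interval is given by: \[ \hat{\lambda} \pm z_{\alpha/2} \sqrt{\frac{\hat{\lambda}}{n}} \] where \(\hat{\lambda}\) is the sample mean, \(z_{\alpha/2}\) is the standard normal critical value, and \(n\) is the sample size.
Calculation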
Given: \[ \hat{\lambda} = \frac{\text{Total number of earthquakes observed}}{n} = \frac{60}{50} = 1.2 \] and using a 95% confidence level (\(\alpha = 0.05\)), the critical value \(z_{\alpha/2}\) is approximately 1.96.
The margin of error is: \[ \text{Margin of Error} = 1.96 \times \sqrt{\frac{1.2}{50}} \approx 0.3036 \]
Confidence Interval
Therefore, the 95% confidence interval for the average number of earthquakes per month is approximately: \[ 1.2 \pm 0.3036 = (0.8964, 1.5036) \]
Conclusion
We are 95% confident that the true average number of earthquakes per month in the region falls within the interval (0.8964, 1.5036).
# Given data
total_earthquakes <- 60
sample_size <- 50
# Calculate sample mean
lambda_hat <- total_earthquakes / sample_size
# Calculate standard error
se <- sqrt(lambda_hat / sample_size)
# Critical value for 95% confidence level
z <- qnorm(0.975)
# Calculate margin of error
margin_error <- z * se
# Calculate confidence interval
lower_bound <- lambda_hat - margin_error
upper_bound <- lambda_hat + margin_error
# Print results
cat("Sample mean (lambda hat):", lambda_hat, "\n")
## Sample mean (lambda hat): 1.2
cat("Margin of error:", margin_error, "\n")
## Margin of error: 0.3036363
cat("95% Confidence interval:", lower_bound, "-", upper_bound, "\n")
## 95% Confidence interval: 0.8963637 - 1.503636
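The same steps can be collected into a reusable function. This is a minimal sketch under the normal approximation used above; `poisson_ci` and its argument names are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: normal-approximation confidence interval for a Poisson mean
poisson_ci <- function(total_events, n, conf_level = 0.95) {
  lambda_hat <- total_events / n         # sample mean
  se <- sqrt(lambda_hat / n)             # estimated standard error
  z <- qnorm(1 - (1 - conf_level) / 2)   # standard normal critical value
  c(lower = lambda_hat - z * se, upper = lambda_hat + z * se)
}

# Earthquake example: 60 earthquakes observed over a sample of size 50
ci <- poisson_ci(total_events = 60, n = 50)
ci
```

Changing `conf_level` (for example to 0.99) widens or narrows the interval through the critical value alone; the estimate and standard error stay the same.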