Chapter 4 Understanding the Poisson Distribution

Before we proceed, let’s quickly recap what the Poisson distribution entails. The Poisson distribution describes the probability of a certain number of events occurring in a fixed interval of time or space. It is often used to model the number of occurrences of rare events in a large population.

Example Scenarios

Phone Calls: Counting the number of phone calls received by a call center in an hour. The Poisson distribution helps us predict how many calls we might receive during a specific time period, such as getting exactly five calls in an hour.
Traffic Accidents: Recording the number of traffic accidents that occur at a particular intersection in a day. The Poisson distribution helps us estimate the probability of a certain number of accidents happening in a given timeframe, like having exactly two accidents in a day.
Typographical Errors: Counting the number of typographical errors in a book. The Poisson distribution can help us calculate the likelihood of having a specific number of errors in a given number of pages, such as having exactly ten errors in 100 pages.
Email Arrivals: Tracking the number of emails arriving in an inbox per minute. The Poisson distribution helps us understand the probability of receiving a certain number of emails in a short time span, like getting exactly three emails in a minute.

Poisson Distribution Example

Suppose we are interested in modeling the number of students visiting the library in an hour at the University of Gadau. Let’s assume that, on average, the library receives 10 students per hour.

Probability Mass Function

The Poisson distribution helps us calculate the probability of a certain number of events occurring in a fixed interval of time or space.

Let’s calculate the probability of having exactly 5 students visit the library in an hour using the Poisson distribution.

Given: \[\begin{align*} \lambda &= 10 \quad \text{(average number of students per hour)}, \\ k &= 5 \quad \text{(number of students we're interested in)}. \end{align*}\]

The probability mass function (PMF) of the Poisson distribution is given by: \[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} \]

Plugging in the values: \[ P(X = 5) = \frac{e^{-10} \times 10^5}{5!} \]

\[ P(X = 5) = \frac{e^{-10} \times 100000}{120} \]

Calculating: \[ P(X = 5) \approx \frac{0.0000454 \times 100000}{120} \]

\[ P(X = 5) \approx \frac{4.54}{120} \]

\[ P(X = 5) \approx 0.0378 \]

Therefore, the probability of having exactly 5 students visit the library in an hour is approximately 0.0378.

Cumulative Distribution Function

The cumulative distribution function (CDF) of the Poisson distribution gives the probability of having up to a certain number of events occur.

Let’s calculate the probability of having up to 5 students visit the library in an hour.

Given: \[ \lambda = 10 \quad \text{(average number of students per hour)}, \\ k = 5 \quad \text{(number of students we're interested in)}. \]

The cumulative distribution function (CDF) of the Poisson distribution is given by: \[ P(X \leq k) = \sum_{i=0}^{k} \frac{e^{-\lambda} \lambda^i}{i!} \]

Plugging in the values: \[ P(X \leq 5) = \sum_{i=0}^{5} \frac{e^{-10} \times 10^i}{i!} \]

Calculating: \[ P(X \leq 5) = \frac{e^{-10} \times 10^0}{0!} + \frac{e^{-10} \times 10^1}{1!} + \frac{e^{-10} \times 10^2}{2!} + \frac{e^{-10} \times 10^3}{3!} + \frac{e^{-10} \times 10^4}{4!} + \frac{e^{-10} \times 10^5}{5!} \]

\[ P(X \leq 5) = e^{-10} + \frac{10e^{-10}}{1} + \frac{100e^{-10}}{2} + \frac{1000e^{-10}}{6} + \frac{10000e^{-10}}{24} + \frac{100000e^{-10}}{120} \]

\[ P(X \leq 5) \approx 0.0671 \]

Therefore, the probability of having up to 5 students visit the library in an hour is approximately 0.0671.

#PMF in R
lambda <- 10  # Average number of students visiting the library per hour
k <- 5  # Number of students we're interested in

# Calculate PMF for exactly 5 students
pmf_5_students <- dpois(k, lambda)
pmf_5_students

## [1] 0.03783327

# Calculate CDF for up to 5 students
cdf_up_to_5_students <- ppois(k, lambda)
cdf_up_to_5_students

## [1] 0.06708596

4.1 Testing Large Samples for Poisson Distribution

The Poisson distribution describes the probability of a certain number of events occurring in a fixed interval of time or space. It is often used to model the number of occurrences of rare events in a large population.

4.1.1 Testing Large Samples

When dealing with large samples, we can use the normal approximation to the Poisson distribution due to the Central Limit Theorem (CLT). The CLT states that the sampling distribution of the sample mean will be approximately normally distributed for large sample sizes, regardless of the distribution of the population.

4.1.2 Test Statistic

The test statistic for testing large samples for the Poisson distribution is given by:

\[ Z = \frac{\hat{\lambda} - \lambda_0}{\sqrt{\frac{\lambda_0}{n}}} \]

where: - \(\hat{\lambda}\) is the sample mean, - \(\lambda_0\) is the hypothesized mean under the null hypothesis, - \(n\) is the sample size.

Example

Suppose we are interested in testing whether the average number of accidents at a particular intersection is different from 5 per day. We collect a large sample of accident data and find that the sample mean is \(\hat{\lambda} = 6\) accidents per day.

Let’s perform the hypothesis test:

Null Hypothesis (\(H_0\)): \(\lambda = \lambda_0 = 5\)
Alternative Hypothesis (\(H_1\)): \(\lambda \neq \lambda_0\)

Given \(n = 1000\), we can use the test statistic formula to calculate the value of \(Z\).

Given: \[\begin{align*} \hat{\lambda} &= 6 \\ \lambda_0 &= 5 \\ n &\text{ is large} \end{align*}\]

We can calculate the test statistic: \[ Z = \frac{{6 - 5}}{{\sqrt{5}}} = \frac{1}{\sqrt{\frac{5}{1000}}} \]

[ Z

]

Conclusion

Since the sample size is large, we can use the standard normal distribution to find the critical values. At a significance level of \(\alpha = 0.05\), the critical values for a two-tailed test are \(\pm 1.96\).

Since \(-1.96 < 14.14 < 1.96\), we fail to reject the null hypothesis.

Therefore, there is not enough evidence to conclude that the average number of earthquakes per month is significantly different from 5.

# Define parameters
lambda_hat <- 6  # Sample mean
lambda_0 <- 5  # Hypothesized mean under null hypothesis
n <- 1000  # Sample size

# Calculate test statistic
Z <- (lambda_hat - lambda_0) / sqrt(lambda_0 / n)
Z

## [1] 14.14214

4.2 Interval Estimation for Poisson Distribution

Suppose we have a sample from a population that follows a Poisson distribution with parameter \(\lambda\). We want to construct a confidence interval for the true value of \(\lambda\).

Method

For large samples, we can use the normal approximation to the Poisson distribution to construct a confidence interval for \(\lambda\). The confidence interval is given by:

\[ \left( \hat{\lambda} \pm Z_{\alpha/2} \sqrt{\frac{\hat{\lambda}}{n}} \right) \]

where:

Example

Suppose we have a sample of earthquake occurrences in a region over a certain period of time. We want to estimate the average number of earthquakes per month, denoted by \(\lambda\), using a confidence interval.

Sample Information

Given: \[\begin{align*} n &= 50 \quad \text{(sample size)}, \\ \text{Total number of earthquakes observed} &= 60. \end{align*}\]

Method

To construct a confidence interval for \(\lambda\), we’ll use the normal approximation to the Poisson distribution.

The confidence interval is given by: \[ \hat{\lambda} \pm z_{\alpha/2} \sqrt{\frac{\hat{\lambda}}{n}} \] where:

Calculation

Given: \[ \hat{\lambda} = \frac{\text{Total number of earthquakes observed}}{n} = \frac{60}{50} = 1.2 \] and using a 95% confidence level (\(\alpha = 0.05\)), the critical value \(z_{\alpha/2}\) is approximately 1.96.

The margin of error is: \[ \text{Margin of Error} = 1.96 \times \sqrt{\frac{1.2}{50}} \approx 0.3036 \]

Confidence Interval

Therefore, the 95% confidence interval for the average number of earthquakes per month is approximately: \[ 1.2 \pm 0.3036 = (0.8963637, 1.503636) \]

Conclusion

We are 95% confident that the true average number of earthquakes per month in the region falls within the interval (0.8963637, 1.503636).

# Given data
total_earthquakes <- 60
sample_size <- 50

# Calculate sample mean
lambda_hat <- total_earthquakes / sample_size

# Calculate standard error
se <- sqrt(lambda_hat / sample_size)

# Critical value for 95% confidence level
z <- qnorm(0.975)

# Calculate margin of error
margin_error <- z * se

# Calculate confidence interval
lower_bound <- lambda_hat - margin_error
upper_bound <- lambda_hat + margin_error

# Print results
cat("Sample mean (lambda hat):", lambda_hat, "\n")

## Sample mean (lambda hat): 1.2

cat("Margin of error:", margin_error, "\n")

## Margin of error: 0.3036363

cat("95% Confidence interval:", lower_bound, "-", upper_bound, "\n")

## 95% Confidence interval: 0.8963637 - 1.503636