Probability Distribution in R
Probability distribution functions available in R: R provides built-in
functions for the most common probability distributions. Each distribution has four
standard functions, named by adding a one-letter prefix (d, p, q, r) to the
distribution's short name (norm, binom, pois, exp):
d → Density / Probability mass function: Gives the height of the PDF at x
(continuous) or the probability P(X = x) (discrete).
Example:
dnorm(0, mean=0, sd=1)
dbinom(4, size=10, prob=0.5)
dpois(3, lambda=4)
p → Cumulative distribution function (CDF): Probability that random
variable ≤ x.
Example:
pnorm(1.96)
pbinom(3, size=10, prob=0.4)
ppois(5, lambda=2)
q → Quantile function: Finds the value of x for a given cumulative probability
(inverse CDF). For a discrete distribution it returns the smallest x with P(X ≤ x) ≥ p.
Example:
qnorm(0.95, mean=0, sd=1)
qbinom(0.5, size=10, prob=0.4)
qpois(0.8, lambda=2)
r → Random number generation: Generates random samples from the
distribution.
Example:
rnorm(5, 50, 10)
rbinom(5, 10, 0.5)
rpois(5, lambda=3)
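The four prefixes are related to one another, and a short check can make the relationships concrete. The sketch below is illustrative only; the variable names (pmf, cdf_from_pmf, x_back, q_med) are hypothetical and the parameter values are borrowed from the examples above.

```r
# Discrete case: the CDF (p*) is the running sum of the PMF (d*).
pmf <- dbinom(0:3, size = 10, prob = 0.4)        # P(X = 0), ..., P(X = 3)
cdf_from_pmf <- sum(pmf)                         # P(X <= 3) by summation
cdf_direct <- pbinom(3, size = 10, prob = 0.4)   # P(X <= 3) directly
print(all.equal(cdf_from_pmf, cdf_direct))

# Continuous case: the quantile function (q*) inverts the CDF (p*).
p <- pnorm(1.96)       # cumulative probability below 1.96
x_back <- qnorm(p)     # recovers 1.96 up to floating-point error
print(all.equal(x_back, 1.96))

# Discrete quantile: the smallest x whose cumulative probability reaches p.
q_med <- qbinom(0.5, size = 10, prob = 0.4)
print(pbinom(q_med, size = 10, prob = 0.4) >= 0.5)      # reached at q_med
print(pbinom(q_med - 1, size = 10, prob = 0.4) < 0.5)   # not yet one step below
```

The r functions draw samples from the same distribution, so a histogram of a large sample should resemble the curve given by the matching d function.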
1. Normal Distribution: The Normal Distribution, also known as the
Gaussian distribution, is a continuous probability distribution that is
symmetric and bell-shaped.
Characteristics
Symmetric around the mean (μ).
Mean = Median = Mode.
Controlled by two parameters:
Mean (μ)
Standard Deviation (σ)
Total area under the curve = 1.
Follows the 68–95–99.7 rule (approximate):
About 68% of values lie within 1σ of the mean
About 95% within 2σ
About 99.7% within 3σ
Typical applications: heights, exam marks, measurement errors, etc.
R Functions:
dnorm(x, mean, sd)
pnorm(q, mean, sd)
qnorm(p, mean, sd)
rnorm(n, mean, sd)
Example:
rnorm(5, mean = 50, sd = 10)
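The 68–95–99.7 rule can be checked directly with pnorm(). The following is a minimal sketch; the names k, within_k_sd, mu, sigma and within_general are hypothetical, and mean = 50, sd = 10 are simply the values from the example above.

```r
# Probability of falling within k standard deviations of the mean,
# for the standard normal distribution.
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
print(round(within_k_sd, 4))   # 0.6827 0.9545 0.9973

# The same probabilities hold for any mean and standard deviation.
mu <- 50
sigma <- 10
within_general <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)
print(all.equal(within_k_sd, within_general))
```

The percentages in the rule are rounded; for example, exactly 95% of the area lies within about 1.96σ rather than 2σ.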
2. Binomial Distribution: The Binomial Distribution is a discrete probability
distribution that describes the number of successes in a fixed number of
independent Bernoulli trials, each with the same probability of success.
Example:
Tossing a coin 10 times and counting number of Heads.
Conditions of Binomial Distribution: A random variable X is Binomial if:
The number of trials n is fixed.
Each trial has two outcomes: Success/Failure.
Probability of success p remains constant.
All trials are independent.
Parameters
n = number of trials
p = probability of success
q = 1 – p = probability of failure
Random variable: X ∼ Binomial(n, p)
Typical applications: coin tosses, pass/fail outcomes, defect rates, etc.
R Functions:
dbinom(x, size, prob)
pbinom(q, size, prob)
qbinom(p, size, prob)
rbinom(n, size, prob)
Example:
dbinom(3, size = 10, prob = 0.5)
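As a cross-check, the value returned by dbinom() can be reproduced from the standard binomial formula P(X = k) = C(n, k) p^k (1 − p)^(n − k). The sketch below is illustrative; the variable names are hypothetical, and the mean n·p and variance n·p·(1 − p) are standard results that are not stated in the notes above.

```r
n <- 10
p <- 0.5
k <- 3

# Probability of exactly k successes, computed from the formula.
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
builtin <- dbinom(k, size = n, prob = p)
print(all.equal(manual, builtin))

# Mean and variance computed from the full probability mass function.
x <- 0:n
pmf <- dbinom(x, size = n, prob = p)
mean_x <- sum(x * pmf)                 # equals n * p
var_x <- sum((x - mean_x)^2 * pmf)     # equals n * p * (1 - p)
print(c(mean = mean_x, variance = var_x))
```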
3. Poisson Distribution: The Poisson Distribution is a discrete probability
distribution that describes the number of events occurring in a fixed interval of
time or space, when:
Events occur independently
Events occur at a constant average rate (λ)
Two events cannot occur at exactly the same instant (events are rare within any very short sub-interval)
Examples:
Number of emails received in 1 hour
Number of accidents per day
Number of customers arriving per minute
Parameter
λ (lambda) = average number of occurrences in the interval
Random variable: X ∼ Poisson(λ)
Features of Poisson Distribution
Mean = Variance = λ
Skewed to the right (especially for small λ)
Approximates Binomial(n, p) when:
n is large
p is small
λ = np
Typical applications: traffic flow, number of calls per hour, server requests, etc.
R Functions:
dpois(x, lambda)
ppois(q, lambda)
qpois(p, lambda)
rpois(n, lambda)
Example:
dpois(4, lambda = 2.5)
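The property Mean = Variance = λ, and the value returned by dpois(), can both be checked numerically. This is a rough sketch with hypothetical variable names; it uses the standard Poisson formula P(X = k) = e^(−λ) λ^k / k! and truncates the infinite support at 100, which is an assumption that only works because the tail beyond that point is negligible for λ = 2.5.

```r
lambda <- 2.5

# Probability of exactly 4 events, computed from the formula.
manual <- exp(-lambda) * lambda^4 / factorial(4)
print(all.equal(manual, dpois(4, lambda = lambda)))

# Mean and variance from the probability mass function.
# Assumption: truncating at 100 loses a negligible amount of probability.
x <- 0:100
pmf <- dpois(x, lambda = lambda)
mean_x <- sum(x * pmf)
var_x <- sum((x - mean_x)^2 * pmf)
print(c(mean = mean_x, variance = var_x))   # both equal lambda
```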
4. Exponential Distribution
Continuous distribution.
Used for the time between two consecutive events in a Poisson process.
Parameter: rate = λ (the mean waiting time is 1/λ).
R Functions:
dexp(x, rate)
pexp(q, rate)
qexp(p, rate)
rexp(n, rate)
Example:
rexp(5, rate = 0.3)
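The exponential CDF has the simple closed form P(T ≤ t) = 1 − e^(−λt), which makes pexp() and qexp() easy to verify by hand. The sketch below assumes rate = 0.3 as in the example above; t_val and the other variable names are hypothetical.

```r
rate <- 0.3
t_val <- 2   # hypothetical waiting time, in the same units as 1 / rate

# CDF from the closed form versus the built-in function.
closed_form <- 1 - exp(-rate * t_val)
print(all.equal(closed_form, pexp(t_val, rate = rate)))

# Median waiting time: solving 1 - exp(-rate * t) = 0.5 gives log(2) / rate.
med <- qexp(0.5, rate = rate)
print(all.equal(med, log(2) / rate))

# Mean waiting time is 1 / rate.
print(1 / rate)
```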
Practice Question:
1. Generate 1000 random values from a normal distribution with mean = 50
and sd = 10.
Find the mean and standard deviation of the generated data.
Solution:
set.seed(123)
x <- rnorm(1000, mean = 50, sd = 10)
mean(x)
sd(x)
2. Plot the probability density function (PDF) of a normal distribution with
mean 0 and sd 1.
Solution:
x <- seq(-4, 4, length.out = 100)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l", col = "blue", main = "Normal PDF")
3. Find the probability that a standard normal variable is less than 1.5.
Solution:
pnorm(1.5)
4. Compute the 95th percentile of a normal distribution (mean = 100, sd = 15).
Solution:
qnorm(0.95, mean = 100, sd = 15)
5. Generate 20 random numbers from a Poisson distribution with λ = 3.
Solution:
rpois(20, lambda = 3)
6. Find the probability of observing exactly 5 events in a Poisson distribution with λ
= 4.
Solution:
dpois(5, lambda = 4)
7. Find the cumulative probability of getting ≤ 3 events (Poisson, λ = 2.5).
Solution:
ppois(3, lambda = 2.5)
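8. Generate 50 random values from a binomial distribution with probability of
success = 0.4, treating each value as a single (Bernoulli) trial.
Solution:
rbinom(50, size = 1, prob = 0.4)
(The first argument is the number of random values to generate; size is the number
of trials behind each value. With size = 1 every value is a single Bernoulli trial,
0 or 1; with size > 1 every value is a count of successes out of size trials.)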
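The two roles of the first argument and of size in rbinom() are easy to confuse, so a side-by-side comparison may help. This is only a sketch: the seed value and the variable names (bernoulli, counts, one_experiment) are hypothetical choices, not part of the question.

```r
set.seed(1)   # hypothetical seed, used only to make the sketch repeatable

# 50 separate Bernoulli trials: each value is 0 (failure) or 1 (success).
bernoulli <- rbinom(50, size = 1, prob = 0.4)

# 50 separate binomial experiments of 10 trials each:
# each value is a count of successes between 0 and 10.
counts <- rbinom(50, size = 10, prob = 0.4)

# One binomial experiment of 50 trials: a single count between 0 and 50.
one_experiment <- rbinom(1, size = 50, prob = 0.4)

print(table(bernoulli))
print(range(counts))
print(one_experiment)
```

Summing the 50 Bernoulli values gives a single draw with the same distribution as one_experiment, namely Binomial(50, 0.4).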
9. Find the probability of getting exactly 7 successes in 10 trials with p = 0.3.
Solution:
dbinom(7, size = 10, prob = 0.3)
10. Calculate P(X ≤ 4) for a Binomial distribution with n = 12, p = 0.5.
Solution:
pbinom(4, size = 12, prob = 0.5)
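The p functions return lower-tail probabilities P(X ≤ x) by default. For upper-tail questions such as P(X > 4) or P(X ≥ 4), one option is the lower.tail = FALSE argument. The sketch below reuses n = 12 and p = 0.5 from the last question; the variable names are hypothetical.

```r
# Lower tail: P(X <= 4)
p_le_4 <- pbinom(4, size = 12, prob = 0.5)

# Upper tail: P(X > 4), the complement of the lower tail.
p_gt_4 <- pbinom(4, size = 12, prob = 0.5, lower.tail = FALSE)
print(all.equal(p_le_4 + p_gt_4, 1))

# For a discrete variable, P(X >= 4) is P(X > 3), so shift the cut-off by one.
p_ge_4 <- pbinom(3, size = 12, prob = 0.5, lower.tail = FALSE)
print(c(p_le_4 = p_le_4, p_gt_4 = p_gt_4, p_ge_4 = p_ge_4))
```

The shift by one matters only for discrete distributions; for a continuous variable P(X > x) and P(X ≥ x) are equal.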
1. Write R code to generate 1000 random values from a normal
distribution (mean = 40, sd = 5). Plot the histogram and compute
mean and variance.
Answer:
set.seed(123)
x <- rnorm(1000, mean = 40, sd = 5)
# Histogram
hist(x, col = "lightblue", main = "Normal Distribution")
# Mean and Variance
mean(x)
var(x)
This solution uses the following functions:
rnorm() generates normally distributed values.
hist() visualizes the shape of the distribution.
mean() and var() compute central tendency and spread.
2. Describe the Binomial distribution and write R commands to
compute:
(a) P(X = 5) for n = 10, p = 0.4
(b) P(X ≤ 3)
Answer: The Binomial distribution models the number of successes in a fixed
number of independent trials where each trial has probability p of success.
(a) Probability of exactly 5 successes
# dbinom() gives the point probability P(X = x).
dbinom(5, size = 10, prob = 0.4)
(b) Cumulative probability for ≤ 3 successes (same n = 10, p = 0.4)
# pbinom() gives the cumulative probability P(X <= x).
pbinom(3, size = 10, prob = 0.4)
3. Explain the Poisson distribution. Write R code to find the probability
of observing exactly 7 events in a Poisson distribution with λ = 5. Also
generate 20 random values.
Answer: The Poisson distribution models the number of events occurring in a
fixed interval of time or space when the events occur independently with a constant
average rate (λ). It is suitable for counts of rare events (e.g., calls per minute,
accidents per day). R provides dpois(), ppois(), qpois(), and rpois() for these
calculations.
(a) Probability of 7 events
dpois(7, lambda = 5)
(b) Generate 20 random values
rpois(20, lambda = 5)
4. Write an R program to simulate a dataset using Normal, Binomial, Poisson, and
Uniform distributions. Compare their shapes using plots and compute summary
statistics (mean, median, variance).
Answer:
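set.seed(123)
n <- 1000   # number of simulated values per distribution
# 1. Normal distribution
normal_data <- rnorm(n, mean = 50, sd = 10)
# 2. Binomial distribution
binom_data <- rbinom(n, size = 20, prob = 0.4)
# 3. Poisson distribution
pois_data <- rpois(n, lambda = 5)
# 4. Uniform distribution (the question gives no range; [0, 1] is an assumption)
unif_data <- runif(n, min = 0, max = 1)
# Plots: one histogram per distribution in a 2 x 2 grid
par(mfrow = c(2, 2))
hist(normal_data, main = "Normal", col = "lightblue")
hist(binom_data, main = "Binomial", col = "pink")
hist(pois_data, main = "Poisson", col = "lightgreen")
hist(unif_data, main = "Uniform", col = "lightyellow")
par(mfrow = c(1, 1))   # restore the default single-plot layout
# Summary statistics: mean, median and variance of each sample
sim_data <- list(
Normal = normal_data,
Binomial = binom_data,
Poisson = pois_data,
Uniform = unif_data
)
sapply(sim_data, function(v) c(mean = mean(v), median = median(v), variance = var(v)))
Output Meaning:
Normal → bell-shaped, symmetric
Binomial → roughly symmetric when p is near 0.5; skewed right for small p, skewed left for large p
Poisson → right-skewed for smaller λ
Uniform → roughly flat across its range
Summary includes:
Mean, median, and variance of each simulated sample, one column per distribution.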
5. A company receives customer complaints at an average rate of 3 per hour. Using
Poisson and Exponential distributions, answer the following:
a) Find the probability of receiving exactly 5 complaints in an hour.
b) Find the probability that the time until the next complaint is less than 10
minutes.
c) Simulate 100 hours of complaint data in R.
Answer:
Given:
λ = 3 complaints/hour
10 minutes = 1/6 hour
Rate for exponential = 3
(a) P(X = 5) for Poisson
dpois(5, lambda = 3)
(b) P(T < 1/6) for Exponential
pexp(1/6, rate = 3)
(c) Simulation for 100 hours
set.seed(100)
complaints <- rpois(100, lambda = 3)
complaints
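Parts (a) and (b) describe the same process from two sides: the waiting time exceeds 10 minutes exactly when no complaint arrives in those 10 minutes. The sketch below checks that link numerically; the variable names are hypothetical, and the Poisson count over 10 minutes is assumed to have mean 3 × 1/6 = 0.5.

```r
rate <- 3           # complaints per hour
t_hours <- 1 / 6    # 10 minutes expressed in hours

# (b) via the exponential CDF.
p_wait <- pexp(t_hours, rate = rate)

# The same probability via the Poisson count over 10 minutes:
# P(T < t) = 1 - P(no complaints in an interval of length t).
p_no_event <- dpois(0, lambda = rate * t_hours)
print(all.equal(p_wait, 1 - p_no_event))

# Closed form: 1 - exp(-rate * t) = 1 - exp(-0.5)
print(all.equal(p_wait, 1 - exp(-0.5)))
```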
6. Explain the Normal distribution in detail. Derive the properties and write R
programs to (a) generate 1000 random normal values, (b) plot PDF and CDF, and
(c) compute P(X < 60) for mean = 50 and sd = 8.
Answer:
Normal Distribution:
Continuous probability distribution with bell-shaped curve.
Defined by mean (μ) and standard deviation (σ).
Symmetric about the mean.
Mean = Median = Mode.
Total area under curve = 1.
Properties:
Symmetrical shape
Unimodal (single peak)
68–95–99.7 Rule
A linear transformation of a normal variable is also normally distributed
(a) Generate 1000 random values
set.seed(123)
x <- rnorm(1000, mean = 50, sd = 8)
(b) Plot PDF and CDF
# PDF
curve(dnorm(x, mean = 50, sd = 8), from = 20, to = 80, main = "Normal PDF")
# CDF
curve(pnorm(x, mean = 50, sd = 8), from = 20, to = 80, main = "Normal CDF")
(c) Compute P(X < 60)
pnorm(60, mean = 50, sd = 8)
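The linear-transformation property listed among the properties can be illustrated with the same mean = 50 and sd = 8. If X is Normal(μ, σ), then Y = a + bX is Normal(a + bμ, |b|σ). In the sketch below the constants a = 10 and b = 2, the cut-off y = 130, and all variable names are hypothetical choices made only for illustration.

```r
mu <- 50
sigma <- 8
a <- 10   # hypothetical shift
b <- 2    # hypothetical scale

# Y = a + b * X; find P(Y < 130) in two ways.
y <- 130

# Way 1: translate the event back to X, since Y < y means X < (y - a) / b when b > 0.
p_via_x <- pnorm((y - a) / b, mean = mu, sd = sigma)

# Way 2: use the distribution of Y directly.
p_via_y <- pnorm(y, mean = a + b * mu, sd = abs(b) * sigma)

print(all.equal(p_via_x, p_via_y))
```

With these constants (y − a)/b = 60, so both values equal P(X < 60) from part (c).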
7. What is a Binomial distribution? Explain its assumptions and applications. Write
an R program to calculate (a) P(X = 8), (b) P(X ≤ 5), and (c) generate 200
binomial random values for n=20, p=0.3.
Answer:
Binomial Distribution:
Discrete distribution representing the number of successes in n independent
trials.
Assumptions:
Fixed number of trials (n).
Only two outcomes (success/failure).
Probability of success (p) is constant.
Trials are independent.
Applications:
Quality control
Coin toss experiments
Pass/Fail outcomes
Defective products count
Given:
n = 20, p = 0.3
(a) P(X = 8)
dbinom(8, size = 20, prob = 0.3)
(b) P(X ≤ 5)
pbinom(5, size = 20, prob = 0.3)
(c) Generate 200 binomial values
set.seed(50)
x <- rbinom(200, size = 20, prob = 0.3)
8. Explain the Poisson distribution with examples. Using λ = 4, write R code to (a)
compute the probability of exactly 6 events, (b) probability of at most 3 events,
and (c) simulate one week of hourly Poisson arrivals.
Answer:
Poisson Distribution:
Models number of events occurring in a fixed interval
Events occur independently
Mean = Variance = λ
Applications:
Number of calls in an hour
Number of network packets per second
Number of accidents per day
Number of emails received
Given: λ = 4
(a) P(X = 6)
dpois(6, lambda = 4)
(b) P(X ≤ 3)
ppois(3, lambda = 4)
(c) Simulate one week of hourly arrivals (7 × 24 = 168 hours)
set.seed(99)
week_data <- rpois(168, lambda = 4)
9. A factory produces electrical components with a defect probability of 0.02. Using
the Binomial and Poisson distributions:
a) Compute the probability of finding exactly 3 defective items in a batch of
200.
b) Approximate the same using Poisson distribution.
Answer:
Given:
n = 200
p = 0.02
λ = np = 4
(a) Exact Binomial:
dbinom(3, size = 200, prob = 0.02)
(b) Poisson Approximation (λ = 4):
dpois(3, lambda = 4)
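To see how close the approximation is, the two probabilities can be compared side by side. This is a rough sketch; the variable names and the choice to compare counts 0 through 10 are hypothetical.

```r
n <- 200
p <- 0.02
lambda <- n * p   # 4

# Exactly 3 defective items: exact binomial versus Poisson approximation.
exact <- dbinom(3, size = n, prob = p)
pois_approx <- dpois(3, lambda = lambda)
abs_error <- abs(exact - pois_approx)
print(c(exact = exact, poisson = pois_approx, abs_error = abs_error))

# Largest pointwise difference over the counts 0 to 10.
k <- 0:10
max_diff <- max(abs(dbinom(k, size = n, prob = p) - dpois(k, lambda = lambda)))
print(max_diff)
```

The differences are small here because n is large and p is small, which are the conditions under which the Poisson distribution approximates the Binomial.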
10. For a Normal distribution with μ = 70 and σ = 12, compute the following
using R:
a) P(X < 85)
b) P(60 < X < 90)
c) 90th percentile
d) Generate 500 random values and plot histogram with density curve.
Answer:
(a) P(X < 85)
pnorm(85, mean = 70, sd = 12)
(b) P(60 < X < 90)
pnorm(90, 70, 12) - pnorm(60, 70, 12)
(c) 90th Percentile
qnorm(0.90, 70, 12)
(d) Generate & Plot
set.seed(123)
x <- rnorm(500, 70, 12)
hist(x, probability = TRUE, col = "lightblue")
curve(dnorm(x, 70, 12), add = TRUE, col = "red")
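Parts (b) and (c) can also be worked through the standard normal distribution by converting to z-scores, z = (x − μ)/σ, which is how the same answers would be read from a z-table. The sketch below uses hypothetical variable names and should agree with the direct calls above.

```r
mu <- 70
sigma <- 12

# (b) P(60 < X < 90) via z-scores.
z_lo <- (60 - mu) / sigma
z_hi <- (90 - mu) / sigma
p_std <- pnorm(z_hi) - pnorm(z_lo)
p_direct <- pnorm(90, mean = mu, sd = sigma) - pnorm(60, mean = mu, sd = sigma)
print(all.equal(p_std, p_direct))

# (c) 90th percentile: scale the standard normal quantile back to X.
q90_std <- mu + qnorm(0.90) * sigma
q90_direct <- qnorm(0.90, mean = mu, sd = sigma)
print(all.equal(q90_std, q90_direct))
```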
Meaning of set.seed(123): In R, functions such as rnorm(), runif(), sample(), and
rbinom() generate pseudo-random numbers, so running the same call twice gives
different output each time. To make random results repeatable, set the seed of the
random number generator before generating the numbers:
set.seed(123)
Example
set.seed(123)
rnorm(3)
Output (always the same; values shown rounded to 5 decimal places):
[1] -0.56048 -0.23018 1.55871
Run again with the same seed → same result.
Run without setting a seed → different result.
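The effect of the seed can be demonstrated without relying on any particular printed values. In this minimal sketch the names first, second and third are hypothetical.

```r
# Same seed, same call: identical results.
set.seed(123)
first <- rnorm(3)
set.seed(123)
second <- rnorm(3)
print(identical(first, second))   # TRUE

# No reseeding: the generator continues from where it stopped,
# so the next draw differs from the first one.
third <- rnorm(3)
print(identical(first, third))    # FALSE
```

The seed value itself (123 here) has no special meaning; any integer works, as long as the same one is used whenever the same results are needed.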