NAME OF STUDENT: SUSHIL VERMA
PROGRAMME: MASTER OF COMPUTER APPLICATION
SEMESTER: 3rd
COURSE CODE & NAME: DCA7101 – PROBABILITY AND STATISTICS
ROLL NUMBER: 2414101873
EMAIL: flpernodmeerut@[Link]
SET-1
Ans. – 1-
a) Definition and Explanation of Conditional Probability and
Bayes' Theorem
1. Conditional Probability: Definition:
Conditional probability is the probability of an event occurring given that
another event has already occurred. It is denoted as:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$
Explanation: P(A∣B) means "the probability of event A occurring given
that event B has occurred." It reflects how the knowledge of event B
affects our belief in the likelihood of event A.
Example: Suppose 10% of a population has a disease (event D), and
90% does not. A test detects the disease correctly 95% of the time (true
positive), but has a 5% false positive rate.
Let: P(D)=0.10
P(Positive∣D)=0.95
P(Positive∣¬D)=0.05
We can use conditional probability to find the chance that someone has
the disease given they test positive.
2. Bayes’ Theorem
Definition:
Bayes' Theorem relates the conditional and marginal probabilities of
random events. It is formulated as:
$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$
Explanation: It provides a way to update the probability of a hypothesis
(A) in light of new evidence (B).
It "reverses" conditional probabilities.
Continuing the Example: We want to know:
$P(D \mid \text{Positive}) = \frac{P(\text{Positive} \mid D) \cdot P(D)}{P(\text{Positive})}$
First, calculate P(Positive):
$P(\text{Positive}) = P(\text{Positive} \mid D) \cdot P(D) + P(\text{Positive} \mid \neg D) \cdot P(\neg D)$
$= (0.95)(0.10) + (0.05)(0.90) = 0.095 + 0.045 = 0.14$
Then,
$P(D \mid \text{Positive}) = \frac{0.95 \cdot 0.10}{0.14} = \frac{0.095}{0.14} \approx 0.679$
So even with a positive test, the chance the person actually has the
disease is about 67.9%.
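The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` and its parameter names are illustrative choices, not part of the original answer.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(D | Positive) using Bayes' theorem."""
    # Law of total probability: P(Positive) over diseased and healthy people.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(D | Positive) = P(Positive | D) * P(D) / P(Positive).
    return sensitivity * prior / p_positive


# Values from the example: P(D) = 0.10, P(Positive | D) = 0.95, P(Positive | not D) = 0.05.
p = posterior(prior=0.10, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.679
```

Changing `prior` while keeping the test characteristics fixed shows how strongly the result depends on how common the disease is.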
Real-World Applications
1. Medical Diagnosis:
Doctors use Bayes’ theorem to update the probability of a
disease given new test results or symptoms.
E.g., after a positive mammogram, doctors evaluate the real
chance of breast cancer using prior probability (age, family
history, etc.).
2. Spam Filtering:
Email filters use Bayes’ theorem to classify emails. Words in
an email update the probability that the email is spam.
3. Financial Risk Assessment:
Conditional probability is used to assess the likelihood of loan
default given a credit score or employment status.
4. Machine Learning & AI:
Naive Bayes classifiers are built on Bayes' theorem for tasks
like sentiment analysis, classification, etc.
5. Weather Forecasting:
Meteorologists update the probability of rain given satellite
images and humidity data using conditional probability.
In all of these fields, conditional probability and Bayes' theorem support
rational decision-making under uncertainty.
Ans. 1B-
This is a classic Bayes' Theorem problem. Let's define the events:
D: The person has the disease
¬D: The person does not have the disease
$T^+$: The test result is positive
Given:
P(D) = 0.01 (1% of the population has the disease)
P(¬D) = 0.99 (99% do not have the disease)
$P(T^+ \mid D) = 0.99$ (True positive rate = 99%)
$P(T^+ \mid \neg D) = 0.05$ (False positive rate = 5%)
We want to find the probability that a person actually has the disease
given that they tested positive, i.e.: $P(D \mid T^+)$
Step 1: Use Bayes’ Theorem
$P(D \mid T^+) = \frac{P(T^+ \mid D) \cdot P(D)}{P(T^+)}$
First, we compute $P(T^+)$, the total probability of testing positive:
$P(T^+) = P(T^+ \mid D) \cdot P(D) + P(T^+ \mid \neg D) \cdot P(\neg D)$
$= (0.99)(0.01) + (0.05)(0.99) = 0.0099 + 0.0495 = 0.0594$
Step 2: Plug into Bayes’ Formula
$P(D \mid T^+) = \frac{0.99 \cdot 0.01}{0.0594} = \frac{0.0099}{0.0594} \approx 0.1667$
Final Answer: 16.67%
Even though the test is highly accurate, the probability that a person
actually has the disease given a positive result is only about 16.67%, due
to the rarity of the disease and the non-zero false positive rate.
This illustrates the importance of considering base rates when interpreting
test results.
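The base-rate effect may be easier to see with whole-number counts. The sketch below applies the given rates to a hypothetical cohort of 100,000 people. The cohort size is an assumption, chosen only so that every count is an integer.

```python
from fractions import Fraction

population = 100_000                       # hypothetical cohort size (assumption)
diseased = population * 1 // 100           # 1% prevalence -> 1,000 people
healthy = population - diseased            # 99,000 people
true_positives = diseased * 99 // 100      # 99% true positive rate -> 990
false_positives = healthy * 5 // 100       # 5% false positive rate -> 4,950

# Share of all positive results that come from people who really have the disease.
share = Fraction(true_positives, true_positives + false_positives)
print(share, round(float(share), 4))  # 1/6 0.1667
```

Of the 5,940 positive results in this cohort, only 990 belong to diseased people, which matches the 16.67% obtained with Bayes' theorem.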
Ans. 2- a) Two-Dimensional Random Variable Definition:
A two-dimensional random variable (also called a bivariate random variable) consists of a
pair of random variables (X, Y) defined on the same probability space. These variables can be
discrete, continuous, or a mix of both. This structure allows us to analyse the probabilistic
behaviour of two related quantities simultaneously.
Joint Distribution
The joint distribution of X and Y gives the probability that each pair (x, y) occurs:
Discrete case: The joint probability mass function (pmf) is:
P(X= x, Y= y) = p(x, y)
Continuous case: The joint probability density function (pdf) is:
$f(x, y) = \frac{\partial^2}{\partial x \, \partial y} P(X \le x, Y \le y)$
The joint distribution provides a complete description of the probabilistic behaviour of both
variables together.
Marginal Distribution
The marginal distribution describes the probability distribution of one variable regardless of
the other:
For a discrete variable:
$P(X = x) = \sum_y p(x, y), \quad P(Y = y) = \sum_x p(x, y)$
For a continuous variable:
$f_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy, \quad f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx$
These are useful for analysing each variable independently.
Conditional Distribution
The conditional distribution describes the distribution of one variable given a fixed value of
the other:
Discrete:
$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$ (if $P(Y = y) > 0$)
Continuous:
$f(x \mid y) = \frac{f(x, y)}{f_Y(y)}$ (if $f_Y(y) > 0$)
This is crucial for understanding dependency or influence between variables.
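For the discrete case, the three kinds of distribution can be illustrated with a short Python sketch. The joint pmf values below are hypothetical and were picked only to sum to 1. The helper names are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical joint pmf p(x, y) for two binary variables (illustrative values only).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}


def marginals(joint):
    """Return the marginal pmfs of X and Y as two dictionaries."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p  # sum over y gives P(X = x)
        py[y] += p  # sum over x gives P(Y = y)
    return dict(px), dict(py)


def conditional_x_given_y(joint, y):
    """Return P(X = x | Y = y) for every x, requiring P(Y = y) > 0."""
    _, py = marginals(joint)
    if py.get(y, 0.0) <= 0.0:
        raise ValueError("P(Y = y) must be positive")
    return {x: p / py[y] for (x, yy), p in joint.items() if yy == y}


px, py = marginals(joint)
cond = conditional_x_given_y(joint, 1)
print(px, py, cond)
```

Each conditional distribution is the matching column of the joint table rescaled so that it sums to 1.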
Real-World Applications
Understanding joint, marginal, and conditional distributions is foundational in data science
and statistics:
1. Medical Diagnosis
Variables: X= test result, Y= actual disease status
Conditional probabilities help compute the probability of disease given a test result
(Bayes' theorem).
2. Marketing Analytics
Variables: X= age group, Y= product purchase
Joint and marginal distributions identify target demographics and consumer behavior.
3. Machine Learning
In classification, conditional probabilities P(Y∣X) are estimated for predicting labels Y
based on features X.
4. Finance
Variables: X= return of asset A, Y= return of asset B
Joint distributions are used to study correlation, portfolio risk, and co-movement of
assets.
These tools allow analysts to uncover relationships, dependencies, and insights critical for
decision-making across various domains.
Ans. 2b- To solve this, we’ll first lay out the joint probability distribution of X and Y in a
table and compute marginal distributions for X and Y, then use those to compute
expectations, variances, and covariance.
Given Joint Probability Table:
X \ Y     -1     0      1      P_X(x)
-1        0.1    0.1    0      0.2
0         0.1    0.2    0.1    0.4
1         0      0.1    0.3    0.4
P_Y(y)    0.2    0.4    0.4
a) E(X)
Use marginal distribution P_X(x):
$E(X) = \sum_x x \cdot p_X(x) = (-1)(0.2) + (0)(0.4) + (1)(0.4) = -0.2 + 0 + 0.4 = 0.2$
b) E(Y)
Use marginal distribution P_Y(y):
$E(Y) = \sum_y y \cdot p_Y(y) = (-1)(0.2) + (0)(0.4) + (1)(0.4) = -0.2 + 0 + 0.4 = 0.2$
c) Var(X)
We need $E(X^2)$:
$E(X^2) = \sum_x x^2 \cdot p_X(x) = (-1)^2 (0.2) + 0^2 (0.4) + 1^2 (0.4) = 0.2 + 0 + 0.4 = 0.6$
$\text{Var}(X) = E(X^2) - [E(X)]^2 = 0.6 - (0.2)^2 = 0.6 - 0.04 = 0.56$
d) Cov(X,Y)
We use:
Cov(X,Y) = E(XY)−E(X)E(Y)
Compute E(XY):
Multiply each x⋅y⋅P(X = x, Y= y):
E(XY)=(−1)(−1)(0.1)+(−1)(0)(0.1)+(−1)(1)(0)
+(0)(−1)(0.1)+(0)(0)(0.2)+(0)(1)(0.1)
+(1)(−1)(0)+(1)(0)(0.1)+(1)(1)(0.3)
=0.1+0+0+0+0+0+0+0+0.3=0.4
Therefore:
Cov(X,Y) = E(XY)−E(X)E(Y) = 0.4 − (0.2)(0.2) = 0.4 − 0.04 = 0.36
Final Answers:
a) E(X) = 0.2
b) E(Y) = 0.2
c) Var(X) = 0.56
d) Cov(X,Y) = 0.36
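As a cross-check, the four answers can be recomputed directly from the joint table. The following is a minimal sketch, and the helper `expect` is an illustrative name.

```python
values = [-1, 0, 1]
# Joint pmf from the table above: rows are x = -1, 0, 1 and columns are y = -1, 0, 1.
table = [
    [0.1, 0.1, 0.0],
    [0.1, 0.2, 0.1],
    [0.0, 0.1, 0.3],
]


def expect(g):
    """Return E[g(X, Y)] by summing g(x, y) * p(x, y) over every cell."""
    return sum(
        g(x, y) * table[i][j]
        for i, x in enumerate(values)
        for j, y in enumerate(values)
    )


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - e_x ** 2
cov_xy = expect(lambda x, y: x * y) - e_x * e_y
print(round(e_x, 2), round(e_y, 2), round(var_x, 2), round(cov_xy, 2))  # 0.2 0.2 0.56 0.36
```

The positive covariance reflects that most of the probability mass lies where X and Y have the same sign.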
Ans. 3a- The Moment Generating Function (MGF) of a random variable X is defined as:
$M_X(t) = \mathbb{E}[e^{tX}]$
for all values of t in some neighborhood around 0 where the expectation exists. The MGF
uniquely characterizes the distribution of a random variable (if it exists) and is widely used in
probability and statistics.
Deriving Moments from MGF
The key property of the MGF is that its derivatives at t = 0 give the moments of the random
variable:
The n-th moment about the origin is:
$\mathbb{E}[X^n] = M_X^{(n)}(0)$
where $M_X^{(n)}(0)$ is the $n$-th derivative of $M_X(t)$ evaluated at $t = 0$.
Thus:
Mean (first moment): $\mu = \mathbb{E}[X] = M_X'(0)$
Second moment: $\mathbb{E}[X^2] = M_X''(0)$
Variance: $\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = M_X''(0) - [M_X'(0)]^2$
Practical Use
MGFs are especially useful when working with sums of independent random variables, as the
MGF of a sum equals the product of the individual MGFs. This property is critical in proving
the Central Limit Theorem and determining the distribution of linear combinations of
variables.
In summary, the MGF provides a powerful tool for calculating moments and understanding
the distributional behavior of random variables in both theoretical and applied statistics.
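The derivative property can be checked numerically. The sketch below reuses the marginal pmf of X from the joint table in Ans. 2b and approximates $M_X'(0)$ and $M_X''(0)$ with central finite differences. Finite differences stand in for exact differentiation here, and the step size `h` is an arbitrary assumption.

```python
import math

# Marginal pmf of X taken from the joint table in Ans. 2b.
pmf = {-1: 0.2, 0: 0.4, 1: 0.4}


def mgf(t):
    """M_X(t) = E[exp(t X)] for the discrete pmf above."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())


h = 1e-4  # finite-difference step (arbitrary small value, an assumption)
first = (mgf(h) - mgf(-h)) / (2 * h)                 # approximates M'(0) = E[X]
second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2  # approximates M''(0) = E[X^2]
variance = second - first ** 2
print(round(first, 4), round(second, 4), round(variance, 4))
```

Up to numerical error, the results agree with E(X) = 0.2, E(X²) = 0.6 and Var(X) = 0.56 obtained directly in Ans. 2b.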
Ans. 3b-
Standard Deviation and Coefficient of Variation
Standard Deviation (SD)
Standard deviation is a measure of the spread or dispersion of a set of values around the
mean. For a random variable $X$, the standard deviation is defined as:
$\sigma = \sqrt{\text{Var}(X)} = \sqrt{\mathbb{E}[(X - \mu)^2]}$
Where:
$\mu = \mathbb{E}[X]$ is the mean of $X$,
$\sigma$ represents the standard deviation.
It gives an absolute measure of how much values deviate from the mean, expressed in the
same units as the data.
Coefficient of Variation (CV)
The coefficient of variation is a relative measure of dispersion and is defined as:
$CV = \frac{\sigma}{\mu} \times 100\%$
Where:
$\sigma$ is the standard deviation,
$\mu$ is the mean.
CV is unitless and expresses the standard deviation as a percentage of the mean.
Significance
Standard deviation is useful for assessing the absolute variability in a dataset.
Coefficient of variation is useful for comparing relative variability across datasets
with different units or means.
When CV is Preferred Over SD
Comparing Variability Across Different Units
Example: Comparing the volatility of returns between two financial assets with
different price levels.
When the Mean Is Non-zero and Positive
CV standardizes the risk relative to expected return, which is essential in finance,
economics, and engineering.
Normalization in Quality Control and Production
CV helps assess process consistency across different manufacturing lines or
conditions.
In summary:
SD: Absolute dispersion; sensitive to scale.
CV: Relative dispersion; enables comparison across different datasets or units.
Use CV when relative variability matters more than absolute spread.
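The difference between the two measures can be seen in a short Python sketch. Both data sets below are hypothetical: the second is the first multiplied by 100, so the SD changes while the CV does not. The population standard deviation (`statistics.pstdev`) is assumed here.

```python
import statistics


def coefficient_of_variation(data):
    """Return the CV in percent: (population SD / mean) * 100."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(data) / mean * 100


# Hypothetical prices of two assets; asset B is asset A on a 100x larger scale.
prices_a = [10, 12, 8, 11, 9]
prices_b = [1000, 1200, 800, 1100, 900]

sd_a, sd_b = statistics.pstdev(prices_a), statistics.pstdev(prices_b)
cv_a, cv_b = coefficient_of_variation(prices_a), coefficient_of_variation(prices_b)
print(round(sd_a, 3), round(sd_b, 3), round(cv_a, 2), round(cv_b, 2))
```

The SD of asset B is 100 times larger, yet both assets have the same CV, so their relative variability is identical.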
Ans. 4a-
To calculate the arithmetic mean using the direct method, we use the formula:
$\bar{X} = \frac{\sum fX}{\sum f}$
Where: $X$ = wage
$f$ = frequency (number of workers)
Step 1: Create the table
Wages ($X$)    Workers ($f$)    $f \cdot X$
15             3                45
25             4                100
35             6                210
45             3                135
55             4                220
Total          $\sum f = 20$    $\sum fX = 710$
Step 2: Apply the formula
$\bar{X} = \frac{710}{20} = \boxed{35.5}$
Final Answer: The arithmetic mean of the wages is Rs. 35.5.
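The direct method amounts to a frequency-weighted average, which can be verified with a few lines of Python (a minimal sketch with illustrative variable names):

```python
wages = [15, 25, 35, 45, 55]   # X, in rupees
workers = [3, 4, 6, 3, 4]      # f, number of workers at each wage

total_f = sum(workers)                                # sum of f
total_fx = sum(f * x for f, x in zip(workers, wages))  # sum of f * X
mean_wage = total_fx / total_f
print(total_f, total_fx, mean_wage)  # 20 710 35.5
```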
Ans. 4b-
To calculate the second and third central moments, we follow these steps:
Step 1: Compute the Arithmetic Mean $\bar{X}$
$\bar{X} = \frac{\sum fX}{\sum f}$
$X$      $f$    $f \cdot X$
1        3      3
2        2      4
3        4      12
4        6      24
5        5      25
Total    20     68
$\bar{X} = \frac{68}{20} = 3.4$
Step 2: Calculate $(X - \bar{X})^2$ and $(X - \bar{X})^3$
$X$      $f$    $X - \bar{X}$    $(X - \bar{X})^2$    $f \cdot (X - \bar{X})^2$    $(X - \bar{X})^3$    $f \cdot (X - \bar{X})^3$
1        3      -2.4             5.76                 17.28                        -13.824              -41.472
2        2      -1.4             1.96                 3.92                         -2.744               -5.488
3        4      -0.4             0.16                 0.64                         -0.064               -0.256
4        6      0.6              0.36                 2.16                         0.216                1.296
5        5      1.6              2.56                 12.8                         4.096                20.48
Total    20                                           36.8                                              -25.44
Step 3: Central Moments
Second central moment (Variance):
$\mu_2 = \frac{\sum f(X - \bar{X})^2}{\sum f} = \frac{36.8}{20} = \boxed{1.84}$
Third central moment:
$\mu_3 = \frac{\sum f(X - \bar{X})^3}{\sum f} = \frac{-25.44}{20} = \boxed{-1.272}$
Step 4: Comment on Skewness
Since $\mu_3 < 0$, the distribution is negatively skewed (left-skewed).
This means the left tail is longer or the mass of the distribution is concentrated on the
right.
Final Answers:
Second central moment (variance): $\mu_2 = \boxed{1.84}$
Third central moment: $\mu_3 = \boxed{-1.272}$
Skewness: The distribution is negatively skewed.
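The table-based computation can be reproduced with a small helper. This is a sketch only, and the function name `central_moment` is an illustrative choice.

```python
values = [1, 2, 3, 4, 5]   # X
freqs = [3, 2, 4, 6, 5]    # f


def central_moment(values, freqs, k):
    """Return the k-th central moment of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / n
    return sum(f * (x - mean) ** k for f, x in zip(freqs, values)) / n


mu2 = central_moment(values, freqs, 2)
mu3 = central_moment(values, freqs, 3)
print(round(mu2, 3), round(mu3, 3))  # 1.84 -1.272
```

A negative `mu3` corresponds to the left-skew conclusion reached above.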
Ans. 5a-
Karl Pearson’s Coefficient of Correlation
Karl Pearson’s Coefficient of Correlation (denoted by $r$) is a statistical measure that
quantifies the strength and direction of a linear relationship between two variables $X$ and
$Y$. It is defined as:
$r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$
Where:
$\text{Cov}(X, Y)$ is the covariance between $X$ and $Y$,
$\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.
Alternatively, for sample data:
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \cdot \sum (y_i - \bar{y})^2}}$
Properties of Pearson’s $r$:
1. Range:
$-1 \leq r \leq 1$
2. Interpretation:
$r = 1$: Perfect positive linear correlation
$r = -1$: Perfect negative linear correlation
$r = 0$: No linear correlation
3. Symmetric:
$r(X, Y) = r(Y, X)$
4. Unit-free:
It is a pure number; it is not affected by a change of origin or of (positive) scale, and hence not by the units of measurement.
5. Only measures linear association:
It does not detect non-linear relationships.
Significance in Statistical Analysis
Relationship Strength: Helps quantify how strongly two variables are related.
Predictive Modeling: High correlation indicates one variable can help predict the
other.
Feature Selection: In machine learning, correlated features may be redundant.
Economics & Finance: Used to analyze the relationship between economic indicators
(e.g., income vs. expenditure).
Social Sciences: Measures associations like education vs. income, stress vs.
performance, etc.
In summary, Karl Pearson’s correlation coefficient is a widely used, objective, and simple
measure of linear dependence. Its value helps analysts understand, visualize, and model the
relationship between two quantitative variables effectively.
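The sample formula and two of the listed properties (the range and the restriction to linear association) can be illustrated in Python. This is a minimal sketch on hypothetical data, and `pearson_r` is an illustrative helper rather than a library function.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's r from the sample formula using deviations from the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data sets chosen to show the extreme cases.
xs = [1, 2, 3, 4, 5]
r_increasing = pearson_r(xs, [2 * x + 1 for x in xs])       # exact increasing line
r_decreasing = pearson_r(xs, [-3 * x for x in xs])          # exact decreasing line
r_parabola = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2
print(r_increasing, r_decreasing, r_parabola)
```

In the last case Y is completely determined by X, yet r is zero because the relationship is not linear.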
Ans. 5b-
To fit a trend line using the method of semi-averages, we have to follow these steps:
Step 1: Divide the data into two equal parts
The data has 6 years, so we divide into two halves:
First half: 2012, 2013, 2014 → Sales: 120, 135, 150
Second half: 2015, 2016, 2017 → Sales: 165, 180, 195
Step 2: Compute the average of each half
Average of first half:
$\text{Avg}_1 = \frac{120 + 135 + 150}{3} = \frac{405}{3} = 135$
Average of second half:
$\text{Avg}_2 = \frac{165 + 180 + 195}{3} = \frac{540}{3} = 180$
Step 3: Determine the midpoint of each half
Midpoint of first half: 2013
Midpoint of second half: 2016
Now, we have two points on the trend line:
Point 1: $(2013, 135)$
Point 2: $(2016, 180)$
Step 4: Find the equation of the trend line
The trend line is of the form:
$Y = a + bX$
We first convert years into simpler $x$ values (take 2014 as origin):
Year    $x = \text{Year} - 2014$
2013    -1
2016    2
Now the points are:
$(-1, 135)$
$(2, 180)$
Find the slope $b$:
$b = \frac{180 - 135}{2 - (-1)} = \frac{45}{3} = 15$
Use point-slope form with $x = -1$, $Y = 135$:
$Y - 135 = 15(x + 1) \Rightarrow Y = 15x + 150$
Final Trend Line Equation:
$\boxed{Y = 15x + 150}$
Where:
$x = \text{Year} - 2014$
$Y$ is the estimated sales in lakhs
Interpretation: The trend suggests sales are increasing by 15 lakhs per year.
The estimated sales in 2014 (when $x = 0$) is $Y = 150$ lakhs.
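The four steps can be condensed into a short Python sketch. The function name `semi_average_trend` is illustrative, and for simplicity it assumes an even number of periods, as in this problem.

```python
from statistics import mean

years = [2012, 2013, 2014, 2015, 2016, 2017]
sales = [120, 135, 150, 165, 180, 195]  # in lakhs


def semi_average_trend(years, sales, origin):
    """Return (a, b) of the trend line Y = a + b * x, where x = year - origin."""
    if len(years) != len(sales) or len(years) % 2 != 0:
        raise ValueError("this sketch assumes an even number of periods")
    half = len(years) // 2
    # Each half is represented by its middle year and its average sales.
    x1, y1 = mean(years[:half]) - origin, mean(sales[:half])
    x2, y2 = mean(years[half:]) - origin, mean(sales[half:])
    b = (y2 - y1) / (x2 - x1)  # slope through the two semi-average points
    a = y1 - b * x1            # intercept at x = 0
    return a, b


a, b = semi_average_trend(years, sales, origin=2014)
print(a, b)  # 150.0 15.0
```

With these coefficients, the trend value for any year follows from `a + b * (year - 2014)`.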
Ans. 6a-
One-Tailed vs Two-Tailed Tests in Hypothesis Testing
In hypothesis testing, we evaluate claims about a population parameter (like a mean or
proportion) using sample data. The direction of the test depends on the alternative hypothesis
(H₁).
1. One-Tailed Test
A one-tailed test checks for an effect in only one direction — either greater than or less than a
specified value.
Types:
Right-tailed test:
$H_0: \mu \leq \mu_0$
$H_1: \mu > \mu_0$
Left-tailed test:
$H_0: \mu \geq \mu_0$
$H_1: \mu < \mu_0$
Example (Right-tailed):
A machine is supposed to fill bottles with at least 500 ml. A test is conducted to check if it
fills more.
$H_0: \mu = 500$
$H_1: \mu > 500$
You'd use a right-tailed test since you're only interested in detecting if the mean is greater
than 500.
Diagram:
In a right-tailed test, the critical region is on the right side of the distribution.
|-----------------------------|-------------------> X
↑
Critical Value (Z or t)
(Reject H₀ if test statistic > critical value)
2. Two-Tailed Test
A two-tailed test checks for any significant difference — either higher or lower than the
hypothesized value.
Hypotheses:
$H_0: \mu = \mu_0$
$H_1: \mu \ne \mu_0$
Example:
A manufacturer claims the average life of a battery is 100 hours. A researcher wants to test if
it is different from 100 (not specifically greater or less).
$H_0: \mu = 100$
$H_1: \mu \ne 100$
This calls for a two-tailed test.
Diagram:
In a two-tailed test, the critical regions are on both sides of the distribution.
<--- Critical ---|-----------------|--- Critical --->
| |
Lower Upper
Critical Value Critical Value
(Reject H₀ if test statistic falls in either tail)
Conclusion:
Use a one-tailed test when you’re only interested in one direction of deviation.
Use a two-tailed test when you want to test for any difference, regardless of direction.
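The practical difference between the two tests shows up in the critical values. The sketch below assumes a z-test (standard normal test statistic) and a significance level of 0.05. Neither is fixed by the text above, so both are assumptions made for illustration.

```python
from statistics import NormalDist

alpha = 0.05          # assumed significance level
z = NormalDist()      # standard normal distribution (z-test assumed)

right_tail_critical = z.inv_cdf(1 - alpha)      # all of alpha in the right tail, about 1.645
two_tail_critical = z.inv_cdf(1 - alpha / 2)    # alpha split across both tails, about 1.960


def reject_right_tailed(statistic):
    """Reject H0 only if the statistic is far enough to the right."""
    return statistic > right_tail_critical


def reject_two_tailed(statistic):
    """Reject H0 if the statistic is far enough in either direction."""
    return abs(statistic) > two_tail_critical


print(reject_right_tailed(1.8), reject_two_tailed(1.8))  # True False
```

The same test statistic of 1.8 leads to rejection in the right-tailed test but not in the two-tailed test, because the two-tailed test places only half of alpha in each tail.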
Ans. 6b-
To test the hypothesis that the population mean is 70 using a t-test, we follow these steps:
Step 1: State the Hypotheses
We test whether the population mean is different from 70.
Null Hypothesis (H₀): $\mu = 70$
Alternative Hypothesis (H₁): $\mu \ne 70$ (two-tailed test)
Step 2: Given Data
Sample mean $\bar{x} = 68$
Hypothesized population mean $\mu_0 = 70$
Sample standard deviation $s = 5$
Sample size $n = 10$
Significance level $\alpha = 0.05$
Step 3: Compute the Test Statistic
Use the t-statistic formula:
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{68 - 70}{5 / \sqrt{10}} = \frac{-2}{5 / 3.162} = \frac{-2}{1.581} \approx -1.265$
Step 4: Find the Critical t-value
Degrees of freedom $df = n - 1 = 9$
At $\alpha = 0.05$, two-tailed test:
$t_{\text{critical}} = \pm t_{0.025, 9} \approx \pm 2.262$
If $|t| > t_{\text{critical}}$, reject $H_0$
Step 5: Decision Rule
Here, $|t| = 1.265 < 2.262$
Conclusion: Since the calculated t-value does not exceed the critical value, we fail to reject
the null hypothesis. There is not enough evidence at the 5% significance level to conclude
that the population mean is different from 70.
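The test statistic and the decision can be reproduced in Python. Because the standard library has no t-distribution, the critical value 2.262 is hard-coded from the t-table as quoted in Step 4. The variable names are illustrative.

```python
import math

sample_mean = 68
mu_0 = 70      # hypothesized population mean
s = 5          # sample standard deviation
n = 10         # sample size

standard_error = s / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

t_critical = 2.262  # table value for df = 9, two-tailed, alpha = 0.05 (from Step 4)
reject_h0 = abs(t_stat) > t_critical
print(round(t_stat, 3), reject_h0)  # -1.265 False
```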