Comprehensive Guide to Statistics

The document is a compilation of mathematics materials focused on statistics, covering discrete and continuous probability distributions, hypothesis testing, and their applications in various fields. It includes detailed explanations of binomial and Poisson distributions, their properties, formulas, and examples for practical understanding. The materials are applicable to courses in information technology, engineering, applied sciences, and business management.

Uploaded by

Thomas
Title Polytechnic Mathematics Materials Compilation Series – Statistics

Editor AprilDolphin
Date 8/4/2025

Topic [Category: Discrete Probability Distribution]

• Discrete Probability Distribution
• Binomial Distribution
• Poisson Distribution – Fundamentals
• Poisson Distribution as Approximation to Binomial Distribution with large values of 𝑛 and small values of 𝑝

Topic [Category: Continuous Probability Distribution]

• Continuous Probability Distribution
• Normal Distribution – Fundamentals
• Normal Distribution – Distribution of Sample Means
• Normal Distribution – Central Limit Theorem
• Confidence Intervals with Normal and T Distribution – Basics
• Confidence Interval – Calculation

Topic [Category: Hypothesis Testing]

• Hypothesis Testing with Normal and T Distribution
• Chi-Square Test for Goodness of Fit
• Chi-Square Test for Independence

Miscellaneous

• Instruction on Use of Standard Normal Table – Probability from Far-Left (Negative Infinity) of the Normal Distribution to Z-Score

Applicable to the following Courses


• Information Technology – Statistics/Business Statistics/Computing Mathematics
• Engineering – Statistics/Engineering Mathematics
• Applied Sciences – Biostatistics
• Business Management – Business Statistics
Title Discrete Probability Distribution
Author Liu Hui Ling, Ngee Ann Polytechnic
Date 17/10/2018

A discrete probability distribution has the following properties.

• The random variable takes discrete values (in this guide, whole-number counts 𝑘, where 𝑘 ≥ 0)
• The number of possible values is countable
• The variable is random, and the probabilities of all its possible values sum to exactly 1

Example

| Number of Events | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Probability | 0.5 | 0.25 | 0.10 | 0.15 |

In the case of the Binomial Distribution, represented by the formula below,

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$

the following restriction applies, because the formula is undefined for any value of 𝑘 outside this range:

$$0 \le k \le n$$

In the case of the Poisson Distribution, represented by the formula below,

$$P(X = k) = e^{-\mu} \left( \frac{\mu^k}{k!} \right), \quad 0 \le k < \infty$$

The inequality 0 ≤ 𝑘 < ∞ means that the number of events you are calculating the probability for can be any finite whole number greater than or equal to 0.

Although there is no upper limit on the value of 𝑘, the probabilities still sum to 1. This follows directly from the series expansion of the exponential function:

$$\sum_{k=0}^{\infty} e^{-\mu} \frac{\mu^k}{k!} = e^{-\mu} \cdot e^{\mu} = 1$$

The Poisson Limit Theorem gives a second way to see this. As the probability 𝑝 in a Binomial Distribution approaches 0 and the number of trials 𝑛 approaches infinity, with the mean 𝑛𝑝 held fixed at 𝜇, the Binomial Distribution converges to the Poisson Distribution.

The Poisson Distribution is therefore a limiting case of the Binomial Distribution, whose probabilities always sum to 1.
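The claim that both distributions sum to 1 can be checked numerically. The sketch below is only an illustration: the function names and the parameters (𝑛 = 5, 𝑝 = 0.3, 𝜇 = 1.8) are assumptions chosen for the example, and the infinite Poisson sum is cut off at 50 terms, which is already far into the negligible tail for this mean.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), valid for 0 <= k <= n."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson distribution with mean mu, k = 0, 1, 2, ..."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Hypothetical parameters, chosen only for illustration.
binomial_total = sum(binomial_pmf(k, 5, 0.3) for k in range(0, 6))

# The Poisson sum is infinite; 50 terms is an assumed, generous cut-off.
poisson_total = sum(poisson_pmf(k, 1.8) for k in range(0, 50))

print(binomial_total, poisson_total)
```

Both totals should agree with 1 up to floating-point rounding.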
Title Polytechnic and A Level H2 Mathematics (Statistics) Binomial
Distribution
Author Lim Wang Sheng, School of Information Technology, Nanyang
Polytechnic
[CCA: NYP Mentoring Club]
Date 9/6/2018

Applicable to the following levels


✓ School of Information Technology Students (Computing Mathematics)
✓ School of Engineering (Engineering Mathematics – Statistical Analysis)
✓ School of Business Management (Statistics – Business Statistics)
✓ School of Chemical and Life Sciences – Biostatistics
✓ JC/MI Students – H2 Mathematics – Statistics

This guide follows my school’s syllabus, so it may not cover everything required for H2 Mathematics. JC/MI students should treat this guide as a last resort if they still do not understand the basics.

To use the binomial distribution, the following requirements must be met.


➢ There will only be 2 possible outcomes (Success/Failure, Yes/No, etc.)
➢ Each trial is an independent event (that is, will not affect the subsequent trial or
be affected by past trial)

You must also know, or be able to derive, the following information
➢ The probability 𝑝 of the outcome of interest in a single trial (the same for every trial)
➢ The total number of trials 𝑛 and the number of trials 𝑘 for which the probability is being calculated, both shown in notation form below.

Formula for Binomial Distribution Probability

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$

[It may be written slightly differently in other textbooks, but the meaning is the same.]
| Notation | Meaning |
|---|---|
| $P(X = k)$ | Probability that the outcome of interest occurs in exactly $k$ of the $n$ trials |
| $\binom{n}{k}$ | Number of ways the $k$ outcomes of interest can be arranged among the $n$ trials. $n$ is the total number of trials; $k$ is the number of trials the outcome is being calculated for |
| $p^k$ | Probability of obtaining the outcome of interest (for example, Yes or Success) in $k$ independent trials |
| $(1 - p)^{n-k}$ | Probability of obtaining the alternate outcome in the remaining $n - k$ independent trials (for example, if your outcome is Yes or Success, the alternate outcome is No or Failure respectively) |
| $X \sim B(n, p)$ | The random variable $X$ follows a binomial distribution over $n$ independent trials, each of which has probability $p$ of producing the outcome named in the question |

Formula List for Analyzing a Binomial Distribution

| Quantity | Formula |
|---|---|
| Mean ($\mu$), also called Expected Value | $\mu = np$ |
| Variance ($\sigma^2$) | $\sigma^2 = np(1 - p)$ |
| Standard Deviation ($\sigma$) | $\sigma = \sqrt{np(1 - p)}$ |
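The formulas above translate almost line for line into code. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, and returning 0 for values of 𝑘 outside 0 ≤ 𝑘 ≤ 𝑛 is a convention assumed here for convenience.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    if not 0 <= k <= n:
        return 0.0  # assumed convention for k outside the allowed range
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of X ~ B(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, math.sqrt(variance)

# Illustrative parameters: n = 5 trials, p = 0.3.
print(binomial_pmf(2, 5, 0.3))
print(binomial_summary(5, 0.3))
```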
Binomial Distribution Questions and Example
[Section I]: Basic Calculation
Q1: Given the following binomial distribution and information.
𝑋~𝐵(5,0.3)
Evaluate the following
(a) 𝑃(𝑋 = 2)
(b) 𝑃(𝑋 < 2)
(c) 𝑃(𝑋 < 3)
(d) 𝑃(𝑋 ≥ 2)

Q1(a)
$$P(X = 2) = \binom{5}{2} (0.3)^2 (1 - 0.3)^{5-2} = 10(0.09)(0.343) = 0.3087$$

Q1(b)
$$P(X < 2) = P(X = 0) + P(X = 1)$$

$$P(X = 0) = \binom{5}{0} (0.3)^0 (1 - 0.3)^{5-0} = 1(0.3)^0 (0.7)^5 = 0.16807$$

$$P(X = 1) = \binom{5}{1} (0.3)^1 (1 - 0.3)^{5-1} = 5(0.3)^1 (0.7)^4 = 0.36015$$

$$P(X < 2) = 0.16807 + 0.36015 = 0.52822$$

Q1(c)
$$P(X < 3) = 1 - [P(X = 3) + P(X = 4) + P(X = 5)]$$
[All probabilities in a binomial distribution must sum to 1.]
**Use the method that requires the fewest calculations.

$$P(X = 3) = \binom{5}{3} (0.3)^3 (1 - 0.3)^{5-3} = 10(0.027)(0.7)^2 = 0.1323$$

$$P(X = 4) = \binom{5}{4} (0.3)^4 (1 - 0.3)^{5-4} = 5(0.0081)(0.7) = 0.02835$$

$$P(X = 5) = \binom{5}{5} (0.3)^5 (1 - 0.3)^{5-5} = 1(0.00243)(1) = 0.00243$$

$$P(X = 3) + P(X = 4) + P(X = 5) = 0.16308$$

$$P(X < 3) = 1 - 0.16308 = 0.83692$$


Q1(d) [From Answers Derived in Q1(b)]
$$P(X \ge 2) = 1 - [P(X = 0) + P(X = 1)] = 1 - (0.16807 + 0.36015) = 1 - 0.52822 = 0.47178$$
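The four parts of Q1 can be cross-checked with a short script. This is a sketch under the assumption that a cumulative probability is computed by summing individual terms; the helper names are hypothetical. Note that the value for part (d) must equal one minus the value for part (b).

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k), summing the individual probabilities from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(0, k + 1))

n, p = 5, 0.3                      # X ~ B(5, 0.3) from Q1
q1a = binomial_pmf(2, n, p)        # P(X = 2)
q1b = binomial_cdf(1, n, p)        # P(X < 2) = P(X <= 1)
q1c = binomial_cdf(2, n, p)        # P(X < 3) = P(X <= 2)
q1d = 1 - binomial_cdf(1, n, p)    # P(X >= 2) = 1 - P(X <= 1)

print(round(q1a, 5), round(q1b, 5), round(q1c, 5), round(q1d, 5))
```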

Section II (Application of Binomial Distribution)


Q2
A survey indicates that 60% of the school’s student population is interested in participating in an event. You randomly select 7 students who took part in the survey.

(a) Is the binomial distribution suitable for this question? Justify your answer.
(b) Find the probability that exactly 4 students are interested in the event.
(c) Find the probability that at most 3 students are interested in the event.
(d) Find the expected value, variance and standard deviation of the distribution.

(a) Yes. The number of trials is fixed at 7, and every student’s interest in the event can be regarded as independent. There are only two possible outcomes, either a “YES” or a “NO”, with the same probability of 0.6 of a “YES” for each student.

(b)
$$P(X = 4) = \binom{7}{4} (0.6)^4 (1 - 0.6)^{7-4} = 35(0.1296)(0.064) = 0.290304$$

(c)
$$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$$

$$P(X = 0) = \binom{7}{0} (0.6)^0 (1 - 0.6)^{7-0} = 0.00164$$

$$P(X = 1) = \binom{7}{1} (0.6)^1 (1 - 0.6)^{7-1} = 0.01720$$

$$P(X = 2) = \binom{7}{2} (0.6)^2 (1 - 0.6)^{7-2} = 0.07741$$

$$P(X = 3) = \binom{7}{3} (0.6)^3 (1 - 0.6)^{7-3} = 0.19354$$

$$P(X \le 3) = 0.00164 + 0.01720 + 0.07741 + 0.19354 = 0.28979$$

(d)
$$\mu = np = 7(0.6) = 4.2$$

$$\sigma^2 = np(1 - p) = 4.2(1 - 0.6) = 1.68$$

$$\sigma = \sqrt{1.68} = 1.2961$$

Q3 [Question Taken from Nanyang Polytechnic Computing Mathematics 2 Exam Paper]

Given that the mean and variance of a Binomial Distribution 𝑋 are 5 and 15/8 respectively, find the values of 𝑛 and 𝑝 in the Binomial Distribution of 𝑋.

Mean: $\mu = np = 5$
Variance: $\sigma^2 = np(1 - p) = \frac{15}{8}$

Equation 1: $np = 5$
Equation 2: $np(1 - p) = \frac{15}{8}$

Dividing Equation 2 by Equation 1:

$$1 - p = \frac{np(1 - p)}{np} = \frac{15/8}{5} = 0.375$$

$$p = 1 - 0.375 = 0.625$$

$$n = \frac{np}{p} = \frac{5}{0.625} = 8$$
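The same two-equation method can be sketched in code. The function name below is hypothetical, and exact fractions are an assumed design choice so that 15/8 is not rounded along the way.

```python
from fractions import Fraction

def binomial_params(mean, variance):
    """Recover (n, p) of a binomial distribution from its mean and variance.

    Uses variance / mean = 1 - p, then n = mean / p.
    """
    q = Fraction(variance) / Fraction(mean)  # q = 1 - p
    p = 1 - q
    n = Fraction(mean) / p
    return n, p

n, p = binomial_params(5, Fraction(15, 8))
print(n, p)
```

With the Q3 values this gives 𝑛 = 8 and 𝑝 = 5/8, which is 0.625.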
Title Polytechnic and JC H2 Mathematics – Poisson Distribution
Author Liu Hui Ling, Ngee Ann Polytechnic
(Assisted by Chen Xin Yi)
Date 10/6/2018

Applicable to the following levels and types of education institution


✓ JC/MI – H2 Mathematics (Statistics)
✓ Engineering, Physics, Chemistry and Biology – Statistical Calculations
✓ Information Technology – Data Analytics

Alongside our business-related modules, we also take a Business Statistics module, which drew my interest in this topic of probability distributions. Let us get straight to the point and explain the prerequisites and reasons for using this type of probability distribution.

Purpose of Poisson Distribution is to

➢ Calculate the probability that an event occurs a given number of times within an interval, when the mean rate of occurrence per unit interval is given.

Information needed
➢ Mean occurrence rate
➢ Unit of interval (This is the key phrase. Without a unit of interval, the use of the Poisson Distribution most likely cannot be justified. The unit of interval can be a time interval, an area interval, a volume interval, etc.)

Requirements
➢ Multiple events cannot happen simultaneously
➢ All events must be independent (i.e. Unaffected by past events and will not affect
subsequent events)

Formula used (explanation given below)

$$P(X = k) = e^{-\mu} \left( \frac{\mu^k}{k!} \right)$$

| Notation | Meaning |
|---|---|
| $P(X = k)$ | The probability that the number of occurrences of the event within the unit interval is exactly equal to $k$ |
| $\mu$ | Mean number of occurrences of the event per unit interval (i.e. the Expected Value) |
| $e$ | Euler's number (approximately 2.71828, rounded to 6 significant figures). Modern scientific calculators have this built in; look for the $e^x$ or $e$ key |

Question 1.
The number of sick leaves taken by students in a class per week is known to follow a
Poisson distribution with a mean of 1.8.

Find the probability that


(a) There are no sick leaves taken by students in the class in a one-week period.
(b) At least 4 sick leaves are taken by students in the class in a one-week period.

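(a)
$$P(X = 0) = e^{-1.8} \left( \frac{1.8^0}{0!} \right) = 0.165299$$

(b)
$$P(X = 0) = e^{-1.8} \left( \frac{1.8^0}{0!} \right) = 0.165299$$

$$P(X = 1) = e^{-1.8} \left( \frac{1.8^1}{1!} \right) = 0.297538$$

$$P(X = 2) = e^{-1.8} \left( \frac{1.8^2}{2!} \right) = 0.267784$$

$$P(X = 3) = e^{-1.8} \left( \frac{1.8^3}{3!} \right) = 0.160671$$

$$P(X \ge 4) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)] = 1 - 0.891292 = 0.108708$$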
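The "at least" calculation above is one minus the sum of the lower terms, which is easy to express in code. This is a minimal sketch with hypothetical function names, using the mean of 1.8 from Question 1.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def poisson_at_least(k, mu):
    """P(X >= k) = 1 - [P(X = 0) + ... + P(X = k - 1)]."""
    return 1.0 - sum(poisson_pmf(i, mu) for i in range(k))

mu = 1.8                                # mean sick leaves per week
p_none = poisson_pmf(0, mu)             # part (a)
p_at_least_4 = poisson_at_least(4, mu)  # part (b)

print(round(p_none, 6), round(p_at_least_4, 6))
```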
Q2 [Taken from NYP Exam Paper] [Added with help from an Engineering student from Nanyang Polytechnic who prefers to remain anonymous]

(a)
(i)
$$P(X = 0) = e^{-2.4} \left( \frac{2.4^0}{0!} \right) = 0.09071795 = 0.0907 \text{ (4 dp)}$$

(ii)
$$P(X = 1) = e^{-2.4} \left( \frac{2.4^1}{1!} \right) = 0.2177231$$

$$P(X \ge 2) = 1 - [P(X = 0) + P(X = 1)] = 0.6916 \text{ (4 dp)}$$

(iii) Since there are on average 2.4 network errors a day, a 7-day work week should average 7 × 2.4 = 16.8 network errors. The mean occurrence rate per week is therefore 𝜇 = 16.8.

$$P(X = 0) = e^{-16.8} \left( \frac{16.8^0}{0!} \right) = 5.0565313 \times 10^{-8}$$

$$P(X = 1) = e^{-16.8} \left( \frac{16.8^1}{1!} \right) = 8.4949727 \times 10^{-7}$$

$$P(X > 1) = 1 - [P(X = 0) + P(X = 1)] = 1.0000 \text{ (4 dp)}$$
Title Poisson Distribution
Approximation for Binomial Distribution for large values of 𝑛 and small
values of 𝑝
Author [Anonymous], Student from School of Engineering, Nanyang Polytechnic
Date 4/3/2019

Let’s look closely at the Poisson Limit Theorem.

If, in a Binomial Distribution, 𝑝 approaches 0 and 𝑛 approaches ∞ (infinity) while the mean 𝑛𝑝 stays fixed, the distribution approaches the Poisson Distribution.

As a result, a Binomial Distribution can be approximated by a Poisson Distribution when the following rule of thumb is met:

𝑛 > 50 and 𝑝 < 0.1, such that 𝑛𝑝 < 5

$$X \sim B(n, p) \approx X \sim Po(np)$$
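To get a feel for how close the approximation is, the two probability functions can be compared term by term. The sketch below uses hypothetical values 𝑛 = 100 and 𝑝 = 0.02, chosen only because they satisfy the rule of thumb above; the function names are also assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Hypothetical example satisfying n > 50, p < 0.1 and np < 5.
n, p = 100, 0.02
mu = n * p

# Largest difference between the exact and approximate probabilities.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu))
              for k in range(n + 1))

print(mu, max_gap)
```

For these values the largest gap is a small fraction of a percentage point, which is why the approximation is acceptable in this regime.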

Example 1. [Question obtained from School of Information Technology Examination Papers – Computing Mathematics 2, with help from friends]
SIT/2019/January

A stamping machine produces components at a rate of 300 per day. It is known that 1% of the output is defective. Assume the number of defective components per day can be approximated by a Poisson Distribution.

(a) Estimate the mean of the Poisson Distribution.
(b) Find the probability that no defective output is produced on any given day.
(c) Find the probability that at least 1 and at most 10 defective outputs are produced on any given day.

(a)
$$X \sim B(n, p) \approx X \sim Po(\mu)$$
$$\mu = np = 300(0.01) = 3$$

(b)
$$X \sim Po(3)$$

$$P(X = k) = e^{-\mu} \left( \frac{\mu^k}{k!} \right)$$

$$P(X = 0) = e^{-3} \left( \frac{3^0}{0!} \right) = 0.0497870684$$
(c)
For each value of 𝑘 from 1 to 10, compute

$$P(X = k) = e^{-3} \left( \frac{3^k}{k!} \right), \quad k = 1, 2, \ldots, 10$$

Summing all the probabilities from 𝑃(𝑋 = 0) to 𝑃(𝑋 = 10) gives

$$P(X \le 10) = 0.999707663$$

$$P(1 \le X \le 10) = P(X \le 10) - P(X = 0) = 0.999707663 - 0.049787068 = 0.9499 \text{ (4 decimal places)}$$
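Adding ten terms by hand is tedious, so a short loop is a natural check. The sketch below, with hypothetical function names, computes the Poisson answer for part (c) and, as an extra comparison not asked for in the question, the exact binomial value for 𝑛 = 300 and 𝑝 = 0.01.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 300, 0.01
mu = n * p  # mean of the approximating Poisson distribution

# P(1 <= X <= 10) under the Poisson approximation and the exact binomial.
p_range_poisson = sum(poisson_pmf(k, mu) for k in range(1, 11))
p_range_binomial = sum(binomial_pmf(k, n, p) for k in range(1, 11))

print(round(p_range_poisson, 4), round(p_range_binomial, 4))
```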


Title Continuous Probability Distribution
Author Lim Wang Sheng, School of Information Technology, Nanyang Polytechnic
[CCA: NYP Mentoring Club]
Date 4/3/2019

In this topic we focus on

- Continuous random variables
- Basic concepts of area under the curve as a probability value

Continuous random variables occur in many areas of statistics. They can take an uncountable number of values, in contrast with discrete random variables, which take a countable number of values.

A continuous probability distribution describes a continuous random variable. It is typically represented by a graph whose total area under the curve, from the far left to the far right of the distribution, is exactly equal to 1.

Examples include
- Height of students
- Test scores
- Weight of bobcats
- Time students spend studying and revising for exams
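Basic Operations Involving Continuous Probability Distribution (illustrated in the original with graphs, using the Normal Distribution as an example)

$$P(X = k) = 0$$

Consequently
$$P(X > k) = P(X \ge k)$$
$$P(X < k) = P(X \le k)$$

In the illustration, the red line marks the value 𝑋 = 𝑘. A single point on the X-axis cannot enclose any area under the curve; you need at least 2 points, or a range of values, to define an area under the curve.

$$P(X > k) = 1 - P(X < k)$$
[Figure: total area under the curve minus the area to the left of 𝑘 equals the area to the right of 𝑘]

$$P(X < k) = 1 - P(X > k)$$
[Figure: total area under the curve minus the area to the right of 𝑘 equals the area to the left of 𝑘]

$$P(a < X < b) = P(X < b) - P(X < a)$$
[Figure: area to the left of 𝑏 minus the area to the left of 𝑎 equals the area between 𝑎 and 𝑏]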
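These identities are all statements about the cumulative area to the left of a point, so they can be sketched with a cumulative distribution function. The example below assumes Python's standard-library `statistics.NormalDist`; the standard normal distribution and the points 𝑎 = −1, 𝑏 = 2 are hypothetical choices for illustration.

```python
from statistics import NormalDist

# Hypothetical example: a standard normal variable with a = -1 and b = 2.
dist = NormalDist(mu=0, sigma=1)
a, b = -1.0, 2.0

p_less_than_b = dist.cdf(b)             # P(X < b), area to the left of b
p_greater_than_b = 1 - dist.cdf(b)      # P(X > b) = 1 - P(X < b)
p_between = dist.cdf(b) - dist.cdf(a)   # P(a < X < b) = P(X < b) - P(X < a)

print(round(p_less_than_b, 4), round(p_greater_than_b, 4), round(p_between, 4))
```

Because 𝑃(𝑋 = 𝑘) = 0 for a continuous variable, the same code answers the "≤" and "≥" versions of each question.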
Title Polytechnic and A Level H2 Mathematics (Statistics) – Normal Distribution
Author Lim Wang Sheng, School of Information Technology, Nanyang Polytechnic
[CCA: NYP Mentoring Club]
Date 15/6/2018

Applicable to the following levels


✓ School of Information Technology Students (Computing Mathematics)
✓ School of Engineering (Engineering Mathematics – Statistical Analysis)
✓ School of Business Management Students (Statistics – Business Statistics)
✓ School of Chemical and Life Science – Biostatistics
✓ JC/MI Students – H2 Mathematics – Statistics

Items needed to start the topic


✓ Standard Normal Table

(Recommended: print a Standard Normal Table to refer to while doing your homework and assignments. There are thousands of them on the internet, but it is best to get one from your school teacher and keep it. I also recommend uploading a copy to cloud storage, so that if you lose the table you can restore and reprint it quickly.)

SEAB has a copy of the Standard Normal Table on its website. With enough searching you should be able to find it.

My school also issues its own version of the Standard Normal Table.
(I have seen Standard Normal Tables issued by other schools. They express the area under the curve in different ways and have different numerical accuracy requirements.)
Table of Notation

| Notation | Meaning |
|---|---|
| $X \sim N(\mu, \sigma^2)$ | This is how a normally distributed variable should be written. It means: the variable $X$ is normally distributed, with a mean of $\mu$ and a variance of $\sigma^2$. (Replace the symbols with the values specified in the question you are answering.) |
| $Z \sim N(0, 1)$ | Standard Normal Distribution, with a mean of 0 and a variance of 1. Since $\sqrt{1} = 1$, the standard deviation is also 1. Here $Z$ is the number of standard deviations away from the mean, also called the Z-score. |

Table of Formula
Formula for Standardization

$$Z = \frac{X - \mu}{\sigma}, \quad Z \sim N(0, 1)$$

Properties of a Normal Distribution Curve.

• Mean, Median and Mode all have the same value
• Symmetrical about the mean*, implying the left side of the Normal Distribution has a total area of 0.5 and the right side of the Normal Distribution has a total area of 0.5 as well.

(*This is important to know because not all Standard Normal Tables are laid out the same way. Some schools’ tables show the area under the curve from the mean to the 𝑍-score. The most common type, however, shows the area from the far left of the distribution (negative infinity) up to the 𝑍-score.)
Example Questions
Example 1:
(Taken from Oxford University Lecture Notes)

The marks of 500 candidates in an examination are normally distributed with a mean of
45 marks and a standard deviation of 20 marks.
If 20% of the candidates obtained a distinction by scoring 𝑥 marks or more, estimate the
value of 𝑥.

Written in Normal Distribution Notation

$$X \sim N(45, 20^2)$$

20% of the candidates scoring 𝑥 marks or more is the same as 80% of the candidates scoring less than 𝑥 marks:

$$P(X \ge x) = 0.20 \implies P(X < x) = 0.80$$

Within the Standard Normal Table, look for the probability value closest to 0.800. (In this case, the table doesn’t have a value exactly equal to 0.800.)

The table value closest to 0.800 turns out to be 0.7995, at 𝑧 = 0.84.

Applying the Standardization Formula

$$z = \frac{x - \mu}{\sigma}$$

$$0.84 = \frac{x - 45}{20}$$

$$20(0.84) = x - 45$$

$$16.8 = x - 45$$

$$x = 16.8 + 45 = 61.8$$
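Reading the table backwards, from a probability to a 𝑧-score, corresponds to an inverse cumulative distribution function. The sketch below assumes Python's standard-library `statistics.NormalDist` and its `inv_cdf` method. Because it does not round 𝑧 to two decimal places as a printed table does, its answer differs slightly from 61.8.

```python
from statistics import NormalDist

# Marks are modelled as X ~ N(45, 20^2), as in Example 1.
marks = NormalDist(mu=45, sigma=20)

# The top 20% score x or more, so 80% score below x.
cutoff = marks.inv_cdf(0.80)

# The matching z-score on the standard normal distribution.
z = NormalDist().inv_cdf(0.80)

print(round(z, 4), round(cutoff, 2))
```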
Example 2 (Taken from Online Sources)
The daily revenue of a small restaurant is approximately normally distributed with a
mean of $530 and a standard deviation of $120. To be in profit, the restaurant must
receive at least $350.

Find the probability that the restaurant will be in profit on any given day.

Applying the Standardization Formula

$$z = \frac{x - \mu}{\sigma} = \frac{350 - 530}{120} = -1.5$$

By the symmetry of the Normal Distribution, 𝑃(𝑍 ≥ −1.5) = 𝑃(𝑍 ≤ 1.5). Looking up 𝑧 = 1.5 in the Standard Normal Table gives a probability value of 0.9332, so the probability of the restaurant receiving at least 350 dollars is 0.9332.
Example 3 (Taken from NYP Computing Mathematics 2 Paper)

Rewritten in Normal Distribution Notation, we get

$$K \sim N(15, 4)$$

implying the standard deviation $\sigma = \sqrt{4} = 2$.

(a) (i)
$$Z(K = 10) = \frac{10 - 15}{2} = -\frac{5}{2} = -2.5$$

$$Z(K = 20) = \frac{20 - 15}{2} = 2.5$$

$$P(Z < -2.5) = 0.0062$$

$$P(Z < 2.5) = 0.9938$$

$$P(-2.5 < Z < 2.5) = 0.9938 - 0.0062 = 0.9876$$

(ii)
$$Z(K = 18) = \frac{18 - 15}{2} = \frac{3}{2} = 1.5$$

$$P(Z < 1.5) = 0.9332$$

$$P(K > 18) = P(Z > 1.5) = 1 - 0.9332 = 0.0668$$
Title Normal Distribution – Distribution of Sample Mean
Date 13/8/2018
Author Lim Wang Sheng, School of Information Technology, Nanyang Polytechnic
[CCA: NYP Mentoring Club]

Applicable to
• Nanyang Polytechnic – School of Chemical and Life Sciences (Biostatistics)
• Nanyang Polytechnic – School of Engineering (Engineering Mathematics)
• Nanyang Polytechnic – School of Information Technology (Computing
Mathematics – Statistics)

Assumptions
• You already understand the Normal Distribution and how to read the Standard Normal Table. (If not, read up on the Normal Distribution first, as the notation and calculations used in this topic are very similar.)

(It is unclear whether this topic applies to ‘A’ Level students.)


Purpose of this topic
• Determining probability of obtaining a certain range of mean values from a
defined sample, given a normally distributed or approximately normally
distributed population.

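| Notation | Meaning |
|---|---|
| $\mu$ | Population Mean |
| $\bar{X}$ | Sample Mean |
| $\sigma$ | Population standard deviation |
| $\sigma_{\bar{X}}$ | Standard Deviation of the Sampling Distribution (also referred to as the standard error) |
| $n$ | Sample Size (number of subjects you are performing the analysis on) |

Formula List

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$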
Question 1. (Taken from SCL Notes)
In a certain population of swordtail fish, the length of individual fish follows an approximately normal distribution, with a mean of 52.0 mm and a standard deviation of 6.0 mm. Find the probability that a random sample of 25 swordtail fish will have an average length
a) Less than 48.6 mm
b) Between 52.4 mm and 54.4 mm

The population distribution is written as follows

$$X \sim N(52.0, 6.0^2)$$

The distribution of the sample mean is written as follows

$$\bar{X} \sim N(\mu, \sigma_{\bar{X}}^2)$$

with $\mu = 52.0$ and

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{6.0}{\sqrt{25}} = \frac{6.0}{5} = 1.2$$

Rewritten as:

$$\bar{X} \sim N(52.0, 1.2^2)$$

Answering Question 1(a)

$$Z(\bar{X} = 48.6) = \frac{48.6 - 52.0}{1.2} = -2.83$$

$$P(Z < -2.83) = 0.0023$$

Answer: 0.0023

Answering Question 1(b)

$$Z(\bar{X} = 54.4) = \frac{54.4 - 52.0}{1.2} = 2.00$$

$$Z(\bar{X} = 52.4) = \frac{52.4 - 52.0}{1.2} = 0.33$$

$$P(\bar{X} < 54.4) = 0.9772$$

$$P(\bar{X} < 52.4) = 0.6293$$

$$P(52.4 < \bar{X} < 54.4) = 0.9772 - 0.6293 = 0.3479$$

Answer: 0.3479
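The only change from an ordinary Normal Distribution question is that the standard deviation is replaced by the standard error. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name is hypothetical. Its result for part (b) differs slightly from the table-based answer because the table rounds 𝑧 = 0.333… down to 0.33.

```python
import math
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Distribution of the sample mean: N(mu, (sigma / sqrt(n))^2)."""
    return NormalDist(mu, sigma / math.sqrt(n))

# Swordtail fish: population N(52.0, 6.0^2), sample size 25.
xbar = sample_mean_distribution(52.0, 6.0, 25)

p_a = xbar.cdf(48.6)                    # P(sample mean < 48.6)
p_b = xbar.cdf(54.4) - xbar.cdf(52.4)   # P(52.4 < sample mean < 54.4)

print(round(xbar.stdev, 2), round(p_a, 4), round(p_b, 4))
```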
Title Normal Distribution – Central Limit Theorem
Editor -
Date 6/4/2019

Formal Statement of the Central Limit Theorem

The central limit theorem states that if you have a population with mean 𝜇 and a finite standard deviation, and you take sufficiently large random samples (size 𝑛 ≥ 30 is the usual rule of thumb) from the population with replacement, the sample means will be approximately normally distributed.

If the population is normally or approximately normally distributed to start with, the distribution of the sample mean will also be normally or approximately normally distributed, regardless of the sample size.

Why bother with the central limit theorem?

- The 𝑡-distribution with large degrees of freedom approximates the Normal Distribution.
- The Chi-Square distribution with large degrees of freedom is also approximately normally distributed.

Examinations taken by large numbers of candidates often use the Normal Distribution for grading, data reporting and data analysis; with so many candidates, the Central Limit Theorem is invoked to justify this.
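A small simulation can make the theorem more concrete. The sketch below is an illustration under assumed settings, none of which come from the text: the population is uniform on [0, 1), which is clearly not normal, each sample has size 30, and 2000 samples are drawn with a fixed seed so the run is repeatable.

```python
import math
import random
import statistics

random.seed(0)          # fixed seed so the simulation is repeatable

n = 30                  # size of each sample (assumed, matches n >= 30)
num_samples = 2000      # number of samples drawn (assumed)

# Draw each sample from a uniform population and record its mean.
sample_means = [
    sum(random.random() for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# A uniform [0, 1) population has mean 0.5 and standard deviation sqrt(1/12),
# so the sample means should have standard deviation sqrt(1/12) / sqrt(n).
expected_sd = math.sqrt(1 / 12) / math.sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 4), round(expected_sd, 4))
```

The sample means cluster around the population mean, with a spread close to the population standard deviation divided by √𝑛.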
Title Normal Distribution and 𝑡 distribution – Construction of Confidence
Interval – Basics Theory
Author Lim Wang Sheng, School of Information Technology, Nanyang Polytechnic
[CCA: NYP Mentoring Club]
Date 22/8/2018

Applicable to the following schools of Nanyang Polytechnic


• School of Information Technology – Computing Mathematics 2
• School of Engineering – Engineering Mathematics 2B
• School of Chemical and Life Sciences – Biostatistics
• School of Business Management – Business Statistics

Applicable to A Level Syllabus


• H2 Further Mathematics

The following table illustrates when to use the 𝑧-score method or the 𝑡-score method to compute a confidence interval from a sample.

| Situation | Action Taken |
|---|---|
| Question asks you to construct a confidence interval with the population standard deviation 𝜎 known | Construct the confidence interval using the Standard Normal Distribution (𝑧-score) |
| Question asks you to construct a confidence interval with a small sample size, but states that the sample is approximately normally distributed | Construct the confidence interval using the Standard Normal Distribution (𝑧-score) |
| Question asks you to construct a confidence interval from a sample with a large sample size (𝑛 ≥ 30) | By the Central Limit Theorem, the distribution is approximately normal, so use the Standard Normal Distribution (𝑧-score) to construct the confidence interval |
| Question asks you to construct a confidence interval from a small sample (𝑛 < 30), with the population standard deviation 𝜎 unknown | Construct the confidence interval using 𝑡-score values from the 𝑡-distribution |
Title Normal and 𝑡-Distribution – Construction of Confidence Interval –
Calculation Phase
Author Lim Wang Sheng, School of Information Technology, Nanyang Polytechnic
[CCA: NYP Mentoring Club]
Date 14/9/2018

**This guide assumes you have already read my previous guide on assessing which method is most suitable for constructing a confidence interval in various situations.

Margin of Error (𝑧-score method)

$$E = z_c \left( \frac{\sigma}{\sqrt{n}} \right)$$

Used in the following situations:
• Population Standard Deviation (𝜎) known
• Question mentions the sample is normally or approximately normally distributed
• Large sample size of 𝑛 ≥ 30

𝐸 refers to the margin of error
$z_c$ refers to the 𝑧-score of the confidence interval in question
𝜎 refers to the population standard deviation
𝑛 refers to the sample size

Margin of Error (𝑡-score method)

$$E = t_c \left( \frac{s}{\sqrt{n}} \right)$$

Used in the following situation:
• Population Standard Deviation (𝜎) not known and small sample size of 𝑛 < 30

𝐸 refers to the margin of error
$t_c$ refers to the 𝑡-score of the confidence interval in question
𝑠 refers to the standard deviation of the sample
𝑛 refers to the sample size

Confidence Interval Formula

$$\bar{X} \pm E$$

Degrees of Freedom

$$n - 1$$

where 𝑛 is the sample size in the question
Reason for this topic:
A confidence interval is a systematic way to express how far the true value may deviate from the observed value. It is a range of values within which we are reasonably sure the population mean lies.

A 0.95 or 95% confidence interval is built by a method that captures the population mean 95% of the time: if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the population mean.
(Taken from University of Texas at Dallas Website)
Q1. A sample of size 𝑛 = 100 produced a sample mean of 𝑋̅ = 16. Assuming the population standard deviation 𝜎 = 3, compute the 95% confidence interval for the population mean 𝜇.

Since the population standard deviation is known, use the 𝑧-score method.

From my school’s Standard Normal Table, the 𝑧-score for the confidence interval mentioned in the question is 1.960.

Standard Error

$$\frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{100}} = 0.3$$

Margin of Error

$$E = z_c \left( \frac{\sigma}{\sqrt{n}} \right) = 1.960(0.3) = 0.588$$

Confidence Interval: 16 ± 0.588

The confidence interval is between 15.412 and 16.588.
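The three steps (standard error, margin of error, interval) can be sketched as one function. The function name is hypothetical, and the critical value is computed with the standard library's `statistics.NormalDist().inv_cdf` instead of being read from a table, so it is 1.95996… where the table gives 1.960.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known (z-score method)."""
    # Two-sided interval: split the leftover probability between both tails.
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_c * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Q1: n = 100, sample mean 16, population standard deviation 3.
low, high = z_confidence_interval(16, 3, 100)
print(round(low, 3), round(high, 3))
```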
Q2. To assess the accuracy of a laboratory scale, a standard weight known to weigh 1 gram is weighed 4 times. The resulting measurements (in grams) are: 0.95, 1.02, 1.01, 0.98.

Compute the 95% confidence interval for 𝜇.

Since 𝑛 < 30 and the population standard deviation 𝜎 is not known, we answer the question using the 𝑡-score method.

Sample Mean

$$\bar{X} = \frac{0.95 + 1.02 + 1.01 + 0.98}{4} = 0.99$$

Standard Deviation of Sample

$$s = 0.03162$$

Standard Error

$$\frac{s}{\sqrt{n}} = \frac{0.03162}{\sqrt{4}} = 0.01581$$

At Degrees of Freedom = 3 and a 0.95 confidence level, $t_c = 3.182$.

Margin of Error

$$E = t_c \left( \frac{s}{\sqrt{n}} \right) = 3.182(0.01581) = 0.05030742$$

Confidence Interval: 0.99 ± 0.05030742

The confidence interval is between 0.940 and 1.040.
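The same calculation can be sketched in code. Python's standard library has no 𝑡-distribution, so the sketch takes the critical value 3.182 from the 𝑡-table as an input, exactly as in the working above; the function and constant names are hypothetical.

```python
import math
import statistics

# Critical value for a 95% interval at 3 degrees of freedom, read from a t-table.
T_CRITICAL_95_DF3 = 3.182

def t_confidence_interval(data, t_critical):
    """Confidence interval for the mean using the sample standard deviation."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)            # sample standard deviation (n - 1)
    margin = t_critical * s / math.sqrt(n)
    return mean - margin, mean + margin

measurements = [0.95, 1.02, 1.01, 0.98]   # grams, from Q2
low, high = t_confidence_interval(measurements, T_CRITICAL_95_DF3)
print(round(low, 3), round(high, 3))
```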
Title Normal and 𝑡 −distribution – Hypothesis Testing (One Sample)
Date 27/9/2018
Author(s) Lim Wang Sheng, School of Information Technology, Nanyang Polytechnic
[CCA: NYP Mentoring Club]
Liu Hui Ling, Ngee Ann Polytechnic

𝐻0 (null hypothesis): The person’s claim is valid. If the evidence is insufficient, we do not reject the null hypothesis.

𝐻𝑎 (alternative hypothesis): The person’s claim is not valid. If the evidence is sufficient, we reject the null hypothesis in favor of the alternative hypothesis.

| | Left-Tail | Two-Tail | Right-Tail |
|---|---|---|---|
| Symbol Used for Null Hypothesis | 𝜇 ≥ 𝑥 | 𝜇 = 𝑥 | 𝜇 ≤ 𝑥 |
| Symbol Used for Alternative Hypothesis | 𝜇 < 𝑥 | 𝜇 ≠ 𝑥 | 𝜇 > 𝑥 |
| Objective of 𝐻0 | Testing if the value is above a certain minimum threshold | Testing if the value is within a certain acceptable range of values | Testing if the value is below a certain maximum threshold |

The same rules from the confidence interval topics apply here. In the context of hypothesis testing:

If the question specifies that the distribution is normal or approximately normal, that the population standard deviation 𝜎 is known, or that the sample size is 𝑛 ≥ 30, use the 𝑧-score method to solve the question.

If the question doesn’t specify a known standard deviation and the sample size is 𝑛 < 30, use the 𝑡-score method to solve the question.

Steps to hypothesis testing as follows:


1. State null and alternative hypothesis
2. Determine nature of test and write down criteria for rejecting null hypothesis
3. Compute the standard error in question
4. Compute test statistics (𝑧 − 𝑠𝑐𝑜𝑟𝑒 or 𝑡 − 𝑠𝑐𝑜𝑟𝑒)
5. Make your decision and justify why you fail to reject or rejected your null
hypothesis
Standard Error

| Situation | Standard Error |
|---|---|
| Sample size 𝑛 ≥ 30, normally distributed, approximately normally distributed, or 𝜎 known | $\frac{\sigma}{\sqrt{n}}$, where 𝜎 is the population standard deviation |
| Sample size 𝑛 < 30, 𝜎 not known | $\frac{s}{\sqrt{n}}$, where 𝑠 is the sample standard deviation |

Formula for Test Statistics

| Test statistic | Formula |
|---|---|
| 𝑧-score | $z = \frac{\text{sample mean} - \text{hypothesized mean}}{\text{standard error}}$ |
| 𝑡-score | $t = \frac{\text{sample mean} - \text{hypothesized mean}}{\text{standard error}}$ |
(Question 1 and Question 2 obtained from NYP SCL Notes)
Question 1.
A report claims that an adult has an average of 130 Facebook friends. A random sample of 50 adults revealed that the average number of Facebook friends is 142 with a standard deviation of 38.2. At the 5% significance level, is there enough evidence to reject the claim?

Since the question doesn’t use words that imply “more than” or “less than”, the test is two-tailed, and the null and alternative hypotheses are as follows.
𝐻0 : 𝜇 = 130
𝐻𝑎 : 𝜇 ≠ 130, implying 𝜇 > 130 OR 𝜇 < 130

We also need to set the criteria for not rejecting and rejecting the null hypothesis. Since 𝑛 ≥ 30, we use the 𝑧-score to perform the test. (Critical values as read from the standard normal table issued by my school.)

𝐻0 : −1.960 ≤ 𝑧 ≤ 1.960
𝐻𝑎 : 𝑧 > 1.960 OR 𝑧 < −1.960

Calculate Standard Error

$$\frac{38.2}{\sqrt{50}} = 5.402296$$

Compute Test Statistic

$$z = \frac{142 - 130}{38.2 / \sqrt{50}} = 2.221$$

Since the test statistic (𝑧 = 2.221 > 1.960) falls in the rejection region stated above, 𝐻0 is rejected: at the 5% significance level there is enough evidence to reject the report’s claim.
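The five steps listed earlier can be sketched as a single function. This is an illustration only: the function name and the `tail` argument are hypothetical, the critical values come from the standard library's `statistics.NormalDist().inv_cdf` and not from a printed table, and the sample standard deviation stands in for 𝜎 as in the working above.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, hypothesized_mean, sd, n, alpha, tail):
    """Return (z, reject_null). tail must be 'two', 'left' or 'right'."""
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    std_normal = NormalDist()
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject

# Question 1: claimed mean 130, sample mean 142, sd 38.2, n = 50, alpha = 0.05.
z, reject = one_sample_z_test(142, 130, 38.2, 50, 0.05, "two")
print(round(z, 3), reject)
```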
Question 2.
The management of a weight loss club claims it’s members lose an average of 3 kg or
more within the first month after joining the club. A consumer agency that wanted to
check this claim took a random sample of 36 members of this club and they lost an
average of 2.9 kg with standard deviation of 0.6 within the first month of membership.
Test at 10% significance level if the management’s claim is true.

The question stated the claim as “3 kg or more”, implying the objective of the test is
reject the hypothesis should the value falls below a certain threshold, the test is left-
tailed in nature.

𝐻0 : 𝜇 ≥ 3
𝐻𝑎 : 𝜇 < 3

Since the test is left tailed nature, involving sample size 𝑛 ≥ 30, the following
information is required to test the claim.
𝐻0 : 𝑧 ≥ −1.282
𝐻𝛼 : 𝑧 < −1.282

Calculate Standard Error

$$\text{standard error} = \frac{0.6}{\sqrt{36}} = 0.1$$

Compute Test Statistic

$$z = \frac{2.9 - 3.0}{0.6 / \sqrt{36}} = -1$$

Since the test statistic 𝑧 = −1 does not fall within the rejection region specified
earlier (𝑧 < −1.282), we do not reject 𝐻0. At the 10% significance level, there is not
enough evidence to reject the management’s claim.
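
The same steps can be wrapped in a small reusable function. The sketch below is an illustration only: the function name z_test and its tail argument are hypothetical names chosen here, not something from the notes, and it assumes Python 3.8 or later. It is run on the figures from Question 2.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, hypothesized_mean, sample_sd, n, alpha, tail):
    """Return (z, reject_h0) for a one-sample z-test.

    tail is "left", "right" or "two" (a naming choice made for this sketch).
    """
    z = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "left":
        # Reject when z falls below the lower critical value.
        reject = z < std_normal.inv_cdf(alpha)
    elif tail == "right":
        # Reject when z falls above the upper critical value.
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "two":
        # Reject when z falls beyond either critical value.
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, reject


# Question 2: left-tailed test at the 10% significance level.
z, reject = z_test(2.9, 3.0, 0.6, 36, alpha=0.10, tail="left")
print(f"z = {z:.3f}, reject H0: {reject}")
```

Passing tail="two" with the figures from Question 1 reproduces that two-tailed test as well.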
Question 3. [Question Created by Hui Ling herself.]
A report from XYZ Clinics claims that the waiting time for each patient from registration
to consultation is 25 minutes or less. A civil servant from the Ministry of Health was
tasked to check if the claim is valid and took a random sample of 15 patients and found
out the average waiting time for each patient is 26.5 minutes, with a standard deviation
of 8 minutes. Given the test is to be performed at 𝛼 = 0.05, what conclusion should
that civil servant come to?

The claim specifies 25 minutes or less, implying the aim of the test is to reject the claim
should the value fall above a certain threshold, which further implies a right tailed test is
to be conducted.

𝐻0 : 𝜇 ≤ 25
𝐻𝛼 : 𝜇 > 25

Since the sample size is small (𝑛 < 30) and the question mentions neither a normally
(or approximately normally) distributed population nor the population standard
deviation, we use the 𝑡 − 𝑠𝑐𝑜𝑟𝑒 method to approach the question. At 𝑛 − 1 = 14 degrees
of freedom and 𝛼 = 0.05 (right-tailed), the following criteria are obtained.

𝐻0 : 𝑡 ≤ 1.761
𝐻𝛼 : 𝑡 > 1.761

Compute Standard Error

$$\text{standard error} = \frac{8}{\sqrt{15}} = 2.065591$$

Compute Test Statistic

$$t = \frac{\text{sample mean} - \text{hypothesized mean}}{\text{standard error}} = \frac{26.5 - 25}{8 / \sqrt{15}} = 0.726$$

Since the test statistic 𝑡 = 0.726 does not fall in the rejection region (𝑡 ≤ 1.761), the
civil servant should not reject the claim mentioned in the report of XYZ Clinics. At
𝛼 = 0.05, there is not enough evidence that the average waiting time exceeds 25 minutes.
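
Question 3 can be checked the same way. The Python standard library has no t-distribution, so this minimal sketch simply takes the critical value 1.761 from the t-table, as the worked solution does. The variable names are illustrative.

```python
import math

# Figures taken from Question 3.
sample_mean = 26.5
hypothesized_mean = 25
sample_sd = 8
n = 15

# Critical value read from a t-table at n - 1 = 14 degrees of freedom,
# alpha = 0.05, right-tailed (the value quoted in the worked solution).
t_critical = 1.761

standard_error = sample_sd / math.sqrt(n)
t = (sample_mean - hypothesized_mean) / standard_error

# Right-tailed test: reject only when t exceeds the critical value.
reject_h0 = t > t_critical
print(f"standard error = {standard_error:.6f}")
print(f"t = {t:.3f}, critical value = {t_critical}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

If SciPy happens to be installed, scipy.stats.t.ppf(0.95, 14) should give the same critical value without a table.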
Title Chi-Squared-Test for Goodness-of-Fit
Author Lim Wang Sheng, School of Information Technology, Nanyang Polytechnic
[CCA: NYP Mentoring Club]
Date 19/10/2018
***You will need a Chi-Squared table (for assignments) or software (for projects) to
obtain the Chi-Squared critical values.

The purpose of the Chi-Squared Test in general is to provide a robust, mathematical and
analytical approach towards the following goals:
• Determine how much categorical variables differ in terms of hypothesized value
and observed value.
• Determine whether two categorical variables are independent.

In this topic, we will focus mainly on the first goal, to measure the difference between
the hypothesized value and observed value and from there, we will arrive at a decision
on whether the hypothesized value is considered reliable.

Formula List as given:

$$\chi^2 = \sum \frac{(\text{Observed Value} - \text{Expected Value})^2}{\text{Expected Value}}$$

𝜒² refers to the chi-squared value.
𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑉𝑎𝑙𝑢𝑒 refers to the value under each category as obtained from the sample.
𝐸𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑉𝑎𝑙𝑢𝑒 refers to the value of the respective category as hypothesized.

The above formula literally means: you compute
$\frac{(\text{Observed Value} - \text{Expected Value})^2}{\text{Expected Value}}$
for every category in the question and add the results up to obtain the overall 𝜒² value.

Degrees of Freedom = 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝐶𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑒𝑠 – 1

(The 𝜒² test for Goodness-of-Fit is always right-tailed. You reject the null
hypothesis if the 𝜒² value goes beyond a certain threshold obtained from your
𝜒² table. That value is called the “critical value”.)
Example 1 (Obtained from NYP SCL Biostatistics Notes):
A recruitment agency’s manager says that 22% of the undergraduates do not work, 26%
work 1 to 20 hours per week, 18% work 21 to 34 hours, and 34% work 35 or more hours
per week. You randomly selected 120 undergraduates and gather the results shown in
the table. At 𝛼 = 0.01, can you reject the manager’s claim?

Response                 Frequency
Do not work              29
Work 1 to 20 hours       26
Work 21 to 34 hours      25
Work 35 or more hours    40

Step 1. Propose a null and alternative hypothesis.


𝐻0 : The manager’s claim is reliable.
𝐻𝛼 : The manager’s claim is not reliable.

Step 2. Set Rejection Criteria (As obtained from Chi-Square Table)

Under degrees of freedom = 4 − 1 = 3 and 𝛼 = 0.01
𝐻0 : 𝜒² < 11.345
𝐻𝛼 : 𝜒² ≥ 11.345

Step 3. Compute Expected Values in Question:

Response                 Frequency (Expected Values)
Do not work              26.4 = 22% × 120
Work 1 to 20 hours       31.2 = 26% × 120
Work 21 to 34 hours      21.6 = 18% × 120
Work 35 or more hours    40.8 = 34% × 120

Step 4. Compute the sum of (Observed − Expected)² / Expected to get the 𝜒² value.

$$\chi^2 = \sum \frac{(\text{Observed Value} - \text{Expected Value})^2}{\text{Expected Value}}$$

$$\chi^2 = \frac{(29 - 26.4)^2}{26.4} + \frac{(26 - 31.2)^2}{31.2} + \frac{(25 - 21.6)^2}{21.6} + \frac{(40 - 40.8)^2}{40.8} = 1.6736$$

Step 5. Make a decision to reject or not reject 𝐻0.

Since the 𝜒² value of 1.6736 is less than 11.345, we do not reject 𝐻0. At 𝛼 = 0.01,
there is not enough evidence to reject the manager’s claim.
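
The five steps of this goodness-of-fit example can be sketched in Python as follows. This is only an illustrative check of the arithmetic; the dictionary names are my own choice, and the critical value is taken from the Chi-Square table rather than computed.

```python
# Proportions claimed by the manager and frequencies observed in the sample.
claimed_proportions = {
    "Do not work": 0.22,
    "Work 1 to 20 hours": 0.26,
    "Work 21 to 34 hours": 0.18,
    "Work 35 or more hours": 0.34,
}
observed = {
    "Do not work": 29,
    "Work 1 to 20 hours": 26,
    "Work 21 to 34 hours": 25,
    "Work 35 or more hours": 40,
}

sample_size = sum(observed.values())  # 120 undergraduates

# Expected value = claimed proportion x sample size.
expected = {k: p * sample_size for k, p in claimed_proportions.items()}

# Sum (Observed - Expected)^2 / Expected over every category.
chi_square = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)

degrees_of_freedom = len(observed) - 1
chi_critical = 11.345  # from the Chi-Square table, d.f. = 3, alpha = 0.01

print(f"chi-square = {chi_square:.4f}, d.f. = {degrees_of_freedom}")
print("Reject H0" if chi_square >= chi_critical else "Do not reject H0")
```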
Title Chi-Square Test of Independence
Author Lim Wang Sheng, School of Information Technology, Nanyang Polytechnic
[CCA: NYP Mentoring Club]
Date 21/10/2018

As mentioned in my previous topic, the Chi-Square test is also used to determine whether
two categorical variables are dependent on or independent of each other. In this
scenario, you will be given a table (i.e. a contingency table) where the rows represent
one categorical variable and the columns represent another categorical variable. The aim
of the test is to determine whether the row variable is independent of the column
variable.

Despite similarities in the formula, some major differences should be noted.

Formula for Degrees of Freedom on a Contingency Table


𝑑. 𝑓. = (𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑅𝑜𝑤𝑠 − 1)(𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝐶𝑜𝑙𝑢𝑚𝑛𝑠 − 1)

Formula for Grand Total


𝐺𝑟𝑎𝑛𝑑 𝑇𝑜𝑡𝑎𝑙 = ∑ 𝑉𝑎𝑙𝑢𝑒 𝑜𝑓 𝐸𝑣𝑒𝑟𝑦 𝐶𝑒𝑙𝑙

Formula for Expected Value for Each Cell

$$\text{Expected Value} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}}$$

Formula for Chi-Square Statistic Value

$$\chi^2 = \sum \frac{(\text{Observed Value} - \text{Expected Value})^2}{\text{Expected Value}}$$

The above formula means: you compute
$\frac{(\text{Observed Value} - \text{Expected Value})^2}{\text{Expected Value}}$
for every cell and add them up to get the Chi-Square Statistic Value.

Null and Alternative Hypotheses
𝐻0 : The 2 categorical variables in the question are independent.
𝐻𝛼 : The 2 categorical variables in the question are dependent.
Example 1. (Taken from NYP SCL Biostatistics Notes)
A health club manager wants to determine whether the number of days per week that
students spend exercising is dependent on gender. A random sample of 275 students is
selected and the results are classified in the table below. At the 5% level, is there
enough evidence to conclude that the number of days spent exercising per week is
dependent on gender?

          Days spent per week exercising
Gender    0-1    2-3    4-5    6-7
Male      40     53     26     6
Female    34     68     37     11

Step 1: Define Null and Alternative Hypothesis

𝐻0 : The number of days spent exercising per week is independent of gender.
𝐻𝛼 : The number of days spent exercising per week is dependent on gender.

Step 2: Identify Degrees of Freedom

The table has 2 rows (gender) and 4 columns (days spent exercising).
𝑑. 𝑓. = (𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑅𝑜𝑤𝑠 − 1)(𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝐶𝑜𝑙𝑢𝑚𝑛𝑠 − 1) = (2 − 1)(4 − 1) = 3

Step 3: Set rejection criteria


At 𝛼 = 0.05
𝐻0 : 𝜒 2 < 7.815
𝐻𝛼 : 𝜒 2 ≥ 7.815

Step 4: Calculate Row Total, Column Total and Grand Total

Row totals and column totals are shown in parentheses.

                 Days spent per week exercising
Gender           0-1     2-3     4-5     6-7     Row Totals
Male             40      53      26      6       (125)
Female           34      68      37      11      (150)
Column Totals    (74)    (121)   (63)    (17)

Grand Total = 275
Step 5: Compute Expected Value for Every Cell Using the Formula

$$\text{Expected Value} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}}$$

The following table is the result of calculation of the Expected Value using the above
formula as shown.

(Expected Values Table)

          Days spent per week exercising
Gender    0-1       2-3    4-5       6-7
Male      370/11    55     315/11    85/11
Female    444/11    66     378/11    102/11

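Step 6:
Compute Chi-Square Statistic Value by Applying the Following Formula

$$\chi^2 = \sum \frac{(\text{Observed Value} - \text{Expected Value})^2}{\text{Expected Value}}$$

$$\begin{aligned}
\chi^2 ={} & \frac{\left(40 - \frac{370}{11}\right)^2}{\frac{370}{11}} + \frac{(53 - 55)^2}{55} + \frac{\left(26 - \frac{315}{11}\right)^2}{\frac{315}{11}} + \frac{\left(6 - \frac{85}{11}\right)^2}{\frac{85}{11}} \\
& + \frac{\left(34 - \frac{444}{11}\right)^2}{\frac{444}{11}} + \frac{(68 - 66)^2}{66} + \frac{\left(37 - \frac{378}{11}\right)^2}{\frac{378}{11}} + \frac{\left(11 - \frac{102}{11}\right)^2}{\frac{102}{11}}
\end{aligned}$$

$$\chi^2 = 3.493$$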

Step 7:
Make a conclusion
Since 𝜒² = 3.493 < 7.815, we do not reject the null hypothesis. At the 5% level, there is
not enough evidence to conclude that the number of days spent exercising per week is
dependent on gender.
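
The whole independence example can be reproduced with a minimal Python sketch. It uses fractions.Fraction from the standard library so that the expected values come out as exact fractions such as 370/11. The variable names are illustrative, and the critical value is again taken from the Chi-Square table.

```python
from fractions import Fraction

# Observed contingency table: rows are gender, columns are days per week.
observed = [
    [40, 53, 26, 6],   # Male
    [34, 68, 37, 11],  # Female
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected value of each cell = (row total)(column total) / grand total.
expected = [
    [Fraction(r * c, grand_total) for c in column_totals]
    for r in row_totals
]

# Sum (Observed - Expected)^2 / Expected over every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
chi_critical = 7.815  # from the Chi-Square table, d.f. = 3, alpha = 0.05

print("row totals:", row_totals, "column totals:", column_totals)
print("expected (Male):", [str(e) for e in expected[0]])
print(f"chi-square = {float(chi_square):.3f}, d.f. = {degrees_of_freedom}")
print("Reject H0" if chi_square >= chi_critical else "Do not reject H0")
```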
Title Utilizing a Standard Normal Table - Probability as Area from Far Left of
the Normal Distribution to the Z-Score
Author -
Date 10/4/2019
Probability as Area from Far Left of the Normal Distribution to the Z-Score
(There are many types of Standard Normal Table out there, so check which type you have
before proceeding. For a different type of Standard Normal Table, consult your teachers,
professors or lecturers for help, as I cannot accommodate all the possible types with
limited resources.)

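[Figure: standard normal table. The row labels give the whole number and 1st decimal
place of the z-score; the column labels give the 2nd decimal place.]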
Objective: Probability when the z-score is ≤ 0
Instructions: Look up the row for the whole number and first decimal place, then look up
the column for the second decimal place, and take the probability shown in the table.
Example: To find the probability of z-score ≤ −1.5, look up the row “-1.5” and the
column “0.00”. The probability turns out to be 0.0668.
Objective: Probability when the z-score is > 0
Instructions: Look up the whole number and first decimal place of the negative
counterpart, then look up the second decimal place. Deduct the value from 1 to get the
probability value.
Example: To compute the probability of z-score ≤ 1.33, look up the probability value for
−1.33, which is 0.0918, and deduct it from 1 to get 0.9082.
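
Both table look-ups above can be cross-checked in software. The sketch below is only an illustration and assumes Python 3.8 or later, where statistics.NormalDist provides the cumulative probability (the area from the far left up to the z-score) directly.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area from the far left up to z = -1.5 (the first example).
p_negative = std_normal.cdf(-1.5)

# Area up to z = 1.33, found the table way: 1 minus the area up to -1.33.
p_positive = 1 - std_normal.cdf(-1.33)

print(round(p_negative, 4))  # 0.0668
print(round(p_positive, 4))  # 0.9082
```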
Objective: 𝑧𝑐 of a confidence interval
Instructions: Take the confidence level 𝑐, compute the corresponding probability value,
then find the value of 𝑧𝑐 (the z-score of the confidence interval).
Example: If the question wants a 95% confidence interval, the corresponding probability
value is

$$p = 0.95 + \frac{1 - 0.95}{2} = 0.95 + 0.025 = 0.975$$

General formula:

$$p = c + \frac{1 - c}{2}$$

Deduct the probability value from 1 to get 0.025. A probability value of 0.025 on the
standard normal table corresponds to 𝑧 = −1.96.

Therefore 𝑧𝑐 = 1.96
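
The general formula can be turned into a small helper. This is a sketch only: the function name z_for_confidence is a hypothetical name chosen for illustration, and it assumes Python 3.8 or later for statistics.NormalDist.

```python
from statistics import NormalDist


def z_for_confidence(c):
    """Return z_c for a confidence level c, e.g. c = 0.95."""
    # General formula from the notes: p = c + (1 - c) / 2.
    p = c + (1 - c) / 2
    # inv_cdf gives the z-score whose area from the far left equals p.
    return NormalDist().inv_cdf(p)


print(round(z_for_confidence(0.95), 2))  # 1.96
```

Because inv_cdf is asked for the upper-tail z-score directly, there is no need to look up the negative z-score and flip its sign as with the printed table.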
Objective: 𝑧 − 𝑠𝑐𝑜𝑟𝑒 for a 2-tailed test
Instructions: Compute $\frac{\alpha}{2}$ and find the 𝑧-score for $\frac{\alpha}{2}$.
Example: If the question wants a significance level of 𝛼 = 0.05, compute

$$\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$$

Find the z-score corresponding to 𝑝 = 0.025.

The value turns out to be −1.96, therefore
𝐻0 : −1.96 ≤ 𝑧 ≤ 1.96
𝐻𝛼 : 𝑧 < −1.96 𝑂𝑅 𝑧 > 1.96
