
Understanding Hypothesis Testing Basics

Hypothesis testing is a statistical method used to make assumptions about population parameters based on sample data, involving the null hypothesis (H0) and alternative hypothesis (Ha). The process includes stating hypotheses, choosing a significance level, analyzing sample data, and interpreting results, with potential for Type I and Type II errors. Various tests such as Z tests, T tests, and Chi-square tests are used to evaluate hypotheses and determine relationships between variables.

Uploaded by

komalbanik

Hypothesis Testing

What is Hypothesis Testing?


• A statistical hypothesis is an assumption about a population
parameter that is evaluated using sample data

• This assumption may or may not be true.

• Hypothesis testing refers to the formal procedures used by
statisticians to accept or reject statistical hypotheses.
Statistical Hypotheses
• There are two types of statistical hypotheses.
• Null Hypothesis (H0)

• States the assumed value of the parameter before sampling
• It is the assumption that we wish to test
• If the population mean is assumed equal to a hypothesized mean µ0, then the
null hypothesis can be written as
H0: µ = µ0
Alternate Hypothesis (Ha)
• Covers all possible alternatives to the null hypothesis
• If the null hypothesis is H0: µ = µ0, then the alternate hypothesis can be
written as one of
Ha: µ ≠ µ0
Ha: µ > µ0
Ha: µ < µ0
Hypothesis Tests
• Statisticians follow a formal process to determine whether to reject a
null hypothesis, based on sample data.
• This process, called hypothesis testing, consists of four steps.
1. State the hypotheses
• This involves stating the null and alternative hypotheses. The
hypotheses are stated in such a way that they are mutually exclusive.
That is, if one is true, the other must be false.
2. Choose a significance level
• The significance level (α) is the probability threshold that you set before
conducting a hypothesis test. It is the largest probability of rejecting a
true null hypothesis that you are willing to accept.
3. Formulate an analysis plan and analyze sample data
• The analysis plan describes how to use sample data to evaluate the
null hypothesis. The evaluation often centers on a single test
statistic. Find the value of the test statistic (mean score, proportion, t
statistic, z-score, etc.) described in the analysis plan.
4. Interpret results
• Apply the decision rule described in the analysis plan. If the value of
the test statistic is unlikely, based on the null hypothesis, reject the
null hypothesis.
[Figures: rejection regions for a two-tailed test, a left-tailed test, and a right-tailed test at the 5% significance level]
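As a rough illustration of the three rejection regions, the sketch below computes the critical z-values at the 5% significance level. It assumes the test statistic follows a standard normal distribution under H0 and that SciPy is available; the variable names are illustrative only.

```python
from scipy.stats import norm

alpha = 0.05  # significance level used on the slides

# Two-tailed test: alpha is split equally between both tails.
z_two_tailed = norm.ppf(1 - alpha / 2)
# Left-tailed test: all of alpha sits in the lower tail.
z_left = norm.ppf(alpha)
# Right-tailed test: all of alpha sits in the upper tail.
z_right = norm.ppf(1 - alpha)

print(f"Two-tailed:   reject H0 if |z| > {z_two_tailed:.3f}")
print(f"Left-tailed:  reject H0 if z < {z_left:.3f}")
print(f"Right-tailed: reject H0 if z > {z_right:.3f}")
```

The two-tailed cutoff is larger in magnitude than the one-tailed cutoffs because each tail holds only half of α.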
Decision Errors
Type I Error
• Rejecting H0 in favor of Ha when, in fact, H0 is true is called a
Type I error.
• The probability of a Type I error is the significance level α.
Type II Error
• Failing to reject H0 when, in fact, H0 is false is called a
Type II error.
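One way to see what a Type I error means in practice is a small simulation, sketched below under assumed conditions: many samples are drawn from a population in which H0 is true, and a t test is run on each. The population mean, standard deviation, sample size, and number of experiments are hypothetical choices, not values from the slides.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

alpha = 0.05          # significance level
mu0 = 50.0            # hypothesized mean; H0 is true by construction
n_experiments = 5000  # hypothetical number of repeated studies
n = 30                # hypothetical sample size per study

# Each row is one sample drawn from a population where H0 holds.
samples = rng.normal(loc=mu0, scale=10.0, size=(n_experiments, n))

# Run a two-sided one-sample t test on every row.
result = ttest_1samp(samples, popmean=mu0, axis=1)

# Fraction of experiments that wrongly reject the true H0.
type_i_rate = np.mean(result.pvalue < alpha)
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate should land close to α, since every rejection here is a false rejection.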
Z test for a single mean
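A minimal sketch of a z test for a single mean, assuming the population standard deviation σ is known and using the usual statistic z = (x̄ − µ0) / (σ / √n). The helper `z_test_single_mean` and all numbers are hypothetical, chosen only for illustration.

```python
import math
from scipy.stats import norm

def z_test_single_mean(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for H0: mu = mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_value = 2 * norm.sf(abs(z))  # area in both tails beyond |z|
    return z, p_value

# Hypothetical example: sample mean 52 from n = 36, sigma = 6, H0: mu = 50.
z, p_value = z_test_single_mean(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
print(f"z = {z:.3f}, p = {p_value:.4f}")

alpha = 0.05
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```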

T test for a single mean



• With sample size n = 12, df = 12 − 1 = 11

• For a two-tailed test at the 5% significance level, the critical t-value for df = 11 is 2.201

• Hence, reject H0 if t > 2.201 or if t < −2.201
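The critical value 2.201 can be checked against the t distribution directly. The sketch below assumes a two-tailed test at the 5% level, which is the setting that matches this cutoff for df = 11.

```python
from scipy.stats import t

alpha = 0.05
n = 12
df = n - 1  # degrees of freedom for a single-mean t test

# Upper critical value: 2.5% of the distribution lies above it.
t_critical = t.ppf(1 - alpha / 2, df)
print(f"df = {df}, reject H0 if |t| > {t_critical:.3f}")
```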

Paired T Test

Subject   Score1   Score2
1         3        20
2         3        13
3         3        13
4         12       20
5         15       29
6         16       32
7         17       23
8         19       20
9         23       25
10        24       15
11        32       30

Subject   Score1 (X)   Score2 (Y)   X-Y    (X-Y)²
1         3            20           -17    289
2         3            13           -10    100
3         3            13           -10    100
4         12           20           -8     64
5         15           29           -14    196
6         16           32           -16    256
7         17           23           -6     36
8         19           20           -1     1
9         23           25           -2     4
10        24           15           9      81
11        32           30           2      4
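Significance level = 5%. For this sample problem, with df = 11 − 1 = 10,
the two-tailed critical t-values are ±2.228.

The t statistic computed from the differences above is about −2.74, which lies
below −2.228, so we can reject the null hypothesis that there is
no difference between the means.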
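The paired test on the table above can be reproduced as follows. This is a sketch using SciPy's `ttest_rel`, with a manual calculation from the differences alongside it as a cross-check; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import t, ttest_rel

score1 = np.array([3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32])
score2 = np.array([20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30])

# Manual calculation from the paired differences d = X - Y.
d = score1 - score2
n = len(d)
df = n - 1
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Same test via SciPy (two-sided by default).
result = ttest_rel(score1, score2)

t_critical = t.ppf(0.975, df)  # two-tailed cutoff at the 5% level
print(f"t (manual) = {t_manual:.3f}, t (scipy) = {result.statistic:.3f}")
print(f"p-value = {result.pvalue:.4f}, critical value = ±{t_critical:.3f}")
print("Reject H0" if abs(t_manual) > t_critical else "Fail to reject H0")
```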
Two-Sample t-Test

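• In general, the null hypothesis states that the difference between the two
population means equals some constant d0: μ1 − μ2 = d0

• To test whether the two means are equal, the hypotheses are

H0: μ1 = μ2
Ha: μ1 ≠ μ2

Since μ1 = μ2 implies μ1 − μ2 = 0, this is the case d0 = 0.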


Two Sample T Test (Unequal Variance)

• n1 and n2 are the sample sizes
• s1² and s2² are the variances of sample 1 and sample 2
• x̅1 and x̅2 are the means of sample 1 and sample 2
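With these symbols, the statistic commonly used when the two variances are not assumed equal (Welch's t test) can be written as below. This is the standard textbook form, given here as an assumption about the intended formula; ν denotes the Welch-Satterthwaite approximation to the degrees of freedom, a symbol that does not appear in the original.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
     {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```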
• In a factory, samples are taken from Machine A and Machine B.
Assuming the population variances are unequal, test at the 95% confidence
level whether the two machines produce output with the same mean or
whether the means differ.
Machine A   Machine B
140         134
142         152
144         167
142         140
141         130
• Null Hypothesis: Mean of Machine A = Mean of Machine B
• Alternate Hypothesis: Mean of Machine A ≠ Mean of Machine B
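The machine example can be worked through as sketched below, assuming Welch's unequal-variance t test is the intended procedure. The manual statistic and degrees of freedom use the Welch-Satterthwaite approximation, and SciPy's `ttest_ind` with `equal_var=False` serves as a cross-check.

```python
import numpy as np
from scipy.stats import ttest_ind

machine_a = np.array([140, 142, 144, 142, 141])
machine_b = np.array([134, 152, 167, 140, 130])

# Per-sample variance of the mean: s^2 / n (sample variance uses ddof=1).
va = machine_a.var(ddof=1) / len(machine_a)
vb = machine_b.var(ddof=1) / len(machine_b)

# Welch t statistic and approximate degrees of freedom.
t_manual = (machine_a.mean() - machine_b.mean()) / np.sqrt(va + vb)
df_welch = (va + vb) ** 2 / (
    va ** 2 / (len(machine_a) - 1) + vb ** 2 / (len(machine_b) - 1)
)

# Cross-check with SciPy's unequal-variance two-sample test.
result = ttest_ind(machine_a, machine_b, equal_var=False)

alpha = 0.05
print(f"t = {t_manual:.3f}, df = {df_welch:.2f}, p = {result.pvalue:.3f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

For these samples the p-value is well above 0.05, so the data do not give evidence that the two machine means differ.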
Chi-Square Test (Test for Independence )
• The chi-square independence test is a procedure for testing if two
categorical variables are related in some population.
Example:
• A scientist wants to know if education level and marital status are
related for all people in some country.

• He collects data on a simple random sample of n = 300 people, part of
which is shown below.
Observed Frequencies

There are 4 marital status categories and 5 education levels.

Column Percentages

Is marital status related to education level?


Null Hypothesis
• The null hypothesis for a chi-square independence test is that
two categorical variables are independent in some population.

• Independence means that one variable doesn't "say anything" about
another variable.
Expected Frequencies
• Expected frequencies are the frequencies we expect in our sample
if the null hypothesis holds.
• These expected frequencies are calculated as
$$e_{ij} = \frac{o_i \cdot o_j}{N}$$
where
• $e_{ij}$ is an expected frequency;
• $o_i$ is a marginal column frequency;
• $o_j$ is a marginal row frequency;
• $N$ is the total sample size.
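To illustrate the calculation, the sketch below builds expected frequencies from the row and column totals of a small contingency table. The observed counts are hypothetical and are NOT the survey data from this example; SciPy's `chi2_contingency` is used only to confirm the result.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts (2 rows x 3 columns), for illustration only.
observed = np.array([
    [20, 30, 50],
    [30, 40, 30],
])

row_totals = observed.sum(axis=1)  # marginal row frequencies
col_totals = observed.sum(axis=0)  # marginal column frequencies
n_total = observed.sum()           # total sample size N

# Expected count in each cell: (row total * column total) / N.
expected = np.outer(row_totals, col_totals) / n_total
print(expected)

# SciPy returns the same expected table along with the test results.
chi2_stat, p_value, dof, expected_scipy = chi2_contingency(observed)
print(f"chi-square = {chi2_stat:.3f}, df = {dof}, p = {p_value:.4f}")
```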
Expected Frequency
• So for our first cell, that'll be
Test Statistic
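• The chi-square test statistic is calculated as
$$\chi^2 = \sum \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
where $o_{ij}$ is the observed frequency in a cell, $e_{ij}$ is its expected frequency, and the sum runs over all cells of the table.
• The degrees of freedom are
$$df = (i - 1) \cdot (j - 1)$$
• i is the number of rows in our table and
• j is the number of columns

• For our table, df = (5−1)⋅(4−1) = 12.
• With df = 12, the probability of finding χ² ≥ 23.57 is approximately 0.023.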
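• This means that, if the two variables were independent in the population,
there would be a 0.023 (or 2.3%) chance of finding an association at least
this strong in our sample.

• Conclusion: since 0.023 is below the 0.05 significance level, we reject the
null hypothesis and conclude that marital status and education are related
in our population.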
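The reported probability can be checked from the chi-square distribution using the statistic (23.57) and degrees of freedom (12) given above. The 0.05 threshold in this sketch is an assumed significance level.

```python
from scipy.stats import chi2

chi2_stat = 23.57          # test statistic reported above
df = (5 - 1) * (4 - 1)     # degrees of freedom for the 5 x 4 table

# Upper-tail probability P(chi-square >= 23.57) under independence.
p_value = chi2.sf(chi2_stat, df)
print(f"df = {df}, p = {p_value:.3f}")

alpha = 0.05  # assumed significance level
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```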
