Hypothesis Testing
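# What is a t-test?
A t-test is a statistical test used to compare the means of two groups, or to compare one group's mean with a hypothesized value, when the population standard deviation is unknown and has to be estimated from the sample.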
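As a rough illustration, here is a minimal sketch of the pooled (Student's) two-sample t statistic. It assumes both groups are independent, roughly normal and have equal variances. The function name `pooled_t_statistic` and the sample values are illustrative only. The p-value would then be read from a t-distribution with `df` degrees of freedom (a statistics library or a t-table), which the Python standard library does not provide.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic (equal-variance assumption) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: weighted average of the two sample variances (n - 1 denominators)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (fmean(group_a) - fmean(group_b)) / std_err
    return t, n_a + n_b - 2

t, df = pooled_t_statistic([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)  # -2.0 8
```

The sign of t only reflects which group is listed first; swapping the groups gives +2.0.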
# Difference between the null and alternative hypotheses
The null hypothesis (H0) and the alternative hypothesis (H1) are two opposing statements used in statistical
hypothesis testing. The null hypothesis states that there is no effect or no difference (for example, between the two
groups being compared), while the alternative hypothesis states that there is an effect or a difference.
Only the null hypothesis is tested directly. If the data provide strong enough evidence against it, H0 is rejected in favor
of H1. If H0 is not rejected, there is not enough evidence to conclude that H1 is true; this does not prove that H0 is true.
Here are some examples of null and alternative hypotheses:
Null hypothesis: The average height of men is the same as the average height of women.
Alternative hypothesis: The average height of men is different from the average height of women.
Null hypothesis: There is no relationship between smoking and lung cancer.
Alternative hypothesis: There is a relationship between smoking and lung cancer.
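Written in symbols, the height example reads as follows. Here μ denotes a population mean; this notation is introduced for illustration and is not used elsewhere in these notes.

```latex
H_0:\ \mu_{\text{men}} = \mu_{\text{women}}
\qquad\text{vs.}\qquad
H_1:\ \mu_{\text{men}} \neq \mu_{\text{women}}
```

A one-tailed version of the same test would instead use H_1: μ_men > μ_women.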
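# Difference between a one-tailed and a two-tailed t-test
The main difference between a one-tailed t-test and a two-tailed t-test is the direction of the alternative hypothesis. A
one-tailed t-test tests for a difference in one specified direction only (for example, H1: mean A > mean B), while a
two-tailed t-test tests for a difference in either direction (H1: mean A ≠ mean B). At the same significance level, a
two-tailed test splits α between both tails of the distribution, so it needs a larger test statistic to reach significance.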
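The effect of splitting α across two tails can be seen numerically. This sketch uses the standard normal distribution (a z statistic) as a stand-in, because the Python standard library has no t-distribution; with a t statistic the same logic applies using the t CDF. The function name `tail_p_values` is hypothetical.

```python
from statistics import NormalDist

def tail_p_values(z):
    """Upper one-tailed and two-tailed p-values for a standard-normal test statistic z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1 - std_normal.cdf(z)             # H1: parameter > hypothesized value
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))  # H1: parameter != hypothesized value
    return one_tailed, two_tailed

one, two = tail_p_values(1.8)
print(round(one, 4), round(two, 4))  # 0.0359 0.0719
```

At α = 0.05 the same statistic (z = 1.8) is significant under the one-tailed test but not under the two-tailed test, because the two-tailed p-value is twice as large.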
# What is a p-value in statistics?
In statistics, a p-value is the probability of obtaining results at least as extreme as the observed results of a statistical
test, assuming that the null hypothesis is true. It is calculated from the test statistic and the distribution that statistic
follows under the null hypothesis. Comparing the p-value with the significance level is equivalent to comparing the test
statistic with a critical value, the threshold beyond which the results are declared statistically significant.
A p-value of 0.05 or less is conventionally considered statistically significant. This means that, if the null hypothesis were
true, there would be at most a 5% chance of obtaining results at least as extreme as those observed. A p-value of 0.01 or
less is considered highly statistically significant.
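To make "at least as extreme, assuming the null hypothesis is true" concrete, here is a sketch of an exact permutation test for a difference in means. This is one particular way to obtain a p-value, swapped in for illustration rather than taken from these notes. Under H0 the group labels are interchangeable, so the p-value is the fraction of all possible relabellings whose mean difference is at least as large as the observed one. It enumerates every split, so it is only practical for small samples; `permutation_p_value` is a hypothetical name.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(fmean(group_a) - fmean(group_b))
    extreme = total = 0
    # Every way of choosing which pooled observations form "group A" under H0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count splits at least as extreme as the observed one (small tolerance for float error)
        if abs(fmean(a) - fmean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

print(permutation_p_value([1, 2, 3], [10, 11, 12]))  # 0.1
```

For these fully separated groups only 2 of the 20 possible splits are as extreme as the observed one, giving p = 0.1.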
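# Rule of thumb: when your sample is smaller than 30, use a t-test; when it is larger than 30, a z-test is often used.
Strictly, the z-test assumes the population standard deviation is known. When it must be estimated from the sample, the
t-test is appropriate, but for large samples the t-distribution is so close to the normal distribution that both tests give
nearly the same result.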
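The two one-sample statistics share the same formula; what differs is whether the population standard deviation σ is known and which reference distribution is used (normal for z, t with n - 1 degrees of freedom for t). This sketch, with the hypothetical helper `one_sample_statistic` and made-up data, shows the switch.

```python
from math import sqrt
from statistics import fmean, stdev

def one_sample_statistic(sample, mu0, sigma=None):
    """Return ("z", value) when the population SD sigma is known, else ("t", value)."""
    n = len(sample)
    if sigma is not None:
        # sigma known: z statistic, compared with the standard normal distribution
        return "z", (fmean(sample) - mu0) / (sigma / sqrt(n))
    # sigma unknown: estimate it with the sample SD (n - 1 denominator); compare with t, df = n - 1
    return "t", (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))

print(one_sample_statistic([1, 2, 3, 4, 5], mu0=2))           # t statistic, df = 4
print(one_sample_statistic([1, 2, 3, 4, 5], mu0=2, sigma=1))  # z statistic
```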
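# Difference between Type I and Type II errors
A Type I error is rejecting the null hypothesis when it is actually true. Its probability is the significance level α.
A Type II error is failing to reject the null hypothesis when it is actually false. Its probability is denoted β, and 1 - β is
the power of the test.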
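A small simulation makes both error types tangible. This sketch assumes a normal population with known standard deviation, a two-sided z-test at α = 0.05, and hypothetical settings (n = 10, a true shift of 0.5); the helper names are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test (sigma assumed known): True if H0: mean == mu0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=10, trials=10_000, seed=0):
    """Fraction of simulated normal samples in which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials

# H0 is true here, so every rejection is a Type I error
type_1_rate = rejection_rate(true_mean=0.0)
# H0 is false here (true mean 0.5), so every non-rejection is a Type II error
type_2_rate = 1 - rejection_rate(true_mean=0.5)
print(type_1_rate, type_2_rate)
```

The Type I rate should land near α = 0.05 by construction, while the Type II rate depends on the sample size and the size of the true effect (roughly 0.65 for these settings).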
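# Parametric vs non-parametric tests
Parametric tests and non-parametric tests are two types of statistical tests used to compare two or more groups.
Parametric tests (such as the t-test and ANOVA) make assumptions about the data, such as the data being normally
distributed and the groups having equal variances.
Non-parametric tests (such as the Mann-Whitney U test and the Kruskal-Wallis test) make fewer assumptions: they do not
require a particular distribution, although they still assume, for example, independent observations.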
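The Mann-Whitney U statistic illustrates why a non-parametric test needs no distributional assumption: it uses only the ordering of the values. This sketch computes U by direct pair counting (fine for small samples); the function name is hypothetical, and turning U into a p-value (via tables or a normal approximation) is not shown.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

print(mann_whitney_u([1, 2, 3], [10, 11, 12]))  # 0.0
```

Because every pair contributes exactly 1 in total, U for the first group plus U for the second always equals the number of pairs, n_a * n_b.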
# When data is categorical, we use a chi-square test (for example, to test whether two categorical variables are independent).
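For a chi-square test of independence, each cell's observed count is compared with the count expected if the two variables were independent (row total times column total, divided by the grand total). This is a minimal sketch with a hypothetical function name and made-up counts; the p-value step is left to a chi-square table or a statistics library.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

print(chi_square_independence([[20, 0], [0, 20]]))  # (40.0, 1)
```

With 1 degree of freedom the critical value at α = 0.05 is about 3.841, so a statistic of 40.0 would lead to rejecting independence.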
# What is categorical data in statistics?
Categorical data is data whose values place each observation into one of a limited number of groups or classes. It is a
type of qualitative data: the values are labels rather than measured quantities, even when they are coded as numbers.
Examples of categorical data include:
Gender: Male, female, transgender
Marital status: Single, married, divorced, widowed
Race: White, black, Hispanic, Asian, Native American
Occupation: Doctor, lawyer, teacher, engineer, student
Eye color: Brown, blue, green, hazel
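# What is an ANOVA test in statistics?
Analysis of variance (ANOVA) is a statistical test used to compare the means of two or more groups. It is usually applied to
three or more groups, since for two groups it is equivalent to a two-sample t-test.
# Difference between 1-way ANOVA and 2-way ANOVA
A one-way ANOVA compares group means across the levels of a single factor. A two-way ANOVA examines two factors at
once and can also test whether the two factors interact.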
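A one-way ANOVA compares the variation between group means with the variation within groups; a large F statistic suggests that not all group means are equal. This sketch computes F and its degrees of freedom with a hypothetical function name and made-up data; the p-value would come from an F-distribution, which the Python standard library does not provide.

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """F statistic and (between, within) degrees of freedom for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, (2, 6))
```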
Chapter Solutions
# What is the purpose of statistical hypothesis testing?
The purpose of statistical hypothesis testing is to decide whether sample data provide enough evidence to draw
conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the
alternative hypothesis (H1), and then collecting data to assess the evidence.
The null hypothesis is the default assumption, which is typically that there is no difference or relationship between the
variables being studied. The alternative hypothesis is the hypothesis that the researcher is trying to support, and it
typically states that there is a difference or relationship between the variables.
If the p-value is less than or equal to a chosen threshold (typically 0.05), the null hypothesis is rejected in favor of the
alternative hypothesis. This means the data would be unlikely if there were no difference or relationship, so the result is
called statistically significant. It does not prove that the effect is real, because a Type I error is still possible.
# What is a significance level? How does a researcher choose a significance level?
The significance level, also known as alpha (α), is the probability of rejecting the null hypothesis when it is true. It is a
measure of how strong the evidence must be before the researcher will conclude that there is a statistically significant
effect.
The researcher chooses the significance level before conducting the experiment. The most common significance level is
0.05, which means that there is a 5% chance of rejecting the null hypothesis when it is true. However, other significance
levels, such as 0.01 or 0.10, can also be used depending on the specific research question and the field of study.
Here are some factors that researchers consider when choosing a significance level:
* The type of experiment being conducted. For example, experiments with high stakes, such as clinical trials, may use a
lower significance level to reduce the risk of making a Type I error (falsely rejecting the null hypothesis).
* The cost of making a Type II error (failing to reject the null hypothesis when it is false). For example, if the researcher
is studying a new drug that could potentially save lives, they may be willing to accept a higher significance level in order
to avoid missing a real effect.
* The prevailing practices in the field of study. For example, some fields, such as psychology and medicine, traditionally
use a significance level of 0.05, while other fields, such as economics and finance, may use a significance level of 0.10.
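The factors above describe a trade-off: lowering α makes Type I errors rarer but Type II errors more likely. This sketch computes the Type II error probability of a two-sided one-sample z-test with known σ; the effect size, sample size and σ are hypothetical inputs, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def type_2_error_rate(alpha, effect, n, sigma=1.0):
    """Probability of a Type II error for a two-sided one-sample z-test.

    effect: true mean minus the hypothesized mean (assumed known for illustration).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = effect * sqrt(n) / sigma            # centre of the z statistic when H0 is false
    # Type II error: the statistic lands inside the non-rejection region (-z_crit, z_crit)
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_2_error_rate(alpha, effect=0.5, n=10), 3))
```

Running the loop shows β rising as α falls, which is why a stricter significance level usually has to be paired with a larger sample.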
# What is the difference between a significance level and a p-value?
*Significance level (alpha)* is the pre-chosen probability of rejecting the null hypothesis when the null hypothesis is
actually true. It is typically set to 0.05, which means that, when the null hypothesis is true, there is a 5% chance of
wrongly rejecting it (a Type I error).
*P-value* is the probability of obtaining the observed results or more extreme results, assuming that the null hypothesis is
true. It is a measure of the strength of evidence against the null hypothesis.
The difference between significance level and p-value is:
* *Significance level* is a pre-chosen probability, while *p-value* is a calculated probability.
* *Significance level* is a fixed value, while *p-value* can vary depending on the data.
We can use the p-value to decide whether to reject the null hypothesis. If the p-value is less than or equal to the
significance level, we reject the null hypothesis. If the p-value is greater than the significance level, we fail to reject the
null hypothesis.
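The decision rule above gives the same answer as comparing the test statistic with a critical value. This sketch checks that for a two-sided z-test; the standard normal distribution and the helper names are assumptions made for illustration.

```python
from statistics import NormalDist

def reject_by_p_value(z, alpha=0.05):
    """Reject H0 when the two-sided p-value is at most alpha."""
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

def reject_by_critical_value(z, alpha=0.05):
    """Reject H0 when |z| reaches the two-sided critical value."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return abs(z) >= critical

for z in (0.5, 1.5, 2.1, 3.0):
    print(z, reject_by_p_value(z), reject_by_critical_value(z))
```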
# What factors determine the choice of an appropriate statistical technique?
There are several factors that determine the choice of the appropriate statistical technique, including:
* *The research question:* What are you trying to learn from your data? The specific statistical technique you choose
will depend on the specific research question you are asking. For example, if you are trying to compare the means of two
groups, you might use a t-test. If you are trying to predict a continuous outcome variable based on one or more predictor
variables, you might use a regression model.
* *The type of data:* What type of data do you have? Some statistical techniques are only appropriate for certain types of
data. For example, you cannot use a t-test on categorical data.
* *The level of measurement:* What is the level of measurement of your variables? The level of measurement refers to
how the data is collected and how it can be meaningfully manipulated. There are four levels of measurement: nominal,
ordinal, interval, and ratio. Some statistical techniques are only appropriate for certain levels of measurement. For
example, you cannot calculate a Pearson correlation coefficient between two nominal variables.
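* *The sample size:* How large is your sample? Some statistical techniques rely on large-sample approximations and are
only accurate with enough data; for example, the chi-square test is commonly used only when every expected cell count is at least 5.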
* *The assumptions of the test:* Each statistical test has a set of assumptions that must be met in order for the results to
be valid. It is important to check the assumptions of the test before using it on your data.
# Give an example in which a Type I error may be more serious than a Type II error.
A Type I error is when you reject the null hypothesis when it is true. A Type II error is when you fail to reject the null
hypothesis when it is false.
*Example:*
A new drug is being developed to treat a rare and deadly disease. The null hypothesis is that the drug has no effect. The
drug is tested on a sample of patients, and the results suggest that it is effective. However, there is a small probability
that this apparent effect arose by chance and that the drug is not actually effective.
If the researchers make a Type I error and approve the drug even though it is not effective, patients who take the drug
will not receive the treatment they need. This could have serious consequences, as the disease is deadly.
On the other hand, if the researchers make a Type II error and reject the drug even though it is effective, patients who
could have benefited from the drug will not receive it. This is also a serious consequence, but it is generally considered
less serious than approving a drug that is not effective.
In this example, a Type I error is more serious than a Type II error because it could lead to patients receiving a drug that
is not effective, which could have serious health consequences.