Averages, Dispersion, and Correlation Analysis

UNIT 1

Averages:

Averages are measures used to summarize a set of data by identifying a central value. The
requisites for a good average are that it should be simple to understand, easy to compute,
and representative of the entire data set.

Types of Averages:

1.​ Mean: The mean is the arithmetic average, calculated by adding all values in the
dataset and dividing by the number of values.
○​ Example: For data {2, 4, 6, 8}, the mean is (2+4+6+8)/4 = 5.
2.​ Median: The median is the middle value when the data is arranged in ascending or
descending order. If there is an even number of values, the median is the average of the
two middle numbers.
○​ Example: For data {2, 4, 6, 8}, the median is (4 + 6) / 2 = 5.
3.​ Mode: The mode is the value that appears most frequently in a dataset. A set can
have no mode, one mode, or multiple modes.
○​ Example: For data {2, 4, 4, 6, 8}, the mode is 4 because it appears twice.
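The three averages above can be checked with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative, and the data sets are the ones from the examples.

```python
import statistics

data = [2, 4, 6, 8]

# Mean: sum of the values divided by how many there are.
mean_value = statistics.mean(data)

# Median: with an even count, the average of the two middle values (4 and 6).
median_value = statistics.median(data)

# Mode: multimode returns every value tied for the highest frequency,
# so it also copes with data sets that have several modes.
modes = statistics.multimode([2, 4, 4, 6, 8])

print(mean_value, median_value, modes)
```

`multimode` is used rather than `mode` because a data set can have more than one mode, as noted above.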

Dispersion:

Dispersion refers to the extent to which values in a dataset deviate from the central value
(mean or median). Measuring variation helps understand how spread out the data is.

Significance of Measuring Variation:

It provides insights into the consistency or variability within a dataset. In some cases,
understanding the variation is more important than knowing the central value itself,
especially in fields like finance or manufacturing.

Methods of Studying Variation:

1.​ Mean Deviation: The mean deviation is the average of the absolute differences
between each data point and the mean.
○​ Example: For data {1, 3, 5}, the mean is (1+3+5)/3 = 3. The mean deviation is
[(|1-3| + |3-3| + |5-3|)/3] = (2 + 0 + 2)/3 = 1.33.
2.​ Standard Deviation: The standard deviation measures the average distance
between each data point and the mean, considering squared differences to give more
weight to larger deviations.
○ Example: For data {1, 2, 3}, the mean is 2. The squared deviations are (1-2)² = 1,
(2-2)² = 0, and (3-2)² = 1. The variance is the average of these squared differences,
(1+0+1)/3 = 0.67, and the standard deviation is the square root of 0.67 ≈ 0.82.

Both mean deviation and standard deviation provide insights into the spread of the data, with standard
deviation being more commonly used due to its sensitivity to extreme values.
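As a rough sketch, both measures can be computed directly from their definitions. The function names are hypothetical, and the standard deviation here divides by n (the population form), matching the worked example rather than the n − 1 sample form.

```python
import math

def mean_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def population_std(values):
    """Square root of the average squared distance from the mean."""
    m = sum(values) / len(values)
    variance = sum((x - m) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

md = mean_deviation([1, 3, 5])   # (2 + 0 + 2) / 3
sd = population_std([1, 2, 3])   # sqrt((1 + 0 + 1) / 3)
print(round(md, 2), round(sd, 2))
```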
UNIT 2

Correlation Analysis:

Significance of Studying Correlation: Correlation analysis helps in understanding
the relationship between two or more variables. It shows whether and how strongly
pairs of variables are related, which can inform decisions in fields like economics,
finance, and social sciences.

Types of Correlation:

1.​ Positive Correlation: When two variables move in the same direction. If one
increases, the other also increases.
○​ Example: Height and weight often show a positive correlation—taller
individuals tend to weigh more.
2.​ Negative Correlation: When two variables move in opposite directions. If one
increases, the other decreases.
○​ Example: The number of hours spent watching TV and academic performance
might have a negative correlation—more hours of TV could relate to lower academic
scores.
3.​ No Correlation: When there is no predictable relationship between two
variables.
○​ Example: Shoe size and intelligence likely have no correlation.

Methods of Studying Correlation:

1. Karl Pearson’s Coefficient of Correlation (Pearson’s r): This measures the
strength and direction of a linear relationship between two variables. It ranges from
-1 to 1.

○ Formula: r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² × Σ(y − ȳ)²]

○ Where x̄ and ȳ are the means of the x and y values.
○ A result of 1 indicates perfect positive correlation, -1 indicates perfect negative
correlation, and 0 indicates no linear correlation.
2. Rank Correlation Coefficient (Spearman's Rank Correlation): This
method is used when the data is in ranked form, and is particularly useful when the
relationship between variables isn’t linear. It measures the degree of similarity
between two rankings.

○ Formula (when there are no tied ranks): ρ = 1 − (6 Σd²) / (n(n² − 1))

○ Where d is the difference in ranks for each pair of values and n is the number
of pairs.
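The two coefficients can be sketched in plain Python as follows. The study-hours and exam-score figures are made-up illustration data, the function names are hypothetical, and the Spearman version uses the shortcut formula 1 − 6Σd²/(n(n² − 1)), which assumes there are no tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the two means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; assumes no ties."""
    order = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [order[v] for v in values]

def spearman_rho(xs, ys):
    """Spearman's rank correlation via the sum of squared rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

hours = [1, 2, 3, 4, 5]          # hypothetical study hours
scores = [52, 55, 61, 70, 68]    # hypothetical exam scores

r = pearson_r(hours, scores)
rho = spearman_rho(hours, scores)
print(round(r, 3), round(rho, 3))
```

Both values come out close to 1 for this data, which is consistent with a strong positive relationship.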

Regression Analysis:

Use of Regression Analysis: Regression analysis is used to model and analyze the
relationship between a dependent variable and one or more independent variables.
It helps predict the value of the dependent variable based on known values of
independent variables.

Difference between Correlation and Regression:

The difference between Correlation and Regression lies primarily in their purpose,
interpretation, and the type of relationship they describe:

1. Purpose:

● Correlation: Measures the strength and direction of a linear relationship
between two variables. It simply shows how two variables are related, but doesn’t
help with prediction.
○ Example: Correlation can tell you if there is a relationship between hours of
study and exam scores, but it won’t tell you how exam scores will change if study
hours change.
● Regression: Focuses on predicting or explaining the relationship between a
dependent variable and one or more independent variables. It tells you how much
one variable changes when another variable changes.
○ Example: Regression allows you to predict a person’s exam score based on the
number of hours they study.

2. Relationship Type:

● Correlation: It only indicates if there is a relationship (positive, negative, or
none) between two variables, without indicating any cause or effect.
○ It doesn't imply that one variable causes the other to change.
● Regression: It treats one variable (independent) as explaining the change in the
other variable (dependent). This direction of influence is an assumption of the
model; a fitted regression does not by itself prove cause and effect.
○ For example, in predicting salary from years of experience, years of experience
is the independent variable assumed to influence salary (dependent).

3. Mathematical Expression:

● Correlation: Represented by a single number, usually Pearson’s correlation
coefficient (r), which ranges from -1 to 1.
○ r = 0: No linear relationship.
○ r = 1 or -1: Perfect positive or negative relationship.
● Regression: Represented by an equation, such as Y = a + bX, where Y is the
dependent variable, X is the independent variable, a is the intercept, and b is the
slope (the rate of change).

4. Symmetry:

● Correlation: The correlation between X and Y is the same as the correlation
between Y and X. The relationship is symmetric.
○ Example: The correlation between height and weight is the same as that
between weight and height.
● Regression: Regression is asymmetric. The regression of Y on X is not the
same as the regression of X on Y. It depends on which variable is considered
dependent and which is independent.
○ Example: The regression of weight on height is different from the regression of
height on weight.

5. Interpretation:

● Correlation: Tells you how strongly two variables are related, but not how
one variable affects the other.
○ It doesn't give a way to predict future values or quantify the relationship in
terms of units.
● Regression: Provides a way to predict the value of one variable based on
another and describes the relationship in terms of a formula with specific coefficients
(e.g., slope and intercept).

● Correlation measures the strength and direction of a relationship between
variables, but it does not imply causation. It simply tells you if and how two
variables are related.
● Regression quantifies the relationship and provides a formula to predict the
dependent variable based on the independent variable(s). It treats one variable as
depending on the others, although the fitted equation alone does not establish
cause and effect.

Two Regression Equations:

1. Regression of Y on X (Y = a + bX): Predicts the dependent variable Y from
the independent variable X. The equation consists of an intercept a and a slope b.
○ Example: Predicting someone’s salary (Y) based on years of experience (X).
2. Regression of X on Y (X = a' + b'Y): Predicts X from Y. The roles of the two
variables are reversed, so X is now treated as the dependent variable and Y as the
independent variable.

In simple terms, while correlation tells you whether two variables are related,
regression helps you understand how one variable influences the other and can be
used to make predictions.
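The two regression equations can be illustrated with a small least-squares sketch. The experience and salary figures are invented for illustration, and `fit_line` is a hypothetical helper, not a library function. It uses the standard least-squares results: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope × x̄.

```python
def fit_line(xs, ys):
    """Least-squares line predicting ys from xs; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

experience = [1, 2, 3, 4, 5]       # hypothetical years of experience (X)
salary = [30, 35, 37, 45, 48]      # hypothetical salary in thousands (Y)

# Regression of Y on X: Y = a + bX
a, b = fit_line(experience, salary)

# Regression of X on Y: X = a' + b'Y (roles of the variables swapped)
a_prime, b_prime = fit_line(salary, experience)

predicted_salary = a + b * 6       # prediction for 6 years of experience
print(a, b, a_prime, b_prime, predicted_salary)
```

The two lines differ, which reflects the asymmetry of regression described above. The product b × b' equals r², so the two slopes coincide in meaning only when the correlation is perfect.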
UNIT 3

Time Series Analysis:

Utility of Time Series Analysis:

● Time series analysis is essential for understanding patterns and trends in
data over time. This analysis helps businesses and economists make informed
decisions based on past behavior, predict future values, and plan strategies.
●​ Use cases:
○​ Sales forecasting: A retail business may use time series analysis to predict
sales for the next month based on past sales data.
○​ Stock market predictions: Analysts use time series data to predict stock
prices, interest rates, or economic performance.
○​ Weather forecasting: Meteorologists use historical weather data to predict
future conditions.

Components of Time Series: Time series data typically consists of four
components:

1.​ Trend:
○​ A trend is the long-term direction in the data, either upward (increasing) or
downward (decreasing), or flat (no change). Trends may be caused by factors like
technological advancements, population growth, or shifts in market behavior.
○​ Example: A company’s sales might have an upward trend over the years as it
grows, or there might be a downward trend if the product becomes outdated.
2.​ Seasonal Variations:
○​ These are regular patterns that repeat at fixed periods, such as yearly,
monthly, or weekly cycles. Seasonal changes can be due to environmental factors,
holidays, or regular shifts in demand.
○​ Example: Retail sales typically spike during the holiday season every year, or
tourist destinations see more visits during the summer.
3.​ Cyclic Movements:
○​ These fluctuations are long-term, irregular changes that are often linked to
economic cycles or business cycles. Unlike seasonal changes, cycles do not follow a
fixed period, and their duration can vary.
○​ Example: A recession or economic boom affects the stock market or business
operations in a cyclical manner, with periods of growth followed by downturns.
4.​ Irregular Fluctuations (Random Noise):
○​ These are one-off events or random fluctuations that cannot be predicted
or explained easily, such as natural disasters, strikes, or sudden market shocks.
○​ Example: A company’s sales could drop drastically due to a factory fire, but
this would be a random and irregular fluctuation, not part of any regular pattern.
Methods of Measurement: To analyze and smooth time series data, various
methods are used to identify the trend and eliminate irregularities.

1.​ Moving Averages:


○​ Moving averages are used to smooth out fluctuations in data and make the
underlying trend clearer. There are two common types:
■​ Simple Moving Average (SMA): This is the average of a set number of past
data points.
■​ Example: A 3-month simple moving average for sales data would average the
sales of the past three months to predict the next month.
■​ Weighted Moving Average: This assigns different weights to different
observations, giving more importance to recent data.
■​ Example: In a weighted 3-month moving average, the most recent month's
sales could be weighted more heavily than earlier months.
2.​ Method of Least Squares:
○​ The method of least squares is used to fit a trend line to time series data.
It minimizes the sum of squared differences between the actual data points and the
values predicted by the trend line.
○​ Example: A company might use the method of least squares to create a linear
regression model that fits a straight line to historical sales data, allowing them to
predict future sales.
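The measurement methods above can be sketched as follows. The monthly sales figures and the weights (1, 2, 3) are assumptions chosen for illustration, and the function names are hypothetical. The trend line is fitted by least squares against a time index 0, 1, 2, ...

```python
def simple_moving_average(series, window):
    """Unweighted average of each run of `window` consecutive points."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def weighted_moving_average(series, weights):
    """Weighted average per window; the last weight applies to the newest point."""
    w = len(weights)
    total = sum(weights)
    return [sum(v * wt for v, wt in zip(series[i:i + w], weights)) / total
            for i in range(len(series) - w + 1)]

def least_squares_trend(series):
    """Fit y = intercept + slope * t with t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return y_mean - slope * t_mean, slope

sales = [10, 12, 11, 15, 14, 18]   # hypothetical monthly sales

sma = simple_moving_average(sales, 3)
wma = weighted_moving_average(sales, [1, 2, 3])
intercept, slope = least_squares_trend(sales)
next_month = intercept + slope * len(sales)   # extrapolate one step ahead
print(sma, wma, round(next_month, 2))
```

The weighted version reacts faster to recent changes because the newest month carries half of the total weight in this setup.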

Business Forecasting:

Steps in Forecasting: Forecasting involves predicting future outcomes based on
past data and trends. Here's a breakdown of the typical steps in the forecasting
process:

1.​ Problem Definition:


○​ Clearly define what needs to be forecasted—whether it's sales, demand,
production levels, or financial performance.
2.​ Data Collection:
○​ Gather the historical data that will be used for forecasting. This could include
sales history, customer behavior, market trends, and other relevant data points.
○​ Example: A clothing retailer might collect monthly sales data for the last 5
years to forecast next season's demand.
3.​ Model Selection:
○​ Choose the appropriate forecasting method based on the available data and
the problem. This could involve quantitative models (like time series analysis) or
qualitative methods (like expert judgment).
○​ Example: If historical sales data is available, a time series forecasting model
may be used. If data is scarce, judgmental forecasting methods may be used.
4.​ Model Application:
○​ Apply the selected model to historical data and generate forecasts.
○​ Example: Apply moving averages or regression models to predict future sales
based on past performance.
5.​ Forecasting and Interpretation:
○​ The forecast results need to be interpreted to inform business decisions. This
might involve adjusting for external factors or uncertainties.
○​ Example: After forecasting sales, the business might need to adjust their
production plan or marketing strategy based on predicted demand.
6.​ Monitoring and Revision:
○​ Forecasts should be monitored regularly, and adjustments should be made
as needed. If predictions consistently miss the mark, the model may need to be
refined.
○​ Example: If actual sales frequently deviate from the forecast, the forecasting
model should be reviewed and updated with more accurate data or methods.

Methods of Forecasting:

1.​ Qualitative Methods:


○​ These methods rely on subjective judgment, experience, or opinions. They are
typically used when there is limited historical data or when predicting future events
that don't follow a clear pattern.
○​ Example: The Delphi method involves a panel of experts who independently
provide their forecasts, and the results are aggregated in several rounds.
2.​ Quantitative Methods:
○​ These rely on historical data and statistical models to make predictions.
Quantitative forecasting methods are particularly useful when data patterns are
stable and predictable.
○​ Example: Time series analysis, regression analysis, and moving averages are
common quantitative methods.
3.​ Causal Methods:
○​ These methods assume that the variable to be forecasted is influenced by one
or more independent variables. By analyzing the relationship between the dependent
and independent variables, you can predict future values.
○​ Example: A business might forecast future sales based on the amount spent on
advertising and economic factors, using regression models to predict outcomes.

Theories of Business Forecasting:

1.​ Extrapolation Theory:


○​ This theory assumes that past patterns will continue into the future. It's
commonly used in time series forecasting, where historical data is extended into the
future to predict future behavior.
○​ Example: If a company has seen steady sales growth of 10% annually,
extrapolation suggests that this trend will continue.
2.​ Causal Theory:
○​ This theory suggests that the variable being forecast is influenced by
external factors. These factors can be identified and used in models to predict
future outcomes.
○​ Example: The sales of a car dealership may be influenced by interest rates,
consumer confidence, and economic growth. These factors are analyzed to predict
future sales.
3.​ Judgmental Forecasting:
○​ This method is used when objective data is not available or is difficult to
analyze. It relies on expert opinion or subjective judgment.
○​ Example: If a new product is launching and there’s no historical data, an
expert or manager may use their intuition to predict how it will perform in the
market.
UNIT 4

Probability:

1. Three Approaches to Probability:

1.​ Classical Approach:​

○​ Based on equally likely outcomes, where the probability of an event is the ratio
of favorable outcomes to the total number of possible outcomes.
○​ Formula:

P(E)=(Number of favorable outcomes)/(Total number of outcomes)

○​ Example: In a fair die, the probability of rolling a 3 is

P(3)=1/6

2.​ Empirical Approach:​

○ Based on experimental or observed data. It calculates probability by dividing
the number of occurrences of an event by the total number of trials.
○​ Formula:

P(E)=(Number of times event E occurs)/(Total number of trials)

○​ Example: If you flip a coin 100 times and get 55 heads, the empirical
probability of getting heads is

P(Heads)=55/100=0.55

3.​ Axiomatic Approach:​

○​ Based on a set of axioms (rules) for defining probability. This approach uses
formal rules to derive probabilities, focusing on consistency and logical structure.
○​ Axioms:
1. P(S) = 1, where S is the sample space.
2. P(E) ≥ 0 for any event E.
3. If A1, A2, A3, … are mutually exclusive events, then
P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …

2. Addition and Multiplication Theorems of Probability:

1.​ Addition Theorem:​


○​ This theorem is used to find the probability of the union of two events.
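○ For two events A and B:
■ If A and B are mutually exclusive (cannot happen at the same time):

P(A∪B)=P(A)+P(B)

■ If A and B are not mutually exclusive (can occur together):

P(A∪B)=P(A)+P(B)−P(A∩B)

○ Example: Drawing a heart and drawing a spade are mutually exclusive, so the
probability of drawing a heart or a spade is 13/52 + 13/52 = 1/2. Drawing a heart and
drawing a red card are not mutually exclusive, because every heart is red, so
P(heart ∪ red) = 13/52 + 26/52 − 13/52 = 1/2.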
2.​ Multiplication Theorem:​

○​ This theorem is used to find the probability of the intersection of two events.
○ For two events A and B:
■ If A and B are independent (the outcome of one does not affect the other):

P(A∩B)=P(A)×P(B)

■ If A and B are not independent:

P(A∩B)=P(A)×P(B∣A)

○ Example: The probability of getting heads in two consecutive coin flips is

P(Heads on 1st ∩ Heads on 2nd) = 1/2 × 1/2 = 1/4.
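Both theorems can be verified by brute-force enumeration of equally likely outcomes. The sketch below uses a standard 52-card deck and two fair coin flips. The particular events (hearts, spades, red cards) and the helper `prob` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))   # 52 equally likely (rank, suit) pairs

def prob(event, space):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

def is_heart(card):
    return card[1] == "hearts"

def is_spade(card):
    return card[1] == "spades"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

# Addition theorem, mutually exclusive events: P(A or B) = P(A) + P(B)
p_heart_or_spade = prob(lambda c: is_heart(c) or is_spade(c), DECK)

# Addition theorem, overlapping events: P(A or B) = P(A) + P(B) - P(A and B)
p_heart_or_red = prob(lambda c: is_heart(c) or is_red(c), DECK)
p_heart_and_red = prob(lambda c: is_heart(c) and is_red(c), DECK)

# Multiplication theorem, independent events: two fair coin flips
FLIPS = list(product("HT", repeat=2))
p_two_heads = prob(lambda f: f == ("H", "H"), FLIPS)

print(p_heart_or_spade, p_heart_or_red, p_two_heads)
```

`Fraction` keeps the results exact, so the identities can be compared without rounding error.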

3. Bayes' Theorem:

● Bayes' Theorem is used to update the probability of an event based on new
evidence.
● Formula: P(A∣B) = [P(B∣A) × P(A)] / P(B)

● Where:
○ P(A∣B) is the probability of A given B.
○ P(B∣A) is the probability of B given A.
○ P(A) is the prior probability of A.
○ P(B) is the total probability of B.
● Example: If a test for a disease is 95% accurate, Bayes’ theorem helps you
calculate the probability that a person actually has the disease, given a positive test
result.
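A small numeric sketch of the disease-test example follows. The notes give only "95% accurate", so the code makes explicit assumptions: the test detects 95% of true cases (sensitivity), correctly clears 95% of healthy people (specificity), and 1% of the population has the disease (prior). All three figures and the function name are hypothetical.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    # Total probability of a positive result, P(B)
    p_positive = (p_pos_given_disease * prior
                  + p_pos_given_healthy * (1 - prior))
    return p_pos_given_disease * prior / p_positive

# Assumed inputs: 1% prevalence, 95% sensitivity, 95% specificity
p = posterior_given_positive(prior=0.01, sensitivity=0.95, specificity=0.95)
print(round(p, 3))
```

Under these assumptions the posterior is far below 95%, because false positives among the many healthy people outnumber true positives among the few who are ill. The result changes substantially with the assumed prior.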
4. Probability Distributions:

1.​ Binomial Distribution:​

○​ Used for experiments with two outcomes (success/failure) and a fixed number
of trials.
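○ Formula for the probability of exactly k successes in n trials:

P(X = k) = C(n, k) × p^k × (1 − p)^(n − k)

○ Where p is the probability of success, n is the number of trials, k is the
number of successes, and C(n, k) = n! / (k!(n − k)!) is the number of ways to choose
which k of the n trials are successes.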
○​ Example: The probability of flipping exactly 3 heads in 5 coin flips.
2.​ Poisson Distribution:
○​ Describes the probability of a number of events occurring in a fixed interval of
time or space, where events happen at a constant average rate and independently
of each other.

○ Formula: P(X = k) = (λ^k × e^(−λ)) / k!
○ Where λ is the average rate of events, and k is the number of events.
○​ Example: The probability of receiving exactly 5 calls in an hour if, on average,
3 calls are received per hour.
3.​ Normal Distribution:
○​ A continuous probability distribution that is symmetric around the mean, often
referred to as a bell curve.
○ Parameters: Mean (μ) and Standard deviation (σ).
○ The probability density function is:

f(x) = [1 / (σ√(2π))] × e^(−(x − μ)² / (2σ²))

○ Example: The distribution of heights of adult men in a population can often
be modeled by a normal distribution.
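The binomial and Poisson examples above can be evaluated with a short sketch that uses only the standard `math` module. The function names are hypothetical. The formulas are the standard ones for each distribution: C(n, k) p^k (1 − p)^(n − k), λ^k e^(−λ) / k!, and the normal density with mean μ and standard deviation σ.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

p_three_heads = binomial_pmf(3, 5, 0.5)   # exactly 3 heads in 5 fair flips
p_five_calls = poisson_pmf(5, 3)          # exactly 5 calls when the mean is 3
peak_density = normal_pdf(0, 0, 1)        # standard normal at its mean

print(p_three_heads, round(p_five_calls, 4), round(peak_density, 4))
```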
Intervention in Markets:

1. Price Controls:

●​ Price ceilings: The maximum price that can be charged for a good or service,
often used to prevent prices from being too high (e.g., rent control).
●​ Price floors: The minimum price that can be charged, used to ensure prices
are not too low (e.g., minimum wage laws).

2. Support Price:

●​ A support price is the minimum price set by the government for certain goods,
often agricultural products, to ensure that producers receive a fair price that covers
their costs and protects them from market fluctuations.
●​ Example: Governments may set support prices for crops like wheat or rice to
protect farmers' income.

3. Prevention and Control of Monopolies:

● Governments implement antitrust laws and regulations to prevent
monopolies from forming, as monopolies can lead to higher prices, reduced
consumer choice, and inefficiency in the market.
●​ Methods include breaking up large firms, preventing mergers and acquisitions
that could reduce competition, and regulating monopolistic industries.

4. System of Dual Price:

●​ A dual price system refers to a system where a country or market has two
different prices for a good, one for the domestic market and one for international
trade.
●​ Often used in situations like agriculture, where the government might set a
lower domestic price for consumers while selling at a higher price internationally to
boost exports and protect domestic producers.
●​ Example: A government may provide subsidized food prices to its citizens while
selling surplus production at higher prices to foreign buyers.
UNIT 5

Statistical Inference:

Statistical Inference involves drawing conclusions about a population based on a
sample of data. It typically involves hypothesis testing, where you evaluate
whether there is enough evidence to support or reject a claim about the population.

Procedure of Testing Hypothesis:

The process of hypothesis testing follows these steps:

1.​ State the Hypotheses:​

○​ Null Hypothesis (H₀): This represents the default assumption or no effect.


○​ Alternative Hypothesis (H₁ or Ha): This represents the claim you want to
test, suggesting an effect or difference.
2.​ Choose the Significance Level (α):​

○​ The significance level (α) is the probability of rejecting the null hypothesis
when it is true. A common value is 0.05 (5%).
3.​ Select the Test Statistic:​

○​ Depending on the type of data and sample size, select an appropriate test
statistic (e.g., t-test, chi-square test).
4.​ Compute the Test Statistic:​

○​ Using sample data, calculate the test statistic (e.g., t-statistic, chi-square
statistic).
5.​ Decision Rule:​

○​ Compare the test statistic to a critical value from statistical tables (based on α)
to make a decision.
○​ If the test statistic falls in the rejection region, reject H₀. If it falls in the
non-rejection region, fail to reject H₀.
6.​ Conclusion:​

○​ State the conclusion in context, either rejecting or failing to reject the null
hypothesis based on the test results.
Two Types of Errors in Testing Hypothesis:

1.​ Type I Error (False Positive):​

○​ Occurs when the null hypothesis is rejected when it is actually true. This
means concluding that there is an effect or difference when there isn't.
○​ Example: A medical test incorrectly indicates a person has a disease when they
do not.
2.​ Type II Error (False Negative):​

○​ Occurs when the null hypothesis is not rejected when it is actually false. This
means failing to detect an effect or difference that truly exists.
○​ Example: A medical test fails to detect a disease when the person actually has
it.

Two-Tailed and One-Tailed Tests of Hypothesis:

1.​ Two-Tailed Test:​

○​ Used when the alternative hypothesis suggests that the parameter could be
either greater than or less than a certain value.
○​ The rejection region is in both tails of the distribution.
○ Example: Testing if the mean of a population is different from a specific value
(e.g., μ ≠ 50).
2.​ One-Tailed Test:​

○​ Used when the alternative hypothesis suggests that the parameter is either
greater than or less than a certain value, but not both.
○​ The rejection region is in only one tail of the distribution.
○ Example: Testing if the mean is greater than a specific value (e.g., μ > 50).
Types of Statistical Tests:

1.​ t-Test:
○​ A t-test is used to compare the means of two groups, especially when the
sample size is small (typically less than 30) and the population standard deviation is
unknown.​

○​ Types of t-tests:​

■ One-sample t-test: Compares the sample mean to a known value (e.g., a
population mean).
■ Independent two-sample t-test: Compares the means of two independent
groups.
■ Paired t-test: Compares the means of two related groups (e.g., before and
after measurements).
○ Formula for a one-sample t-test:

t = (x̄ − μ) / (s / √n)

Where x̄ is the sample mean, μ is the population mean, s is the
sample standard deviation, and n is the sample size.

2.​ F-Test:​

○ The F-test is used to compare the variances of two populations. It is often
used in the analysis of variance (ANOVA).
○​ Formula:

F = (Variance of sample 1)/(Variance of sample 2)

○​ If the F-value is significantly large, it suggests that the variances are different.
3.​ Chi-Square Test:​

○​ The chi-square test is used for categorical data to test the goodness of fit or
the independence of two variables.
○​ Goodness of Fit: Tests whether the observed data fits a specific distribution
(e.g., uniform distribution).
○​ Test of Independence: Tests if two categorical variables are independent or
related.
○ Formula for chi-square statistic:

χ² = Σ (Oi − Ei)² / Ei

Where Oi is the observed frequency and Ei is the expected frequency.

4.​ Analysis of Variance (ANOVA):​

○ ANOVA is used to compare the means of three or more groups to see if
there is a significant difference between them.
○ It tests the null hypothesis that all group means are equal.
○ One-way ANOVA: Compares group means across the levels of one independent
variable (factor).
○ Two-way ANOVA: Examines two independent variables and their interaction
effect on a dependent variable.
○ ANOVA uses the F-test to compare variances between groups (mean squares
between groups) and within groups (mean squares within groups).

○​ Formula for F-statistic in ANOVA:

F=(Mean Square Between Groups)/(Mean Square Within Groups)

Where Mean Square Between Groups is the variance between the group means
and Mean Square Within Groups is the variance within each group.​
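The test statistics described in this unit can be computed from their definitions, as in the sketch below. All sample data are invented for illustration, and the function names are hypothetical. The sketch only computes the statistics; comparing them with critical values from tables, as in the decision rule above, is left out.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with s the sample std dev."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def anova_f(groups):
    """One-way ANOVA F = mean square between / mean square within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical sample tested against a claimed population mean of 50
t_stat = one_sample_t([52, 48, 51, 53, 49, 50, 54, 47], 50)

# Hypothetical die rolls (60 throws) against a uniform expectation
chi2 = chi_square_statistic([8, 12, 10, 9, 11, 10], [10] * 6)

# Three hypothetical groups of three observations each
f_stat = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

print(round(t_stat, 3), chi2, f_stat)
```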

Q1 a. What do you mean by correlation? Mention any four uses of it.

b. Discuss the difference between parametric and non-parametric tests.

Answer:

Correlation represents the relationship between two variables and it's an important
metric when analyzing data sets. Learning more about how to calculate correlation
and how to interpret your results can help you make efficient financial and
marketing decisions.

● Correlation is the relationship between two variables and it can be measured to
inform financial and marketing decisions.
●​ Correlation can be positive when two variables move in the same direction,
negative when the two variables move in opposite directions or zero when there is
no relationship between two variables.
●​ Types of correlations include the Pearson correlation for linear relationships,
the Spearman correlation, which determines a monotonic relationship between
variables and the Kendall correlation that measures the strength of dependence
between two datasets.
●​ Types of Correlation
●​ Positive Linear Correlation. There is a positive linear correlation when the
variable on the x -axis increases as the variable on the y -axis increases. ...
●​ Negative Linear Correlation. ...
●​ Non-linear Correlation (known as curvilinear correlation) ...
●​ No Correlation.

Common uses of correlation:

1. Predictive Modeling: Correlation can be used to build predictive models to
estimate the relationship between two variables. For example, in economics, the
correlation between interest rates and consumer spending can be used to predict
how changes in interest rates will affect consumer spending.
2.​ Quality Control: Correlation is used in quality control to measure the
relationship between two variables that affect product quality. For example, in
manufacturing, the correlation between temperature and product quality can be
used to control the temperature to ensure that the product meets quality standards.
3.​ Survey Research: Correlation is used in survey research to measure the
relationship between two variables that are measured using survey questions. For
example, in social science research, the correlation between income and education
level can be used to study the relationship between these two variables.
4.​ Medical Research: Correlation is used in medical research to study the
relationship between two variables, such as a treatment and a disease outcome. For
example, the correlation between smoking and lung cancer can be used to study the
relationship between smoking and the risk of developing lung cancer.
5.​ Risk Management: Correlation is used in risk management to estimate the
relationship between two risks. For example, in financial risk management, the
correlation between two assets can be used to estimate the risk of a portfolio that
includes both assets.

b) Definition of Parametric and Nonparametric Test

Parametric Test Definition

In statistics, a parametric test is a kind of hypothesis test that makes
assumptions about the parameters of the population from which the sample is drawn,
typically its mean. The t-test, which is based on Student's t-statistic, is a
commonly used example.

The t-test rests on the underlying assumption that the variable is normally
distributed. The population mean is known, or is assumed to be known, and the
population variance is estimated from the sample. The variables of interest are
assumed to be measured on an interval scale.

Non-Parametric Test Definition

The non-parametric test does not assume that the population follows a distribution
described by specific parameters. It is also a kind of hypothesis test, but one that
does not rest on such underlying assumptions. A non-parametric test is typically
based on differences in the median, so this kind of test is also called a
distribution-free test. The test variables are measured on a nominal or ordinal
level. If the independent variables are non-metric, the non-parametric test is
usually performed.

Advantages and Disadvantages of Parametric and Nonparametric Tests

Many people believe that the choice between parametric and nonparametric tests
depends on whether the data is normally distributed. The distribution can be a
deciding factor when the data set is relatively small. In many cases, however, this
is not a critical issue, for the following reasons:

· Parametric tests can handle non-normal distributions for many large data sets.
· Nonparametric tests have their own assumptions, which can be harder to meet.

The appropriate choice usually depends on whether the mean or the median is the
better measure of central tendency for the distribution of the data.

· A parametric test is considered when the mean is the better central value and the
size of the data set is comparatively large. This test helps in making powerful and
effective decisions.
· A non-parametric test is considered regardless of the size of the data set if the
median represents the data better than the mean.

Properties | Parametric | Non-parametric
Assumptions | Yes | No
Central tendency value | Mean value | Median value
Correlation | Pearson | Spearman
Probabilistic distribution | Normal | Arbitrary
Population knowledge | Requires | Does not require
Used for | Interval data | Nominal data
Applicability | Variables | Attributes & Variables
Examples | z-test, t-test, etc. | Kruskal-Wallis, Mann-Whitney

Q.3 Consider the following frequency distribution. Calculate the mean weight of
students.

Weight (in kg)       31 – 35  36 – 40  41 – 45  46 – 50  51 – 55  56 – 60  61 – 65  66 – 70  71 – 75
Number of Students      9        6       15        3        1        2        2        1        1

Solution:

The given distribution has discontinuous class intervals, so we need to make them
continuous by subtracting 0.5 from each lower limit and adding 0.5 to each upper
limit.

Class intervals   Number of students (fi)   Class mark (xi)   di = xi – a   fidi

30.5 – 35.5        9                        33                -10           -90
35.5 – 40.5        6                        38                 -5           -30
40.5 – 45.5       15                        43 = a              0             0
45.5 – 50.5        3                        48                  5            15
50.5 – 55.5        1                        53                 10            10
55.5 – 60.5        2                        58                 15            30
60.5 – 65.5        2                        63                 20            40
65.5 – 70.5        1                        68                 25            25
70.5 – 75.5        1                        73                 30            30

Total             ∑fi = 40                                                  ∑fidi = 30

Here, ∑fi = 40 and ∑fidi = 30

By Assumed mean method,

Mean = a + (∑fidi/∑fi)

= 43 + (30/40)

= 43 + 0.75

= 43.75

Therefore, the mean weight of the students is 43.75 kg.
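
The assumed mean calculation can be cross-checked with a short Python sketch. It uses the class marks and frequencies from the table with a = 43, and it also computes the direct mean ∑fixi/∑fi for comparison. The variable names are illustrative only.

```python
# Class marks (xi) and frequencies (fi) taken from the table above.
class_marks = [33, 38, 43, 48, 53, 58, 63, 68, 73]
frequencies = [9, 6, 15, 3, 1, 2, 2, 1, 1]
a = 43  # assumed mean

sum_f = sum(frequencies)
# Sum of fi * di, where di = xi - a
sum_fd = sum(f * (x - a) for f, x in zip(frequencies, class_marks))

assumed_mean_result = a + sum_fd / sum_f
direct_mean = sum(f * x for f, x in zip(frequencies, class_marks)) / sum_f

print("sum of fi      =", sum_f)
print("sum of fi*di   =", sum_fd)
print("assumed mean   =", assumed_mean_result)
print("direct mean    =", direct_mean)
```

Both methods should give the same value, because the assumed mean only shifts the origin from which the deviations are measured.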

Q.4 Calculate the median marks of students from the following distribution.

Marks                10 – 20  20 – 30  30 – 40  40 – 50  50 – 60  60 – 70  70 – 80
Number of Students      7       10       10       20       20       15        8

Solution:

Class interval   Number of students (frequency)   Cumulative frequency

10 – 20           7                                7
20 – 30          10                               17
30 – 40          10                               27 = cf
40 – 50          20 = f                           47
50 – 60          20                               67
60 – 70          15                               82
70 – 80           8                               90

N/2 = 90/2 = 45

The cumulative frequency just greater than 45 is 47, which corresponds to the
interval 40 – 50.

Median class is 40 – 50.

Lower limit of the median class = l = 40

Class size = h = 10

Frequency of the median class = f = 20

Cumulative frequency of the class preceding the median class = cf = 27

As we know,

Median = l + [(N/2 – cf)/f] × h

= 40 + [(45 – 27)/20] × 10

= 40 + (18/2)

= 40 + 9

= 49

Hence, the median marks of the students = 49.
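
The same steps can be sketched in Python. The helper name `grouped_median` is hypothetical; it applies Median = l + [(N/2 – cf)/f] × h and assumes equal-width, continuous classes given by their lower limits.

```python
def grouped_median(lower_limits, class_width, frequencies):
    """Median of grouped data: l + ((N/2 - cf) / f) * h."""
    half = sum(frequencies) / 2
    cumulative = 0  # cumulative frequency before the current class (cf)
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            # This is the median class.
            return lower + (half - cumulative) / f * class_width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


lower_limits = [10, 20, 30, 40, 50, 60, 70]
frequencies = [7, 10, 10, 20, 20, 15, 8]

median_marks = grouped_median(lower_limits, 10, frequencies)
print("median marks =", median_marks)
```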

Q.4 Find the mean, median, mode, and range for the given data

190, 153, 168, 179, 194, 153, 165, 187, 190, 170, 165, 189, 185,
153, 147, 161, 127, 180

Solution:

For Mean:

190, 153, 168, 179, 194, 153, 165, 187, 190, 170, 165, 189, 185, 153, 147,
161, 127, 180

Number of observations = 18

Mean = (Sum of observations) / (Number of observations)

= (190+153+168+179+194+153+165+187+190+170+165+189+185+153+147+161+127+180) / 18

= 3056/18

≈ 169.78

Therefore, the mean is approximately 169.78

For Median:

The ascending order of given observations is,

127, 147, 153, 153, 153, 161, 165, 165, 168, 170, 179, 180, 185, 187, 189,
190, 190, 194

Here, n = 18, which is even, so

Median = 1/2 [(n/2)th observation + (n/2 + 1)th observation]
= 1/2 [9th observation + 10th observation]
= 1/2 (168 + 170)
= 338/2
= 169

Thus, the median is 169

For Mode:

The number with the highest frequency = 153 (it occurs three times)

Thus, mode = 153

For Range:

Range = Highest value – Lowest value
= 194 – 127
= 67
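
These four measures can be checked with Python's standard `statistics` module. This is only a verification sketch; `statistics.multimode` needs Python 3.8 or later and returns every value that shares the highest frequency.

```python
import statistics

data = [190, 153, 168, 179, 194, 153, 165, 187, 190,
        170, 165, 189, 185, 153, 147, 161, 127, 180]

total = sum(data)
mean_value = statistics.mean(data)
median_value = statistics.median(data)    # average of the 9th and 10th sorted values
mode_values = statistics.multimode(data)  # list of the most frequent value(s)
value_range = max(data) - min(data)

print("n      =", len(data))
print("sum    =", total)
print("mean   =", round(mean_value, 2))
print("median =", median_value)
print("mode   =", mode_values)
print("range  =", value_range)
```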

Q.5 On a final exam, a lecturer noted each student's score. This frequency
distribution is the grouping of the scores:

Score Range   Frequency

40-50         3
50-60         5
60-70         8
70-80         6
80-90         2
90-100        1

Calculate the mean score of the students.

Solution:

Score Range   Midpoint (x)   Frequency (f)   f·x

40-50         45             3               135
50-60         55             5               275
60-70         65             8               520
70-80         75             6               450
80-90         85             2               170
90-100        95             1                95

∑ (f · x) = 135+275+520+450+170+95

= 1645

Total frequency = ∑ (f)

∑ (f) = 3+5+8+6+2+1

= 25

Mean = ∑ (f · x) / ∑ (f)

= 1645/25

= 65.8

∴ The mean score of the students is 65.8.
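
A minimal Python sketch of the same calculation is given below. It takes each class midpoint as the average of the class limits, which is the usual assumption for grouped data; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each score range.
classes = [
    (40, 50, 3),
    (50, 60, 5),
    (60, 70, 8),
    (70, 80, 6),
    (80, 90, 2),
    (90, 100, 1),
]

# Sum of f * x, where x is the class midpoint.
sum_fx = sum(f * (low + high) / 2 for low, high, f in classes)
sum_f = sum(f for _, _, f in classes)
mean_score = sum_fx / sum_f

print("sum of f*x =", sum_fx)
print("sum of f   =", sum_f)
print("mean score =", mean_score)
```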

Q. What is Null and Alternative Hypothesis?

Null Hypothesis (H₀):

● The null hypothesis represents a statement of no effect, no difference, or no
relationship. It's the default assumption that there's nothing unusual happening in
the data or that the result is due to random chance.
● It often suggests that any observed effect or difference in the data is due to
sampling variability or random chance.
● The goal of hypothesis testing is to either reject or fail to reject the null
hypothesis based on the data.

Example: If you're testing a new drug, the null hypothesis might be that the drug
has no effect on patients (i.e., the mean effect is zero).

Alternative Hypothesis (H₁ or Ha):

● The alternative hypothesis is the opposite of the null hypothesis. It represents
the idea that there is a true effect, difference, or relationship in the data.
● If the null hypothesis is rejected, it suggests that there is sufficient evidence to
support the alternative hypothesis.

Example: In the same drug testing scenario, the alternative hypothesis might be
that the drug does have an effect on patients (i.e., the mean effect is not zero).
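
To make the reject or fail-to-reject decision concrete, the Python sketch below runs a one-sample t-test of H₀: mean effect = 0 against H₁: mean effect ≠ 0. The patient measurements are hypothetical, and the 5% two-sided critical value of 2.262 for 9 degrees of freedom is taken from a standard t-table; neither comes from these notes.

```python
import math
import statistics

# Hypothetical change in a health measure for 10 patients (illustrative data).
effects = [1.2, 0.8, -0.3, 1.5, 0.9, 0.4, 1.1, 0.7, -0.1, 1.3]

n = len(effects)
sample_mean = statistics.mean(effects)
sample_sd = statistics.stdev(effects)      # sample standard deviation (divides by n - 1)
standard_error = sample_sd / math.sqrt(n)

# Test statistic under H0: mean effect = 0
t_stat = (sample_mean - 0) / standard_error

# Two-sided 5% critical value for n - 1 = 9 degrees of freedom (from a t-table).
T_CRITICAL = 2.262
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

With a different sample the statistic could fall below the critical value, in which case the conclusion would be "fail to reject H₀" rather than "accept H₀".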

Q. What is Statistics?

Statistics is the field of mathematics that deals with collecting, analyzing,
interpreting, presenting, and organizing data. It helps in making decisions or
drawing conclusions based on data. Statistics is widely used in various fields such as
business, economics, social sciences, medicine, and engineering to inform decisions
and understand trends, patterns, or relationships in data.

Key components of statistics include:

● Descriptive Statistics: This involves summarizing and describing the features of
a data set (e.g., mean, median, standard deviation, and range).
● Inferential Statistics: This involves making predictions or inferences about a
population based on a sample of data (e.g., hypothesis testing, confidence
intervals).
● Probability: The study of uncertainty and how likely certain outcomes are.

Q. What is Time Series?

A Time Series is a sequence of data points collected or recorded at successive points
in time, often at uniform intervals (e.g., daily, monthly, annually). Time series data
is important because it captures trends, patterns, and relationships over time.
Analyzing time series data allows for the understanding of historical trends and
forecasting future values.

Examples of time series data:

● Stock market prices recorded every day.
● Monthly unemployment rates.
● Daily temperature measurements.
● Annual sales data for a business.

Q. Find out Mean, Median and Mode from the following data:

Marks: 10-20, 20-30, 30-40, 40-50, 50-60

No. of students: 15 20 45 15

Q. Find the S.D. of the following data:

Age (in years):    4-6   6-8   8-10   10-12   12-14   14-16   16-18
No. of students:    30    90    120     150      80      60      20


Q. Fit a straight line trend by the method of least squares to the following
data:

Year:                           2012   2013   2014   2015   2016   2017
Sales of T.V. sets (in '000):      7     10     12     14     17     24
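
One way to work this out is sketched below in Python. It fits the trend Y = a + bX with X measured from the middle of the period (2014.5), so that ∑X = 0 and the normal equations reduce to a = ∑Y/n and b = ∑XY/∑X². The choice of origin and the helper name `trend` are assumptions made for this sketch.

```python
years = [2012, 2013, 2014, 2015, 2016, 2017]
sales = [7, 10, 12, 14, 17, 24]  # in thousands

n = len(years)
mid_year = sum(years) / n                 # origin of X: 2014.5
x = [year - mid_year for year in years]   # deviations in years; they sum to zero

# With sum(x) == 0 the normal equations simplify to:
a = sum(sales) / n
b = sum(xi * yi for xi, yi in zip(x, sales)) / sum(xi ** 2 for xi in x)


def trend(year):
    """Trend value (in thousands) for a given year."""
    return a + b * (year - mid_year)


print(f"a = {a}, b = {b:.4f} per year")
for year in years:
    print(year, round(trend(year), 2))
```

Here b is the estimated yearly increase in sales. If X is instead measured in half-year units (-5, -3, -1, 1, 3, 5), b is halved but the fitted trend values are unchanged.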


Q. Calculate coefficient of rank correlation from the following data:

Marks in Account:      48   33   40    9   18   14   67   24   19   65
Marks in Statistics:   12   13   29    6   15    4   20    9    5   19
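
A possible check in Python uses Spearman's formula ρ = 1 – 6∑d²/[n(n² – 1)], which applies here because neither series contains tied values. The helper `ranks` is a hypothetical name and deliberately rejects ties.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need the tie-corrected formula")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]


accounts = [48, 33, 40, 9, 18, 14, 67, 24, 19, 65]
stats_marks = [12, 13, 29, 6, 15, 4, 20, 9, 5, 19]

n = len(accounts)
rank_acc = ranks(accounts)
rank_stat = ranks(stats_marks)

# d is the difference between the two ranks of each student.
sum_d_squared = sum((ra - rs) ** 2 for ra, rs in zip(rank_acc, rank_stat))
rho = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print("sum of d^2 =", sum_d_squared)
print(f"rank correlation = {rho:.4f}")
```

Ranking from the largest value downward instead would give the same coefficient, since both series are reversed together.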

Q. Calculate the two regression equations from the following data:

X:   6    2   10   4   8
Y:   9   11    5   8   7

Q. There are three bags. Bag I contains 3 white and 5 black balls. Bag II has 5
white and 7 black balls, while bag III contains 9 white and 6 black balls. One
ball is drawn from one of the bags and is found to be white. Find the probability
that it was drawn from bag II.
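
This is an application of Bayes' theorem. The Python sketch below assumes that each bag is equally likely to be chosen (prior probability 1/3), which the question does not state explicitly, and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

# (white, black) balls in each bag, as given in the question.
bags = {"I": (3, 5), "II": (5, 7), "III": (9, 6)}

# Assumption: every bag is equally likely to be selected.
prior = Fraction(1, len(bags))

# P(white | bag) for each bag.
likelihood = {name: Fraction(w, w + b) for name, (w, b) in bags.items()}

# Total probability of drawing a white ball.
p_white = sum(prior * p for p in likelihood.values())

# Bayes' theorem: P(bag II | white)
posterior_bag2 = prior * likelihood["II"] / p_white

print("P(white)          =", p_white)
print("P(bag II | white) =", posterior_bag2)
```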

Q. As a result of a certain experiment, the data obtained were:

x:   0    1    2    3   4
Y:   8   32   34   24   5

Fit a binomial distribution to the above data.
