Averages, Dispersion, and Correlation Analysis

Uploaded by

yadavvaishnavi161

UNIT 1

Averages:

Averages are measures used to summarize a set of data by identifying a central value. The
requisites for a good average are that it should be simple to understand, easy to compute,
and representative of the entire data set.

Types of Averages:

1. Mean: The mean is the arithmetic average, calculated by adding all values in the
dataset and dividing by the number of values.
○ Example: For data {2, 4, 6, 8}, the mean is (2+4+6+8)/4 = 5.
2. Median: The median is the middle value when the data is arranged in ascending or
descending order. If there is an even number of values, the median is the average of the
two middle numbers.
○ Example: For data {2, 4, 6, 8}, the median is (4 + 6) / 2 = 5.
3. Mode: The mode is the value that appears most frequently in a dataset. A set can
have no mode, one mode, or multiple modes.
○ Example: For data {2, 4, 4, 6, 8}, the mode is 4 because it appears twice.
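The three averages above can be checked with a short Python sketch. It relies only on the standard library's statistics module; the variable names are illustrative and not part of the notes.

```python
import statistics

data = [2, 4, 6, 8]
mean_value = statistics.mean(data)      # (2 + 4 + 6 + 8) / 4
median_value = statistics.median(data)  # average of the two middle values, 4 and 6

data_with_repeat = [2, 4, 4, 6, 8]
# multimode returns every most-frequent value, so it also covers multi-modal data.
modes = statistics.multimode(data_with_repeat)

print(mean_value, median_value, modes)
```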

Dispersion:

Dispersion refers to the extent to which values in a dataset deviate from the central value
(mean or median). Measuring variation helps understand how spread out the data is.

Significance of Measuring Variation:

It provides insights into the consistency or variability within a dataset. In some cases,
understanding the variation is more important than knowing the central value itself,
especially in fields like finance or manufacturing.

Methods of Studying Variation:

1. Mean Deviation: The mean deviation is the average of the absolute differences
between each data point and the mean.
○ Example: For data {1, 3, 5}, the mean is (1+3+5)/3 = 3. The mean deviation is
[(|1-3| + |3-3| + |5-3|)/3] = (2 + 0 + 2)/3 = 1.33.
2. Standard Deviation: The standard deviation measures the average distance
between each data point and the mean, considering squared differences to give more
weight to larger deviations.
○ Example: For data {1, 2, 3}, the mean is 2. The squared deviations are (1-2)² = 1,
(2-2)² = 0, and (3-2)² = 1. The variance is the average of these squared differences
(dividing by n, the population form), (1+0+1)/3 = 0.67, and the standard deviation is
the square root of 0.67 ≈ 0.82.

Both mean deviation and standard deviation provide insights into the spread of the data.
Standard deviation is more commonly used; because it squares the deviations, it is more
sensitive to extreme values.
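As a rough check of the two worked examples, the sketch below computes the mean deviation and the population standard deviation (dividing by n, as in the example). The function names are made up for illustration.

```python
import math

def mean_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def population_std(values):
    """Square root of the average squared distance from the mean (divides by n)."""
    m = sum(values) / len(values)
    variance = sum((x - m) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

md = mean_deviation([1, 3, 5])  # (2 + 0 + 2) / 3
sd = population_std([1, 2, 3])  # sqrt(2 / 3)
print(round(md, 2), round(sd, 2))
```

When the data are a sample rather than the whole population, the variance is usually divided by n - 1 instead; statistics.stdev in the standard library does this.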
UNIT 2

Correlation Analysis:

Significance of Studying Correlation: Correlation analysis helps in understanding
the relationship between two or more variables. It shows whether and how strongly
pairs of variables are related, which can inform decisions in fields like economics,
finance, and social sciences.

Types of Correlation:

1. Positive Correlation: When two variables move in the same direction. If one
increases, the other also increases.
○ Example: Height and weight often show a positive correlation—taller
individuals tend to weigh more.
2. Negative Correlation: When two variables move in opposite directions. If one
increases, the other decreases.
○ Example: The number of hours spent watching TV and academic performance
might have a negative correlation—more hours of TV could relate to lower academic
scores.
3. No Correlation: When there is no predictable relationship between two
variables.
○ Example: Shoe size and intelligence likely have no correlation.

Methods of Studying Correlation:

1. Karl Pearson’s Coefficient of Correlation (Pearson’s r): This measures the
strength and direction of a linear relationship between two variables. It ranges from
-1 to 1.

○ Formula:

r = ∑(x − x̄)(y − ȳ) / √[∑(x − x̄)² × ∑(y − ȳ)²]

○ Where x̄ and ȳ are the means of the two variables.
○ A result of 1 indicates perfect positive correlation, -1 indicates perfect negative
correlation, and 0 indicates no linear correlation.
2. Rank Correlation Coefficient (Spearman's Rank Correlation): This
method is used when the data is in ranked form, and is particularly useful when the
relationship between variables isn’t linear. It measures the degree of similarity
between two rankings.

○ Formula:

ρ = 1 − (6∑d²) / (n(n² − 1))

○ Where d is the difference in ranks for each pair of values and n is the number
of pairs.
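Both coefficients can be sketched in a few lines of Python. The study-hours data are hypothetical, the helper names are illustrative, and the simple ranking helper assumes there are no tied values (the rank-difference formula for Spearman's coefficient also assumes this).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y scaled by their individual variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho from squared rank differences (no-ties formula)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 90]

print(round(pearson_r(hours, scores), 3), spearman_rho(hours, scores))
```

Here the scores rise every time the hours rise, so the rank correlation is exactly 1, while Pearson's r is slightly below 1 because the increase is not perfectly linear.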

Regression Analysis:

Use of Regression Analysis: Regression analysis is used to model and analyze the
relationship between a dependent variable and one or more independent variables.
It helps predict the value of the dependent variable based on known values of
independent variables.

Difference between Correlation and Regression:

The difference between Correlation and Regression lies primarily in their purpose,
interpretation, and the type of relationship they describe:

1. Purpose:

● Correlation: Measures the strength and direction of a linear relationship
between two variables. It shows how two variables are related, but by itself it does
not give a rule for prediction.
○ Example: Correlation can tell you if there is a relationship between hours of
study and exam scores, but it won’t tell you how much exam scores will change if study
hours change.
● Regression: Focuses on predicting or explaining the relationship between a
dependent variable and one or more independent variables. It tells you how much
one variable changes when another variable changes.
○ Example: Regression allows you to predict a person’s exam score based on the
number of hours they study.

2. Relationship Type:

● Correlation: It only indicates whether there is a relationship (positive, negative, or
none) between two variables, without indicating any cause or effect.
○ It doesn't imply that one variable causes the other to change.
● Regression: It treats one variable (independent) as explaining or predicting the
change in the other variable (dependent). This direction is a modeling assumption;
a fitted regression does not by itself prove cause and effect.
○ For example, in predicting salary from years of experience, years of experience
is the independent variable assumed to influence salary (dependent).

3. Mathematical Expression:

● Correlation: Represented by a single number, usually Pearson’s correlation
coefficient (r), which ranges from -1 to 1.
○ r = 0: No linear relationship.
○ r = 1 or -1: Perfect positive or negative linear relationship.
● Regression: Represented by an equation, such as Y = a + bX, where Y is the
dependent variable, X is the independent variable, a is the intercept, and b is the
slope (the rate of change).

4. Symmetry:

● Correlation: The correlation between X and Y is the same as the correlation
between Y and X. The relationship is symmetric.
○ Example: The correlation between height and weight is the same as that
between weight and height.
● Regression: Regression is asymmetric. The regression of Y on X is not the
same as the regression of X on Y. It depends on which variable is considered
dependent and which is independent.
○ Example: The regression of weight on height is different from the regression of
height on weight.

5. Interpretation:

● Correlation: Tells you how strongly two variables are related, but not how
one variable affects the other.
○ It doesn't give a way to predict future values or quantify the relationship in
terms of units.
● Regression: Provides a way to predict the value of one variable based on
another and describes the relationship in terms of a formula with specific coefficients
(e.g., slope and intercept).

Summary:

● Correlation measures the strength and direction of a relationship between
variables, but it does not imply causation. It simply tells you if and how two
variables are related.
● Regression quantifies the relationship and provides a formula to predict the
dependent variable based on the independent variable(s). It models how the
dependent variable responds to the independent variable(s), but like correlation
it does not prove causation on its own.

Two Regression Equations:

1. Regression of Y on X (Y = a + bX): Predicts the dependent variable Y from
the independent variable X. The equation consists of an intercept a and a slope b.
○ Example: Predicting someone’s salary (Y) based on years of experience (X).
2. Regression of X on Y (X = a' + b'Y): Predicts X from Y, so the roles are
reversed: X is treated as the dependent variable and Y as the independent variable.
The intercept a' and slope b' generally differ from a and b.

In simple terms, while correlation tells you whether two variables are related,
regression helps you understand how one variable influences the other and can be
used to make predictions.
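The two regression equations can be illustrated with a minimal least-squares sketch. The experience and salary figures are hypothetical and the function name is illustrative. It also shows the asymmetry noted above: the slope of X on Y is not simply the reciprocal of the slope of Y on X, and the product of the two slopes equals r².

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: years of experience (X) and salary in thousands (Y).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 37, 45, 48]

a, b = fit_line(experience, salary)              # regression of Y on X
a_prime, b_prime = fit_line(salary, experience)  # regression of X on Y

predicted_salary = a + b * 6   # predicted Y for 6 years of experience
r_squared = b * b_prime        # product of the two slopes equals r squared

print(round(a, 2), round(b, 2), round(predicted_salary, 2), round(r_squared, 3))
```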
UNIT 3

Time Series Analysis:

Utility of Time Series Analysis:

● Time series analysis is essential for understanding patterns and trends in
data over time. This analysis helps businesses and economists make informed
decisions based on past behavior, predict future values, and plan strategies.
● Use cases:
○ Sales forecasting: A retail business may use time series analysis to predict
sales for the next month based on past sales data.
○ Stock market predictions: Analysts use time series data to predict stock
prices, interest rates, or economic performance.
○ Weather forecasting: Meteorologists use historical weather data to predict
future conditions.

Components of Time Series: Time series data typically consists of four
components:

1. Trend:
○ A trend is the long-term direction in the data, either upward (increasing) or
downward (decreasing), or flat (no change). Trends may be caused by factors like
technological advancements, population growth, or shifts in market behavior.
○ Example: A company’s sales might have an upward trend over the years as it
grows, or there might be a downward trend if the product becomes outdated.
2. Seasonal Variations:
○ These are regular patterns that repeat at fixed periods, such as yearly,
monthly, or weekly cycles. Seasonal changes can be due to environmental factors,
holidays, or regular shifts in demand.
○ Example: Retail sales typically spike during the holiday season every year, or
tourist destinations see more visits during the summer.
3. Cyclic Movements:
○ These fluctuations are long-term, irregular changes that are often linked to
economic cycles or business cycles. Unlike seasonal changes, cycles do not follow a
fixed period, and their duration can vary.
○ Example: A recession or economic boom affects the stock market or business
operations in a cyclical manner, with periods of growth followed by downturns.
4. Irregular Fluctuations (Random Noise):
○ These are one-off events or random fluctuations that cannot be predicted
or explained easily, such as natural disasters, strikes, or sudden market shocks.
○ Example: A company’s sales could drop drastically due to a factory fire, but
this would be a random and irregular fluctuation, not part of any regular pattern.
Methods of Measurement: To analyze and smooth time series data, various
methods are used to identify the trend and eliminate irregularities.

1. Moving Averages:

○ Moving averages are used to smooth out fluctuations in data and make the
underlying trend clearer. There are two common types:
■ Simple Moving Average (SMA): This is the average of a set number of past
data points.
■ Example: A 3-month simple moving average for sales data would average the
sales of the past three months to predict the next month.
■ Weighted Moving Average: This assigns different weights to different
observations, giving more importance to recent data.
■ Example: In a weighted 3-month moving average, the most recent month's
sales could be weighted more heavily than earlier months.
2. Method of Least Squares:
○ The method of least squares is used to fit a trend line to time series data.
It minimizes the sum of squared differences between the actual data points and the
values predicted by the trend line.
○ Example: A company might use the method of least squares to create a linear
regression model that fits a straight line to historical sales data, allowing them to
predict future sales.
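The smoothing and trend-fitting methods above can be sketched as follows. The monthly sales figures and the weights 1, 2, 3 are hypothetical, and the function names are illustrative.

```python
def simple_moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def weighted_moving_average(values, weights):
    """Weighted mean of each run; weights are listed oldest to newest."""
    k, total = len(weights), sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i:i + k])) / total
            for i in range(len(values) - k + 1)]

def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (intercept, slope)."""
    n = len(values)
    ts = range(n)
    mt, mv = sum(ts) / n, sum(values) / n
    slope = (sum((t - mt) * (v - mv) for t, v in zip(ts, values))
             / sum((t - mt) ** 2 for t in ts))
    return mv - slope * mt, slope

# Hypothetical monthly sales figures.
sales = [10, 12, 14, 13, 15, 17]

sma3 = simple_moving_average(sales, 3)
wma3 = weighted_moving_average(sales, [1, 2, 3])  # most recent month weighted most
intercept, slope = linear_trend(sales)
next_month = intercept + slope * len(sales)       # extend the trend line one step

print(sma3, [round(v, 2) for v in wma3], round(next_month, 2))
```

The moving averages smooth out the dip in the fourth month, while the trend line summarizes the whole series with a single slope that can be extended to the next period.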

Business Forecasting:

Steps in Forecasting: Forecasting involves predicting future outcomes based on
past data and trends. Here's a breakdown of the typical steps in the forecasting
process:

1. Problem Definition:

○ Clearly define what needs to be forecasted—whether it's sales, demand,
production levels, or financial performance.
2. Data Collection:
○ Gather the historical data that will be used for forecasting. This could include
sales history, customer behavior, market trends, and other relevant data points.
○ Example: A clothing retailer might collect monthly sales data for the last 5
years to forecast next season's demand.
3. Model Selection:
○ Choose the appropriate forecasting method based on the available data and
the problem. This could involve quantitative models (like time series analysis) or
qualitative methods (like expert judgment).
○ Example: If historical sales data is available, a time series forecasting model
may be used. If data is scarce, judgmental forecasting methods may be used.
4. Model Application:
○ Apply the selected model to historical data and generate forecasts.
○ Example: Apply moving averages or regression models to predict future sales
based on past performance.
5. Forecasting and Interpretation:
○ The forecast results need to be interpreted to inform business decisions. This
might involve adjusting for external factors or uncertainties.
○ Example: After forecasting sales, the business might need to adjust their
production plan or marketing strategy based on predicted demand.
6. Monitoring and Revision:
○ Forecasts should be monitored regularly, and adjustments should be made
as needed. If predictions consistently miss the mark, the model may need to be
refined.
○ Example: If actual sales frequently deviate from the forecast, the forecasting
model should be reviewed and updated with more accurate data or methods.

Methods of Forecasting:

1. Qualitative Methods:

○ These methods rely on subjective judgment, experience, or opinions. They are
typically used when there is limited historical data or when predicting future events
that don't follow a clear pattern.
○ Example: The Delphi method involves a panel of experts who independently
provide their forecasts, and the results are aggregated in several rounds.
2. Quantitative Methods:
○ These rely on historical data and statistical models to make predictions.
Quantitative forecasting methods are particularly useful when data patterns are
stable and predictable.
○ Example: Time series analysis, regression analysis, and moving averages are
common quantitative methods.
3. Causal Methods:
○ These methods assume that the variable to be forecasted is influenced by one
or more independent variables. By analyzing the relationship between the dependent
and independent variables, you can predict future values.
○ Example: A business might forecast future sales based on the amount spent on
advertising and economic factors, using regression models to predict outcomes.

Theories of Business Forecasting:

1. Extrapolation Theory:

○ This theory assumes that past patterns will continue into the future. It's
commonly used in time series forecasting, where historical data is extended into the
future to predict future behavior.
○ Example: If a company has seen steady sales growth of 10% annually,
extrapolation suggests that this trend will continue.
2. Causal Theory:
○ This theory suggests that the variable being forecast is influenced by
external factors. These factors can be identified and used in models to predict
future outcomes.
○ Example: The sales of a car dealership may be influenced by interest rates,
consumer confidence, and economic growth. These factors are analyzed to predict
future sales.
3. Judgmental Forecasting:
○ This method is used when objective data is not available or is difficult to
analyze. It relies on expert opinion or subjective judgment.
○ Example: If a new product is launching and there’s no historical data, an
expert or manager may use their intuition to predict how it will perform in the
market.
UNIT 4

Probability:

1. Three Approaches to Probability:

1. Classical Approach:

○ Based on equally likely outcomes, where the probability of an event is the ratio
of favorable outcomes to the total number of possible outcomes.
○ Formula:

P(E)=(Number of favorable outcomes)/(Total number of outcomes)

○ Example: In a fair die, the probability of rolling a 3 is

P(3)=1/6

2. Empirical Approach:

○ Based on experimental or observed data. It calculates probability by dividing
the number of occurrences of an event by the total number of trials.
○ Formula:

P(E)=(Number of times event E occurs)/(Total number of trials)

○ Example: If you flip a coin 100 times and get 55 heads, the empirical
probability of getting heads is

P(Heads)=55/100=0.55

3. Axiomatic Approach:

○ Based on a set of axioms (rules) for defining probability. This approach uses
formal rules to derive probabilities, focusing on consistency and logical structure.
○ Axioms:
1. P(S) = 1, where S is the sample space.
2. P(E) ≥ 0 for any event E.
3. If A1, A2, A3, … are mutually exclusive events, then
P(A1∪A2∪…) = P(A1)+P(A2)+…

2. Addition and Multiplication Theorems of Probability:

1. Addition Theorem:

○ This theorem is used to find the probability of the union of two events.
○ For two events A and B:
■ If A and B are mutually exclusive (cannot happen at the same time):

P(A∪B)=P(A)+P(B)

■ If A and B are not mutually exclusive (can occur together):

P(A∪B)=P(A)+P(B)−P(A∩B)

○ Example: The probability of drawing a card that is either a heart or a red card.
These events are not mutually exclusive, because every heart is also red, so
P(Heart∪Red) = 13/52 + 26/52 − 13/52 = 26/52 = 1/2.
2. Multiplication Theorem:

○ This theorem is used to find the probability of the intersection of two events.
○ For two events A and B:
■ If A and B are independent (the outcome of one does not affect the other):

P(A∩B)=P(A)×P(B)

■ If A and B are not independent:

P(A∩B)=P(A)×P(B∣A)

○ Example: The probability of getting heads in two consecutive coin flips is

P(Heads on 1st∩Heads on 2nd)= 1/2×1/2 = 1/4.
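The addition and multiplication theorems can be verified by brute-force counting over a standard 52-card deck, using the classical definition of probability. This is only a sketch: the deck representation and helper names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) outcomes

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))

def is_heart(card):
    return card[1] == "hearts"

def is_spade(card):
    return card[1] == "spades"

def is_king(card):
    return card[0] == "K"

# Addition theorem, mutually exclusive events (no card is both a heart and a spade).
p_heart_or_spade = prob(lambda c: is_heart(c) or is_spade(c))
assert p_heart_or_spade == prob(is_heart) + prob(is_spade)

# Addition theorem, overlapping events (the king of hearts must be counted once).
p_heart_and_king = prob(lambda c: is_heart(c) and is_king(c))
p_heart_or_king = prob(lambda c: is_heart(c) or is_king(c))
assert p_heart_or_king == prob(is_heart) + prob(is_king) - p_heart_and_king

# Multiplication theorem: suit and rank are independent for a single draw.
assert p_heart_and_king == prob(is_heart) * prob(is_king)

print(p_heart_or_spade, p_heart_or_king, p_heart_and_king)
```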

3. Bayes' Theorem:

● Bayes' Theorem is used to update the probability of an event based on new
evidence.
● Formula:

P(A∣B) = [P(B∣A)×P(A)] / P(B)

● Where:
○ P(A∣B) is the probability of A given B.
○ P(B∣A) is the probability of B given A.
○ P(A) is the prior probability of A.
○ P(B) is the total probability of B.
● Example: If a test for a disease is 95% accurate, Bayes’ theorem helps you
calculate the probability that a person actually has the disease, given a positive test
result.
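To make the disease-test example concrete, the sketch below plugs numbers into Bayes' theorem. The figures are assumptions, not from the notes: a prevalence of 1%, and "95% accurate" read as a 95% chance of a positive result for a sick person and a 5% chance of a positive result for a healthy person.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # Total probability of a positive result, P(B).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Assumed inputs: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p_disease_given_positive = posterior(prior=0.01, sensitivity=0.95,
                                     false_positive_rate=0.05)
print(round(p_disease_given_positive, 3))
```

Under these assumed figures the posterior probability is far below 95%, because most positive results come from the much larger healthy group.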
4. Probability Distributions:

1. Binomial Distribution:

○ Used for experiments with two outcomes (success/failure) and a fixed number
of trials.
○ Formula for the probability of exactly k successes in n trials:

P(X = k) = C(n, k) × p^k × (1 − p)^(n − k)

○ Where p is the probability of success, n is the number of trials, k is the
number of successes, and C(n, k) = n! / (k!(n − k)!) counts the ways to choose
which k of the n trials are successes.
○ Example: The probability of flipping exactly 3 heads in 5 coin flips is
C(5, 3) × (1/2)^5 = 10/32 ≈ 0.31.
2. Poisson Distribution:
○ Describes the probability of a number of events occurring in a fixed interval of
time or space, where events happen at a constant average rate and independently
of each other.
○ Formula:

P(X = k) = (λ^k × e^(−λ)) / k!

○ Where λ is the average rate of events, and k is the number of events.
○ Example: The probability of receiving exactly 5 calls in an hour if, on average,
3 calls are received per hour.
3. Normal Distribution:
○ A continuous probability distribution that is symmetric around the mean, often
referred to as a bell curve.
○ Parameters: Mean (μ) and Standard deviation (σ).
○ The probability density function is:

f(x) = [1 / (σ√(2π))] × e^(−(x − μ)² / (2σ²))

○ Example: The distribution of heights of adult men in a population can often
be modeled by a normal distribution.
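The three distributions can be evaluated directly from their standard formulas. The sketch below uses only Python's math module; the function names are illustrative, and the inputs are the coin-flip and phone-call examples from the text plus the standard normal curve at its mean.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std dev sigma."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

three_heads = binomial_pmf(3, 5, 0.5)  # exactly 3 heads in 5 fair flips
five_calls = poisson_pmf(5, 3)         # exactly 5 calls when the average is 3
peak_density = normal_pdf(0, 0, 1)     # standard normal density at its mean

print(three_heads, round(five_calls, 4), round(peak_density, 4))
```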
Intervention in Markets:

1. Price Controls:

● Price ceilings: The maximum price that can be charged for a good or service,
often used to prevent prices from being too high (e.g., rent control).
● Price floors: The minimum price that can be charged, used to ensure prices
are not too low (e.g., minimum wage laws).

2. Support Price:

● A support price is the minimum price set by the government for certain goods,
often agricultural products, to ensure that producers receive a fair price that covers
their costs and protects them from market fluctuations.
● Example: Governments may set support prices for crops like wheat or rice to
protect farmers' income.

3. Prevention and Control of Monopolies:

● Governments implement antitrust laws and regulations to prevent
monopolies from forming, as monopolies can lead to higher prices, reduced
consumer choice, and inefficiency in the market.
● Methods include breaking up large firms, preventing mergers and acquisitions
that could reduce competition, and regulating monopolistic industries.

4. System of Dual Price:

● A dual price system refers to a system where a country or market has two
different prices for a good, one for the domestic market and one for international
trade.
● Often used in situations like agriculture, where the government might set a
lower domestic price for consumers while selling at a higher price internationally to
boost exports and protect domestic producers.
● Example: A government may provide subsidized food prices to its citizens while
selling surplus production at higher prices to foreign buyers.
UNIT 5

Statistical Inference:

Statistical Inference involves drawing conclusions about a population based on a
sample of data. It typically involves hypothesis testing, where you evaluate
whether there is enough evidence to support or reject a claim about the population.

Procedure of Testing Hypothesis:

The process of hypothesis testing follows these steps:

1. State the Hypotheses:

○ Null Hypothesis (H₀): This represents the default assumption or no effect.

○ Alternative Hypothesis (H₁ or Ha): This represents the claim you want to
test, suggesting an effect or difference.
2. Choose the Significance Level (α):

○ The significance level (α) is the probability of rejecting the null hypothesis
when it is true. A common value is 0.05 (5%).
3. Select the Test Statistic:

○ Depending on the type of data and sample size, select an appropriate test
statistic (e.g., t-test, chi-square test).
4. Compute the Test Statistic:

○ Using sample data, calculate the test statistic (e.g., t-statistic, chi-square
statistic).
5. Decision Rule:

○ Compare the test statistic to a critical value from statistical tables (based on α)
to make a decision.
○ If the test statistic falls in the rejection region, reject H₀. If it falls in the
non-rejection region, fail to reject H₀.
6. Conclusion:

○ State the conclusion in context, either rejecting or failing to reject the null
hypothesis based on the test results.
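The six steps can be traced in a short sketch. To keep it within the standard library it uses a two-tailed z-test with a known population standard deviation, which is an assumption made here for simplicity (the notes mention the t-test and chi-square test as typical choices). All numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (test statistic, critical value, reject H0?) for H0: mu = mu0."""
    # Step 4: compute the test statistic.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 5: critical value for a two-tailed test at significance level alpha.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 when the statistic falls in either tail's rejection region.
    return z, critical, abs(z) > critical

# Steps 1-3 (hypothetical): H0: mu = 50, H1: mu != 50, alpha = 0.05, z statistic.
z, critical, reject_h0 = z_test_two_tailed(sample_mean=52, mu0=50, sigma=6, n=36)

# Step 6: state the conclusion in context.
print(round(z, 2), round(critical, 2), "reject H0" if reject_h0 else "fail to reject H0")
```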
Two Types of Errors in Testing Hypothesis:

1. Type I Error (False Positive):

○ Occurs when the null hypothesis is rejected when it is actually true. This
means concluding that there is an effect or difference when there isn't.
○ Example: A medical test incorrectly indicates a person has a disease when they
do not.
2. Type II Error (False Negative):

○ Occurs when the null hypothesis is not rejected when it is actually false. This
means failing to detect an effect or difference that truly exists.
○ Example: A medical test fails to detect a disease when the person actually has
it.

Two-Tailed and One-Tailed Tests of Hypothesis:

1. Two-Tailed Test:

○ Used when the alternative hypothesis suggests that the parameter could be
either greater than or less than a certain value.
○ The rejection region is in both tails of the distribution.
○ Example: Testing if the mean of a population is different from a specific value
(e.g., μ ≠ 50).
2. One-Tailed Test:

○ Used when the alternative hypothesis suggests that the parameter is either
greater than or less than a certain value, but not both.
○ The rejection region is in only one tail of the distribution.
○ Example: Testing if the mean is greater than a specific value (e.g., μ > 50).
Types of Statistical Tests:

1. t-Test:
○ A t-test is used to compare the means of two groups, especially when the
sample size is small (typically less than 30) and the population standard deviation is
unknown.

○ Types of t-tests:

■ One-sample t-test: Compares the sample mean to a known value (e.g., a
population mean).
■ Independent two-sample t-test: Compares the means of two independent
groups.
■ Paired t-test: Compares the means of two related groups (e.g., before and
after measurements).
○ Formula for a one-sample t-test:

t = (x̄ − μ) / (s / √n)

Where x̄ is the sample mean, μ is the population mean, s is the
sample standard deviation, and n is the sample size.

2. F-Test:

○ The F-test is used to compare the variances of two populations. It is often
used in the analysis of variance (ANOVA).
○ Formula:

F = (Variance of sample 1)/(Variance of sample 2)

○ If the F-value is significantly large, it suggests that the variances are different.
3. Chi-Square Test:

○ The chi-square test is used for categorical data to test the goodness of fit or
the independence of two variables.
○ Goodness of Fit: Tests whether the observed data fits a specific distribution
(e.g., uniform distribution).
○ Test of Independence: Tests if two categorical variables are independent or
related.
○ Formula for chi-square statistic:

χ² = ∑ (Oi − Ei)² / Ei

Where Oi is the observed frequency and Ei is the expected frequency.

4. Analysis of Variance (ANOVA):

○ ANOVA is used to compare the means of three or more groups to see if
there is a significant difference between them.
○ It tests the null hypothesis that all group means are equal.
○ One-way ANOVA: Compares group means across the levels of one independent
variable (factor).
○ Two-way ANOVA: Examines two independent variables and their interaction
effect on a dependent variable.

○ ANOVA uses the F-test to compare variances between groups (mean squares
between groups) and within groups (mean squares within groups).

○ Formula for F-statistic in ANOVA:

F=(Mean Square Between Groups)/(Mean Square Within Groups)

Where Mean Square Between Groups is the variance between the group means
and Mean Square Within Groups is the variance within each group.
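The test statistics described above can be computed by hand in a few lines. The sketch below covers the one-sample t statistic, the chi-square statistic, and the one-way ANOVA F statistic. All data are hypothetical, the function names are illustrative, and the sketch stops at the statistic: comparing it with a critical value from tables is left out.

```python
import math

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with s using n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (s / math.sqrt(n))

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def one_way_anova_f(groups):
    """F = mean square between groups / mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

t_stat = one_sample_t([52, 48, 55, 50, 53], mu0=50)
chi2_stat = chi_square(observed=[18, 22, 20], expected=[20, 20, 20])
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

print(round(t_stat, 3), round(chi2_stat, 3), round(f_stat, 3))
```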
Q1 a. What do you mean by correlation? Mention any four uses of it?

b. Discuss the difference between Parametric and Non-Parametric tests?

Answer:

Correlation represents the relationship between two variables and it's an important
metric when analyzing data sets. Learning more about how to calculate correlation
and how to interpret your results can help you make efficient financial and
marketing decisions.

● Correlation is the relationship between two variables and it can be measured to
inform financial and marketing decisions.
● Correlation can be positive when two variables move in the same direction,
negative when the two variables move in opposite directions or zero when there is
no relationship between two variables.
● Types of correlations include the Pearson correlation for linear relationships,
the Spearman correlation, which determines a monotonic relationship between
variables and the Kendall correlation that measures the strength of dependence
between two datasets.
● Types of Correlation
● Positive Linear Correlation. There is a positive linear correlation when the
variable on the x -axis increases as the variable on the y -axis increases. ...
● Negative Linear Correlation. ...
● Non-linear Correlation (known as curvilinear correlation) ...
● No Correlation.

Common uses of correlation:

1. Predictive Modeling: Correlation can be used to build predictive models to
estimate the relationship between two variables. For example, in economics, the
correlation between interest rates and consumer spending can be used to predict
how changes in interest rates will affect consumer spending.
2. Quality Control: Correlation is used in quality control to measure the
relationship between two variables that affect product quality. For example, in
manufacturing, the correlation between temperature and product quality can be
used to control the temperature to ensure that the product meets quality standards.
3. Survey Research: Correlation is used in survey research to measure the
relationship between two variables that are measured using survey questions. For
example, in social science research, the correlation between income and education
level can be used to study the relationship between these two variables.
4. Medical Research: Correlation is used in medical research to study the
relationship between two variables, such as a treatment and a disease outcome. For
example, the correlation between smoking and lung cancer can be used to study the
relationship between smoking and the risk of developing lung cancer.
5. Risk Management: Correlation is used in risk management to estimate the
relationship between two risks. For example, in financial risk management, the
correlation between two assets can be used to estimate the risk of a portfolio that
includes both assets.

b) Definition of Parametric and Nonparametric Test

Parametric Test Definition

In Statistics, a parametric test is a kind of hypothesis test that makes
assumptions about the parameters (such as the mean) of the population from which
the sample is drawn. The t-test, which is based on Student’s t-statistic, is a
commonly used example.

The t-statistic rests on the underlying assumption that the variable is normally
distributed. The population mean is known, or is assumed to be known, and the
population variance is estimated from the sample. The variables of concern in the
population are assumed to be measured on an interval scale.

Non-Parametric Test Definition

The non-parametric test does not assume that the population follows a distribution
defined by specific parameters. It is also a kind of hypothesis test, but it does not
rest on such underlying assumptions. In the case of the non-parametric test, the test
is based on differences in the median, so this kind of test is also called a
distribution-free test. The test variables are measured on the nominal or ordinal
level. If the independent variables are non-metric, the non-parametric test is
usually performed.

Advantages and Disadvantages of Parametric and Nonparametric Tests

Many people believe that the choice between parametric and nonparametric tests
depends only on whether the data are normally distributed. The distribution can be
the deciding factor when the data set is relatively small. In many cases, however,
this is not the critical issue, for the following reasons:

· Parametric tests can still analyze non-normal distributions for many datasets,
particularly large ones.
· Nonparametric tests have assumptions of their own, which can be harder to meet.

The appropriate choice usually depends on whether the mean or the median is the
better measure of central tendency for the distribution of the data.

· A parametric test is considered when the mean is the appropriate central
value and the size of your data set is comparatively large. This test helps in making
powerful and effective decisions.
· A non-parametric test is considered regardless of the size of the data set if the
median represents the data better than the mean.

Properties                   Parametric             Non-parametric

Assumptions                  Yes                    No
Central tendency value       Mean value             Median value
Correlation                  Pearson                Spearman
Probabilistic distribution   Normal                 Arbitrary
Population knowledge         Requires               Does not require
Used for                     Interval data          Nominal data
Applicability                Variables              Attributes & Variables
Examples                     z-test, t-test, etc.   Kruskal-Wallis, Mann-Whitney

Q.3 Consider the following frequency distribution. Calculate the
mean weight of students.

Weight (in kg)       31–35  36–40  41–45  46–50  51–55  56–60  61–65  66–70  71–75
Number of students     9      6     15      3      1      2      2      1      1

Solution:

The given distribution has discontinuous class intervals, so we need to make
them continuous (subtract 0.5 from each lower limit and add 0.5 to each upper limit).

Class intervals   Number of students (fi)   Class mark (xi)   di = xi – a   fidi
30.5 – 35.5       9                         33                -10           -90
35.5 – 40.5       6                         38                -5            -30
40.5 – 45.5       15                        43 = a            0             0
45.5 – 50.5       3                         48                5             15
50.5 – 55.5       1                         53                10            10
55.5 – 60.5       2                         58                15            30
60.5 – 65.5       2                         63                20            40
65.5 – 70.5       1                         68                25            25
70.5 – 75.5       1                         73                30            30
Total             ∑fi = 40                                                  ∑fidi = 30

Here, ∑fi = 40 and ∑fidi = 30

By the assumed mean method,

Mean = a + (∑fidi/∑fi)

= 43 + (30/40)

= 43 + 0.75

= 43.75

Therefore, the mean weight of the students is 43.75 kg.
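
The assumed mean method can be cross-checked against the direct method (∑fixi/∑fi), since both must give the same mean. The sketch below uses the class boundaries and frequencies of Q.3; the function names are illustrative and not part of the notes.

```python
# Continuous class boundaries and frequencies from Q.3.
classes = [(30.5, 35.5), (35.5, 40.5), (40.5, 45.5), (45.5, 50.5), (50.5, 55.5),
           (55.5, 60.5), (60.5, 65.5), (65.5, 70.5), (70.5, 75.5)]
freqs = [9, 6, 15, 3, 1, 2, 2, 1, 1]

def class_marks(classes):
    """Midpoint xi of every class interval."""
    return [(lower + upper) / 2 for lower, upper in classes]

def assumed_mean_method(classes, freqs, a):
    """Mean = a + sum(fi * di) / sum(fi), with di = xi - a."""
    sum_fd = sum(f * (x - a) for f, x in zip(freqs, class_marks(classes)))
    return a + sum_fd / sum(freqs)

def direct_method(classes, freqs):
    """Mean = sum(fi * xi) / sum(fi); used here only as a cross-check."""
    sum_fx = sum(f * x for f, x in zip(freqs, class_marks(classes)))
    return sum_fx / sum(freqs)

mean_assumed = assumed_mean_method(classes, freqs, a=43)
mean_direct = direct_method(classes, freqs)
print(mean_assumed, mean_direct)
```

Any class mark can be used as the assumed mean a; the result does not change, only the size of the deviations di.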

Q.4 Calculate the median marks of students from the following distribution.

Marks                10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80
Number of Students   7         10        10        20        20        15        8

Solution:

Class interval   Number of students (frequency)   Cumulative frequency
10 – 20          7                                7
20 – 30          10                               17
30 – 40          10                               27 = cf
40 – 50          20 = f                           47
50 – 60          20                               67
60 – 70          15                               82
70 – 80          8                                90

N/2 = 90/2 = 45

The cumulative frequency just greater than 45 is 47, which lies in the
interval 40 – 50.

Median class is 40 – 50.

Lower limit of the median class = l = 40

Class size = h = 10

Frequency of the median class = f = 20

Cumulative frequency of the class preceding the median class = cf = 27

As we know,

Median = l + [(N/2 – cf)/f] × h

= 40 + [(45 – 27)/20] × 10

= 40 + (18/2)

= 40 + 9

= 49

Hence, the median marks of the students = 49.
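
The same steps (find N/2, locate the median class, apply the formula) can be written as a short sketch. It assumes equal-width classes given by their lower limits; the function name is illustrative only.

```python
def grouped_median(lower_limits, class_size, freqs):
    """Median = l + ((N/2 - cf) / f) * h for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if cf + f >= half:  # first class whose cumulative frequency reaches N/2
            return l + (half - cf) * class_size / f
        cf += f

# Data from Q.4.
lower_limits = [10, 20, 30, 40, 50, 60, 70]
freqs = [7, 10, 10, 20, 20, 15, 8]
median_marks = grouped_median(lower_limits, 10, freqs)
print(median_marks)
```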

Q.4 Find the mean, median, mode, and range for the given data

190, 153, 168, 179, 194, 153, 165, 187, 190, 170, 165, 189, 185,
153, 147, 161, 127, 180

Solution:

For Mean:

190, 153, 168, 179, 194, 153, 165, 187, 190, 170, 165, 189, 185, 153, 147,
161, 127, 180

Number of observations = 18

Mean = (Sum of observations) / (Number of observations)

= (190+153+168+179+194+153+165+187+190+170+165+189+185+153+147+161+127+180) / 18

= 3056/18

≈ 169.78

Therefore, the mean is approximately 169.78

For Median:

The ascending order of given observations is,

127, 147, 153, 153, 153, 161, 165, 165, 168, 170, 179, 180, 185, 187, 189,
190, 190, 194

Here, n = 18
Median = 1/2 [(n/2)th observation + (n/2 + 1)th observation]
= 1/2 [9th observation + 10th observation]
= 1/2 (168 + 170)
= 338/2
= 169

Thus, the median is 169

For Mode:

The number with the highest frequency = 153 (it occurs three times)

Thus, mode = 153

For Range:

Range = Highest value – Lowest value

= 194 – 127
= 67
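
For raw (ungrouped) data like this, the four measures can be checked with Python's standard statistics module. This is only a verification sketch; the variable names are illustrative.

```python
from statistics import mean, median, multimode

data = [190, 153, 168, 179, 194, 153, 165, 187, 190,
        170, 165, 189, 185, 153, 147, 161, 127, 180]

data_mean = mean(data)
data_median = median(data)          # averages the two middle values when n is even
data_modes = multimode(data)        # list of all values with the highest frequency
data_range = max(data) - min(data)  # highest value minus lowest value

print(data_mean, data_median, data_modes, data_range)
```

multimode returns a list because a data set can have more than one mode.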

Q.5 On a final exam, a lecturer noted each student's score. This frequency
distribution is the grouping of the scores:

Score Range   Frequency
40-50         3
50-60         5
60-70         8
70-80         6
80-90         2
90-100        1

Calculate the mean score of the students.

Solution:

Score Range   Midpoint (x)   Frequency (f)   f·x
40-50         45             3               135
50-60         55             5               275
60-70         65             8               520
70-80         75             6               450
80-90         85             2               170
90-100        95             1               95

∑ (f · x) = 135+275+520+450+170+95

= 1645

Total of frequency = ∑ (f)

∑ (f) = 3+5+8+6+2+1

= 25

Mean = ∑ (f · x) / ∑ (f)

= 1645/25

= 65.8

∴ The mean score of the students is 65.8.

Q. What is Null and Alternative Hypothesis?

Null Hypothesis (H₀):

● The null hypothesis represents a statement of no effect, no difference, or no
relationship. It's the default assumption that there's nothing unusual happening in
the data or that the result is due to random chance.
● It often suggests that any observed effect or difference in the data is due to
sampling variability or random chance.
● The goal of hypothesis testing is to either reject or fail to reject the null
hypothesis based on the data.

Example: If you're testing a new drug, the null hypothesis might be that the drug
has no effect on patients (i.e., the mean effect is zero).

Alternative Hypothesis (H₁ or Ha):

● The alternative hypothesis is the opposite of the null hypothesis. It represents
the idea that there is a true effect, difference, or relationship in the data.
● If the null hypothesis is rejected, it suggests that there is sufficient evidence to
support the alternative hypothesis.

Example: In the same drug testing scenario, the alternative hypothesis might be
that the drug does have an effect on patients (i.e., the mean effect is not zero).
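
As a rough illustration of the drug example, the sketch below tests H₀: mean effect = 0 against H₁: mean effect ≠ 0 with a two-sided z-test. Everything specific here is an assumption made for illustration: the ten measurements are hypothetical, the population standard deviation is taken as known (sigma = 1.0), the significance level is 0.05, and the function name is not from these notes.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: population mean = mu0 against H1: population mean != mu0.

    Assumes the population standard deviation sigma is known.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical measured effects of the drug on ten patients (illustration only).
effects = [1.2, -0.4, 0.8, 1.5, 0.3, 0.9, -0.1, 1.1, 0.6, 0.7]
z, p_value, reject_h0 = two_sided_z_test(effects, mu0=0.0, sigma=1.0)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

If the p-value is below the chosen significance level, H₀ is rejected; otherwise we fail to reject it, which is not the same as proving H₀ true.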

Q. What is Statistics?

Statistics is the field of mathematics that deals with collecting, analyzing,
interpreting, presenting, and organizing data. It helps in making decisions or
drawing conclusions based on data. Statistics is widely used in various fields such as
business, economics, social sciences, medicine, and engineering to inform decisions
and understand trends, patterns, or relationships in data.

Key components of statistics include:

● Descriptive Statistics: This involves summarizing and describing the features of
a data set (e.g., mean, median, standard deviation, and range).
● Inferential Statistics: This involves making predictions or inferences about a
population based on a sample of data (e.g., hypothesis testing, confidence
intervals).
● Probability: The study of uncertainty and how likely certain outcomes are.

Q. What is Time Series?

A Time Series is a sequence of data points collected or recorded at successive points
in time, often at uniform intervals (e.g., daily, monthly, annually). Time series data
is important because it captures trends, patterns, and relationships over time.
Analyzing time series data allows for the understanding of historical trends and
forecasting future values.

Examples of time series data:

● Stock market prices recorded every day.
● Monthly unemployment rates.
● Daily temperature measurements.
● Annual sales data for a business.
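
One simple way to bring out the trend in such a series is a moving average, which replaces each run of consecutive values by its mean. The sketch below is illustrative only: the monthly sales figures are hypothetical and the function name is not from these notes.

```python
def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Hypothetical monthly sales figures (illustration only).
sales = [12, 15, 14, 18, 20, 19, 23]
trend = moving_average(sales, window=3)
print([round(v, 2) for v in trend])
```

A series of n values gives n - window + 1 averages, so the smoothed series is shorter than the original.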

Q. Find out Mean, Median and Mode from the following data:

Marks: 10-20, 20-30, 30-40, 40-50, 50-60

No. of students: 15 20 45 15

Q. Find the S.D. of the following data:

Age (in years):    4-6   6-8   8-10   10-12   12-14   14-16   16-18
No. of students:   30    90    120    150     80      60      20
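
One way to set up this calculation is sketched below. It assumes the class midpoints stand in for the observations and that the population formula (dividing by N rather than N - 1) is intended; the variable names are illustrative.

```python
from math import sqrt

# Class limits and frequencies from the question.
lower = [4, 6, 8, 10, 12, 14, 16]
upper = [6, 8, 10, 12, 14, 16, 18]
freqs = [30, 90, 120, 150, 80, 60, 20]

mids = [(l + u) / 2 for l, u in zip(lower, upper)]  # class marks
n = sum(freqs)
mean_age = sum(f * x for f, x in zip(freqs, mids)) / n
# Population variance: frequency-weighted mean of squared deviations.
variance = sum(f * (x - mean_age) ** 2 for f, x in zip(freqs, mids)) / n
sd = sqrt(variance)
print(round(mean_age, 4), round(sd, 4))
```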

Q. Fit a straight line trend by the method of least squares to the following
data:

Year:                            2012   2013   2014   2015   2016   2017
Sales of T.V. sets (in '000):    7      10     12     14     17     24
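
A possible set-up for this fit is sketched below. It measures time as deviations from the mean year, so the trend line has the form Y = a + b(year - mean year), with a equal to the mean of Y. The function name is illustrative and not from these notes.

```python
def fit_linear_trend(years, values):
    """Least-squares line: value = a + b * (year - mean_year)."""
    n = len(years)
    mean_year = sum(years) / n
    a = sum(values) / n  # with deviations from the mean year, a is the mean of Y
    sxy = sum((t - mean_year) * y for t, y in zip(years, values))
    sxx = sum((t - mean_year) ** 2 for t in years)
    b = sxy / sxx        # average change in Y per year
    return mean_year, a, b

years = [2012, 2013, 2014, 2015, 2016, 2017]
sales = [7, 10, 12, 14, 17, 24]
origin, a, b = fit_linear_trend(years, sales)
print(f"Y = {a} + {b:.4f} * (year - {origin})")
```

Because the number of years is even, the origin falls midway between 2014 and 2015.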

Q. Calculate the coefficient of rank correlation from the following data:

Marks in Account:      48   33   40   9   18   14   67   24   19   65
Marks in Statistics:   12   13   29   6   15   4    20   9    5    19
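
The calculation can be sketched with Spearman's formula ρ = 1 - 6∑d²/(n(n² - 1)), where d is the difference between the two ranks of each student. The sketch assumes there are no tied marks, which holds for this data; the function names are illustrative.

```python
def ranks(values):
    """Rank 1 = smallest value; assumes all values are distinct (no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

accounts = [48, 33, 40, 9, 18, 14, 67, 24, 19, 65]
statistics_marks = [12, 13, 29, 6, 15, 4, 20, 9, 5, 19]
rho = spearman_rho(accounts, statistics_marks)
print(round(rho, 4))
```

Ranking from smallest to largest or from largest to smallest gives the same ρ, as long as both series are ranked the same way.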

Q. Calculate the two regression equations from the following data:

X:   6   2    10   4   8
Y:   9   11   5    8   7
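
The two lines can be obtained from deviations about the means, using byx = ∑dxdy/∑dx² for the regression of Y on X and bxy = ∑dxdy/∑dy² for the regression of X on Y. The sketch below follows that route; the function name is illustrative.

```python
def regression_equations(x, y):
    """Return (intercept, slope) for Y on X and for X on Y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b_yx = sxy / sxx  # regression coefficient of Y on X
    b_xy = sxy / syy  # regression coefficient of X on Y
    y_on_x = (y_bar - b_yx * x_bar, b_yx)  # Y = a + b_yx * X
    x_on_y = (x_bar - b_xy * y_bar, b_xy)  # X = c + b_xy * Y
    return y_on_x, x_on_y

x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
(a_yx, b_yx), (a_xy, b_xy) = regression_equations(x, y)
print(f"Y = {a_yx:.2f} + ({b_yx:.2f})X")
print(f"X = {a_xy:.2f} + ({b_xy:.2f})Y")
```

Both lines pass through the point of means, and the product of the two regression coefficients equals r², so it can never exceed 1.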

Q. There are three bags. Bag I contains 3 white and 5 black balls. Bag II has 5
white and 7 black balls while bag III contains 9 white and 6 black balls. One
white ball is drawn from one of the bags. Find the probability that it was drawn
from bag II.
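
This is an application of Bayes' theorem. The sketch below assumes each bag is equally likely to be chosen, which the question does not state explicitly, and uses exact fractions to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

# (white, black) balls in each bag, from the question.
bags = {"I": (3, 5), "II": (5, 7), "III": (9, 6)}

# Assumption (not stated explicitly in the question): each bag is equally likely.
prior = Fraction(1, len(bags))

# P(white | bag) for every bag.
likelihood = {name: Fraction(w, w + b) for name, (w, b) in bags.items()}

# Total probability of drawing a white ball.
p_white = sum(prior * p for p in likelihood.values())

# Bayes' theorem: P(bag II | white) = P(II) * P(white | II) / P(white).
posterior_II = prior * likelihood["II"] / p_white
print(posterior_II)
```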

Q. As a result of a certain experiment, the data obtained were:

x:   0   1    2    3    4
Y:   8   32   34   24   5

Fit a binomial distribution to the above data.
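
A sketch of the usual fitting procedure follows: estimate p from the sample mean (mean = np), then compute the expected frequencies N · C(n, k) pᵏ qⁿ⁻ᵏ. It assumes Y holds the observed frequencies and that the number of trials n is the largest value of x (n = 4); the variable names are illustrative.

```python
from math import comb

x_values = [0, 1, 2, 3, 4]
observed = [8, 32, 34, 24, 5]  # the Y row, read as observed frequencies

n_trials = max(x_values)       # assumption: n is the largest value of x
total = sum(observed)          # N, the total frequency
mean_x = sum(x * f for x, f in zip(x_values, observed)) / total
p = mean_x / n_trials          # from mean = n * p
q = 1 - p

# Expected frequency for each x: N * C(n, k) * p^k * q^(n - k).
expected = [total * comb(n_trials, k) * p ** k * q ** (n_trials - k)
            for k in x_values]
print(round(p, 4), [round(e, 2) for e in expected])
```

The expected frequencies add up to N, which is a quick check on the arithmetic.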

