REGRESSION ANALYSIS
USING SPSS
Dr Somesh K Sinha
REGRESSION
Regression analysis is a set of statistical
processes for estimating the relationships
among variables: how a dependent variable
(the presumed effect) changes with one or more
independent variables (the presumed causes).
What is a variable?
An element, feature, or factor that is liable to
vary or change.
Correlation and Regression
Correlation analysis tells us whether two
variables, ‘x’ and ‘y’, are associated, and how
strongly.
Regression analysis predicts the value of the
dependent variable from the known value of the
independent variable, assuming an average
mathematical relationship between two or more
variables.
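The contrast can be sketched in a few lines of Python. This is only an illustration: the data values and the helper names (mean, pearson_r, slope) are hypothetical and not from the slides. It suggests that correlation is symmetric in x and y, while the regression slope depends on which variable is treated as dependent.

```python
from math import sqrt

# Hypothetical data: classes attended (x) and percentage scored (y).
x = [10, 12, 15, 18, 20, 22, 25, 28]
y = [45, 50, 58, 62, 70, 72, 80, 88]

def mean(values):
    return sum(values) / len(values)

def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def slope(predictor, response):
    """Least-squares slope of response regressed on predictor."""
    mp, mr = mean(predictor), mean(response)
    sxy = sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
    sxx = sum((p - mp) ** 2 for p in predictor)
    return sxy / sxx

r_xy = pearson_r(x, y)    # correlation of x with y
r_yx = pearson_r(y, x)    # same value: the roles are interchangeable
b_y_on_x = slope(x, y)    # change in y per unit of x
b_x_on_y = slope(y, x)    # change in x per unit of y (a different number)

print("r(x, y) =", round(r_xy, 4), " r(y, x) =", round(r_yx, 4))
print("slope of y on x =", round(b_y_on_x, 4))
print("slope of x on y =", round(b_x_on_y, 4))
```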
CORRELATION AND REGRESSION
BASIS FOR COMPARISON | CORRELATION | REGRESSION
Meaning | Correlation is a statistical measure which determines the co-relationship or association of two variables. | Regression describes how an independent variable is numerically related to the dependent variable.
Usage | To represent a linear relationship between two variables. | To fit a best line and estimate one variable on the basis of another variable.
Dependent and Independent variables | No distinction is made between the two variables. | The dependent and independent variables are distinct.
Indicates | Correlation coefficient indicates the extent to which two variables move together. | Regression indicates the impact of a unit change in the known variable (x) on the estimated variable (y).
Type of Variable
A. Dependent
B. Independent
Example variables: Number of Classes Attended, Percentage
Objective
A. Prediction
B. Explanation: Magnitude, Sign, Statistical
significance
Research Design
A. Sample Size: 1:5
B. Variable: Metric (Scale, Ordinal)
Assumptions in Regression
Linearity of Phenomenon measured
Constant Variance of the error term
Independence of the error term
Normality of error term distribution
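Two of these assumptions can be probed numerically from the residuals. The sketch below is one possible check, not a procedure taken from the slides: the data are hypothetical and assumed to be listed in collection order, and the Durbin-Watson statistic is used here as a common indicator of independence of the error term (it lies between 0 and 4, and values near 2 suggest little first-order autocorrelation).

```python
# Hypothetical data, assumed to be listed in the order it was collected.
x = [10, 12, 15, 18, 20, 22, 25, 28]
y = [45, 50, 58, 62, 70, 72, 80, 88]
n = len(x)

# Least-squares fit of y = a + b*x.
mean_x = sum(x) / n
mean_y = sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

# Residuals: observed y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# With an intercept in the model, the residuals average to zero.
residual_mean = sum(residuals) / n

# Durbin-Watson statistic: squared successive differences of the
# residuals divided by the sum of squared residuals.
durbin_watson = (
    sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
    / sum(e ** 2 for e in residuals)
)

print("mean residual:", round(residual_mean, 10))
print("Durbin-Watson:", round(durbin_watson, 3))
```

Linearity, constant variance, and normality are usually judged from residual plots, which this sketch does not attempt.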
Estimation of Model Fit
• R Square
• ANOVA & F Ratio
• Regression Coefficient
• t – Value
• P-Value
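As a rough illustration of how R Square and the F ratio come out of the ANOVA decomposition, the sketch below fits a bivariate model to hypothetical data and splits the total sum of squares into a regression part and a residual part. The variable names (sst, ssr, sse, f_ratio) are chosen for this example only.

```python
# Hypothetical data: classes attended (x) and percentage scored (y).
x = [10, 12, 15, 18, 20, 22, 25, 28]
y = [45, 50, 58, 62, 70, 72, 80, 88]
n = len(x)
k = 1  # number of independent variables

# Least-squares fit of y = a + b*x.
mean_x = sum(x) / n
mean_y = sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
predicted = [a + b * xi for xi in x]

# ANOVA decomposition: total = regression + residual.
sst = sum((yi - mean_y) ** 2 for yi in y)                    # total
ssr = sum((pi - mean_y) ** 2 for pi in predicted)            # regression
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))    # residual

# R Square: share of the variance in y explained by the model.
r_square = ssr / sst

# F ratio: regression mean square divided by residual mean square.
f_ratio = (ssr / k) / (sse / (n - k - 1))

print("R Square:", round(r_square, 4))
print("F ratio:", round(f_ratio, 2))
```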
Regression Model
Bivariate Regression:
Y = a + bX + e
Multiple Regression:
Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + e
Where:
• Y = the variable that you are trying to predict (dependent variable)
• X = the variable that you are using to predict Y (independent variable)
• a = the intercept
• b = the slope
• e = the regression residual
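A minimal sketch of estimating a and b for the bivariate model by ordinary least squares, using only the Python standard library. The data and the function names (fit_line, predict) are hypothetical and serve only to illustrate the terms defined above.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b

def predict(a, b, x_value):
    """Predicted Y for a given X."""
    return a + b * x_value

# Hypothetical data: classes attended (X) and percentage scored (Y).
x = [10, 12, 15, 18, 20, 22, 25, 28]
y = [45, 50, 58, 62, 70, 72, 80, 88]

a, b = fit_line(x, y)

# e: the residual for each observation (observed Y minus predicted Y).
e = [yi - predict(a, b, xi) for xi, yi in zip(x, y)]

print("intercept a =", round(a, 3))
print("slope b =", round(b, 3))
print("predicted Y at X = 16:", round(predict(a, b, 16), 2))
```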
Slope
• The slope indicates the steepness of a line.
• The slope of a regression line (b) represents the rate
of change in y as x changes: it is the predicted change
in y for a one-unit increase in x.
• The greater the magnitude of the slope, the steeper
the line and the greater the rate of change.
The slope is positive 5. When x increases by 1, y increases by 5. The y-intercept is 2.
The slope is negative 0.4. When x increases by 1, y decreases by 0.4. The y-intercept is 7.2.
The slope is 0. When x increases by 1, y neither increases nor decreases. The y-intercept
is -4.
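The three example lines can be checked with a short sketch. The function names predict and change_per_unit are hypothetical helpers for this illustration; the intercepts and slopes are the ones stated in the captions.

```python
def predict(intercept, slope, x):
    """Value of y on the line y = intercept + slope * x."""
    return intercept + slope * x

def change_per_unit(intercept, slope, x):
    """Change in y when x increases by 1, starting from x."""
    return predict(intercept, slope, x + 1) - predict(intercept, slope, x)

# (intercept, slope) pairs taken from the three examples.
lines = [(2, 5), (7.2, -0.4), (-4, 0)]

for intercept, slope in lines:
    print(
        "y-intercept:", predict(intercept, slope, 0),
        "| change in y per unit of x:",
        round(change_per_unit(intercept, slope, 3), 6),
    )
```

The starting point x = 3 is arbitrary: on a straight line the change per unit is the same wherever it is measured.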
Intercept
• The intercept (often labeled the constant) is the
expected mean value of Y when all X=0.
• The intercept indicates the point where the regression
line intersects the Y axis.
• The slope and the intercept define the linear
relationship between two variables, and can be used
to estimate an average rate of change.
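Beta Coefficient –
The standardized regression coefficient. This is the B value
obtained when the variables are expressed as standardized
scores. In bivariate regression it equals the correlation
coefficient and so lies between -1.0 and +1.0; in multiple
regression it usually falls in this range as well, but can
exceed it when the independent variables are highly correlated.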
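One way to see what standardization does is to compute Beta two ways: by rescaling the unstandardized slope with the standard deviations, and by regressing z-scores on z-scores. The sketch below does both for hypothetical data; the helper names are illustrative only.

```python
from statistics import mean, stdev

# Hypothetical data: classes attended (x) and percentage scored (y).
x = [10, 12, 15, 18, 20, 22, 25, 28]
y = [45, 50, 58, 62, 70, 72, 80, 88]

def slope(predictor, response):
    """Least-squares slope of response regressed on predictor."""
    mp, mr = mean(predictor), mean(response)
    sxy = sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
    sxx = sum((p - mp) ** 2 for p in predictor)
    return sxy / sxx

def z_scores(values):
    """Standardized scores: (value - mean) / standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Route 1: rescale the unstandardized coefficient B.
b = slope(x, y)
beta_rescaled = b * stdev(x) / stdev(y)

# Route 2: regress standardized y on standardized x.
beta_from_z = slope(z_scores(x), z_scores(y))

print("unstandardized B:", round(b, 4))
print("Beta (rescaled B):", round(beta_rescaled, 4))
print("Beta (z-score regression):", round(beta_from_z, 4))
```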
R Square –
The coefficient of determination is the proportion of the
variance in the dependent variable that is predictable from
the independent variable(s). R-squared is always between 0
and 100%.
• t-value - is the ratio of the departure of the estimated
value of a parameter from its hypothesized value to
its standard error. An absolute value above 1.96
indicates significance at the 5% level (two-tailed,
large samples).
• P value - or probability value or asymptotic
significance is the probability, assuming the null
hypothesis is true, of obtaining a result at least as
extreme as the one observed. A low p-value (< 0.05)
indicates that you can reject the null hypothesis.
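The t-value and p-value for a slope can be sketched as follows. The data are hypothetical, the hypothesized value of the slope is taken to be 0, and the p-value uses a normal approximation, which is an assumption made here to match the 1.96 cutoff; a statistics package would normally use the t distribution with n - 2 degrees of freedom, which gives somewhat larger p-values in small samples.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data with some scatter around the line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 7, 5, 11, 9, 14, 12, 18]
n = len(x)

# Least-squares fit of y = a + b*x.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
a = mean_y - b * mean_x

# Residual mean square and the standard error of the slope.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
se_b = sqrt(mse / sxx)

# t-value: (estimate - hypothesized value) / standard error.
hypothesized_b = 0.0
t_value = (b - hypothesized_b) / se_b

# Two-tailed p-value under the normal approximation (assumption).
p_value = 2 * (1 - NormalDist().cdf(abs(t_value)))

print("slope b =", round(b, 4), " standard error =", round(se_b, 4))
print("t-value =", round(t_value, 3))
print("reject null at 5% level:", p_value < 0.05)
```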