Understanding Regression Analysis Techniques

Regression

• Regression is a supervised learning technique in data mining that
predicts numerical values in a given data set, such as temperature or cost.
• Regression techniques are therefore widely used in business settings,
most commonly in marketing, trend analysis and financial forecasting.
• Statistically, a regression relates a dependent variable to one or more
independent (explanatory) variables.
• A regression model shows whether changes observed in the dependent
variable are associated with changes in one or more of the explanatory
variables.
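
The idea of relating a dependent variable to an explanatory variable can be sketched in a few lines of Python. This is a minimal illustration of a least-squares straight-line fit, not the method used later in these slides; the function name and the age/height data are hypothetical.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: age in years (independent) and height in cm (dependent).
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

intercept, slope = fit_simple_regression(ages, heights)
print(f"height = {intercept:.2f} + {slope:.2f} * age")
```

A positive slope here means that, in this made-up sample, greater age is associated with greater height.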

Ranjit Kaur Walia, Asst Prof., SCA, LPU


For regression analysis to be a successful method,
we need to understand the following terms:

• Dependent Variable: This is the variable that we are
trying to understand or forecast (the target variable).

• Independent Variables: These are the factors that
influence the target variable and provide us with
information about its relationship with them.



Regression Analysis

Examples:-
-Height tends to increase with age
-Purchases of expensive products tend to increase with income
This statistical method is used across different
industries, such as:
• Financial Industry- Understand trends in stock
prices, forecast prices, and evaluate risks in the
insurance domain.
• Marketing- Understand the effectiveness of marketing
campaigns, and forecast the pricing and sales of a
product.
• Manufacturing- Evaluate the relationships among the
variables that determine engine design, in order to
build a better-performing engine.
• Medicine- Forecast which combinations of medicines
can be used to prepare generic medicines for
diseases.

Regression:

• Objective: Regression is used when the target variable
is continuous or numeric. The goal is to predict a
numeric value based on input features and patterns in
the data.
• Examples: Predicting house prices, estimating the age
of a person based on other attributes, forecasting stock
prices, and predicting the temperature based on
historical data.

Let’s break down the output of summary(model)

The first part of the output shows the formula used for the linear regression model, where Price is
the dependent variable and Area and Bedrooms are the independent variables.
Residuals:
      Min        1Q    Median        3Q       Max
 -15300.6   -9400.3    -720.2    6100.7   17500.2

Residuals are the differences between the observed values and the predicted values from the
model (observed minus predicted). They give an idea of how well the model fits the data:
•Min: The most negative residual (the largest overestimation by the model).
•1Q (First Quartile): The 25th percentile of the residuals.
•Median: The median of the residuals.
•3Q (Third Quartile): The 75th percentile of the residuals.
•Max: The largest positive residual (the largest underestimation by the model).
In this example, the model overestimates some prices by as much as $15,300.60 and
underestimates others by up to $17,500.20.
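
The five-number residual summary can be reproduced by hand. The sketch below is an illustration only: the observed and predicted prices are hypothetical, and the "inclusive" quantile method is assumed here to mirror the default used by R.

```python
from statistics import quantiles

def residual_summary(observed, predicted):
    """Return Min, 1Q, Median, 3Q and Max of the residuals (observed - predicted)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    # n=4 splits the data into quartiles; three cut points are returned.
    q1, median, q3 = quantiles(residuals, n=4, method="inclusive")
    return {
        "Min": min(residuals),
        "1Q": q1,
        "Median": median,
        "3Q": q3,
        "Max": max(residuals),
    }

# Hypothetical observed and predicted house prices.
observed = [200000, 250000, 310000, 180000, 275000]
predicted = [195000, 262000, 301000, 186000, 275500]

summary = residual_summary(observed, predicted)
for name, value in summary.items():
    print(f"{name:>6}: {value:10.1f}")
```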
The coefficients table gives information on how each predictor affects the dependent
variable:
•(Intercept): The estimated price when both Area and Bedrooms are zero, which is
$11,852.34. This doesn’t have much real-life meaning but is required mathematically to build
the regression equation.
•Area: The coefficient for Area is 108.94, meaning that for every additional square foot,
the predicted house price increases by about $108.94, keeping the number of bedrooms
constant.
•Bedrooms: The coefficient for Bedrooms is 15,930.76, meaning that adding one
bedroom increases the predicted house price by around $15,930.76, keeping the area constant.
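
Putting the three coefficients together gives the regression equation Price = 11852.34 + 108.94 × Area + 15930.76 × Bedrooms. The short sketch below evaluates it; the coefficients are the ones quoted above, while the function name and the example house (1,500 square feet, 3 bedrooms) are hypothetical.

```python
# Coefficients as reported in the summary output discussed above.
INTERCEPT = 11852.34
COEF_AREA = 108.94
COEF_BEDROOMS = 15930.76

def predict_price(area_sqft, bedrooms):
    """Predicted price from the fitted equation."""
    return INTERCEPT + COEF_AREA * area_sqft + COEF_BEDROOMS * bedrooms

# Hypothetical house: 1,500 square feet with 3 bedrooms.
base = predict_price(1500, 3)
more_area = predict_price(1501, 3)   # one extra square foot, bedrooms fixed
more_beds = predict_price(1500, 4)   # one extra bedroom, area fixed

print(f"Base prediction:        {base:.2f}")
print(f"Effect of +1 sq ft:     {more_area - base:.2f}")
print(f"Effect of +1 bedroom:   {more_beds - base:.2f}")
```

Changing one input while holding the other fixed shifts the prediction by exactly that input's coefficient, which is what "keeping the other variable constant" means.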
• Standard Error:
• Std. Error measures the uncertainty in each estimated coefficient. Smaller values
indicate more precise estimates.
• t value:
• t value is the test statistic used to determine whether each coefficient is
significantly different from zero.
• It is calculated as t = Estimate / Std. Error.
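
As a small worked example of this ratio: the estimate below is the Area coefficient from the summary, but the standard error is a hypothetical value chosen only for illustration, since the slides do not quote one.

```python
def t_value(estimate, std_error):
    """Test statistic for the hypothesis that the true coefficient is zero."""
    return estimate / std_error

area_estimate = 108.94    # coefficient for Area, from the summary output
area_std_error = 12.50    # HYPOTHETICAL standard error, not from the slides

t = t_value(area_estimate, area_std_error)
print(f"t value for Area: {t:.4f}")
```

The larger the absolute t value, the further the estimate lies from zero relative to its uncertainty.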
Pr(>|t|):
•Pr(>|t|) is the p-value: the probability of seeing a t value at least this extreme if the
true coefficient were zero. A low p-value (typically < 0.05) suggests the variable is
statistically significant.
•For Area, the p-value is reported as 0.00 (too small to show at two decimal places),
meaning the area is highly significant in predicting house prices.
•For Bedrooms, p=0.048, meaning bedrooms are also significant but less strongly
than the area.
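
To see how a t value turns into a p-value, the sketch below uses the closed-form cumulative distribution of Student's t with exactly 4 degrees of freedom, matching the 4 residual degrees of freedom reported in this summary. It is an illustration for this one case, not a general-purpose routine; the function names are hypothetical.

```python
from math import sqrt

def t_cdf_df4(t):
    """CDF of Student's t distribution with exactly 4 degrees of freedom."""
    scaled = 1.0 + t * t / 4.0
    u = t * t / scaled
    return 0.5 + 0.375 * (t / sqrt(scaled)) * (1.0 - u / 12.0)

def two_sided_p_value_df4(t):
    """Pr(>|t|): probability of a value at least as extreme as t, in either tail."""
    return 2.0 * (1.0 - t_cdf_df4(abs(t)))

# 2.776 is the familiar 5% two-sided critical value for 4 degrees of freedom.
p_at_critical = two_sided_p_value_df4(2.776)
print(f"p-value at t = 2.776: {p_at_critical:.4f}")
```

With only 4 degrees of freedom, a coefficient needs a t value of roughly 2.8 or more in absolute size to reach the 0.05 threshold.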
The significance codes show the levels of significance:
***: Highly significant (p < 0.001)
**: Very significant (p < 0.01)
*: Significant (p < 0.05)
.: Marginally significant (p < 0.1)
Blank: Not significant
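
The mapping from a p-value to its code can be written as a small lookup. This is a sketch with a hypothetical function name; it uses the thresholds listed above and additionally assumes the conventional "**" level for p < 0.01.

```python
def significance_code(p_value):
    """Return the significance symbol for a p-value."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"   # assumed conventional level between *** and *
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return " "

# p-values quoted in the summary: Area (reported as 0.00) and Bedrooms (0.048).
for name, p in [("Area", 0.00), ("Bedrooms", 0.048)]:
    print(f"{name:<9} p = {p:<6} {significance_code(p)}")
```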
• The Residual Standard Error is a measure of the typical size of the
residuals. In this case, predicted house prices typically differ from the
actual prices by approximately $11,450.
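
The Residual Standard Error is the square root of the residual sum of squares divided by the residual degrees of freedom (observations minus predictors minus one). A minimal sketch follows; the seven residuals are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt

def residual_standard_error(residuals, n_predictors):
    """sqrt(RSS / df), where df = n - n_predictors - 1 (the 1 is for the intercept)."""
    df = len(residuals) - n_predictors - 1
    rss = sum(r * r for r in residuals)
    return sqrt(rss / df)

# HYPOTHETICAL residuals for 7 observations and 2 predictors (so df = 4).
residuals = [3.0, -4.0, 1.0, -2.0, 2.0, 1.0, -1.0]

rse = residual_standard_error(residuals, n_predictors=2)
print(f"Residual standard error: {rse:.2f} on 4 degrees of freedom")
```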

•R-squared (0.943): This indicates that 94.3% of the variability in house prices is explained by the
model (i.e., by the variables Area and Bedrooms).
•Adjusted R-squared (0.921): This is a modified version of R-squared that adjusts for the number
of predictors. It is lower than R-squared because it penalizes adding unnecessary variables. Here, it
still shows a very good fit (92.1%).
•The F-statistic tests the overall significance of the regression model, i.e., whether at least one
predictor has a non-zero coefficient.
•DF (Degrees of Freedom): 2 predictors and 4 residual degrees of freedom.
•p-value (0.00231): This indicates that the overall model is statistically significant in predicting
house prices, as the p-value is well below 0.05.
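
Both Adjusted R-squared and the F-statistic can be derived from R-squared, the number of observations and the number of predictors. The sketch below uses the standard formulas with hypothetical function names; the 7 observations are inferred from the 2 predictors and 4 residual degrees of freedom (4 + 2 + 1). With the quoted R-squared of 0.943 the formula gives about 0.915 rather than 0.921, so the figures on the slide are best read as illustrative.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def f_statistic(r2, n_obs, n_predictors):
    """Explained variance per predictor over unexplained variance per residual df."""
    df_resid = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1.0 - r2) / df_resid)

r2 = 0.943          # R-squared quoted in the summary
n_predictors = 2    # Area and Bedrooms
n_obs = 4 + n_predictors + 1   # inferred from 4 residual degrees of freedom

adj = adjusted_r_squared(r2, n_obs, n_predictors)
f = f_statistic(r2, n_obs, n_predictors)
print(f"Adjusted R-squared: {adj:.4f}")
print(f"F-statistic: {f:.2f} on {n_predictors} and {n_obs - n_predictors - 1} DF")
```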

Conclusion:
From this summary, we can conclude:
•Area and Bedrooms significantly contribute to predicting house prices.
•The model fits the data well, as indicated by the high R-squared value (0.943).
•The model is statistically significant overall, with a p-value of 0.00231.
NOTE:-

Interpretation of Residuals
Residuals are the differences between the observed values of the dependent variable (in this case,
Price) and the predicted values from the regression model.
•Positive residuals: Indicate that the model underestimated the actual value. For example:
•For data point 2, the residual is 38775.3, meaning the predicted price is $38,775.30 lower than
the actual price.
•For data point 1, the residual is 10521.9, indicating the model underestimated the price by
$10,521.90.
•Negative residuals: Indicate that the model overestimated the actual value. For example:
•For data point 6, the residual is -44538.0, meaning the predicted price is $44,538.00
higher than the actual price.
•For data point 3, the residual is -13597.9, indicating the model overestimated the
price by $13,597.90.
•Close-to-zero residuals: Suggest the model's predictions are accurate for that data
point. For example:
•For data point 4, the residual is -278.8, meaning the predicted price is quite close to
the actual price.

•Large residuals (like 38,775.3 or -44,538.0) suggest that the model is not fitting
those particular data points well, leading to large over- or under-predictions.
•Smaller residuals (like -278.8) suggest that the model is making fairly accurate
predictions for those data points.
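
The sign convention can be checked with a few lines of Python. This is a sketch with a hypothetical helper; the observed and predicted prices are made up so that the residuals match three of the values quoted above.

```python
def describe_residual(observed, predicted):
    """Return the residual (observed - predicted) and what its sign says about the model."""
    residual = observed - predicted
    if residual > 0:
        direction = "underestimated"
    elif residual < 0:
        direction = "overestimated"
    else:
        direction = "matched"
    return residual, direction

# HYPOTHETICAL (observed, predicted) price pairs.
examples = [
    (300000.0, 261224.7),   # residual about  38775.3
    (200000.0, 244538.0),   # residual about -44538.0
    (250000.0, 250278.8),   # residual about   -278.8
]

results = [describe_residual(obs, pred) for obs, pred in examples]
for residual, direction in results:
    print(f"residual {residual:10.1f}: the model {direction} the actual price")
```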
