Linear Regression Analysis in R

Uploaded by

Sheela Mense

Unit 5:

Regression analysis is a very widely used statistical tool for establishing a relationship
model between two variables. One of these variables is called the predictor variable; its
values are gathered through experiments. The other is called the response variable; its
value is derived from the predictor variable.

In linear regression, these two variables are related through an equation in which the
exponent (power) of both variables is 1.

Mathematically, a linear relationship represents a straight line when plotted as a graph.
A non-linear relationship, where the exponent of any variable is not equal to 1, creates a
curve.

The general mathematical equation for a linear regression is −

y = ax + b

Following is the description of the parameters used −

• y is the response variable.
• x is the predictor variable.
• a and b are constants which are called the coefficients: a is the slope of the line
and b is the intercept (the value of y when x is 0).
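As a rough illustration of how a and b can be obtained from data, the sketch below computes the least-squares slope and intercept by hand. The data values and the names slope and intercept are made up for this example; least squares is the fitting criterion that lm() applies.

```r
# Made-up sample: y is roughly 2 * x + 1 with small deviations.
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 9.0, 10.8)

# Least-squares estimates for the line y = a * x + b.
slope <- cov(x, y) / var(x)               # a
intercept <- mean(y) - slope * mean(x)    # b

print(c(a = slope, b = intercept))
```

The same two numbers come out of coef(lm(y ~ x)), which is a quick way to cross-check the hand calculation.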

Steps to Establish a Regression

A simple example of regression is predicting the weight of a person when their height is
known. To do this, we need to know the relationship between the height and the weight of a
person.

The steps to create the relationship are −

• Carry out the experiment of gathering a sample of observed values of height and
corresponding weight.
• Create a relationship model using the lm() function in R.
• Find the coefficients from the model created and write the mathematical
equation using them.
• Get a summary of the relationship model to see how large the prediction errors
are. These errors are called residuals.
• To predict the weight of new persons, use the predict() function in R.

lm() Function

This function creates the relationship model between the predictor and the response
variable.
Syntax: lm(formula,data)

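Following is the description of the parameters used −

• formula is a symbolic description of the relation between x and y, written as y ~ x.
• data is the data frame that contains the variables named in the formula. It can be
omitted when, as in the examples here, x and y are vectors in the workspace.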
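To make the role of the data argument concrete, here is a minimal sketch that stores the same sample in a data frame. The names people, height, weight and model are chosen for this example and are not part of the original notes.

```r
# Hypothetical data frame; the column names are assumptions made for this sketch.
people <- data.frame(
  height = c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131),
  weight = c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
)

# The formula reads "weight explained by height";
# both names are looked up as columns of people.
model <- lm(weight ~ height, data = people)
print(coef(model))
```

Keeping the variables in one data frame avoids relying on loose vectors in the workspace, and the same column names can be reused later when new data is passed to predict().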

Create Relationship Model & get the Coefficients

# Height in cm (predictor) and weight in kg (response).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

print(relation)

When we execute the above code, it produces the following result −

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746
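The remaining steps, writing the equation from the coefficients and looking at the residuals, could be sketched as follows. The helper names coefs, slope, intercept, res and s are introduced only for this illustration.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Extract the coefficients and write out the fitted equation.
coefs <- coef(relation)
slope <- coefs[["x"]]
intercept <- coefs[["(Intercept)"]]
cat(sprintf("weight = %.4f + %.4f * height\n", intercept, slope))

# Residuals: observed weight minus the weight predicted by the line.
res <- residuals(relation)
print(round(res, 2))

# summary() collects the residual standard error, R-squared and more.
s <- summary(relation)
print(s$sigma)      # residual standard error
print(s$r.squared)  # share of the variance in y explained by x
```

Calling print(s) instead shows the full summary table, including standard errors and p-values for both coefficients.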

predict() Function

Syntax: predict(object, newdata)

• object is the model which has already been created using the lm() function.
• newdata is a data frame containing the new values of the predictor variable, in a
column with the same name as the predictor in the formula.

Predict the weight of new persons

# The predictor vector (height in cm).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)

# The response vector (weight in kg).
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

# Find the weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)

Output

       1
76.22869
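predict() also accepts several new values at once and can return an interval around each prediction. The sketch below assumes three made-up heights; new_people, point and with_interval are names chosen for this example.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# The column must be named x, matching the predictor in the formula y ~ x.
new_people <- data.frame(x = c(150, 160, 170))   # hypothetical heights in cm

# Point predictions, one per row of new_people.
point <- predict(relation, new_people)
print(point)

# Same predictions with a 95% prediction interval (columns fit, lwr, upr).
with_interval <- predict(relation, new_people, interval = "prediction", level = 0.95)
print(with_interval)
```

The interval gives a range for the weight of a single new person at each height, which is usually more informative than the point prediction alone.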

Visualize the Regression Graphically

# Create the predictor and response variables.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)

# Give the chart file a name.
png(file = "regression.png")  # placeholder file name

# Plot height (predictor) on the x-axis and weight (response) on the y-axis.
plot(x, y, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Height in cm", ylab = "Weight in Kg")

# Add the fitted regression line.
abline(relation)

# Save the file.
dev.off()

When we execute the above code, it writes a scatter plot of the observations, with the
fitted regression line drawn through them, to the PNG file.


Assumptions of Linear Regression

• Linearity: The relationship between the dependent variable (Y) and the independent
variable(s) (X) is assumed to be linear. In other words, changes in X should result in
constant, proportional changes in Y. To check this assumption, you can create
scatterplots and evaluate the linearity of the data.
• Independence: The errors (residuals) are assumed not to be correlated with each other.
This means that the error in one observation should be independent of the error in
another. You can test this assumption using the Durbin-Watson test or by
examining a plot of residuals against the order of observations.
• Homoscedasticity: The variance of the residuals is assumed to be constant across all
levels of the independent variable. In other words, the spread of residuals should be
roughly the same for all values of X. You can check this assumption by creating a plot of
residuals against the fitted values.
• Normality of Residuals: The residuals are assumed to follow a normal distribution.
A Q-Q plot and normality tests like the Shapiro-Wilk test can help assess this
assumption.

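The checks listed above can be sketched with base R as follows. This is only an illustration on the ten-observation height and weight sample, which is too small for the tests to be conclusive. The Durbin-Watson statistic is computed by hand here, which is an assumption of this sketch; a packaged test with a p-value would need an add-on package. The plots go to a temporary PDF, and out_file, res, fit, dw and sw are names chosen for this example.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
res <- residuals(relation)
fit <- fitted(relation)

# Independence: Durbin-Watson statistic, computed by hand.
# It lies between 0 and 4; values near 2 suggest little first-order autocorrelation.
dw <- sum(diff(res)^2) / sum(res^2)
print(dw)

# Normality: Shapiro-Wilk test on the residuals.
sw <- shapiro.test(res)
print(sw$p.value)

# Diagnostic plots, one per page of a temporary PDF file.
out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file)
plot(x, y, main = "Linearity: response against predictor")
plot(fit, res, main = "Homoscedasticity: residuals against fitted values")
abline(h = 0, lty = 2)
plot(seq_along(res), res, type = "b", main = "Independence: residuals in order")
abline(h = 0, lty = 2)
qqnorm(res)   # normality: points should follow the reference line
qqline(res)
invisible(dev.off())
```

For a quick look, plot(relation) on a fitted model draws a comparable set of diagnostic plots in one call.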
Plotting Regions:

Regions are areas of the graph that can be resized, relocated and formatted with color, fill
patterns, edge line types and more.

For any single plot created using base R graphics, there are 3 regions that make up the image.

1. Plot region: This is where the actual plot appears and where you will usually be
drawing the points, lines, text and so on. The plot region uses the user coordinate system,
which reflects the value and scale of the horizontal and vertical axes.
2. Figure region: This is the area that contains the space for your axes, their labels and
any titles. These spaces are also referred to as the figure margins.
3. Outer region: Also referred to as the outer margin, this is additional space around the
figure region that is not included by default but can be specified if it is needed.
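One way to see the three regions is to outline each of them with box(). The sketch below sets an outer margin first, since it is zero by default, and writes to a temporary PDF; the colours and the names out_file and usr are choices made for this example.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file)

# Reserve 2 lines of outer margin on every side so the outer region is visible.
par(oma = c(2, 2, 2, 2))
plot(1:10, main = "Plotting regions")

box(which = "plot", col = "red")          # plot region
box(which = "figure", col = "blue")       # figure region (plot plus figure margins)
box(which = "outer", col = "darkgreen")   # outer region

# Limits of the plot region in user coordinates: c(x1, x2, y1, y2).
usr <- par("usr")
invisible(dev.off())

print(usr)
```

The limits in usr are slightly wider than the data range because base R pads the axes by default.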

Ex:
# Draw a rectangle from (1,2) to (4,6)
plot(1:10, type = "n")  # Create an empty plot
rect(1, 2, 4, 6, col = "lightblue", border = "blue")

Plotting Margins:

A margin is the space between the chart border and the canvas border. You can set the
margin on any of the chart's 4 sides. There are 2 margin areas in base R plots: the margin
and the outer margin. You can control their size by calling the par() function before your
plot and giving the corresponding arguments:

• mar for the margin.
• oma for the outer margin area.

For both arguments, you must give four values giving the desired space at the bottom, left,
top and right of the chart, respectively.
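A minimal sketch of setting the margins and reading them back is shown below. The margin sizes are arbitrary, the plot goes to a temporary PDF, and old_par, mar_lines, mai_inches and out_file are names chosen for this example.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file)

# Order is bottom, left, top, right. par() returns the previous settings.
old_par <- par(mar = c(4, 4, 1, 1))
plot(1:10, xlab = "x", ylab = "y")

mar_lines <- par("mar")    # margins in lines of text
mai_inches <- par("mai")   # the same margins expressed in inches

par(old_par)               # restore the previous margins
invisible(dev.off())

print(mar_lines)
print(mai_inches)
```

Saving the return value of par() and restoring it afterwards keeps one plot's margins from leaking into the next.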

Example

par(mar=c(4,0,0,0)) sets a margin of 4 lines at the bottom of the chart and no margin on
the other three sides.

Use mai and omi instead of mar and oma if you want to set the areas in inches and not in lines.

Oma (Outer Margin): The oma argument sets the outer margins, measured in margin lines. The
outer margins are especially useful for adding text to a combination of plots.

par(oma = c(2,1,2,3))
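As an illustration of adding text to a combination of plots, the sketch below places two plots side by side and writes a shared title into the top outer margin. The layout, the title text and the names out_file and oma_now are assumptions made for this example; the output goes to a temporary PDF.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file, width = 8, height = 4)

# One row, two columns, with 2 lines of outer margin at the top only.
par(mfrow = c(1, 2), oma = c(0, 0, 2, 0))
plot(x, y, xlab = "Height in cm", ylab = "Weight in Kg", main = "Scatter plot")
hist(y, xlab = "Weight in Kg", main = "Histogram")

# outer = TRUE writes into the outer margin instead of the current figure's margin.
mtext("Height and weight sample", outer = TRUE, cex = 1.2)

oma_now <- par("oma")
invisible(dev.off())
print(oma_now)
```

Without the outer margin, the shared title would have no reserved space and would be clipped or overlap the individual plot titles.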
