LOGISTIC REGRESSION
PROBLEM:
Write an R program to perform a logistic regression on the mtcars dataset that predicts am (transmission type: 0 = automatic, 1 = manual) from the cyl, hp and wt columns.
AIM:
To fit a logistic regression model that predicts am from cyl, hp and wt
CODE & OUTPUT:
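# Keep only the response (am) and the three predictors
model=mtcars[c("am","cyl","hp","wt")]
# Binomial GLM with the default logit link: models log(p/(1-p)), where p = P(am = 1)
am_data=glm(formula = am ~ cyl + hp + wt,data = model, family=binomial)
summary(am_data)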
Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = model)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288 8.11637 2.428 0.0152 *
cyl 0.48760 1.07162 0.455 0.6491
hp 0.03259 0.01886 1.728 0.0840 .
wt -9.14947 4.15332 -2.203 0.0276 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.2297 on 31 degrees of freedom
Residual deviance: 9.8415 on 28 degrees of freedom
AIC: 17.841
Number of Fisher Scoring iterations: 8
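RESULT:
The logistic regression model was fitted and its summary obtained (AIC = 17.841). At the 5% level only wt is significant (p = 0.0276); hp is borderline (p = 0.0840) and cyl is not significant (p = 0.6491). The fitted model for p = P(am = 1) is
log(p/(1-p)) = 19.70288 + 0.48760 cyl + 0.03259 hp - 9.14947 wt
Deviance residuals:
Min 1Q Median 3Q Max
-2.17272 -0.14907 -0.01464 0.14116 1.27641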
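The coefficients in the summary are on the log-odds scale. A minimal sketch for reading them on more familiar scales, assuming only base R: exponentiating gives odds ratios, predict(type = "response") gives fitted probabilities, and plogis() applied to the linear predictor reproduces one of those probabilities by hand. The car chosen (Mazda RX4) and the names odds_ratio, p_hat, x_car, p_manual and train_acc are illustrative choices, not part of the exercise.

```r
# Refit the model from above so this block runs on its own
# (mtcars is in the datasets package, which R attaches by default)
am_data = glm(am ~ cyl + hp + wt, data = mtcars, family = binomial)

# Coefficients are changes in log-odds; exp() turns them into odds ratios
odds_ratio = exp(coef(am_data))
print(round(odds_ratio, 4))

# Fitted probability of a manual transmission, P(am = 1), for every car
p_hat = predict(am_data, type = "response")

# The same probability for one car, by hand: plogis(b0 + b1*cyl + b2*hp + b3*wt)
x_car = c(1, unlist(mtcars["Mazda RX4", c("cyl", "hp", "wt")]))
p_manual = plogis(sum(coef(am_data) * x_car))
print(c(by_hand = p_manual, predict = unname(p_hat["Mazda RX4"])))

# Share of cars classified correctly with a 0.5 cut-off (in-sample, so optimistic)
train_acc = mean((p_hat > 0.5) == mtcars$am)
print(train_acc)
```

For example, exp(-9.14947) is about 0.0001, so each extra 1000 lb of weight multiplies the odds of a manual transmission by roughly 0.0001 when cyl and hp are held fixed. The in-sample accuracy is measured on the same cars used for fitting, so it tends to look better than accuracy on new data.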
POLYNOMIAL REGRESSION
PROBLEM:
Fit a polynomial regression model for the data given below.
x 1 2 3 4 5 6 7 8 9 10
y 6 8 12 14 18 20 22 24 26 28
AIM:
To fit linear, quadratic and cubic polynomial regression models to the given data and compare them
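Written out, the three models compared below are nested: each adds one higher power of x to the previous one (ε is the error term). They are still linear in the coefficients β, which is why lm() can fit them once x² and x³ are supplied as extra columns (xsq and xcub in the code).

```latex
\begin{aligned}
\text{fit1:}\quad y &= \beta_0 + \beta_1 x + \varepsilon \\
\text{fit2:}\quad y &= \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon \\
\text{fit3:}\quad y &= \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon
\end{aligned}
```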
CODE & OUTPUT:
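x=c(1,2,3,4,5,6,7,8,9,10)
y=c(6,8,12,14,18,20,22,24,26,28)
# Powers of x used as extra predictors
xsq=x^2
xcub=x^3
plot(x,y)
# Degree 1: straight line (red)
fit1=lm(y~x)
abline(fit1, col='red')
# Degree 2: quadratic, drawn by predicting on a fine grid of x values (blue)
fit2=lm(y~x+xsq)
xv=seq(min(x),max(x),0.01)
yv=predict(fit2, list(x=xv,xsq=xv^2))
lines(xv,yv,col='blue')
# Degree 3: cubic, drawn the same way (green)
fit3=lm(y~x+xsq+xcub)
xx=seq(min(x),max(x),0.01)
yy=predict(fit3, list(x=xx, xsq=xx^2, xcub=xx^3))
lines(xx,yy,col='green')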
summary(fit1)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-1.10303 -0.58788 -0.04242 0.45758 1.44242
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.13333 0.60168 6.87 0.000128 ***
x 2.48485 0.09697 25.62 5.77e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8808 on 8 degrees of freedom
Multiple R-squared: 0.988, Adjusted R-squared: 0.9865
F-statistic: 656.6 on 1 and 8 DF, p-value: 5.767e-09
summary(fit2)
Call:
lm(formula = y ~ x + xsq)
Residuals:
Min 1Q Median 3Q Max
-0.73939 -0.17879 0.01818 0.23030 0.71515
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.13333 0.60341 3.535 0.00953 **
x 3.48485 0.25201 13.828 2.44e-06 ***
xsq -0.09091 0.02233 -4.072 0.00474 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.513 on 7 degrees of freedom
Multiple R-squared: 0.9964, Adjusted R-squared: 0.9954
F-statistic: 976 on 2 and 7 DF, p-value: 2.728e-09
summary(fit3)
Call:
lm(formula = y ~ x + xsq + xcub)
Residuals:
Min 1Q Median 3Q Max
-0.70676 -0.24814 0.02867 0.28566 0.74312
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.466667 1.061775 2.323 0.05919 .
x 3.189200 0.795851 4.007 0.00706 **
xsq -0.026807 0.164157 -0.163 0.87565
xcub -0.003885 0.009844 -0.395 0.70673
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5471 on 6 degrees of freedom
Multiple R-squared: 0.9965, Adjusted R-squared: 0.9948
F-statistic: 572.2 on 3 and 6 DF, p-value: 9.23e-08
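Adjusted R² penalises each extra term, which is why it can fall when a term is added even though plain R² rises. With n = 10 observations and k predictors, and using the rounded multiple R² values printed above:

```latex
\begin{aligned}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n-1}{n-k-1} \\
\text{linear } (k=1):\quad \bar{R}^2 &= 1 - (1 - 0.988)\cdot\tfrac{9}{8} = 0.9865 \\
\text{quadratic } (k=2):\quad \bar{R}^2 &= 1 - (1 - 0.9964)\cdot\tfrac{9}{7} \approx 0.9954 \\
\text{cubic } (k=3):\quad \bar{R}^2 &= 1 - (1 - 0.9965)\cdot\tfrac{9}{6} \approx 0.9948
\end{aligned}
```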
RESULT:
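The polynomial regression plots and summaries were obtained. The quadratic model has the highest adjusted R-squared (0.9954, against 0.9865 for the linear and 0.9948 for the cubic model). The cubic model has a marginally higher multiple R-squared (0.9965), but its xsq and xcub terms are not significant (p = 0.876 and 0.707), so the quadratic model is the preferred fit.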
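The comparison can also be made programmatically. A minimal sketch, assuming base R only: it swaps the hand-built xsq/xcub columns for poly(x, d, raw = TRUE), which generates the same powers of x, collects adjusted R² from summary(), and runs anova() F-tests between the nested fits. The names fits, adj_r2 and best are illustrative.

```r
x = 1:10
y = c(6, 8, 12, 14, 18, 20, 22, 24, 26, 28)

# poly(x, d, raw = TRUE) builds the columns x, x^2, ..., x^d (same as xsq/xcub above)
fits = lapply(1:3, function(d) lm(y ~ poly(x, d, raw = TRUE)))
names(fits) = c("linear", "quadratic", "cubic")

# Adjusted R-squared for each degree, and the degree that maximises it
adj_r2 = sapply(fits, function(f) summary(f)$adj.r.squared)
print(round(adj_r2, 4))
best = names(which.max(adj_r2))

# F-tests between nested models: does each added power reduce the RSS significantly?
print(anova(fits$linear, fits$quadratic, fits$cubic))

# Smooth curve for the chosen model, predicted on a fine grid of x values
xv = seq(min(x), max(x), by = 0.01)
yv = predict(fits[[best]], newdata = data.frame(x = xv))
```

The anova() rows test whether adding x² to the line, and then x³ to the quadratic, lowers the residual sum of squares by more than chance would. With these data the x² step comes out significant at the 5% level and the x³ step does not; its p-value matches the xcub t-test in summary(fit3) (0.707).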
LOGISTIC REGRESSION WITH PREDICTION
PROBLEM:
R has many built-in datasets, and mtcars is one of them; it is available directly as the data frame mtcars. Write a few lines of R code to conduct a logistic regression analysis with am as the response variable and cyl, hp and wt as explanatory variables.
AIM:
To fit a simple logistic regression of am on wt using a training/test split, predict am for the test cars and measure the prediction accuracy
CODE & OUTPUT:
#Loading package
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Summary statistics of the built-in mtcars dataset
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
#Loading package
library(caTools)
## Warning: package 'caTools' was built under R version 4.3.3
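# Splitting dataset into training and test sets (no seed is set, so the split varies between runs)
# sample.split() expects a vector: given the whole data frame it draws an 11-element TRUE/FALSE
# pattern (one entry per column) that subset() recycles over the 32 rows, which gave 23 training
# and 9 test rows in this run. sample.split(mtcars$am, SplitRatio = 0.8) is the usual, stratified call.
split= sample.split(mtcars, SplitRatio = 0.8)
train_reg=subset(mtcars, split=="TRUE")
test_reg=subset(mtcars, split =="FALSE")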
# Training model (only wt is used as a predictor here)
logistic_model=glm(am~wt, data=train_reg, family="binomial")
logistic_model
##
## Call: glm(formula = am ~ wt, family = "binomial", data = train_reg)
##
## Coefficients:
## (Intercept) wt
## 10.761 -3.659
##
## Degrees of Freedom: 22 Total (i.e. Null); 21 Residual
## Null Deviance: 30.79
## Residual Deviance: 15.9 AIC: 19.9
summary(logistic_model)
##
## Call:
## glm(formula = am ~ wt, family = "binomial", data = train_reg)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 10.761 4.594 2.342 0.0192 *
## wt -3.659 1.491 -2.454 0.0141 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 30.789 on 22 degrees of freedom
## Residual deviance: 15.904 on 21 degrees of freedom
## AIC: 19.904
##
## Number of Fisher Scoring iterations: 6
# Predicted probability P(am = 1) for each test car
predict_reg=predict(logistic_model,test_reg, type = "response")
predict_reg
## Hornet Sportabout Valiant Duster 360 Lincoln Continental
## 0.1387011443 0.1301875975 0.0909742221 0.0001132627
## Chrysler Imperial Fiat 128 Porsche 914-2 Lotus Europa
## 0.0001512203 0.9376790201 0.9493410856 0.9946478004
## Ford Pantera L
## 0.3019193256
# Evaluating model accuracy using a confusion matrix
# (rows: actual am; columns: predicted probability > 0.5, i.e. predicted manual)
table(test_reg$am, predict_reg>0.5)
##
## FALSE TRUE
## 0 5 0
## 1 1 3
# Accuracy = (true negatives + true positives) / all test cars
Accuracy=(5+3)/(5+3+0+1)
Accuracy
## [1] 0.8888889
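The accuracy above is typed in by hand from the confusion-matrix counts. A minimal sketch that computes it directly instead, assuming base R only: to make the run reproducible it rebuilds this run's split from the nine cars listed in the prediction output, rather than calling sample.split() again. The names test_cars, is_test, conf_mat and accuracy are illustrative. Fixing the factor levels keeps the table 2 × 2 even if one class is never predicted.

```r
# The nine test cars shown in the prediction output above
test_cars = c("Hornet Sportabout", "Valiant", "Duster 360", "Lincoln Continental",
              "Chrysler Imperial", "Fiat 128", "Porsche 914-2", "Lotus Europa",
              "Ford Pantera L")
is_test = rownames(mtcars) %in% test_cars
train_reg = mtcars[!is_test, ]
test_reg = mtcars[is_test, ]

# Same model as above: logistic regression of am on wt
logistic_model = glm(am ~ wt, data = train_reg, family = binomial)
predict_reg = predict(logistic_model, test_reg, type = "response")

# Classify with a 0.5 cut-off; fixed levels keep a 2 x 2 table
predicted = factor(as.integer(predict_reg > 0.5), levels = c(0, 1))
actual = factor(test_reg$am, levels = c(0, 1))
conf_mat = table(actual, predicted)
print(conf_mat)

# Accuracy = correct predictions (the diagonal) / all predictions
accuracy = sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```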
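RESULT:
The model built to predict am from wt correctly classifies 8 of the 9 test cars (accuracy 88.9%); the only error is the Ford Pantera L, a manual car predicted as automatic. The fitted model is log(p/(1-p)) = 10.761 - 3.659 wt, where p = P(am = 1); it gives the log-odds of a manual transmission, not am itself.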
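As a worked check of the fitted equation (wt in units of 1000 lb), the Ford Pantera L has wt = 3.170 in mtcars, which reproduces its predicted probability of about 0.302 from the output above. Setting the log-odds to 0 (p = 0.5) gives the weight at which the predicted class switches from manual to automatic.

```latex
\begin{aligned}
\log\frac{p}{1-p} &= 10.761 - 3.659\,\mathrm{wt}, \qquad p = P(\mathrm{am} = 1) \\
\text{Ford Pantera L } (\mathrm{wt} = 3.170):\quad \log\frac{p}{1-p} &= 10.761 - 3.659 \times 3.170 = -0.838 \\
p &= \frac{1}{1 + e^{0.838}} \approx 0.302 \\
p > 0.5 \iff \mathrm{wt} &< \frac{10.761}{3.659} \approx 2.941
\end{aligned}
```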