PCA and Factor Analysis in R

Uploaded by

mirapalanani4

Practical 10

Principal component analysis using R

> data("iris")
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> # Partition the data: about 80% training, 20% testing
> set.seed(111)
> ind <- sample(2, nrow(iris),
+ replace = TRUE,
+ prob = c(0.8, 0.2))
> training <- iris[ind == 1, ]
> testing <- iris[ind == 2, ]
> # Scatter plot & correlations
> install.packages("psych")
> library(psych)
> pairs.panels(training[, -5],
+ gap = 0,
+ bg = c("red", "yellow", "blue")[training$Species],
+ pch = 21)
> #Principal Component Analysis
> pc <- prcomp(training[,-5],
+ center = TRUE,
+ scale. = TRUE)
> attributes(pc)
$names
[1] "sdev" "rotation" "center" "scale"
[5] "x"

$class
[1] "prcomp"

> print(pc)
Standard deviations (1, .., p=4):
[1] 1.7173318 0.9403519 0.3843232 0.1371332

Rotation (n x k) = (4 x 4):
                    PC1         PC2        PC3        PC4
Sepal.Length  0.5147163 -0.39817685  0.7242679  0.2279438
Sepal.Width  -0.2926048 -0.91328503 -0.2557463 -0.1220110
Petal.Length  0.5772530 -0.02932037 -0.1755427 -0.7969342
Petal.Width   0.5623421 -0.08065952 -0.6158040  0.5459403
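The transcript stops after printing the loadings. A base-R sketch of two natural next steps follows. First, it computes the proportion of variance explained by each component from `pc$sdev`; `summary(pc)` prints the same table. Second, it projects the held-out `testing` rows onto the components fitted on `training`. The exact rows drawn by `set.seed(111)` can differ between R versions, because the default sampler changed in R 3.6.0, so the numbers may not match the printout exactly.

```r
# Reproduce the partition from the practical (base R only)
data(iris)
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
testing  <- iris[ind == 2, ]

pc <- prcomp(training[, -5], center = TRUE, scale. = TRUE)

# Each component's variance is sdev^2. With scale. = TRUE every variable has
# variance 1, so the four component variances add up to 4.
eig      <- pc$sdev^2
prop_var <- eig / sum(eig)
cum_var  <- cumsum(prop_var)
print(round(rbind(Variance = eig, Proportion = prop_var, Cumulative = cum_var), 4))

# Project the held-out rows onto the training components. predict() reuses the
# training centre and scale stored in pc$center and pc$scale.
test_scores <- predict(pc, newdata = testing[, -5])

# The same projection by hand: standardise with training statistics, then rotate
manual <- scale(testing[, -5], center = pc$center, scale = pc$scale) %*% pc$rotation
head(round(test_scores, 3))
```

The printed standard deviations give the same breakdown by hand. PC1 explains about 1.717² / 4 ≈ 2.949 / 4 ≈ 73.7% of the variance, and PC2 explains 0.940² / 4 ≈ 22.1%. The first two components therefore keep roughly 96% of the variance of the four standardised measurements.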
Practical 11
Factor analysis using R

# Load the dataset

> data(iris)
> # View the first few rows of the dataset
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> # Scale the data
> iris_scaled <- scale(iris[,1:4])
> # Perform factor analysis
> library(psych)
> fa <- fa(r = iris_scaled,
+ nfactors = 4,
+ rotate = "varimax")
> summary(fa)

Factor analysis with Call: fa(r = iris_scaled, nfactors = 4, rotate = "varimax")

Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the model is -4 and the objective function was 0
The number of observations was 150 with Chi Square = 0 with prob < NA

The root mean square of the residuals (RMSA) is 0
The df corrected root mean square of the residuals is NA

Tucker Lewis Index of factoring reliability = 1.009

> # View the factor loadings
> fa$loadings

Loadings:
                MR1    MR2    MR3    MR4
Sepal.Length  0.997
Sepal.Width  -0.108  0.757
Petal.Length  0.861 -0.413  0.288
Petal.Width   0.801 -0.317  0.492

                 MR1   MR2   MR3   MR4
SS loadings    2.389 0.844 0.332 0.000
Proportion Var 0.597 0.211 0.083 0.000
Cumulative Var 0.597 0.808 0.891 0.891
>
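The summary above reports -4 degrees of freedom and a chi-square of 0. This is because four factors on four variables use more parameters than the correlation matrix can identify, so nothing is left to test. The sketch below shows the usual count. It swaps in base R's `factanal()` for `psych::fa()` to show the largest model that is identifiable here. Note that `factanal()` uses maximum likelihood, while `fa()` defaults to minimum residual (hence the `MR` column names), so the loadings will not match the table above.

```r
# Degrees of freedom left after fitting k common factors to p variables:
#   df = p(p - 1)/2 - p*k + k(k - 1)/2
fa_dof <- function(p, k) p * (p - 1) / 2 - p * k + k * (k - 1) / 2

p <- 4                                     # the four iris measurements
dofs <- sapply(1:p, function(k) fa_dof(p, k))
print(setNames(dofs, paste0("k=", 1:p)))   # k = 4 reproduces the -4 reported above

# factanal() stops with an error when df < 0, so a single factor is the
# largest model it will fit to these four variables
data(iris)
fit1 <- factanal(iris[, 1:4], factors = 1)
print(fit1$loadings, cutoff = 0.1)
round(fit1$uniquenesses, 3)
```

Only k = 1 leaves non-negative degrees of freedom (2). With four observed variables, a chi-square test of fit is only meaningful for a one-factor model.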
LINEAR DISCRIMINANT ANALYSIS
> library(MASS)
> library(tidyverse)
> library(caret)
> theme_set(theme_classic())
> # Load the data
> data("iris")
> # Split the data into training (80%) and test set (20%)
> set.seed(123)
> training.samples <- iris$Species %>%
+ createDataPartition(p = 0.8, list = FALSE)
> train.data <- iris[training.samples, ]
> test.data <- iris[-training.samples, ]
> # Estimate preprocessing parameters
> preproc.param <- train.data %>%
+ preProcess(method = c("center", "scale"))
> # Transform the data using the estimated parameters
> train.transformed <- preproc.param %>% predict(train.data)
> test.transformed <- preproc.param %>% predict(test.data)
> # Fit the model
> model <- lda(Species~., data = train.transformed)
> # Make predictions
> predictions <- model %>% predict(test.transformed)
> # Model accuracy
> mean(predictions$class==test.transformed$Species)
[1] 0.9666667

> model <- lda(Species~., data = train.transformed)
> model
Call:
lda(Species ~ ., data = train.transformed)

Prior probabilities of groups:
    setosa versicolor  virginica
 0.3333333  0.3333333  0.3333333

Group means:
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa       -1.0112835  0.78048647   -1.2900001  -1.2453195
versicolor    0.1014181 -0.68674658    0.2566029   0.1472614
virginica     0.9098654 -0.09373989    1.0333972   1.0980581

Coefficients of linear discriminants:
                    LD1         LD2
Sepal.Length  0.6794973  0.04463786
Sepal.Width   0.6565085 -1.00330120
Petal.Length -3.8365047  1.44176147
Petal.Width  -2.2722313 -1.96516251

Proportion of trace:
   LD1    LD2
0.9902 0.0098
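The practical relies on caret for the split and the scaling. The base-R sketch below reproduces the same pipeline with MASS only. It uses a swapped-in stratified split, not `createDataPartition()`, so the rows and the accuracy will differ from the printout. It also adds a confusion matrix and shows where the "Proportion of trace" line comes from. MASS reports `svd^2 / sum(svd^2)`, where `svd` holds the ratios of between- to within-group standard deviations on each discriminant.

```r
library(MASS)  # lda(); MASS ships with standard R installations

data(iris)
set.seed(123)

# Stratified 80/20 split: sample 80% of the row indices within each species
idx_by_species <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(lapply(idx_by_species,
                           function(i) sample(i, size = round(0.8 * length(i)))))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Centre and scale with training statistics only, then apply them to both sets
mu  <- colMeans(train[, 1:4])
sds <- apply(train[, 1:4], 2, sd)
train_std <- train
test_std  <- test
train_std[, 1:4] <- scale(train[, 1:4], center = mu, scale = sds)
test_std[, 1:4]  <- scale(test[, 1:4],  center = mu, scale = sds)

fit  <- lda(Species ~ ., data = train_std)
pred <- predict(fit, newdata = test_std)

# Confusion matrix and accuracy on the held-out rows
conf <- table(Predicted = pred$class, Actual = test_std$Species)
print(conf)
accuracy <- mean(pred$class == test_std$Species)

# "Proportion of trace": squared singular values, normalised to sum to 1
prop_trace <- fit$svd^2 / sum(fit$svd^2)
round(prop_trace, 4)
```

With three classes there are at most two discriminants. A split of 40 rows per species makes every prior exactly 1/3, as in the printout above.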

Cluster analysis

> library(factoextra)
> # Loading dataset
> df <- mtcars
> # Omitting any NA values
> df <- na.omit(df)
> # Scaling dataset
> df <- scale(df)
> # Send the plot to a PNG file
> png(file = "[Link]")
> km <- kmeans(df, centers = 4, nstart = 25)
> # Visualize the clusters
> fviz_cluster(km, data = df)
> # Close the device to save the file
> dev.off()
null device
1
> # Send the plot to a PNG file
> png(file = "[Link]")
> km <- kmeans(df, centers = 5, nstart = 25)
> # Visualize the clusters
> fviz_cluster(km, data = df)
> # Close the device to save the file
> dev.off()
null device
1
>
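The practical fits k = 4 and k = 5 without showing how k was chosen. It also sets no seed before `kmeans()`, so cluster labels can change between runs. One common heuristic is the elbow method: compute the total within-cluster sum of squares over a range of k and look for the bend. The sketch below does this in base R with an arbitrary seed; factoextra's `fviz_nbclust()` offers a plotted variant, not shown here.

```r
# Same preprocessing as the practical: drop NAs (mtcars has none) and standardise
df <- scale(na.omit(mtcars))

# kmeans() starts from random centres, so fix the seed for repeatable results
set.seed(123)
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(df, centers = k, nstart = 25)$tot.withinss)
print(round(data.frame(k = ks, tot_withinss = wss), 1))

# Elbow plot: look for the k after which adding clusters stops paying off
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Share of total variation explained by a 4-cluster solution
km4 <- kmeans(df, centers = 4, nstart = 25)
round(km4$betweenss / km4$totss, 3)
```

For k = 1 the within-cluster sum of squares equals the total, which is 31 × 11 = 341 here (32 rows, 11 standardised columns each with variance 1). This gives a quick sanity check on the scaling.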

> # Installing the package
> install.packages("dplyr")
> # Loading package
> library(dplyr)
> # Summary of the built-in mtcars dataset
> summary(mtcars)
mpg cyl disp
Min. :10.40 Min. :4.000 Min. : 71.1
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8
Median :19.20 Median :6.000 Median :196.3
Mean :20.09 Mean :6.188 Mean :230.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0
Max. :33.90 Max. :8.000 Max. :472.0
hp drat wt
Min. : 52.0 Min. :2.760 Min. :1.513
1st Qu.: 96.5 1st Qu.:3.080 1st Qu.:2.581
Median :123.0 Median :3.695 Median :3.325
Mean :146.7 Mean :3.597 Mean :3.217
3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.610
Max. :335.0 Max. :4.930 Max. :5.424
qsec vs
Min. :14.50 Min. :0.0000
1st Qu.:16.89 1st Qu.:0.0000
Median :17.71 Median :0.0000
Mean :17.85 Mean :0.4375
3rd Qu.:18.90 3rd Qu.:1.0000
Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
> # Installing the packages
> # caTools provides sample.split() for the train/test split
> install.packages("caTools")
Error in install.packages : Updating loaded packages
> install.packages("caTools")
> # ROCR provides ROC curves to evaluate the model
> install.packages("ROCR")
Error in install.packages : Updating loaded packages
> install.packages("ROCR")
> # Loading packages
> library(caTools)
> library(ROCR)
> # Splitting dataset
> split <- sample.split(mtcars, SplitRatio = 0.8)
> split
[1] TRUE TRUE TRUE TRUE FALSE TRUE FALSE
[8] TRUE TRUE TRUE FALSE
> train_reg <- subset(mtcars, split == "TRUE")
> test_reg <- subset(mtcars, split == "FALSE")
> # Training model
> logistic_model <- glm(vs ~ wt + disp,
+ data = train_reg,
+ family = "binomial")
> logistic_model

Call:  glm(formula = vs ~ wt + disp, family = "binomial", data = train_reg)

Coefficients:
(Intercept) wt disp
3.65725 0.50755 -0.02638

Degrees of Freedom: 23 Total (i.e. Null); 21 Residual

Null Deviance: 33.27
Residual Deviance: 17.64 AIC: 23.64
> # Summary
> summary(logistic_model)

Call:
glm(formula = vs ~ wt + disp, family = "binomial", data = train_reg)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.65725 3.20933 1.140 0.254
wt 0.50755 1.74052 0.292 0.771
disp -0.02638 0.01611 -1.638 0.102

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 33.271 on 23 degrees of freedom

Residual deviance: 17.638 on 21 degrees of freedom
AIC: 23.638

Number of Fisher Scoring iterations: 6

> predict_reg <- predict(logistic_model,
+ test_reg, type = "response")
> predict_reg
Hornet Sportabout Duster 360
0.01642104 0.01752145
Merc 280C Lincoln Continental
0.72757938 0.00325796
Fiat 128 Dodge Challenger
0.93690626 0.05001231
Porsche 914-2 Ford Pantera L
0.82781339 0.01812313

>
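The split above has a bug. `sample.split()` expects the outcome vector but was given the whole `mtcars` data frame, which R treats as a list of 11 columns. It therefore returned 11 flags, as the printout shows, and `subset()` recycled them over the 32 rows. The sketch below uses base R as a stand-in for `sample.split(mtcars$vs, SplitRatio = 0.8)`: it makes a row-level split stratified on `vs`, with an arbitrary seed, then adds a confusion matrix and an AUC computed from ranks. ROCR is installed above but never used; with it, the same AUC should come from `performance(prediction(p_hat, test_reg$vs), "auc")@y.values[[1]]`.

```r
data(mtcars)
length(mtcars)  # 11: the number of columns, hence 11 flags in the transcript
nrow(mtcars)    # 32 rows actually need a flag

set.seed(42)
# One flag per row, sampling 80% within each class of the outcome vs
in_train <- logical(nrow(mtcars))
for (cls in unique(mtcars$vs)) {
  rows <- which(mtcars$vs == cls)
  in_train[sample(rows, size = round(0.8 * length(rows)))] <- TRUE
}
train_reg <- mtcars[in_train, ]
test_reg  <- mtcars[!in_train, ]

fit   <- glm(vs ~ wt + disp, data = train_reg, family = binomial)
p_hat <- predict(fit, newdata = test_reg, type = "response")

# Confusion matrix at a 0.5 cut-off
table(Predicted = as.integer(p_hat > 0.5), Actual = test_reg$vs)

# AUC via the rank (Mann-Whitney) formulation: the probability that a random
# positive case scores higher than a random negative one (ties count 1/2)
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc(p_hat, test_reg$vs)
```

Stratifying on `vs` guarantees that both classes appear in the 7-row test set. With so few rows, the AUC is only a rough indication.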
> # Create the data frame
> data <- data.frame(
+ Years_Exp = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
+ Salary = c(39343.00, 46205.00, 37731.00, 43525.00,
+ 39891.00, 56642.00, 60150.00, 54445.00, 64445.00,
+ 57189.00)
+ )
> # Create the scatter plot
> plot(data$Years_Exp, data$Salary,
+ xlab = "Years Experienced",
+ ylab = "Salary",
+ main = "Scatter Plot of Years Experienced vs Salary")
> install.packages('caTools')
Error in install.packages : Updating loaded packages
> install.packages("caTools")
WARNING: Rtools is required to build R packages but is not
currently installed. Please download and install the appropriate
version of Rtools before proceeding:

[Link]
Warning in [Link] :
package ‘caTools’ is in use and will not be installed
> library(caTools)
> split = sample.split(data$Salary, SplitRatio = 0.7)
> trainingset = subset(data, split == TRUE)
> testset = subset(data, split == FALSE)
> # Fitting Simple Linear Regression to the Training set
> lm.r= lm(formula = Salary ~ Years_Exp,
+ data = trainingset)
> #Summary of the model
> summary(lm.r)

Call:
lm(formula = Salary ~ Years_Exp, data = trainingset)

Residuals:
      1       2       3       5       6       7      10
  479.5  5713.8 -4387.8 -7924.6  3129.5  5823.7 -2834.1

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    29911       5770   5.184  0.00351 **
Years_Exp       8138       2382   3.417  0.01890 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5774 on 5 degrees of freedom
Multiple R-squared:  0.7002,  Adjusted R-squared:  0.6402
F-statistic: 11.68 on 1 and 5 DF,  p-value: 0.0189

> # Create a data frame with new input values
> new_data <- data.frame(Years_Exp = c(4.0, 4.5, 5.0))
> # Predict using the linear regression model
> predicted_salaries <- predict(lm.r, newdata = new_data)
> # Display the predicted salaries
> print(predicted_salaries)
1 2 3
62464.62 66533.78 70602.94
>
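Simple linear regression has a closed form. The slope is `b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)` and the intercept is `b0 = mean(y) - b1 * mean(x)`. The base-R sketch below fits all ten rows directly and checks the formula against `lm()` and `predict()`. Because it skips the random split, its coefficients will not match the 7-row training fit above.

```r
# Data from the practical above
data <- data.frame(
  Years_Exp = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  Salary = c(39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189)
)

# Closed-form least-squares estimates for Salary = b0 + b1 * Years_Exp
x  <- data$Years_Exp
y  <- data$Salary
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                       # intercept

# The same model via lm(), fitted on all 10 rows
fit <- lm(Salary ~ Years_Exp, data = data)
print(c(b0 = b0, b1 = b1))
print(coef(fit))

# A prediction is just b0 + b1 * x_new
new_data    <- data.frame(Years_Exp = c(4.0, 4.5, 5.0))
manual_pred <- b0 + b1 * new_data$Years_Exp
lm_pred     <- predict(fit, newdata = new_data)
```

Since predictions are linear in `Years_Exp`, each 0.5-year step adds 0.5 × b1. In the transcript above, both steps (62464.62 → 66533.78 → 70602.94) equal 4069.16, which is half the fitted slope of about 8138.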

# Data on income, years of service and tax paid
# by 10 individuals working in a multinational company
Taxdata <- data.frame(
  taxpaid = c(7, 10, 8, 6, 5, 5, 9, 6, 9, 7),
  Income = c(2.5, 3, 2, 2.2, 1.8, 1.5, 1.5, 1.9, 2.3, 1.6),
  years = c(9, 13, 10, 10, 8, 8, 12, 10, 9, 5))
# Fitting the multiple regression
mulreg1 <- lm(taxpaid ~ Income + years, data = Taxdata)
mulreg1
summary(mulreg1)
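`lm()` solves the normal equations, beta = (X'X)^(-1) X'y, where the design matrix X holds a column of ones plus `Income` and `years`. The sketch below checks this by hand on the same `Taxdata` and computes R² from the residuals. It also predicts tax paid for a hypothetical employee; `Income = 2.0` and `years = 10` are made-up illustration values in the same, unstated units as the data.

```r
Taxdata <- data.frame(
  taxpaid = c(7, 10, 8, 6, 5, 5, 9, 6, 9, 7),
  Income  = c(2.5, 3, 2, 2.2, 1.8, 1.5, 1.5, 1.9, 2.3, 1.6),
  years   = c(9, 13, 10, 10, 8, 8, 12, 10, 9, 5)
)
fit <- lm(taxpaid ~ Income + years, data = Taxdata)

# Design matrix with an intercept column, then solve (X'X) beta = X'y
X    <- model.matrix(~ Income + years, data = Taxdata)
y    <- Taxdata$taxpaid
beta <- solve(crossprod(X), crossprod(X, y))
print(cbind(by_hand = as.vector(beta), lm = coef(fit)))

# R-squared = 1 - RSS / TSS
r2 <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

# Prediction for a hypothetical new employee (illustrative values only)
new_emp  <- data.frame(Income = 2.0, years = 10)
pred_new <- predict(fit, newdata = new_emp)
pred_new
```

Passing `data = Taxdata` rather than writing `Taxdata$taxpaid ~ Taxdata$Income + ...` is what allows `predict()` to accept a new data frame. With the `$` form, the model would ignore `newdata` and return the fitted values instead.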
