Varimax Rotation

The document provides a tutorial on performing factor analysis using Python's scikit-learn library, specifically applied to a food-texture dataset. It explains the process of standardizing data, fitting a factor analysis model, and interpreting the results, including communality and uniqueness of variables. The tutorial emphasizes the importance of understanding the underlying factors and their relationships to effectively interpret the data.

Uploaded by

Sandeepan



Department of Earth Sciences /

STATISTICS AND GEODATA ANALYSIS USING PYTHON (SOGA-PY)


A simple example of Factor Analysis in Python


In this example we compute a factor analysis, employing the scikit-learn library.

We assume that our data was generated by a linear transformation of a lower dimensional data set, with an overlay of white noise. The factor analysis allows us to retrieve these
underlying factors and thus to lower the dimensionality of our data.
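The model assumption stated above can be made concrete with a small simulation. The following is only a sketch on synthetic data: the loading matrix `W`, the noise variances `psi` and all other names are made up for illustration and have nothing to do with the food-texture data. It generates observations as a linear transformation of two latent factors plus white noise and checks that their covariance is close to the sum of a factor part and a diagonal noise part.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_factors = 20_000, 2

# Hypothetical loading matrix (factors x variables); the values are made up.
W = np.array([
    [0.9, 0.8, 0.7, 0.0, 0.1],
    [0.0, 0.1, 0.2, 0.9, 0.8],
])
# Hypothetical noise variances, one per observed variable.
psi = np.array([0.19, 0.35, 0.47, 0.19, 0.35])

Z = rng.standard_normal((n_samples, n_factors))                    # latent factors
E = rng.standard_normal((n_samples, W.shape[1])) * np.sqrt(psi)    # white noise
X_sim = Z @ W + E                                                  # observed data

# Covariance implied by the model: factor part plus diagonal noise part.
implied_cov = W.T @ W + np.diag(psi)
sample_cov = np.cov(X_sim, rowvar=False)

max_abs_diff = np.abs(sample_cov - implied_cov).max()
print(f"largest deviation between sample and implied covariance: {max_abs_diff:.3f}")
```

With this many samples the deviation should be small, which is the structure a factor analysis tries to recover from real data.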

Let's import the needed tools and get going. We will give further explanation along the way.

In [1]: from pandas import read_csv, Series, DataFrame
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import numpy as np

The data set

Let us get our hands dirty and apply a factor analysis to the food-texture data set. We already discussed the data set in the section on principal component analysis, so you are
probably familiar with it.

This open source data set is available here and describes texture measurements of a pastry-type food.

In [2]: food = read_csv("[Link]", index_col=0)
food.head()

Out[2]:
        Oil  Density  Crispy  Fracture  Hardness
B110   16.5     2955      10        23        97
B136   17.7     2660      14         9       139
B171   16.2     2870      12        17       143
B192   16.7     2920      10        31        95
B225   16.3     2975      11        26       143

The data set consists of 50 rows (observations) and 5 columns (features/variables). The features are:

Oil: percentage oil in the pastry

Density: the product’s density (the higher the number, the more dense the product)
Crispy: a crispiness measurement, on a scale from 7 to 15, with 15 being more crispy.
Fracture: the angle, in degrees, through which the pastry can be slowly bent before it fractures.
Hardness: a sharp point is used to measure the amount of force required before breakage occurs.

The FactorAnalysis class of scikit-learn

The FactorAnalysis class of the scikit-learn package provides the methods needed to fit and apply a factor analysis model. When instantiating the class, we can pass it the desired number of factors.

FactorAnalysis(n_components = <factors>)
Replace <factors> with the desired number of factors.

With this class we can perform a maximum-likelihood factor analysis on a data matrix, specifying the desired number of factors. The additional argument
rotation specifies the transformation of the factors: either varimax or quartimax, two types of orthogonal rotation, or None (default) for no rotation.
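As a quick illustration of these arguments, here is a minimal sketch on synthetic stand-in data (the array `X_demo` and its generating values are made up, not the food-texture data). It fits the model once without rotation and once with `rotation="varimax"`, then inspects the fitted attributes `components_` and `noise_variance_`.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Stand-in data: 200 observations of 5 variables driven by 2 latent factors.
Z = rng.standard_normal((200, 2))
W = np.array([[0.9, 0.8, 0.7, 0.0, 0.1],
              [0.0, 0.1, 0.2, 0.9, 0.8]])
X_demo = Z @ W + 0.5 * rng.standard_normal((200, 5))

fa_plain = FactorAnalysis(n_components=2, random_state=0).fit(X_demo)
fa_varimax = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(X_demo)

# components_ holds the loadings with shape (n_components, n_features),
# noise_variance_ holds one estimated noise variance per feature.
print(fa_plain.components_.shape, fa_plain.noise_variance_.shape)

# The rotation only changes the loadings, not the estimated noise variances.
same_noise = np.allclose(fa_plain.noise_variance_, fa_varimax.noise_variance_)
print("noise variances unchanged by rotation:", same_noise)
```

The `random_state` argument is set here only to make the sketch reproducible.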

Getting a reasonable value for the number of factors is a tricky aspect of factor analysis. If we already have some understanding of the system that created our data, we
may make an educated guess about the number of latent variables.

If we do not know much about our data other than that the number of variables is not too large, we may simply try several values to initialize the model. In most cases though, and to
do our due diligence, we should use a more sophisticated approach: perform a principal component analysis (PCA) first to get a good initial estimate of the number of factors.
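One common way to turn a PCA into such an initial estimate is to look at the eigenvalues of the correlation matrix and count those larger than 1 (often called the Kaiser criterion). The sketch below assumes this heuristic, which is our choice for illustration and not prescribed by this tutorial, and applies it to synthetic stand-in data with two built-in factors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: 500 observations, 5 variables, 2 latent factors (values made up).
Z = rng.standard_normal((500, 2))
W = np.array([[0.9, 0.9, 0.9, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.9, 0.9]])
X_demo = Z @ W + np.sqrt(0.19) * rng.standard_normal((500, 5))

# PCA on standardized data amounts to an eigendecomposition of the correlation matrix.
corr = np.corrcoef(X_demo, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]      # sorted from largest to smallest
explained = eigenvalues / eigenvalues.sum()       # share of variance per component

# Heuristic (assumption): keep as many factors as there are eigenvalues above 1.
n_factors_guess = int((eigenvalues > 1.0).sum())

print("eigenvalues:", eigenvalues.round(2))
print("cumulative explained variance:", explained.cumsum().round(2))
print("suggested number of factors:", n_factors_guess)
```

Such a rule of thumb only gives a starting point; the residual matrix of the fitted factor model remains the better check.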

Since we have already explained the PCA, we will not repeat it here and simply make a guess, setting the number of factors to factors = 2. Furthermore, we try the analysis with
the rotation set to varimax and with the default value rotation = None.

Let's compute this and plot it using matplotlib.

In [3]: X = StandardScaler().fit_transform(food)  # Standardize the data

factors = 2
# a list of two tuples, each holding a title and an instance of our class
fas = [
    ("FA no rotation", FactorAnalysis(n_components=factors)),
    ("FA varimax", FactorAnalysis(n_components=factors, rotation="varimax")),
]

# Let's prepare some plots on one canvas (subplots)
fig, axes = plt.subplots(ncols=len(fas), figsize=(10, 8))

# and loop over the variants of our analysis `fas`, zipped with the plot axes `axes`
for ax, (title, fa) in zip(axes, fas):
    # Fit the model to the standardized food data
    fa = fa.fit(X)
    # and transpose the component (loading) matrix
    factor_matrix = fa.components_.T
    # Plot the data as a heat map
    im = ax.imshow(factor_matrix, cmap="RdBu_r", vmax=1, vmin=-1)
    # and add the corresponding value to the center of each cell
    for (i, j), z in np.ndenumerate(factor_matrix):
        ax.text(j, i, str(z.round(2)), ha="center", va="center")
    # Tell matplotlib about the metadata of the plot
    ax.set_yticks(np.arange(len(food.columns)))
    if ax.get_subplotspec().is_first_col():
        ax.set_yticklabels(food.columns)
    else:
        ax.set_yticklabels([])
    ax.set_title(title)
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["Factor 1", "Factor 2"])

# and squeeze the axes tight, to save space
plt.tight_layout()

# and add a colorbar
cb = fig.colorbar(im, ax=axes, location="right", label="loadings")
# show us the plot
plt.show()
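The standardization step at the top of the cell above can be reproduced by hand. The following sketch uses only the five rows printed in Out[2] (not the full data set of 50 rows) and plain NumPy; the variable names are our own. It also shows why standardizing matters here: for standardized data the matrix X^T X / n is the correlation matrix, which is what the factor model is later compared against.

```python
import numpy as np

# The five rows printed in Out[2]: Oil, Density, Crispy, Fracture, Hardness.
rows = np.array([
    [16.5, 2955, 10, 23,  97],
    [17.7, 2660, 14,  9, 139],
    [16.2, 2870, 12, 17, 143],
    [16.7, 2920, 10, 31,  95],
    [16.3, 2975, 11, 26, 143],
], dtype=float)

# StandardScaler centres each column and divides it by its population
# standard deviation (ddof=0); this is the same computation done manually.
X_head = (rows - rows.mean(axis=0)) / rows.std(axis=0, ddof=0)

# For standardized columns, X^T X / n equals the correlation matrix.
corr_by_hand = X_head.T @ X_head / len(X_head)

print(X_head.mean(axis=0).round(12))   # column means are (numerically) zero
print(X_head.std(axis=0).round(12))    # column standard deviations are one
```

Without this step, Density (values in the thousands) would dominate the other variables purely because of its units.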

Interpretation of the results

Before we interpret the results of the factor analysis, let us recall the basic idea behind it. Factor analysis models each observed variable as a linear combination of a few latent factors, which capture
the variability the variables have in common. This reduces the number of dimensions in the data set, while preserving most of the variance.

This allows us to aggregate a large number of observable variables in a model to represent an underlying concept, making it easier to comprehend the data.

The variability in a data set $X$ is given by $\Sigma$, and its estimate $\hat{\Sigma}$ is composed of the variability explained by a linear combination of the factors, which we call communality, and
of the remaining variability that cannot be explained by a linear combination of the factors, namely the uniqueness.

$$\hat{\Sigma} = \underbrace{\hat{\Lambda}\hat{\Lambda}^T}_{\text{communality}} + \underbrace{\hat{\Psi}}_{\text{uniqueness}}$$

Now let's have a look at our results:

In the plot above we can see the loadings, which range from −1 to 1. This part is represented by $\hat{\Lambda}$ in the equation above. The loadings are the contribution of each original
variable to the factor. Variables with loading values further away from 0 have a larger part of their variability explained by that factor.
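One way to see why the loadings stay between −1 and 1: for standardized variables and uncorrelated, unit-variance factors, a loading equals the correlation between a variable and a factor. The sketch below checks this on simulated data; the loading matrix `W` and all names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical loadings (factors x variables).
W = np.array([[0.9, 0.8, 0.7, 0.0, 0.1],
              [0.0, 0.1, 0.2, 0.9, 0.8]])
# Noise variances chosen so that every simulated variable has unit variance.
psi = 1.0 - (W**2).sum(axis=0)

Z = rng.standard_normal((n, 2))                                   # latent factors
X_sim = Z @ W + rng.standard_normal((n, 5)) * np.sqrt(psi)        # observed variables

# Correlation between every factor (rows) and every variable (columns).
full_corr = np.corrcoef(np.hstack([Z, X_sim]), rowvar=False)
factor_variable_corr = full_corr[:2, 2:]

max_abs_diff = np.abs(factor_variable_corr - W).max()
print(factor_variable_corr.round(2))
print(f"largest deviation from the true loadings: {max_abs_diff:.3f}")
```

In a real analysis the factors are not observed, so this check is only possible in a simulation.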

Now we will check the uniqueness of each variable. Note that we only use the varimax rotation method going forward.

The next plot is generated straight from a pandas Series, using the .plot() method. This provides easy access to quick plots.

In [4]: fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(X)
uniqueness = Series(fa.noise_variance_, index=food.columns)
uniqueness.plot(
    kind="bar",
    ylabel="Uniqueness"
)

Out[4]: <AxesSubplot:ylabel='Uniqueness'>

The uniqueness, sometimes also referred to as (white) noise, ranges from 0 to 1 for standardized variables and corresponds to the proportion of variability that cannot be explained by a linear
combination of the factors. This part is represented by $\hat{\Psi}$ in the equation above. A high uniqueness for a variable indicates that the factors do not account well for its variance.

The counterpart of the uniqueness is the communality:

In [5]: # Communality
communality = Series(np.square(fa.components_.T).sum(axis=1), index=food.columns)
communality.plot(
    kind="bar",
    ylabel="communality"
)

Out[5]: <AxesSubplot:ylabel='communality'>

By squaring the loadings and summing them over the factors, we can compute the fraction of each variable's total variance that is explained by the factors (Teetor 2011). This proportion of the variability is
denoted as communality.

Once the communality has been computed, one way to obtain the uniqueness is to subtract it from 1. An appropriate factor model results in low values for
uniqueness and high values for communality. So if we see bad results for our model, we could try a different number of underlying factors (latent variables).

In [6]: #and back to uniqueness

(1 - communality).plot(kind="bar", ylabel="uniqueness")

Out[6]: <AxesSubplot:ylabel='uniqueness'>
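The arithmetic behind the last two cells can be followed by hand on a tiny example. The loadings below are hypothetical numbers chosen for easy mental calculation; they are not the values fitted to the food-texture data.

```python
import numpy as np

# Hypothetical loadings of three standardized variables on two factors.
loadings = np.array([
    [0.9, 0.1],   # mostly explained by factor 1
    [0.2, 0.8],   # mostly explained by factor 2
    [0.5, 0.5],   # only half of its variance is explained
])

# Communality: sum of squared loadings per variable (row).
communality_demo = (loadings**2).sum(axis=1)   # 0.81 + 0.01, 0.04 + 0.64, 0.25 + 0.25
# Uniqueness: the remainder of the unit variance.
uniqueness_demo = 1.0 - communality_demo

for c, u in zip(communality_demo, uniqueness_demo):
    print(f"communality = {c:.2f}, uniqueness = {u:.2f}")
```

The third variable illustrates a case in which the two factors do not account well for the variance.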

Recall the factor analysis model: $\hat{\Sigma} = \hat{\Lambda}\hat{\Lambda}^T + \hat{\Psi}$

Using our factor model fa we may calculate $\hat{\Sigma}$ and compare it to the observed correlation matrix, $S$, by simple matrix algebra.

We use numpy to perform fast and efficient math operations. Note: We can also use pandas, since it "wraps" around numpy, basically just forwarding our commands to the library.

In [8]: # `lambda` is a reserved keyword in Python, so we append an underscore
lambda_ = fa.components_
psi = np.diag(uniqueness)
s = np.corrcoef(np.transpose(X))
sigma = np.matmul(lambda_.T, lambda_) + psi
residuals = s - sigma

We subtracted the fitted correlation matrix $\hat{\Sigma}$ ( sigma ) from the observed correlation matrix $S$. The resulting matrix is called the residual matrix. Numbers close to 0 indicate
that our factor model is a good representation of the underlying system. Now let's plot the results.

In [9]: ax = plt.axes()
im = ax.imshow(residuals, cmap="RdBu_r", vmin=-1, vmax=1)
ax.tick_params(axis="x", bottom=False, labelbottom=False, top=True, labeltop=True)
ax.set_xticks(range(5))
ax.set_xticklabels(food.columns)
ax.set_yticks(range(5))
ax.set_yticklabels(food.columns)
for (i, j), z in np.ndenumerate(residuals):
    ax.text(j, i, str(z.round(3)), ha="center", va="center")

plt.colorbar(im, ax=ax, location="right")

ax.set_title("FA residual matrix")
plt.tight_layout()
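Besides looking at the heat map, the residual matrix can be condensed into a single number. The sketch below uses the root mean square of the off-diagonal residuals, a summary we assume here for illustration (it is not computed in this tutorial), on a small hypothetical one-factor example so that every number can be checked by hand.

```python
import numpy as np

# Hypothetical one-factor model for three standardized variables.
lam = np.array([0.8, 0.7, 0.6])
sigma_hat = np.outer(lam, lam) + np.diag(1.0 - lam**2)   # fitted correlation matrix

# Hypothetical observed correlation matrix: the fitted one plus small deviations.
s_obs = sigma_hat.copy()
for (i, j), delta in {(0, 1): 0.02, (0, 2): -0.01, (1, 2): 0.03}.items():
    s_obs[i, j] += delta
    s_obs[j, i] += delta   # keep the matrix symmetric

residuals_demo = s_obs - sigma_hat

# Root mean square of the off-diagonal residuals; the diagonal is excluded.
off_diagonal = residuals_demo[~np.eye(3, dtype=bool)]
rmsr = np.sqrt(np.mean(off_diagonal**2))
print(f"root mean square residual: {rmsr:.4f}")
```

The same two lines for `off_diagonal` and `rmsr` could be applied to the `residuals` array from the cell above, replacing 3 by the number of variables.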

Interpretation of the factors

The purpose of a rotation is to produce more extreme loadings, that is, loadings either close to 0 or far away from it. The idea behind this is to give meaning to the factors, which can help with their interpretation. From a
mathematical viewpoint, a rotated and an unrotated solution are equivalent: the fitted model is the same, the uniquenesses are the same, and the proportion of
variance explained is the same.
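The equivalence of rotated and unrotated solutions can be checked numerically. Below is a minimal NumPy sketch of one widely used SVD-based varimax iteration; it is written for illustration and is not necessarily the routine scikit-learn runs internally. The function name `varimax` and the loading values are our own assumptions.

```python
import numpy as np

def varimax(loadings, tol=1e-6, max_iter=100):
    """Rotate a (variables x factors) loading matrix by an orthogonal matrix R."""
    p, k = loadings.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Matrix whose polar factor gives the next rotation estimate.
        G = loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt                      # orthogonal by construction
        current = s.sum()
        if previous != 0.0 and current / previous < 1.0 + tol:
            break                       # no relevant improvement any more
        previous = current
    return loadings @ R, R

# Hypothetical loadings: a simple structure deliberately turned by 30 degrees.
simple = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2], [0.0, 0.9], [0.1, 0.8]])
theta = np.deg2rad(30)
turn = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ turn

rotated, R = varimax(unrotated)

# An orthogonal rotation leaves the factor part of the model and the communalities unchanged.
same_fit = np.allclose(unrotated @ unrotated.T, rotated @ rotated.T)
same_communality = np.allclose((unrotated**2).sum(axis=1), (rotated**2).sum(axis=1))
print(rotated.round(2))
print("same fitted model:", same_fit, "| same communalities:", same_communality)
```

Because R is orthogonal, the product of the loading matrix with its transpose cannot change, which is why only the interpretation of the factors differs between the variants.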

Here we fit three factor models: one with no rotation, one with varimax rotation, and one with quartimax rotation. We then make a scatter plot of the first and second
loadings.

In [10]: methods = [
    ("FA No rotation", FactorAnalysis(2)),
    ("FA Varimax", FactorAnalysis(2, rotation="varimax")),
    ("FA Quartimax", FactorAnalysis(2, rotation="quartimax")),
]
fig, axes = plt.subplots(ncols=3, figsize=(10, 8), sharex=True, sharey=True)

for ax, (method, fa) in zip(axes, methods):
    fa = fa.fit(X)
    components = fa.components_

    # x-axis: loadings on the first factor, y-axis: loadings on the second factor
    ax.scatter(components[0, :], components[1, :])
    # draw both coordinate axes through the origin
    ax.axhline(0, color="k")
    ax.axvline(0, color="k")
    # label every point with the name of its variable
    for i, j, z in zip(components[0, :], components[1, :], food.columns):
        ax.text(i + .02, j + .02, str(z), ha="center")
    ax.set_title(str(method))
    if ax.get_subplotspec().is_first_col():
        ax.set_ylabel("Factor 2")
    ax.set_xlabel("Factor 1")

plt.tight_layout()
plt.show()

What does this mean?

How can I interpret these factors? If two variables have loadings further away from 0 for the same factor, we know they are related. We want to understand the data in order to
give meaningful names to the latent variables.

Taking a look at the plot in the middle (FA Varimax) above, it appears that Factor 1 describes a variable that makes pastry soft and less crisp. This description fits flaky pastry
rather well.

The loadings on Factor 2, in contrast, show high density and little oil. This would fit the classification of hot water crust pastry.
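Reading off which variables belong to which factor can also be done programmatically. The sketch below groups variables by factor whenever the absolute loading exceeds a cut-off. The cut-off of 0.5, the placeholder variable names and the loading values are all hypothetical; they are not the results obtained for the food-texture data.

```python
import numpy as np

variables = ["v1", "v2", "v3", "v4", "v5"]   # placeholder variable names
loadings = np.array([                        # made-up rotated loadings (variables x factors)
    [0.85, 0.10],
    [-0.80, 0.05],
    [0.15, 0.90],
    [0.05, -0.75],
    [0.30, 0.20],
])
threshold = 0.5   # hypothetical cut-off for a "relevant" loading

# For every factor, list the variables that load strongly on it, with the sign.
groups = {}
for j in range(loadings.shape[1]):
    groups[f"Factor {j + 1}"] = [
        (name, "+" if loadings[i, j] > 0 else "-")
        for i, name in enumerate(variables)
        if abs(loadings[i, j]) >= threshold
    ]

for factor, members in groups.items():
    print(factor, members)
```

Variables with opposite signs on the same factor, such as v1 and v2 here, are related but move in opposite directions; a variable like v5 that appears in no group is poorly described by either factor.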

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at
soga[at][Link].

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.
