Mathematics of Linear Elastic FEM
Mathematics of Linear Elastic FEM
1 Introduction
The Finite Element Method (FEM) is about constructing functions that approx-
imate physics using linear algebra. As Mechanical engineering is the engineering
of physical processes, the same way chemical engineering is the engineering of
chemical processes, being able to express physics mathematically is a basic yet
core skill for mechanical engineers.
This paper explores the background and foundational mathematics required
to understand and critically interpret FEM by recapping how to express, solve
and interpret
1
2 A coordinate system, a function, and an engi-
neer walk into a Space bar...
The art of problem-solving is the
art of finding the right function.
George Pólya
2.1 Space
To begin, we identify a space, such as the real number line (R), the Cartesian
plane (R2 ), or three-dimensional space (R3 ), and establish a suitable coordinate
system.
2
2.2.2 2D Coordinate System
The middle panel shows a 2D coordinate system known as the Cartesian plane
typically labelled x, y or x1 , x2 . It consists of two perpendicular axes:
1D: x = x or x = x1
3
This concept can be extended to higher dimensions, though they cannot be
easily visualised. An n-dimensional point would be represented as:
Let x = (x1 , . . . , xn ) ∈ Rn denote a point in n-dimensional Euclidean space,
where each xi represents the coordinate along the i-th orthogonal axis.
4
2.3 Functions
Functions emerge as mathematical constructs that express relationships between
points in a space. For example, a function f : R → R defined by y = f (x) de-
scribes a relationship where each x-coordinate in the domain is associated with
exactly one y-coordinate in the codomain. Graphically, this can be represented
in the Cartesian plane, where each point [x, f (x)] satisfies the functional rela-
tionship. Hence, a function f is an equation that assigns to each element in a
set A exactly one element in a set B. We write this as:
f :A→B
A is called the domain and B is called the codomain.
f :R→R
Example: f (x) = x2
Scalar-valued: f (x, y) = x2 + y 2
T
Vector-valued: f (x, y) = x2 , y 2 , xy
5
where T is measured in degrees Celsius, and x and y are spatial coordinates in
meters. The colour gradient and height of the surface represent the temperature
value at each point.
The top leftmost subfigure in Figure 3 depicts a temperature field with the
height and colours indicating the temperature.
Pressure
The pressure function P (x, t) shows how pressure varies with position and
time:
6
2.4.4 Vector-valued Functions
Vector functions return multiple values (typically 3 for 3D space) for each input.
A vector function of two variables, x and y, is a function that assigns a vector
(with multiple components) to each pair of x and y values. In other words, a
vector function f (x, y) maps the two-dimensional (x, y) coordinate space to a
higher-dimensional vector space. For example, a two-dimensional vector func-
tion might look like 2
x +y
f (x, y) =
x − y2
Displacement
7
3 Approximately Four Approaches to Approxi-
mation
Approximations bridge theoretical
ideals and practical realities.
Author
where f (x) is our fitting function. The square ensures that any difference is
added, whether yi is bigger or smaller than f (xi ).
where [a, b] is the interval of interest. This integral represents the area
between the two functions.
where w1 (x) = x2 , w2 (x) = x, and w3 (x) = 1 are the weight functions. The
weight functions come from the terms of the fit function f (x), which is called
the Bubnov-Galerkin method. This leads to a system of linear equations:
8
b b b b
a x4 dx + b x3 dx + c x2 dx = g(x)x2 dx
a a a a
b b b b
a x3 dx + b x2 dx + c xdx = g(x)xdx
a a a a
b b b b
a x2 dx + b xdx + c dx = g(x)dx
a a a a
3.5 Examples
3.5.1 1D Example: Fitting a Quadratic to a Sine Function
We fit a quadratic function f (x) = ax2 + bx + c to a sine function g(x) = sin(x)
over the interval [0, π].
Orange curve: Quadratic function fitted to these points using least squares
method
Middle left plot: Fitting to continuous function
9
Blue curve: Target function sin(x)
Orange curve: Quadratic function fitted by minimising the integral of the
squared difference
Middle right plot: Method of weighted residuals
Blue curve: Target function sin(x)
Orange curve: Quadratic function fitted using the method of weighted
residuals
Far right plot: Stochastic integration approach
Blue curve: Target function sin(x)
Orange curve: Quadratic function fitted using stochastic integration
To reproduce:
1. Generate 20 points from x ∈ [0, π]
2. Compute y = sin(x)
3. Fit a quadratic function to these points
π
4. For continuous fit, minimize 0
(sin(x) − (ax2 + bx + c))2 dx using gradient-
based optimisation [3].
π
5. For weighted residuals, solve the system of equations derived from 0 (sin(x)−
(ax2 +bx+c))xi dx = 0 for i = 0, 1, 2, which solves a solution for ∇f (x) = 0
(optimality criterion solution) [3].
6. For stochastic integration, randomly sample points from [0, π] and min-
imise the average squared difference using gradient-only optimisation [3].
7. Plot results using 100 points over [0, π]
10
Figure 5: 2D fitting examples
Discrete point fitting is best when data is sparse or noisy and when data
is measured using sensors.
Continuous function fitting is ideal when the data we fit is given in a
functional form.
Author
11
position V (x) characterises force fields. Calculus then provides the machinery
df
to analyse how these quantities change. The derivative dx quantifies rates of
dx
change, crucial in defining velocity v(t) = dt or force f (x) = −∇V (x). Inte-
gration f (x) dx, conversely,
accumulates infinitesimal contributions, essential
in calculating work W = f · dx or determining the centre of mass. Differen-
2
tial equations, combining functions and their derivatives (e.g., m ddt2x = f (x)),
encapsulate the dynamics of physical systems. This interplay of functions and
calculus forms the language in which the laws of physics are written and under-
stood, enabling precise description and prediction of natural phenomena across
scales from the quantum to the cosmic.
4.1 Calculus
Calculus, at its core, is the study of relationships between points in a defined
space by studying how functions change and accumulate:
Figure 4.1.1 visualises the top left: the function top left, top right: gradient
field, bottom left: partial derivative with respect to x, bottom right: partial
derivative with respect to y.
12
4.1.2 Integral Calculus
For univariate functions:
b n
X
f (x)dx = lim f (x∗i )∆x
a n→∞
i=1
where ∆Vi is the volume of the i-th subregion and (x∗i , yi∗ , zi∗ ) is a point in
that subregion.
For a vector field f (x, y, z) and a closed surface s enclosing a volume V :
13
(∇ · f ) dV = f · ds
V s
ds is a vector pointing outward from the surface, perpendicular to the surface
at each point with magnitude of the area of an infinitesimal piece of the surface.
This is known as the Divergence Theorem.
4.2 Applications in 3D
Optimisation: ∇f = 0 at critical points (maxima, minima and saddle
points) [3]
Volume under a surface: V = V
f (x, y, z) dx dy dz
Calculus provides tools to analyse how these relationships change over the
space. Differential calculus examines rates of change between nearby points,
quantified by derivatives, while integral calculus studies accumulation and ar-
eas represented by integrals. These fundamental operations allow us to explore
the behaviour of functions, such as finding extrema, analysing curvature, quan-
tifying differences between curves or calculating areas under curves.
14
These unit transformations are essential for dimensional consistency in phys-
ical equations and can be used to check the correctness of derived equations. Un-
derstanding how units transform under differentiation and integration is crucial
for correctly interpreting and applying these equations in real-world scenarios.
15
In cylindrical coordinates (r, θ, z):
1 ∂ 1 ∂Fθ ∂Fz
∇·F= (rFr ) + +
r ∂r r ∂θ ∂z
1 ∂ 2 1 ∂ 1 ∂Fϕ
∇·F= 2
(r Fr ) + (sin θFθ ) +
r ∂r r sin θ ∂θ r sin θ ∂ϕ
Electromagnetism:
16
– Gauss’s law for electricity: ∇ · E = ϵρ0
– Gauss’s law for magnetism: ∇ · B = 0
– Charge conservation: ∂ρ∂t + ∇ · J = 0
Heat Transfer:
– Describes heat flux in a material
– Helps identify heat sources and sinks in thermal systems
– Heat equation: ρcp ∂T
∂t = −∇ · q + Q
Acoustics:
– Used to study sound wave propagation and compression
– Helps analyse acoustic sources and their behaviour
– Continuity equation for sound waves: ∂ρ
∂t + ρ0 ∇ · v = 0
– Acoustic energy flux: ∇ · I = − ∂E
∂t c
Plasma Physics:
– Describes the behaviour of charged particle distributions
– Used in studying plasma dynamics and instabilities
– Continuity equation for plasma: ∂n
∂t + ∇ · (ns vs ) = 0
s
Meteorology:
– Helps analyse atmospheric pressure systems
– Used in studying wind patterns and air mass movements
– Mass continuity equation: ∂ρ
∂t + ∇ · (ρv) = 0
∂u ∂v ∂w
– Divergence of wind field: ∇ · v = ∂x + ∂y + ∂z
Geophysics:
– Applied in studying seismic wave propagation
– Used in analysing groundwater flow and contaminant transport
– Continuity equation for groundwater: Ss ∂h
∂t = ∇ · (K∇h) + W
Quantum Mechanics:
– Appears in the continuity equation for probability density
– Used in formulating conservation laws for quantum systems
∂|ψ|2
– Continuity equation: ∂t +∇·J=0
Image Processing:
– Applied in edge detection algorithms
– Used in analysing intensity variations in images
– Anisotropic diffusion: ∂I
∂t = ∇ · (c(x, y, t)∇I)
– Divergence of gradient for edge detection: ∇2 I = ∇ · (∇I)
17
The Divergence of the Gradient of a Function
The divergence of a function’s gradient is referred to as the Laplacian opera-
tor. The Laplacian operator denoted as ∇2 or ∆, is a second-order differential
operator of central importance in many mathematics, physics, and engineering
areas. The Laplacian of a function f at a point P measures how much the value
of f at P differs from its average value on a small sphere around P . It measures
the ’curvature’ of f in all directions.
For a scalar function f (x1 , . . . , xn ) of n variables, the Laplacian is defined
as:
n
X ∂2f
∇2 f = ∆f = ∇ · ∇f =
i=1
∂x2i
Common representations include:
In Cartesian coordinates (3D):
1 ∂2f ∂2f
2 1 ∂ ∂f
∇ f= r + 2 2 + 2
r ∂r ∂r r ∂θ ∂z
∂2f
2 1 ∂ 2 ∂f 1 ∂ ∂f 1
∇ f= 2 r + 2 sin θ + 2 2
r ∂r ∂r r sin θ ∂θ ∂θ r sin θ ∂ϕ2
Fluid Dynamics:
– Appears in the Navier-Stokes equations, describing fluid motion
– Used in studying vorticity and stream functions
– Helps analyse diffusion processes in fluids
18
– Navier-Stokes equation: ρ Dv 2
Dt = −∇p + µ∇ v + ρg
– Navier-Stokes equation for incompressible flow: ρ ∂v
∂t + ρ(v · ∇)v =
−∇p + µ∇2 v + f
– Stream function in 2D flow: ∇2 ψ = −ω, where ω is vorticity
Heat Transfer:
– Central to the heat equation, describing temperature distribution
over time
– Used in analysing steady-state heat conduction
– Helps model thermal diffusion in materials
∂T
– Heat equation: ∂t = α∇2 T
– Steady-state heat conduction: ∇2 T = 0
Acoustics:
– Appears in the wave equation for sound propagation
– Helps analyse resonance in acoustic cavities
– Used in studying sound diffraction and scattering
2
1 ∂ p
– Wave equation for sound: ∇2 p = c2 ∂t2
– Helmholtz equation: ∇2 p + k 2 p = 0
Finance:
– Appears in the Black-Scholes equation for option pricing
– Used in modelling diffusion processes in financial markets
– Helps analyse risk spreading in portfolio theory
2
∂V
– Black-Scholes equation: ∂t + 12 σ 2 S 2 ∂∂SV2 + rS ∂V
∂S − rV = 0
∂u 1 ∂2u
– Heat equation for option pricing: ∂τ = 2 ∂x2
19
– Helps analyse electromagnetic wave propagation
– Poisson’s equation for electrostatic potential: ∇2 ϕ = − ϵρ0
2
– Wave equation for electromagnetic waves: ∇2 E = µ0 ϵ0 ∂∂tE
2
Image Processing:
– Used in edge detection and image sharpening algorithms
– Helps in noise reduction and image smoothing
– Applied in feature extraction and image segmentation
– Laplacian filter for edge detection: L(x, y) = ∇2 I(x, y)
∂I
– Heat equation for image denoising: ∂t = ∇2 I
Geophysics:
– Used in gravity and magnetic field analysis
– Helps model seismic wave propagation
– Applied in studying groundwater flow and contaminant transport
– Laplace’s equation for gravitational potential: ∇2 V = 0
– Diffusion equation for groundwater flow: S ∂h 2
∂t = T ∇ h
Plasma Physics:
– Describes potential distributions in plasmas
– Used in studying plasma oscillations and instabilities
– Helps analyse diffusion processes in plasma
– Poisson’s equation for electrostatic potential in plasma: ∇2 ϕ = − ϵe0 (ni −
ne )
∂ 2 ∂
– Drift wave equation: ∂t ∇ ϕ + v∗ ∂y ∇2 ϕ = 0
Computer Graphics:
– Applied in mesh smoothing and surface fairing
– Used in generating realistic textures and terrain
– Helps in simulating fluid and cloth dynamics
– Laplacian smoothing: vinew = vi + λ∇2 vi
– Poisson equation for surface reconstruction: ∇2 f = ∇ · v
Quantum Mechanics:
– Fundamental component of the Schrödinger equation
– Describes the kinetic energy operator for quantum particles
– Used in studying quantum tunnelling and barrier penetration
2
– Time-independent Schrödinger equation: − 2m
ℏ
∇2 ψ + V ψ = Eψ
2
– Time-dependent Schrödinger equation: iℏ ∂ψ 2
∂t = − 2m ∇ ψ + V ψ
ℏ
20
4.5 Examples
4.5.1 Ordinary Differential Equation (ODE): Motion of a Mass-Spring-
Damper System
The following second-order ODE can describe the motion of a mass-spring-
damper system:
d2 x dx
m +b + kx = 0
dt2 dt
where m is the mass, b is the damping coefficient, k is the spring constant,
x is the displacement, and t is time.
This can be rewritten as a system of first-order ODEs:
dx
=v
dt
dv k b
=− x− v
dt m m
Figure 6 visualises the solution to this ODE system in two ways. The top
left shows the position (blue), velocity (green), and acceleration (red) of the
mass-spring-damper system over time. The top right is a phase plot of the
same system, showing how velocity changes with position.
Interpreting these plots is essential in engineering. The mass started at
position 1 on the x-axis, and it had no initial zero velocity. The increasing
amplitude and velocity show the system is gradually losing energy. The spiral
pattern indicates the system’s gradual energy loss due to damping.
∂T ∂2T
=α 2
∂t ∂x
where T (x, t) is the temperature at position x and time t, and α is the
thermal diffusivity.
The solution to this PDE is visualised in three ways:
1. A surface plot showing how temperature (T ) varies with position and time.
2. A surface plot showing how the rate of change of temperature (∂T /∂t)
varies with position and time.
3. A surface plot showing how the second spatial derivative of temperature
(∂ 2 T /∂x2 ) varies with position and time.
21
Figure 6: Visualisation of ODE and PDE solutions. Top row: Mass-spring-
damper system. (Left) Position, velocity, and acceleration over time. (Right)
Phase plot. Bottom row: Heat equation. (Left) Temperature T (x, t). (Mid-
dle) Rate of change of temperature ∂T /∂t. (Right) Second spatial derivative
∂ 2 T /∂x2 .
Figure 6 bottom left is a surface plot of the heat equation solution. The
x-axis represents position, the y-axis represents time, and the z-axis (colour)
represents temperature. The initial sinusoidal temperature distribution can be
seen to ”flatten out” over time due to heat diffusion.
Figure 6bottom middle is a surface plot of dT /dt for the heat equation.
This represents the temperature change rate at each point in space and time.
The magnitude of dT /dt decreases over time as the system approaches thermal
equilibrium.
Figure 6 bottom right is the heat equation’s surface plot of ∂ 2 T /∂x2 . This
represents the curvature of the temperature profile at each point in space and
time. According to the heat equation,∂ 2 T /∂x2 is proportional to dT /dt, which
is evident in the similarity of their plots.
We can observe that higher temperature regions decrease and lower tem-
perature region increase over time. ∂T /∂t and ∂ 2 T /∂x2 differ only by a scale
factor. That scale factor is α. Regions with high positive ∂ 2 T /∂x2 (high up-
ward curvature) correspond to regions where ∂T /∂t is positive (temperature is
increasing), and vice versa. This illustrates the physical principle that heat flows
from high-temperature to low-temperature regions, smoothing the temperature
22
distribution over time. Using the same colour map for all three plots allows for
easy comparison between T , ∂T /∂t, and ∂ 2 T /∂x2 .
Steven J. Leon
5.1.1 Scalars
A scalar is a single number which can be real or complex. Scalars are typically
denoted by lowercase, non-bold letters.
Example: a = 5, b = −3.14, c = 2 + 3i
5.1.2 Vectors
A vector is an ordered list of numbers, often representing a point or direction
in space. Vectors are typically denoted by lowercase, bold letters.
1
−1
Example: v = 2, w =
4
3
Vectors can be represented in different ways:
v1
v2
Column vector: v = .
..
vn
Row vector: vT = v1
v2 ··· vn
5.1.3 Matrices
A matrix is a rectangular array of numbers arranged in rows and columns. It
is typically denoted by uppercase, bold letters.
23
1 2 3
−1 0
Example: A = 4 5 6 ,B=
2 3
7 8 9
A matrix with m rows and n columns is called an m × n matrix.
5.1.4 Relationships
A scalar can be viewed as a 1 × 1 matrix.
These fundamental objects form the basis for more complex operations and
concepts in linear algebra and related fields.
5.2.2 Properties
Associative: (AB)C = A(BC)
Distributive: A(B + C) = AB + AC
24
3. The (i, j)-th element of C is the projection of the j-th column of B onto
the direction of the i-th row of A.
A : Rn → Rm (2)
Then AB can be interpreted as applying this transformation to each column
of B:
5.3.2 Properties
tr(A + B) = tr(A) + tr(B)
tr(AB) = tr(BA)
25
5.3.3 Physical Interpretation
In physics and solid mechanics:
Often represents the sum of principal components or invariants of a system.
In stress analysis, tr(σ) = σ11 + σ22 + σ33 gives the hydrostatic stress.
5.4.2 Properties
det(AB) = det(A) det(B)
det(AT ) = det(A)
det(A−1 ) = 1
det(A)
26
Figure 7: Plane with Normal Vector
27
5.5 Normal Vectors: VIP Physics Vector
There are two common methods to compute a normal vector on an arbitrar-
ily oriented plane: the cross product method and the trigonometric method.
Consider the plane shown in Figure 7.
Method 1: Cross Product
Given three non-collinear points p1 (x1 , y1 , z1 ), p2 (x2 , y2 , z2 ), and p3 (x3 , y3 , z3 )
on the plane:
1. Calculate two vectors on the plane: v = p2 −p1 = (x2 −x1 , y2 −y1 , z2 −z1 )
w = p3 − p1 = (x3 − x1 , y3 − y1 , z3 − z1 )
2. Compute the normal vector n usingthe cross product: n = v × w
(y2 − y1 )(z3 − z1 ) − (z2 − z1 )(y3 − y1 )
3. The resulting normal vector is: n = (z2 − z1 )(x3 − x1 ) − (x2 − x1 )(z3 − z1 )
(x2 − x1 )(y3 − y1 ) − (y2 − y1 )(x3 − x1 )
Method 2: Trigonometric
Alternatively, we can use trigonometry to compute the normal vector:
1. Define the plane using the general equation: ax + by + cz + d = 0
2. The normal vector is parallel to (a, b, c)
3. We can calculate a, b, and c using the following: a = cos θ cos ϕ b =
cos θ sin ϕ c = sin θ
Where θ is the angle between the normal vector and the xy-plane, and ϕ is
the azimuthal angle in the xy-plane from the x-axis.
4. The resulting normal vector is: n = (a, b, c) = (cos θ cos ϕ, cos θ sin ϕ, sin θ)
n
For both methods, to get a unit normal vector, normalise n: n̂ = ∥n∥
Figure 7 illustrates these concepts, showing the plane (in blue), the three
points p1 , p2 , and p3 (in red), and the computed unit normal vectors (in green
for cross product method and purple for trigonometric method).
5.6.1 Properties
(AT )T = A
(A + B)T = AT + BT
(AB)T = BT AT
28
5.7 Matrix Inverse: Matrices That Un-matrix Themselves
For a square matrix A, its inverse A−1 is defined as the unique matrix satisfying:
AA−1 = A−1 A = I
where I is the identity matrix.
5.7.1 Properties
(A−1 )−1 = A
QT Q = QQT = I
This leads to a crucial relationship:
Q−1 = QT
For orthogonal matrices, the inverse is equal to the transpose.
Geometric Interpretation
Orthogonal matrices represent rotations and reflections in Euclidean space.
They preserve lengths and angles between vectors.
x = A−1 b
29
5.7.4 Columns of A−1 as Solutions
Examining AA−1 = I column by column:
A[a−1 −1
1 a2 · · · a−1
n ] = [e1 e2 · · · en ]
Aa−1
i = ei
where ei is the standard basis vector (columns of I).
x = b1 a−1 −1 −1
1 + b2 a2 + · · · + bn an
x = Q−1 b = QT b
This has several important implications:
30
Signal Processing: The Discrete Fourier Transform matrix is unitary, al-
lowing efficient computation of inverse transforms.
Structural Dynamics: Modal analysis often involves orthogonal mode
shapes, simplifying the solution of dynamic equations.
Computer Graphics: The orthogonal rotation matrices in 3D graphics
allow efficient inverse transformations.
Data Analysis: Principal Component Analysis (PCA) uses an orthogonal
transformation to convert correlated variables into uncorrelated principal
components.
Derivation
For a full rank square system:
1. The system has a unique solution: x = A−1 b
2. The inverse exists because det(A) ̸= 0
3. The columns of A form a basis for Rn
31
Solution
The solution can be found using:
Direct methods: Gaussian elimination, LU decomposition
Derivation
For an overdetermined system:
AT Ax = AT b
The least squares solution is:
x = (AT A)−1 AT b
Underdetermined Systems
When m < n, we have an underdetermined system. These systems typically
have infinitely many solutions.
32
Derivation
For an underdetermined system:
x = A+ b = AT (AAT )−1 b
This solution minimizes ∥x∥2 subject to Ax = b.
A = UΣVT
where U and V are orthogonal matrices and Σ is a diagonal matrix of
singular values.
The pseudoinverse is then:
A+ = VΣ+ UT
where Σ+ is formed by reciprocating the non-zero singular values and trans-
posing.
This approach provides a unified framework for solving all types of linear
systems, handling issues of rank deficiency, and providing least squares solutions
when necessary.
Av = λv
This equation tells us that when A operates on v, it only scales v by a factor
λ, without changing its direction.
33
5.9.1 Derivation and Interpretation
Let’s multiply both sides of the eigenvalue equation by vT :
vT Av = λvT v
Now, if we normalize v such that ∥v∥ = 1, then vT v = 1, and we get:
vT Av = λ
This result has a profound interpretation: the eigenvalue λ is the value of
the quadratic form vT Av when v is the corresponding eigenvector.
L(v, µ) = vT Av − µ(vT v − 1)
where µ is a Lagrange multiplier. Setting the gradient to zero:
∂L
= 2Av − 2µv = 0
∂v
This gives us Av = µv, which is precisely the eigenvalue equation with
µ = λ.
34
For example, in structural mechanics:
Eigenvectors of the stiffness matrix represent mode shapes of vibration.
Corresponding eigenvalues give the squared natural frequencies of these
modes.
The orthogonality of eigenvectors means these modes can be excited in-
dependently.
A = QΛQT
where Q is the matrix of eigenvectors and Λ is a diagonal matrix of eigenval-
ues. This decomposition is the basis for Principal Component Analysis (PCA),
where:
This connection between linear algebra and data analysis demonstrates the
far-reaching implications of eigenvector theory.
xT Ax > 0
In the context of elasticity, the strain energy density U is given by:
1 T
U= ε Cε
2
where ε is the strain vector and C is the elasticity tensor. For the strain
energy to be physically meaningful, C must be positive definite.
35
In quantum mechanics, the Hamiltonian operator is represented as a Her-
mitian matrix. Its eigenvectors are the stationary states, and eigenvalues
are the corresponding energies.
Author
6.1.1 Properties
Diagonal matrices possess several important properties:
36
2. Inverse: If all diagonal elements are non-zero, the inverse is also diagonal:
1/d1 0 ··· 0
0 1/d2 · · · 0
D−1 = .
. .
. . . ..
. . . .
0 0 · · · 1/dn
Dx = [d1 x1 , d2 x2 , . . . , dn xn ]T
6.1.3 Applications
Diagonal matrices play crucial roles in various fields:
37
Physics and Engineering
Moment of inertia tensor: In the principal axis system, the moment
of inertia tensor is diagonal.
Stress tensor: In principal stress directions, the stress tensor becomes
diagonal.
Stiffness matrices: In some simple structural problems, stiffness matri-
ces can be diagonal.
Control Theory
Decoupled systems: Diagonal state-space matrices represent decoupled
systems.
Modal analysis: The modal matrix diagonalises the system matrix in
vibration analysis.
A = PDP−1
where D is a diagonal matrix of eigenvalues and P is a matrix whose columns
are the corresponding eigenvectors.
This decomposition is particularly useful for:
Computing matrix powers efficiently: Ak = PDk P−1
38
Solving systems of differential equations
Analysing the long-term behaviour of dynamical systems
S = ST
In component form, this means sij = sji for all i and j.
Understanding symmetric matrices and their properties is crucial in many
areas of physics, engineering, and data science. Their special properties, particu-
larly with regard to eigenvalues and eigenvectors, make them invaluable in mod-
eling many physical and statistical phenomena. The connection to quadratic
forms further extends their applicability, making them a cornerstone of many
mathematical and scientific disciplines.
6.2.1 Properties
Symmetric matrices possess several important properties:
39
6.2.3 Examples
1. Covariance matrix: C = E[(X − µ)(X − µ)T ]
Ixx −Ixy −Ixz
2. Inertia tensor: I = −Ixy Iyy −Iyz
−Ixz −Iyz Izz
σxx τxy τxz
3. Stress tensor: σ = τxy σyy τyz
τxz τyz σzz
4. Adjacency matrix of an undirected graph
6.2.4 Applications
Symmetric matrices play crucial roles in various fields:
1. Mechanics:
Stress and strain tensors in continuum mechanics
Moment of inertia tensors in rigid body dynamics
4. Quantum Mechanics:
5. Optimisation:
Hessian matrices in multivariate calculus
Quadratic programming problems
6. Graph Theory:
Laplacian matrices of graphs
Adjacency matrices of undirected graphs
40
6.2.5 Computational Advantages
Working with symmetric matrices offers several computational benefits:
1. Reduced storage: Only the upper (or lower) triangular part needs to
be stored.
2. Efficient eigenvalue algorithms: Methods like Jacobi rotations con-
verge faster for symmetric matrices.
3. Numerical stability: Many algorithms for symmetric matrices are more
stable than their general counterparts.
4. Guaranteed real eigenvalues: Simplifies many calculations and inter-
pretations.
QT Q = QQT = I
where I is the identity matrix.
Understanding orthogonal matrices and their properties is crucial in many
areas of mathematics, physics, and engineering. Their ability to preserve ge-
ometric properties while allowing efficient computation makes them invaluable
tools in both theoretical analysis and practical applications, from 3D graphics
to quantum mechanics.
41
6.3.1 Properties
Orthogonal matrices possess several important properties:
42
Figure 8: Visualisation of cube rotation and orthogonal matrix
Q = [v1 v2 v3 ]
where v1 , v2 , v3 are mutually orthogonal.
43
Figure 9: Visualisation of the Identity Orthogonal Matrix
Figure 9 shows the column vectors of Q1 , which align with the standard
basis vectors.
44
Figure 10: Visualisation of a Non-trivial Orthogonal Matrix
6.3.6 Applications
This geometric understanding of orthogonal matrices is crucial in various fields:
45
6.3.7 Examples
cos θ − sin θ
1. 2D rotation matrix: Q =
sin θ cos θ
cos θ − sin θ 0
2. 3D rotation matrix (around z-axis): Q = sin θ cos θ 0
0 0 1
4. Permutation matrices
46
Figure 11: Visualisation of Principal Stress Directions
In finite element analysis, it allows for efficient solution methods and pro-
vides insight into the behaviour of structures under load.
47
6.3.13 Applications
Orthogonal matrices play crucial roles in various fields:
1. Computer Graphics:
Representing 3D rotations in rendering and animation
Implementing camera transformations
2. Robotics and Control:
Describing orientation of rigid bodies
Implementing attitude control in spacecraft
3. Signal Processing:
Orthogonal transformations like DCT (Discrete Cosine Transform)
Wavelet transforms
4. Quantum Mechanics:
Representing spin rotations
Describing symmetry operations in molecular systems
5. Data Analysis:
Principal Component Analysis (PCA)
Factor analysis in psychometrics
x = Q−1 b = QT b
This means that solving the system is as simple as a matrix-vector multipli-
cation without the need for more complex linear algebra operations.
48
6.3.16 Orthogonal Decompositions
Orthogonal matrices are fundamental in many matrix decompositions:
U∗ U = UU∗ = I
where U∗ denotes the conjugate transpose (also written as U† ), and I is the
identity matrix.
Understanding unitary matrices and their properties is crucial in many ar-
eas of physics and engineering, particularly in quantum mechanics and signal
processing. Their ability to preserve inner products and norms while allowing
efficient computation makes them invaluable tools in both theoretical analysis
and practical applications.
6.4.1 Properties
Unitary matrices possess several important properties:
49
6.4.2 Relation to Orthogonal Matrices
Unitary matrices are the complex generalisation of real orthogonal matrices.
When the entries are real, a unitary matrix is orthogonal.
6.4.4 Examples
cos θ − sin θ
1. Rotation matrices: In 2D, U =
sin θ cos θ
0 1
2. Pauli matrices: e.g., σx =
1 0
1 1 1
3. Hadamard matrix: H = √2
1 −1
6.4.5 Applications
Unitary matrices play crucial roles in various fields:
1. Quantum Mechanics:
Unitary matrices represent Quantum gates.
Time evolution of closed quantum systems is unitary.
2. Signal Processing:
The Discrete Fourier Transform (DFT) is unitary, facilitating efficient
computation of inverse transforms.
Unitary transforms are used in data compression algorithms.
3. Numerical Linear Algebra:
QR decomposition involves unitary matrices.
Singular Value Decomposition (SVD) uses unitary matrices.
4. Control Theory:
State-space models often involve unitary state transition matrices for
stable systems.
5. Cryptography:
Some cryptographic protocols use unitary operations.
50
6.4.6 Computational Advantages
Working with unitary matrices offers several computational benefits:
x = U−1 b = U∗ b
This means that the solution x is a linear combination of the columns of U∗ ,
with coefficients given by the components of b. This property, combined with
the orthonormality of the columns, makes unitary transformations particularly
useful in analysing and manipulating complex systems.
51
2. Stability Guarantee: The positive definiteness ensures that the system’s
energy is always positive, which is a fundamental requirement for stable
physical systems. Negative energy would imply an unstable or unphysical
state.
3. Potential Energy Surfaces: In molecular dynamics and quantum chem-
istry, positive definite matrices often represent the curvature of potential
energy surfaces near equilibrium points, ensuring the local stability of
molecular configurations.
2. Determinant is positive:
This property is related to the volume expansion in phase space,
which is crucial in statistical mechanics.
3. Cholesky decomposition:
A = LLT , where L is lower triangular.
This decomposition is useful in simulating correlated random vari-
ables in statistical physics.
4. Convex optimisation:
52
6.5.4 Importance in Numerical Methods
1. Stability in iterative methods:
In finite element analysis, a positive definite stiffness matrix ensures
convergence of iterative solvers.
2. Condition number:
The ratio of largest to smallest eigenvalue affects numerical stability
in simulations of physical systems.
References
[1] Strang, G. (2006). Linear Algebra and Its Applications. 4th ed. Belmont,
CA: Thomson, Brooks/Cole.
[2] Apostol, T. M. (2007). Calculus, Volume 1: One-Variable Calculus, with
an Introduction to Linear Algebra. 2nd ed. New York: Wiley.
[3] Snyman, J. A., Wilke, D. N. (2018). Practical Mathematical Optimization:
Basic Optimization Theory and Gradient-Based Algorithms. Second Edi-
tion. New York: Springer.
53