0% found this document useful (0 votes)

67 views104 pages

Discrete Mathematics Lecture Notes 2025-2026

Q: What is the role of internal vertices and leaves in determining the structure of a full m-ary tree?

In a full m-ary tree, internal vertices and leaves determine the hierarchical structure. Each internal vertex, except the root, connects to \( m+1 \) edges, forming levels, with the root connecting \( m \). Leaves, vertices of degree one, mark the tree's endpoints. Together, they uphold the tree's growth pattern and balance: the total number of vertices \( n \) is related to internal vertices \( i \) and leaves \( l \) by \( n = i + l \), with structural equations balancing between these elements.

Q: Illustrate how adjacency matrices can be utilized to identify connected components in a graph?

Adjacency matrices can identify connected components by examining the transitive closure—effectively checking if there exists a path between every pair of vertices. For a graph with adjacency matrix \( A \), the power \( A^n \), where \( n \) is the number of vertices, provides information about connectivity: nonzero entries indicate connections between vertices within a component. Thus, connected components correspond to submatrices indicating strong internal connectivity with no links to vertices outside. Matrix exponentiation and analysis uncover underlying connected structures.

Q: How does Gaussian elimination facilitate the solution of a system of linear equations represented by an augmented matrix?

Gaussian elimination facilitates solving a system of linear equations by transforming the augmented matrix into row echelon form via row operations. This involves performing downsweeps to zero out sub-diagonal elements and arranging equations hierarchically, then upsweeps or back-substitution are performed to solve variables starting from the bottom row. The approach systematically reduces the system to simpler forms, allowing the extraction of variable values efficiently.

Q: What are the characteristics of a tree in graph theory, and how do they differ from general graphs?

A tree in graph theory is a connected acyclic graph, possessing unique characteristics: being connected implies there is a path between any two vertices, and acyclic means it contains no cycles. Compared to general graphs, trees are minimally connected with \( n-1 \) edges given \( n \) vertices, emphasizing simplicity and structure. Their acyclic nature ensures only one path connects any two vertices, distinguishing them from more complex general graphs which may contain cycles and multiple connections.

Q: What conditions must be met for a subgraph to be considered a spanning tree of a graph?

For a subgraph to be considered a spanning tree of a graph, it must satisfy: 1) containing every vertex of the original graph, 2) being minimally connected with no cycles (i.e., a tree structure), and 3) including exactly \( n-1 \) edges where \( n \) is the number of vertices, thus connecting all vertices using the fewest possible edges. This ensures it spans all vertices efficiently without redundant connections.

Q: Can you explain how the solution of the recurrence relation \( a_n = a_{n-1} + a_{n-2} \) leads to finding \( a_7 = 21 \)?

The recurrence relation \( a_n = a_{n-1} + a_{n-2} \) describes how each term in the sequence is the sum of the two preceding terms, similar to the Fibonacci sequence. Starting with initial conditions \( a_1 = 1 \) and \( a_2 = 2 \), we calculate each term sequentially: \( a_3 = 3, a_4 = 5, a_5 = 8, a_6 = 13, \) and finally \( a_7 = 21 \). These calculations align with the sequence definition, ensuring \( a_7 \) is accurate.

Q: How can the characteristic equation \( r^2 = pr + q \) for a recurrence relation be used to find its general solution?

The characteristic equation \( r^2 = pr + q \) arises when solving linear homogeneous recurrence relations of the form \( a_n = pa_{n-1} + qa_{n-2} \). Solving this quadratic equation provides the roots \( r_1 \) and \( r_2 \), which are used in the general solution \( a_n = c(r_1)^n + d(r_2)^n \) where \( c \) and \( d \) are constants determined by initial conditions. This method leverages the structure of the quadratic to derive a closed form for the recurrence, simplifying analysis and calculations.

Q: Why is Dijkstra's Algorithm considered efficient for finding the shortest path in a graph, and what conditions may affect its efficiency?

Dijkstra's Algorithm is efficient for finding the shortest path in a graph because it systematically explores edges closest to the starting point, visiting each node once and updating shortest paths in a priority-based manner. However, its efficiency is affected by the implementation of the priority queue: using a Fibonacci heap results in a time complexity of \( O(E + V \log V) \), making it suitable for dense graphs. Issues arise with negative weight edges as it cannot handle them properly, requiring edge relaxation assumptions.

Q: In the context of recurrence relations, how does one determine if a solution of the form \( a_n = c(3)^n + d(4)^n \) is valid?

For \( a_n = c(3)^n + d(4)^n \) to be a valid solution of a recurrence relation like \( a_n = 7a_{n-1} - 12a_{n-2} \), it must satisfy the characteristic equation derived from the relation. The characteristic equation \( r^2 = 7r - 12 \) resolves to roots \( r = 3 \) and \( r = 4 \). With these roots, \( a_n = c(3)^n + d(4)^n \) accurately captures all solutions associated with the recurrence, provided constants \( c \) and \( d \) fit initial conditions.

Q: What is the significance of a graph having an even number of vertices with odd degree?

The theorem stating that any graph must have an even number of vertices with odd degree is significant because it is a fundamental property derived from the degree sum formula. This formula asserts that the sum of the degrees of all vertices in a graph must be even, as each edge is counted twice (once at each endpoint). Therefore, since odd degree vertices contribute an odd sum, they must occur in pairs for the total to remain even, ensuring consistency with graph theory principles.

The document contains comprehensive lecture notes for the F17SC Discrete Mathematics course at Heriot-Watt University for the academic year 2025-2026. It includes weekly reading assignments, exercises for practice, and model solutions available on Canvas. The content covers various topics such as set theory, probability, recurrence relations, graph theory, and matrices.

Uploaded by

mohammadyousefch

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

67 views104 pages

Discrete Mathematics Lecture Notes 2025-2026

Uploaded by

mohammadyousefch

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Lecture notes for

F17SC Discrete Mathematics

Dr Anatoly Konechny, Dr Adrian Turcanuand Dr Mateja Presern

Department of Mathematics
Heriot-Watt University

AY 2025–2026
Discrete Mathematics Page 2

This is the complete set of notes covering all material taught in the course.

Each section is complemented by a list of exercises. They are different from

the problems in the Problem Sets to be discussed at the workshop sessions. You
are advised to attempt the exercises after reading each of the assigned sections.
They can be also used to practice for the midterm tests and the final exam. Model
solutions to the exercises will be released on Canvas at the end of each week as
separate files.
The reading assignments for each week are on Canvas. For your convenience
they are also listed here:

Week 1 Sections 1.1, 1.2, 1.3, 1.4

Week 2 Sections 1.5, 1.6

Week 3 Sections 1.7, 1.8, 1.9

Week 4 Sections 2.1, 2.2, 3.1

Week 5 Sections 3.2, 3.3, 3.4

Week 7 Sections 4.1, 4.2, 4.3

Week 8 Sections 4.4, 4.5, 4.6

Week 9 Sections 4.7, 4.8, 4.9, 4.10

Week 10 Sections 5.1

Week 11 Sections 5.2, 5.3

Updated: December 18, 2025.

Contents

1 Set Theory and Combinatorics 5

1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Set Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Computer Representation of Sets . . . . . . . . . . . . . . . . . . . . 8
1.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Mathematical Induction . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5.1 Closures of relations . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.6 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.6.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7 The Sum and Product Rules . . . . . . . . . . . . . . . . . . . . . . 22
1.7.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8 Permutations and Combinations . . . . . . . . . . . . . . . . . . . . 24
1.8.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.9 The Inclusion-Exclusion Principle . . . . . . . . . . . . . . . . . . . . 27
1.9.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2 Probability 31
2.1 Probability Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2 Conditional Probability and Independence . . . . . . . . . . . . . . . 37
2.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3 Recurrence relations 41
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Linear Homogeneous Recurrence Relations . . . . . . . . . . . . . . . 45
3.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Solving Inhomogeneous Recurrence Relations . . . . . . . . . . . . . . 48
3.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Further Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3
Discrete Mathematics Page 4

4 Graph Theory 54
4.1 Introduction to Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Adjacency Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 Directed graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5 Weighted Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.6 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.6.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.7 Spanning trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.7.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.8 Minimum Spanning Trees . . . . . . . . . . . . . . . . . . . . . . . . 72
4.8.1 Prim’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.8.2 Kruskal’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . 74
4.8.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.9 Euler Paths and Circuits . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.9.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.10 Hamilton Paths and Circuits . . . . . . . . . . . . . . . . . . . . . . . 79
4.10.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5 Matrices and linear transformations 84

5.1 Linear equations and Gaussian Elimination . . . . . . . . . . . . . . . 84
5.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2 Matrices as linear transformations in vector spaces . . . . . . . . . . . 95
5.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.3 Eigenvectors and eigenvalues . . . . . . . . . . . . . . . . . . . . . . . 99
5.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Chapter 1

Set Theory and Combinatorics

1.1 Sets
Definition 1.1. A set is an unordered collection of objects.

The notation A = {1, 3, 5, 7} means that the set A contains the four elements
1, 3, 5, 7 . We write a ∈ A to denote that a is an element of the set A . We write
a∈ / A to indicate that a does not belong to A . For example, given A = {1, 3, 5, 7}
we have that 5 ∈ A and 6 ∈ / A.
There are a number of ways of denoting sets. Thus for the set of positive even
numbers we could write
A = {2, 4, 6, 8, . . .}
or
A = {x | x is a positive even number} ,
where the vertical bar | is to be read “such that”.
A set cannot contain the same object more than once. Also, the elements in a
set are not ordered so the sets {a, b, c} and {c, a, b} are the same. In general, two
sets are equal if they contain the same elements.

Definition 1.2. A set B is a subset of a set A if every element of B is also an

element of a A .

If B is a subset of A we write B ⊂ A . For example, if A = {1, 3, 5, 7} and

B = {3, 7} , then B ⊂ A .

Remark 1.1. You should take care to distinguish between ‘B is a subset of A’ ( B ⊂

A ) and ‘B is a element of A’ ( B ∈ A ). For example if

A = {1, {3, 5}, 3, 6, 7},

then all of the following are true

3 ∈ A, {3, 5} ∈ A, 5∈
/ A, {3, 6} ⊂ A, {3} ⊂ A.

5
Discrete Mathematics Page 6

The set that has no elements is called the empty set and is denoted by ∅ . By
convention ∅ ⊂ A for any set A.
Remark 1.2. Note that the set ∅ and the set B = {∅} are different. The single
element of B is the empty set itself.

Definition 1.3. An ordered pair is a pair (a, b) , where a comes before b.

Two pairs (a, b) and (c, d) are equal whenever a = c and b = d . Note that
(3, 8) and (8, 3) are not equal.
Definition 1.4. If A and B are sets then the Cartesian product A × B is the set
of all ordered pairs (a, b) with a ∈ A and b ∈ B .

Example 1.5. Let A = {1, 2, 3} and B = {x, y} . Find the Cartesian product
A×B.
Solution. The Cartesian product A × B is

A × B = { (1, x), (1, y), (2, x), (2, y), (3, x), (3, y) }.

1.1.1 Exercises

Exercise 1.1.1. List the elements of the set

{x | x is the square of an integer and x < 25} .

Exercise 1.1.2. Find B × A for Example 1.5.

1.2 Set Operations

Definition 1.6. If A and B are sets then the union of A and B , A∪B is defined
as
A ∪ B := {x | x ∈ A or x ∈ B}.
The symbol “:=” means “equal by definition”.
Definition 1.7. The intersection of A and B , A ∩ B is defined as

A ∩ B := {x | x ∈ A and x ∈ B}.
Page 7 Dr Adrian Turcanu & Dr Mateja Presern

Note that the statement x ∈ A or x ∈ B does not exclude the possibility that
x is an element of both sets. For example, we have
{1, 3, 5, 7} ∪ {2, 5, 7} = {1, 2, 3, 5, 7}
and
{1, 3, 5, 7} ∩ {2, 5, 7} = {5, 7} .
Remark 1.3. It is useful to picture combinations of sets with the aid of Venn dia-
grams. This will be done in the lectures.

Definition 1.8. If A and B are sets, then the difference of A and B is denoted
by A \ B and defined as
A \ B := {x | x ∈ A and x ∈
/ B}.
For example, if A = {1, 3, 5, 7} and B = {4, 5, 6, 7} then A \ B = {1, 3} .
Often, all the sets under consideration are subsets of some larger set U called the
universal set.
Definition 1.9. The complement of a set A is A = U \ A .
Thus A = {x | x ∈
/ A} .

Example 1.10. Let U = {1, 2, 3, 4, 5, 6, 7} , A = {1, 7} , B = {1, 2, 4, 6} and

C = {1, 3, 5, 7} . Find the sets
(a) C \ A , (b) A ∩ B , (c) (A ∩ B) ∪ C , (d) A ∪ B , (e) (C − A) ∩ B .
Solution. (a) C \ A = {3, 5} .
(b) B = {3, 5, 7} so A ∩ B = {7} .
(c) A ∩ B = {1} and C = {2, 4, 6} . Hence (A ∩ B) ∪ C = {1, 2, 4, 6} .
(d) A ∪ B = {1, 2, 4, 6, 7} so A ∪ B = {3, 5, } .
(e) C \ A = {1, 2, 4, 6, 7} so (C \ A) ∩ B = {1, 2, 4, 6} .

Set operations satisfy the following laws.

A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A∪B =B∪A
A∩B =B∩A
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(A ∪ B) = A ∩ B
(A ∩ B) = A ∪ B
Note that the empty set ∅ satisfies the following properties A ∪ ∅ = A and
A ∩ ∅ = ∅ for any set A.
Discrete Mathematics Page 8

1.2.1 Exercises

Exercise 1.2.1. Let U = {1, 2, 3, 4, 5, 6, 7, 8} , A = {3, 8} , B = {2, 4, 6, 8} and

C = {1, 3, 5, 7} .
Find the sets
(a) C \ A , (b) A ∩ B , (c) (A ∩ B) ∪ C ,
(d) A ∩ B , (e) (C \ A) ∩ B .

Exercise 1.2.2. Let C = A ∩ B and D = A ∩ B . Find C ∩ D .

1.3 Computer Representation of Sets

There are many ways to represent sets on a computer. We could store the elements
of a set in an unordered fashion. If we did this, it would be time-consuming to
compute the union or intersection of two sets since these operations would require a
large amount of searching for elements. We outline below a method for storing the
elements of a set which makes computing set operations easy.

Step 1. We order a finite universal set U = {u1 , u2 , . . . , un } .

Step 2. Given a subset A of U , we represent A by a bit string of length n

where the k th bit in the string is 1 if uk ∈ A and 0 otherwise.

As an example, let U = {1, 2, 3, 4, 5, 6} , A = {2, 3} and B = {3, 4, 5, 6} . Then

the bit string for A is 0 1 1 0 0 0 and the bit string for B is 0 0 1 1 1 1 .
Given the bit strings for A and B, one can easily compute the bit strings for A,
A ∪ B and A ∩ B as follows.

− To find the bit string for A from the bit string for A we turn each 1 into a 0
and each 0 into a 1.

For the above example, the bit string for A is 0 1 1 0 0 0 so the bit string for A
is 1 0 0 1 1 1 .

− The k th bit in the bit string for A ∪ B is 1 if either (or both) of the bits in
the k th position of A and B is 1, and is 0 if both bits are 0.
− The k th bit in the bit string for A ∩ B is 1 if both of the bits in the k th
position of A and B are 1 , and is 0 if either (or both) of the two bits is 0.

As an example, let U = {1, 2, 3, 4, 5} , A = {1, 3, 5} and B = {3, 4} . The bit

string for A is 1 0 1 0 1 whereas the bit string for B is 0 0 1 1 0 . Therefore, the bit
string for A ∪ B is 1 0 1 1 1 and the bit string for A ∩ B is 0 0 1 0 0 .
Page 9 Dr Adrian Turcanu & Dr Mateja Presern

1.3.1 Exercises

Exercise 1.3.1. Let U = {1, 2, 3, 4, 5, 6, 7} , A = {3, 4, 5, 7} and B = {5, 6} . Find

the bit strings for A , B , A , A ∪ B , and A ∩ B .

Exercise 1.3.2. Which subsets of a finite universal set U do these bit strings rep-
resent?
(a) the string with all zeros, (b) the string with all ones.

Exercise 1.3.3. Explain how to find the bit string corresponding to A \ B.

1.4 Mathematical Induction

Induction is one of the most commonly used proof techniques in Discrete Mathe-
matics and Computer Science. It is used for proving statements which depend on
the integers.

Suppose we want to find a formula for the sum of the first n odd integers:
For n = 1 the sum is 1 = 12 .
For n = 2 the sum is 1 + 3 = 4 = 22 .
For n = 3 the sum is 1 + 3 + 5 = 9 = 32 .
For n = 4 the sum is 1 + 3 + 5 + 7 = 16 = 42 .
Based on the above, it seems reasonable to conjecture that
1 + 3 + 5 + 7 + . . . + (2n − 1) = n2 .
But how does one rigorously prove this?

To start with, let us look at what it means for a statement (or proposition) to
depend on a positive integer n .

Example 1.11. Let P (n) be the statement

1 + 3 + 5 + 7 + . . . + (2n − 1) = n2 .

Then:
Discrete Mathematics Page 10

P (3) states that 1 + 3 + 5 = 32 ;

P (4) states that 1 + 3 + 5 + 7 = 42 .

Both P (3) and P (4) are true.

Example 1.12. Let Q(n) be the statement 3n > (n + 2)2 . Then

Q(2) states that 32 > 42 ;

Q(3) states that 33 > 52 .

Q(2) is false, whereas Q(3) is true.

Example 1.13. Let P (n) be the statement n2 + n is even. Write down P (1) ,
P (3) , P (k) and P (k + 1) .
Solution. P (1) states that 12 + 1 is even. P (3) states that 32 + 3 is even. P (k)
states that k 2 + k is even. P (k + 1) states that (k + 1)2 + k + 1 is even.

Example 1.14. Let P (n) be the statement 50 − 2n ≥ 0 . Write down P (1) , P (5)
and P (30) . Which of them are true?
Solution. P (1) states that 50 − 2 ≥ 0 which is true. P (5) states that 50 − 10 ≥ 0
which is true. P (30) states that 50 − 60 ≥ 0 which is false.

Example 1.15. Let {an } be the sequence defined by a1 = 4 and

an+1 = 2an − 3 , n ≥ 1 .

Let P (n) be the statement an = 2n−1 + 3 . Write down P (1) , P (2) and P (3) .
Which of them are true?
Solution. P (1) states that a1 = 20 + 3 , that is, a1 = 4 . Hence P (1) is true. P (2)
states that a2 = 21 + 3 , that is, a2 = 5 . P (3) states that a3 = 22 + 3 , that is,
a3 = 7 .
We use an+1 = 2an − 3 to find a2 and a3 : a2 = 2a1 − 3 = 2(4) − 3 so a2 = 5,
and a3 = 2a2 − 3 = 2(5) − 3 so a3 = 7. Hence, both P (2) and P (3) are true.
Page 11 Dr Adrian Turcanu & Dr Mateja Presern

Let P (n) be a proposition which depends n where n is a positive integer.

A proof by mathematical induction that P (n) is true for all positive integers n
consists of two steps.

Mathematical induction

Step 1. Show that P (1) is true.

Step 2. Let k be any positive integer. Show that if P (k) is true

then P (k + 1) is true.

Example 1.16. Use Mathematical Induction to prove that for any positive integer
n
1 + 3 + 5 + 7 + . . . + (2n − 1) = n2
Solution. Let P (n) be the statement

1 + 3 + 5 + 7 + . . . + (2n − 1) = n2 .

P (1) states that 1 = 12 which is correct. Hence P (1) holds.

Assume P (k) is true: 1 + 3 + 5 + 7 + . . . + (2k − 1) = k 2 . Then

1 + 3 + 5 + 7 + . . . + (2k − 1) + (2k + 1) = [1 + 3 + 5 + 7 + . . . + (2k − 1)] + 2k + 1.

Since P (k) holds,

1 + 3 + 5 + 7 + . . . + (2k − 1) + (2k + 1) = k 2 + 2k + 1 = (k + 1)2 .

This shows that P (k + 1) follows from P (k) .

By the principle of mathematical induction, P (n) is true for any positive integer
n .

Remark 1.4. In the above example, P (n) is a statement. It is not an algebraic

expression. It is not acceptable to write P (n) = n2 .

Example 1.17. Let {an } be the sequence defined by a1 = 1 and

an+1 = an + 8n , n ≥ 1 .

Prove by induction that an = (2n − 1)2 for all positive integers n .

Solution. Let P (n) be the statement: an = (2n − 1)2 .
P (1) is the statement: a1 = (2 − 1)2 . Since a1 = 1 , P (1) is true.
Discrete Mathematics Page 12

Assume P (k) is true: ak = (2k − 1)2 . Then

ak+1 = ak + 8k
= (2k − 1)2 + 8k
= 4k 2 + 4k + 1
= (2k + 1)2 .

Since 2(k + 1) − 1 = 2k + 1 we see that P (k + 1) follows from P (k) . By the

principle of mathematical induction, P (n) is true for any positive integer n .

To prove things by induction you should:

(a) State P (n) .

(b) Show that P (1) is true.

It is worth stressing that one needs to carry out ALL of the above steps. Indeed,
there are a number of pitfalls in using the principle of mathematical induction as
the next three examples show.

I. It does not suffice to prove only that P (1) is true. Indeed, let P (n) be the
statement: n2 = n . Then P (1) is true. But P (n) is true only when n = 1
or n = 0 and is false for n ≥ 2.

II. It does not suffice to prove only that if P (k) is true then P (k + 1) is true.
Indeed, let P (n) be the statement: n + 1 = n . Assume that P (k) holds so
that k + 1 = k . Adding 1 to each side gives k + 2 = k + 1 so P (k + 1) is
true. Of course, P (1) is false so we cannot get started.

III. Another typical error is to give a proof of P (k) implies P (k + 1) which only
works for some values of k , for example k ≥ 2 . Let P (n) be the statement:
any n students taking the same examination must all score the same mark.
Clearly P (1) is true.
Assume that P (k) is true and let any group of k + 1 candidates be given.
Let A and B be particular candidates and C the remaining group of k − 1
students. Then A and C form a group of k students so by P (k) they all
have the same mark. The same is true of B and C . Thus A and B and the
group C score the same mark so P (k + 1) is true. By induction, P (n) holds
for all n .
The mistake is that the group C must not be empty. If k = 1 then C is
empty and the marks of students from the groups A and C do not coincide.
We have proved that P (k) is true implies that P (k + 1) is true if k ≥ 2 but
there is no way of establishing P (2) which is false.
Page 13 Dr Adrian Turcanu & Dr Mateja Presern

Example 1.18. Let {an } be the sequence defined by a1 = 1 and

√
an+1 = 2 + an , n ≥ 1 .

Prove by induction that an ≤ 2 for all positive integers n .

Solution. Let P (n) be the statement: an ≤ 2 . Since a1 = 1 , P (1) is true.
Assume that P (k) holds. Thus ak ≤ 2. Then
√
ak+1 = 2 + ak
√
≤ 2+2
= 2,

that is, P (k + 1) is true. By induction P (n) is true for all n .

The next example uses the factorial function n! . The number 5! (read five
factorial or factorial 5) means 5 × 4 × 3 × 2 × 1 . Note that 1! = 1 and that by
convention 0! = 1 . The reader should check by calculation that 4! = 24 and use a
calculator to check that 7! = 5040 .

Example 1.19. Prove by induction that 2n < n! for n ≥ 4 .

Solution. Let P (n) be the statement: 2n < n! .
We first check that P (4) is true. This is correct since 4! = 24 > 16 = 24 .
Assume that k ≥ 4 and that P (k) holds so that 2k < k! . Then

2k+1 = 2 2k

< 2 (k!)
< (k + 1) k!
= (k + 1)!.

Hence 2k+1 < (k + 1)! so P (k + 1) is true.

By induction P (n) is true for all n .

1.4.1 Exercises

Exercise 1.4.1. Let P (n) be the statement 3n − 100 ≥ 0 . Write down P (1) ,
P (4) and P (40) . Which of them are true?
Discrete Mathematics Page 14

Exercise 1.4.2. Let {an } be the sequence defined by a1 = 3 and

an+1 = 5an − 8 , n ≥ 1 .

Let P (n) be the statement an = 5n−1 + 2 . Write down P (1) and P (2) . Which of
them are true?

Exercise 1.4.3. Prove by induction that 3n < n! for n ≥ 7 .

Exercise 1.4.4. Let {an } be the sequence defined by a1 = x and

a2n + 12
an+1 = , n≥1
7
where 3 < x < 4. Prove by induction that 3 < an < 4 for all n.

Exercise 1.4.5. For n ≥ 2 the sequence {sn } is given by by

1 1 1
sn = 1 − 1− ... 1 −
2 3 n

so that s2 = 12 and s3 = 31 · By computing more values of sn if needed, guess a

formula for sn and use induction to prove it.
Page 15 Dr Adrian Turcanu & Dr Mateja Presern

1.5 Relations
Let A and B be sets. A relation between A and B links certain elements of A with
certain elements of B. More formally, a relation from A to B is a subset of A × B.
A relation on the set A is a relation from A to A.

Here are some examples of relations:

1. Let A be the set of students at Heriot-Watt University and let B be the set
of modules. The relation R consists of pairs (a, b) , a ∈ A , b ∈ B with the
student a being enrolled to the course b.
2. Let R be the relation on the set of people consisting of pairs (a, b) where a is
a parent of b.
3. Let A = {1, 2, 3, 4} and B = {x, y} . Then
R = { (1, y), (2, x), (3, x), (4, y) }
is a relation from A to B.
4. The following are relations on the set of integers:
R1 = {(a, b)|a ≥ b},
R2 = {(a, b)|a − b = 4}.
5. Let A = {1, 2, 3, 4} and let R be the relation on A given by
R = {(a, b)|a divides b }.
Then R contains the pairs
(1, 1) (1, 2) (1, 3) (1, 4) (2, 2) (3, 3) (4, 4) (2, 4) .

If S is a (small) finite set, then we can represent a relation R on S by a directed

graph, a network of blobs and arrows. We have a blob for each element of S, and
draw an arrow from x to y if (x, y) ∈ R.
Let S = {a, b, c, d} and let R be the relation defined by the subset
{(a, a), (a, b), (b, b), (b, c), (c, b), (c, d), (d, d)}.
The directed graph encoding this relation is shown below.
Definition 1.20. Let R be a relation on a set A. We say that R is
reflexive if (a, a) ∈ R for all elements a in A;
symmetric if whenever (a, b) ∈ R then (b, a) ∈ R;
antisymmetric if whenever (a, b) ∈ R and (b, a) ∈ R then a = b;
transitive if whenever (a, b) ∈ R and (b, c) ∈ R then (a, c) ∈ R.
A description of the above properties in terms of graphs was given in class.
Discrete Mathematics Page 16

Example 1.21. As an example, consider the relation in item 5 above.

It is reflexive since it contains all pairs of the form (a, a) , namely,

(1, 1) (2, 2) (3, 3) (4, 4)

It is not symmetric since, for example, it contains (1, 2) but not (2, 1) .
It is antisymmetric since there is no pair of elements a and b with a ̸= b such
that (a, b) and (b, a) belong to R.
It is transitive. To check this we have to show that if (a, b) and (b, c) belong to
R, then so does (a, c) .

Example 1.22. As another example, let R be the relation on the set of people
consisting of pairs (a, b) where a likes b. Unfortunately it is not symmetric. It is
not antisymmetric or transitive. It is probably not even reflexive.

Definition 1.23. A relation that is reflexive, symmetric and transitive is called an

equivalence relation.

Example 1.24. Let R be the relation on the set A = {1, 3, 5, 9, 11, 18} defined by
the pairs (a, b) such that a − b is divisible by 4.

Given a set A with an equivalence relation R on it, we can break up all elements in
A into disjoint groups (subsets) such that within each group all elements are related
between themselves but no two elements from two different groups are related. Such
groups of elements are called equivalence classes. For Example 1.24 the equivalence
classes are C1 = {1, 5, 9}, C2 = {3, 11}, C3 = {18}.
Page 17 Dr Adrian Turcanu & Dr Mateja Presern

1.5.1 Closures of relations

Definition 1.25. Let R be a relation on a set A. The reflexive closure of R is the
smallest relation Rr on A that is reflexive and contains R.

Remark 1.5. To obtain Rr the reflexive closure of a relation R defined on a set A

we add all the pairs (a, a) with a inA, i.e. Rr = R ∪ {(a, a)|a ∈ A}.

Example 1.26. Let R = {(1, 1), (1, 2), (1, 3), (2, 3), (3, 4)} be a relation on the set
A = {1, 2, 3, 4}.
Then Rr = {(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3), (3, 4), (4, 4)} is the reflexive
closure of R.

Definition 1.27. Let R be a relation on a set A. The symmetric closure of R is

the smallest relation Rs on A that is symmetric and contains R.

Remark 1.6. To obtain Rs the symmetric closure of a relation R defined on a set

A, for each pair (a, b) ∈ R with a ̸= b, if (b, a) ∈
/ R then we add the pairs (b, a), i.e.
Rs = R ∪ {(b, a)|(a, b) ∈ R, a ̸= b}.

Example 1.28. Let R = {(1, 1), (1, 2), (1, 3), (2, 3), (3, 4)} be a relation on the set
A = {1, 2, 3, 4}.
Then Rs = {(1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2), (3, 4), (4, 3)} is the sym-
metric closure of R.

Definition 1.29. Let R be a relation on a set A. The transitive closure of R is the

smallest relation Rt on A that is transitive and contains R.

Remark 1.7. To obtain Rt the transitive closure of a relation R defined on a set A,

for each pairs (a, b) ∈ R and (b, c) ∈ R, if (a, c) ∈
/ R then we add the pair (a, c), i.e.
Rt = R ∪ {(a, c)|(a, b) ∈ R and (b, c) ∈ R}.

Example 1.30. Let R = {(1, 1), (1, 2), (2, 3), (3, 2)} be a relation on the set A =
{1, 2, 3}.
Then Rt = {(1, 1), (1, 2), (2, 3), (1, 3), (3, 2), (2, 2), (3, 3)} is the transitive closure
of R.
Discrete Mathematics Page 18

1.5.2 Exercises

Exercise 1.5.1. Let A = {2, 3, 4, 5, 6} and let R be the relation on A given by

R = {(a, b)|a divides b }

Write down the ordered pairs in R.

Exercise 1.5.2. Let R be the relation on the set of real numbers consisting of
pairs (a, b) with a ≤ b. Determine whether the relation is reflexive, symmetric.
antisymmetric, or transitive.

Exercise 1.5.3. Let A = {1, 2, 3} and let R be the relation on A given by

R = {(1, 2), (2, 2), (2, 3)}

Find:
(a) the reflexive closure of R
(b) the symmetric closure of R
(c) the trasitive closure of R

1.6 Functions
Let X and Y be sets.
Definition 1.31. A function from X to Y is a rule that assigns to each element of
X exactly one element of Y . We write f : X → Y . The set X is called the domain
of f and the set Y is called the codomain of f .
Here are some examples of functions.

Example 1.32. f : R → R defined by the formula f (x) = x2 is a real-valued

function of the kind studied in calculus courses.

Example 1.33. Let A = {1, 5, 7}, B = {3, 8, 22, 30}. A function f : A → B is

defined as
f (1) = 22 , f (5) = 3 , f (7) = 22 .
Page 19 Dr Adrian Turcanu & Dr Mateja Presern

Example 1.34. Function f : N → N is defined so that f (1) = 1 and f (n + 1) =

(n + 1)f (n). The latter is called a recurrence relation. It allows one to compute
values of f starting from the one given explicitly. For example f (2) = f (1 + 1) =
2f (1) = 2 · 1 = 2. It is actually computing the factorial: f (n) = n!.

Example 1.35. The next example is called the McCarthy 91 function and was
given by John McCarthy, one of the founders of artificial intelligence.
Here f : N → Z is defined for positive integers n by

n − 10 if n ≥ 101
f (n) =
f (f (n + 11)) if n ≤ 100

If we try to find f (1) directly, we put n = 1 in the definition. This shows that f (1)
depends on f (12) which in turn depends on f (23) etc. It seems hopeless to find
f (1) in this way. The trick is to work backwards. Putting n = 100 in the definition
gives
f (100) = f (f (111)) = f (101) = 91
Putting n = 99 in the definition gives

f (99) = f (f (110)) = f (100) = 91 .

Continuing in this way we can show that f (n) = 91 for all n ≤ 100 .

If f : X → Y is any function, the set of images

Range(f ) := {y ∈ Y | y = f (x) for some x ∈ X}

is called the range of f .

In Example 1.32 the range of f is the set of all non-negative real numbers:
Range(f ) = {y ∈ R | y ≥ 0}. In Example 1.33 the range is the set Range(f ) =
{3, 22}. For the McCarthy 91 function (Example 1.35) the range is Range(f ) =
{y ∈ N | y ≥ 91}.

Definition 1.36. A function is called surjective if its range is equal to its codomain.

None of the above examples defines a surjective function. However it is easy to

construct functions from the above examples which are surjective by simply modi-
fying the codomain. For instance f (x) = x2 becomes surjective if we consider it as
a function from all real numbers into the non-negative real numbers: f : R → R≥0 .

Definition 1.37. A function is called injective if no two distinct elements of the

domain have the same image.
Discrete Mathematics Page 20

In the above only the factorial function of Example 1.34 is injective.

Given two functions f : A → B, g : B → C we can define their composition

g ◦ f : A → C by the rule g ◦ f (x) = g(f (x)). The notation g ◦ f needs to be read
from right to left: we first apply f then g.

Example 1.38. Let f : R → R be defined as f (x) = 2x2 and g : R → R is defined

as g(x) = 3x + 4 then

(f ◦ g)(x) = f (g(x)) = f (3x + 4) = 2(3x + 4)2 = 2(9x2 + 24x + 16) = 18x2 + 48x + 32

while
(g ◦ f )(x) = g(f (x)) = g(2x2 ) = 3(2x2 ) + 4 = 6x2 + 4

The identity function on a set A is the function i : A → A defined so that

i(x) = x.

Definition 1.39. Let f : A → B and g : B → A be functions. If g ◦ f : A → A is

the identity function on A and f ◦ g : B → B is the identity function on B then it
is said that f is the inverse of g (and g is the inverse of f ).

A function f that has an inverse is said to be invertible.

The following theorem is true.

Theorem 1.40. A function f is invertible if and only if it is surjective and injective.

Remark 1.8. A function that is both injective and surjective is often called bijective.

1.6.1 Exercises

Exercise 1.6.1. For each of the following functions find their range. Determine
whether the function is surjective and whether it is injective.

(a)(∗) f : R → R , f (x) = 2x + 1 (b) g : R → R , g(x) = x4 + 1 ,

(c)(∗) h : X → Y , h(s) = number of ones in s ,

(d) j : X → Y , j(s) = the first bit of s
where X be the set of all finite non-empty strings of bits and let Y be the set of all
non-negative integers.
Page 21 Dr Adrian Turcanu & Dr Mateja Presern

Exercise 1.6.2. Let the functions f , g and h have R as their domain and codomain,
and be defined as
(
1 if x ≥ 0
f (x) = 4x − 3 , g(x) = x2 + 1 , h(x) = .
0 if x < 0

Find rules for the following functions (with domain and codomain R)

(a) f ◦ f , (b) h ◦ f , (c) h ◦ g .

Exercise 1.6.3. Find the inverse function for each of the following functions, or
explain why no inverse exists.
√
(a) g : R → R , g(x) = x2 ,

(b) j : S → S
where S is the set of non-empty finite strings of lower case letters, and j(s) gives
a string obtained by moving the last character to the beginning of the string, e.g.
j(yncrfm) = myncrf.
Discrete Mathematics Page 22

1.7 The Sum and Product Rules

Consider the following examples of counting problems

Example 1.41. The mathematics teaching committee requires one student repre-
sentative from either the first year or the second year or the third year. If there
are 500 first year, 300 second year and 100 third year students, how many possible
different representatives are there?
Solution. There are obviously 500 different choices for picking a representative from
the first year students, 300 for the second and 100 for the third year. Since the
students are all different and we have to pick only one representative from either
year we can add the numbers of choices for a representative from each year to obtain
900 different possible representatives.

The solution to the above problem follows form a general rule.

The sum rule

If there are n(A) ways to do A and, distinct from them, n(B) ways
to do B, then the number of ways to do A or B is n(A) + n(B).
Similarly, one adds n(A), n(B), n(C) when one must do A or B or C;
and so on.

Consider a different problem.

Example 1.42. The mathematics student committee is composed of three students:

one from each of the first year, the second year and the third year. If there are 500
first year, 300 second year and 100 third year students, how many possible different
committees are there?
Solution. There are 500 choices for a representative from the first year students.
For each of those representatives there are 300 ways to pick a representative from
the second year students. So we have 500 × 300 = 150, 000 ways to pick the two
representatives from the first and second year students. For each choice of those (for
each pair of representatives) we have 100 choices for a third year representatives. In
total this gives us 150, 000 × 100 = 15, 000, 000 different possible committees. Thus
in this problem we had to multiply through the numbers for each respective choice:
500 × 300 × 100.

The solution to the above problem follows form a general rule.

Page 23 Dr Adrian Turcanu & Dr Mateja Presern

The product rule

If there are n(A) ways to do A and n(B) ways to do B, then the

number of ways to do A and B is n(A) × n(B). It is assumed that
A and B are independent; that is, the number of choices in B is the
same regardless of which choice in A is taken. Similarly, one multiplies
theree terms: n(A) × n(B) × n(C) if one has to do choices in A, B,
and C independently; and so on.

To solve the following example we need to use both of the above rules.

Example 1.43. In a certain computer system a file name must be a string of letters
and digits which is 1,2, or 3 symbols long. The first symbol must be a letter and
the system does not make any distinction between uppercase and lowercase letters.
How many legitimate file names are there?
Solution. A file name either has one symbol, or two symbols or three symbols. In
each case we have 26 choices for the first letter (the 26 letters of the alphabet).
The number of one-symbol names is then 26. The number of 2-symbol names is
26 × 36 = 936 because for the second symbol we can have either a letter or a
digit, 36 = 26 + 10 choices. We multiply the number of choices for the first and
for the second symbol because both choices are done independently. Similarly the
number of three-symbol names is 26 × 36 × 36 = 33696. Summing the numbers for
the three different cases (by the sum rule) we obtain that the system at hand can
accommodate 26 + 936 + 33696 = 34658 different files.

1.7.1 Exercises

Exercise 1.7.1. A binary word is made of 0’s and 1’s. How many binary words are
there with length exactly 7? How many binary words are there with length up to
3?

Exercise 1.7.2. A salesman is to visit 5 towns. Assuming that he goes to one town
after another, visiting all of them once, in how many orders can he make his trip?

Exercise 1.7.3. How many numbers between 1 and 1000 have exactly one 7 in
them?
Discrete Mathematics Page 24

1.8 Permutations and Combinations

Permutations and combinations are the basic building blocks of combinatorial math-
ematics.

Definition 1.44. A permutation of n objects taken k at a time is any ordered

selection of k distinct objects from a given set of n objects.

Note that the order in which the selection is made matters. Thus for a set of 5
letters {B, S, W, K, L} the lists {B, W, K}, {W, B, K} are two different permutations
of the 5 given letters taken 3 at a time.
Repetitions are not allowed. Thus for the above example a list {B, B, L} is not
a permutation. We will often say “ordered selection without repetitions” instead of
“permutations” as this is self-descriptive.

Example 1.45. In how many ways can a committee with 10 members select a
chairperson, a treasurer and an administrator?
Solution. Suppose that the first person to be selected is to be the chairperson. There
are 10 different was to do it. Suppose the second person selected is the treasurer for
whom we have 9 choices. Finally, suppose the last person chosen is the administrator,
for whom there are 8 possible people available. By the product rule the number of
selections is 10 × 9 × 8 = 720.

In the above example we had to make an ordered selection of 3 distinct people

from a group of 10 people. For general permutations the following theorem holds.

Theorem 1.46. The number of ordered selections of k distinct objects from a set
of n objects is n(n − 1)(n − 2) . . . (n − k + 2)(n − k + 1).

Remark 1.9. The above number can be alternatively written as

n!
n(n − 1)(n − 2) . . . (n − k + 2)(n − k + 1) = (1.1)
(n − k)!

where we used the factorial notation

n! = n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1

Note that by convention one defines

0! = 1 .

With this convention the expression on the right hand side in (1.1) gives the same
result for n = k as the expression on the left hand side.
Page 25 Dr Adrian Turcanu & Dr Mateja Presern

Proof of Theorem 1.46. There are n ways to select the first object. After the first
object is selected there are n − 1 objects left and there are respectively n − 1 choices
for the second object. We continue selecting until the last object for which there
are n − k + 1 choices. By the product rule these selections can be made in n(n −
1)(n − 2) . . . (n − k + 1) ways.
Remark 1.10. Observe that using formula (1.1) for k = n we find that the number
of different ways to order n objects is n!.
Consider now a different kind of selections called combinations.

Definition 1.47. A combination of n objects taken k at a time is any selection of

k distinct objects from a given set of n objects.
In this case the order of objects selected does not matter. We will often say
“unordered selection without repetitions” rather than “combination”.
Theorem 1.48. The number of unordered selections of k objects from a set of n
objects without repetitions is
n(n − 1) . . . (n − k + 1) n!
= (1.2)
k! (n − k)!k!
The combination of factorials on the right hand side is called a binomial coefficient
and is denoted
n n!
:= . (1.3)
k (n − k)!k!
Proof. We already know that the number of ordered selections is given by formula
(1.1). To pick an ordered selection of k elements one can first make an unordered
selection of k elements and then choose an order. We know that there are k! different
ways to order k elements. Thus by the product rule

n! n
= × k!
(n − k)! k

where nk stands for the number of unordered selections. Dividing both sides by k!

we obtain formula (1.2).

The binomial coefficients (1.3) appear in the binomial formula that gives coeffi-
cients for an expansion of a power
n
n
X n k n−k
(x + y) = x y . (1.4)
k=0
k

For example

5 5 0 5 5 1 4 5 2 3 5 3 2 5 4 1 5 5 0
(x + y) = xy + xy + xy + xy + xy + xy
0 1 2 3 4 5
= y 5 + 5xy 4 + 10x2 y 3 + 10x3 y 2 + 5x4 y + x5 .
Discrete Mathematics Page 26

Example 1.49. In the card game bridge, a hand has 13 cards. There are 52 cards
in a deck. How many different bridge hands are there?
Solution. We have to make an unordered selection of 13 cards out of 52. Repetitions
are not allowed. By the above theorem there are 52 13
= 635013559600 different
bridge hands.

Example 1.50. A full house in poker is a collection of 5 cards in which 3 of them

are from one denomination and 2 from another. In a pack of cards there are 13
denominations (2,3,...,queen,king,ace) and 4 cards of each. How many different full
houses are there?
Solution. The order in which we pick cards for a full house does not matter so we
can specify a convenient order. Let us first pick a denomination for three cards.
There are 13 choices. Next we have to pick 3 cards out of 4 in this denomination.
Since the order does not matter we have 43 = 4 choices. Next let us pick the second
denomination. We have 12 choices for it. To pick 2 cards out of 4 in the chosen
4
denomination we have 2 = 6 choices. Finally by the product rule we multiply
through the choices to obtain 13 × 4 × 12 × 6 = 3744 — the number of different full
houses.

1.8.1 Exercises

Exercise 1.8.1. What is the number of ways in which a subset of one or two
elements can be picked up from a set of n elements?

Exercise 1.8.2. A university department has 12 male professors and 3 female pro-
fessors. It is decided that every committee of professors in this department should
have at least one female member. How many different committees of 12 people can
be formed?

Exercise 1.8.3. The computer science students are entering a team of 5 in the
university relay race. There are 20 undergraduates and 10 postgraduates. How
many teams are possible if
(a)(∗) any student is eligible
Page 27 Dr Adrian Turcanu & Dr Mateja Presern

(b) there must be 2 postgraduates and 3 undergraduates

Exercise 1.8.4. How many ways are there to pick a combination of k distinct
numbers from {1, 2, . . . , n} if 1 and 2 cannot both be picked?

1.9 The Inclusion-Exclusion Principle

If a set A contains a finite number of elements then it is called a finite set and we
write |A| for the number of elements in it. An infinite set is a set that is not finite.
The number |A| is called the cardinality of A. Hence |{x, y, z}| = 3 and the set of
positive integers is infinite.

Inclusion-Exclusion Principle

If A and B are finite sets, then

|A ∪ B| = |A| + |B| − |A ∩ B|. (1.5)

For instance, suppose we want to find the number of cards in an ordinary deck
that are clubs or aces. Let A be the set of cards that are aces and let C be the set
of cards that are clubs.
We have |A| = 4 , |C| = 13 and |A ∩ C| = 1 (the ace of clubs). Then the
Exclusion Principle gives us
|A ∪ C| = |A| + |C| − |A ∩ C| = 4 + 13 − 1 = 16.

Example 1.51. A survey in Scotland finds that 86% of the population have heard
of product A while 80% have heard of product B. Also, 70% of the population
have heard about both products. What percentage of the population have not
heard about either product?
Solution. Let A be the set of people who have heard about product A and define
B similarly. Let N be the population of Scotland.
From the given data, |A| = (0.86)N , |B| = (0.8)N and |A ∩ B| = (0.7)N .
Then

|A ∪ B| = |A| + |B| − |A ∩ B| = (0.86)N + (0.8)N − (0.7)N = (0.96)N.

Hence 96% of the population have heard of either product A or product B and 4%
have not heard about either product.
Discrete Mathematics Page 28

The Inclusion-Exclusion Principle for the union of three sets reads

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C| (1.6)

In the lectures I will indicate why (1.6) holds by means of a Venn diagram.

Example 1.52. How many integers from 1 to 100 are divisible by 2 or 3 or 5?

Solution. U = {1, 2, . . . , 100} , A = {n ∈ U | 2 divides n}
B = {n ∈ U | 3 divides n} , C = {n ∈ U | 5 divides n}
We need to find |A ∪ B ∪ C|.
We have that |A| = 100/2 = 50 , |C| = 100/5 = 20 , and |B| is the integer part of
100/3 , that is |B| = 33 .
For a number to be in the set A ∩ B it must be divisible by 6. Hence |A ∩ B| = 16 .
Numbers in the set A ∩ B ∩ C are divisible by 30. Hence |A ∩ B ∩ C| = 3 .
Similarly, |A ∩ C| = 10 and |B ∩ C| = 6 so

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|

and
|A ∪ B ∪ C| = 50 + 33 + 20 − 16 − 10 − 6 + 3 = 74.

Example 1.53. At the University of Scabia all first year computer science students
have to study at least one of the three modules : accounting, ethics and mathematics.
There are 237 first year computer science students. Also, 150 study maths, 120
study accounting, 100 study ethics, 50 study both accounting and maths, 20 study
both accounting and ethics and 70 study both maths and ethics.

(a) Find the number who study all three subjects.

(b) Find the number who study accounting and maths but not ethics.

Solution. Let A be the set of students studying accounting, M those doing maths
and E those doing ethics. From the given data,

|A| = 120 |E| = 100 |M | = 150

|A ∩ M | = 50 |A ∩ E| = 20 |E ∩ M | = 70
Also, since everyone is studying at least one of the three modules,

|A ∪ E ∪ M | = 237
Page 29 Dr Adrian Turcanu & Dr Mateja Presern

Using

|A ∪ E ∪ M | = |A| + |E| + |M | − |A ∩ E| − |A ∩ M | − |E ∩ M | + |A ∩ E ∩ M |

237 = 120 + 100 + 150 − 20 − 50 − 70 + |A ∩ E ∩ M |

|A ∩ E ∩ M | = 7
Hence there 7 students who study all three subjects.
The set of students not studying ethics is E .
Hence the number who study accounting and maths but not ethics is A ∩ M ∩ E .
Let X = A ∩ M ∩ E and Y = A ∩ M ∩ E . Then

X ∩Y =∅ X ∪Y =A∩M

By (1.5) , |X ∪ Y | = |X| + |Y | − |X ∩ Y | so

|A ∩ M | = |A ∩ M ∩ E| + |A ∩ M ∩ E|

Hence
50 = 7 + |A ∩ M ∩ E|
and |A ∩ M ∩ E| = 43 .

1.9.1 Exercises

Exercise 1.9.1. A and B are sets with |A| = 30 and |B| = 20 . Find A ∪ B if

(a) A ∩ B = ∅ ,

(b) |A ∩ B| = 5 ,

Exercise 1.9.2. A , B and C are sets with |A| = |B| = |C| = 100 .
Find |A ∪ B ∪ C| in the following cases.
(a) A = B = C .
(b) A ∩ B = A ∩ C = B ∩ C = ∅ .
(c) |A ∩ B| = |A ∩ C| = |B ∩ C| = 50 and A ∩ B ∩ C = ∅ .
(d) |A ∩ B| = |A ∩ C| = |B ∩ C| = 50 and |A ∩ B ∩ C| = 20 .
Discrete Mathematics Page 30

Exercise 1.9.3. How many integers from 1 to 1000 are divisible by 7 or 11?

Exercise 1.9.4. How many integers from 1 to 200 are divisible by 2 or 5 or 7?

Exercise 1.9.5. At the University of Scabia all first year students have to study
at least one of the three modules : computing, physics and mathematics. There
are 1400 first year students. Also, 700 study maths, 600 study physics, 500
study computing, 200 study both physics and maths, 150 study both physics and
computing and 120 study both maths and computing.

(a) Find the number who study all three subjects.

(b) Find the number who study computing and maths but not physics.
Chapter 2

Probability

2.1 Probability Space

Probability theory provides mathematical models for understanding uncertain sit-
uations. We call such situations random experiments. Some examples of random
experiments are:

flipping of a coin;

rolling a die;

next week’s lottery draw;

computer software is designed to process random amounts of data customers

will put in: for a random input data measuring the amount of time needed to
process it by the program is a random experiment;

the top link in a Google search (it will either lead to the desired information
or it won’t);

generating a number using a C ++ random number function “rand( )”.

A probability model, although it cannot be used to determine (in advance) the

outcome of a random experiment, should capture the essential features of the random
experiment. Thus it has to be consistent with any observable “statistical regularity”
displayed by the random phenomenon.

A probability model for a random experiment has two ingredients: a sample

space and a probability function.

Definition 2.1. The sample space S for a random experiment is the set of all
possible outcomes of the experiment.

31
Discrete Mathematics Page 32

Here are some examples of sample spaces:

Experiment Sample space

Flip a coin once S = {H, T }

Roll a die once S = {1, 2, 3, 4, 5, 6}

Flip a coin repeatedly until you obtain a “head” S = {H, T H, T T H, T T T H, . . . }

The sample spaces in these examples are all discrete (the elements can be counted)
with the first two spaces being finite and the last one infinite. In applications a sam-
ple space can be also a continuous set. In this course we will only deal with discrete
sample spaces.
Definition 2.2. Any subset of a sample space is called an event.
Consider for example the rolling a die experiment. The sample space is S =
{1, 2, 3, 4, 5, 6}. The subset A = {2, 4, 6} is the event that the outcome of the roll is
even. The subset {1, 2, 3, 4} is the event that the outcome is at most 4. A∩B = {2, 4}
is the event that outcome is even and at most 4. A ∪ B = {1, 2, 3, 4, 6} is the event
that the outcome is even or at most 4. Note that any element of the sample space S
can be viewed as a subset and thus also represents an event. For example in rolling
a die model C = {2} is the event that the outcome is 2.

Example 2.3. Suggest suitable sample spaces, and identify the subset correspond-
ing to the event A, for the following situations:
(a) A coin, which shows heads (H) or tails (T), is tossed three times; A is the event
that the coin shows head twice.
(b) A game of football is played; A is the event that the match is drawn.
(c) A couple have two children; A is the event that both are girls.
Solution. (a) We can take
S = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }
and
A = {HHT, HT H, T HH}.
(b) One option is to take
S = {(n, m) | n, m are integers ≥ 0}
where n is the first team’s score and m is the second team’s score. In this case
A = {(n, m) | n = m}.
Alternatively we can consider S = {W, D, L} with W meaning the first team
wins, D — it’s a draw, L — first team loses. In this case A = {D}.
Page 33 Dr Adrian Turcanu & Dr Mateja Presern

(c) We can take S = {GG, GB, BG, BB} where, for example, GB means first child
is a girl, second child is a boy. In this case, A = {GG}. Alternatively we can
take S = {0, 1, 2} with each sample point corresponding to number of girls born.
In this case, A = {2}.

The second ingredient in a probability model is the probability function.

Definition 2.4. The probability function is a function P (E) that assigns a value
between 0 and 1 to any event E ⊂ S. The value 0 ≤ P (E) ≤ 1 is called the
probability of event E.

P (E) indicates, on a scale from 0 to 1, how “likely” it is that the outcome of

the random experiment will be some element of the event E ⊂ S. P (E) = 0 means
that it is impossible that the outcome will be in set E. P (E) = 1 means that it is
certain that the outcome will be in set E.
The values that a probability function takes have to satisfy certain consistency
requirements, formulated below.

Probability axioms

1. For all subsets E ⊂ S of the sample space 0 ≤ P (E) ≤ 1;

2. P (∅) = 0, P (S) = 1;

3. For any two disjoint subsets: A, B ⊂ S, A ∩ B = ∅ we have

P (A ∪ B) = P (A) + P (B) .

Axiom 3 can be extended to the case where one has more than two events: if the
events E1 , E2 , . . . En are mutually exclusive, that is Ei ∩ Ej = ∅ for any i ̸= j, then

P (E1 ∪ E2 ∪ · · · ∪ En ) = P (E1 ) + P (E2 ) + · · · + P (En ) .

In particular for any finite subset E the probability P (E) is given by the sum of
probabilities assigned to its individual points.
This gives a recipe for specifying and computing a probability function. By
Axiom 2 the sum of the probabilities assigned to all sample points must by equal to
1:
n
X
for S = {1, 2, 3, . . . , n} one has pi = 1, (2.1)
i=1

where we used the notation pi = P ({i}). To specify a probability space it suffices

to specify the values pi ≥ 0 for each of its sample point satisfying (2.1).
Discrete Mathematics Page 34

Example 2.5. Consider again the experiment in which we roll a die. The sample
space is S = {1, 2, 3, 4, 5, 6}. If we think that the die is fair, that is all possibilities
are equally likely, then we should set p1 = p2 = ... = p6 = 16 . Then condition (2.1)
is satisfied.

In general for a finite sample space S with N elements one can define an equiprob-
able probability function by assigning pi = N1 for each sample point i ∈ S. This
means that all outcomes are equally likely.

WARNING: not all probability functions on finite sets are equiprobable!

For any event E we call its complement E = S \ E an event “not E” (any other
outcome but E). The following theorem holds.

Theorem 2.6. For any event E ⊂ S we have

P (E) = 1 − P (E) . (2.2)
Proof. The sample space S can be represented as a union of two disjoint sets: S =
E ∪ E. Therefore by Axioms 2 and 3: 1 = P (S) = P (E) + P (Ē). Now subtract
P (E) from both sides to obtain (2.2).
Axiom 3 tells us about the probability of a union of two sets when those sets are
disjoint. A natural question arises — what can be said about P (A ∪ B) when A and
B are not disjoint? The answer is given by the following theorem.

Theorem 2.7. For arbitrary events A, B ⊂ S we have

P (A ∪ B) = P (A) + P (B) − P (A ∩ B) . (2.3)
Proof. From the Venn diagram it can be seen that we can represent each set A, B,
A ∪ B in terms of disjoint unions:
A = (A \ B) ∪ (A ∩ B) , B = (B \ A) ∪ (A ∩ B) ,
A ∪ B = (A \ B) ∪ (A ∩ B) ∪ (B \ A) .
By axiom 3 we have
P (A) = P (A \ B) + P (A ∩ B) , P (B) = P (B \ A) + P (A ∩ B) ,
P (A ∪ B) = P (A \ B) + P (A ∩ B) + P (B \ A)
Adding together the first two equations and subtracting the third one we obtain
P (A) + P (B) − P (A ∪ B) = [P (A \ B) + P (A ∩ B)] + [P (B \ A) + P (A ∩ B)]
−[P (A \ B) + P (A ∩ B) + P (B \ A)]
= P (A ∩ B)
and (2.3) follows by rearranging the terms.
Page 35 Dr Adrian Turcanu & Dr Mateja Presern

Note the similarity of the statement of last theorem to the Inclusion-Exclusion

Principle. The similarity extends even further. We will be using the following
theorem.

Theorem 2.8. For any three events A, B, and C we have

P (A∪B∪C) = P (A)+P (B)+P (C)−P (A∩B)−P (A∩C)−P (B∩C)+P (A∩B∩C) .

The proof uses the same idea as in the proof of Theorem 2.7. You are warmly
encouraged to try and write it down yourself as an exercise!

Example 2.9 (*). If P (A) = 2/3 and P (B) = 3/4 what are the largest and the
smallest values that P (A ∩ B) can take?
Solution. The intersection A ∩ B is a subset of A and also is a subset of B. Thus
its probability has to be smaller than P (A) and smaller than P (B). So P (A ∩ B) ≤
2
3
= P (A) - the smaller of the two. Moreover it may take that value if A ⊂ B in
which case P (A ∩ B) = P (A) = 23 . Thus the largest P (A ∩ B) can be is 2/3. Next
consider the equality: P (A ∩ B) = P (A) + P (B) − P (A ∪ B) = 1712
− P (A ∪ B). The
17 5
largest P (A ∪ B) can be is 1. Thus the smallest P (A ∩ B) can be is 12 − 1 = 12 .

Example 2.10. One often hears that the theory of probability started in the sev-
enteenth century, when a French nobleman, the Chevalier de Méré, proposed the
following problem in 1654 to his friend Pascal: Why is one more likely to obtain a
6 in four throws of a die than to obtain a double 6 in 24 throws of two dice? This
problem is known as de Méré’s paradox. We use the word paradox, because, based
on the fact that there are 6 possible results when we roll a die and 36 possible results
when we roll two dice, some people thought that the two events above should have
the same probability. Indeed, notice that the number of throws, divided by the num-
ber of possible results, is equal to 2/3 in both cases (4/6 = 24/36 = 2/3 = 0.66...).
Nowadays, we can easily compute the probability of each event. We find that the
probability of obtaining at least one 6 in four rolls of a (fair or non-biased) die is
1 − (5/6)4 = 671/1296 ≈ 0.5177, while the probability of getting at least a double
6 in throwing two dice 24 times is 1 − (35/36)24 ≈ 0.4914. One can conclude that
the Chevalier de Méré must have spent a lot of time throwing dice to discover such
a small difference.
Discrete Mathematics Page 36

2.1.1 Exercises

Exercise 2.1.1. In the following experiments decide what are the outcomes and
state if they are equally likely.

(a) A red, blue and a yellow ball are put into a bag and a ball drawn out at random.

(b) Two yellow balls, one red ball and one blue are in a bag. One ball is drawn at
random.

(c) Same collection of balls. Two balls are drawn at once.

(d) Two children throw a die to see who starts the game. If the die shows a prime
number Joe starts, otherwise Ellen starts.

Exercise 2.1.2. Prove that if A ⊂ B then P (A) ≤ P (B).

Exercise 2.1.3. A weather forecaster assigns probabilities as follows:

P (A) = P ( rain today ) = 30% , P (B) = P ( rain tomorrow ) = 40% ,

P (C) = P ( rain today and tomorrow ) = 20% .

What is the probability that it will rain tomorrow but not today?

Exercise 2.1.4. Events A and B are such that P (A) = 0.75, P (B) = 0.65. Explain
why events A and B cannot be mutually exclusive. (Two events A and B are called
mutually exclusive if A ∩ B = ∅.) What can you say about P (A ∪ B) and P (A ∩ B)?
Page 37 Dr Adrian Turcanu & Dr Mateja Presern

2.2 Conditional Probability and Independence

Consider the following example.

Example 2.11 (*). A fair coin is tossed twice. Let A be the event that at least one
head is obtained. Suppose at the end of experiment we are told that A has occurred
(but we have not seen the actual result). What are the probabilities of other events?
Solution. In light of the occurrence of A we should revise the probabilities of other
events. We define a new conditional probability function P (·|A) given that event A
has occurred:
Sample point P (·) P (·|A)
1 1
HH
4 3
1 1
TH
4 3
1 1
HT
4 3
1
TT 0
4
How did we get the revised probabilities given that A has occurred? Sample
points outside A (the point TT) are assigned probability 0 (since we know that A has
occurred). Sample points inside event A have their original probabilities multiplied
by a constant in such a way that they still add up to 1. Original probabilities total
to 3/4 so if we divided each by 3/4 we get revised probabilities which sum to 1.

Note that 3/4 — the factor we had to divide by — is P (A). This is not a
coincidence. The construction of conditional probability follows the following general
rule
P (B ∩ A)
P (B|A) = . (2.4)
P (A)
(It is assumed that P (A) ̸= 0.)

Example 2.12 (*). A fair coin is tossed three times. What is the probability that
the first toss was tails given that at least one head is obtained?
Solution. We have that A = {HHH, HHT, HT H, HT T, T HH, T HT, T T H} is the
event that at least one head has occurred and B = {T HH, T HT, T T H, T T T } is the
event that the first toss was tails. Their intersection is A∩B = {T HH, T HT, T T H}.
Since each point has probability 1/8 we have P (A) = 7/8, P (A ∩ B) = 3/8. Hence
by the above general formula
P (B ∩ A) 3/8 3
P (B|A) = = = .
P (A) 7/8 7
Discrete Mathematics Page 38

Example 2.13 (*). An urn contains six red balls and three blue balls. Three balls
are drawn at a time from the urn (without replacement). What is the probability
that the first ball is red? How is your answer modified if you are supplied an
additional information that the last two balls are red?
Solution. First we should define a sample space for this model. Let us enumerate
the balls so we can distinguish them: balls numbered from 1 to 6 are red and the
balls numbered 7,8,9 are blue. Then the sample points are triples of distinct numbers
each from 1 to 9 corresponding to the three draws. The total number of points in the
sample space is |S| = 9×8×7 = 504. There is no reason to suppose that one sample
point is more likely than another so we set the probability of each sample point to
be 1/504. Let A1 be the event that the first ball is red. We have |A1 | = 6 × 8 × 7
(we have 6 choices for the red ball at the first draw, 8 choices for the 2nd draw since
one ball was already picked, and 7 choices for the 3rd draw). Therefore the answer
to the first question is

|A1 | 6×8×7 6 2
P (A1 ) = = = = .
|S| 9×8×7 9 3

Now let B be the event that the last two balls are red. Then A1 ∩ B is the event
that all three balls are red. We have |B| = 6 × 5 × 7 and |A1 ∩ B| = 6 × 5 × 4 = 120.
Therefore
P (A1 ∩ B) |A1 ∩ B| 6×5×4 4
P (A1 |B) = = = = .
P (B) |B| 6×5×7 7

The idea of events being unrelated is expressed in probability theory as indepen-

dence of the two events.

Definition 2.14. We say that two events A and B are independent if the occurrence
of B has no effect on the probability of A. So

P (A|B) = P (A) . (2.5)

It follows from formula (2.4) (CHECK THIS!) that the events A and B are
independent if and only if

P (A ∩ B) = P (A)P (B) . (2.6)

The criteria of independence stated in this form is manifestly symmetric with respect
to the roles of A and B. If B has no effect on A then A has no effect on B so that

P (B|A) = P (B)

holds as well.
Page 39 Dr Adrian Turcanu & Dr Mateja Presern

Example 2.15 (*). A coin is tossed 3 times. Let the event A be that three heads are
obtained and the event B be that the first toss is a head. Are A and B independent
events?
Solution. We have
1 4 1 1
P (A) = , P (B) = = , P (A ∩ B) = P (A) = .
8 8 2 8
Then
1 1
P (A)P (B) = ̸= P (A ∩ B) = .
16 8
Hence, the events are not independent.

Example 2.16 (*). In a class there are 4 left-handed men, 6 left-handed women
and 6 right-handed men. How many right-handed women must be present if sex
and handedness are to be independent when a student is selected at random?
Solution. Let A be an event that a student is a woman, B the event that the student
is left-handed and x the number of right-handed women students in the class. So
the number of women in the class is 6 + x, the number of left-handed students is
10 and the total number of students is 16 + x. Using formula (2.6) we obtain an
equation

6+x 10 6
P (A)P (B) = = P (A ∩ B) =
16 + x 16 + x 16 + x

which is equivalent to
10(6 + x) = 6(16 + x)
and the solution is x = 9. The class must have 9 right-handed women.

2.2.1 Exercises

Exercise 2.2.1. If P (A) = 2/3 and P (B) = 3/4, what is the largest and smallest
P (B|A) can be? Answer the same question for P (A|B).
Discrete Mathematics Page 40

Exercise 2.2.2. The following questions may shed some light on why many peo-
ple believe that past performance of coin-tossing (or repeatable random events in
general) affects future performance. In the answers define the sample space and
probability on it first.

(a) A fair coin was tossed 6 times and heads came up (exactly) 4 times. If the coin
is tossed again, what is the probability that the seventh toss will be a head?

(b) A fair coin is tossed 6 times and heads came up (exactly) 4 times. What is the
probability that the 6th toss was a head?

(c) A fair coin is tossed 6 times and heads came up (exactly) 4 times. It is also
known that 2 of the first 3 tosses were heads. What is the probability that the
6th toss was a head?

Exercise 2.2.3. A box contains 10 items of type I, of which 3 are defective, and
20 items of type II, of which 5 are defective. An item is chosen at random from
the 30 items in the box. Let A be the event that the item is of type I and B be
the event that the item is defective. Find the following probabilities: P (A), P (Ā),
P (B), P (A ∩ B), P (A ∪ B), P (A|B), P (B|A). Are A and B mutually exclusive?
Are A and B independent?
Chapter 3

Recurrence relations

3.1 Introduction
What are recurrence relations? We have seen a few examples already in previous
chapters; here are some additional examples:

an = 2an−1 + 1 , n ≥ 2 , (3.1)

an = 5an−1 − 6an−2 , n ≥ 3 . (3.2)

By looking at (3.1), we realise that we need one initial value to find an for all
n ≥ 2 recursively. For example, if a1 = 1 then a2 = 3 and a3 = 7.
By looking at (3.2), we realise that we need two initial values to find an for all
n ≥ 3 recursively. For example, if a1 = 1 and a2 = 5 then a3 = 19 and a4 = 65.

Example 3.1. Verify that the solution of (3.1) with a1 = 1 is an = 2n − 1 .

Solution. We have to do two things

(a) Check that the given formula gives the correct initial value.

(b) Check that the given formula solves (3.1).

Putting n = 1 in an = 2n − 1 gives a1 = 2 − 1 = 1 , so (a) is completed.

To do (b) we evaluate 2an−1 + 1 using the given formula and show that it is
equal to an .
Now an−1 = 2n−1 − 1 so

2an−1 + 1 = 2 2n−1 − 1 + 1 = 2n − 1

so an = 2an−1 + 1 .

Recurrence relations have many applications.

41
Discrete Mathematics Page 42

Example 3.2. Suppose that you put £100 into a savings account yielding 4%
compounded annually. Let an be the amount (in pounds) in the account after n
years.
Then an is equal to the amount in the account after n − 1 years plus the interest
for the nth year. For example, a1 is equal to 100 plus the interest which is 4. Hence
a1 = 104.
In general, an = an−1 + (0.04)an−1 so that

an = (1.04)an−1 , n ≥ 1

with a0 = 100.

Solving this we obtain

a1 = 100(1.04)
a2 = (1.04)a1 = 100(1.04)2
a3 = (1.04)a2 = 100(1.04)3
and in general
an = 100(1.04)n

Example 3.3. The tower of Hanoi is a puzzle consisting of three pegs mounted on
a board and n discs of different sizes, The picture in Figure 1 shows the n = 3 case.

Fig 1

A
B

Peg 1 Peg 2 Peg 3

The object is to transfer the tower to one of the other pegs, moving only one
disc at a time and never moving a larger disc onto a smaller disc.
Page 43 Dr Adrian Turcanu & Dr Mateja Presern

Let an be the minimum number of moves needed to solve the puzzle with n discs.
Clearly a1 = 1 and a2 = 3.

For the n = 3 case we use the moves

A to peg 2 , B to peg 3 , A to peg 3 , C to peg 2 ,

A to peg 1 , B to peg 2 , A to peg 2 .

This shows that a3 = 7.

For the n disc problem we will derive a recurrence relation for an . We first move
the top n − 1 discs to peg 2. This takes an−1 moves. We transfer the largest disc
to peg 3 ; this takes one move. Another an−1 moves transfers the n − 1 discs on
peg 2 to peg 3. This shows that

an = 2an−1 + 1

The initial condition is a1 = 1. We proved in Problem set 2, Question 8 and again

in Example 3.1 that the solution is an = 2n − 1 .

Example 3.4. I can climb up stairs by taking either one step or two steps at a
time. Let an be the number of ways I can climb n steps.

(a) Find a1 and a2 .

(b) Find a recurrence relation for an and use it to find a7 .

Solution. Clearly a1 = 1 and a2 = 2. One can get to the n th step by either

climbing up 1 step from the (n − 1) th step, or by

climbing up 2 steps from the (n − 2) th step.

There are an−1 ways of reaching the (n − 1) th stair and an−2 ways of reaching the
(n − 2) th stair. Hence an = an−1 + an−2 . Using a1 = 1 and a2 = 2, we obtain
a7 = 21.
Discrete Mathematics Page 44

3.1.1 Exercises

Exercise 3.1.1. Verify that the solution of

an = 5an−1 − 12

with a1 = 13 is an = 2(5)n + 3 .

Exercise 3.1.2. Let xn satisfy the recurrence relation

2n xn−1
xn =
n+1
with x1 = 1 .

(a) Let an = (n + 1) xn . Show that an = 2an−1 with a1 = 2.

(b) Solve the recurrence relation for an . Hence find xn .

Page 45 Dr Adrian Turcanu & Dr Mateja Presern

3.2 Linear Homogeneous Recurrence Relations

The following are examples of linear homogeneous recurrence relations:

an = 5an−1 , (3.3)

an = 7an−1 − 12an−2 . (3.4)

They are linear since terms such as a2n−1 are not present (no powers). They are
homogeneous since all the terms are multiples of an−j for some j.

Fact 3.5. The general solution of

an = p an−1

is
an = c pn ,
where c is a constant determined by the initial condition.

Example 3.6. Find the solution of (3.3) with a1 = 20.

Solution. The general solution is an = c 5n .
Since a1 = 20 ,
5c = 20
Hence c = 4 and an = 4 (5)n .

Solving recurrence relations of the form (3.4) is somewhat trickier. Suppose

an = rn is a solution of (3.4). Then

rn = 7rn−1 − 12rn−2

Dividing by rn−2 , we obtain

r2 = 7r − 12
This quadratic equation has the solutions r = 3 and r = 4 . This tells us that
an = 3n and an = 4n are solutions of (3.4). Hence, the general solution of (3.4) is

an = c 3n + d 4n

where c and d are constants.

The above argument can be applied to a general equation with the same structure
as (3.4) to obtain the following.

Fact 3.7. The general solution of the recurrence relation

an = pan−1 + qan−2 (3.5)

Discrete Mathematics Page 46

is
an = c (r1 )n + d (r2 )n ,
where r1 and r2 are distinct solutions of the characteristic equation

r2 = pr + q,

and c and d are constants determined by the two initial conditions.

Example 3.8. Find the general solution of an = an−1 + 2an−2 .

Solution. The characteristic equation is

r2 = r + 2 ,

which can be recast as

r2 − r − 2 = (r + 1) (r − 2) = 0 .

The above equation tells us that r1 = −1 and r2 = 2 . The general solution we are
after therefore is
an = c 2n + d (−1)n .

Example 3.9. Find the solution of an = 4an−1 − 3an−2 with a1 = 0 and a2 = 12 .

Solution. The characteristic equation reads

r2 − 4r + 3 = (r − 3) (r − 1) = 0

so r1 = 3 and r2 = 1 . Hence, the general solution is

an = c 3n + d

Putting n = 1 and n = 2 in an = c 3n + d and using the initial conditions gives us

the system of equations
3c + d = 0 ,
9c + d = 12 .
Solving these gives c = 2 and d = −6 , so

an = 2 (3)n − 6 .
Page 47 Dr Adrian Turcanu & Dr Mateja Presern

Example 3.10. Find the solution of an = an−1 + an−2 with a0 = 0 and a1 = 1 .

Solution. The characteristic equation reads

r2 − r − 1 = 0 .

The latter has solutions

√ √
1+ 5 1− 5
and .
2 2
Hence, the general solution of our recurrence relation is
√ !n √ !n
1+ 5 1− 5
an = c +d .
2 2

Let us determine c and d by using the initial conditions. Substituting the initial
conditions a0 = 0 and a1 = 1 into the above general solution we get

c+d=0
√ ! √ !
1+ 5 1− 5
c +d = 1.
2 2
Solving these gives
1 1
c= √ d = −√ ,
5 5
so √ !n √ !n
1 1+ 5 1 1− 5
an = √ −√ .
5 2 5 2

The above method only works when the characteristic equation has two distinct real
solutions.
Fact 3.11. If the characteristic equation has a repeated (double) solution r = r1 ,
then the general solution of the recurrence relation (3.5) is
an = c r1n + d n r1n .

Example 3.12. Find the general solution of the recurrence relation an = 4an−1 −
4an−2 .
Solution. The characteristic equation

r2 − 4r + 4 = (r − 2)2 = 0
Discrete Mathematics Page 48

has only one solution r = 2 .

The general solution of the recurrence relation is

an = c 2n + d n 2n .

3.2.1 Exercises

Exercise 3.2.1. Find the solution of an = 6an−1 −5an−2 with a1 = 1 and a2 = 41 .

Exercise 3.2.2. Find the solution of an = an−1 + 2an−2 with a1 = −3 and a2 = 9 .

3.3 Solving Inhomogeneous Recurrence Relations

The following are examples of inhomogeneous recurrence relations:

an = 4an−1 − 15 , (3.6)

an = 7an−1 − 12an−2 + 5 (7)n . (3.7)

We begin by finding a particular solution to an inhomogeneous recurrence rela-
tion. The simplest way to do this is to look for a solution of the same form as the
inhomogeneous term (the term without ak ’s).

Example 3.13. Find a particular solution of an = 4an−1 − 15 .

Solution. Let an = p where p is a constant. We have to find p.
Putting an = p into an = 4an−1 − 15 gives

p = 4p − 15

so that 3p − 15 = 0 and p = 5 . A particular solution of our recurrence relation is

therefore an = 5 .

Example 3.14. Find a particular solution of an = 3an−1 + 9 − 2n .

Page 49 Dr Adrian Turcanu & Dr Mateja Presern

Solution. Let an = pn + q where p and q are a constants. We have to find p and q.

Substituting our guess into the recurrence relation we get

pn + q = 3 [p(n − 1) + q] + 9 − 2n ,

pn + q = 3pn − 3p + 3q + 9 − 2n ,
0 = n(2p − 2) − 3p + 2q + 9 .
Hence 2p − 2 = 0 so p = 1. Then 2q + 6 = 0 so q = −3. A particular solution is
therefore an = n − 3 .

To find a particular solution of an = 4an−1 − 3an−2 + 5 (2)n we would look for a

solution an = p (2)n .

This method of finding a particular solution works in most cases. It would not work
for
an = 4an−1 − 3an−2 + 5 (3)n .
In this case we would look for a solution an = pn (3)n .

To find the general solution of an inhomogeneous recurrence relation we add a

particular solution to the general solution of the associated homogeneous recurrence
relation.

Example 3.15. Find the general solution of solution of an = 3an−1 + 9 − 2n .

Solution. By Example 3.14, a particular solution is an = n − 3. Now, the general
solution of the associated homogeneous equation

an = 3an−1

is an = c 3n . Therefore, the general solution of an = 3an−1 + 9 − 2n is

an = c 3n + n − 3 .

Example 3.16. Find the solution of an = (0.5) an−1 + 3 with a0 = 8.

Solution. To find a particular solution let an = p . Then

p = (0.5) p + 3

so that p = 6 . Hence a particular solution is an = 6 .

Discrete Mathematics Page 50

The general solution of the associated homogeneous equation an = (0.5) an−1 is

an = c (0.5)n .

Therefore, the general solution of an = (0.5) an−1 + 3 is

an = c (0.5)n + 6 .

Putting n = 0 in an = c (0.5)n + 6 gives

c+6=8

so that c = 2 . Hence
an = 2 (0.5)n + 6 .

Example 3.17. The average number of comparisons xn made by the quick sort
algorithm when sorting n pieces of data satisfies

n xn = (n + 1) xn−1 + 2n

for n ≥ 1 with x0 = 0.
(a) Let xn = (n + 1) an . Find a recurrence relation for an .

(b) Solve the recurrence relation for an and hence find xn .

Solution. If xn = (n + 1) an then

n(n + 1) an = n(n + 1) an−1 + 2n,

so that
2
an = an−1 + . (3.8)
n+1
Also, since x0 = 0 we have that a0 = 0 . From (3.8),
2 2 2
a1 = 1 , a2 = 1 + , a3 = 1 + + ,
3 3 4
2 2 2
an = 1 + + + ··· + .
3 4 n+1
Using xn = (n + 1) an , we obtain

2 2 2
xn = (n + 1) 1 + + + · · · + .
3 4 n+1
Page 51 Dr Adrian Turcanu & Dr Mateja Presern

3.3.1 Exercises

Exercise 3.3.1. Find the solution of an = (0.4) an−1 + 12 with a1 = 8. What is

the behaviour of an for large n?

Exercise 3.3.2. Let xn satisfy the recurrence relation

(n + 1) xn = 5(n + 2) xn−1 + 8(n + 1)(n + 2) .

for n ≥ 1 with x1 = 24.

(a) Let xn = (n + 2) an . Find a recurrence relation for an .

(b) Solve the recurrence relation for an and hence find xn .

3.4 Further Examples

Example 3.18. Many algorithms take a problem and then divide it into a number
of smaller problems. The binary search method reduces the search for an element
in a list of size m to a search in a list of size m/2. The method assumes that the
list has terms in order of increasing size; for example numbers in increasing size or
words in alphabetical order.
We show how the method works for finding the number 9 in the list of eight numbers
1 2 3 4 5 6 9 10
We first divide the list in two :
1 2 3 4 5 6 9 10
We then compare 9 with the largest number in the first list. Since 4 < 9 we keep
the second list and divide it in two :
5 6 9 10
Since 6 < 9 we keep the second list 9 10. After one more application we locate 9.

Suppose that we have m = 2n elements in the list. Let an be the number of

comparisons needed to find a particular element (in the worst case).
The binary search method reduces the search in a list of size m = 2n to a search in
a list of size 2n−1 . This reduction requires two comparisons; one to find out which
half of the list to use and the other to check if any terms in the list remain. Hence
an = an−1 + 2 .
Discrete Mathematics Page 52

Example 3.19. A bit has two possible values, 0 and 1. A bit string is a sequence
of bits. The length of a bit string is the number of bits in the string, for example,
11010110 has length eight. There are 2n bit strings of length n. Let an be the
number of bit strings of length n that do not have two consecutive 0’s.
Find a1 and a2 . Find a recurrence relation for an and hence find hence find
a6 .
Solution. We call a bit string valid if it does not have two consecutive 0’s. Both 0
and 1 are valid bit strings of length 1 so a1 = 2 . a2 = 3 , since 01 , 10 and 11 are
valid.
We can form a valid bit string of length n by

(i) Adding 1 at the end of a valid bit string of length n − 1.

(ii) Adding 1 0 at the end of a valid bit string of length n − 2.

Now, there are an−1 of type (i) and an−2 of type (ii) so

an = an−1 + an−2 .

Using the above recurrence relation we obtain a6 = 21 .

Example 3.20. A computer system considers a string of decimal digits a valid

codeword if it contains an even number of 0 digits. Thus 07903 and 68433 are
valid codewords of length five while 86031 is not valid.
Let an be the number of valid codewords of length n.

(a) Find a recurrence relation for an .

(b) Find a1 and hence find the solution of the recurrence relation.

Solution. We can form a valid codeword of length n by

(i) Adding a digit other than 0 at the end of a valid string of length n − 1.

(ii) Adding the digit 0 at the end of an invalid string of length n − 1.

For item (i) there are nine digits that we can put in at the end. Hence there are
9an−1 ways of forming a valid codeword of length n in this way.
The number of ways we can do item (ii) is equal to the number of invalid strings
of length n − 1.
Since there are a total of 10n−1 strings of length n − 1 and an−1 of these are
valid, there are
10n−1 − an−1
Page 53 Dr Adrian Turcanu & Dr Mateja Presern

invalid strings of length n − 1. Hence

an = 9an−1 + 10n−1 − an−1 ,

an = 8an−1 + 10n−1 .
We have to solve the above recurrence relation with a1 = 9. We first find a particular
solution of the form an = p 10n−1 . We have

p 10n−1 = 8p 10n−2 + 10n−1 ,

0 = 10n−2 (10p − 8p − 10) ,

so that p = 5.
The general solution of the associated homogeneous equation an = 8an−1 is
an = c 8n−1 . The general solution of an = 8an−1 + 10n−1 is therefore

an = c 8n−1 + 5 (10)n−1 .

Using a1 = 9 , we obtain
9 = c + 5.
Hence c = 4 and
an = 4 (8)n−1 + 5 (10)n−1 .

3.4.1 Exercises

Exercise 3.4.1. A ternary string is a string that only contains the symbols 0, 1
and 2. Let an be the number of ternary strings of length n that do not have two
consecutive 0’s. Find a recurrence relation for an and use the recurrence relation
to find a6 .

Exercise 3.4.2. This question is about the number of regions formed if we draw
n lines in the plane. With one line we get 2 regions and with two lines we get 4
regions. Find a recurrence relation for an , where an is number of regions that a
plane is divided into by n lines, if no two of the lines are parallel and no three of
the lines go through the same point.
Chapter 4

Graph Theory

4.1 Introduction to Graphs

A graph is a mathematical abstraction that is useful for solving many kinds of
problems. A graph consists of a set of elements called vertices and a list of unordered
pairs of these elements called edges. More formally, a graph G = (V, E) consists of
a nonempty set V of vertices and a list E of unordered pairs of elements from V .
For example, the graph G with V = {a, b, c, d} and

E : {a, b}, {a, d}, {b, d}, {c, d} .

is a graph with four vertices and four edges.

Note that the same edge can appear several times in a list, and edges connecting
a vertex to itself (loops) may also appear.
We can obtain a diagram of a graph G by using dots as the vertices and lines or
curves as the edges connecting the appropriate dots. The diagram in Fig. 1 is one
representation of the graph G described above.

a a e5

e3 e4
d
b e1
e2 c
c b
Fig 1 Fig 2

We can present graphs by showing only their diagrams. For example, we can read
off Fig. 2 that the diagram therein represents the graph with vertex set V = {a, b, c}
and edge list
E : {a, b}, {a, c}, {a, a}, {c, b}, {c, b} .

54
Page 55 Dr Adrian Turcanu & Dr Mateja Presern

To distinguish the multiple edges they can be equipped with additional labels. Thus
for the graph depicted in Fig. 2 we can label the edges {e1 , e2 , e3 , e4 , e5 } as shown
in the picture. With each label we associate the pair of vertices the corresponding
edge connects. For example e2 is associated with {b, c} and e5 with {a, a}. These
vertices are called the end points of the corresponding edge.
Definition 4.1. A graph is called simple if it has no loops and no multiple edges.
When edges and vertices are removed from a graph, without removing endpoints
of any remaining edges, a smaller graph is obtained. Such a graph is called a subgraph
of the original graph. The vertices of the subgraph form a subset of the vertices of
the original graph and the edges of the subgraph appear in the edge list of the
original graph.
For example a graph with vertices V = {a, b} and edges E : {a, a}, {a, b}, de-
picted below, is a subgraph of the graph from Fig. 2.

a e5

4.1.1 Exercises

Exercise 4.1.1. Draw the diagram corresponding to the graph given by V =

{a, b, c, d} and
E : {a, b}, {c, d}, {a, d} .

Exercise 4.1.2. Draw the diagram corresponding to the graph given by V =

{a, b, c, d, e f } and

E : {a, f }, {a, e}, {b, c}, {b, f }, {c, f }, {d, e}, {e, f } .

4.2 Adjacency Matrix

We shall often make use of the notation V (G) for the set of vertices of a graph G
and E(G) for the list of edges. If {v, w} is an edge of a graph G then we say
that the vertices v and w are adjacent. We will also say that {v, w} is an edge
connecting the vertices v and w.
Discrete Mathematics Page 56

Definition 4.2. Let G be a graph with vertex set V = {v1 , v2 , . . . , vn } . The

adjacency matrix of G is the n × n matrix A = (aij ) defined by

aij = the number of edges connecting vi and vj .

Example 4.3. Suppose that

 
0 1 1 0
1 0 0 1
A=
1

0 0 1
0 1 1 0

is the adjacency matrix of a graph G with vertex set V (G) = {v1 , v2 , , v3 , v4 } .

Since a12 = 1 the vertices v1 and v2 are adjacent. Since a32 = 0 the vertices v3
and v2 are not adjacent. Continuing this analysis we find that

E(G) : {v1 , v2 }, {v1 , v3 }, {v2 , v4 }, {v3 , v4 } .

Example 4.4 (*). Find the adjacency matrix for the graph from Fig. 2.
Solution. Ordering the vertices alphabetically we obtain
 
1 1 1
A = 1 0 2 .
1 2 0

Note that the adjacency matrix of a graph is symmetric, that is, aij = aji (the
matrix is symmetric upon reflecting across the diagonal).

For a simple graph all entries aij are either 0 or 1. Also, since for a simple graph
an edge does not connect a vertex to itself, we have that in this case the diagonal
elements vanish: aii = 0 .
Definition 4.5. Let v be a vertex of a graph G . The degree of v, denoted by
deg(v) , is the number of edges of G having v as an endpoint with each loop (v, v)
contributing twice.

Example 4.6. For the graph in Fig. 1, deg(a) = 2, deg(b) = 2, deg(c) = 1 and
deg(d) = 3.
Page 57 Dr Adrian Turcanu & Dr Mateja Presern

By examining structure of the graph, we notice that each edge in a graph has
two ends. Hence, each edge must contribute 2 to the sum of the degrees. This
means that the sum of the degrees of the vertices is twice the number of edges. This
observation is formalised by the following theorem.
Theorem 4.7 (Handshaking Theorem). Let G = (V, E) be a graph with vertex set
V = {v1 , v2 , . . . , vn } . Then

deg(v1 ) + deg(v2 ) + · · · + deg(vn ) = 2|E|

The name “Handshaking Theorem” comes from the fact that a graph can be
used to represent a group of people shaking hands.

4.2.1 Exercises

Exercise 4.2.1. Find the adjacency matrix of the graph in Fig. 1.

Exercise 4.2.2. What can you say about the sum of numbers in any row or column
of the adjacency matrix of a graph with no loops?

Exercise 4.2.3. A graph G = (V, E) has 23 edges and every vertex of G has
degree four or larger. What is the largest possible number of vertices that G can
have?

Exercise 4.2.4. Prove that in any graph the number of vertices of odd degree is
even.

Exercise 4.2.5. Let G be a simple graph with at least two vertices. Prove that G
has two vertices of the same degree.

4.3 Connectivity
Definition 4.8. A path of length n from vertex a to vertex b is a sequence of edges
e1 , e2 , . . . , en such that e1 is associated with vertices {a, v1 }, e2 is associated with
{v1 , v2 }, and so on, with en associated with {vn−1 , b}.
Discrete Mathematics Page 58

In a simple graph a path can be represented by a sequence of vertices: a, v1 , v2 , . . . , b,

in which each pair of neighbouring vertices is adjacent.

Definition 4.9. A path is called simple if it does not contain the same edge twice.

a b c
a
d

d e f e

b c
Fig 3 Fig 4

In Figure 3, a, b, c, f is a simple path of length three. Also, d, e, c, a is not a

path since {e, c} is not an edge.
In general, it can be shown that if vertices are joined by a path then they can
be joined by a simple path.
In an acquaintanceship graph, each person in a particular group of people is
represented by a vertex. An edge is used to connect two people when these people
know each other. It has been conjectured that almost every pair of people in the
world are linked by a small chain of people, perhaps containing just five or fewer
people. For the acquaintanceship graph containing all the people in the world, this
would mean that most vertices are linked by a path of length at most four.

Definition 4.10. A graph is connected if there is a path between every pair of

distinct vertices.

For example, the graph in Figure 3 is connected, whereas the graph in Figure 4
is not connected because, for example, the vertices a and d are not joined by a path.

Definition 4.11. A connected component of a graph G is a subgraph of G that is

connected and that is not contained in any larger connected subgraph.

Any graph can be represented as a union of its connected components which are
disjoint. The graph on Fig. 4 has two connected components.

The number of paths between vertices in a graph can be found by using its
adjacency matrix. Let A = (aij ) be the adjacency matrix of a graph G and let
A2 =: B = (bij ) . Assume for simplicity that the graph G is simple. Then

bij = ai1 a1j + ai2 a2j + ai3 a3j + · · · + ain anj

Now each aik akj is either 0 or 1. Also, aik akj = 1 if and only if both aik = 1
and akj = 1 . Hence aik akj = 1 if and only if both {vi , vk } and {vk , vj } are edges
of G. This argument is easily extended to non-simple graphs.
Page 59 Dr Adrian Turcanu & Dr Mateja Presern

This shows that the sum

bij = ai1 a1j + ai2 a2j + ai3 a3j + · · · + ain anj

counts the number of vertices vk such that vi , vk , vj is a path from vi to vj .

Hence, the (i, j) entry of A2 is the number of paths of length 2 joining vi to vj .
In general, it can be shown that the (i, j) entry of Am is the number of paths
of length m joining vi to vj .

Example 4.12 (*). For the graph shown in Figure 5:

(a) find the number of paths of length 2 joining b and c ;

(b) find the number of paths of length 3 joining a and b .

a b
c

a c d

Fig 5 Fig 6

Solution. The adjacency matrix of the graph is

 
0 1 0
A= 1 0
 1
0 1 0

and    
1 0 1 0 2 0
A2 = 0 2 0 A3 = 2 0 2
1 0 1 0 2 0
Since the (2, 3) entry of A2 is 0 there are no paths of length 2 joining b and c .
Since the (1, 2) entry of A3 is 2 there are 2 paths of length 3 joining a and b .
Discrete Mathematics Page 60

4.3.1 Exercises

Exercise 4.3.1. If a graph is connected, then every vertex must be adjacent to some
other vertex. Give an example of a graph with four vertices which is not connected,
but has each vertex connected to some other vertex.

Exercise 4.3.2. Let G be a simple graph with vertex set V = {v1 , v2 , . . . , vn }

and let A be the adjacency matrix of G . Explain why the (i, i) entry of A2 is
equal to deg(vi ) , the degree of the vertex vi .

4.4 Directed graphs

Definition 4.13. A directed graph (or digraph) consists of a set of vertices V and
a list of ordered pairs of vertices called directed edges.

A digraph is depicted the same way as an ordinary graph with arrows put on
the edges marking the direction, that is for a directed edge {a, b} the arrow points
from a to b. Thus a digraph specified by V = {a, b, c, d} and

E : {b, b}, {b, a}, {a, b}, {a, a}, {c, a}, {c, d}, {d, d}

is depicted as

b
a

c d

Definition 4.14. A directed path from the vertex v0 to the vertex vn in a digraph
is a sequence of vertices v0 , v1 , v2 , . . . , vn such that for each pair of neighbouring
vertices vi , vi+1 in the sequence there is a directed edge {vi , vi+1 } belonging to the
digraph. The number of edges in the directed path is called its length.

Definition 4.15. A digraph is called strongly connected if there exists a directed

path between any pair of its vertices. A digraph is called weakly connected if the
underlying undirected graph, obtained by simply forgetting about the directions of
edges, is connected.
Page 61 Dr Adrian Turcanu & Dr Mateja Presern

Any strongly connected digraph is also weakly connected but the opposite is not
always true.
The digraph on Fig. 7A below is strongly connected while the digraph on Fig. 7B
is weakly but not strongly connected.
c c

b e d b e d

a a

Fig. 7A Fig. 7B

The subgraphs of a digraph G that are strongly connected but are not contained
in larger strongly connected subgraphs are called strongly connected components of
G.

The graph depicted in Fig. 7B has 3 strongly connected components depicted on

Fig. 7C below. The first component contains the vertices {e, b, c} and the directed
edges {e, b}, {b, c}, {c, e} whereas the other two components are each comprised of
a single vertex — {d} and {a}, respectively — and no edges.
c

b e d a

Fig. 7C

Unlike the case of connected components, a digraph in general cannot be repre-

sented as a simple union of its strongly connected components.

The adjacency matrix of a digraph with n vertices is defined as an n × n matrix

(aij ) with entries
aij = the number of directed edges {ai , aj }
where zero is put if there are no edges {ai , aj }. For the digraph given before Defini-
tion 4.14, arranging the vertices alphabetically we obtain
 
1 1 0 0
1 1 0 0
A= 1 0 0

1
0 0 0 1
Discrete Mathematics Page 62

Notice that an adjacency matrix of a digraph needs no longer be symmetric.

As for an ordinary graph the adjacency matrix of a digraph encodes a complete
information on the digraph.

The number of directed paths of length n from vertex vi to vertex vj in a digraph

is given by the (i, j)-th entry in the matrix An where A is the adjacency matrix of
the given digraph.

Example 4.16. Find the number of directed paths of length 3 from a to b in the
digraph depicted below:

b
a

c
Solution. Arranging the vertices alphabetically we obtain the adjacency matrix
 
1 1 1
A = 1 1 0 .
0 1 0

Next compute the power  

5 6 2
A3 = 4 5 2 .
2 2 1
The (1, 2) entry corresponding to directed paths from a to b is 6. Hence the number
of paths from a to b of length 3 equals 6.

4.4.1 Exercises

Exercise 4.4.1. Find the number of directed paths from a to b of length 3 and the
number of directed paths of length 4 from c to b for the digraph
Page 63 Dr Adrian Turcanu & Dr Mateja Presern

b
a

4.5 Weighted Graphs

In this section we consider graphs that have positive numerical weights attached to
their edges. An example of a weighted graph is shown in Figure 8.

4 6
2

a 7 c 3 g
3

4 4 z
4
d f
2 5

1
12
Fig 8
e

The length of a path in a weighted graph is the sum of the weights on the edges
of the path. In Figure 8 the path a, b, c, f has length 10.

We study the problem of finding the shortest path between two vertices a and
z. We use Dijkstra’s Algorithm to do this.

Let G be a graph with vertex set a = v0 , v1 , v2 , . . . , vn = z and positive weights

w(vi , vj ) where w(vi , vj ) = ∞ if {vi , vj } is not an edge in G . The algorithm finds
the length of the shortest path between the vertices a and z.
Discrete Mathematics Page 64

Dijkstra’s Algorithm

S := ∅, L(a) := 0
For i = 1 to n
L(vi ) := ∞
while z ∈
/S
begin
u := a vertex not in S with L(u) minimal
S := S ∪ {u}
for all vertices v not in S
if L(u) + w(u, v) < L(v) then L(v) := L(u) + w(u, v)
end

L(z) is the length of the shortest path from a to z.

Starting with S = ∅ , one vertex is added to S at each stage of the iteration.

Each vertex v has a label attached to it starting with L(a) = 0 and L(v) = ∞ if
v ̸= a . The vertex added to S is the vertex with the minimal label.
At each stage of the iteration

if v ∈ S then L(v) is the length of the shortest path from a to v.

if v ∈
/ S then L(v) is the length of the shortest path from a to v that contains
only (other than v) vertices in S .

At the kth iteration let S = Sk and denote the labels by Lk .

Suppose Sk = Sk−1 ∪ {u} and let v be a vertex not in Sk . We have to update the
label Lk (v) . A shortest path from a to v that contains only (other than v) vertices
in Sk may or may not include the vertex u.

if the shortest path does not include u then Lk (v) = Lk−1 (v) .

if the shortest path contains u then Lk (v) = Lk−1 (u) + w(u, v) .

Hence
Lk (v) = min{Lk−1 (u) + w(u, v), Lk−1 (v)}.

Example 4.17 (*). Use Dijkstra’s Algorithm to find the length of the shortest path
between the vertices a and z in the weighted graph displayed in Figure 9. Show the
main steps used in the algorithm.
Page 65 Dr Adrian Turcanu & Dr Mateja Presern

b 5 d

4 6

8
1 Fig 9
a 2 z

2
3

10
e
c

Solution. S = ∅, L(a) = 0, L(b) = L(c) = L(d) = L(e) = L(z) = ∞

S = {a}, L(b) = 4, L(c) = 2, L(d) = L(e) = L(z) = ∞

S = {a, c}, L(b) = 3, L(d) = 10, L(e) = 12, L(z) = ∞

S = {a, c, b}, L(d) = 8, L(e) = 12, L(z) = ∞

S = {a, c, b, d}, L(e) = 10, L(z) = 14

S = {a, c, b, d, e}, L(z) = 13

z ∈ S, L(z) = 13 so the shortest path has length 13.

Example 4.18 (*). Use Dijkstra’s Algorithm to find the length of the shortest path
between the vertices a and z in the weighted graph displayed in Figure 8. Show the
main steps used in the algorithm.
Solution. S = ∅, L(a) = 0, L(b) = L(c) = L(d) = L(e) = L(f ) = L(g) = L(z) =
∞

S = {a}, L(b) = 4, L(c) = 7, L(d) = 4, L(d) = 4, L(e) = 2, L(f ) = L(g) = L(z) =

∞

S = {a, e}, L(b) = 4, L(c) = 7, L(d) = 3, L(f ) = ∞, L(g) = ∞, L(z) = 14

Discrete Mathematics Page 66

S = {a, e, d}, L(b) = 4, L(c) = 7, L(f ) = 8, L(g) = ∞, L(z) = 14

S = {a, e, d, b}, L(c) = 6, L(f ) = 8, L(g) = 10, L(z) = 14

S = {a, e, d, b, c}, L(f ) = 8, L(g) = 9, L(z) = 14

S = {a, e, d, b, c, f }, L(g) = 9, L(z) = 12

S = {a, e, d, b, c, f, g}, L(z) = 12

z ∈ S, L(z) = 12 so the shortest path has length 12.

4.5.1 Exercises

Exercise 4.5.1. The table below gives the cost of travelling on a toll road between
cities a, b, c, d and e. Represent this data by a weighted graph. Use Dijkstra’s
Algorithm to find the least expensive route in terms of total tolls between a and
each of the cities.

a b c d e
a - 4 18 5 10
b 4 - 2 5 8
c 18 2 - 11 5
d 5 5 11 - 2
e 10 8 5 2 -

4.6 Trees
Definition 4.19. A path in a graph is called a circuit if it starts and ends at the
same vertex. A circuit is simple if it does not contain the same edge twice.

Definition 4.20. We define a tree to be a connected graph that contains no simple

circuits.

Example 4.21. The graph shown in Figure 1 is a tree. The graph in Figure 2 is
not connected so it is not a tree. The graph in Figure 3 is not a tree since it contains
a simple circuit.
Page 67 Dr Adrian Turcanu & Dr Mateja Presern

Fig 1 Fig 2 Fig 3

Trees are useful for many problems. File structures in a computer system can
often be represented by a tree. Trees can be used to construct good algorithms
for locating items in a list. For weighted graphs, trees can be used to construct
minimum cost transportation networks.

Note that there are many equivalent ways of defining a tree. Let T be a graph
with n vertices. The following are all equivalent.
(i) T is a tree.
(ii) T is connected and has n − 1 edges.
(iii) any two vertices of T are connected by exactly one path.
(iv) T is connected, but removing any edge from T results in a disconnected graph.
Definition 4.22. A tree is called a full m-ary tree (binary for m = 2, ternary for
m = 3) if each internal vertex except for the root has degree m + 1 and the root
vertex has degree m.
Vertices of degree one in a tree, distinct from the root, are called leaves. The
vertices which are not leaves are called internal.
The root is always an internal vertex.

If a graph is not connected then it can be split up into a number of connected

subgraphs called connected components.
Definition 4.23. A subset of vertices is a connected component if the subset is
connected and the subset is maximal in that no new vertices can be added to the
subset which keeps the subset connected.

Example 4.24. As an example, the graph depicted below has two components G1
and G2 where

V (G1 ) = {a, b, c} E(G1 ) : {a, b}, {a, c}, {b, c} .

V (G2 ) = {d, e} E(G2 ) : {d, e} .

Discrete Mathematics Page 68

a
d

e
c
b

Definition 4.25. A forest is a graph for which every connected component is a tree.

Example 4.26. A forest graph contains 10 trees. The graph has 5 vertices of degree
10, 8 vertices of degree 7, 10 vertices of degree 3. The rest of the vertices are leaves.
Find the total number of vertices in the graph.
Solution. Let |Ei | and |Vi | be the number of edges and the number of vertices in the
i-th connected component respectively. Since each connected component is a tree
we have
|Ei | = |Vi | − 1
Summing up these 10 equations we obtain
10
X 10
X
|E| = |Ei | = |Vi | − 10 = |V | − 10 .
i=1 i=1

Let L be the number of leaves in the graph. Then we have |V | = L + 5 + 8 + 10 =

L + 23. Substituting this in the previous relation we obtain

|E| = |V | − 10 = L + 23 − 10 = L + 13

Also by the handshaking lemma we have

2|E| = 5 · 10 + 8 · 7 + 10 · 3 + L · 1 = 136 + L

Combining the last two equations we obtain L = 110 and therefore |V | = L + 23 =

133.

Example 4.27. A spam e-mail is initially sent to 4 people. The e-mail asks to
sent it out to 4 more people. Some people follow the request and send 4 e-mails to
their friends, others ignore it and do not send it out. The e-mail stops propagating
when 100 people received the e-mail but did not send it out. How many people in
total received the letter (including the very first person who sent it out) if nobody
received it twice.
Page 69 Dr Adrian Turcanu & Dr Mateja Presern

Solution. The condition that nobody received the spam e-mail twice means that the
propagation graph of the e-mail is a tree. Let i be the number of internal vertices
in the graph (including the root). Each internal vertex corresponds to a person who
received the letter and spread it further. Let l be the number of leaves in the graph.
Each leave corresponds to a person who received the letter but did not spread it
further. Thus l = 100 and the total number of vertices |V | = l + i = 100 + i. Since
the graph is a tree |E| = |V | − 1. Finally, since each edge comes from an internal
vertex and there are 4 edges coming from each internal vertex we have |E| = i · 4.
We thus obtain 3 equations for 3 unknowns:

|V | = 100 + i ,
|E| = |V | − 1 ,
|E| = i · 4 .

Solving it we obtain |V | = 133.

4.6.1 Exercises

Exercise 4.6.1. A full ternary tree has 71 leaves. What is the total number of
vertices in this tree?

4.7 Spanning trees

Definition 4.28. Let G be a connected graph. A spanning tree of G is a subgraph
of G that contains every vertex of G and is a tree.
Spanning trees are important because they use the least possible number of edges
to connect all vertices of a graph. The picture below shows a graph G and three of
its spanning trees.

G spanning tree spanning tree spanning tree

There are two basic methods for searching a graph:

The breadth-first search method visits all the vertices adjacent to the current
one before penetrating deep into a graph.
The depth-first search method penetrates as deeply as possible into a graph
before fanning out to other vertices.
Discrete Mathematics Page 70

Breadth-first search

Let G be a connected graph with vertices 1, 2, . . . , n . We say that

an edge is incident with a vertex v if v is an endpoint of that edge.

Step 1. We start with vertex 1. We then add all edges incident to

vertex 1 (in numerical order). The new vertices are level one in the
spanning tree.

Step 2. We then take each vertex in level 1 in numerical order.

For each vertex we add each edge incident to it as long as it does
not produce a simple circuit. This produces level 2 of the tree. The
process continues until all vertices have been added.

If a vertex v is distance d from vertex 1 in a graph G then v is in level

d of the breadth-first spanning tree of G.

Example 4.29. The pictures below show the breadth-first spanning tree T of a
graph G. We use the convention in which vertices with smaller labels are inspected
before vertices with larger labels.

9 10 11
8

6 1 2
7 G

5 4 3

12
13
Page 71 Dr Adrian Turcanu & Dr Mateja Presern

2 4 6 10

T
11
3 5 13 7 9

12 8

Depth-first search

In the depth search method we start with the root vertex.

We then add new vertices successively choosing those adjacent to the

vertices already in the tree and not forming simple circuits. When
this is not possible we back track to the first visited vertex that has
an adjacent vertex not yet in the tree, and then branch off adding
that vertex and a new path going as deep in the graph as possible.

We stop when all vertices are in the tree.

Example 4.30. Below is the depth first spanning tree T ′ for the graph G from
Example 4.29.
1

6 13

7 12
9 8

10
.
11
Discrete Mathematics Page 72

4.7.1 Exercises

Exercise 4.7.1. A connected graph has n vertices and k edges. How many edges
must be removed to produce a spanning tree?

Exercise 4.7.2. Give a example of a graph with 4 vertices and 3 edges which is not
a tree.

Exercise 4.7.3. Use a breadth-first search to produce a spanning tree of the graph
in Figure 5 below.

4.8 Minimum Spanning Trees

Definition 4.31. A minimum spanning tree of G is a spanning tree of G with the
smallest possible weight.

We will study two methods for finding minimum spanning trees.

4.8.1 Prim’s algorithm

Let us begin with Prim’s algorithm.

Step 1. Start by putting any edge with the smallest weight into the spanning
tree T .

Step 2. Continue by adding edges of minimum weight that are incident to a

vertex already in T and not forming a simple circuit with those edges already in T .
The algorithm stops when n − 1 edges are in T .
Page 73 Dr Adrian Turcanu & Dr Mateja Presern

Example 4.32. Below we apply Prim’s algorithm to find a minimum spanning tree
of the graph in Figure 7.

a 2 b 3 c 1 d

3 1 2 5

4 f 3 g 3 h Fig 7
e

4 2 4 3

3 3 1
i j k l

The algorithm:
choice 1. edge {b, f } weight 1
choice 2. edge {a, b} weight 2
choice 3. edge {f, j} weight 2
choice 4. edge {a, e} weight 3
choice 5. edge {i, j} weight 3
choice 6. edge {f, g} weight 3
choice 7. edge {c, g} weight 2
choice 8. edge {c, d} weight 1
choice 9. edge {g, h} weight 3
choice 10 edge {h, l} weight 3
choice 11 edge {k, l} weight 1
The minimum spanning tree is shown in Figure 8 and has weight 24.

a b c d

f g h Fig 8
e

3
i j k l
Discrete Mathematics Page 74

4.8.2 Kruskal’s algorithm

Let us now describe Kruskal’s algorithm.

Step 1. We start by putting any edge with the smallest weight into the spanning
tree T .

Step 2. We then successively add edges with smallest weight that do not form a
simple circuit with those edges already in T . The algorithm stops when n − 1 edges
are in T .

Example 4.33. Below we use Kruskal’s algorithm to find a minimum spanning tree
of the graph in Figure 7 from Example 4.32.
The algorithm:

choice 1. edge {c, d} weight 1

choice 2. edge {k, l} weight 1
choice 3. edge {b, f } weight 1
choice 4. edge {c, g} weight 2
choice 5. edge {a, b} weight 2
choice 6. edge {f, j} weight 2
choice 7. edge {b, c} weight 3
choice 8. edge {j, k} weight 3
choice 9. edge {g, h} weight 3
choice 10 edge {i, j} weight 3
choice 11 edge {a, e} weight 3
The minimum spanning tree is shown in Figure 9 and has weight 24.
Page 75 Dr Adrian Turcanu & Dr Mateja Presern

4.8.3 Exercises

Exercise 4.8.1. Give an example of a connected graph with 3 vertices that has
more than one minimum spanning tree.

Exercise 4.8.2. What is the main difference between Prim’s algorithm and
Kruskal’s algorithm?

Exercise 4.8.3. Starting with the edge {k, l} , use Prim’s algorithm to find a min-
imum spanning tree of the graph in Figure 7.

Exercise 4.8.4. Use Kruskal’s algorithm to find a minimum spanning tree of the
graph in Figure 11 below.
a b 4 c
5

3 6
2 3
5

7 e 1 Fig 11
d f

3
6 8 4
4

g 4 h 2 i

4.9 Euler Paths and Circuits

An Euler circuit in a graph G is a simple circuit containing every edge of G.

The graph in Figure 12 has an Euler circuit, for example a, b, c, d, e, c, a .
It is easy to check the graph in Figure 13 does not have an Euler circuit.
Discrete Mathematics Page 76

d e c d

a b a b

Fig 12 Fig 13

Theorem 4.34. A connected graph G has an Euler circuit if and only if each of its
vertices has even degree.

Note that there are two parts to the above result.

1. if G has an Euler circuit then every vertex has even degree.
2. if every vertex has even degree then G has an Euler circuit.
Part 1 was proved by Euler in 1736. Part 2 was proved 100 years later.

Proof. We prove part 1. If we follow the Euler circuit, whenever we pass through a
vertex we add 2 towards the degree of a vertex. Since each edge is used once, the
degree of each vertex is a sum of 2’s.

For the graph in Figure 12, deg(a) = 2 , deg(b) = 2 , deg(c) = 4 , deg(d) = 2 and
deg(e) = 2. Hence there is an Euler circuit.
For the graph in Figure 13, deg(a) = 3 so it does not have an Euler circuit.
An edge in a graph is called a cut edge if its removal produces a subgraph with more
connected components than the original graph.
The cut edges in the graph in Figure 14 are {a, b} and {c, e} .

a d f g

Fig 14

b c e h

Fleury’s algorithm for constructing an Euler circuit begins with choosing a starting
vertex and then adding edges successively. Once a edge is chosen it is removed and
Page 77 Dr Adrian Turcanu & Dr Mateja Presern

vertices of degree 0 are erased. Each edge begins where the last edge ends. A cut
edge is only used if there is no alternative.

Example 4.35. Explain why the graph shown in Figure 15 has an Euler circuit.
Use Fleury’s algorithm to find an Euler circuit starting with the vertex c.

g f e

Fig 15

a b c d

Solution. deg(a) = 2 , deg(b) = 4 , deg(c) = 4 and deg(d) = deg(e) = deg(f ) =

deg(g) = 2.
Since the degrees are all even there is an Euler circuit.
We start with the edge {c, f } followed by {f, b} . The edge {b, c} is a cut edge so
we choose {b, g} followed by {g, a} and {a, b} .
At the vertex b there is no alternative, we have to pick the cut edge {b, c} .
We complete the circuit with the edges {c, e}, {e, d} and {d, c} .
We can present the solution to this problem in the form of a diagram as shown in
Figure 16.
The reader should apply Fleury’s algorithm to find an Euler circuit, starting at a
different vertex.

g f e

2
1 8 Fig 16
4 3
7

a 5 b 6 c 9 d
Discrete Mathematics Page 78

An Euler path in a graph G is a simple path containing every edge of G.

The graph in Figure 2 does not have an Euler circuit. It has an Euler path, for
example a, c, d, a, b, d .

Theorem 4.36. Let G be a connected graph. G has an Euler path but not an Euler
circuit if and only if exactly two of its vertices have odd degree.

Proof. We first prove that if there is an Euler path but not an Euler circuit then
exactly two vertices have odd degree. As in the proof of Theorem 1, every vertex
on the Euler path has even degree except for the initial and final vertex. Only two
vertices, the initial and final, have odd degree.
Now assume that G has only two vertices of odd degree, x and y. We have to prove
that there is an Euler path. We make a new graph H by adding a new vertex z to
G and two new vertices, {x, z} and {y, z} . All the vertices in H have even degree.
By Theorem 1, there is an Euler circuit in H. By removing the edges {x, z} and
{y, z} from the Euler circuit we are left with an Euler path, starting at x and ending
at y.

For the graph in Figure 13, deg(a) = deg(d) = 3 and deg(b) = deg(c) = 2 so it has
an Euler path.
In order to find an Euler path, we use Fleury’s algorithm, starting at a vertex which
has odd degree.
Euler circuits and paths have many applications. For example, we might want to
find a path that traverses exactly once each street in a town, or each connection in
a utility grid, or each link in a communications network.

4.9.1 Exercises

Exercise 4.9.1. Theorems 4.34 and 4.36 are about graphs with exactly zero or two
vertices of odd degree. What can you say about graphs with exactly one vertex of
odd degree?

Exercise 4.9.2. The complete graph on n vertices, denoted by Kn , is the graph

that contains exactly one edge between each pair of distict vertices. For which values
of n does Kn have an Euler circuit?
Page 79 Dr Adrian Turcanu & Dr Mateja Presern

Exercise 4.9.3. Explain why the graph shown in Figure 17 has an Euler path. Use
Fleury’s algorithm to find an Euler path.

Fig 17 Fig 18

Exercise 4.9.4. Explain why the graph shown in Figure 18 has an Euler circuit.
Use Fleury’s algorithm to find an Euler circuit.

4.10 Hamilton Paths and Circuits

Euler paths and circuits concern the edges of a graph. What about paths that
contain every vertex of a graph exactly once.
A Hamilton circuit in a graph G is a simple circuit that visits every vertex of G
exactly once. A Hamilton path is a simple path that visits every vertex exactly
once.
The graph in Figure 13 has a Hamilton circuit, for example a, b, d, c .
The graph in Figure 14 does not have a Hamiltonian circuit since any circuit would
have to contain the edge {c, e} twice. It has the Hamilton path a, b, d, c, e, f, g, h .
The complete graph on n vertices, denoted by Kn , is the graph that contains exactly
one edge between each pair of distict vertices. Clearly there is a Hamiltonian circuit
in Kn for all values of n.
There is no simple criteria for determining whether or not a graph has a Hamiltonian
circuit or a Hamiltonian path. However, simple properties can be used to show that
a graph does not have a Hamiltonian circuit. For example, suppose G is a graph
containing a vertex with degree 1. Then it cannot contain a Hamiltonian circuit
since in a Hamiltonian circuit each vertex is incident with two edges.
Discrete Mathematics Page 80

The best algorithms known for finding a Hamiltonian circuit or determining that
no such circuit exists have exponential time complexity in the number of vertices in
the graph.

Hamiltonian circuits have many applications. The traveling salesman problem is:
given a set of cities and the distance between each pair, find the shortest route
which visits each city and returns to the initial city. This is equivalent to finding
a Hamiltonian circuit in a weighted complete graph Kn such that the total weight
of its edges is as small as possible. In principle we can solve the traveling salesman
problem by looking at all possible Hamiltonian circuits and selecting the one with
the minimum weight. However, when we increase the number of cities we run into
difficulties.

Suppose there are n vertices. Once we have chosen a starting point there are (n−1)!
Hamiltonian circuits to look at. We only need look at half of these since we can
travel on a Hamiltonian circuit in reverse order. The number

(n − 1)!
2

grows very quickly. For n = 10 we have to check 181440 possible routes and this
is feasible. For n = 25 there are approximately 3.1 × 1023 different Hamiltonian
circuits to check. This is impossible. The best algorithms for this problem produce
a solution that is close to the correct solution.
One simple algorithm that can be used in finding approximate solutions to the
travelling salesman problem is the Nearest Neighboor algorithm. This is an example
of a greedy algorithm. It proceeds as follows

1. Select the starting vertex

2. Travel along the edge with the smallest weight to a vertex that has not been
visited yet

3. Repeat step 2 untill all verices visited

Example 4.37. For a graph depicted below use the Nearest Neighboor algorithm
to find Hamiltonian paths starting from the vertex a and starting from the vertex
f.
Page 81 Dr Adrian Turcanu & Dr Mateja Presern

b 5 c

10 6
10 6 8
8 11
a 13 d

7 3
7
12 10

f 8 e

Solution. The Nearest Neighboor algorithm starting with vertex a gives the path:
aebcdf a with the total weight 7+6+5+6+3+12=39, while starting with the vertex
f it gives f cbeadf with weight 41.

One should be aware that for certain graphs the Nearest Neighboor algorithm
may produce results which are very far from optimal no matter from which vertex one
starts. For the graph depicted below each path produced by the Nearest Neighboor
algorithm has a total weight exceeding 100 while the optimal path adceba has the
weight of only 16.
b

3 100
100
2
a
10 c
5 4

3
1

d 2 e
The following Lower Bound algorithm allows one to obtain a lower bound on the
weight of any Hamilton path:
Discrete Mathematics Page 82

1. Choose any vertex v and delete it and all of its incident edges from the graph
2. Find a minimum weight spanning tree in the resulting graph using Kruskal’s
or Prim’s algorithm
3. Find the two edges incident with v that have the smallest weight
4. A lower bound is the sum of the weights of the minimum weight spanning tree
found at step 2 and the two edges found at step 3

Example 4.38. Use the lower bound algorithm to find a lower bound for Hamilton
paths for the graph in Example 4.2. Start with the vertex a.
Solution. We find that the minimal tree contains the edges {f, d}, {bc}, {cd}, {eb}
and has the weight 20. The two edges incident with a having the smallest weight
are {ae} of weight 7 and {ab} of weight 10. Adding these to the spanning tree
weight we obtain a lower bound of 7+10+20=37. We now know that the result
produced by the Nearest Neighboor algorithm starting with a, which equals 39, is
pretty close to the optimal. As the optimal result is in any case larger than 37 the
weight of 39 is within 2/37 × 100% ≈ 5% from the optimal result. Thus the Nearest
Neighboor algorithm combined with the Lowest Bound algorithm can lead to rather
good approximate solutions to the traveling salesman problem.

4.10.1 Exercises

Exercise 4.10.1. Apply the Nearest Neighboor algorithm to the graph below, start-
ing with the vertices a, d, c. Find the weights of the corresponding Hamilton circuits.
b 5 c

8 5
13 9
8 8
3 7
a 17 d

12
16
10 2

e 11 f
Page 83 Dr Adrian Turcanu & Dr Mateja Presern

Exercise 4.10.2. For the same graph as in the previous problem use the Lower
Bound algorithm to find the lower bound removing the vertex b. Same for the
vertex e.

Exercise 4.10.3. Use the largest lower bound found in problem 4.4.2 to estimate
how close to the optimal is the best result found in problem 4.4.1.
Chapter 5

Matrices and linear

transformations

5.1 Linear equations and Gaussian Elimination

We aim to study systems of m linear equations in n unknowns:

a11 x1 + a12 x2 + · · · + a1n xn = b1

a21 x1 + a22 x2 + · · · + a2n xn = b2
... = ...
... = ...
... = ...
am1 x1 + am2 x2 + · · · + amn xn = bm . (5.1)

Here x1 , x2 , . . . , xn are the unknowns and aij ’s are coefficients and bj ’s are constants.
Given such a system we would like to have answers to the following questions

Does it have a solution? What conditions are required on the coefficients aij
and the constants bj for the system to have a solution?

How many solutions?

How do we find them?

If the constants b1 = b2 = · · · = bm = 0 then the system (5.1) is said to be

homogeneous. A homogeneous system always has at least one solution, namely
x1 = x2 = · · · = xn = 0 — the trivial solution.

Example 5.1. Solve the equation

ax = b (5.2)

84
Page 85 Dr Adrian Turcanu & Dr Mateja Presern

Solution. There are 3 cases.

1. If a ̸= 0, then for any b (5.2) has a unique solution

b
x= .
a
For example the equation
3x = 12
has unique solution
12
x= = 4.
3

2. If a = 0 but b ̸= 0, then (5.2) becomes

0 · x = b.

Since b is not zero no value of x will make this statement true, so (5.2) has no
solutions in this case.

3. If both a = 0 and b = 0, then (5.2) becomes

0 · x = 0.

This is true for any x. So (5.2) has infinitely many solutions in this case.

We will see that the 3 cases in the above example exactly mirror the general
case.

System (5.1) written in matrix form reads

AX = b (5.3)

where    
x1 b1
 x2   b2 
A = (aij )m×n , X= , b=
   
.. .. 
 .   . 
xm bm
We put all of this data into an m × (n + 1) augmented matrix:

(A|b) .

Each row of the augmented matrix represents an equation of system (5.1). To

facilitate the solution of system (5.1) we define three matrix row operations:
Discrete Mathematics Page 86

1. Rj → Rj + kRi means: add k times row i to row j of the matrix

2. Ri → kRi means: multiply row i by k ̸= 0
3. Pi ↔ Pj means: interchange row i and row j.
One can perform the above operations on the augmented matrix to reduce it to
row echelon form. Row echelon form means:
any all-zero rows are at the bottom of the reduced matrix,
in a non-all-zero row, the first-from-left nonzero value is 1,
the first ‘1’ in each row is to the right of the first ‘1’ in the row above.
Once the process is complete, it’s easy to find the solution to the system of
equations by back-substitution.

Let us show by means of an example how this is done.

Example 5.2. Solve the system of linear equations

x + 2y − 3z = 4
x + 3y + z = 11
2x + 5y − 4z = 13

Solution. Translated into a matrix problem this becomes

 ∗ 
1 2 −3 4
(A|b) =  1 3 1 11 
2 5 −4 13

We use the first row to kill the rest of the first column. This corresponds to elimi-
nating x from all but the first equation. The top left entry of the matrix, which is
used to drive all the entries directly below it to zero, is called a pivot and is marked
by an asterisk. We apply

R2 → R2 − R1 , R3 → R3 − 2R1

to obtain  
1 2 −3 4
 0 1∗ 4 7 
0 1 2 5
The process of driving all the entries below a pivot to zero is called a downsweep.
Next we use the second enry on the second row as a pivot to kill the elements below
it. We do this by applying R3 → R3 − R2
 
1 2 −3 4
 0 1 4 7 
0 0 −2 −2
Page 87 Dr Adrian Turcanu & Dr Mateja Presern

We further simplfy the matrix by multiplying the last row by −1/2:

 
1 2 −3 4
 0 1 4 7  (5.4)
0 0 1∗ 1

We now reverse the strategy and use the third row to kill elements in the third
column. This process is called an upsweep. After applying

R2 → R2 − 4R3 , R1 → R1 + 3R3

we obtain  
1 2 0 7
 0 1∗ 0 3 
0 0 1 1
We next use the second row to kill elements in the second column by applying
R1 → R1 − 2R2  
1 0 0 1
 0 1 0 3 
0 0 1 1
The unique solution of the system can now be read off the rightmost column:

x = 1, y = 3, z = 1.

The process that we employed to solve the above problems is called Gaussian
elimination. In it one first carries out a sequence of downsweeps using pivots on the
diagonal sucessively starting with the first row. One next scales the last row and
uses it for upsweeping. Then one scales the second row and uses it for upsweeping,
etc. until one obtains a unit matrix in the left block of the augmented matrix. The
rightmost column of the resulting matrix contains solutions. This algorithm assumes
that all pivots encountered in the course of its execution are nonvanishing.
An alternative to upsweepings is back-substitutions. In the last example once
we were done with all the downsweeps we obtained matrix (5.4). The corresponding
system of equations is

x + 2y − 3z = 4
y + 4z = 7
z = 1

We can find the solution by starting with the bottom equation and substituting
z = 1 into the equation directly above it. This gives y + 4 = 7 so that y = 3.
Substituting now z = 1 and y = 3 into the top equation we get x + 2 · 3 − 3 = 4
that yields x = 1. More generally, when we have more equations, we start with
the bottom equation and substitute its solution into the equation above it, then
substitute all known variables into the the equation above the bottom two, and so
on, always moving upwards.
Discrete Mathematics Page 88

Sometimes rather than having a unique solution the system may have no solution
or infinitely many solutions. Consider the following example.

Example 5.3. Solve the system of linear equations

x + 2y + 3z = 1
4x + 5y + 6z = 2
7x + 8y + 9z = 1

Solution. The augmented matrix for this problem is

 
1 2 3 1
 4 5 6 2 
7 8 9 1

Downsweeping brings this matrix to

 
1 2 3 1
 0 −3 −6 −2 
0 0 0 −2

From here on we are unable to proceed with upsweeping because the entry we
normally use as a pivot is zero. However we do not have to proceed because the last
equation now reads 0 · x + 0 · y + 0 · z = −2 which cannot be satisfied for any values
of x,y, z. The system therefore has no solutions.

Example 5.4. Solve the system of linear equations

x + 2y − 3z = 6
2x − y + 4z = 2
4x + 3y − 2z = 14
Solution. The augmented matrix for this system is
 
1 2 −3 6
 2 −1 4 2 
4 3 −2 14
Downsweeping yields  
1 2 −3 6
 0 1∗ −2 2 
0 0 0 0
The last row corresponds to the equation
0·x+0·y+0·z =0
Page 89 Dr Adrian Turcanu & Dr Mateja Presern

which is true for any values of x, y, z. To simplify the system further we do the
upsweep using the entry on the second row. We obtain
 
1 0 1 2
 0 1 −2 2 
0 0 0 0

The first two rows correspond to the equations

x+z = 2
y − 2z = 2

respectively. While the value of z remains undetermined (arbitrary), given such a

value the corresponding values of x and y are obtained from

x = 2−z, y = 2 + 2z .

The general solution of our system is

(x, y, z) = (2 − z, 2 + 2z, z)

where z is arbitrary and is called a free variable. There are thus infinitely many
solutions (for each value of z). A particular solution is obtained by fixing the
value of z. For example for z = 3, x = −1, y = 8 is a particular solution.

Sometimes pivots do not appear where we naively expect them to be and rows
may need to be interchanged. Consider the following example.

Example 5.5. Solve the system of linear equations represented by the augmented
matrix  ∗ 
1 2 3 1
 2 4 5 1 
−1 1 2 4
Solution. After downsweeping from the first pivot we obtain
 
1 2 3 1
 0 0∗ −1 −1 
0 3 5 5

We are blocked from using the natural pivot by a zero. However we can interchange
rows 2 and 3 and use a new pivot.
 
1 2 3 1
 0 3∗ 5 5 
0 0 −1 −1
Discrete Mathematics Page 90

Remaining downsweeps change nothing. After scalings and upsweeps we obtain

 
1 0 0 1
 0 1 0 0 
0 0 1 1

and the system has a unique solution (x, y, z) = (−2, 0, 1).

Example 5.6. Solve the system of linear equations represented by the augmented
matrix  ∗ 
1 2 3 1
 2 4 5 1 
−1 −2 2 4
Solution. All that is changed from the previous example is the entry A3,2 but it
makes a big difference. After downsweeping the first column we get
 
1 2 3 1
 0 0∗ −1 −1 
0 0 5 5

and there is nothing further down in the second column. If we can’t move down we
move right:  
1 2 3 1
 0 0 −1∗ −1 
0 0 5 5
We downsweep column 3 to obtain
 
1 2 3 1
 0 0 −1∗ −1 
0 0 0 0

Scaling and upsweeping using the entry A2,3 as a pivot we obtain

 
1 2 0 −2
 0 0 1 1 
0 0 0 0

There are infinitely many solutions. We can express the pivot variables (the ones
multiplied by the units) in terms of the non-pivot ones:

x = −2y − 2 , z = 1.

The value of y is arbitrary.

Page 91 Dr Adrian Turcanu & Dr Mateja Presern

The number of solutions in a linear system can be read off from the echelon form.
Recall that a matrix is said to be in echelon form if
1. Any all-zero rows are below all other rows; and
2. The first non-zero entry of any row (called the pivot entry) is strictly father
right that the first nonzero entry of any row above it
If (A′ |b′ ) is the augmented matrix obtained by using the Gaussian elimination just
after the last downsweep than the matrix A′ is in echelon form. Here is an example
of an augmented matrix after the last downsweep:
 
1 2 1 1 4 1 2
 0 0 2 −2 6 2 2 
 
 0 0 0 0 0 3 6 
0 0 0 0 0 0 0
The coefficient part A′ is in echelon form. The pivot entries are put in boxes.

A matrix is said to be in reduced echelon form if all the following are true
1. It is in echelon form
2. The first nonzero entry of any row is 1. This is called a pivotal 1.
3. All entries above a pivotal 1 are 0
If (A′′ |b′′ ) is the final augmented matrix obtained by Gaussian elimination than A′′
is in the reduced echelon form. In the above example the augmented matrix after
scalings and upsweeps is brought to the form
 
1 2 0 2 1 0 1
 0 0 1 −1 3 0 −1 
 
 0 0 0 0 0 1 2 
0 0 0 0 0 0 0
Its coefficient part is in reduced echelon form. The pivotal 1’s are put into boxes.

Consider now a general system of n equations for m unknowns. The coefficient

matrix A has dimensions n × m. At the end of Gaussian elimination it has the form
 
0 ... 0 1 0 0 0 d1
 0 ... 0 0 ... 1 0 0 d2 
 
 0 ... 0 0 ... 0 ... 1 0 d3 
 
 . . . . . . .
 .. .. .. .. .. .. .. 

 
 0 ... 0 0 ... 0 ... 0 ... 1 dl 
 
 0 ... 0 0 ... 0 . . . 0 . . . 0 . . . d 
 l+1 
 . .. .. .. .. .. .. 
 .. . . . . . . 
0 ... 0 0 ... 0 ... 0 ... 0 ... dn
The first l rows contain pivotal 1’s while the last n − l rows have coefficients zero.
Discrete Mathematics Page 92

There are the following possibilities

1. If at least one of the entries dj with l + 1 ≥ j ≥ n is non-vanishing the system

has no solutions

2. If dj = 0 for all l + 1 ≥ j ≥ n and l < m the system has infinitely many

solutions. The non-pivotal variables can be taken as free (undertermined)
variables. All pivotal variables are expressed via (only)the non-pivotal ones.

3. If dj = 0 for all l + 1 ≥ j ≥ n and l = m the system has a unique solution:

xi = di , i = 1, . . . , m.

Example 5.7. Determine how many solutions there is to the system of equations

x + 2y − z = 0
2x + 5y + 2z = 0
x + 4y + 7z = 0
x + 3y + 3z = 0

Solution. After downsweeping we obtain

     
1 2 −1 0 1 2 −1 0 1 2 −1 0
 2 5 2 0   0 1 4 0 
→ 0 1 4 0 
 
 →
 1 4 7 0   0 2 8 0   0 0 0 0 
1 3 3 0 0 1 4 0 0 0 0 0

We see that in the echelon form there are two equations for 3 unknowns (l = 2 <
m = 3) so the system has infinitely many solutions.

Example 5.8. Solve the system

x + 3y − 5z + w = 4
2x + 5y − 2z + 4w = 6

Solution. After downsweeping we obtain

1 3 −5 1 4 1 3 −5 1 4
→
2 5 −2 4 6 0 −1 8 2 −2

The matrix is now in the echelon form

We have l = 2 < m = 4 so the system has infinitely many solutions. The pivotal
variables are x and y so we choose z and w as free variables. We can now use
Page 93 Dr Adrian Turcanu & Dr Mateja Presern

backsubstitutions to express x and y in terms of z and w. The bottom equation

reads −y + 8z + 2w = −2 that gives y = 2 + 8z + 2w. We substitute this expression
into the first equation. We obtain

x + 3(2 + 8z + 2w) − 5z + w = 4

After some elementary algebra we obtain from this equation x = −2 − 19z − 7w.
The general solution is therefore

(x, y, z, w) = (−2 − 19z − 7w, 2 + 8z + 2w, z, w)

The non-pivotal variables w, z are arbitrary.

Definition 5.9. The number of non-zero rows in the echelon form of matrix A is
called the rank of A, and is denoted rank(A).

Example 5.10. Find the rank of

 
1 1 1 2

 3 4 4 7 
A=
 −2 1 2 −3 

 5 3 4 6 
4 5 3 13

Solution. To find the rank we bring the matrix to echelon form by downsweeping:
     
1 1 1 2 1 1 1 2 1 1 1 2
 3 4 4 7   0 1 1 1   0 1 1 1 
     
 −2 1 2 −3  →  0 3 4 1  →  0 0 1 −2 
 
   
 5 3 4 6   0 −2 −1 −4   0 0 1 −2 
4 5 3 13 0 1 −1 5 0 0 −2 4
   
1 1 1 2 1 1 1 2

 0 1 1 1 


 0 1 1 1 

 0 0 1 −2 
 → 
 0 0 1 −2 

 0 0 1 −2   0 0 0 0 
0 0 −2 4 0 0 0 0
The matrix is now in echelon form. There are 3 non-zero rows so rank(A) = 3.

5.1.1 Exercises
Discrete Mathematics Page 94

Exercise 5.1.1. Solve by Gaussian elimination the system of equations

2x − 2y + 4z = −6
2x − y + 6z = −6
−x + 2y − 3z = 6

Exercise 5.1.2. Solve by Gaussian elimination the system of equations

x + 2y + 3z = 1
2x + 3y + 4z = 2
−x + y + 2z = 1

Exercise 5.1.3. Simplify by Gaussian elimination the augmented matrix

 
1 2 −2 3 2
 2 4 −3 4 5 
5 10 −8 11 12

Find the general solution of this system.

Exercise 5.1.4. Solve the system

2x + 8y + 6z = 20
4x + 2y − 2z = −2
−6x + 4y + 10z = 24

Exercise 5.1.5. Solve the system

x − 3y + 4z + 2w = 5
2y + 5z + w = 2
y − 3z = 4
Page 95 Dr Adrian Turcanu & Dr Mateja Presern

Exercise 5.1.6. Determine which condition must be placed on a,b, c so that the
following system in unknowns x, y, z has a solution

x + 2y − 3z = a
2x + 6y − 11z = b
x − 2y + 7z = c

5.2 Matrices as linear transformations in vector

spaces
We will now describe how to regard a matrix as a linear transformation and what
this tells us about solving systems of linear equations. First we set the scene for
some geometry.
Recall the following notions:
R2 denotes the set of all ordered pairs of real numbers

x
y
which are called two-dimensional vectors.
R3 denotes the set of all ordered triples of real numbers
 
x
 y 
z
called 3-dimensional vectors.
Rn denotes the set of ordered n-tuples
 
x1
 x2 
 
 .. 
 . 
xn

which are called n-dimensional vectors or n-vectors. Rn is called n-dimensional

space. The n-vectors have the following properties
1. We can add any two n-vectors by adding their components:
     
x1 y1 x1 + y1
 x2   y2   x2 + y2 
 ..  +  ..  = 
     
.. 
 .   .   . 
xn yn xn + yn
Discrete Mathematics Page 96

We can multiply any vector by a number λ by multiplying each component:

   
x1 λx1
 x2   λx2 
λ  ..  = 
   
.. 
 .   . 
xn λxn

There is a special n-vector called the zero vector whose every component is
a zero:  
0
 0 
0 =  .. 
 
 . 
0

Let A = (Aij ) be an m × n matrix and let

 
x1
 x2 
X =  .. 
 
 . 
xn
be an n × 1 matrix regarded as an n-vector or a point in Rn . The matrix product
AX is an m × 1 matrix which can be regarded as an m-vector or a point in Rm . So
A gives rise to a linear transformation:
X 7→ AX .

This mapping is sometimes denoted as:

TA : Rn → Rm .

Example 5.11. Let  

1 −3 4 −2
A= 0 2 5 1 
0 1 −3 0
and let  
4
 −2 
X=
 3 


−1
be a point in R4 . Then  
24
TA (X) := AX =  10 
−11
Page 97 Dr Adrian Turcanu & Dr Mateja Presern

is a point in R3 .

Example 5.12 (Rotations in 3d). Consider R3 . Rotations of 3-dimensional space

about the x3 -axis by an angle ϕ are implemented by the matrix
 
cos(ϕ) sin(ϕ) 0
Rϕ =  − sin(ϕ) cos(ϕ) 0  (5.5)
0 0 1

so that the corresponding linear transformation sends a vector X into a rotated

vector TRϕ (X), e.g. for ϕ = π/4
   √ 
1 3/√2
X=  2  7 TRπ/4 (X) = Rπ/4 X =
→  1/ 2  . (5.6)
3 3

For any linear transformation TA : Rn → Rm , we can define the null-space of

TA as a space consisting of all vectors in Rn which TA ”kills”, that is maps onto the
zero vector 0 ∈ Rm . These are all vectors X ∈ Rn such that

TA (X) = 0

that is the vectors from the null space are solutions to the homogeneous system
AX = 0. A trivial solution is of course always there. To solve the homogeneous
system we apply Gaussian elimination to the augmented matrix (A|0) to bring it
to reduced echelon form. The sequence of elementary row operations employed for
that is exactly the same we used on (A|Y ) to find the rank of A.

Example 5.13. Take as A the matrix

 
1 1 1 1 1
 3 2 1 1 −3 
A=  0

1 2 2 6 
5 4 3 3 −1

Applying Gaussian elimination we bring this matrix to the reduced echelon form
 
1 0 −1 −1 −5
 0 1 2 2 6 
A=  0 0

0 0 0 
0 0 0 0 0
Discrete Mathematics Page 98

This gives a solution X ∈ R5 to the equation AX = 0:

   
x1 x3 + x4 + 5x5
 x2   −2x3 − 2x4 − 6x5 
   
X=  x3  =  x3
  

 x4   x4 
x5 x5

which we can recast as

     
1 1 5
 −2   −2   −6 
     
X = x3  1  + x4  0  + x5  0  := x3 u + x4 v + x5 w
    
 0   1   0 
0 0 1

So any vector in the null space of TA must be a linear combination of the three vectors
u,v, w. The null space is thus an R3 sitting inside R5 . The number of independent
vectors in the null space is called the nullity of A, and is the dimension of the null
space of TA . Thus in the above example nullity(A) = 3.

Notice now that in the above example the sum of the rank and the nullity equals
the dimension of the space on which TA acts:

rank(A) + nullity(A) = 2 + 3 = 5

This is no accident as the following general relation holds known as the rank-nullity
theorem.

Theorem 5.14 (Rank-nullity theorem). For any m × n matrix A we have

rank(A) + nullity(A) = n.

5.2.1 Exercises

Exercise 5.2.1. Find the rank and nullity of

 
1 1 −2 4
A =  2 2 −3 1 
3 3 −4 −2
Page 99 Dr Adrian Turcanu & Dr Mateja Presern

Exercise 5.2.2. Find the rank and nullity of

 
1 3 −2 5 −3
C= 2 7 −3 7 −5  .
3 11 −4 10 −9

5.3 Eigenvectors and eigenvalues

Suppose that A is a square n × n matrix; A maps Rn to itself. We can ask if there
are any non-zero vectors that A just stretches when it acts on them:
AX = λX
where λ is a number. For λ = 0 the corresponding vectors belong to the null space
of A. For λ = 1 the corresponding vector is called a fixed point because it does not
change under the action of A.

Example 5.15. Let

   
1 −3 3 1
A =  3 −5 3  X= 1 
6 −6 4 2

then    
4 1
AX =  4  = 4 1  = 4X .

8 2

In matrix language we seek a vector X and a number λ such that

AX = λX .
Such a λ (if exists) is called an eigenvalue of A, and the corresponding (non-zero)
X is called an eigenvector.

A matrix can have more than one eigenvector and eigenvalue. In the previous
example the matrix A was shown to have an eigenvalue λ = 4 with eigenvector
 
1
X= 1 
2
One can check that λ = −2 is also an eigenvalue of A with corresponding eigenvectors
   
1 1
X1 =  1  X2 =  0 
0 −1
Discrete Mathematics Page 100

that is AX1 = −2X1 and AX2 = −2X2 . This example also shows that there can
be more than one eigenvector corresponding to the same eigenvalue. In fact, for the
above example, one can check that any linear combination

Y = C 1 X1 + C 2 X2

where C1 , C2 are constants (not both equal to zero), is also an egenvector of A with
eigenvalue −2.

The problem of finding the eigenvalues of an n × n matrix A is equivalent to

finding for what values of λ the equation AX = λX has a solution. This system of
equations can be rewritten as

(A − λI)X = 0

where I is the n × n identity matrix. One can use the Gaussian elimination to
bring the augmented matrix (A − λI|0) to to a reduced echelon form. Then the
homogeneous system at hand has a nontrivial solution in the reduced echelon form
there is at least one zero row. This imposes a condition on the values of λ.

Example 5.16. Find the eigenvalues of matrix

1 2
A=
1 0

Solution. Consider the matrix

1−λ 2
A − λI =
1 −λ

Downsweeping in the first column we obtain

1−λ 2
2
0 −λ − 1−λ

This matrix has a nontrivial null space if the lower right entry is zero, that is, if

λ2 − λ − 2 = (λ − 2)(λ + 1) = 0

Therefore A has exactly 2 eigenvalues, λ = 2 and λ = −1.

The presence of a variable λ in the matrix made Gaussian elimination messier

than usual. Even worse: our computation has been incomplete as we tacitly assumed
that λ − 1 ̸= 0. This assumption effectively excluded the case λ = 1 from our
computations and one needs to check separately that 1 is not an eigenvalue of A. A
more straightforward way to compute eigenvalues is by using the determinants.
Page 101 Dr Adrian Turcanu & Dr Mateja Presern

Definition 5.17. The determinant of a 2 × 2 matrix A is the number

a b
det(A) := det = ad − bc . (5.7)
c d

The determinant of a 3 × 3 matrix is a number defined as

 
a b c
e f d f d e
det  d e f  := a × det − b × det + c × det
h i g i g h
g h i
where the determinants of 2 × 2 matrices are computed according to formula (5.7).
In the last example det(A) = 1 · 0 − 2 · 1 = −2.

Example 5.18. For the matrix A defined in example 6.17

 
1 −3 3
−5 3 3 3 3 −5
det  3 −5 3  = 1 × det − (−3) × det + 3 × det
−6 4 6 4 6 −6
6 −6 4
= (−5 × 4 − 3 × (−6)) + 3(3 × 4 − 3 × 6) + 3(3 × (−6) − (−5) × 6)
= −2 + 3 × (−6) + 3 × 12 = 16

In general a determinant of an n × n matrix A = (Aij ) can be defined recursively

as n
X
det(A) = (−1)j+1 A1j × det(A(1,j) )
j=1

where A(1,j) is an (n − 1) × (n − 1) matrix obtained from A by removing from A the

1st row and the j-th column.

We will accept without proof the following result.

Theorem 5.19. Let M be a n × n matrix. The homogeneous system of n equations
for n unknowns
MX = 0
has a nontrivial solution if and only if det(M ) = 0.
We can now apply the above theorem to the problem of finding eigenvalues. A
number λ is an eigenvalue of matrix A if and only if
det(A − λI) = 0 . (5.8)
Definition 5.20. We call the polynomial of degree n
χA (λ) := det(A − λI)
characteristic polynomial of A. Equation (5.8) is called the characteristic equation
of A.
Discrete Mathematics Page 102

Thus the eigenvalues of A coincide with the roots (=solutions) of its characteristic
polynomial.

Example 5.21. Find the eigenvalues of matrix

 
8 −12 5
C=  15 −25 11 
24 −42 19

Solution. The characteristic equation reads

 
8−λ −12 5
0 = χC (λ) = det(C − λI) = det  15 −25 − λ 11 
24 −42 19 − λ
= (8 − λ)[(−25 − λ)(19 − λ) − (−42)(11)] − (−12)[15(19 − λ) − 24 · 11]
+5[15 · (−42) − 24(−25 − λ)] = −λ3 + 2λ2 + λ − 2

The polynomial factorizes to give

(λ − 1)(λ + 1)(λ − 2) = 0

so that the eigenvalues are

λ1 = 1 , λ2 = −1 , λ3 = 2.

To find the eigenvalues and the corresponding eigenvectors of an n × n matrix

A we proceed as follows.

1. Find the characteristic polynomial χA (λ) = det(A − λI)

2. Find the distinct roots λ1 , λ2 , . . . , λk of the characteristic equation χA (λ) = 0.

These are the eigenvalues of A.

3. For each root λi , i = 1, 2, . . . , k, solve the homogeneous equation (A−λi I)X =

0 for
 
x1
 x2 
X =  ..  .
 
 . 
xn

The solutions are the corresponding eigenvectors.

Page 103 Dr Adrian Turcanu & Dr Mateja Presern

Example 5.22. Find the eigenvalues and eigenvectors of

 
1 0 0
A=  0 1 1 
0 1 1
Solution. (Step 1) The characteristic polynomial is
 
1−λ 0 0
χA (λ) = det  0 1−λ 1  = −λ(λ − 1))(λ − 2)
0 1 1−λ
(Step 2) The above polynomial has roots
λ1 = 0 , λ2 = 1 , λ3 = 2 .
These are the eigenvalues of A.
(Step 3) For λ1 = 0 the homogeneous equation (A − λ1 I)X = 0 reads
    
1 0 0 x 0
 0 1 1  y  =  0 
0 1 1 z 0
Solving this system gives the corresponding eigenvector (or any multiple thereof)
 
0
X1 =  1 
−1
For λ2 = 1 we have to solve the system (A − I)X = 0 which reads
    
0 0 0 x 0
 0 0 1  y  =  0 
0 1 0 z 0
Solving this gives the eigenvector (or any scalar multiple thereof) corresponding to
λ2 :  
1
X1 =  0 
0
For λ3 = 2 we have to solve the system (A − 2I)X = 0 which reads
    
−1 0 0 x 0
 0 −1 1   y = 0 
0 1 −1 z 0
Solving this gives the eigenvector (or any scalar multiple thereof) corresponding to
λ3 :  
0
X3 =  1 
1
Discrete Mathematics Page 104

5.3.1 Exercises

Exercise 5.3.1. Find the determinant of matrix

 
1 −1 1
A =  −1 −2 −2  .
1 3 3

Exercise 5.3.2. Find the eigenvalues for the matrix A in the previous exercise.

Exercise 5.3.3. Find the eigenvectors and eigenvalues of the matrix

 
8 −12 5
 15 −25 11  .
24 −42 19

Common questions