0% found this document useful (0 votes)
16 views43 pages

Slides 2

UCSD ML

Uploaded by

Manu Vega
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views43 pages

Slides 2

UCSD ML

Uploaded by

Manu Vega
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Linear Algebra Concepts

Nuno Vasconcelos
(Ken Kreutz-Delgado)

UCSD
Vector spaces
• Definition: a vector space is a set H where
– addition and scalar multiplication are defined and satisfy:
1) x+(x’+x’’) = (x+x’)+x” 5) x H
2) x+x’ = x’+x H 6) 1x = x
3) 0 H, 0 + x = x 7) ( ’ x) = ( ’)x
4) –x H, -x + x = 0 8) (x+x’) = x + x’
( scalar; x, x’, x” H ) 9) ( + ’)x = x + ’x

• the canonical example is Rd with standard


vector addition and scalar multiplication
ed ed
x
x

x+x’ x

x’ e1 e1

e2 e2
Vector spaces
• But there are much more interesting examples
• E.g., the space of functions f:X R with
(f + g)(x) = f(x) + g(x) ( f)(x) = f(x)
• Rd is a vector space of
finite dimension, e.g.
– f = ( f1 , ... , fd )T
• When d goes to infinity
we have a function
– f = f (t )
• The space of all functions
is an infinite dimensional
vector space
Data Vector Spaces
• In this course we will talk a lot about “data”
• Data will always be represented in a vector space:
– an example is really just a point (“datapoint”) on such a space
– from above we know how to perform basic operations on datapoints
– this is nice, because datapoints can be quite abstract
– e.g. images:
 an image is a function
on the image plane
 it assigns a color f(x,y) to
each image
location (x,y)
 the space of images
is a vector space (note: assumes
that images can be negative)
 this image is a point in
Images
• Because of this we can manipulate images by
manipulating their vector representations
• E.g., Suppose one wants to “morph” a(x,y) into b(x,y):
– One way to do this is via the path along the line from a to b.

c( ) = a + (b-a)
= (1- ) a + b
b-a
– for = 0 we have a
– for = 1 we have b (b-a)
b
– for in (0,1) we have a point
on the line between a and b
a
• To morph images we can simply
apply this rule to their vector
representations!
Images
• When we make b-a
c(x,y) = (1- ) a(x,y) + b(x,y) (b-a)
b
we get “image morphing”:
=0 =0.2 =0.4 a

=0.6 =0.8 =1

• The point is that this is possible because the images are


points in a vector space.
Images
• Images are usually represented as points in Rd
– Sample (discretize) an image on a finite grid to get an array of pixels
a(x,y) a(i,j)
– Images are always stored like this on digital computers
– stack all the rows into a vector. E.g. a 3 x 3 image is converted into
a 9 x 1 vector as follows:

– In general a n x m image vector is transformed into a nm x 1 vector


– Note that this is yet another vector space
• The point is that there are generally multiple different, but
isomorphic, vector spaces in which the data can be
represented
Text
• Another common type
of data is text
• Documents are
represented by
word counts:
– associate a counter
with each word
– slide a window through
the text
– whenever the word
occurs increment
its counter
• This is the way search
engines represent
web pages
Text
• E.g. word counts for three
documents in a certain corpus
(only 12 words shown for clarity)
• Note that:
– Each document is a d = 12
dimensional vector
– If I add two word count vectors (documents), I get a new word
count vector (document)
– If I multiply a word count vector (document) by a scalar, I get a
word count vector
– Note: once again we assume word counts could be negative (to
make this happen we can simply subtract the average value)
• This means:
– We are once again in a vector space (positive subset of Rd )
– A document is a point in this space
Bilinear forms
• Inner product vector spaces are popular because they
allow us to measure distances between data points
• We will see that this is crucial for classification
• The main tool for this is the inner product (“dot-product”).
• We can define the dot-product using the notion of a
bilinear form.
• Definition: a bilinear form on a real vector space H is a
bilinear mapping
Q: H x H R
(x,x’) Q(x,x’)
“Bi-linear” means that x,x’,x’’ H
i) Q[( x+ ’x’),x”] = Q(x,x”) + ’Q(x’,x”)
ii) Q[x”,( x+ ’x’)] = Q(x”,x) + ’Q(x”,x’)
Inner Products
• Definition: an inner product on a real vector space H is
a bilinear form
<.,.>: H x H R
(x,x’) <x,x’>
such that
i) <x,x> 0, x H
ii) <x,x> = 0 if and only if x = 0
iii) <x,y> = <y,x> for all x and y

• The positive-definiteness conditions i) and ii) make the


inner product a natural measure of similarity
• “nothing can be more similar to x than itself”
• This becomes more precise with introduction of a norm
Inner Products and Norms
• Any inner product induces a norm via

||x||2 = <x,x>

• By definition, any norm must obey the following properties


– Positive-definiteness: ||x|| 0, & ||x|| 0 iff x
– Homogeneity: || x|| = | | ||x||
– Triangle Inequality: ||x + y|| ≤ ||x|| + ||y||

• A norm defines a corresponding metric


d(x,y) = ||x-y||
which is a measure of the distance between x and y
• Always remember that the induced norm changes with a
different choice of inner product!
Inner Product
• Back to our examples:
– In Rd the standard inner product is
d
T
x, y x y xi yi
i 1

– Which leads to the standard Euclidean norm in Rd


d
T
x x x xi2
i 1

– The distance between two vectors is the standard Euclidean


distance in Rd
d
T
d ( x, y ) x y ( x y) ( x y) ( xi yi ) 2
i 1
Inner Products and Norms
• Note, e.g., that this immediately gives
a measure of similarity
between web pages
– compute word count vector xi
from page i, for all i
– distance between page i and
page j can be simply defined as:

d ( xi , x j ) xi xj ( xi x j )T ( xi xj)

– This allows us to find, in the web, the most similar page i to any given
page j.
• In fact, this is very close to the measure of similarity used by
most search engines!
• What about images and other continuous valued signals?
Inner Products on Function Spaces
• Recall that the space of functions is an infinite
dimensional vector space
– The standard inner product is the natural extension of that in Rd
(just replace summations by integrals)

f ( x), g ( x) f ( x) g ( x) dx

– The norm becomes the “energy” of the function

2
f ( x) f 2 ( x)dx
– The distance between functions the energy of the difference
between them

2
d ( f ( x), g ( x)) f ( x ) g ( x) [ f ( x) g ( x)]2 dx
Basis Vectors
• We know how to measure distances in a vector space
• Another interesting property is that we can fully
characterize the vector space by one of its bases
• A set of vectors x1, …, xk is a basis of a vector space H if
and only if (iff)
– they are linearly independent

cx
i i i
0 ci 0, i
– and they span H : for any v in H, v can be written as

v cx
i i i

• These two conditions mean that any v H can be


uniquely represented in this form.
Basis
• Note that
– By making the vectors xi the columns of a matrix X, these two
conditions can be compactly written as
– Condition 1. The vectors xi are linear independent:

Xc 0 c 0
– Condition 2. The vectors xi span H

v 0, c 0 such that v Xc
• Also, all bases of H have the same number of vectors,
which is called the dimension of H
– This is valid for any vector space!
Basis
• example
– A basis
of the vector
space of images
of faces

– The figure
only shows the
first 16 basis
vectors but
there actually
more

– These vectors are


orthonormal
Orthogonality
• Two vectors are orthogonal iff their inner product is zero
– e.g. 2 2 2
sin ax
sin(ax) cos(ax)dx 0
0
2a 0

in the space of functions defined on [0,2 ], cos(ax) and sin(ax)


are orthogonal
• Two subspaces V and W are orthogonal, V W, if
every vector in V is orthogonal to every vector in W
• a set of vectors x1, …, xk is called
– orthogonal if all pairs of vectors are orthogonal.
– orthonormal if all vectors also have unit norm.
0, if i j
xi , x j
1, if i j
Matrix
• an m x n matrix represents a linear operator that maps a vector
from the domain X = Rn to a vector in the codomain Y = Rm

• E.g. the equation y = Ax


sends x in Rn to y in Rm y1 a11 a1n x1
according to     
ym am1 amn xn
X en em Y

x y
A

e1 e1

e2

• note that there is nothing magical about this, it follows rather


mechanically from the definition of matrix-vector multiplication
Matrix-Vector Multiplication I
• Consider y = Ax, i.e. yi = j=1
n aijxj , i = 1,…,m
• We can think of this as
x1 n
yi ai1 ain aij x j ai x (m rows)
j 1
xn

• where “(– ai –)” means the ith row of A. Hence


– the ith component of y is the inner product of (– ai –) and x.
– y is the projection of x on the subspace (of the domain space) spanned
by the rows of A

en X -am- X
ym
x
x A’s action in X

e1 y1 -a1-
y2
e2 -a2-
Matrix-Vector Multiplication II
• But there is more. Let y = Ax, i.e. yi = j=1
n aijxj , now be written as
y1  a11 x1  a1n xn | |
n
 aij x j  a1 x1  a n xn
j 1
ym am1 x1  amn xn | |

• where ai with “|” above and below means the ith column of A.
• hence
– xi is the ith component of y in the subspace (of the co-domain) spanned
by the columns of A
– y is a linear combination of the columns of A

en | Y
X an
|
x
xn y
A maps from X to Y
e1 |
x1
a1
e2 |
Matrix-Vector Multiplication
• two alternative (dual) pictures of y = Ax:
– y = coordinates of x in row space of A (The X = Rn viewpoint)
-am- Domain X = Rn viewpoint
ym
Domain X = Rn x
en
yi ai x ( m rows)
x A y1 -a1-
y2
-a2-
e1 |
an Codomain Y = Rm viewpoint
e2 |
xn y | |
y a1 x1 an xn
x1 | | |
a1
|

– x = coordinates of y in column space of A (Y = Rm viewpoint)


A cool trick
• the matrix multiplication formula

C AB cij aik bkj


k

also applies to “block matrices” when these are defined


properly
• for example, if A,B,C,D,E,F,G,H are matrices,

A B E F AE BG AF BH
C D G H CE DG CF DH

• only but important caveat: the sizes of A,B,C,D,E,F,G,H


have to be such that the intermediate operations make
sense! (they have to be “conformal”)
Matrix-Vector Multiplication
• This makes it easy to derive the two alternative pictures
• The row space picture (or viewpoint):
   x1  
yi ain  ain  ai 1 xn
xnx1 ai x
   xn  

is just like scalar multiplication, with blocks (–ai-) and x


• The column space picture (or viewpoint):

   x1 | | x1 1x1 |
yi ain  ain  a1  an  ai xi
i
   xn | | xn 1x1 |
mx1 mx1

is just a inner product, with (scalar) blocks xi and the


column blocks of A.
Matrix-Vector Multiplication
• two alternative (dual) pictures of y = Ax:
– y = coordinates of x in row space of A (The X = Rn viewpoint)
-am- Domain X = Rn viewpoint
ym
Domain X = Rn x
en
yi ai x ( m rows)
x A y1 -a1-
y2
-a2-
e1 |
an Codomain Y = Rm viewpoint
e2 |
xn y | |
y a1 x1 an xn
x1 | | |
a1
|

– x = coordinates of y in column space of A (Y = Rm viewpoint)


Square n x n matrices
• in this case m = n and the row and column subspaces are
both equal to (copies of) Rn
-an-
yn
x
en

x A y1 -a1-
y2
-a2-
e1 |
an
e2 |
xn y

x1 |
a1
x2 |
|
a2
|
Orthogonal matrices
• A matrix is called orthogonal if it is square and has
orthonormal columns.
• Important properties:
– 1) The inverse of an orthogonal matrix is its transpose
 this can be easily shown with the block matrix trick. (Also see later.)
1 0 0
|
0 1 0
AT A aiT aj
1 n
| n 1
0 0 1
– 2) A proper (det(A) = 1) orthogonal matrix is a rotation matrix
 this follows from the fact that it does not change the norms (“sizes”)
of the vectors on which it operates,
2
Ax ( Ax)T ( Ax) xT AT Ax xT x || x ||2 ,
and does not induce a reflection.
Rotation matrices
• The combination of
1. “operator” interpretation
2. “block matrix trick”
is useful in many situations
• Poll:
– “What is the matrix R that rotates the plane R2 by degrees?”

e2

e1
Rotation matrices
• The key is to consider how the matrix operates on the
vectors ei of the canonical basis
– note that R sends e1 to e’1 e2

r11 r12 1 sin


e'1
r21 r22 0

– using the column space picture cos e1


r11 r12 r11
e'1 1 0
r21 r22 r21
– from which we have the first column of the matrix
r12 cos r12
R e'1
r22 sin r22
Rotation Matrices
• and we do the same for e2
– R sends e2 to e’2

r11 r12 0 r11 r12 r12


e' 2 0 1
r21 r22 1 r21 r22 r22
e2
– from which cos
sin
cos sin
R e'1 e'2
sin cos

– check -sin cos e1

T cos sin cos sin


R R I
sin cos sin cos
Analysis/synthesis
• one interesting case is that of matrices with orthogonal
columns
• note that, in this case, the columns of A are
– a basis of the column space of A
– a basis of the row space of AT
• this leads to an interesting interpretation of the two
pictures
– consider the projection of x into the row space of AT
y = AT x
– due to orthonormality, x can then be synthesized by using the
column space picture
x’ = A y
Analysis/synthesis
• note that this is your most common use of basis
• let the columns of A be the basis vectors ai
– the operation y = AT x projects the vector x into the basis, e.g.
y1 1 0  0 x1 y1 x1
y2 0 1  0 x2 y2 x2
this is called
     the canonical
yn 0 0  1 xn yn xn basis of Rn

– The vector x can then be reconstructed by computing x’ = A y,


e.g.
x'1 1 0 0 y1 x1
x '2 0 1 0 y2 x2
y1 y2  yn
     
x 'n 0 0 1 yn xn

– Q: is the synthesized x’ always equal to x?


Projections
• A: not necessarily! Recall
– y = AT x and x’ = A y
– x’ = x if and only if AAT = I !
– this means that A has to be orthonormal.
• what happens when this is not the case?
– we get the projection of x on the column space of A
– e.g. let 1 0 then x1
100 x1
A 01 y x2
010 x2 x
00 x3
e3 e2
and
10 x1 0 x1 e1 x’
x1
x' 01 0 x2 x2
x2 column space of A =
00 0 0 0 row space of AT
Null Space of a Matrix x null space
of AT
e3 e2
• What happens to the part that is lost? x’
e1
• This is the “null space” of AT
column space of A =
T T row space of AT
N A x| A x 0
– In the example, this is comprised of all vectors of the type 0 since
0
0
T 100 0
A x 0 0
010 0

• FACT: N(A) is always orthogonal to the row space of A:


– x is in the null space iff it is orthogonal to all rows of A
• For the previous example this means that N(AT) is
orthogonal to the column space of A
Orthonormal matrices
• Q: why is the orthonormal case special?
• because here there is no null space of AT
• recall that for all x in N(AT)
– AT x 0 x A0 0
• the only vector in the null space is 0
• this makes sense: 1 0 0
– A has n orthonormal columns, e.g. A 0 1 0
– these span all of Rn 0 0 1
– there is no extra room for an orthogonal space
– the null space of AT has to be empty
– the projection into row space of AT (=column space of A) is the
vector x itself
• in this case, we say that the matrix has full rank
The Four Fundamental Subspaces
• These exist for any matrix:
– Column Space: space spanned by the columns
– Row Space: space spanned by the rows
– Nullspace: space of vectors orthogonal to all rows (also known as
the orthogonal complement of the row space)
– Left Nullspace: space of vectors orthogonal to all columns (also
known as the orthogonal complement of the column space)

• You can think of these in the following way


– Row and Nullspace characterize the domain space (inputs)
– Column and Left Nullspace characterize the codomain space
(outputs)
Domain viewpoint
• Domain X = Rn
– y = coordinates of x in row space of A yi ai x ( m rows)
– Row space: space of “useful inputs”,
which A maps to non-zero output
– Null space: space of “useless inputs”, N ( A) x | Ax 0
mapped to zero
– Operation of a matrix on its domain X = Rn

Null
space -am-
en ym
x
x A

y1 -a1-
e1

e2

– Q: what is the null space of a low-pass filter?


Codomain viewpoint
• Codomain Y = Rm | |
– x = coordinates of y in column space of A y a1 x1 an xn
– Column space: space of “possible outputs”, | |
which A can reach
– Left Null space: space of “impossible L( A) y | yT A 0
outputs”, cannot be reached
– Operation of a matrix on its codomain Y = Rm

Left Null
space |
en
an
xn | y
x A
x1 |
a1
e1
|
e2

– Q: what is the column space of a low-pass filter?


The Four Fundamental Subspaces
Assume Domain of A = Codomain of A. Then:
• Special Case I: Square Symmetric Matrices (A = AT):
– Column Space is equal to the Row Space
– Nullspace is equal to the Left Nullspace, and is therefore
orthogonal to the Column Space
• Special Case II: nxn Orthogonal Matrices (ATA = AAT = I)
– Column Space = Row Space = Rn
– Nullspace = Left Nullspace = {0} = the Trivial Subspace
Linear systems as matrices
• A linear and time invariant system
– of impulse response h[n]
– responds to signal x[n] with output y[n] x[k ]h[n k ]
k
– this is the convolution of x[n] with h[n]
• The system is characterized by a matrix
– note that
y[n] x[k ]g n [k ], with g n [k ] h[n k ]
k
– the output is the projection of the input on the space spanned by
the functions gn[k]
y[1] g1 h[0] h[ 1]  h[ (n 1)] x[1]
y[2] g2 h[1] h[0]  h[ (n 2)] x[2]
x
   
y[n] gn h[n 1] h[n 2]  h[0] x[n]
Linear systems as matrices
• the matrix
h[0] h[ 1]  h[ (n 1)]
h[1] h[0]  h[ (n 2)]
A

h[n 1] h[n 2]  h[0]
– characterizes the response of the system to any input
– the system projects the input into shifted and flipped copies of its
impulse response h[n]
– note that the column space is the space spanned by the vectors
h[n], h[n-1], …
– this is the reason why the impulse response determines the
output of the system
– e.g. a low-pass filter is a filter such that the column space of A
only contains low-pass low pass signals
– e.g. if h[n] is the delta function, A is the identity

You might also like