Math (Re+Pre = Over)-view
(1312 supplement to 1112 lecture note 1)
Topics
1. Vectors
2. Calculus, with vectors, multiple variables, Taylor series expansion, and integration
3. Basic trigonometry
Goals
1. To cover our bases for most of the math we will use in this course
2. To remind you of the difference between 1312 and 1112 within the add/drop period
Remarks
1. This is NOT a math course. We do not worry about rigor, convergence, existence, etc.
in our discussion. You can worry about that, but you should direct your worries to your
math teachers. In true physicist style, we bravely “generalize” any operations to domains
in which they may not apply. If the results are physically wrong, then either the
theory or the math generalization (or both) is wrong. Otherwise, if the results are physically
plausible, then we leave it to the mathematicians to decide whether or not the generalization
is secretly justifiable.
Updates
1. [04.09.2025] Added brief discussion on linear dependence between vectors
1. (PHYSICIST’S) VECTORS: BASICS
You hopefully remember how to add two numbers. The sum of two numbers, a and b, is denoted
by a + b.
Now let ⃗a denote a collection of numbers “in parallel,” and let a_i be the i-th component. This
parallelism is key to our understanding of how to deal with vectors. For instance, let ⃗b be another
collection of numbers. We can perform parallel addition by adding each component of ⃗a to the
corresponding component of ⃗b, i.e., (⃗a + ⃗b)_i := a_i + b_i. This procedure is well-defined only if
the number of components in ⃗a equals that in ⃗b. We call this “number of components” the
dimension of the vector.
[For fun: if you know some programming languages, say Python, you may be tempted to interpret
a “collection of numbers,” like ⃗a, as a tuple. Given two tuples ⃗a and ⃗b, the sum ⃗a + ⃗b is also de-
fined. Does the definition of “+” in your favorite programming language agree with our definition
above? ]
A collection of numbers can be viewed as the coordinates of a point in some space. We
can scale points in space by, e.g., pushing them further away from the origin. This scaling is achieved
by multiplying a vector by a number s, ⃗a ↦ s⃗a. Operationally, we again apply the parallelism and
simply define (s⃗a)_i := s a_i. Of course, this secretly assumes we know what it means to multiply
a_i by s (if you want the math buzzword, look up field). We call the “scaling numbers,” like s above,
the “scalars.”
So far, we have defined two operations on vectors: (i) addition ⃗a + ⃗b, and (ii) multiplication by
a scalar s, ⃗a ↦ s⃗a. These operations are ultimately traced back to the usual addition and multiplication
which we learnt in elementary school. We simply “vectorize” by invoking the mentioned
parallelism in our operations. The natural next question is how to extend the known operations
even further, and whether the different operations are compatible. As with any “easy” math
structure, the operations you want to do will be done in the way you expect. We assert
1. Additive inverse: Let −⃗a := (−1)⃗a, then ⃗a −⃗a := ⃗a + (−⃗a) = ⃗0, the zero vector (0, 0, . . . ).
2. Distributive: s(⃗a + ⃗b) = s⃗a + s⃗b .
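If you like to experiment, the component-wise (“parallel”) rules above can be sketched in a few lines of Python. This is only an illustration: the helper names `add` and `scale` are our own choice and are not part of any library, and vectors are represented as plain tuples.

```python
def add(a, b):
    """Component-wise sum of two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return tuple(ai + bi for ai, bi in zip(a, b))


def scale(s, a):
    """Multiply every component of the vector a by the scalar s."""
    return tuple(s * ai for ai in a)


a = (1.0, 2.0, 3.0)
b = (4.0, -5.0, 6.0)
s = 2.0

neg_a = scale(-1, a)                  # additive inverse: -a := (-1) a
zero = add(a, neg_a)                  # a + (-a), expected to be the zero vector
lhs = scale(s, add(a, b))             # s (a + b)
rhs = add(scale(s, a), scale(s, b))   # s a + s b
```

Comparing `lhs` with `rhs`, and `zero` with `(0.0, 0.0, 0.0)`, checks the two asserted properties for this particular pair of vectors.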
In typical applications for our physics course, we want to push the “coordinate” interpretation
further. This can be achieved by first defining a set of independent directions, let’s call them x, y,
and z if we are in three-dimensional space, and then defining the special unit vectors
êx = x̂ := (1, 0, 0); êy = ŷ := (0, 1, 0); êz = ẑ := (0, 0, 1). (1)
From the stated rules, we see that
⃗r = xêx + yêy + zêz = (x, y, z), (2)
which agrees with our coordinate interpretation. Furthermore, the inversion −⃗r = (−x, −y, −z)
can be viewed as going in the opposite direction from the origin as compared to ⃗r.
But what about the “scale,” which we discussed in introducing the scalars? It makes sense that
if you are at a position ⃗r from me now, and some time later you move to the position 2⃗r, then the
distance between you and me is doubled. To extract this “distance,” however, we need to introduce
a new structure. This can be achieved by asking ourselves the following question: can we multiply
two vectors? Given the mentioned parallelism, it is natural to define a product
$$ (\vec{a}, \vec{b}) \mapsto \vec{a} \cdot \vec{b} := a_1 b_1 + a_2 b_2 + \cdots, \tag{3} $$
i.e., we simply perform the element-wise, elementary-school multiplication first, and then add up
all the results. We call this the dot product (also known as the inner product). Importantly, this
operation takes two vectors and returns a scalar. If we are only given one vector, then the best we
can do is to multiply it by itself. This gives
$$ \vec{a} \cdot \vec{a} = a_1^2 + a_2^2 + \cdots = \sum_i a_i^2. \tag{4} $$
The (scary?) symbol ∑ is just shorthand for “summing over what is indicated on the right.”
Following the same mindset above, we ask how this new operation plays with the previously
defined operations. Notice
$$ (s\vec{a}) \cdot (s\vec{a}) = \sum_i (s a_i)^2 = s^2 \sum_i a_i^2 = s^2\, \vec{a} \cdot \vec{a}. \tag{5} $$
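So, if we want to extract the magnitude of the scalar s, we can compare
$$ \frac{(s\vec{a}) \cdot (s\vec{a})}{\vec{a} \cdot \vec{a}} = \frac{s^2\, \vec{a} \cdot \vec{a}}{\vec{a} \cdot \vec{a}} \overset{\vec{a} \cdot \vec{a} \neq 0}{=} s^2. \tag{6} $$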
In addition, we agree that the name “unit vector” means that êx · êx = êy · êy = êz · êz = 1. Moreover,
from our definitions one can check that êx · êy = êy · êz = êz · êx = 0. It is in this sense that we say
they are “independent directions.” (The better, but mathematically more involved, word choice
here would have been “orthogonal,” since linear dependence is something different...) Then, we
define
$$ |\vec{a}| := \sqrt{\vec{a} \cdot \vec{a}} = \sqrt{\sum_i a_i^2} \overset{\text{3D}}{=} \sqrt{x^2 + y^2 + z^2}, \tag{7} $$
which simply extracts the magnitude of the scale of a vector ⃗a compared to the previously agreed
unit vectors.
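As a quick numerical check of Eqs. (3), (6) and (7), here is a minimal Python sketch. The function names `dot` and `norm` and the sample vector are our own illustrative choices, not something prescribed by the course.

```python
import math


def dot(a, b):
    """Dot product: multiply component-wise, then add up the results (Eq. 3)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    """Magnitude |a| := sqrt(a . a) (Eq. 7)."""
    return math.sqrt(dot(a, a))


e_x, e_y, e_z = (1, 0, 0), (0, 1, 0), (0, 0, 1)

a = (3.0, 4.0, 12.0)
s = 2.0
sa = tuple(s * ai for ai in a)

ratio = dot(sa, sa) / dot(a, a)   # Eq. (6): should reproduce s**2
length = norm(a)                  # sqrt(9 + 16 + 144)
```

Here `ratio` should come out as s² = 4 and `length` as 13, while `dot(e_x, e_y)` vanishes and `dot(e_x, e_x)` equals 1, in line with the statements about the unit vectors.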
[ For fun: what are the other natural ways to define products between two vectors? Must we return
a scalar? ]
Importantly, for any non-zero vector we can always write
$$ \vec{a} = |\vec{a}| \left( \frac{1}{|\vec{a}|}\, \vec{a} \right). \tag{8} $$
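Notice
$$ \left| \frac{1}{|\vec{a}|}\, \vec{a} \right|^2 = \left( \frac{1}{|\vec{a}|}\, \vec{a} \right) \cdot \left( \frac{1}{|\vec{a}|}\, \vec{a} \right) = \frac{1}{|\vec{a}|^2}\, |\vec{a}|^2 = 1, \tag{9} $$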
and so this is a unit vector pointing in the same direction as ⃗a. We usually denote it by â, and with
that we can write
⃗a = |⃗a|â. (10)
This is interpreted as saying that a vector is specified by its (i) magnitude |⃗a| and (ii) direction â.
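The split into magnitude and direction in Eq. (10) can be tried out numerically. The sketch below is illustrative only; `norm` and `unit` are hypothetical helper names introduced here.

```python
import math


def norm(a):
    """Magnitude |a| of a vector given as a tuple of components."""
    return math.sqrt(sum(ai * ai for ai in a))


def unit(a):
    """Unit vector a_hat = a / |a|; only defined for a non-zero vector."""
    n = norm(a)
    if n == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(ai / n for ai in a)


a = (3.0, 0.0, -4.0)
a_hat = unit(a)                                # direction of a
rebuilt = tuple(norm(a) * c for c in a_hat)    # |a| a_hat, should reproduce a
```

Up to floating-point rounding, `norm(a_hat)` is 1 and `rebuilt` agrees with `a`.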
In closing, we note one last important concept about vectors: linear dependence. Let’s start
from elementary school again with real numbers. Let a, b ≠ 0 be two real numbers. Suppose we
want to solve for s and t such that
$$ s a + t b = 0 \;\Rightarrow\; s = -t\, \frac{b}{a}. \tag{11} $$
This is always solvable. Now, promoting to vectors ⃗a and ⃗b, we ask if
s⃗a + t⃗b = ⃗0 (12)
has a solution for some real numbers s, t or not. (We exclude the trivial choice s = t = 0, which
always works.) From the mentioned parallelism, we need to solve a
collection of Eqs. (11), one for each component, simultaneously using only two parameters s and t
(in fact, only one parameter: do you see why?). You don’t expect to be able to accomplish a large
number of tasks with very few resources. Indeed, if a solution exists, then we have s⃗a = −t⃗b, meaning
s⃗a and t⃗b have equal magnitude but opposite directions. The magnitude part is always solvable, as we
can simply set
$$ s = \pm t\, \frac{|\vec{b}|}{|\vec{a}|}. \tag{13} $$
For the direction part, however, the scalars s and t can at most flip a vector to its opposite direction,
and the problem becomes solvable only when
â = ∓b̂. (14)
In other words, Eq. (12) has a solution (with s and t not both zero) only for special pairs of vectors
which have either aligned or opposite directions. If this condition is not satisfied, then no such values
of s and t will solve it. We say ⃗a and ⃗b are linearly dependent if Eq. (12) has such a solution, and
linearly independent otherwise. The idea generalizes to a collection of more than two vectors too.
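For two non-zero vectors, the criterion in Eq. (14) can be turned into a small test. The following Python sketch assumes both inputs are non-zero and of equal dimension. The function name `linearly_dependent` and the tolerance `tol` are our own additions, needed only because floating-point numbers are rarely exactly equal.

```python
import math


def unit(a):
    """Unit vector a / |a| for a non-zero vector a."""
    n = math.sqrt(sum(ai * ai for ai in a))
    if n == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(ai / n for ai in a)


def linearly_dependent(a, b, tol=1e-12):
    """True if a_hat = +b_hat or a_hat = -b_hat, up to the tolerance tol."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    a_hat, b_hat = unit(a), unit(b)
    aligned = all(abs(p - q) <= tol for p, q in zip(a_hat, b_hat))
    opposite = all(abs(p + q) <= tol for p, q in zip(a_hat, b_hat))
    return aligned or opposite


parallel = linearly_dependent((1.0, 2.0, 3.0), (2.0, 4.0, 6.0))         # aligned
antiparallel = linearly_dependent((1.0, 2.0, 3.0), (-2.0, -4.0, -6.0))  # opposite
generic = linearly_dependent((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))          # neither
```

The first two pairs are linearly dependent, while the pair of unit vectors êx and êy is not.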
2. BASIC (PHYSICISTS’) CALCULUS
Next, we go from elementary school (arithmetic of numbers) to high school (calculus).
i. Vector
First, we recall that a function f is essentially an evaluation rule f : x ↦ f(x), which can, for
instance, take a real number to a real number. We know how to package a collection of numbers
a_1, a_2, . . . into a vector ⃗a. In the same way, we can package a collection of functions f_1, f_2, . . . into
a vector-valued function f⃗:
f⃗ : x ↦ f⃗(x). (15)
If that is too abstract (indeed, this is quite abstract!), it may help to write out more explicitly that
f⃗(x) := (f1 (x), f2 (x), · · · ). (16)
If that is still too abstract, then imagine we have a particle moving in 3D space as a function of
time. Let t ∈ R (i.e., the real numbers) denote the time variable, and let the coordinates be denoted
by x, y, and z. As the particle is moving, the coordinates should all be time-dependent, i.e., they
are functions of time. Then from the discussion above, the position vector is a vector-valued
function of time
⃗r(t) := (x(t), y(t), z(t)) = x(t)êx + y(t)êy + z(t)êz . (17)
Notice that we have implicitly assumed that the unit vectors êx,y,z , which define the coordinates in
the first place, are time-independent.
[Think: What happens when they are also time-dependent? In fact, can we tell if they are time-
dependent or not? ]
Now, to do calculus on vector-valued functions, we simply invoke the very same parallelism
again. Define
$$ \frac{d}{dt} \vec{f}(t) := \left( \frac{d}{dt} f_1(t),\ \frac{d}{dt} f_2(t),\ \cdots \right). \tag{18} $$
Since dfi /dt is still a function of t, you can take as many derivatives as you like (provided the
derivatives make sense for each of the component functions).
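To see Eq. (18) in action, one can approximate the derivative of each component numerically. The sketch below uses a central finite difference, which is our choice of approximation and not part of the definition. The helical trajectory `r(t)` and the step size `h` are likewise just illustrative assumptions.

```python
import math


def derivative(f, t, h=1e-6):
    """Approximate d f/dt component by component with a central difference."""
    forward, backward = f(t + h), f(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(forward, backward))


def r(t):
    """Example position vector: a particle moving along a helix."""
    return (math.cos(t), math.sin(t), t)


velocity = derivative(r, 1.0)                    # numerical d r/dt at t = 1
exact = (-math.sin(1.0), math.cos(1.0), 1.0)     # differentiate each component by hand
```

The two tuples should agree to within the accuracy of the finite-difference approximation.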
ii. Multi-variable
Another generalization we need is to consider the case when a function depends on multiple
variables. Recall, the derivative is, by definition, a measure of how the function is locally changing
around the current point of evaluation. When you have multiple variables, there are multiple ways
for your function to change. For the applications in our course, we often find it natural to package
the “multiple variables” into a vector. For instance, consider a function U : ⃗r ↦ U(⃗r), which
maps a point ⃗r in 3D space to a real number U (⃗r). More explicitly, if we have ⃗r := (x, y, z),
then the function U depends on the three different variables U (x, y, z). Now, we can ask how U is
changing as the different variables are varied. In the usual “control experiment” mindset we don’t
want to change multiple things simultaneously. This invites us to define the partial derivative
$$ \frac{\partial U}{\partial x}(x, y, z) := \lim_{\delta x \to 0} \frac{U(x + \delta x, y, z) - U(x, y, z)}{\delta x}. \tag{19} $$
The left-hand-side should be read as “the function ∂U/∂x evaluated at the point (x, y, z).” The
other partial derivatives are defined similarly
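$$ \frac{\partial U}{\partial y}(x, y, z) := \lim_{\delta y \to 0} \frac{U(x, y + \delta y, z) - U(x, y, z)}{\delta y}; \quad \frac{\partial U}{\partial z}(x, y, z) := \lim_{\delta z \to 0} \frac{U(x, y, z + \delta z) - U(x, y, z)}{\delta z}. \tag{20} $$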
So, from the same (multi-variable) function U , we defined three different ways to perform a
“partial derivative” denoted by ∂x , ∂y and ∂z . However, if you look around the physical space
you are in, you would agree it is arbitrary for us to fix which three are the “special directions.”
In other words it will be more natural to define a partial derivative which can work for any other
directions than the arbitrarily chosen êx,y,z . Towards that goal, let us define a “vector-valued partial
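derivative” by, once again, invoking the mentioned parallelism:
$$ \vec{\nabla} U(x, y, z) := \left( \frac{\partial U}{\partial x}(x, y, z),\ \frac{\partial U}{\partial y}(x, y, z),\ \frac{\partial U}{\partial z}(x, y, z) \right). \tag{21} $$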
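In addition, we recall that the “component” form can equivalently be expressed as a “sum of scalars
multiplied by the basis vectors” as in Eq. (17), i.e.,
$$ \vec{\nabla} U(x, y, z) = \frac{\partial U}{\partial x}(x, y, z)\, \hat{e}_x + \frac{\partial U}{\partial y}(x, y, z)\, \hat{e}_y + \frac{\partial U}{\partial z}(x, y, z)\, \hat{e}_z. \tag{22} $$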
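Keeping the argument (x, y, z) implicit and rearranging the expression, we get
$$ \vec{\nabla} U = \hat{e}_x \frac{\partial U}{\partial x} + \hat{e}_y \frac{\partial U}{\partial y} + \hat{e}_z \frac{\partial U}{\partial z}. \tag{23} $$
We can think of ∇⃗ as an operation applied to the (scalar, multi-variable) function U, with the
definition
$$ \vec{\nabla} := \hat{e}_x \frac{\partial}{\partial x} + \hat{e}_y \frac{\partial}{\partial y} + \hat{e}_z \frac{\partial}{\partial z}. \tag{24} $$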
This is the usual definition you will see. For our purposes, it is nothing more than a shorthand
for the explicit, component-based expression Eq. (21). In addition, if you want to consider how
the function U changes as we go from ⃗r to some nearby point ⃗r + ⃗δ, we naturally consider the dot
product ⃗δ · ∇⃗U, which gives the change in U to first order in ⃗δ. (Convince yourself that this checks
out when we take ⃗δ = êx, for instance.) This provides a more basis-independent way of thinking
about how the multi-variable function U is changing in the direction ⃗δ.
To make things less abstract(?), suppose we say there is a vector-valued physical quantity F⃗
which depends on another scalar-valued physical quantity U through
$$ \vec{F} = -\vec{\nabla} U. \tag{25} $$
All that really means is that the x component of F⃗ is given by the x-partial derivative Fx = −∂x U ,
and likewise for the y and z components.
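As a rough numerical illustration of Eqs. (21) and (25), the sketch below estimates each partial derivative with a central difference (our choice of approximation). The spring-like function `U`, the constant `k`, and the sample points are made up for the example.

```python
def gradient(U, r, h=1e-6):
    """Approximate (dU/dx, dU/dy, dU/dz) at r, varying one variable at a time."""
    grad = []
    for i in range(len(r)):
        plus, minus = list(r), list(r)
        plus[i] += h       # shift only the i-th variable forward
        minus[i] -= h      # and backward, keeping the others fixed
        grad.append((U(plus) - U(minus)) / (2 * h))
    return tuple(grad)


k = 2.0  # hypothetical constant, chosen only for this example


def U(r):
    """Example scalar function U = (k/2)(x^2 + y^2 + z^2)."""
    x, y, z = r
    return 0.5 * k * (x * x + y * y + z * z)


r0 = (1.0, -2.0, 0.5)
grad_U = gradient(U, r0)
F = tuple(-g for g in grad_U)           # Eq. (25): F = -grad U, here equal to -k r0

# Change of U for a small displacement delta, compared with delta . grad U
delta = (1e-3, 2e-3, -1e-3)
r1 = tuple(ri + di for ri, di in zip(r0, delta))
first_order = sum(di * gi for di, gi in zip(delta, grad_U))
actual = U(r1) - U(r0)
```

For this choice of U one expects `F` to be close to (−2, 4, −1). The values `first_order` and `actual` agree up to a small correction that is quadratic in the displacement.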
[In the above, we treated the different variables as independent, but what if they are not? For
instance, maybe U (x, y, z, t) depends on both space and time, and we want to study the change
in U experienced by a particle which is moving according to some time-dependent position ⃗r(t),
such that ultimately the x, y, z we care about are also dependent on time. Look up “total derivative”
if you want to learn more. ]
iii. Taylor series
As mentioned, the derivative is a (local) way to understand how a function is changing as its
argument is varied. If higher derivatives (= derivative of derivative of derivative · · · ) are given,
then you know more. If you are given all possible derivatives, then maybe you know everything
about the function (in a neighborhood :).
To make things more concrete, suppose I have a real-valued function f(x), and you want to
know what f(x) is in some neighborhood of x_0. I resist telling you what f(x) is, but I can tell
you everything about f and its derivatives at x_0, i.e., I can provide the real numbers f(x_0),
(df/dx)(x_0), (d²f/dx²)(x_0), . . . . How do you reconstruct the value of f(x) for some x ≠ x_0? The
natural answer is the Taylor series
$$ f(x) \overset{\text{near } x_0}{=} f(x_0) + \left. \frac{df}{dx} \right|_{x_0} (x - x_0) + \frac{1}{2} \left. \frac{d^2 f}{dx^2} \right|_{x_0} (x - x_0)^2 + \frac{1}{6} \left. \frac{d^3 f}{dx^3} \right|_{x_0} (x - x_0)^3 + \cdots, \tag{26} $$
Here, we introduce another common notation, $\left. \frac{df}{dx} \right|_{x_0}$, meaning we evaluate the
derivative at the point x_0. You can convince yourselves that the mysterious factors of 1/2, 1/6, etc.
are whatever it takes to make the series exact when f(x) is a polynomial to start with.
[We never promise that your reconstruction will work – that depends on how “nice” the function
f (x) is. For our course, you can assume that it always works when you are asked to consider the
Taylor series expansion. ]
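The reconstruction in Eq. (26) can be tried out directly: given a list of derivative values at x_0, we sum the series term by term. The sketch below is illustrative only. The function name `taylor` and the two test functions are our own choices, and the general coefficient 1/n! is the standard continuation of the factors 1/2 and 1/6.

```python
import math


def taylor(derivs, x0, x):
    """Partial sum of Eq. (26); derivs[n] is the n-th derivative of f at x0."""
    return sum(d / math.factorial(n) * (x - x0) ** n
               for n, d in enumerate(derivs))


# f(x) = x**3 about x0 = 1: the values of f, f', f'', f''' at x0 are 1, 3, 6, 6.
cubic = taylor([1.0, 3.0, 6.0, 6.0], x0=1.0, x=2.0)   # exact for a polynomial

# f(x) = exp(x) about x0 = 0: every derivative at 0 equals 1.
approx = taylor([1.0] * 10, x0=0.0, x=0.5)
error = abs(approx - math.exp(0.5))                   # shrinks as more terms are kept
```

For the cubic, the four terms already reproduce 2³ = 8 exactly, which is the sense in which the factors 1/2 and 1/6 are “whatever it takes.” For the exponential, ten terms leave only a tiny `error`.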
Note: while one can readily generalize the discussion above to the case of a multi-variable
function, we refrain from providing a systematic discussion here as that would be overkill for our
course.
iv. Integration = anti-differentiation
Since you have learnt some calculus as your prerequisite, you must have also learnt something
about integration. Differentiation is a task and integration is an art. We don’t have much to say
about integration beyond:
1. Indefinite integrals are anti-derivatives, i.e., you ask yourself what expression would return
the integrand you are given when you differentiate it.
2. Definite integrals return the difference of evaluating the anti-derivative at the two stated
points.
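The second statement can be checked numerically by comparing a brute-force sum with the antiderivative evaluated at the two end points. In the sketch below, the midpoint rule, the function name `midpoint_integral`, and the choice of cos x on [0, π/2] are all our own illustrative assumptions.

```python
import math


def midpoint_integral(f, a, b, n=1000):
    """Approximate the definite integral of f from a to b with n midpoint slices."""
    width = (b - a) / n
    return width * sum(f(a + (k + 0.5) * width) for k in range(n))


a, b = 0.0, math.pi / 2
numeric = midpoint_integral(math.cos, a, b)       # sum of many thin slices
from_antiderivative = math.sin(b) - math.sin(a)   # sin is an antiderivative of cos
```

Both numbers should be close to 1.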
In addition, by the very same parallelism we mentioned repeatedly, we can extend integration
to a vector-valued function. For instance, suppose we define J⃗ := ∫ F⃗(t) dt. What that means is,
component-by-component,
$$ J_x = \int F_x(t)\, dt; \quad J_y = \int F_y(t)\, dt; \quad J_z = \int F_z(t)\, dt. \tag{27} $$
This corresponds to multiplying F⃗ by the “scalar” dt and then integrating. Since we have also introduced
how to multiply a vector by another vector, we can also imagine an alternative kind of integral
$$ \int_\gamma \vec{F} \cdot d\vec{r} := \int_\gamma F_x\, dx + \int_\gamma F_y\, dy + \int_\gamma F_z\, dz. \tag{28} $$
This is called a line integral, which is specified together with a path γ. Along the path γ, we
first evaluate the dot product between the vector-valued function F⃗ and d⃗r, which captures the
infinitesimal segments (both length and direction) along the path, and then integrate = sum the
numbers we get along the way.
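The “evaluate the dot product, then sum along the way” recipe translates almost literally into code. The following Python sketch chops the path into short straight segments. The parametrization of the path by a number u between 0 and 1, the function names, and the example field are assumptions made for illustration.

```python
import math


def line_integral(F, path, n=1000):
    """Approximate the line integral of F along path(u), u in [0, 1], using n segments."""
    total = 0.0
    for k in range(n):
        start, end = path(k / n), path((k + 1) / n)
        mid = path((k + 0.5) / n)                       # where F is evaluated
        dr = tuple(e - s for s, e in zip(start, end))   # small displacement along the path
        total += sum(Fi * dri for Fi, dri in zip(F(mid), dr))
    return total


def F(r):
    """Example vector-valued function F = (-y, x, 0)."""
    x, y, z = r
    return (-y, x, 0.0)


def circle(u):
    """Unit circle in the xy-plane, traversed once counterclockwise."""
    angle = 2 * math.pi * u
    return (math.cos(angle), math.sin(angle), 0.0)


result = line_integral(F, circle)
```

For this field and path the exact answer is 2π, and `result` approaches it as the number of segments `n` grows.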
3. BASIC TRIGONOMETRY
Trigonometry is, of course, useful in geometry, say in analyzing free-body diagrams and finding
the component of a force along a direction, etc. Aside from that, trigonometric functions provide good
“model functions” for describing more general functions. We assume you are familiar with the basic
definitions and manipulations of trigonometric functions. Handy identities like the double-angle
formulae, etc., will be provided in the exams as part of the formula sheet.