Chapter 2
Instructions: Assembly Language
Reading: The corresponding chapter in the 2nd edition is Chapter 3; in the 3rd edition, it
is Chapter 2 and Appendix A; and in the 4th edition, it is Chapter 2 and Appendix B.
2.1 Instructions and Instruction set
The language used to command a computer architecture consists of instructions, and the
vocabulary of that language is called the instruction set. Computers can only represent
information using high or low electric signals, i.e., transistors (electric switches)
being turned on or off. Being limited to those 2 alternatives, we represent information in
computers using bits (binary digits), which can have one of two values: 0 or 1. So, instructions
will be stored in and read by computers as sequences of bits. This is called machine language.
So that we don't need to read and write programs using bits, every instruction also
has a "natural language" equivalent, called the assembly language notation. For example,
in C, we can use the expression c = a + b; or, in assembly language, we can use add c, a, b
and such an instruction will be represented by a sequence of bits 000000 · · · 010001001 in the
computer. Groups of bits are named as follows:
bit            0 or 1
byte           8 bits
half word      16 bits
word           32 bits
double word    64 bits
Since every bit can only be 0 or 1, with a group of n bits, we can generate 2^n different
combinations of bits. For example, we can make 2^8 combinations with one byte (8 bits),
2^16 with one half word (16 bits), and 2^32 with one word (32 bits). Please note that we
are not making any statements, so far, on what each of these 2^n combinations is actually
representing: it could represent a number, a character, an instruction, a sample from a
digitized CD-quality audio signal, etc. In this chapter, we will discuss how a sequence of 32
bits can represent a machine instruction. In the next chapter, we will see how a sequence of
32 bits can represent numbers.
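The count of 2^n combinations can be checked by simply enumerating them. The following Python sketch does this for small n; the function name bit_patterns is ours and is used for illustration only.

```python
from itertools import product

def bit_patterns(n):
    """Return every n-bit pattern as a string of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n)]

print(bit_patterns(2))        # ['00', '01', '10', '11']
print(len(bit_patterns(8)))   # 256, i.e., 2^8 patterns fit in one byte
print(2 ** 16, 2 ** 32)       # 65536 4294967296
```

Enumerating all patterns is only practical for small n; for a half word or a word, we compute 2^n directly.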
2.2 MIPS R2000
The instruction set we will explore in class is the MIPS R2000 instruction set, named
after the company that designed the widely used MIPS (Microprocessor without Interlocked
Pipeline Stages) architecture and its corresponding instruction set. MIPS R2000 is a 32-bit
instruction set: every instruction is represented by 32 bits. In what follows, we will
discuss
• Arithmetic instructions
• Data transfer instructions
• Decision making (conditional branching) instructions
• Jump (unconditional branching) instructions
It is important to keep in mind that assembly language is a low-level language, so instructions
in assembly language are closely related to their 32-bit representation in machine language.
Since we only have 32 bits available to encode every possible assembly instruction, MIPS
R2000 instructions have to be simple and follow a rigid structure.
2.2.1 Arithmetic instructions
If we want to instruct a computer to add or subtract the variables b and c and assign the
result to variable a, we would write this as follows in MIPS R2000:
add a, b, c   ⇐⇒   a = b + c;   as in C language
sub a, b, c   ⇐⇒   a = b - c;   as in C language
In both instructions, the first word (add, sub) is the operation and a, b, c are the operands.
The operation defines which kind of operation or calculation is required from the CPU. The
operands (or, arguments) are the objects involved in the operation. Notice that each of the
previous MIPS R2000 instructions performs 1 operation and has exactly 3 operands. This
will be the general format for many MIPS R2000 instructions since, as we mentioned before,
we want MIPS R2000 instructions to have a rigid, simple structure. In the case of add, this
implies only two operands can be added at a time. To calculate additions of more than 2
numbers, we would need multiple instructions. The following example illustrates this.
Example 2.2.1
The operation a = b + c + d; can be implemented using a single statement in C.
However, if we want to write MIPS assembly code to calculate this sum, we need to write
this addition as a series of two simpler additions
a = b + c;
a = a + d;
such that there are only three operands per operation (addition in this case). The
corresponding MIPS code is given by:
add a,b,c
add a,a,d
So, we need multiple instructions in MIPS R2000 to compute the sum of 3 variables.
However, each instruction will be simple (so it can be represented using the 32 bits we have
available) and very fast in hardware. Similarly, to compute a = (b+c)-(d+e); we proceed
as follows
add t0, b, c
add t1, d, e
sub a, t0, t1
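To make the decomposition concrete, the Python sketch below mimics the three instructions one by one, with each function call taking exactly two source operands. The functions add and sub and the sample values are our own illustrative choices, not part of MIPS.

```python
def add(x, y):
    """Model of a three-operand add: the caller assigns the returned sum."""
    return x + y

def sub(x, y):
    """Model of a three-operand sub: the caller assigns the returned difference."""
    return x - y

b, c, d, e = 5, 7, 2, 3   # sample values, chosen arbitrarily

t0 = add(b, c)    # add t0, b, c
t1 = add(d, e)    # add t1, d, e
a = sub(t0, t1)   # sub a, t0, t1

print(a)  # 7, the same as (b + c) - (d + e)
```

The temporaries t0 and t1 hold the intermediate sums, just as in the assembly version.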
Registers
In a high-level programming language such as C, we can (virtually) declare as many variables
as we want. In a low-level programming language such as MIPS R2000, the operands of
our operations have to be tied to physical locations where information can be stored. We
cannot use locations in the main physical memory for this, as this would delay the CPU
significantly (indeed, if the CPU had to access the main memory for every operand
in every instruction, the propagation delay of electric signals on the connection between the
CPU and the memory chip would slow things down considerably). Therefore, the MIPS
architecture provides for 32 special locations, built directly into the CPU, each of them able
to store 32 bits of information (1 word), called “registers”. A small number of registers
that can be accessed easily and quickly will allow the CPU to execute instructions very fast.
As a consequence, each of the three operands of a MIPS R2000 instruction is restricted to
one of the 32 registers.
For instance, each of the operands of add and sub instructions needs to be associated
with one of the 32 registers. Each time an add or sub instruction is executed, the CPU will
access the registers specified as operands for the instruction (without accessing the main
memory).
The instruction
add $1, $2, $3
means “add the value stored in the register named $2 and the value stored in the register
named $3, and then store the result in the register named $1.” The notation $x refers to
the name of a register and, by convention, always starts with a $ sign. In this text, if we
use the name of a register without the $ sign, we refer to its content (what is stored in the
register), for example, x refers to the content of $x.
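As a rough illustration, the register file can be modeled as a list of 32 integers, with add reading two entries and writing a third. This is only a sketch under simplifying assumptions: the names regs and MASK32 are ours, and the result is simply truncated to 32 bits, ignoring details such as overflow detection.

```python
MASK32 = 0xFFFFFFFF   # keeps a value within 32 bits

regs = [0] * 32       # regs[i] models the content of register $i

def add(rd, rs, rt):
    """add $rd, $rs, $rt: store the sum of $rs and $rt in $rd."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK32

regs[2] = 40          # content of $2
regs[3] = 2           # content of $3
add(1, 2, 3)          # add $1, $2, $3

print(regs[1])  # 42
```

Note that no memory is involved: the instruction only touches the three registers named as operands.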
Large, complex data structures, such as arrays, won’t fit in the 32 registers that are
available on the CPU and need to be stored in the main physical memory (implemented on a
different chip than the CPU and capable of storing a lot more information). To perform, e.g.,
arithmetic operations on elements of arrays, elements of the array first need to be loaded into
the registers. Inversely, the results of the computation might need to be stored in memory,
where the array resides.
[Figure: the 32 general registers ($0, $1, ...), each 32 bits wide, next to main memory, which has 2^32 different memory locations of 8 bits each (4 GByte).]
As shown in the figure above, one register contains 32 bits (1 word) and one memory cell
contains 8 bits (1 byte). Thus, it takes 4 memory cells (4 · 8 bits) to store the contents of
one register (32 bits). For a 32-bit machine (using 32-bit memory addresses), there are 2^32
different memory addresses, so we could address 2^32 memory locations, or 4 GByte of memory.
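The numbers in this paragraph follow from simple arithmetic, which the short Python sketch below reproduces (the constant names are ours):

```python
BITS_PER_REGISTER = 32
BITS_PER_CELL = 8      # one memory cell holds one byte
ADDRESS_BITS = 32

cells_per_register = BITS_PER_REGISTER // BITS_PER_CELL
num_locations = 2 ** ADDRESS_BITS

print(cells_per_register)        # 4
print(num_locations)             # 4294967296
print(num_locations // 2 ** 30)  # 4, i.e., 4 GByte with 1 GByte = 2^30 bytes
```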
To transfer data between registers and memory, MIPS R2000 has data transfer instructions.
2.2.2 Data transfer instructions
To transfer a word from memory to a register, we use the load word instruction: lw
lw $r1, 100($r2)   ⇐⇒   r1 = mem[100 + r2]
Here, $r1 is the register to be loaded, 100 is the offset, and $r2 is the base register.
Again, this MIPS R2000 instruction performs one operation and has 3 operands. The first
operand refers to the register the memory content will be loaded into. The register specified
by the third operand, the base register, contains a memory address. The actual memory
address the CPU accesses is computed as the sum of “the 32-bit word stored in the base
register ($r2 in this case)” and “the offset” (100 in this case). Overall, the above instruction
will make the CPU load the value stored at memory address [100+r2] into register $r1.
Also, it needs to be pointed out that a lw instruction will not only load mem[100 + r2]
into a register, but also the content of the 3 subsequent memory cells, at once. The 4 bytes
from the 4 memory cells will fit nicely in a register that is one word long.
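The Python sketch below models this behavior on a toy memory of 256 one-byte cells. The byte order is an assumption on our part (most significant byte first, i.e., big-endian), since the text does not specify it; the names memory, regs and lw are illustrative only.

```python
memory = bytearray(256)       # toy memory: 256 cells of one byte each
regs = {"r1": 0, "r2": 16}    # r2 holds the base address

def lw(dest, offset, base):
    """lw $dest, offset($base): load 4 consecutive bytes into one register."""
    addr = regs[base] + offset                        # address = base + offset
    # Assumption: big-endian byte order (most significant byte at lowest address).
    regs[dest] = int.from_bytes(memory[addr:addr + 4], "big")

memory[116:120] = bytes([0x12, 0x34, 0x56, 0x78])     # cells 116..119
lw("r1", 100, "r2")                                   # lw $r1, 100($r2)

print(hex(regs["r1"]))  # 0x12345678
```

The address computed is 100 + 16 = 116, and the four cells 116 to 119 together fill the 32-bit register.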
To transfer the content of a register to memory, we use the store word instruction: sw
sw $r1, 100($r2)   ⇐⇒   mem[100 + r2] = r1
Here, $r1 is the register to be stored, 100 is the offset, and $r2 is the base register.
The structure of this instruction is similar to lw. It stores the content of $r1 in memory,
starting at the memory address obtained as the sum of the offset and the 32-bit word stored
in the base register. The 3 subsequent memory cells are also written into (to store all 32 bits
stored in $r1).
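A matching sketch for sw, again on a toy memory and again assuming big-endian byte order (an assumption of ours, not stated in the text), shows how one register value is spread over 4 consecutive memory cells:

```python
memory = bytearray(256)               # toy memory: 256 cells of one byte each
regs = {"r1": 0xCAFEBABE, "r2": 16}   # r1 holds the word to store, r2 the base address

def sw(src, offset, base):
    """sw $src, offset($base): write the 32-bit register into 4 consecutive cells."""
    addr = regs[base] + offset
    # Assumption: big-endian byte order (most significant byte at lowest address).
    memory[addr:addr + 4] = regs[src].to_bytes(4, "big")

sw("r1", 100, "r2")                   # sw $r1, 100($r2)

print(memory[116:120].hex())  # cafebabe
```

Only cells 116 to 119 change; the neighboring cells keep their previous content.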
Example 2.2.2
Let A be an array of integers (each represented by a 32-bit word), with the base address of
A stored in register $3. Assume that the constant h is stored in register $2. We can
implement
A[12] = h+A[8];
in MIPS R2000 as
lw $8, 32($3) # load A[8] into $8
add $8, $8, $2 # add h to $8
sw $8, 48($3) # store the sum in A[12]
We use $8 as a temporary register here; register $0 cannot serve this purpose, since in
MIPS it always contains the value 0. In the first instruction, we use 32 as the offset since
one integer is represented by 4 bytes, i.e., 4 memory cells, so element A[8] of the array is
stored 32 bytes away from the base address. Similarly, the last instruction uses an offset
of 48 (12 times 4 bytes). The # sign allows us to insert comments, similar to using "//" in C.
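The offsets can be double-checked with a small simulation. In the Python sketch below, the base address 200, the values 37 and 5, the helper names, and the big-endian byte order are all our own assumptions for illustration.

```python
WORD = 4                   # bytes per 32-bit integer

memory = bytearray(1024)   # toy memory
base_A = 200               # hypothetical base address of A (the content of $3)
h = 5                      # hypothetical value of h (the content of $2)

def load_word(addr):
    """Read 4 consecutive bytes as one word (big-endian assumed)."""
    return int.from_bytes(memory[addr:addr + WORD], "big")

def store_word(addr, value):
    """Write one word into 4 consecutive bytes (big-endian assumed)."""
    memory[addr:addr + WORD] = (value & 0xFFFFFFFF).to_bytes(WORD, "big")

store_word(base_A + 8 * WORD, 37)      # set up A[8] = 37

tmp = load_word(base_A + 32)           # lw with offset 32 = 8 * 4
tmp = tmp + h                          # add h
store_word(base_A + 48, tmp)           # sw with offset 48 = 12 * 4

print(load_word(base_A + 12 * WORD))   # 42, the new value of A[12]
```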
Loading and storing just one byte
To load just one byte from memory (stored, e.g., at memory address r2 + 100) into a
register, e.g., $r1, we can use the following instruction
lb $r1, 100($r2)
This instruction (load byte) loads the byte into the 8 rightmost bits of the register (as we
will see later, the 8 bits will be sign-extended to 32 bits, to fill the entire register). Similarly,
the following instruction (store byte) allows us to store the 8 rightmost bits of a register, e.g.,
$r1, into memory, at the address r2 + 100:
sb $r1, 100($r2)
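The effect of both instructions on the bit level can be sketched in Python as follows. The helper names are ours; sign extension here means that the leftmost bit of the byte is copied into the 24 upper bits of the register.

```python
def sign_extend_byte(b):
    """Return the 32-bit pattern that lb places in a register for byte b."""
    # If bit 7 of the byte is set, fill the 24 upper bits with 1s.
    return (b | 0xFFFFFF00) if (b & 0x80) else b

def low_byte(word):
    """Return the 8 rightmost bits of a 32-bit register, as written by sb."""
    return word & 0xFF

print(hex(sign_extend_byte(0x7F)))  # 0x7f
print(hex(sign_extend_byte(0x80)))  # 0xffffff80
print(hex(low_byte(0x12345678)))    # 0x78
```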
2.2.3 Adding a constant
The add instruction we introduced earlier adds the contents of two registers. Adding a
constant to the content of a register would require first loading the constant value from
memory into some register and then executing an add instruction to add the contents of two
registers. This requires two instructions, including one data transfer between memory and
a register, which can be time consuming. Using the add immediate instruction, this can
be done more efficiently:
addi $r1, $r2, 4 ⇐⇒ r1 = r2 + 4
This allows us to add a constant and a register value with just one instruction (without data
transfer). In general, immediate instructions will have two registers and one constant as
operands.
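The difference between the two approaches can be sketched in Python as follows. The memory address 400, the register names and the truncation of the result to 32 bits are illustrative assumptions of ours.

```python
memory = {400: 4}                        # hypothetical address 400 holds the constant 4
regs = {"r1": 0, "r2": 10, "r3": 0}

# Without addi: load the constant from memory, then add two registers.
regs["r3"] = memory[400]                 # data transfer (like lw)
two_step = regs["r2"] + regs["r3"]       # add

# With addi: the constant is part of the instruction itself.
def addi(dest, src, constant):
    """addi $dest, $src, constant: add a register and a constant."""
    regs[dest] = (regs[src] + constant) & 0xFFFFFFFF

addi("r1", "r2", 4)                      # addi $r1, $r2, 4

print(regs["r1"], two_step)  # 14 14
```

Both paths give the same result, but the second one needs no access to memory.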