Module 3
ARM Processor fundamentals
Syllabus
3.1 ARM Processor architecture: The Acorn RISC Machine, Architectural
inheritance, The ARM programmer's model, ARM development tools.
3.2 ARM Assembly Language Programming: Data processing instructions, Data
transfer instructions, Control flow instructions, Writing simple assembly language
programs.
3.3 ARM Organization and Implementation: Three-stage pipeline ARM
organization, Five-stage pipeline ARM organization, ARM instruction execution,
ARM implementation, The ARM coprocessor interface.
ARM basics
The Acorn RISC Machine
The RISC (Reduced Instruction Set Computer) concept originated in processor
research programmes at Stanford and Berkeley universities around 1980. The
instruction set was reduced relative to the existing CISC (Complex Instruction
Set Computer) designs.
•The ARM processor is a Reduced Instruction Set Computer (RISC).
•The first ARM processor was developed at Acorn Computers Limited, of
Cambridge, England, between October 1983 and April 1985. At that time, and
until the formation of Advanced RISC Machines Limited (which later was renamed
simply ARM Limited) in 1990, ARM stood for Acorn RISC Machine.
•The name ARM was later expanded as Advanced RISC Machine.
ARM Architecture
ARM Architecture-CPSR
• N: Negative; the last ALU operation which changed the flags produced a negative
result (the top bit of the 32-bit result was a one).
• Z: Zero; the last ALU operation which changed the flags produced a zero result
(every bit of the 32-bit result was zero).
• C: Carry; the last ALU operation which changed the flags generated a carry-out,
either as a result of an arithmetic operation in the ALU or from the shifter.
• V: oVerflow; the last arithmetic ALU operation which changed the flags generated
an overflow into the sign bit.
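The flag definitions above can be made concrete with a small model. The following is a minimal Python sketch, not ARM code; the function name `add_with_flags` and the dictionary layout are illustrative assumptions. It computes the four condition flags for a 32-bit addition.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def add_with_flags(a, b, carry_in=0):
    """Model a 32-bit add and return (result, flags).

    Hypothetical helper for illustration; flags is a dict with keys N, Z, C, V.
    """
    a &= MASK32
    b &= MASK32
    full = a + b + carry_in          # unbounded Python integer
    result = full & MASK32           # what would be written to the register

    n = result >> 31                 # N: top bit of the result is one
    z = int(result == 0)             # Z: every bit of the result is zero
    c = int(full > MASK32)           # C: carry out of bit 31
    # V: both inputs have the same sign but the result has the other sign
    v = ((~(a ^ b)) & (a ^ result)) >> 31 & 1
    return result, {"N": n, "Z": z, "C": c, "V": v}
```

For instance, adding 1 to 0x7FFFFFFF sets N and V (signed overflow) but not C, whereas adding 1 to 0xFFFFFFFF sets Z and C but not V.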
ARM Architecture
Register organization for different modes of operation
ARM Programmer's model
When writing user-level programs, only the 15 general-purpose 32-bit registers
(r0 to r14), the program counter (r15) and the current program status register
(CPSR) need be considered.
The remaining registers are used only for system-level programming and for
handling exceptions.
ARM Architecture Inheritance
•A load-store architecture.
•Fixed-length 32-bit instructions.
•3-address instruction formats.
•Single-cycle execution of most instructions.
•Some instructions take multiple cycles to execute.
ARM development tools
Software development for the ARM is supported by a coherent range of tools
developed by ARM Limited, and there are also many third party and public domain
tools available, such as an ARM back-end for the gcc C compiler.
❑C or assembler source files are compiled or assembled into ARM object format
(.aof) files, which are then linked into ARM image format (.aif) files.
❑The image format files can be built to include the debug tables required by the
ARM symbolic debugger (ARMsd), which can load, run and debug programs either
on hardware such as the ARM Development Board or using a software emulation
of the ARM (the ARMulator).
❑The ARMulator has been designed to allow easy extension of the software model
to include system features such as caches, particular memory timing
characteristics, and so on.
ARM development tools
ARMsd
The ARM symbolic debugger is a front-end interface to assist in debugging
programs running either under emulation (on the ARMulator) or remotely on a
target system such as the ARM development board. The remote system must
support the appropriate remote debug protocols either via a serial line or through
a JTAG test interface.
At its most basic, ARMsd allows an executable program to be loaded into the
ARMulator or a development board and run. It allows the setting of breakpoints,
which are addresses in the code that, if executed, cause execution to halt so that
the processor state can be examined. In the ARMulator, or when running on
hardware with appropriate support, it also allows the setting of watchpoints.
These are memory addresses that, if accessed as data addresses, cause execution
to halt in a similar way.
ARMulator (ARM emulator)
▪The ARMulator (ARM emulator) is a suite of programs that models the behaviour
of various ARM processor cores in software on a host system. It can operate at
various levels of accuracy:
▪Instruction-accurate modelling gives the exact behaviour of the system state
without regard to the precise timing characteristics of the processor.
▪Cycle-accurate modelling gives the exact behaviour of the processor on a
cycle-by-cycle basis, allowing the exact number of clock cycles that a program requires to
be established.
▪Timing-accurate modelling presents signals at the correct time within a cycle,
allowing logic delays to be accounted for.
ARM Instruction set
There are three broad categories of instruction in the ARM assembly language.
These are:
◦Data Processing: this includes arithmetic and logical operations, comparison
operations and register movement operations;
◦Data Movement: these are instructions to load and store data from and to
memory;
◦Control Flow: these are software interrupts and branch instructions that alter
the order of execution.
For ARM data processing instructions, operands (values) are always 32 bits wide.
The operands are either held in registers or specified as constants
(called literals) in the instruction itself. The result of a data processing instruction is
also 32 bits wide and is stored in a register. Most data processing instructions
have three operands: two inputs and one result.
ARM Instruction set
Data processing instructions
•Arithmetic operations.
•Bit-wise logical operations.
•Register movement operations
•Comparison operations
•Shift and rotate instructions
Data processing instructions
Arithmetic operations.
'ADD' is simple addition, 'ADC' is add with carry, 'SUB' is subtract, 'SBC' is subtract
with carry, 'RSB' is reverse subtraction and 'RSC' reverse subtract with carry.
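As a rough illustration of the six arithmetic mnemonics, the Python sketch below models each one on 32-bit values. The function names are hypothetical, and the carry handling assumes the usual ARM convention in which SBC and RSC subtract an extra one when the carry flag is clear. The `add64` helper shows the typical reason ADC exists: chaining two 32-bit additions into one 64-bit addition.

```python
MASK32 = 0xFFFFFFFF


def add(rn, op2):
    return (rn + op2) & MASK32           # ADD: rn + op2


def adc(rn, op2, c):
    return (rn + op2 + c) & MASK32       # ADC: rn + op2 + C


def sub(rn, op2):
    return (rn - op2) & MASK32           # SUB: rn - op2


def sbc(rn, op2, c):
    return (rn - op2 + c - 1) & MASK32   # SBC: rn - op2 - NOT(C)


def rsb(rn, op2):
    return (op2 - rn) & MASK32           # RSB: op2 - rn (operands reversed)


def rsc(rn, op2, c):
    return (op2 - rn + c - 1) & MASK32   # RSC: op2 - rn - NOT(C)


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add built from a flag-setting add of the low words and an ADC."""
    total = a_lo + b_lo
    carry = total >> 32                  # carry out of the low-word addition
    lo = total & MASK32
    hi = adc(a_hi, b_hi, carry)
    return lo, hi
```

Reverse subtraction is useful because only the second operand can be a literal or a shifted register, so RSB allows expressions such as "constant minus register".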
Data processing instructions
Bit-wise logical operations.
AND, ORR and EOR perform the bit-wise AND, OR and exclusive-OR (XOR) logical
operations, as at the hardware gate level. The final mnemonic, BIC, stands for 'bit clear':
every '1' in the second operand clears the corresponding bit in the first.
Data processing instructions
Register movement operations
'MOV' copies the source operand into the result register. The 'MVN' mnemonic
stands for 'move negated'; it leaves the result register set
to the value obtained by inverting every bit in the source operand.
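The bit-wise and move-negated operations described above can be sketched in a few lines of Python. The function names are illustrative only (`and_` has a trailing underscore because `and` is a Python keyword), and all values are masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def and_(rn, op2):
    return rn & op2 & MASK32       # AND: bit is 1 only if set in both operands


def orr(rn, op2):
    return (rn | op2) & MASK32     # ORR: bit is 1 if set in either operand


def eor(rn, op2):
    return (rn ^ op2) & MASK32     # EOR: bit is 1 if the operands differ


def bic(rn, op2):
    return rn & ~op2 & MASK32      # BIC: clear every bit of rn that is 1 in op2


def mvn(op2):
    return ~op2 & MASK32           # MVN: invert every bit of the source
```

BIC is effectively AND with the inverted second operand, which is why it is convenient for clearing selected bits with a mask.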
Data processing instructions
Comparison operations
These instructions do not produce a result (which is therefore omitted from the
assembly language format) but just set the condition code bits (N, Z, C and V) in
the CPSR according to the selected operation
CMN r1, r2 ; set cc on r1 + r2
CMP r1, r2 ; set cc on r1 - r2
TEQ r1, r2 ; set cc on r1 xor r2
TST r1, r2 ; set cc on r1 and r2
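To show what "set cc" means in practice, here is a hedged Python sketch of CMP and TST. The helper names are hypothetical. The CMP model assumes the ARM convention that C is set when the subtraction does not borrow (r1 >= r2 as unsigned values), and the TST model reports only N and Z, the flags that depend on the logical result.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(r1, r2):
    """Flags produced by CMP r1, r2 (the result r1 - r2 is discarded)."""
    r1 &= MASK32
    r2 &= MASK32
    result = (r1 - r2) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(r1 >= r2)                              # no borrow occurred
    # signed overflow: operands differ in sign and the result's sign differs from r1
    v = ((r1 ^ r2) & (r1 ^ result)) >> 31 & 1
    return {"N": n, "Z": z, "C": c, "V": v}


def tst_flags(r1, r2):
    """N and Z flags produced by TST r1, r2 (the result r1 AND r2 is discarded)."""
    result = r1 & r2 & MASK32
    return {"N": result >> 31, "Z": int(result == 0)}
```

A typical use is testing equality: after CMP, Z is set exactly when the two registers hold the same value.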
Shifted register operands
▪The ARM doesn’t have actual shift instructions. Instead it has a barrel
shifter which provides a mechanism to carry out shifts as part of other
instructions.
▪One operand to the ALU is routed through the barrel shifter.
▪Thus, the operand can be modified before it is used.
▪This is useful for fast multiplication.
Data processing instructions
Shift and rotate instructions
•At their most basic, a shift operation moves the bits in a register to the
left or right and fills the vacated positions with zeros or ones.
•For instance, shifting left by one bit is equivalent to multiplying the
original number by 2.
•Shifting right by one bit is equivalent to integer division by 2.
•Thus shift operations can implement certain forms of integer multiplication
and division.
•LSL: logical shift left by the specified amount (multiplies by a power of two).
Data processing instructions
Shift and rotate instructions
•Why not just use generic multiply and divide instructions?
•Firstly the ARM, like many processors, does not have an integer division
instruction.
•Secondly, generic multiplication and division are slow; in many
cases it is much faster to convert standard multiplication and division
calculations into a series of shift operations.
•Shift and rotate operands can be applied to any of the following ARM
instructions: ADC, ADD, AND, BIC, CMN, CMP, EOR, MOV, MVN, ORR, RSB,
SBC, SUB, TEQ, TST.
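The idea of replacing a multiplication with shifts can be sketched in Python as follows. Each function models a single data processing instruction with a shifted second operand; the function names and the instruction shown in each comment are illustrative examples, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32


def times5(x):
    # models: ADD r0, r1, r1, LSL #2   ; r0 = r1 + r1*4
    return (x + lsl(x, 2)) & MASK32


def times7(x):
    # models: RSB r0, r1, r1, LSL #3   ; r0 = r1*8 - r1
    return (lsl(x, 3) - x) & MASK32


def times10(x):
    # two steps: multiply by 5, then shift the result left by one place
    return lsl(times5(x), 1)
```

Because the shift happens in the barrel shifter on the way into the ALU, each of the one-instruction cases costs no more than a plain ADD or RSB.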
Shift and rotate instructions
ARM has three types of shift operation:
•Logical shift operations: LSL, LSR
•Arithmetic shift operations: ASL, ASR (ASL is a synonym for LSL)
•Rotate operations: ROR, RRX
Logical shift operations: LSL, LSR
• LSL: logical shift left by 0 to 31 places; fill the vacated bits at the least
significant end of the word with zeros.
• LSR: logical shift right by 0 to 32 places; fill the vacated bits at the most
significant end of the word with zeros.
Arithmetic shift operations: ASL, ASR
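The difference between logical and arithmetic shifts is easiest to see side by side. The Python sketch below uses hypothetical function names and models 32-bit registers. It assumes the standard definition of ASR, in which the vacated bits at the most significant end are filled with copies of the original sign bit, so that negative two's complement values stay negative.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left: zeros enter at the least significant end."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right: zeros enter at the most significant end."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit is copied into the vacated bits."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret the pattern as a negative number
    # Python's >> on a negative integer is already an arithmetic shift
    return (value >> amount) & MASK32
```

For a negative input such as 0xFFFFFFF0 (minus 16), `asr` by two places gives 0xFFFFFFFC (minus 4), while `lsr` gives the large positive value 0x3FFFFFFC.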
Rotate operations:ROR,RRX
The ARM processor has two rotate operations, ROR (Rotate Right) and RRX
(Rotate Right with Extend).
ROR behaves much like LSR in that bits are moved between 0 and 32 places to
the right.
However, whereas the rightmost bits in a LSR operation fall off the register, in a
ROR operation, these bits are used to fill the vacated slots at the most significant
end of the register. In this way the bits "rotate".
If the degree of the rotation is 32 places, then the output is identical to the input
as all the bits will have returned to their original location.
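The two rotate operations can be modelled as below. This is a Python sketch with illustrative names. The RRX model assumes the standard behaviour of a one-place rotate through the carry flag: the old carry enters at bit 31 and bit 0 becomes the new carry.

```python
MASK32 = 0xFFFFFFFF


def ror(value, amount):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    value &= MASK32
    amount %= 32                      # rotating by 32 places restores the input
    if amount == 0:
        return value
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def rrx(value, carry_in):
    """Rotate right by one place through the carry flag (a 33-bit rotate).

    Returns (result, carry_out).
    """
    value &= MASK32
    carry_out = value & 1             # bit 0 falls into the carry
    result = (carry_in << 31) | (value >> 1)
    return result, carry_out
```

As the text notes, `ror(x, 32)` returns `x` unchanged, because every bit has travelled all the way round the register.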
Multiply Instructions
MUL r4, r3, r2 ; r4 := (r3 x r2)[31:0]
MLA r4, r3, r2, r1 ; r4 := (r3 x r2 + r1)[31:0]
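The `[31:0]` notation means that only the least significant 32 bits of the product are kept. A minimal Python sketch of that truncation, with hypothetical function names:

```python
MASK32 = 0xFFFFFFFF


def mul(rm, rs):
    """Model of MUL: keep bits [31:0] of the product."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """Model of MLA (multiply-accumulate): keep bits [31:0] of rm*rs + rn."""
    return (rm * rs + rn) & MASK32
```

One consequence is that a product which does not fit in 32 bits silently loses its upper bits; for example 0x10000 times 0x10000 yields zero in this model.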
Data transfer instructions
Data transfer instructions transfer data between registers and memory:
▪Memory to register
▪Register to memory
The ARM data transfer instructions are all based around register-indirect
addressing.
Register-indirect addressing uses a value in one register (the base register) as a
memory address and either loads the value from that address into another register
or stores the value from another register into that memory address.
These instructions are written in assembly language as follows:
LDR r0, [r1] ; r0 := mem32[r1]
STR r0, [r1] ; mem32[r1] := r0
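Register-indirect addressing can be illustrated with a toy machine in Python. Everything here is an assumption made for the example: the `Machine` class, the 64-byte memory, and the little-endian byte order. The point is only that the base register supplies the address and a second register supplies or receives the 32-bit value.

```python
import struct


class Machine:
    """Toy model: 16 registers and a small byte-addressed memory."""

    def __init__(self, mem_size=64):
        self.regs = [0] * 16
        self.mem = bytearray(mem_size)

    def ldr(self, rd, rn):
        # LDR rd, [rn]  ; rd := mem32[rn]
        addr = self.regs[rn]
        self.regs[rd] = struct.unpack_from("<I", self.mem, addr)[0]

    def str_(self, rd, rn):
        # STR rd, [rn]  ; mem32[rn] := rd
        addr = self.regs[rn]
        struct.pack_into("<I", self.mem, addr, self.regs[rd] & 0xFFFFFFFF)


m = Machine()
m.regs[1] = 8               # r1 is the base register: it holds the address
m.regs[0] = 0xDEADBEEF
m.str_(0, 1)                # store r0 to memory at the address in r1
m.ldr(2, 1)                 # load the same word back into r2
```

After these two calls, r2 holds the value that was in r0, and r1 is unchanged.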
Control flow instructions
Branch instructions
The most common way to switch program execution from one place to another is
to use the branch instruction.
B is the simplest branch instruction.
Upon encountering a B instruction, the ARM processor jumps immediately to
the address given and resumes execution from there.
Simple Programs
start: mov r0, #1 // Move the number 1 into register r0
mov r1, #2 // Move the number 2 into register r1
add r2, r1, r0 // Add r0 and r1 and store the result in r2
stop: b stop // End
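To trace what the program above does, the following Python sketch interprets just the three mnemonics it uses. The tuple encoding of the program and the `run` function are hypothetical, and a branch to its own address is treated as the end of the program, mirroring the `stop: b stop` idiom.

```python
def run(program, max_steps=20):
    """Interpret a list of (label_or_None, mnemonic, operands) tuples."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    regs = {}
    pc = 0                                   # index of the next instruction
    for _ in range(max_steps):
        _, op, args = program[pc]
        if op == "mov":                      # mov rd, #literal
            regs[args[0]] = args[1]
            pc += 1
        elif op == "add":                    # add rd, rn, rm
            regs[args[0]] = (regs[args[1]] + regs[args[2]]) & 0xFFFFFFFF
            pc += 1
        elif op == "b":                      # b label
            target = labels[args[0]]
            if target == pc:                 # branch-to-self: stop simulating
                break
            pc = target
        else:
            raise ValueError("unsupported mnemonic: " + op)
    return regs


program = [
    ("start", "mov", ("r0", 1)),
    (None, "mov", ("r1", 2)),
    (None, "add", ("r2", "r1", "r0")),
    ("stop", "b", ("stop",)),
]
```

Calling `run(program)` leaves r0 = 1, r1 = 2 and r2 = 3, which matches the comments in the assembly listing.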
ARM Pipelining
•Pipelining is the mechanism used by RISC (Reduced Instruction Set Computer) processors to
speed up execution: the next instruction is fetched while earlier instructions are being
decoded and executed.
•This in turn allows the memory system and processor to work continuously.
•The pipeline design for each ARM family is different.
•Pipelining is a design technique that increases the efficiency of data processing in the
processor of a computer or microcontroller by keeping the processor in a continuous cycle of
fetching, decoding and executing instructions (the F&E cycle).
•ARM devices rely on pipelining because RISC keeps the hardware simple and places the emphasis on
compiler complexity. Each stage takes one cycle, so an instruction passes through n stages in n cycles.
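The benefit of overlapping stages can be estimated with a simple count. The Python sketch below assumes an ideal pipeline in which every stage takes exactly one cycle and there are no stalls, branches or multi-cycle instructions; the function names and the default Fetch/Decode/Execute stage names are choices made for this example.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction finishes all its stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def stage_at(instr_index, cycle, stages=("Fetch", "Decode", "Execute")):
    """Stage occupied by instruction instr_index in the given cycle (both from 0)."""
    position = cycle - instr_index
    if 0 <= position < len(stages):
        return stages[position]
    return None          # not yet fetched, or already finished
```

Under these assumptions, ten instructions on a three-stage pipeline need 3 + 9 = 12 cycles instead of 30, and in cycle 2 the first instruction is executing while the third is being fetched.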
3-stage pipeline ARM Architecture
The principal components are:
•The register bank, which stores the processor state. It has two read ports and one
write port, each of which can be used to access any register, plus an additional read
port and an additional write port that give special access to r15, the program counter.
•The barrel shifter, which can shift or rotate one operand by any number of bits.
•The ALU, which performs the arithmetic and logic functions required by the
instruction set.
•The address register and incrementer, which select and hold all memory
addresses and generate sequential addresses when required.
•The data registers, which hold data passing to and from memory.
•The instruction decoder and associated control logic.
5 stage pipelining
Fetch: fetches the instruction from memory.
Decode: decodes the instruction that was fetched in the previous stage.
ALU: executes the instruction that was decoded in the previous stage.
LS1 (Memory): loads or stores the data specified by load or store instructions.
LS2 (Write): extracts and zero- or sign-extends the data loaded by a byte or half-word load instruction.
Simple Programs
An assembly source file generally has three sections:
// .text - This is where all of the code lives (Flash)
// .data - This is where all variables with value go (RAM)
// .bss - This is where uninitialized variables go (RAM)
ARM Co-Processor
The ARM supports a general-purpose extension of its instruction set
through the addition of hardware coprocessors.
A coprocessor can handle the mathematical part of the work when the ARM
processor runs complex applications.
▪Additional special-purpose processors can be added to perform
particular tasks, such as operations on numbers, in order to offload
work from the core CPU. The CPU can then work faster.
▪More than one coprocessor can be added to the ARM core via the
coprocessor interface.
ARM Co-Processor
▪The interface supports up to 16 logical coprocessors.
▪The Arm coprocessor interface on a Cortex-M33 processor allows the
microcontroller to communicate directly with external hardware.
▪On the Arm Cortex-M33 processor, the coprocessor interface is a dedicated
interface to the core, separate from the AHB5 bus interfaces, allowing
high-speed communication with external accelerator hardware.
▪Each coprocessor can have up to 16 private registers of any reasonable
size; they are not limited to 32 bits.
ARM Co-Processor
▪Coprocessors use a load-store architecture, with instructions to perform
internal operations on registers, instructions to load and save registers
from and to memory, and instructions to move data to or from an ARM
register.
▪The coprocessor can be accessed through a group of dedicated ARM
instructions that provide a load-store type interface.
ARM Co-Processor Interface
The /CPI pin is pulled low whenever a coprocessor instruction is executed.
A 4-bit field in the coprocessor instructions allows up to 16 separate coprocessors to be addressed.
When a coprocessor instruction is executed and /CPI is asserted, the designated coprocessor should respond immediately by
pulling CPA (coprocessor absent) low to indicate that it will execute the command.
If no coprocessor exists, CPA stays high and the undefined instruction trap is taken so that the coprocessor
operation can be performed in software.
CPB (coprocessor busy) is used by the coprocessor to delay the ARM processor while the operation is completed. The
ARM will wait until CPB goes low and then start the next instruction.
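The handshake described above can be summarised as a small decision procedure. The Python sketch below is a behavioural illustration only, not a model of real signal timing: the function name, the dictionary of attached coprocessors and the "busy cycles" figure are all assumptions introduced for the example.

```python
def issue_coprocessor_instruction(cp_num, coprocessors):
    """Model the outcome of one coprocessor instruction.

    cp_num:        value of the 4-bit coprocessor number field (0 to 15)
    coprocessors:  dict mapping each attached coprocessor number to the number
                   of cycles it holds CPB high (busy) before it can proceed
    """
    if not 0 <= cp_num <= 15:
        raise ValueError("the coprocessor number is a 4-bit field")

    # The ARM asserts /CPI (drives it low) to announce the instruction.
    if cp_num not in coprocessors:
        # Nobody pulls CPA low, so the ARM takes the undefined instruction trap
        # and the operation can be emulated in software.
        return {"cpa": 1, "wait_cycles": 0, "outcome": "undefined instruction trap"}

    # The designated coprocessor pulls CPA low to accept the instruction,
    # and holds CPB high for as long as it is busy.
    return {
        "cpa": 0,
        "wait_cycles": coprocessors[cp_num],
        "outcome": "executed by coprocessor",
    }
```

With a single attached coprocessor numbered 1, an instruction addressed to coprocessor 1 is accepted, while one addressed to coprocessor 2 ends in the undefined instruction trap.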
ARM Co-Processor Interface
Basic handshake signals
1. cpi (from ARM to all coprocessors).
This signal, which stands for 'Coprocessor Instruction', indicates that the
ARM has identified a coprocessor instruction and wishes to execute it.
2. cpa (from the coprocessors to ARM).
This is the 'Coprocessor Absent' signal which tells the ARM that there is
no coprocessor present that is able to execute the current instruction.
3. cpb (from the coprocessors to ARM).
This is the 'CoProcessor Busy' signal which tells the ARM that the
coprocessor cannot begin executing the instruction yet.