All Embedded

The document outlines a course on Embedded Systems (UEC513) taught by Dr. Karmjit Singh Sandha, focusing on ARM processor architecture, programming, and interfacing with peripherals. It covers the definition, classification, and applications of embedded systems, along with laboratory work involving practical programming and interfacing tasks. The course aims to equip students with the skills to design basic embedded systems and understand the underlying hardware and software components.

Uploaded by

gillsanyam92
Copyright
© All Rights Reserved
Embedded Systems

(UEC513)

By:
Dr. Karmjit Singh Sandha
Associate Professor, ECED
UEC513: EMBEDDED SYSTEMS
L: 3  T: 0  P: 2  Cr: 4.0
Course objective: The objective of this course is to equip students
with the necessary fundamental knowledge and skills that enable
them to design basic embedded systems. It covers the architecture and
programming of the ARM processor and its interfacing with peripheral
devices.
Introduction to Embedded Systems: Definition, Embedded Systems vs
General Computing Systems, Classification of Embedded Systems,
Major application areas. General purpose processor architecture and
organization, Von Neumann and Harvard architectures, CISC and RISC
architectures, Big and Little endian processors, Processor design
trade-offs, Processor cores: soft and hard.
Introduction to ARM Processor: The ARM design philosophy, ARM
core data flow model, Architecture, Register set, ARM7TDMI Interface
signals, General Purpose Input Output Registers, Memory Interface,
Bus Cycle types, Pipeline, ARM processor family, Operational Modes,
Instruction Format, Data forwarding.
• Programming based on ARM7TDMI: ARM Instruction set,
condition codes, Addressing modes, Interrupts, Exceptions
and Vector Table. Assembly Language Programming, Thumb
state, Thumb Programmers model, Thumb Applications, ARM
coprocessor interface and Instructions.
• ARM Tools and Interfacing of Peripherals: ARM
Development Environment, ARM Procedure Call Standard
(APCS), Example C/C++ programs, Embedded software
development, Image structure, linker inputs and outputs,
Protocols (I2C, SPI), Memory Protection Unit (MPU). Physical
vs Virtual Memory, Paging, Segmentation. The Advanced
Microcontroller Bus Architecture (AMBA), DMA, Peripherals,
Interfacing of peripherals with ARM. Familiarization with
Standards: IEEE 1275.1-1994 and IEEE 1754.
Laboratory Work:
Introduction to Keil Software, Introduction to ARM processor kit, Programming examples of ARM
processor. Interfacing of LED, Seven Segment Display, Stepper Motor and LCD with the ARM7TDMI processor.
Course learning outcome (CLO): The student will be able to:
1. Explain embedded systems and their processor architecture, and distinguish them from general computing
systems.
2. Describe the ARM processor's internal architecture, assembly instructions and their format, and develop ARM
processor-based assembly language programs for a given problem statement.
3. Describe how Thumb mode operations are designed and how various coprocessors are interfaced in an
embedded system.
4. Interface various hardware peripherals in embedded systems.
5. Recognize issues to be handled in any processor software tool chain for embedded system
development, especially using C/C++.
Text Books:
1. Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Naraig Manjikian, “Computer Organization and
Embedded Systems”, Sixth Edition, McGraw Hill, 2012.
2. Steve Furber, “ARM System-on-Chip Architecture”, Second Edition, Pearson, 2013.
Reference Books:
1. Stephen Welsh, Peter Knaggs, “ARM: Assembly Language Programming”, Bournemouth University
Publication, 2003.
2. Andrew N. Sloss, Dominic Symes, Chris Wright, “ARM System Developer's Guide: Designing and
Optimizing System Software”, Elsevier Publication.
Block Diagram of Computer
Role of Address, data and control busses
Address Bus:
• Used by the processor to identify any I/O device or memory location.
• Each assigned address must be unique.
• The processor sends the address on the address lines, and a decoding circuit selects the corresponding device.
• More address lines mean that more devices (memory locations) can be interfaced with the CPU:
$2^m = n$
m: number of address lines
n: number of memory locations that can be addressed
• The address bus is unidirectional.
Data Bus:
• Data is transferred to or received from the CPU through the data bus. Its width indicates the data handling capacity of the CPU.
• A wider data bus (more data lines) makes the CPU and computer more expensive but gives a higher data handling capacity.
• The data bus is bidirectional.
Control bus:
• Carries control signals (such as read and write) that decide the direction of data transfer, either
transmitted or received.
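As a quick check of the address-bus relation $2^m = n$, the following C sketch computes how many locations m address lines can reach and, conversely, how many lines a given number of locations needs. The function names are illustrative, not from any library, and the sketch assumes fewer than 64 address lines so that results fit in a `uint64_t`.

```c
#include <stdint.h>

/* n = 2^m: number of locations reachable with m address lines (m < 64). */
uint64_t addressable_locations(unsigned m) {
    return (uint64_t)1 << m;
}

/* Smallest m such that 2^m >= n, i.e. the address lines needed for n locations. */
unsigned address_lines_needed(uint64_t n) {
    unsigned m = 0;
    while (m < 64 && ((uint64_t)1 << m) < n)
        m++;
    return m;
}
```

For example, 16 address lines reach 65,536 locations, and 1000 locations need 10 lines, since $2^{10} = 1024$ is the first power of two that is at least 1000.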
Block diagram of a general purpose microprocessor
Difference Between Microcontroller and Microprocessor
Embedded Systems
What is an Embedded System?

• An electronic/electromechanical system that is designed to perform a specific function and is a combination of both hardware and firmware (software).

• E.g. electronic toys, mobile handsets, washing machines, air conditioners, automotive control units, set-top boxes, DVD players, etc.

• Embedded systems are:
• Unique in character and behavior
• Built with specialized hardware and software
Embedded Systems (Definitions)
• An embedded system is a system that has software embedded into computer hardware, which makes the system dedicated to an application (or applications), a specific part of an application or product, or a part of a larger system.

• An embedded system is one that has dedicated-purpose software embedded in computer hardware.

• It is a dedicated computer-based system for an application or product. It may be an independent system or a part of a larger system. Its software is usually embedded in ROM (Read Only Memory) or flash.

• It is any device that includes a programmable computer but is not itself intended to be a general purpose computer.

• Embedded systems are electronic systems that contain a microprocessor or a microcontroller, but we do not think of them as computers: the computer is hidden, or embedded, in the system.
General purpose computing systems vs Embedded systems
History of Embedded System

• The first recognized embedded system is the Apollo Guidance Computer (AGC), developed by the MIT Instrumentation Laboratory.
• The AGC was designed with 4K words of ROM and 256 words of RAM. The clock frequency of the first microchip used in the AGC was 1.024 MHz.
• The computing unit of the AGC had 11 instructions and 16-bit word logic.
• It used 5000 ICs.
• The user interface of the AGC, known as DSKY (display/keyboard), resembled a calculator-type keypad with an array of numerals.
• The first mass-produced embedded system was the Autonetics D-17, the guidance computer for the Minuteman I missile, in 1961.
• In 1971 Intel introduced the world's first microprocessor chip, the 4004, which was designed for use in business calculators.
• The 4004 was commissioned by the Japanese calculator company Busicom; Intel designed and produced the chip.
• Microcontrollers came into widespread use in the early 1980s.
• A single microcontroller can fulfill the role of a large number of components.
Classification of Embedded System
The classification of embedded systems is based on the following criteria:
1. On generation
2. On complexity & performance
3. On deterministic behaviour
4. On triggering

On generation

First generation (1G):
❖ The early embedded systems, built around 8-bit microprocessors like the 8085
and Z80 and 4-bit microcontrollers.
❖ Simple hardware circuits and firmware.
❖ Examples: stepper motor control units, digital telephone keypads, etc.

Second generation (2G):
❖ Built around 16-bit µp and 8- or 16-bit µc.
❖ They are more complex and powerful than 1G µp and µc.
❖ Examples: SCADA systems.
Third generation (3G):
❖ Built around 32-bit µp and 16-bit µc.
❖ Concepts like Digital Signal Processors (DSPs) and Application Specific
Integrated Circuits (ASICs) evolved.
❖ The instruction set is complex and powerful.
❖ Examples: robotics, media, industrial process control, etc.

Fourth generation (4G):
❖ Built around 64-bit µp and 32-bit µc.
❖ The concepts of System on Chip (SoC) and multicore processors evolved.
❖ It brings high performance, tight integration and miniaturization into the
embedded device market.
❖ Highly complex and very powerful.
❖ Examples: smartphones.
On complexity & performance

Small-scale:
• Embedded systems built around low-performance, low-cost 8- or 16-bit
microprocessors/microcontrollers.
• Suitable for simple applications where performance is not time critical. They
may or may not contain an OS. Example: an electronic toy.
Medium-scale:
• Embedded systems built around medium-performance, low-cost 16- or 32-bit
microprocessors/microcontrollers or DSPs.
• Slightly complex in hardware and firmware.
• May contain a GPOS/RTOS. Example: industrial machines.
Large-scale:
• Embedded systems built around high-performance 32- or 64-bit RISC
processors/controllers, multi-core processors, or PLDs.
• Require complex hardware and software.
• Contain an RTOS for task scheduling, prioritization and management.
• Response is time critical. Example: mission-critical applications.
On functional requirements

• Stand-alone embedded systems
• Real-time embedded systems
• Networked embedded systems
• Mobile embedded systems

Stand-alone embedded systems:

• A stand-alone embedded system works by itself.
• It is a self-contained device that does not require a host system such as a computer.
• It takes digital or analog inputs from its input ports, calibrates, converts and processes the data, and outputs the result to its attached output device, which either displays the data or controls and drives the attached devices.
• Examples: temperature measurement systems, video game consoles, MP3 players, digital cameras and microwave ovens.
Real-time embedded systems:

An embedded system that gives the required output within a specified time, or that strictly follows the deadlines for completing a task, is known as a real-time system. In addition to functional correctness, a real-time system also satisfies its time constraints.

There are two types of real-time systems: (i) soft real-time systems and (ii) hard real-time systems.

Soft Real-Time system:

• A real-time system in which a violation of time constraints causes only degraded quality, while the system continues to operate, is known as a soft real-time system.
• Missing a deadline may not be critical and can be tolerated to a certain degree.
• Examples: a microwave oven, a washing machine, a TV remote, etc.
Hard Real-Time system:

• A real-time system in which a violation of time constraints causes critical failure, loss of life, property damage or catastrophe is known as a hard real-time system.

• Examples: deadlines in a missile control embedded system, a delayed alarm during a gas leakage, a car airbag control system, a delayed response in a pacemaker, a failure in RADAR functioning, etc.
Networked embedded systems:

Networked embedded systems are connected to a network through network interfaces to access resources. The network can be a Local Area Network (LAN), a Wide Area Network (WAN) or the Internet, and the connection can be wired or wireless.
Example: a home security system is a LAN-networked embedded system in which all sensors (e.g. motion detectors, light sensors or smoke sensors) are wired and run on the TCP/IP protocol.

Mobile embedded systems:

Portable embedded devices such as mobile phones, digital cameras, MP3 players and PDAs (Personal Digital Assistants) are examples of mobile embedded systems. Their basic limitation is the limited memory and other resources available.
On deterministic behavior

This classification is applicable to real-time systems.

• The task execution behavior of an embedded system may be deterministic or non-deterministic.
• Based on execution behavior, real-time embedded systems are divided into hard and soft.

On triggering

Embedded systems that are reactive in nature can be classified based on triggering.

Reactive systems can be:

1. Event triggered: activities within the system (e.g., task run times) are dynamic and depend upon the occurrence of different events.

2. Time triggered: activities within the system follow a statically computed schedule (i.e., they are allocated time slots during which they can take place) and are therefore predictable by nature.
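The difference between the two triggering styles can be illustrated with a small C sketch that only records which task runs at each timer tick. The schedule table, event masks and function names are hypothetical; a real system would call task functions from a timer interrupt or an event handler instead of filling a trace array, and the caller must size `trace` for the worst case.

```c
#include <stddef.h>

/* Time-triggered: slot_task[] is a statically computed schedule for one frame.
   Entry k names the task that runs in slot k (-1 = idle slot). The run order
   repeats every frame and is known before the system starts. */
size_t time_triggered_trace(const int slot_task[], size_t frame_slots,
                            size_t ticks, int trace[]) {
    size_t count = 0;
    for (size_t t = 0; t < ticks; t++) {
        int task = slot_task[t % frame_slots];
        if (task >= 0)
            trace[count++] = task;
    }
    return count;
}

/* Event-triggered: event_mask[t] has bit k set if event k occurred during
   tick t; task k runs only then, so the run order depends on the inputs. */
size_t event_triggered_trace(const unsigned event_mask[], size_t ticks,
                             int trace[]) {
    size_t count = 0;
    for (size_t t = 0; t < ticks; t++)
        for (int k = 0; k < 8; k++)          /* up to 8 events in this sketch */
            if (event_mask[t] & (1u << k))
                trace[count++] = k;
    return count;
}
```

With the schedule {0, 1, 0, -1} over 8 ticks, the time-triggered trace is always 0, 1, 0, 0, 1, 0, whereas the event-triggered trace changes whenever the event masks change.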
Applications of Embedded Systems

The application areas and products in the embedded domain are countless:

1. Consumer electronics: camcorders, cameras.
2. Household appliances: washing machines, refrigerators.
3. Automotive industry: anti-lock braking system (ABS), engine control.
4. Home automation and security systems: air conditioners, sprinklers, fire alarms.
5. Telecom: cellular phones, telephone switches.
6. Computer peripherals: printers, scanners.
7. Computer networking systems: network routers and switches.
8. Healthcare: EEG and ECG machines.
9. Banking and retail: automatic teller machines, point-of-sale terminals.
10. Card readers: barcode and smart card readers.
Classifications of architecture
On the basis of hardware:
• Von Neumann architecture
• Harvard architecture

On the basis of the instruction set:
• RISC
• CISC
von Neumann architecture

A von Neumann machine stores instructions and data in the same memory and uses a single set of address and data buses for both, so an instruction fetch and a data access cannot take place at the same time. Its design is simpler than that of a Harvard architecture machine, which is also a stored-program system but has one dedicated set of address and data buses for reading data from and writing data to memory, and another set of address and data buses for instruction fetching.
Harvard architecture

The Harvard architecture is a computer architecture with physically separate storage and signal pathways (address and data buses) for instructions and data.
Processor cores: soft and hard
• Hard core: not reconfigurable; its components are fixed and integrated into the development board.
• Soft core: a system on a programmable chip; the processor and memory are created on a programmable logic device.

Softcore and hardcore tradeoffs:

Environment
• Physical (actual hardware) and virtual (emulation of hardware).
• Hardcore solution: the lack of a complete system model limits what can be accomplished in a virtual environment.
• Softcore solution: the entire system can be simulated and verified in a virtual environment.

Visibility of signal behaviors
• Hardware debugging and visibility of signals are critical for diagnosis.
• Hardcore solution: internal signal transitions cannot be monitored.
• Softcore solution: a system-level debugging tool acts as a virtual logic analyser, displaying any signal in the circuit.

Design flexibility
• How easily the development platform can be expanded: 1) by adding IP (Intellectual Property) blocks or 2) by using parallel or serial interfaces.
• Hardcore solution: uses standard interfaces.
• Softcore solution: uses digital IP cores.

Cost
• Hardcore solution: cost effective.
• Softcore solution: more expensive system.
Difference in price between microcontroller and FPGA:
• Low price of microcontrollers: advanced technology and high-volume production.
• High price of FPGAs: less advanced technology and lower-volume production.

Power consumption
Designing the system for high energy efficiency:
• Hardcore solution: power-saving modes, very low power consumption.
• Softcore solution: an FPGA is less power efficient.
Thanks
• Course Name: Embedded System
• Topic Name: Introduction to ARM

Introduction
• Key component in embedded systems.
• ARM cores are used in mobile phones, handheld organizers (PDAs), portable consumer devices, the automobile industry, networking and security systems.
• Originally Acorn RISC Machine; now called Advanced RISC Machines.
• The first ARM processor (ARM1) was produced in 1985.
• Later processors continued to enhance the original architecture.
• Over 1 billion ARM processors were sold by 2001. The ARM7TDMI was the most successful ARM core.
• ARM does not fabricate silicon itself; it licenses its processor designs to semiconductor companies.
Features of ARM Processors
• 32-bit RISC processor.
• High code density (less memory required).
• Hardware debug technology.
• Load-store architecture.
• Mostly single-cycle execution, with variable-cycle execution for certain instructions*.
• Inline barrel shifter.
• Thumb 16-bit instruction set.
• Conditional execution*: an instruction is executed only when a specific condition has been satisfied.
Features Continued
• Enhanced instructions: DSP.
• Large register file: 16 × 32-bit registers visible at a time*.
• Uniform, fixed opcode width of 32 bits to ease decoding and pipelining.
• Powerful indexed addressing modes.
• Simple but fast interrupt subsystem with two priority levels (IRQ and FIQ) and switched register banks.
• Good speed-to-power-consumption ratio, with clock speeds from a few MHz to GHz.
• Based on either a von Neumann architecture (e.g., ARM7) or a Harvard
Architecture (e.g., ARM9)
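Conditional execution (marked * above) means each ARM instruction carries a condition field that is checked against the N, Z, C and V flags before the instruction is allowed to take effect. The C sketch below evaluates the standard ARM condition suffixes; the function name `cond_passed` and the string-based interface are illustrative choices for readability, not how the hardware works (it decodes a 4-bit condition field).

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if an instruction with condition suffix cc would execute,
   given the current N, Z, C, V flags. Unknown suffixes return false. */
bool cond_passed(const char *cc, bool n, bool z, bool c, bool v) {
    if (!strcmp(cc, "EQ")) return z;                   /* equal */
    if (!strcmp(cc, "NE")) return !z;                  /* not equal */
    if (!strcmp(cc, "CS") || !strcmp(cc, "HS")) return c;
    if (!strcmp(cc, "CC") || !strcmp(cc, "LO")) return !c;
    if (!strcmp(cc, "MI")) return n;                   /* negative */
    if (!strcmp(cc, "PL")) return !n;                  /* positive or zero */
    if (!strcmp(cc, "VS")) return v;                   /* overflow */
    if (!strcmp(cc, "VC")) return !v;
    if (!strcmp(cc, "HI")) return c && !z;             /* unsigned higher */
    if (!strcmp(cc, "LS")) return !c || z;             /* unsigned lower or same */
    if (!strcmp(cc, "GE")) return n == v;              /* signed >= */
    if (!strcmp(cc, "LT")) return n != v;              /* signed < */
    if (!strcmp(cc, "GT")) return !z && n == v;        /* signed > */
    if (!strcmp(cc, "LE")) return z || n != v;         /* signed <= */
    if (!strcmp(cc, "AL") || !strcmp(cc, "")) return true;  /* always */
    return false;
}
```

For example, after comparing two equal values Z is set, so an `ADDEQ` would execute while an `ADDNE` would be skipped.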
Architecture Revisions
• ISA: Instruction Set Architecture
• Nomenclature
• ARM{x}{y}{z}{T}{D}{M}{I}{E}{J}{F}{-S}
• x: family
• y: memory management/protection unit
• z: cache
• T: Thumb
• D: Debugger (on-chip debug support)
• M: enhanced multiplier (supports long multiply instructions with 64-bit results)
• I: EmbeddedICE macrocell (allows breakpoints and watchpoints to be set)
• E: enhanced instructions (DSP extensions)
• J: Java acceleration by Jazelle (for Java code)
• F: vector floating-point unit
• S: synthesizable version (the core is provided as source code that can be
modified and used by EDA tools)
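To make the naming scheme concrete, the C sketch below decodes a core name such as ARM7TDMI into its family digit and feature letters. The `FEAT_*` flags and `arm_decode_name` are hypothetical names; the sketch assumes a single-digit family number (a name like ARM1136J-S would need extra handling) and simply skips the optional y and z digits.

```c
#include <ctype.h>
#include <string.h>

/* Feature bits for the suffix letters listed above (names are illustrative). */
enum {
    FEAT_T = 1 << 0, FEAT_D = 1 << 1, FEAT_M = 1 << 2, FEAT_I = 1 << 3,
    FEAT_E = 1 << 4, FEAT_J = 1 << 5, FEAT_F = 1 << 6, FEAT_S = 1 << 7
};

/* Decodes names such as "ARM7TDMI" or "ARM926EJ-S". Stores the family digit (x)
   in *family and returns a bitmask of FEAT_* flags, or -1 if not recognised. */
int arm_decode_name(const char *name, int *family) {
    if (strncmp(name, "ARM", 3) != 0 || !isdigit((unsigned char)name[3]))
        return -1;
    const char *p = name + 3;
    *family = *p - '0';                 /* x: family */
    while (isdigit((unsigned char)*p))  /* skip the optional y and z digits */
        p++;
    int mask = 0;
    for (; *p != '\0'; p++) {
        switch (*p) {
        case 'T': mask |= FEAT_T; break;
        case 'D': mask |= FEAT_D; break;
        case 'M': mask |= FEAT_M; break;
        case 'I': mask |= FEAT_I; break;
        case 'E': mask |= FEAT_E; break;
        case 'J': mask |= FEAT_J; break;
        case 'F': mask |= FEAT_F; break;
        case 'S': mask |= FEAT_S; break;
        case '-': break;                /* separator before S */
        default:  return -1;
        }
    }
    return mask;
}
```

For ARM7TDMI it reports family 7 with Thumb, Debug, enhanced Multiplier and EmbeddedICE support; for ARM926EJ-S it reports family 9 with E, J and S.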
ARM Cores
ARM Processor Family
• Course Name: Embedded System
• Topic Name: Architecture of ARM

Internal Architecture of ARM
Details of the internal registers of ARM
• There are 37 ARM registers in total; some of them are banked, so the set
available at any time depends on the mode of operation.
• R13 is used as the stack pointer (SP).
• R14 functions as the link register (LR), where the core puts
the return address whenever it calls a subroutine.
• R15 is the program counter (PC) and contains the
address of the next instruction to be fetched by the
processor.
• CPSR: Current Program Status Register
• SPSR: Saved Program Status Register
Modes of ARM Processor
• ARM has 7 modes.
• Six of them are privileged (they allow full read/write access to the
CPSR):
1. Abort (failed attempt to access memory and/or memory
protection violation)
2. Fast Interrupt Request (FIQ, for high-speed interrupts)
3. Interrupt Request (IRQ, used for general purpose interrupt handling)
4. Supervisor (entered after reset; OS kernel operation)
5. System (special version of User mode)
6. Undefined (undefined instruction; supports software
emulation of hardware coprocessors)
• One is non-privileged (it allows read access to the control field
and read/write access to the condition flags):
7. User mode (normal programs and applications)
• Course Name: Embedded System
• Topic Name: Register set of ARM

Modes of ARM Processor
Registers of ARM
• The ARM has a total of 37 registers.
• Of these, 30 are general purpose registers, 6 are status registers and one is the program counter.
• Only fifteen of the general purpose registers are available at any one time, depending on the processor mode.
• There is a standard set of eight general purpose
registers (R0–R7) that are always available, no
matter which mode the processor is in.
• These registers are truly general purpose, with no
special uses placed on them by the
processor's architecture.
• Registers R8–R12 are common to all processor
modes except FIQ mode.
• When the processor is in fast interrupt mode,
these registers are replaced with a different set
of registers (R8_fiq – R12_fiq).
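Register banking can be pictured as a lookup from (mode, visible register) to a physical register. The C sketch below uses a hypothetical physical numbering; it follows the rules above (R0–R7 and the PC are shared, FIQ mode has its own R8–R14) together with the standard ARM arrangement in which the IRQ, Supervisor, Abort and Undefined modes each bank only R13 and R14. Counting the distinct physical registers and adding the CPSR and the five SPSRs reproduces the total of 37.

```c
/* Processor modes (the order is chosen for this sketch only). */
enum arm_mode { MODE_USR, MODE_SYS, MODE_FIQ, MODE_IRQ, MODE_SVC, MODE_ABT, MODE_UND };

/* Maps visible register r (0-15) in mode m to a hypothetical physical index:
     0-15  R0-R15 as seen in User/System mode
     16-22 R8_fiq-R14_fiq
     23-24 R13_irq, R14_irq      25-26 R13_svc, R14_svc
     27-28 R13_abt, R14_abt      29-30 R13_und, R14_und */
int phys_reg(enum arm_mode m, int r) {
    if (r < 8 || r == 15)
        return r;                          /* R0-R7 and PC are never banked */
    if (m == MODE_FIQ)
        return 16 + (r - 8);               /* FIQ has its own R8-R14 */
    if (r <= 12 || m == MODE_USR || m == MODE_SYS)
        return r;                          /* shared R8-R12, usr/sys R13-R14 */
    switch (m) {                           /* remaining case: R13 or R14 */
    case MODE_IRQ: return 23 + (r - 13);
    case MODE_SVC: return 25 + (r - 13);
    case MODE_ABT: return 27 + (r - 13);
    default:       return 29 + (r - 13);   /* MODE_UND */
    }
}

/* Counts distinct physical registers over all modes, plus CPSR and 5 SPSRs. */
int total_registers(void) {
    int seen[31] = {0};
    int count = 0;
    for (int m = MODE_USR; m <= MODE_UND; m++)
        for (int r = 0; r < 16; r++) {
            int p = phys_reg((enum arm_mode)m, r);
            if (!seen[p]) {
                seen[p] = 1;
                count++;
            }
        }
    return count + 1 + 5;
}
```

The 31 distinct physical registers found by the loop are the 30 general purpose registers plus the PC.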
• The general purpose registers can be used to handle
8-bit bytes, 16-bit half words or 32-bit words.
• When a 32-bit register is used in a byte instruction,
only the least significant 8 bits are used.
• In a half-word instruction, only the least significant
16 bits are used.
• The remaining registers (R13–R15) are special
purpose registers and have very specific roles.
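The byte and half-word behaviour described above can be modelled in C. These helper names are illustrative: they show which bits a store keeps and how the unsigned (LDRB, LDRH) and signed (LDRSB, LDRSH) loads from the ARM instruction list widen a value back to 32 bits. The sign-extending casts assume two's-complement integers, which holds for mainstream C compilers.

```c
#include <stdint.h>

/* STRB / STRH: only the least significant 8 or 16 bits of the register are stored. */
uint8_t  strb_value(uint32_t reg) { return (uint8_t)(reg & 0xFFu); }
uint16_t strh_value(uint32_t reg) { return (uint16_t)(reg & 0xFFFFu); }

/* LDRB / LDRH zero-extend the loaded value into the 32-bit register. */
uint32_t ldrb_value(uint8_t b)  { return b; }
uint32_t ldrh_value(uint16_t h) { return h; }

/* LDRSB / LDRSH sign-extend instead (relies on two's-complement conversion). */
uint32_t ldrsb_value(uint8_t b)  { return (uint32_t)(int32_t)(int8_t)b; }
uint32_t ldrsh_value(uint16_t h) { return (uint32_t)(int32_t)(int16_t)h; }
```

So storing 0x12345678 with STRB writes 0x78, and loading the byte 0xF0 gives 0x000000F0 with LDRB but 0xFFFFFFF0 with LDRSB.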
• R13 is also known as the Stack Pointer, R14 as
the Link Register, and R15 as the Program Counter.
• The User (usr) and System (sys) modes
share the same registers.
• There are also one or two status registers,
depending on which mode the processor is in.
• The Current Processor Status Register (CPSR)
holds information about the current status of
the processor, including its current mode.

• In the exception modes there is an additional Saved Processor Status Register (SPSR), which holds information on the processor's state before the system changed into this mode, i.e. the processor status just before an exception.
Stack pointer, SP or R13

• Register r13 is used as a stack pointer and is also known as the SP register.
• Each exception mode has its own version of r13, which points to a stack dedicated to that exception mode.
• The stack is typically used to store temporary values.
The link register, LR or r14
• Register r14 is also known as the Link Register or LR.

• It is used to hold the return address of a subroutine.

• When an exception occurs, the exception mode's version of r14 is set to the address after the instruction which has just been completed.

• The SPSR is a copy of the CPSR just before the exception occurred.
The Program Counter, PC or r15
• Register r15 holds the Program Counter, known as the PC.

• It is used to identify which instruction is to be performed next.

• As the PC holds the address of the next instruction, it is often referred to as an instruction pointer.
Current Processor Status Register (CPSR)
• The Current Processor Status Register (CPSR)
contains the current status of the processor.
• This includes the condition code flags, the interrupt
status, the processor mode, and other
status and control information.
• The exception modes also have a Saved
Processor Status Register (SPSR), which is used to
preserve the value of the CPSR when the
associated exception occurs.
• Because the User and System modes are not
exception modes, they have no SPSR.
Bit pattern of the Current Processor Status Register (CPSR)

The processor's status is split into two distinct parts: the User flags and the System Control flags.

The upper half word is accessible in User mode and contains a set of flags which can be used to affect the operation of a program.

Any bit not currently used is reserved for future use and should be zero.

The I and F bits control whether interrupts (I) and fast interrupts (F) are allowed: when a bit is set, the corresponding interrupt is disabled.
The system flags can only be altered when the processor is in a privileged mode.

User mode programs cannot alter the status register except for the condition code flags.

The upper four bits of the status register (bits 31 to 28) contain a set of four flags, collectively known as the condition code.

The condition code flags are:

Negative (N)
Zero (Z)
Carry (C)
Overflow (V)
The condition code can be used to control the flow of program execution.
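A C sketch of reading the CPSR fields follows. The bit positions (N, Z, C, V in bits 31 to 28, I and F in bits 7 and 6, T in bit 5, mode in bits 4 to 0) and the mode encodings are those of the ARM7TDMI; the struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v;   /* condition code flags, bits 31-28 */
    bool i, f;         /* IRQ and FIQ disable bits, bits 7 and 6 */
    bool t;            /* Thumb state bit, bit 5 */
    uint32_t mode;     /* mode field, bits 4-0 */
} cpsr_fields;

cpsr_fields cpsr_decode(uint32_t cpsr) {
    cpsr_fields r;
    r.n = (cpsr >> 31) & 1u;
    r.z = (cpsr >> 30) & 1u;
    r.c = (cpsr >> 29) & 1u;
    r.v = (cpsr >> 28) & 1u;
    r.i = (cpsr >> 7) & 1u;
    r.f = (cpsr >> 6) & 1u;
    r.t = (cpsr >> 5) & 1u;
    r.mode = cpsr & 0x1Fu;
    return r;
}

/* Names of the seven processor modes from their 5-bit encodings. */
const char *cpsr_mode_name(uint32_t mode) {
    switch (mode & 0x1Fu) {
    case 0x10: return "usr";
    case 0x11: return "fiq";
    case 0x12: return "irq";
    case 0x13: return "svc";
    case 0x17: return "abt";
    case 0x1B: return "und";
    case 0x1F: return "sys";
    default:   return "invalid";
    }
}
```

For example, 0x600000D3 decodes to Z = 1 and C = 1, with IRQ and FIQ disabled, ARM state (T = 0) and Supervisor mode.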
• Course Name: Embedded System
• Topic Name: Modes and Exceptions of ARM

Exceptions
• Exceptions are situations that interrupt the normal flow of
program execution.
• ARM supports seven types of exception and provides a
privileged processing mode for each type.
Exception processing modes

Exception Type: Processor Mode
Reset: Supervisor (svc)
Software Interrupt: Supervisor (svc)
Undefined Instruction: Undefined (und)
Prefetch Abort: Abort (abt)
Data Abort: Abort (abt)
Interrupt: IRQ (irq)
Fast Interrupt: FIQ (fiq)
Role of Exceptions
• Reset – Occurs when the processor's reset pin is
asserted or when the program branches to the reset vector
address (0x00000000). The first is a hardware
reset, while the second is a software reset.
• Undefined instruction – Occurs when the
processor cannot recognize the currently
executing instruction.
• Software Interrupt (SWI) – Caused by user-defined
interrupts in the program, or by a user program
requesting a switch to a more privileged mode.
SWI can be used to call privileged OS subroutines.
• Prefetch Abort – Occurs when an instruction is
fetched from an illegal address.
• Data Abort – A data transfer instruction attempts
to load or store data at an illegal address.
• IRQ – The processor's external interrupt request
pin is asserted (LOW) and the I interrupt mask in
the CPSR is clear (IRQs enabled). IRQs are assigned to
general purpose interrupts such as periodic timers.
• FIQ – The processor's external fast interrupt
request pin is asserted (LOW) and the F interrupt
mask in the CPSR is clear (FIQs enabled).
• FIQ is reserved for a single interrupt source that
requires a fast response time.
Priority and vector locations of exceptions
Different exceptions are handled in different modes of
the processor. Different exceptions also have different
priorities.
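The vector addresses and priorities can be tabulated in C. The values below assume the ARM7TDMI with its default vector table at address 0x00000000 (address 0x14 is reserved); the enum, the tables and the `highest_pending` helper are illustrative names, not part of any ARM library.

```c
#include <stdint.h>

typedef enum {
    EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT, EXC_IRQ, EXC_FIQ, EXC_COUNT
} exception_t;

/* Vector addresses with the default (low) vector table; 0x14 is reserved. */
static const uint32_t vector_addr[EXC_COUNT] = {
    0x00, 0x04, 0x08, 0x0C, 0x10, 0x18, 0x1C
};

/* ARM7TDMI priorities, 1 = highest. Undefined instruction and SWI share the
   lowest level because they cannot occur at the same time. */
static const int priority[EXC_COUNT] = { 1, 6, 6, 5, 2, 4, 3 };

/* Given a bitmask of pending exceptions (bit k = exception k), returns the one
   taken first, or EXC_COUNT if none is pending. */
exception_t highest_pending(unsigned pending) {
    exception_t best = EXC_COUNT;
    for (int e = 0; e < EXC_COUNT; e++)
        if ((pending & (1u << e)) &&
            (best == EXC_COUNT || priority[e] < priority[best]))
            best = (exception_t)e;
    return best;
}
```

If an IRQ and an FIQ are pending together, the FIQ is taken first; a data abort takes precedence over both.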
Mode selection using CPSR
Handling of exception:

1. Copy the CPSR into the SPSR of the mode in which the exception is to be handled.
2. Change the mode bits in the CPSR.
3. Disable interrupts.
4. Set the link register to the return address.
5. Set the program counter to the vector address for the exception.
Leaving exception handler:
1. Move the Link Register LR (minus an offset) to the PC.
2. Copy the SPSR back to the CPSR; this automatically changes the mode back to the previous one.
3. Clear the interrupt disable flags (if they were set).
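The entry and exit sequences can be traced on a minimal C model of the processor state. This sketch handles only an IRQ, keeps a single SPSR and LR to stand for the IRQ-mode banked copies, and ignores the Thumb bit; `cpu_state`, `enter_irq` and `leave_irq` are hypothetical names. For an IRQ the usual offset when leaving is 4 (the handler returns with `SUBS pc, lr, #4`).

```c
#include <stdint.h>

#define MODE_MASK  0x1Fu
#define MODE_IRQ   0x12u
#define I_BIT      (1u << 7)
#define IRQ_VECTOR 0x18u

/* Minimal model: spsr_irq and lr_irq stand for the IRQ-mode banked copies. */
typedef struct {
    uint32_t cpsr, spsr_irq, lr_irq, pc;
} cpu_state;

/* Entry: the five entry steps, applied to an IRQ. */
void enter_irq(cpu_state *s, uint32_t lr_value) {
    s->spsr_irq = s->cpsr;                          /* 1. CPSR -> SPSR_irq */
    s->cpsr = (s->cpsr & ~MODE_MASK) | MODE_IRQ;    /* 2. change mode bits */
    s->cpsr |= I_BIT;                               /* 3. disable IRQs */
    s->lr_irq = lr_value;                           /* 4. set link register */
    s->pc = IRQ_VECTOR;                             /* 5. jump to vector */
}

/* Exit: PC = LR - offset, then CPSR = SPSR, which restores the old mode
   and the old I/F bits in one step. */
void leave_irq(cpu_state *s, uint32_t offset) {
    s->pc = s->lr_irq - offset;
    s->cpsr = s->spsr_irq;
}
```

Starting in User mode (CPSR = 0x10) with IRQs enabled, entry leaves CPSR = 0x92 (IRQ mode, I set) and PC = 0x18; leaving with offset 4 restores CPSR = 0x10.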
Exceptions Execution
Exception handlers are entered through predefined locations known as exception vectors. It is the
responsibility of the operating system to provide suitable exception handling.
• Course Name: Embedded System
• Topic Name: Memory Organization

Memory Organization
• Little and Big Endian (the BIGEND bit)
• Little Endian scheme: The lowest numbered byte in a word is considered
to be the least significant byte of the word and the highest numbered byte
is the most significant. Byte 0 of the memory system should be connected
to data lines 7 through 0 (D7:0) in this scheme.
• Big Endian scheme: The lowest numbered byte in a word is considered to
be the most significant byte of the word and the highest numbered byte is
the least significant. Byte 0 of the memory system should be connected
to data lines 31 through 24 (D31:24) in this scheme.
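The two byte orders can be shown in portable C by writing each byte explicitly instead of relying on the host machine's own endianness. The function names are illustrative.

```c
#include <stdint.h>

/* Little endian: byte 0 (lowest address) receives bits 7:0, the least
   significant byte of the word. */
void store_little_endian(uint8_t mem[4], uint32_t w) {
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(w >> (8 * i));
}

/* Big endian: byte 0 receives bits 31:24, the most significant byte. */
void store_big_endian(uint8_t mem[4], uint32_t w) {
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(w >> (8 * (3 - i)));
}
```

Storing 0x12345678 gives the bytes 78 56 34 12 at offsets 0 to 3 in the little-endian case and 12 34 56 78 in the big-endian case.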
Memory Organization
Thanks
ARM Organization and Implementation

Dr. Karmjit Singh Sandha
Assistant Professor
Thapar Institute of Engineering and Technology
Pipeline
• Course Name: Embedded System
• Topic Name: Instruction set Architecture (ISA)

By:
Dr. Karmjit Singh Sandha
Assistant Professor, ECED

Reference Books:

• Steve Furber, “ARM System-on-Chip Architecture”, Second Edition, Pearson, 2013.
• Stephen Welsh, Peter Knaggs, “ARM: Assembly Language Programming”, Bournemouth
University Publication, 2003.
• Andrew N. Sloss, Dominic Symes, Chris Wright, “ARM System Developer's Guide:
Designing and Optimizing System Software”, Elsevier Publication.
Instruction Set of ARM
The ARM instruction set can be divided
into six broad classes of instructions:
1. Data Movement
2. Arithmetic
3. Memory Access
4. Logical and bit manipulation
5. Flow Control
6. System Control/ Privileged
Instruction Mnemonic
Condition code (cc) Mnemonic
ARM instructions
Type of operation:

Arithmetic
Branch
Load and Store
Logical
Move
Arithmetic Instructions
ADD Add
ADC Add with carry
SUB Subtract
SBC Subtract with carry
RSB Reverse subtract
RSC Reverse subtract with carry
MUL Multiply
MLA Multiply and accumulate
UMULL Multiply - unsigned long
UMLAL Multiply and accumulate - unsigned long
SMULL Multiply - signed long
SMLAL Multiply and accumulate - signed long
CMP Compare
CMN Compare negative


Branch Instructions:

B Branch
BL Branch with link
Load and Store Instructions
LDR Load word
LDRB Load byte
LDRSB Load signed byte
LDRH Load half word
LDRSH Load signed half word
LDM Load multiple
LDM sp! Pop
STR Store word
STRB Store byte
STRH Store half word
STM Store multiple
STM sp! Push
Logical Instructions:

AND AND
EOR Exclusive OR
ORR OR
BIC Bit clear
TST Test
TEQ Test equivalence
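The result each logical instruction computes can be written as plain C bit operations. The function names are illustrative, and Operand2 is shown as an ordinary 32-bit value without the barrel shifter. TST and TEQ do not write a destination register; they perform an AND or an exclusive OR only to set the flags, so just the resulting Z flag is modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Results written to Rd by AND, EOR, ORR and BIC. */
uint32_t and_op(uint32_t rn, uint32_t op2) { return rn & op2; }
uint32_t eor_op(uint32_t rn, uint32_t op2) { return rn ^ op2; }
uint32_t orr_op(uint32_t rn, uint32_t op2) { return rn | op2; }
uint32_t bic_op(uint32_t rn, uint32_t op2) { return rn & ~op2; }  /* clear bits set in op2 */

/* TST and TEQ only set flags; Z is set when the AND / EOR result is zero. */
bool tst_sets_z(uint32_t rn, uint32_t op2) { return (rn & op2) == 0; }
bool teq_sets_z(uint32_t rn, uint32_t op2) { return (rn ^ op2) == 0; }
```

For example, BIC with a mask of 0x0F clears the low four bits of 0xFF, leaving 0xF0, and TEQ of two equal values sets Z.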
Move Instructions
MOV Move
MVN Move NOT (moves the bitwise complement)
SWP Swap
SWPB Swap byte
MRS Move program status register to register
MSR Move register to program status register
Arithmetic Instruction
Add
Syntax: ADD{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Adds the value in Rn to Operand2 and places the sum in Rd.

Condition flags: If S is specified then all flags are updated according to the result.

Examples:

ADD r7, r4, #99 ;adds 99 to the value in r4 and places the sum in r7

ADD r1, r2, r3 ;adds the value in r3 to the value in r2 and places the sum in r1
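When the S suffix is used, ADD updates all four condition flags. The C sketch below shows how N, Z, C and V follow from a 32-bit addition; `flags_t` and `adds` are illustrative names, and Operand2 is treated as an already-computed 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags_t;

/* Models ADDS Rd, Rn, Operand2: returns Rd and updates the NZCV flags. */
uint32_t adds(uint32_t rn, uint32_t op2, flags_t *f) {
    uint32_t rd = rn + op2;                               /* wraps modulo 2^32 */
    f->n = (rd >> 31) & 1u;                               /* sign bit of result */
    f->z = (rd == 0);
    f->c = (rd < rn);                                     /* unsigned carry out */
    f->v = ((~(rn ^ op2) & (rn ^ rd)) >> 31) & 1u;        /* signed overflow */
    return rd;
}
```

Adding 1 to 0xFFFFFFFF sets Z and C (unsigned wrap-around), while adding 1 to 0x7FFFFFFF sets N and V (signed overflow).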
Add with carry
Syntax: ADC{cond}{S} Rd, Rn, Operand2
Elements inside curly brackets are optional.
Usage: Adds the value in Rn to Operand2 and adds another 1 if the carry flag is
set. The sum is placed in Rd.
Condition flags: If S is specified then all flags are updated according to the
result.
Examples:
ADC r7, r4, #99 ;adds 99 to the value in r4, adds another 1 if the carry flag is set, and places the sum in r7
ADC r1, r2, r3 ;adds the value in r3 to the value in r2, adds 1 if the carry flag is set, and places the sum in r1
ADCCSS r1, r2, r3 ;executes only if the carry flag is set (CS): r1 = r2 + r3 + 1, and the flags are updated (S)
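ADC is mainly used to add numbers wider than 32 bits: the low words are added with ADDS and the high words with ADC, so the carry out of the low word is added into the high word. The C sketch below models that pair; `adc` and `add64` are illustrative names, and the carry flag is passed as a `bool`.

```c
#include <stdbool.h>
#include <stdint.h>

/* ADC{S}: Rd = Rn + Operand2 + C. *carry is updated as with the S suffix. */
uint32_t adc(uint32_t rn, uint32_t op2, bool *carry) {
    uint64_t sum = (uint64_t)rn + op2 + (*carry ? 1u : 0u);
    *carry = (sum >> 32) != 0;            /* carry out of bit 31 */
    return (uint32_t)sum;
}

/* 64-bit add {hi1:lo1} + {hi2:lo2} in the style of ADDS (low) then ADC (high). */
uint64_t add64(uint32_t hi1, uint32_t lo1, uint32_t hi2, uint32_t lo2) {
    bool c = false;                       /* ADDS behaves like ADC with C = 0 */
    uint32_t lo = adc(lo1, lo2, &c);
    uint32_t hi = adc(hi1, hi2, &c);      /* ADC adds the carry from the low word */
    return ((uint64_t)hi << 32) | lo;
}
```

In ARM assembly, the same 64-bit addition of r1:r0 and r3:r2 is typically written `ADDS r0, r0, r2` followed by `ADC r1, r1, r3`.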
Subtract
Syntax: SUB{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts Operand2 from the value in Rn and places the difference in
Rd.

Condition flags: If S is specified then all flags are updated according to the
result.

Examples:

SUB r7, r4, #99 ;subtracts 99 from the value in r4 and places the result in r7

SUB r1, r2, r3 ;subtracts the value in r3 from the value in r2 and places the difference in r1
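For SUBS, the ARM carry flag is set when no borrow occurs (C = NOT borrow). The C sketch below shows the flag rules; `flags_t` and `subs` are illustrative names. CMP computes the same flags without writing Rd.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags_t;

/* Models SUBS Rd, Rn, Operand2: returns Rd and updates the NZCV flags. */
uint32_t subs(uint32_t rn, uint32_t op2, flags_t *f) {
    uint32_t rd = rn - op2;                               /* wraps modulo 2^32 */
    f->n = (rd >> 31) & 1u;
    f->z = (rd == 0);
    f->c = (rn >= op2);                                   /* C = NOT borrow */
    f->v = (((rn ^ op2) & (rn ^ rd)) >> 31) & 1u;         /* signed overflow */
    return rd;
}
```

So 5 - 3 leaves C set, 3 - 5 clears C and sets N, and 0x80000000 - 1 sets V because the signed result overflows.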
Subtract with carry

Syntax: SBC{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts Operand2 from the value in Rn and subtracts another 1 if the
carry flag is clear. Places the difference in Rd.

Condition flags: If S is specified then all flags are updated according to the result.

Examples:

SBC r7, r4, #99 ;r7 = r4 - 99 - NOT(carry)

SBC r1, r2, r3 ;r1 = r2 - r3 - NOT(carry)
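Because C means NOT borrow, SBC subtracts an extra 1 only when C is clear. As with ADC, its main use is multiword arithmetic: SUBS on the low words, then SBC on the high words. The C sketch below models this; `sbc` and `sub64` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* SBC: Rd = Rn - Operand2 - NOT(C). *carry is updated as with the S suffix. */
uint32_t sbc(uint32_t rn, uint32_t op2, bool *carry) {
    uint64_t borrow_in = *carry ? 0u : 1u;
    uint64_t diff = (uint64_t)rn - op2 - borrow_in;       /* wraps modulo 2^64 */
    *carry = (uint64_t)rn >= (uint64_t)op2 + borrow_in;   /* C = NOT borrow */
    return (uint32_t)diff;
}

/* 64-bit subtraction in the style of SUBS (low words) then SBC (high words). */
uint64_t sub64(uint32_t hi1, uint32_t lo1, uint32_t hi2, uint32_t lo2) {
    bool c = true;                        /* SUBS behaves like SBC with C = 1 */
    uint32_t lo = sbc(lo1, lo2, &c);
    uint32_t hi = sbc(hi1, hi2, &c);      /* borrow from the low word, if any */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, 0x2_00000000 minus 1 borrows from the high word and gives 0x1_FFFFFFFF.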


Reverse subtract
Syntax: RSB{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts the value in Rn from Operand2 and places the difference in
Rd.

Condition flags: If S is specified then all flags are updated according to the
result.

Examples:

RSB r7, r4, #99 ;subtracts the value in r4 from 99 and places the result in r7

RSB r1, r2, r3 ;subtracts the value in r2 from the value in r3 and places the difference in r1
Reverse subtract with carry
Syntax: RSC{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts the value in Rn from Operand2 and subtracts another 1 if the
carry flag is clear. Places the difference in Rd.

Condition flags: If S is specified then all flags are updated according to the
result.

Examples:

RSC r7, r4, #99 ;r7 = 99 - r4 - NOT(carry)

RSC r1, r2, r3 ;r1 = r3 - r2 - NOT(carry)
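A common use of RSB and RSC is negating a 64-bit value held in two registers: RSBS on the low word, then RSC on the high word. The C sketch below models that pair. `arm_rsc` and `neg64` are illustrative names, not library functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of RSC{S} Rd, Rn, Operand2: Rd = Operand2 - Rn - NOT(C).
   RSB is the same operation with the carry treated as 1 (no borrow). */
uint32_t arm_rsc(uint32_t rn, uint32_t op2, int carry_in, int *carry_out)
{
    uint32_t borrow_in = carry_in ? 0u : 1u;
    if (carry_out) {
        *carry_out = ((uint64_t)op2 >= (uint64_t)rn + borrow_in);  /* no borrow */
    }
    return op2 - rn - borrow_in;
}

/* Negate the 64-bit value hi:lo, mirroring the pair
       RSBS lo, lo, #0
       RSC  hi, hi, #0                                         */
uint64_t neg64(uint32_t hi, uint32_t lo)
{
    int c;
    uint32_t new_lo = arm_rsc(lo, 0, 1, &c);   /* RSBS: 0 - lo, sets C */
    uint32_t new_hi = arm_rsc(hi, 0, c, NULL); /* RSC: 0 - hi - NOT(C) */
    return ((uint64_t)new_hi << 32) | new_lo;
}
```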


Multiply
Syntax: MUL{cond}{S} Rd, Rm, Rs : Rd = Rm * Rs

Elements inside curly brackets are optional.

Usage: Multiplies the values in registers Rm and Rs and places the least
significant 32 bits of the product in register Rd.

Condition flags: If S is specified then the N and Z flags are updated
according to the result, the V flag is not affected and the C flag is
unpredictable for the ARM7 and earlier processors.

Example:

MUL r5, r3, r9 ;multiply the values in r3 and r9 and place the least significant 32 bits of the product in r5
Multiply and accumulate
Syntax: MLA{cond}{S} Rd, Rm, Rs, Rn : Rd = (Rm * Rs) + Rn

Elements inside curly brackets are optional.

Usage: Adds the value in Rn to the product of the values in Rm and Rs and places
the least significant 32 bits of the result in register Rd.

Condition flags: If S is specified then the N and Z flags are updated according to the
result, the V flag is not affected and the C flag is unpredictable for the ARM7 and
earlier processors.

Example:

MLA r5, r3, r9, r6 ;multiply the values in r3 and r9, add the product to the value in
r6 and place the result in r5
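MUL and MLA keep only the least significant 32 bits, so any product that needs more than 32 bits is silently truncated. A small C model makes the truncation visible. The function names are hypothetical.

```c
#include <stdint.h>

/* MUL: Rd = (Rm * Rs) mod 2^32. The full product may need 64 bits,
   but only the least significant 32 bits reach Rd. */
uint32_t arm_mul(uint32_t rm, uint32_t rs)
{
    return (uint32_t)((uint64_t)rm * rs);
}

/* MLA: Rd = (Rm * Rs + Rn) mod 2^32. */
uint32_t arm_mla(uint32_t rm, uint32_t rs, uint32_t rn)
{
    return (uint32_t)((uint64_t)rm * rs + rn);
}
```

For example, 0x10000 × 0x10000 = 0x1_00000000, so MUL leaves 0 in Rd. UMULL and SMULL keep the upper word as well.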
Multiply - unsigned long
Syntax: UMULL{cond}{S} RdLo, RdHi, Rm, Rs

Elements inside curly brackets are optional.

Usage: Multiplies the values (as unsigned integers) in registers Rm and Rs and
places the least significant 32 bits of the product in register RdLo and the most
significant 32 bits of the product in register RdHi.

Condition flags: If S is specified then the N and Z flags are updated according to
the result and the V and C flags are unpredictable for the ARM7 and earlier
processors.

Example:

UMULL r6, r5, r3, r9 ;multiply the values in r3 and r9; the low 32 bits of the
product go to r6 (RdLo) and the high 32 bits to r5 (RdHi)
Multiply and accumulate - unsigned long
Syntax: UMLAL{cond}{S} RdLo, RdHi, Rm, Rs
Elements inside curly brackets are optional.
Usage: Multiplies the values (as unsigned integers) in registers Rm and Rs and
adds the 64-bit product to the unsigned 64-bit value in registers RdLo (least
significant 32 bits) and RdHi (most significant 32 bits).
Condition flags: If S is specified then the N and Z flags are updated according to
the result and the V and C flags are unpredictable for the ARM7 and earlier
processors.
Example:
UMLAL r6, r5, r3, r9 ;multiply the values in r3 and r9 and add the product to the
64-bit value in r5:r6 (r5 = high word, r6 = low word)
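The long multiplies differ from MUL only in that they keep the upper 32 bits. Below is a C sketch of UMULL and UMLAL. RdLo and RdHi are passed as pointers, which is an illustrative choice and not an ARM API.

```c
#include <stdint.h>

/* UMULL RdLo, RdHi, Rm, Rs: unsigned 32x32 -> 64-bit product split into two registers. */
void arm_umull(uint32_t rm, uint32_t rs, uint32_t *rd_lo, uint32_t *rd_hi)
{
    uint64_t product = (uint64_t)rm * rs;
    *rd_lo = (uint32_t)product;           /* least significant 32 bits */
    *rd_hi = (uint32_t)(product >> 32);   /* most significant 32 bits */
}

/* UMLAL RdLo, RdHi, Rm, Rs: RdHi:RdLo = RdHi:RdLo + Rm * Rs (mod 2^64). */
void arm_umlal(uint32_t rm, uint32_t rs, uint32_t *rd_lo, uint32_t *rd_hi)
{
    uint64_t acc = ((uint64_t)*rd_hi << 32) | *rd_lo;  /* existing 64-bit value */
    acc += (uint64_t)rm * rs;
    *rd_lo = (uint32_t)acc;
    *rd_hi = (uint32_t)(acc >> 32);
}
```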
Multiply - signed long
Syntax: SMULL{cond}{S} RdLo, RdHi, Rm, Rs
Elements inside curly brackets are optional.
Usage: Multiplies the values (as two's complement signed integers) in registers
Rm and Rs and places the least significant 32 bits of the product in register
RdLo and the most significant 32 bits of the product in register RdHi.
Condition flags: If S is specified then the N and Z flags are updated according to
the result and the V and C flags are unpredictable for the ARM7 and earlier
processors.
Example:
SMULL r6, r5, r3, r9 ;multiply the signed values in r3 and r9; the low 32 bits of the
product go to r6 (RdLo) and the high 32 bits to r5 (RdHi)
Multiply and accumulate - signed long

Syntax: SMLAL{cond}{S} RdLo, RdHi, Rm, Rs

Elements inside curly brackets are optional.

Usage: Multiplies the values (as two's complement signed integers) in registers Rm
and Rs and adds the 64-bit product to the two's complement signed 64-bit value in
registers RdLo (least significant 32 bits) and RdHi (most significant 32 bits).

Condition flags: If S is specified then the N and Z flags are updated according to the
result and the V and C flags are unpredictable for the ARM7 and earlier processors.

Example:

SMLAL r6, r5, r3, r9 ;multiply the signed values in r3 and r9 and add the product to the
64-bit value in r5:r6 (r5 = high word, r6 = low word)
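For the same bit patterns, the signed versions can produce a different high word. For example, 0xFFFFFFFF × 2 is -2 for SMULL but 0x1_FFFFFFFE for UMULL. The C sketch below assumes the compiler converts out-of-range values to int32_t with two's complement wrap-around, as GCC and Clang do. The function names are hypothetical.

```c
#include <stdint.h>

/* SMULL: operands treated as two's complement; full 64-bit signed product.
   Assumes (int32_t) conversion wraps, as on GCC/Clang. */
void arm_smull(uint32_t rm, uint32_t rs, uint32_t *rd_lo, uint32_t *rd_hi)
{
    int64_t product = (int64_t)(int32_t)rm * (int64_t)(int32_t)rs;
    uint64_t bits = (uint64_t)product;    /* reinterpret as raw 64 bits */
    *rd_lo = (uint32_t)bits;
    *rd_hi = (uint32_t)(bits >> 32);
}

/* SMLAL: RdHi:RdLo (signed 64-bit) += Rm * Rs (signed). The addition is done
   on unsigned 64-bit values so that wrap-around is well defined in C. */
void arm_smlal(uint32_t rm, uint32_t rs, uint32_t *rd_lo, uint32_t *rd_hi)
{
    uint64_t acc = ((uint64_t)*rd_hi << 32) | *rd_lo;
    int64_t product = (int64_t)(int32_t)rm * (int64_t)(int32_t)rs;
    acc += (uint64_t)product;
    *rd_lo = (uint32_t)acc;
    *rd_hi = (uint32_t)(acc >> 32);
}
```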
Compare
Syntax: CMP{cond} Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts Operand2 from the value in Rn and updates the flags
accordingly. The result is discarded.

Condition flags: All flags are updated according to the result.

Examples:

CMP r1, #9 ;set the flags as if 9 was subtracted from the value in r1.

CMP r6, r2 ;set the flags for the result of (r6 - r2) but discard the result
Compare negative
Syntax: CMN{cond} Rn, Operand2

Elements inside curly brackets are optional.

Usage: Adds Operand2 to the value in Rn and updates the flags accordingly. The
result is discarded.

Condition flags: All flags are updated according to the result.

Examples:

CMN r1, #9 ;set the flags as if 9 was added to the value in r1.

CMN r6, r2 ;set the flags for the result of (r6 + r2) but discard the result
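The four flags set by CMP and CMN can be modelled in C as shown below. `Flags`, `arm_cmp` and `arm_cmn` are hypothetical names. The overflow expressions are the standard two's complement overflow tests for subtraction and addition.

```c
#include <stdint.h>

typedef struct { int n, z, c, v; } Flags;

/* CMP Rn, Operand2: flags of Rn - Operand2, result discarded. */
Flags arm_cmp(uint32_t rn, uint32_t op2)
{
    uint32_t r = rn - op2;
    Flags f;
    f.n = (int)(r >> 31);                               /* sign bit of result */
    f.z = (r == 0);
    f.c = (rn >= op2);                                  /* no borrow */
    f.v = (int)((((rn ^ op2) & (rn ^ r)) >> 31) & 1u);  /* signed overflow */
    return f;
}

/* CMN Rn, Operand2: flags of Rn + Operand2, result discarded. */
Flags arm_cmn(uint32_t rn, uint32_t op2)
{
    uint32_t r = rn + op2;
    Flags f;
    f.n = (int)(r >> 31);
    f.z = (r == 0);
    f.c = (r < rn);                                     /* unsigned carry out */
    f.v = (int)(((~(rn ^ op2) & (rn ^ r)) >> 31) & 1u); /* signed overflow */
    return f;
}
```

With r1 = 5, for example, CMP r1, #9 gives N = 1, Z = 0, C = 0 (a borrow occurred) and V = 0.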
Data Movement
• Operations are:
– MOV{cond}{S} Rd, Operand2
– MVN{cond}{S} Rd, Operand2
(MVN moves the bitwise NOT of the 32-bit value into a register)
Note that these instructions have no first operand (Rn).
• Syntax:
– <Operation>{<cond>}{S} Rd, Operand2
• Find the value in r0, r1, r2 (assuming the Z flag is set, so MVNEQ executes):
– MVNEQ r1, #02 r1 = 0xfffffffd (NOT 2)
– MOV r0, r1 r0 = 0xfffffffd
– MOVS r2, #10 r2 = 0x0000000a (#10 is decimal ten)
Barrel Shifter - Left Shift
• Shifts left by the specified amount (multiplies
by powers of two) e.g.
LSL #5 = multiply by 32
MOV R0, R1, LSL #2
MOV R0, R1, LSL R2

[Figure: Logical Shift Left (LSL): CF ← Destination ← 0]


Barrel Shifter - Right Shifts
Logical Shift Right
• Shifts right by the specified amount (divides unsigned values by powers of
two) e.g.
LSR #5 = divide by 32
MOV R0, R1, LSR #2
MOV R0, R1, LSR R2
[Figure: Logical Shift Right (LSR): 0 → Destination → CF]
Arithmetic Shift Right
• Shifts right (divides by powers of two) and preserves the sign bit,
for 2's complement operations, e.g.
ASR #5 = divide by 32 (the result rounds toward minus infinity)
MOV R0, R1, ASR #2
MOV R0, R1, ASR R2
[Figure: Arithmetic Shift Right (ASR): sign bit shifted in → Destination → CF]
Barrel Shifter - Rotations
Rotate Right (ROR)
• Similar to an LSR, but the bits wrap around as they leave the LSB and
appear as the MSB, e.g. ROR #5
• Note the last bit rotated is also used as the Carry Out.
[Figure: Rotate Right (ROR): bit 0 wraps around to bit 31; the last bit rotated out also goes to CF]

Rotate Right Extended (RRX): Rotate Right through Carry
• This operation uses the CPSR C flag as a 33rd bit.
• Rotates right by 1 bit: the old C flag enters bit 31 and bit 0 becomes the new C flag.
• Encoded as ROR #0.
[Figure: RRX: CF → Destination → CF]
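The barrel shifter operations, including the carry out, can be checked with a small C model. This sketch covers only immediate shift amounts from 1 to 31. Register-specified amounts and the special encodings of shift amounts 0 and 32 are not modelled. The function names are hypothetical.

```c
#include <stdint.h>

/* Shift/rotate by an immediate n in 1..31; *c receives the carry out
   (the last bit shifted or rotated out). */
uint32_t lsl_c(uint32_t x, unsigned n, int *c)
{
    *c = (int)((x >> (32 - n)) & 1u);   /* last bit out of bit 31 */
    return x << n;                      /* zeros enter at bit 0 */
}

uint32_t lsr_c(uint32_t x, unsigned n, int *c)
{
    *c = (int)((x >> (n - 1)) & 1u);    /* last bit out of bit 0 */
    return x >> n;                      /* zeros enter at bit 31 */
}

uint32_t asr_c(uint32_t x, unsigned n, int *c)
{
    *c = (int)((x >> (n - 1)) & 1u);
    /* copies of the sign bit fill the top n bits */
    uint32_t sign_fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | sign_fill;
}

uint32_t ror_c(uint32_t x, unsigned n, int *c)
{
    uint32_t r = (x >> n) | (x << (32 - n));
    *c = (int)(r >> 31);                /* last bit rotated out lands in bit 31 */
    return r;
}

/* RRX: 33-bit rotate right by one through the carry flag. */
uint32_t rrx_c(uint32_t x, int c_in, int *c)
{
    *c = (int)(x & 1u);                 /* bit 0 becomes the new carry */
    return (x >> 1) | ((uint32_t)(c_in ? 1 : 0) << 31);
}
```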
Logical Instructions
• Logical instructions perform bitwise logical operations on the value in Rn
and the second operand N (a register, a shifted register or an immediate).
• Syntax: <instruction>{<cond>}{S} Rd, Rn, N
• Elements inside curly brackets are optional
Example
• This example shows a more complicated logical
instruction called BIC, which carries out a logical bit
clear.
• PRE
r1 = 0b1111
r2 = 0b0101
BIC r0, r1, r2
• POST
r0 = 0b1010
• This is equivalent to
Rd = Rn AND NOT(N)
Example
• This example shows a logical OR operation between registers
r1 and r2. r0 holds the result.

• ORR R0, R1, R2

• Pre-execution
• r0 = 0x00000000 , r1 = 0x02040608, r2 = 0x10305070
• Post-execution
• r0 = 0x12345678
Branch Instructions
• A branch instruction changes the flow of execution or is used to call a routine. This
type of instruction allows programs to have subroutines, if-then-else structures,
and loops.
• Syntax:
• B{<cond>} label
• BL{<cond>} label
• BX{<cond>} Rm
• BLX{<cond>} label | Rm
Branch Instructions
Example of forward and backward unconditional branch

B forward
ADD r1, r2, #4
ADD r0, r6, #2
ADD r3, r7, #4
forward
SUB r1, r2, #4
backward
ADD r1, r2, #4
SUB r1, r2, #4
ADD r4, r6, r7
B backward
Branch Instructions
The branch with link, or BL, instruction is similar to the B instruction but overwrites
the link register lr with a return address. It performs a subroutine call. This example
shows a simple fragment of code that branches to a subroutine using the BL
instruction. To return from a subroutine, you copy the link register to the pc.

BL subroutine ; branch to subroutine
CMP r1, #5 ; compare r1 with 5
MOVEQ r1, #0 ; if (r1==5) then r1 = 0
:
subroutine
<subroutine code>
MOV pc, lr ; return by moving pc = lr
Load-Store Instructions
• Load-store instructions transfer data between
memory and processor registers. There are
three types of load-store instructions:
• Single-register transfer
• Multiple-register transfer
• Swap
Single-Register Transfer
• These instructions are used for moving a single
data item in and out of a register. The data types
supported are signed and unsigned words (32-
bit), half words (16-bit), and bytes.
• Here are the various load-store single-register
transfer instructions.
• Syntax: <LDR|STR>{<cond>}{B} Rd,addressing1
LDR{<cond>}SB/H/SH Rd, addressing2
STR{<cond>}H Rd, addressing2
Single-Register Transfer
• LDR r0, [r1] ; r0 ← mem32[r1 + 0]
• LDR R0, [R1, #4] ; R0 ← mem32[R1 + 4]
• LDR R0, [R1, R2] ; R0 ← mem32[R1 + R2]
• LDR R0, [R1, R2, LSL #2] ; R0 ← mem32[R1 + (R2 << 2)]
• Loads the data from memory at the address formed by adding R1 and
(R2 shifted left by 2)
Single-Register Transfer
• STR r0, [r1] ; mem32[r1 + 0] ← r0
• STR R0, [R1, #4] ; mem32[R1 + 4] ← R0
• STR R0, [R1, R2] ; mem32[R1 + R2] ← R0
• STR R0, [R1, R2, LSL #2] ; mem32[R1 + (R2 << 2)] ← R0
• Stores to memory at the address formed by adding R1 and
(R2 shifted left by 2)
Single-Register Transfer Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
• LDR r0, [r1, #4]!
Pre-indexing with write-back:
• POST:
r0 = 0x02020202
r1 = 0x00009004
Single-Register Transfer Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
Pre-indexing (base register not updated):
LDR r0, [r1, #4]
• POST:
r0 = 0x02020202
r1 = 0x00009000
Single-Register Transfer Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
LDR r0, [r1], #4
Post-indexing:
• POST:
r0 = 0x01010101
r1 = 0x00009004
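The three indexing forms differ only in which address is used for the load and whether r1 is updated. The C sketch below uses a two-word memory model and assumes that r1 starts at 0x00009000, the address of the first word in the examples. All function names are hypothetical.

```c
#include <stdint.h>

/* Tiny memory model holding the two words used in the examples. */
static uint32_t mem32(uint32_t addr)
{
    switch (addr) {
    case 0x00009000u: return 0x01010101u;
    case 0x00009004u: return 0x02020202u;
    default:          return 0u;  /* every other address reads as zero here */
    }
}

/* LDR r0, [r1, #off]!  pre-index with write-back: update r1, then load */
uint32_t ldr_pre_wb(uint32_t *r1, int32_t off)
{
    *r1 += (uint32_t)off;
    return mem32(*r1);
}

/* LDR r0, [r1, #off]   pre-index: load from r1 + off, r1 unchanged */
uint32_t ldr_pre(const uint32_t *r1, int32_t off)
{
    return mem32(*r1 + (uint32_t)off);
}

/* LDR r0, [r1], #off   post-index: load from r1, then r1 += off */
uint32_t ldr_post(uint32_t *r1, int32_t off)
{
    uint32_t value = mem32(*r1);
    *r1 += (uint32_t)off;
    return value;
}
```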
Multiple-Register Transfer
• Load-store multiple instructions can transfer multiple
registers between memory and the processor in a single
instruction. The transfer occurs from a base address
register Rn pointing into memory. Multiple-register transfer
instructions are more efficient than single-register transfers
for moving blocks of data around memory and saving and
restoring context.
• Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{^}
• LDMIA r0!, {r1-r3}
• STMIB r0!, {r1-r3}
Multiple-Register Transfer
LDM<mode> R0!, {R1-R3}, with R0 = 0x2010 before the instruction (addresses in hex)

Mode  Start address  End address  Data words                     R0 after (Rn!)
IA    2010           2018         2010: R1, 2014: R2, 2018: R3   201C
IB    2014           201C         2014: R1, 2018: R2, 201C: R3   201C
DA    2008           2010         2008: R1, 200C: R2, 2010: R3   2004
DB    2004           200C         2004: R1, 2008: R2, 200C: R3   2004
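The address arithmetic behind the four LDM/STM modes can be sketched in C. The sketch assumes n registers and that the lowest-numbered register always uses the lowest address. `block_addresses` is a hypothetical helper.

```c
#include <stdint.h>

typedef enum { IA, IB, DA, DB } Mode;

typedef struct { uint32_t start, end, final_rn; } BlockXfer;

/* Addresses used by LDM/STM<mode> Rn!, {n registers}. */
BlockXfer block_addresses(Mode mode, uint32_t rn, unsigned n)
{
    BlockXfer b;
    uint32_t bytes = 4u * n;
    switch (mode) {
    case IA: b.start = rn;              b.final_rn = rn + bytes; break;
    case IB: b.start = rn + 4u;         b.final_rn = rn + bytes; break;
    case DA: b.start = rn - bytes + 4u; b.final_rn = rn - bytes; break;
    default: b.start = rn - bytes;      b.final_rn = rn - bytes; break; /* DB */
    }
    b.end = b.start + bytes - 4u;       /* address of the highest register */
    return b;
}
```

With rn = 0x2010 and n = 3, the increment modes leave Rn at 0x201C and the decrement modes leave it at 0x2004.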
Addressing Modes
• Immediate Addressing Mode
• Register Addressing Mode
• Offset Addressing
• Pre-Index Addressing
• Post-Index Addressing
Immediate Addressing Mode
• The operand is a constant value encoded as part of the
instruction.
• Example:
• MOV R0, #05
• ADD R0,R0, #07
• SUB R0,R0, #06
Register Addressing Mode
• The operands are held in the registers of the
processor.
• Example:
• MOV R0, R2
• ADD R0,R1, R2
• SUB R0,R1, R2
Offset Addressing
• In this mode the data is read from or written to memory, and the
memory address is formed by adding an offset to (or subtracting it
from) the value held in a base register.
• Examples:
1. LDR R0, [R1] (Zero offset)
2. LDR R0, [R1, #4] (Constant Value)
3. LDR R0, [R1, R2] (Register)
4. LDR R0, [R1, R2, LSL #2] (Scaled)
Pre-Index Addressing
• In pre-index addressing the memory address is formed in the same way as
for offset addressing. The address is not only used to access memory, but
the base register is also modified to hold the new value. In the ARM
system this is known as write-back and is denoted by placing an
exclamation mark (!) after the closing bracket of the address.

1. LDR R0, [R1, #4]! (Constant Value)
2. LDR R0, [R1, R2]! (Register)
3. LDR R0, [R1, R2, LSL #2]! (Scaled)
Post-Index Addressing
• In post-index addressing the memory address is the base register
value. As a side-effect, an offset is added to or subtracted
from the base register value and the result is written back to
the base register.
• Example:
• LDR R0, [R1], #4
• LDR R0, [R1], R2
• LDR R0, [R1], R2, LSL #2
Find the one's complement (inverse) of a number
AREA Program, CODE, READONLY
ENTRY
Main
LDR R1, Value ; Load the number to be complemented
MVN R1, R1 ; NOT the contents of R1
STR R1, Result ; Store the result
SWI &11

Value DCD &C123 ; Value to be complemented
Result DCD 0 ; Storage for result
END
Add two numbers
AREA Program, CODE, READONLY
ENTRY
Main
LDR R1, Value1 ; Load the first number
LDR R2, Value2 ; Load the second number
ADD R1, R1, R2 ; ADD them together into R1 (x = x + y)
STR R1, Result ; Store the result
SWI &11
Value1 DCD &37E3C123 ; First value to be added
Value2 DCD &367402AA ; Second value to be added
Result DCD 0 ; Storage for result
END
Add two numbers and store the result
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Value1 ; Load the address of first value
LDR R1, [R0] ; Load what is at that address
ADD R0, R0, #0x4 ; Adjust the pointer
LDR R2, [R0] ; Load what is at the new addr
ADD R1, R1, R2 ; ADD together
LDR R0, =Result ; Load the storage address
STR R1, [R0] ; Store the result
SWI &11 ; All done

Value1 DCD &37E3C123 ; First value


Value2 DCD &367402AA ; Second value
Result DCD 0 ; Space to store result
END
Find the larger of two numbers
AREA Program, CODE, READONLY
ENTRY
Main
LDR R1, Value1 ; Load the first value to be compared
LDR R2, Value2 ; Load the second value to be compared
CMP R1, R2 ; Compare them
BHI Done ; If R1 is higher (unsigned comparison), keep it
MOV R1, R2 ; otherwise overwrite R1
Done
STR R1, Result ; Store the result
SWI &11
Value1 DCD &12345678 ; Value to be compared
Value2 DCD &87654321 ; Value to be compared
Result DCD 0 ; Space to store result
END
Program: Addition of Two 64-bit
Numbers
• Code:
AREA ADD_64BITNOS_PROGRAM, CODE
ENTRY
LDR R1, Value11 ; First number lower 32 bits
LDR R2, Value21 ; First number higher 32 Bits
LDR R3, Value12 ; Second number lower 32 Bits
LDR R4, Value22 ; Second number higher 32 Bits
ADDS R3, R3, R1 ; Add the lower 32 bits of the two numbers (sets the carry flag)
ADC R4, R4, R2 ; Add the upper 32 bits plus the carry; the result is in R4:R3
SWI &11 ; Stop here so the data below is not executed as code
Value11 DCD &062A7295
Value21 DCD &08594921
Value12 DCD &00101010
Value22 DCD &00010101
END
64 bit addition with memory
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Value1 ; Pointer to first value
LDR R1, [R0] ; Load upper part of value1
LDR R2, [R0, #4] ; Load lower part of value1
LDR R0, =Value2 ; Pointer to second value
LDR R3, [R0] ; Load upper part of value2
LDR R4, [R0, #4] ; Load lower part of value2
ADDS R6, R2, R4 ; Add lower 4 bytes and set carry flag
ADC R5, R1, R3 ; Add upper 4 bytes including carry
LDR R0, =Result ; Pointer to Result
STR R5, [R0] ; Store upper part of result
STR R6, [R0, #4] ; Store lower part of result
SWI &11
Value1 DCD &12A2E640, &F2100123 ; Value to be added
Value2 DCD &001019BF, &40023F51 ; Value to be added
Result DCD 0, 0 ; Space to store the 64-bit result (two words)
END
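To check the 64-bit addition programs by hand, this C sketch repeats the ADDS/ADC sequence on 32-bit halves. `add64_words` is a hypothetical name. For the Value1/Value2 data in the memory version, it gives 0x12B30000 (upper word) and 0x32124074 (lower word). The low-word addition produces a carry into the upper word.

```c
#include <stdint.h>

/* Mirror of the program: ADDS on the low words, then ADC on the high words. */
void add64_words(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo,
                 uint32_t *r_hi, uint32_t *r_lo)
{
    uint64_t low_sum = (uint64_t)a_lo + b_lo;     /* ADDS R6, R2, R4 */
    uint32_t carry = (uint32_t)(low_sum >> 32);   /* the C flag */
    *r_lo = (uint32_t)low_sum;
    *r_hi = a_hi + b_hi + carry;                  /* ADC R5, R1, R3 */
}
```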
Add a series of 16 bit numbers
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Data1 ;load the address of the lookup table
EOR R1, R1, R1 ;clear R1 to store sum
LDR R2, Length ;init element count
Loop
LDR R3, [R0] ;get the data
ADD R1, R1, R3 ;add it to r1
ADD R0, R0, #+4 ;increment pointer
SUBS R2, R2, #0x1 ;decrement count with zero set
BNE Loop ;if zero flag is not set, loop
STR R1, Result ;otherwise done - store result
SWI &11

AREA Data1, DATA


Table DCW &2040 ;table of values to be added
ALIGN ;32 bit aligned
DCW &1C22
ALIGN
DCW &0242
ALIGN
TablEnd DCD 0
AREA Data2, DATA
Length DCW (TablEnd - Table) / 4 ;because we're having to align
ALIGN ;gives the loop count
Result DCW 0 ;storage for result
END
Scan a series of 16 bit numbers to find the largest
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Data1 ;load the address of the lookup table
EOR R1, R1, R1 ;clear R1 to store largest
LDR R2, Length ;init element count
CMP R2, #0
BEQ Done ;if table is empty
Loop
LDR R3, [R0] ;get the data
CMP R3, R1 ;compare with the largest value so far
BCC Looptest ;skip next line if R3 is lower (unsigned)
MOV R1, R3 ;R3 becomes the new largest value
Looptest
ADD R0, R0, #+4 ;increment pointer
SUBS R2, R2, #0x1 ;decrement count with zero set
BNE Loop ;if zero flag is not set, loop
Done
STR R1, Result ;otherwise done - store result
SWI &11

AREA Data1, DATA
Table DCW &A152 ;table of values to be tested
ALIGN
DCW &7F61
ALIGN
DCW &F123
ALIGN
DCW &8000
ALIGN
TablEnd DCD 0
AREA Data2, DATA
Length DCW (TablEnd - Table) / 4 ;because we're having to align
ALIGN ;gives the loop count
Result DCW 0 ;storage for result
END
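The comparison in this program is unsigned, because BCC is taken when R3 is lower than R1 as an unsigned number. The C sketch below runs the same loop. `largest_unsigned` is a hypothetical name. Each table entry is treated as the 32-bit word that LDR loads. Assuming little-endian memory, this is the 16-bit value padded with zeros by ALIGN. For this table the result is &F123.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest entry using an unsigned comparison, as CMP + BCC does. */
uint32_t largest_unsigned(const uint32_t *table, size_t length)
{
    uint32_t largest = 0;                 /* EOR R1, R1, R1 */
    for (size_t i = 0; i < length; i++) {
        if (!(table[i] < largest)) {      /* BCC skips the update when R3 < R1 */
            largest = table[i];           /* MOV R1, R3 */
        }
    }
    return largest;
}
```

If the entries were meant as signed 16-bit values, a signed comparison on sign-extended data would be needed instead, and &7F61 would be the largest.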
Scan a series of 32 bit numbers to find how many are negative
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Data1 ; load the address of the lookup table
EOR R1, R1, R1 ;clear R1 to store count
LDR R2, Length ;init element count
CMP R2, #0
BEQ Done ;if table is empty
Loop
LDR R3, [R0] ;get the data
CMP R3, #0
BPL Looptest ;skip next line if +ve or zero
ADD R1, R1, #1 ;increment -ve number count
Looptest
ADD R0, R0, #+4 ;increment pointer
SUBS R2, R2, #0x1 ;decrement count with zero set
BNE Loop ;if zero flag is not set, loop
Done
STR R1, Result ;otherwise done - store result
SWI &11
AREA Data1, DATA
Table DCD &F1522040 ;table of values to be tested
DCD &7F611C22
DCD &80000242
TablEnd DCD 0
AREA Data2, DATA
Length DCW (TablEnd - Table) / 4 ;each entry is 4 bytes
ALIGN ;gives the loop count
Result DCW 0 ;storage for result
END
Scan a series of 16 bit numbers to find how many are negative
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Data1 ;load the address of the lookup table
EOR R1, R1, R1 ;clear R1 to store count
LDR R2, Length ;init element count
CMP R2, #0
BEQ Done ;if table is empty
Loop
LDR R3, [R0] ;get the data
AND R3, R3, #0x8000 ;bitwise AND to isolate bit 15 (the sign bit)
CMP R3, #0x8000 ;is bit 15 set?
BNE Looptest ;skip next line if bit 15 is clear (non-negative)
ADD R1, R1, #1 ;increment -ve number count
Looptest
ADD R0, R0, #+4 ;increment pointer
SUBS R2, R2, #0x1 ;decrement count with zero set
BNE Loop ;if zero flag is not set, loop
Done
STR R1, Result ;otherwise done - store result
SWI &11
AREA Data1, DATA
Table DCW &F152 ;table of values to be tested
ALIGN
DCW &7F61
ALIGN
DCW &8000
ALIGN
TablEnd DCD 0
AREA Data2, DATA
Length DCW (TablEnd - Table) / 4 ;because we're having to align
ALIGN ;gives the loop count
Result DCW 0 ;storage for result
END
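A C version of the sign-bit test makes the intended result easy to check: a 16-bit entry is negative when bit 15 is set. `count_negative16` is a hypothetical name. For the three table values &F152, &7F61 and &8000, it counts 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Count 16-bit entries whose bit 15 (the sign bit) is set,
   which is the test that AND R3, R3, #0x8000 performs. */
unsigned count_negative16(const uint16_t *table, size_t length)
{
    unsigned count = 0;
    for (size_t i = 0; i < length; i++) {
        if ((table[i] & 0x8000u) != 0u) {
            count++;                     /* ADD R1, R1, #1 */
        }
    }
    return count;
}
```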
ARM Instruction Set Format
Fields are listed from bit 31 (left) to bit 0 (right); Cond always occupies bits 31-28.

Data processing / PSR transfer: Cond | 0 0 I | Opcode | S | Rn | Rd | Operand2
Multiply: Cond | 0 0 0 0 0 0 A S | Rd | Rn | Rs | 1 0 0 1 | Rm
Long multiply (v3M / v4 only): Cond | 0 0 0 0 1 U A S | RdHi | RdLo | Rs | 1 0 0 1 | Rm
Swap: Cond | 0 0 0 1 0 B 0 0 | Rn | Rd | 0 0 0 0 1 0 0 1 | Rm
Load/store byte/word: Cond | 0 1 I P U B W L | Rn | Rd | Offset
Load/store multiple: Cond | 1 0 0 P U S W L | Rn | Register list
Halfword transfer, immediate offset (v4 only): Cond | 0 0 0 P U 1 W L | Rn | Rd | Offset1 | 1 S H 1 | Offset2
Halfword transfer, register offset (v4 only): Cond | 0 0 0 P U 0 W L | Rn | Rd | 0 0 0 0 | 1 S H 1 | Rm
Branch: Cond | 1 0 1 L | Offset
Branch exchange (v4T only): Cond | 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 | Rn
Coprocessor data transfer: Cond | 1 1 0 P U N W L | Rn | CRd | CPNum | Offset
Coprocessor data operation: Cond | 1 1 1 0 | Op1 | CRn | CRd | CPNum | Op2 | 0 | CRm
Coprocessor register transfer: Cond | 1 1 1 0 | Op1 | L | CRn | Rd | CPNum | Op2 | 1 | CRm
Software interrupt: Cond | 1 1 1 1 | SWI number
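As an example of reading the format table, the data-processing fields can be extracted with shifts and masks. In this C sketch, 0xE0821003 is the encoding of ADD r1, r2, r3, worked out by hand from the table (Cond = 1110 for AL, opcode 0100 for ADD, I = 0, S = 0). `DataProc` and `decode_dp` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    unsigned cond, i, opcode, s, rn, rd, operand2;
} DataProc;

/* Split a data-processing instruction word into its fields. */
DataProc decode_dp(uint32_t insn)
{
    DataProc d;
    d.cond     = (insn >> 28) & 0xFu;   /* bits 31-28: condition */
    d.i        = (insn >> 25) & 0x1u;   /* bit 25: immediate Operand2 */
    d.opcode   = (insn >> 21) & 0xFu;   /* bits 24-21: operation */
    d.s        = (insn >> 20) & 0x1u;   /* bit 20: set condition codes */
    d.rn       = (insn >> 16) & 0xFu;   /* bits 19-16: first operand */
    d.rd       = (insn >> 12) & 0xFu;   /* bits 15-12: destination */
    d.operand2 = insn & 0xFFFu;         /* bits 11-0 */
    return d;
}
```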
Conditional Execution
• Most instruction sets only allow branches to be executed
conditionally.
• However, by reusing the condition evaluation hardware,
ARM effectively increases the number of instructions.
– All instructions contain a condition field which determines
whether the CPU will execute them.
– Non-executed instructions soak up 1 cycle.
• This removes the need for many branches, which stall the
pipeline (3 cycles to refill).
– Allows very dense in-line code, without branches.
– The time penalty of not executing several conditional
instructions is frequently less than the overhead of the branch or
subroutine call that would otherwise be needed.
The Condition Field
Bits 31-28 of every instruction hold the condition field (Cond).

0000 = EQ - Z set (equal)
0001 = NE - Z clear (not equal)
0010 = HS / CS - C set (unsigned higher or same)
0011 = LO / CC - C clear (unsigned lower)
0100 = MI - N set (negative)
0101 = PL - N clear (positive or zero)
0110 = VS - V set (overflow)
0111 = VC - V clear (no overflow)
1000 = HI - C set and Z clear (unsigned higher)
1001 = LS - C clear or Z set (unsigned lower or same)
1010 = GE - N set and V set, or N clear and V clear (>=)
1011 = LT - N set and V clear, or N clear and V set (<)
1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE - Z set, or N set and V clear, or N clear and V set (<=)
1110 = AL - always
1111 = NV - reserved
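The condition table translates directly into a C function. `condition_passed` is a hypothetical name. Treating NV as never executing is an assumption made for this sketch, because the encoding is reserved.

```c
#include <stdbool.h>

/* Evaluate the 4-bit condition field against the N, Z, C and V flags. */
bool condition_passed(unsigned cond, bool n, bool z, bool c, bool v)
{
    switch (cond & 0xFu) {
    case 0x0: return z;                  /* EQ */
    case 0x1: return !z;                 /* NE */
    case 0x2: return c;                  /* CS / HS */
    case 0x3: return !c;                 /* CC / LO */
    case 0x4: return n;                  /* MI */
    case 0x5: return !n;                 /* PL */
    case 0x6: return v;                  /* VS */
    case 0x7: return !v;                 /* VC */
    case 0x8: return c && !z;            /* HI */
    case 0x9: return !c || z;            /* LS */
    case 0xA: return n == v;             /* GE */
    case 0xB: return n != v;             /* LT */
    case 0xC: return !z && (n == v);     /* GT */
    case 0xD: return z || (n != v);      /* LE */
    case 0xE: return true;               /* AL */
    default:  return false;              /* NV: reserved, treated as never here */
    }
}
```

After CMP with r1 = 5 and #9 (N = 1, Z = 0, C = 0, V = 0), LT and LO pass, while GT and EQ fail.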
Branch instructions
• Branch : B{<cond>} label
• Branch with Link : BL{<cond>} sub_routine_label

Bits 31-28: Cond | bits 27-25: 1 0 1 | bit 24: L | bits 23-0: Offset
Link bit (L): 0 = Branch, 1 = Branch with link

• The offset for branch instructions is calculated by the assembler:
– By subtracting (address of the branch instruction + 8) from the target
address; the 8 allows for the pipeline, since the PC reads 8 bytes ahead.
– This gives a 26-bit byte offset, which is shifted right by 2 bits (the
bottom two bits are always zero because instructions are word-aligned) and
stored in the 24-bit offset field of the instruction.
– This gives a range of ± 32 Mbytes.
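The offset calculation can be written out in C. The sketch assumes word-aligned addresses within the ±32 MB range. `encode_branch` and `branch_target` are hypothetical helpers. As a check, a branch to itself ("B .") with the AL condition encodes as 0xEAFFFFFE.

```c
#include <stdint.h>

/* Encode B/BL: offset field = (target - (branch address + 8)) >> 2, in 24 bits. */
uint32_t encode_branch(uint32_t cond, int link, uint32_t branch_addr, uint32_t target)
{
    uint32_t byte_offset = target - (branch_addr + 8u);   /* PC is 8 bytes ahead */
    uint32_t imm24 = (byte_offset >> 2) & 0x00FFFFFFu;    /* drop the two zero bits */
    return ((cond & 0xFu) << 28) | (0x5u << 25) | ((link ? 1u : 0u) << 24) | imm24;
}

/* Decode: sign-extend the 24-bit field, multiply by 4, add branch address + 8. */
uint32_t branch_target(uint32_t insn, uint32_t branch_addr)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    uint32_t offset = imm24 << 2;                         /* 26-bit byte offset */
    if (imm24 & 0x00800000u) {
        offset |= 0xFC000000u;                            /* sign-extend from bit 25 */
    }
    return branch_addr + 8u + offset;
}
```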
Data processing instructions
• The ARM data processing instructions are used to modify data values in registers.
The operations that are supported include arithmetic and bit-wise logical
combinations of 32-bit data types. One operand may be shifted or rotated en route
to the ALU, allowing, for example, shift and add in a single instruction.
ADD{cond}{S} Rd, Rn, Operand2
Data processing instructions
Multiply
• MUL{cond}{S} Rd, Rm, Rs
• MLA{cond}{S} Rd, Rm, Rs, Rn
Load/Store
• LDR|STR {<cond>}{B} Rd,addressing1
LDR{<cond>}SB/H/SH Rd, addressing2
LDR R0, [R1]
LDR R0, [R1, R2, LSL #2]
Load/Store Multiple registers
• Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{^}
• LDMIA r0!, {r1-r3}
• STMIB r0!, {r1-r3}
Course Name: Embedded System
• Topic Name: Instruction set Architecture (ISA)

By:
Dr. Karmjit Singh Sandha
Assistant Professor, ECED

Instruction Set of ARM
The ARM Instruction set can be divided
into six broad classes of instruction
1. Data Movement
2. Arithmetic
3. Memory Access
4. Logical and bit manipulation
5. Flow Control
6. System Control/ Privileged
Instruction Mnemonic
Condition code (cc) Mnemonic
ARM instructions
Type of operation:

Arithmetic
Branch
Load and Store
Logical
Move
Arithmetic Instructions
ADD Add
ADC Add with carry
SUB Subtract
SBC Subtract with carry
RSB Reverse subtract
RSC Reverse subtract with carry
MUL Multiply
MLA Multiply and accumulate
UMULL Multiply - unsigned long
UMLAL Multiply and accumulate - unsigned long
SMULL Multiply - signed long
SMLAL Multiply and accumulate - signed long
CMP Compare
CMN Compare negative
Branch Instructions:

B Branch
BL Branch with link
Load and Store Instructions
LDR Load word
LDRB Load byte
LDRSB Load signed byte
LDRH Load half word
LDRSH Load signed half word
LDM Load multiple
LDM sp! Pop
STR Store word
STRB Store byte
STRH Store half word
STM Store multiple
STM sp! Push
Logical Instructions:

AND AND
EOR Exclusive OR
ORR OR
BIC Bit clear
TST Test
TEQ Test equivalence
Move Instructions
MOV Move
MVN Move NOT (bitwise inverse)
SWP Swap
SWPB Swap byte
MRS Move program status register to register
MSR Move register to program status register
Arithmetic Instruction
Add
Syntax: ADD{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Adds the value in Rn to Operand2 and places the sum in Rd.

Condition flags: If S is specified then all flags are updated according to the result.

Examples:

ADD r7, r4, #99 ;adds 99 to the value in r4 and places the sum in r7

ADD r1, r2, r3 ;adds the value in r3 to the value in r2 and places the sum in r1
Add with carry
Syntax: ADC{cond}{S} Rd, Rn, Operand2
Elements inside curly brackets are optional.
Usage: Adds the value in Rn to Operand2 and adds another 1 if the carry flag is
set. The sum is placed in Rd.
Condition flags: If S is specified then all flags are updated according to the
result.
Examples:
ADC r7, r4, #99 ;adds 99 to the value in r4 and adds another 1 if the carry flag is
set. Places the sum in r7
ADC r1, r2, r3 ;adds the value in r3 to the value in r2 and adds 1 if the carry flag
is set. Places the sum in r1
ADCCSS r1, r2, r3
Subtract
Syntax: SUB{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts Operand2 from the value in Rn and places the difference in
Rd.

Condition flags: If S is specified then all flags are updated according to the
result.

Examples:

SUB r7, r4, #99 ;subtracts 99 from the value in r4 and places the result in r7

SUB r1, r2, r3 ;subtracts the value in r3 from the value in r2 and places the
difference in r1
Subtract with carry

Syntax: SBC{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts Operand2 from the value in Rn and subtracts another 1 if the
carry flag is clear. Places the difference in Rd.

Condition flags: If S is specified then all flags are updated according to the result.

Examples:

SBC r7, r4, #99;

SBC r1, r2, r3;


Reverse subtract
Syntax: RSB{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts the value in Rn from Operand2 and places the difference in
Rd.

Condition flags: If S is specified then all flags are updated according to the
result.

Examples:

RSB r7, r4, #99 ;subtracts the value in r4 from 99 and places the result in r7

RSB r1, r2, r3 ;subtracts the value in r2 from the value in r3 and places the
difference in r1
Reverse subtract with carry
Syntax: RSC{cond}{S} Rd, Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts the value in Rn from Operand2 and subtracts another 1 if the
carry flag is clear. Places the difference in Rd.

Condition flags: If S is specified then all flags are updated according to the
result.

Examples:

RSC r7, r4, #99

RSC r1, r2, r3


Multiply
Syntax: MUL{cond}{S} Rd, Rm, Rs : Rd = Rm * Rs

Elements inside curly brackets are optional.

Usage: Multiples the values in registers Rm and Rs and places the least
significant 32 bits of the product in register Rd.

Condition flags: If S is specified then the N and Z flags are updated


according to the result, the V flag is not affected and the C flag is
unpredictable for the ARM7 and earlier processors.

Example:

MUL r5, r3, r9 ;multiply the values in r3 and r9 and places the result in r5
Multiply and accumulate
Syntax: MLA{cond}{S} Rd, Rm, Rs, Rn : Rd = (Rm * Rs) + Rn

Elements inside curly brackets are optional.

Usage: Adds the value in Rn to the product of the values in Rm and Rs and places
the least significant 32 bits of the result in register Rd.

Condition flags: If S is specified then the N and Z flags are updated according to the
result, the V flag is not affected and the C flag is unpredictable for the ARM7 and
earlier processors.

Example:

MLA r5, r3, r9, r6 ;multiply the values in r3 and r9, add the product to the value in
r6 and places the result in r5
Multiply - unsigned long
Syntax: UMULL{cond}{S} RdLo, RdHi, Rm, Rs

Elements inside curly brackets are optional.

Usage: Multiples the values (as unsigned integers) in registers Rm and Rs and
places the least significant 32 bits of the product in register RdLo and the most
significant 32 bits of the product in register RdHi.

Condition flags: If S is specified then the N and Z flags are updated according to
the result and the V and C flags are unpredictable for the ARM7 and earlier
processors.

Example:

UMULL r6, r5, r3, r9 ;multiply the values in r3 and r9 and places the result in r5
and r6
Multiply and accumulate - unsigned long
Syntax: UMLAL{cond}{S} RdLo, RdHi, Rm, Rs
Elements inside curly brackets are optional.
Usage: Multiples the values (as unsigned integers) in registers Rm and Rs and
adds the 64 bit product to the unsigned 64 bit value in registers RdLo (least
significant 32 bits) and RdHi (most significant 32 bits).
Condition flags: If S is specified then the N and Z flags are updated according to
the result and the V and C flags are unpredictable for the ARM7 and earlier
processors.
Example:
UMLAL r6, r5, r3, r9 ;multiply the values in r3 and r9 and add the product to the
values in r5 and r6
Multiply - signed long
Syntax: SMULL{cond}{S} RdLo, RdHi, Rm, Rs
Elements inside curly brackets are optional.
Usage: Multiples the values (as two's complement signed integers) in registers
Rm and Rs and places the least significant 32 bits of the product in register
RdLo and the most significant 32 bits of the product in register RdHi.
Condition flags: If S is specified then the N and Z flags are updated according to
the result and the V and C flags are unpredictable for the ARM7 and earlier
processors.
Example:
SMULL r6, r5, r3, r9 ;multiply the values in r3 and r9 and places the result in r5
and r6
Multiply and accumulate - signed long

Syntax: SMLAL{cond}{S} RdLo, RdHi, Rm, Rs

Elements inside curly brackets are optional.

Usage: Multiples the values (as two's complement signed integers) in registers Rm
and Rs and adds the 64 bit product to the two's complement signed 64 bit value in
registers RdLo (least significant 32 bits) and RdHi (most significant 32 bits).

Condition flags: If S is specified then the N and Z flags are updated according to the
result and the V and C flags are unpredictable for the ARM7 and earlier processors.

Example:

SMLAL r6, r5, r3, r9 ;multiply the values in r3 and r9 and add the product to the
values in r5 and r6
Compare
Syntax: CMP{cond} Rn, Operand2

Elements inside curly brackets are optional.

Usage: Subtracts Operand2 from the value in Rn and updates the flags
accordingly. The result is discarded.

Condition flags: All flags are updated according to the result.

Examples:

CMP r1, #9 ;set the flags as if 9 was subtracted from the value in r1.

CMP r6, r2 ;set the flags for the result of (r6 - r2) but discard the result
Compare negative
Syntax: CMN{cond} Rn, Operand2

Elements inside curly brackets are optional.

Usage: Add Operand2 to the value in Rn and updates the flags accordingly. The
result is discarded.

Condition flags: All flags are updated according to the result.

Examples:

CMN r1, #9 ;set the flags as if 9 was added to the value in r1.

CMN r6, r2 ;set the flags for the result of (r6 + r2) but discard the result
Data Movement
• Operations are:

– MOV{cond}{S} Rn, Operand2

– MVN {cond}{S} Rn, operand2

(move the NOT of the 32-bit value into a register)

Note that these make no use of operand1.

• Syntax:

– <Operation>{<cond>}{S} Rd, Operand2

• Find the value in r0, r1, r2:

– MVNEQ r1,#02 r1=

– MOV r0, r1 r0=

– MOVS r2, #10 r2=


Data Movement
• Operations are:
– MOV{cond}{S} Rd, Operand2
– MVN{cond}{S} Rd, Operand2
(move the NOT of the 32-bit value into a register)
Note that these make no use of operand1.
• Syntax:
– <Operation>{<cond>}{S} Rd, Operand2
• Find the value in r0, r1, r2 (assuming the Z flag is set, so MVNEQ executes):
– MVNEQ r1,#02 r1= 0xfffffffd
– MOV r0, r1 r0= 0xfffffffd
– MOVS r2, #10 r2= 0x0000000a (#10 is decimal ten)
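The exercise can be replayed in C to confirm the answers. This sketch assumes all three registers start at zero and takes the Z flag as an input; `Regs` and `data_movement_exercise` are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t r0, r1, r2; } Regs;

/* Replays MVNEQ r1,#02 / MOV r0,r1 / MOVS r2,#10 given the Z flag before MVNEQ. */
static Regs data_movement_exercise(Regs in, bool z_flag)
{
    Regs out = in;
    if (z_flag)               /* MVNEQ r1, #02 executes only when Z is set */
        out.r1 = ~UINT32_C(2);
    out.r0 = out.r1;          /* MOV r0, r1 */
    out.r2 = 10u;             /* MOVS r2, #10: decimal ten = 0x0000000A (S also updates N and Z) */
    return out;
}
```

With Z set, r1 and r0 both become 0xFFFFFFFD; with Z clear, MVNEQ is skipped and r1 keeps its old value.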
Barrel Shifter - Left Shift
• Shifts left by the specified amount (multiplies
by powers of two) e.g.
LSL #5 = multiply by 32
MOV R0, R1, LSL #2
MOV R0, R1, LSL R2

Figure: Logical Shift Left (LSL): CF ← Destination ← 0 (the last bit shifted out of bit 31 goes to the carry flag; zeros enter at bit 0)
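A small C model of LSL shows both the result and the carry-out the barrel shifter produces. It is a sketch restricted to immediate shift amounts 1 to 31 (so the C shifts stay well defined); `Shift` and `lsl` are hypothetical names.

```c
#include <stdint.h>

/* Barrel-shifter output: shifted value plus carry-out. */
typedef struct { uint32_t value; unsigned carry; } Shift;

/* LSL #n for 1 <= n <= 31: zeros enter at bit 0 and the last bit
   shifted out of bit 31 (original bit 32-n) becomes CF. */
static Shift lsl(uint32_t x, unsigned n)
{
    Shift s;
    s.value = x << n;
    s.carry = (x >> (32 - n)) & 1u;
    return s;
}
```

For example, `lsl(3, 5).value` is 96, i.e. multiplication by 32, matching `LSL #5` above.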


Barrel Shifter - Right Shifts
Logical Shift Right (LSR)
• Shifts right by the specified amount (divides unsigned values by
powers of two), e.g.
LSR #5 = divide by 32
MOV R0, R1, LSR #2
MOV R0, R1, LSR R2
Figure: Logical Shift Right: 0 → Destination → CF
Arithmetic Shift Right (ASR)
• Shifts right (divides by powers of two) and preserves the sign bit,
for 2's complement operations, e.g.
ASR #5 = divide by 32
MOV R0, R1, ASR #2
MOV R0, R1, ASR R2
Figure: Arithmetic Shift Right: sign bit shifted in → Destination → CF
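The difference between the two right shifts shows up only for values with bit 31 set. The C sketch below is illustrative: `lsr` and `asr` are hypothetical names, shift amounts are limited to 1 to 31, and ASR fills in the sign bit explicitly rather than relying on how a C compiler shifts negative integers.

```c
#include <stdint.h>

typedef struct { uint32_t value; unsigned carry; } Shift;

/* LSR #n (1..31): zeros enter at bit 31; the last bit shifted out of bit 0 becomes CF. */
static Shift lsr(uint32_t x, unsigned n)
{
    Shift s = { x >> n, (x >> (n - 1)) & 1u };
    return s;
}

/* ASR #n (1..31): copies of the original bit 31 enter from the left. */
static Shift asr(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    Shift s = { (x >> n) | fill, (x >> (n - 1)) & 1u };
    return s;
}
```

For -64 (0xFFFFFFC0), ASR #5 gives -2 (0xFFFFFFFE), while LSR #5 treats the value as unsigned and gives 0x07FFFFFE. ASR rounds toward minus infinity, so -65 ASR #5 gives -3, whereas C's `-65 / 32` gives -2.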
Barrel Shifter - Rotations
Rotate Right (ROR)
• Similar to a logical shift right, but the bits wrap around: as they
leave the LSB they reappear as the MSB, e.g. ROR #5
• Note the last bit rotated is also used as the Carry Out.
Figure: Rotate Right: Destination (bit 0 wraps around to bit 31) → CF
Rotate Right Extended (RRX): Rotate Right through Carry
• This operation uses the CPSR C flag as a 33rd bit.
• Rotates right by 1 bit.
• Encoded as ROR #0.
Figure: Rotate Right Extended: CF → Destination → CF
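Rotations can be modeled the same way. In this hedged sketch (`ror` and `rrx` are hypothetical names, rotate amounts limited to 1 to 31), ROR takes its carry-out from the new bit 31, and RRX treats C as a 33rd bit: the old C enters bit 31 and the old bit 0 becomes the new C.

```c
#include <stdint.h>

typedef struct { uint32_t value; unsigned carry; } Shift;

/* ROR #n (1..31): bits leaving bit 0 re-enter at bit 31;
   the last bit rotated (now bit 31) is the carry-out. */
static Shift ror(uint32_t x, unsigned n)
{
    uint32_t v = (x >> n) | (x << (32 - n));
    Shift s = { v, v >> 31 };
    return s;
}

/* RRX: 33-bit rotate right by one through the C flag. */
static Shift rrx(uint32_t x, unsigned carry_in)
{
    Shift s = { (x >> 1) | ((uint32_t)(carry_in & 1u) << 31), x & 1u };
    return s;
}
```

For example, rotating 0x12345678 right by 8 gives 0x78123456.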
Logical Instructions
• Logical instructions perform bitwise logical
operations on the two source registers.
• Syntax: <instruction>{<cond>}{S} Rd, Rn, N
• Elements inside curly brackets are optional
Example
• This example shows a logical OR operation between registers
r1 and r2. r0 holds the result.

• ORR R0, R1, R2

• Pre-execution
• r0 = 0x00000000 , r1 = 0x02040608, r2 = 0x10305070
• Post-execution
• r0 = ?
Example
• This example shows a more complicated logical
instruction called BIC, which carries out a logical bit
clear.
• PRE
r1 = 0b1111
r2 = 0b0101
BIC r0, r1, r2
• POST
r0 = 0b1010
• This is equivalent to
Rd = Rn AND NOT(N)
Example
• This example shows a logical OR operation between registers
r1 and r2. r0 holds the result.

• ORR R0, R1, R2

• Pre-execution
• r0 = 0x00000000 , r1 = 0x02040608, r2 = 0x10305070
• Post-execution
• r0 = 0x12345678
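The bitwise operations map directly onto C operators. The one-line helpers below are hypothetical names used only to pair each instruction with its C expression; they reproduce the ORR and BIC examples above.

```c
#include <stdint.h>

static uint32_t arm_and(uint32_t rn, uint32_t n) { return rn & n; }   /* AND Rd, Rn, N */
static uint32_t arm_orr(uint32_t rn, uint32_t n) { return rn | n; }   /* ORR Rd, Rn, N */
static uint32_t arm_eor(uint32_t rn, uint32_t n) { return rn ^ n; }   /* EOR Rd, Rn, N */
static uint32_t arm_bic(uint32_t rn, uint32_t n) { return rn & ~n; }  /* BIC: Rn AND NOT(N) */
```

EOR of a register with itself always gives zero, which is why `EOR R1, R1, R1` is used to clear R1 in the programs later in this section.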
Branch Instructions
• A branch instruction changes the flow of execution or is used to call a routine. This
type of instruction allows programs to have subroutines, if-then-else structures,
and loops.
• Syntax:
• B{<cond>} label
• BL{<cond>} label
• BX{<cond>} Rm
• BLX{<cond>} label | Rm
Branch Instructions
Example of forward and backward unconditional branch

B forward
ADD r1, r2, #4
ADD r0, r6, #2
ADD r3, r7, #4
forward
SUB r1, r2, #4
backward
ADD r1, r2, #4
SUB r1, r2, #4
ADD r4, r6, r7
B backward
Branch Instructions
The branch with link, or BL, instruction is similar to the B instruction but overwrites
the link register lr with a return address. It performs a subroutine call. This example
shows a simple fragment of code that branches to a subroutine using the BL
instruction. To return from a subroutine, you copy the link register to the pc.

BL subroutine ; branch to subroutine


CMP r1, #5 ; compare r1 with 5
MOVEQ r1, #0 ; if (r1==5) then r1 = 0
:
subroutine
<subroutine code>
MOV pc, lr ; return by moving pc = lr
Load-Store Instructions
• Load-store instructions transfer data between
memory and processor registers. There are
three types of load-store instructions:
• Single-register transfer
• Multiple-register transfer
• Swap
Single-Register Transfer
• These instructions are used for moving a single
data item in and out of a register. The data types
supported are signed and unsigned words (32-
bit), half words (16-bit), and bytes.
• Here are the various load-store single-register
transfer instructions.
• Syntax: <LDR|STR>{<cond>}{B} Rd,addressing1
LDR{<cond>}SB/H/SH Rd, addressing2
STR{<cond>}H Rd, addressing2
Single-Register Transfer
• LDR r0, [r1] ; r0 ← mem32[r1]
• LDR R0, [R1, #4] ; R0 ← mem32[R1 + 4]
• LDR R0, [R1, R2] ; R0 ← mem32[R1 + R2]
• LDR R0, [R1, R2, LSL #2] ; R0 ← mem32[R1 + (R2 << 2)]
• Load the data from memory at address
[R1 + (R2 shifted left by 2)]
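The scaled register offset is what makes word-array indexing cheap: with R1 holding the address of a table of 32-bit words and R2 holding an index i, `LDR R0, [R1, R2, LSL #2]` reads element i. A C sketch of the same idea (the helper names are hypothetical):

```c
#include <stdint.h>

/* Effective address of LDR Rd, [Rn, Rm, LSL #sh]: base + (index << sh). */
static uint32_t ea_scaled(uint32_t rn, uint32_t rm, unsigned sh)
{
    return rn + (rm << sh);
}

/* The same access in C: table[i] reads the word at byte address
   table + (i << 2), because each uint32_t element occupies 4 bytes. */
static uint32_t load_element(const uint32_t *table, uint32_t i)
{
    return table[i];
}
```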
Single-Register Transfer
• STR r0, [r1] ; mem32[r1] ← r0
• STR R0, [R1, #4] ; mem32[R1 + 4] ← R0
• STR R0, [R1, R2] ; mem32[R1 + R2] ← R0
• STR R0, [R1, R2, LSL #2] ; mem32[R1 + (R2 << 2)] ← R0
• Store to memory at address [R1 + (R2 shifted left by 2)]
Single-Register Transfer Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
• LDR r0, [r1, #4]!
Pre-indexing with write back:
• POST:
r0 =
r1 =
Single-Register Transfer Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
• LDR r0, [r1, #4]!
Preindexing with writeback:
• POST:
r0 = 0x02020202
r1 = 0x00009004
Single-Register Transfer Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
Preindexing:
LDR r0, [r1, #4]
• POST:
r0 = 0x02020202
r1 = 0x00009000
Single-Register Transfer Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
LDR r0, [r1], #4
Postindexing:
• POST:
r0 = 0x01010101
r1 = 0x00009004
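The three variants differ only in which address is accessed and whether the base register is updated. The C sketch below models just the two memory words used in these examples, assuming the base register holds 0x00009000 (the address of the first word); `read32`, `ldr_offset`, `ldr_pre_wb` and `ldr_post` are hypothetical names, and there is no bounds checking beyond those two addresses.

```c
#include <stdint.h>

/* The two memory words from the examples. */
#define MEM_BASE 0x00009000u
static const uint32_t mem32[2] = { 0x01010101u, 0x02020202u };

static uint32_t read32(uint32_t addr) { return mem32[(addr - MEM_BASE) / 4]; }

/* LDR Rd, [Rn, #off]  : offset (pre-index, no write-back), Rn unchanged */
static uint32_t ldr_offset(uint32_t rn, int32_t off) { return read32(rn + (uint32_t)off); }

/* LDR Rd, [Rn, #off]! : pre-index with write-back, access Rn+off and keep it in Rn */
static uint32_t ldr_pre_wb(uint32_t *rn, int32_t off)
{
    *rn += (uint32_t)off;
    return read32(*rn);
}

/* LDR Rd, [Rn], #off  : post-index, access Rn, then add the offset to Rn */
static uint32_t ldr_post(uint32_t *rn, int32_t off)
{
    uint32_t value = read32(*rn);
    *rn += (uint32_t)off;
    return value;
}
```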
Multiple-Register Transfer
• Load-store multiple instructions can transfer multiple
registers between memory and the processor in a single
instruction. The transfer occurs from a base address
register Rn pointing into memory. Multiple-register transfer
instructions are more efficient than single-register transfers
for moving blocks of data around memory and for saving and
restoring context.
• Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{^}
• LDMIA r0!, {r1-r3}
• STMIB r0!, {r1-r3}
Multiple-Register Transfer
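LDM<mode> R0!, {R1-R3}, with R0 = 2010 before execution

• IA: start address 2010, end address 2018; data words 2010: R1, 2014: R2, 2018: R3; R0 after write-back = 201C
• IB: start address 2014, end address 201C; data words 2014: R1, 2018: R2, 201C: R3; R0 after write-back = 201C
• DA: start address 2008, end address 2010; data words 2008: R1, 200C: R2, 2010: R3; R0 after write-back = 2004
• DB: start address 2004, end address 200C; data words 2004: R1, 2008: R2, 200C: R3; R0 after write-back = 2004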
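The four addressing modes follow a simple rule: the lowest-numbered register always uses the lowest address, and the mode only decides where the block of n words starts relative to Rn. A C sketch of that rule, with hypothetical names (`LdmMode`, `ldm_addrs`), for R0 = 0x2010 and three registers:

```c
#include <stdint.h>

typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } LdmMode;

typedef struct {
    uint32_t start;      /* address used by the lowest-numbered register */
    uint32_t end;        /* address used by the highest-numbered register */
    uint32_t writeback;  /* value written to Rn when ! is given */
} LdmAddrs;

/* Address range used by LDM/STM<mode> Rn!, {n registers}. */
static LdmAddrs ldm_addrs(LdmMode mode, uint32_t rn, uint32_t n)
{
    LdmAddrs a;
    switch (mode) {
    case MODE_IA: a.start = rn;             a.writeback = rn + 4 * n; break;
    case MODE_IB: a.start = rn + 4;         a.writeback = rn + 4 * n; break;
    case MODE_DA: a.start = rn - 4 * n + 4; a.writeback = rn - 4 * n; break;
    default:      a.start = rn - 4 * n;     a.writeback = rn - 4 * n; break;  /* MODE_DB */
    }
    a.end = a.start + 4 * (n - 1);
    return a;
}
```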
Addressing Modes
• Immediate Addressing Mode
• Register Addressing Mode
• Offset Addressing
• Pre-Index Addressing
• Post-Index Addressing
Immediate Addressing Mode
• The operand is a constant value encoded as part of the
instruction itself.
• Example:
• MOV R0, #05
• ADD R0,R0, #07
• SUB R0,R0, #06
Register Addressing Mode
• The operands are held in the registers of the
processor.
• Example:
• MOV R0, R2
• ADD R0,R1, R2
• SUB R0,R1, R2
Offset Addressing
• The data is read from or written to memory, and the
memory address is formed by adding (or subtracting)
an offset to or from the value held in a base register.
• Examples:
1. LDR R0, [R1] (Constant Value)
2. LDR R0, [R1, #4] (Constant Value)
3. LDR R0, [R1, R2] (Register)
4. LDR R0, [R1, R2, LSL #2] (Scaled)
Pre-Index Addressing
• In pre-index addressing the memory address is formed in the same way as
for offset addressing. The address is not only used to access memory, but
the base register is also modified to hold the new value. In the ARM
system this is known as write-back and is denoted by placing an
exclamation mark (!) after the closing bracket of the address.

1. LDR R0, [R1, #4]! (Constant Value)


2. LDR R0, [R1, R2]! (Register)
3. LDR R0, [R1, R2, LSL #2]! (Scaled)
Post-Index Addressing
• In post-index addressing the memory address is the base register
value. As a side effect, an offset is added to or subtracted
from the base register value and the result is written back to
the base register.
• Example:
• LDR R0, [R1], #4
• LDR R0, [R1], R2
• LDR R0, [R1], R2, LSL #2
Find the one's complement (inverse) of
a number
AREA Program, CODE, READONLY
ENTRY
Main
LDR R1, Value ; Load the number to be complemented
MVN R1, R1 ; NOT the contents of R1
STR R1, Result ; Store the result
SWI &11

Value DCD &C123 ; Value to be complemented


Result DCD 0 ; Storage for result
END
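In C the program reduces to the `~` operator. This one-function sketch (`ones_complement` is a hypothetical name) shows that, because registers are 32 bits wide, the 16 leading zero bits of &C123 also become ones.

```c
#include <stdint.h>

/* C equivalent of MVN R1, R1 applied to the loaded value. */
static uint32_t ones_complement(uint32_t value) { return ~value; }
```

So the program stores &FFFF3EDC in Result.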
Add two numbers
AREA Program, CODE, READONLY
ENTRY
Main
LDR R1, Value1 ; Load the first number
LDR R2, Value2 ; Load the second number
ADD R1, R1, R2 ; ADD them together into R1 (x = x + y)
STR R1, Result ; Store the result
SWI &11
Value1 DCD &37E3C123 ; First value to be added
Value2 DCD &367402AA ; Second value to be added
Result DCD 0 ; Storage for result
END
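A C check of the stored result: `uint32_t` addition wraps modulo 2^32, just as ADD without S discards any carry out of bit 31. The function name `add32` is hypothetical.

```c
#include <stdint.h>

/* C equivalent of ADD R1, R1, R2: 32-bit sum, carry out of bit 31 is lost. */
static uint32_t add32(uint32_t x, uint32_t y) { return x + y; }
```

For the two values in the program, Result holds &6E57C3CD. The same values give the same result in the pointer-based version below.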
Add two numbers and store the result
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Value1 ; Load the address of first value
LDR R1, [R0] ; Load what is at that address
ADD R0, R0, #0x4 ; Adjust the pointer
LDR R2, [R0] ; Load what is at the new addr
ADD R1, R1, R2 ; ADD together
LDR R0, =Result ; Load the storage address
STR R1, [R0] ; Store the result
SWI &11 ; All done

Value1 DCD &37E3C123 ; First value


Value2 DCD &367402AA ; Second value
Result DCD 0 ; Space to store result
END
Find the larger of two numbers
AREA Program, CODE, READONLY
ENTRY
Main
LDR R1, Value1 ; Load the first value to be compared
LDR R2, Value2 ; Load the second value to be compared
CMP R1, R2 ; Compare them
BHI Done ; If R1 is higher (unsigned comparison), keep it
MOV R1, R2 ; otherwise overwrite R1
Done
STR R1, Result ; Store the result
SWI &11
Value1 DCD &12345678 ; Value to be compared
Value2 DCD &87654321 ; Value to be compared
Result DCD 0 ; Space to store result
END
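BHI tests "unsigned higher", so the program compares the values as unsigned numbers. The C sketch below (hypothetical names) contrasts that with a signed comparison, which is what BGT would give: &87654321 is the larger unsigned value but a negative signed one.

```c
#include <stdint.h>

/* Mirrors CMP R1, R2 / BHI Done / MOV R1, R2: unsigned comparison. */
static uint32_t larger_unsigned(uint32_t r1, uint32_t r2)
{
    return (r1 > r2) ? r1 : r2;
}

/* The same program with BGT instead of BHI would compare as two's complement. */
static int32_t larger_signed(int32_t r1, int32_t r2)
{
    return (r1 > r2) ? r1 : r2;
}
```

With the program's data, Result holds &87654321.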
Program: Addition of Two 64-bit
Numbers
• Code:
AREA ADD_64BITNOS_PROGRAM, CODE
ENTRY
LDR R1, Value11 ; First number lower 32 bits
LDR R2, Value21 ; First number higher 32 Bits
LDR R3, Value12 ; Second number lower 32 Bits
LDR R4, Value22 ; Second number higher 32 Bits
ADDS R3, R3, R1 ; Add the lower order 32 bits of 2 nos.
ADC R4, R4, R2 ; Add the higher 32 bits along with previous carry.
SWI &11 ; Stop here so execution does not run into the data (R4:R3 = sum)
Value11 DCD &062A7295
Value21 DCD &08594921
Value12 DCD &00101010
Value22 DCD &00010101
END
64 bit addition with memory
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Value1 ; Pointer to first value
LDR R1, [R0] ; Load upper part of value1
LDR R2, [R0, #4] ; Load lower part of value1
LDR R0, =Value2 ; Pointer to second value
LDR R3, [R0] ; Load upper part of value2
LDR R4, [R0, #4] ; Load lower part of value2
ADDS R6, R2, R4 ; Add lower 4 bytes and set carry flag
ADC R5, R1, R3 ; Add upper 4 bytes including carry
LDR R0, =Result ; Pointer to Result
STR R5, [R0] ; Store upper part of result
STR R6, [R0, #4] ; Store lower part of result
SWI &11
Value1 DCD &12A2E640, &F2100123 ; Value to be added
Value2 DCD &001019BF, &40023F51 ; Value to be added
Result DCD 0, 0 ; Space to store the 64-bit result (two words)
END
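Both 64-bit programs use the same pattern: ADDS adds the low words and records the carry in C, then ADC adds the high words plus that carry. A C sketch of the pattern (`add64_words` is a hypothetical name) lets you check both results against a native 64-bit addition.

```c
#include <stdint.h>

/* Models ADDS (low words, sets C) followed by ADC (high words plus C). */
static void add64_words(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo,
                        uint32_t *r_hi, uint32_t *r_lo)
{
    uint32_t lo = a_lo + b_lo;        /* ADDS */
    uint32_t carry = (lo < a_lo);     /* C flag: unsigned carry out of bit 31 */
    *r_lo = lo;
    *r_hi = a_hi + b_hi + carry;      /* ADC */
}
```

For the register-only program the sum is &085A4A22:063A82A5 (no carry). For the memory program the low words produce a carry, giving &12B30000:32124074.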
Add a series of 16 bit numbers
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Data1 ;load the address of the lookup table
EOR R1, R1, R1 ;clear R1 to store sum
LDR R2, Length ;init element count
Loop
LDR R3, [R0] ;get the data
ADD R1, R1, R3 ;add it to r1
ADD R0, R0, #+4 ;increment pointer
SUBS R2, R2, #0x1 ;decrement count with zero set
BNE Loop ;if zero flag is not set, loop
STR R1, Result ;otherwise done - store result
SWI &11

AREA Data1, DATA


Table DCW &2040 ;table of values to be added
ALIGN ;32 bit aligned
DCW &1C22
ALIGN
DCW &0242
ALIGN
TablEnd DCD 0
AREA Data2, DATA
Length DCW (TablEnd - Table) / 4 ;loop count: each entry is padded to 4 bytes by ALIGN
ALIGN
Result DCW 0 ;storage for result
END
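In C the loop is a plain array sum. This sketch assumes what the listing implies: each DCW entry is padded to a 32-bit slot by ALIGN with zero bytes, so loading a whole word on a little-endian system yields the 16-bit value. `sum_table` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* C equivalent of the summing loop over word-sized slots. */
static uint32_t sum_table(const uint32_t *table, size_t length)
{
    uint32_t sum = 0;                  /* EOR R1, R1, R1 */
    for (size_t i = 0; i < length; i++)
        sum += table[i];               /* LDR R3, [R0] ; ADD R1, R1, R3 */
    return sum;
}
```

For &2040, &1C22 and &0242 the sum is &3EA4. The assembly loop tests the count only after the first pass, so an empty table would make it run far too long. This version (and the later programs, which check `CMP R2, #0` first) handles that case.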
Scan a series of 16 bit numbers to find the largest
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Data1 ;load the address of the lookup table
EOR R1, R1, R1 ;clear R1 to store largest
LDR R2, Length ;init element count
CMP R2, #0
BEQ Done ;if table is empty
Loop
LDR R3, [R0] ;get the data
CMP R3, R1 ;compare with the largest value so far
BCC Looptest ;skip next line if R3 < R1 (unsigned)
MOV R1, R3 ;R3 is the new largest value
Looptest
ADD R0, R0, #+4 ;increment pointer
SUBS R2, R2, #0x1 ;decrement count with zero set
BNE Loop ;if zero flag is not set, loop
Done
STR R1, Result ;otherwise done - store result
SWI &11

AREA Data1, DATA


Table DCW &A152 ;table of values to be tested
ALIGN
DCW &7F61
ALIGN
DCW &F123
ALIGN
DCW &8000
ALIGN
TablEnd DCD 0

AREA Data2, DATA
Length DCW (TablEnd - Table) / 4 ;loop count: each entry is padded to 4 bytes by ALIGN
ALIGN
Result DCW 0 ;storage for result
END
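A C version of the scan, under the same word-slot assumption as the summing program. BCC is taken when R3 < R1 (unsigned), so the update happens when R3 >= R1 and the values are compared as unsigned numbers. `largest_unsigned` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* Mirrors CMP R3, R1 / BCC Looptest / MOV R1, R3; returns 0 for an empty table. */
static uint32_t largest_unsigned(const uint32_t *table, size_t length)
{
    uint32_t largest = 0;
    for (size_t i = 0; i < length; i++)
        if (table[i] >= largest)       /* C set after CMP: R3 >= R1 unsigned */
            largest = table[i];
    return largest;
}
```

For &A152, &7F61, &F123 and &8000 the result is &F123. Treated as signed 16-bit values, three of these would be negative, so the unsigned comparison matters.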
Scan a series of 32 bit numbers to find how many are negative
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Data1 ; load the address of the lookup table
EOR R1, R1, R1 ;clear R1 to store count
LDR R2, Length ;init element count
CMP R2, #0
BEQ Done ;if table is empty
Loop
LDR R3, [R0] ;get the data
CMP R3, #0
BPL Looptest ;skip next line if +ve or zero
ADD R1, R1, #1 ;increment -ve number count
Looptest
ADD R0, R0, #+4 ;increment pointer
SUBS R2, R2, #0x1 ;decrement count with zero set
BNE Loop ;if zero flag is not set, loop
Done
STR R1, Result ;otherwise done - store result
SWI &11
AREA Data1, DATA
Table DCD &F1522040 ;table of values to be tested
DCD &7F611C22
DCD &80000242
TablEnd DCD 0
AREA Data2, DATA
Length DCW (TablEnd - Table) / 4 ;loop count: each DCD entry is 4 bytes
ALIGN
Result DCW 0 ;storage for result
END
Scan a series of 16 bit numbers to find how many are negative
AREA Program, CODE, READONLY
ENTRY
Main
LDR R0, =Data1 ;load the address of the lookup table
EOR R1, R1, R1 ;clear R1 to store count
LDR R2, Length ;init element count
CMP R2, #0
BEQ Done ;if table is empty
Loop
LDR R3, [R0] ;get the data
AND R3, R3, #0x8000 ;isolate bit 15, the sign bit of a 16-bit value
CMP R3, #0x8000 ;is bit 15 set?
BNE Looptest ;skip next line if bit 15 is clear (not negative)
ADD R1, R1, #1 ;increment -ve number count
Looptest
ADD R0, R0, #+4 ;increment pointer
SUBS R2, R2, #0x1 ;decrement count with zero set
BNE Loop ;if zero flag is not set, loop
Done
STR R1, Result ;otherwise done - store result
SWI &11
AREA Data1, DATA
Table DCW &F152 ;table of values to be tested
ALIGN
DCW &7F61
ALIGN
DCW &8000
ALIGN
TablEnd DCD 0
AREA Data2, DATA
Length DCW (TablEnd - Table) / 4 ;loop count: each entry is padded to 4 bytes by ALIGN
ALIGN
Result DCW 0 ;storage for result
END
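The two "count the negatives" programs differ only in which bit is the sign bit. For 32-bit data it is bit 31, which `CMP R3, #0` / `BPL` tests via the N flag. For 16-bit data held in the low half of a word it is bit 15, isolated with `AND R3, R3, #0x8000`. A C sketch of both (hypothetical names, same word-slot assumption as before):

```c
#include <stdint.h>
#include <stddef.h>

/* 32-bit entries: negative when bit 31 (the N flag after CMP #0) is set. */
static uint32_t count_negative32(const uint32_t *table, size_t length)
{
    uint32_t count = 0;
    for (size_t i = 0; i < length; i++)
        if (table[i] & 0x80000000u)
            count++;
    return count;
}

/* 16-bit entries in 32-bit slots: negative when bit 15 is set. */
static uint32_t count_negative16(const uint32_t *table, size_t length)
{
    uint32_t count = 0;
    for (size_t i = 0; i < length; i++)
        if (table[i] & 0x8000u)
            count++;
    return count;
}
```

Both sample tables contain two negative entries (&F1522040 and &80000242; &F152 and &8000).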
ARM Instruction Set Format
Fields are listed from bit 31 (left) to bit 0 (right); Cond always occupies bits 31-28.
• Data processing / PSR transfer: Cond 0 0 I Opcode S Rn Rd Operand2
• Multiply: Cond 0 0 0 0 0 0 A S Rd Rn Rs 1 0 0 1 Rm
• Long multiply (v3M / v4 only): Cond 0 0 0 0 1 U A S RdHi RdLo Rs 1 0 0 1 Rm
• Swap: Cond 0 0 0 1 0 B 0 0 Rn Rd 0 0 0 0 1 0 0 1 Rm
• Load/store byte/word: Cond 0 1 I P U B W L Rn Rd Offset
• Load/store multiple: Cond 1 0 0 P U S W L Rn Register List
• Halfword transfer, immediate offset (v4 only): Cond 0 0 0 P U 1 W L Rn Rd Offset1 1 S H 1 Offset2
• Halfword transfer, register offset (v4 only): Cond 0 0 0 P U 0 W L Rn Rd 0 0 0 0 1 S H 1 Rm
• Branch: Cond 1 0 1 L Offset
• Branch exchange (v4T only): Cond 0 0 0 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 Rn
• Coprocessor data transfer: Cond 1 1 0 P U N W L Rn CRd CPNum Offset
• Coprocessor data operation: Cond 1 1 1 0 Op1 CRn CRd CPNum Op2 0 CRm
• Coprocessor register transfer: Cond 1 1 1 0 Op1 L CRn Rd CPNum Op2 1 CRm
• Software interrupt: Cond 1 1 1 1 SWI Number
Conditional Execution
• Most instruction sets only allow branches to be executed
conditionally.
• However, by reusing the condition evaluation hardware,
ARM effectively increases the number of instructions.
– All instructions contain a condition field which determines
whether the CPU will execute them.
– Non-executed instructions soak up 1 cycle.
• This removes the need for many branches, which stall the
pipeline (3 cycles to refill).
– Allows very dense in-line code, without branches.
– The time penalty of not executing several conditional
instructions is frequently less than the overhead of the branch or
subroutine call that would otherwise be needed.
The Condition Field
Bits 31-28 of every instruction hold the condition (Cond):
0000 = EQ - Z set (equal)
0001 = NE - Z clear (not equal)
0010 = HS / CS - C set (unsigned higher or same)
0011 = LO / CC - C clear (unsigned lower)
0100 = MI - N set (negative)
0101 = PL - N clear (positive or zero)
0110 = VS - V set (overflow)
0111 = VC - V clear (no overflow)
1000 = HI - C set and Z clear (unsigned higher)
1001 = LS - C clear or Z set (unsigned lower or same)
1010 = GE - N set and V set, or N clear and V clear (>, or =)
1011 = LT - N set and V clear, or N clear and V set (<)
1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE - Z set, or N set and V clear, or N clear and V set (<, or =)
1110 = AL - always
1111 = NV - reserved
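The table can be expressed as a single function that decides whether an instruction with a given condition field executes. This is a sketch: `cond_passes` is a hypothetical name, and NV is treated as "never" to match the "reserved" entry for the processors covered here (later ARM architecture versions reuse that encoding for other instructions).

```c
#include <stdbool.h>

/* Returns true when an instruction with this 4-bit condition field
   would execute, given the current N, Z, C, V flags. */
static bool cond_passes(unsigned cond, bool n, bool z, bool c, bool v)
{
    switch (cond & 0xFu) {
    case 0x0: return z;                  /* EQ */
    case 0x1: return !z;                 /* NE */
    case 0x2: return c;                  /* CS/HS */
    case 0x3: return !c;                 /* CC/LO */
    case 0x4: return n;                  /* MI */
    case 0x5: return !n;                 /* PL */
    case 0x6: return v;                  /* VS */
    case 0x7: return !v;                 /* VC */
    case 0x8: return c && !z;            /* HI */
    case 0x9: return !c || z;            /* LS */
    case 0xA: return n == v;             /* GE */
    case 0xB: return n != v;             /* LT */
    case 0xC: return !z && (n == v);     /* GT */
    case 0xD: return z || (n != v);      /* LE */
    case 0xE: return true;               /* AL */
    default:  return false;              /* NV: reserved, treated as never here */
    }
}
```

After `CMP r1, #9` with r1 = 5 (N = 1, Z = 0, C = 0, V = 0), LT and LO pass while GE and HI do not.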
Branch instructions
• Branch : B{<cond>} label
• Branch with Link : BL{<cond>} sub_routine_label

Encoding: Cond (bits 31-28, condition field), 1 0 1 (bits 27-25), L (bit 24), Offset (bits 23-0)
Link bit L: 0 = Branch, 1 = Branch with link

• The offset for branch instructions is calculated by the assembler:
– By taking the difference between the target address and the address of
the branch instruction, minus 8 (to allow for the pipeline).
– This gives a 26-bit byte offset. Its bottom two bits are always zero, as
instructions are word-aligned, so it is right shifted 2 bits and stored in
the 24-bit offset field of the instruction encoding.
– This gives a range of ± 32 Mbytes.
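The offset calculation can be sketched in C as an encode/decode pair. The helper names are hypothetical. Unsigned arithmetic wraps modulo 2^32, which makes backward branches come out as 24-bit two's complement without any special case.

```c
#include <stdint.h>

/* 24-bit offset field for B/BL: (target - (branch_addr + 8)) >> 2, truncated to 24 bits. */
static uint32_t branch_offset_field(uint32_t branch_addr, uint32_t target)
{
    return ((target - branch_addr - 8u) >> 2) & 0x00FFFFFFu;
}

/* Inverse: sign-extend the 24-bit field, multiply by 4, add PC (= branch_addr + 8). */
static uint32_t branch_target(uint32_t branch_addr, uint32_t field)
{
    uint32_t offset = field & 0x00FFFFFFu;
    if (offset & 0x00800000u)          /* negative offset: extend the sign bit */
        offset |= 0xFF000000u;
    return branch_addr + 8u + (offset << 2);
}
```

A branch at 0x8000 to 0x8010 encodes 2. A branch to itself encodes 0xFFFFFE (-2 words). The largest positive field, 0x7FFFFF, reaches about 32 Mbytes ahead, which is where the ± 32 Mbyte range comes from.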
Data processing instructions
• The ARM data processing instructions are used to modify data values in registers.
The operations that are supported include arithmetic and bit-wise logical
combinations of 32-bit data types. One operand may be shifted or rotated en route
to the ALU, allowing, for example, shift and add in a single instruction.
ADD{cond}{S} Rd, Rn, Operand2
Data processing instructions
Multiply
• MUL{cond}{S} Rd, Rm, Rs
• MLA{cond}{S} Rd, Rm, Rs, Rn
Load/Store
• LDR|STR {<cond>}{B} Rd,addressing1
LDR{<cond>}SB/H/SH Rd, addressing2
LDR R0, [R1]
LDR R0, [R1, R2, LSL #2]
Load/Store Multiple registers
• Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{^}
• LDMIA r0!, {r1-r3}
• STMIB r0!, {r1-r3}
Thumb State of ARM Processor
Outline
Thumb State
– Design philosophy
– Thumb state Entry and Exit
– Switching from ARM to Thumb state
Thumb's Programmer's model
– Registers
– ARM and Thumb similarities and
differences
Thumb Implementation
– Decompressor
Thumb Applications
– Thumb State
Thumb Design Philosophy
Code Density
Switching from ARM to Thumb
Operation
Instruction Set Examples
Memory Protection Units (MPU) and
Memory Management Units (MMU)

Dr. KS Sandha
Assistant Professor, ECED
Thapar Institute of Engineering and Technology,
Patiala
Outline
Peripherals

Accessing of I/O Devices
• More than one I/O device may be connected to the processor through
the same set of three buses (address, data and control).
• Each device must be assigned a unique address.
• Two mapping techniques
– Memory mapped I/O
– I/O mapped I/O
I/O Mapping Techniques
• Two techniques are used to assign addresses to I/O devices
– Memory mapped I/O
– I/O mapped I/O
Accessing of I/O through polling
• Normally, the data transfer rate of I/O devices is slower than the
speed of the processor. This creates the need for mechanisms to
synchronize data transfers between them.
• Program-controlled I/O: the processor continuously checks a status
flag to achieve the necessary synchronization. This is called polling.

Two other mechanisms are used for synchronizing data transfers between
the processor and I/O devices:
– Interrupt-driven I/O
– Direct Memory Access (DMA)
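Polling a memory-mapped device looks like this in C. The sketch uses a hypothetical device with a status register (bit 0, `RX_READY`, meaning "data available") and a data register. Because the device is memory mapped, ordinary loads (LDR) read its registers, and `volatile` stops the compiler from caching the status value. On real hardware the struct would be placed at an address from the chip's memory map. Here it is an ordinary variable so the sketch can run.

```c
#include <stdint.h>

/* Hypothetical memory-mapped device registers (layout is illustrative only). */
typedef struct {
    volatile uint32_t status;
    volatile uint32_t data;
} Device;

#define RX_READY 0x1u   /* hypothetical "data available" status bit */

/* Program-controlled I/O: busy-wait on the status flag, then read the data. */
static uint32_t poll_read(Device *dev)
{
    while ((dev->status & RX_READY) == 0) {
        /* keep polling: the processor does no useful work while it waits */
    }
    return dev->data;
}
```

The wasted time in that loop is the reason for the interrupt-driven and DMA alternatives.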
Interrupt driven I/O
• An I/O device sends an interrupt request when its data is
ready.
• The processor completes the current
instruction and sends an
acknowledgement to the requesting
I/O device.
• Example: suppose the processor is
executing a program and an
interrupt occurs while it is executing
the instruction located at address i.
• The routine executed in response to
an interrupt request is called the
interrupt-service routine (ISR).
• When an interrupt occurs, control
must be transferred to the
interrupt service routine.
• After the ISR completes, control
returns to the main program at the
instruction following the one at address i.
Interrupt service routine (ISR)
• CPU suspends execution of the current program
– Saves the address of the next instruction to be
executed (current contents of PC) and any other data
• CPU sets the PC to the starting address of an ISR
• CPU proceeds to the fetch cycle and fetches the first
instruction in ISR which is generally a part of the OS
• ISR typically determines the nature of the interrupt and performs
whatever actions are needed.
For example, the ISR determines which I/O module
generated the interrupt and may branch to a program
that will write more data out to that I/O module.
Once the ISR is completed, the CPU resumes execution
of the user program at the point of interruption.
Daisy Chain in Interrupt
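• All devices are connected in series (a chain)
• The first device in the chain has the highest priority
• All devices share the same interrupt request (INTR) line
• The interrupt acknowledge (INTA) signal is passed from device to device
• Scanning starts from device 1 and continues down the chain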
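Daisy-chain priority resolution can be simulated in a few lines. In this sketch (hypothetical names, at most 32 devices), bit i of `requests` means device i+1 is pulling the shared request line. INTA enters at device 1 and passes along the chain until a requesting device keeps it, so the device nearest the processor wins.

```c
#include <stdint.h>

/* Returns the 1-based number of the device that receives INTA, or 0 if none
   is requesting. num_devices must be at most 32. */
static unsigned daisy_chain_grant(uint32_t requests, unsigned num_devices)
{
    for (unsigned dev = 1; dev <= num_devices; dev++) {
        if (requests & (1u << (dev - 1)))
            return dev;            /* this device keeps INTA and is serviced */
    }
    return 0;
}
```

If devices 2 and 3 request at the same time, device 2 is serviced first.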
Multiple Interrupts
Two methods can be used to handle multiple interrupts:
• Sequential execution
• Execution according to interrupt priority
Direct Memory Access
• A special control unit that transfers a block of data directly
between I/O and memory, bypassing the processor
• It uses the buses of the processor
• It is not a processor, so it does not have an instruction set
Direct Memory access
• DMA can transfer a block of data from I/O to memory, memory to I/O,
or memory to memory without any intervention from the processor
• To initiate a DMA transfer, the processor loads the DMA controller
with:
– Starting address
– Number of words to be transferred
– Direction of transfer
– Mode of transfer
• After completing the transfer, the DMA controller informs the
processor by raising an interrupt signal
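The setup-then-transfer sequence can be modeled in C. This is only a sketch: `DmaChannel` and `dma_run` are hypothetical, a real controller exposes its own registers, and `dma_run` merely stands in for the hardware. The source/destination pair encodes the direction. The transfer mode (for example burst versus cycle stealing) is not modeled.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical DMA channel: the processor fills these fields in. */
typedef struct {
    const uint32_t *source;    /* starting address of the source block */
    uint32_t *destination;     /* starting address of the destination block */
    size_t word_count;         /* number of words to be transferred */
    bool irq_pending;          /* set by the controller when the transfer completes */
} DmaChannel;

/* Stand-in for the controller hardware: copies the block word by word
   (memory to memory here) without executing any program instructions,
   then signals completion as an interrupt would. */
static void dma_run(DmaChannel *ch)
{
    for (size_t i = 0; i < ch->word_count; i++)
        ch->destination[i] = ch->source[i];
    ch->irq_pending = true;
}
```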
Operation of DMA with CPU
Coprocessors
Outline
Quiz 1
Summary
I2C and SPI Protocols, USART and GPIO

Outline
AMBA Architecture (AHB, ASB and APB)
Outline
A basic ARM memory system
ARM Development Environment
