20-Mar-22
Chapter 4
ARM MICROCONTROLLER
Introduction to ARM Ltd
Founded in November 1990
Spun out of Acorn Computers
Initial funding from Apple, Acorn and
VLSI
Designs the ARM range of RISC
processor cores
Licenses ARM core designs to
semiconductor partners who fabricate
and sell to their customers
ARM does not fabricate silicon itself
Also develops technologies to assist with the design-in of the ARM
architecture
Software tools, boards, debug hardware
Application software
Bus architectures
Peripherals, etc
ARM = Advanced RISC Machine
ARM is the industry's leading provider of 32-bit embedded microprocessors,
offering a wide range of processors that deliver high
performance, industry-leading power efficiency and reduced system cost
ARM Partnership Model
ARM designs and licenses core designs but does not fabricate them
ARM Processor Applications
Why ARM?
One of the most licensed and thus widespread processor cores in
the world
Used in PDAs, cell phones, multimedia players, handheld game consoles,
digital TVs and cameras
ARM7: GBA, iPod
ARM9: NDS, PSP, Sony Ericsson, BenQ
ARM11: Apple iPhone, Nokia N93, N800
90% of 32-bit embedded RISC processors as of 2009
Used especially in portable devices due to its low power
consumption and reasonable performance
ARM processors
A simple but powerful design
A whole family of designs sharing similar design principles and a
common instruction set
Naming ARM
ARMxyzTDMIEJFS
x: series
y: MMU
z: cache
T: Thumb
D: JTAG Debug support
M: enhanced Multiplier (64-bit results)
I: EmbeddedICE (built-in debugger hardware)
E: Enhanced (DSP) instructions
J: Jazelle (JVM)
F: Floating-point
S: Synthesizable version (source code version for EDA tools)
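The naming scheme above can be tried out with a small decoder. This is only a sketch: the function name decode_arm_name and the rule that the last two of three or more digits are y and z are assumptions made for illustration, and letters not in the list above (for example the Z in ARM1176JZF-S) are reported as unknown rather than guessed.

```python
import re

# Suffix letters as listed in the naming scheme above.
SUFFIXES = {
    "T": "Thumb",
    "D": "Debug",
    "M": "Multiplier",
    "I": "EmbeddedICE",
    "E": "Enhanced instructions",
    "J": "Jazelle",
    "F": "Floating-point",
    "S": "Synthesizable",
}


def decode_arm_name(name):
    """Split a classic ARM core name into series digits and feature letters."""
    match = re.fullmatch(r"ARM(\d+)([A-Z-]*)", name.upper())
    if match is None:
        raise ValueError(f"not a classic ARM core name: {name!r}")
    digits, letters = match.groups()
    # Assumption: with three or more digits, the last two are y (MMU) and z (cache).
    if len(digits) >= 3:
        series, y, z = digits[:-2], digits[-2], digits[-1]
    else:
        series, y, z = digits, None, None
    # Hyphens are separators only; unknown letters are flagged, not interpreted.
    features = [SUFFIXES.get(ch, f"unknown ({ch})") for ch in letters if ch != "-"]
    return {"series": int(series), "y": y, "z": z, "features": features}
```

For example, decode_arm_name("ARM926EJ-S") yields series 9 with y = 2 and z = 6, plus the E, J and S features.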
Definition of “Architecture”
The Architecture is the contract between the Hardware and the Software
Confers rights and responsibilities to both the Hardware and the Software
MUCH more than just the instruction set
The architecture distinguishes between:
Architected behaviors:
Must be obeyed
May be just the limits of behavior rather than specific behaviors
Implementation specific behaviors – that expose the micro-architecture
Certain areas are declared implementation specific. E.g.:
Power-down
Cache and TLB Lockdown
Details of the Performance Monitors
Code obeying the architected behaviors is portable across implementations
Reliance on implementation specific behaviors gives no such guarantee
Architecture is different from Micro-architecture
What vs How
History
ARM has quite a lot of history
First ARM core (ARM1) ran code in April 1985…
3 stage pipeline very simple RISC-style processor
Original processor was designed for the Acorn Microcomputer
Replacing a 6502-based design
ARM Ltd formed in 1990 as an “Intellectual Property” company
Taking the 3 stage pipeline as the main building block
This 3 stage pipeline evolved into the ARM7TDMI
Still the mainstay of ARM’s volume
Code compatibility with ARM7TDMI remains very important
Especially at the applications level
Evolution of the ARM Architecture
Original ARM architecture:
32-bit RISC architecture focussed on core instruction set
16 Registers - 1 being the Program counter – generally accessible
Conditional execution on all instructions
Load/Store Multiple operations - Good for Code Density
Shifts available on data processing and address generation
Original architecture had 26-bit address space
Augmented by a 32-bit address space early in the evolution
Thumb instruction set was the next big step
ARMv4T architecture (ARM7TDMI)
Introduced a 16-bit instruction set alongside the 32-bit instruction set
Different execution states for different instruction sets
Switching ISA as part of a branch or exception
Not a full instruction set – ARM still essential
ARMv4 architecture was still focused on the Core instruction set only
Versions, cores and architectures ?
ARM Instruction Sets
Popular ARM architectures
ARM7TDMI
3 pipeline stages (fetch/decode/execute)
High code density/low power consumption
One of the most widely used ARM cores (for low-end systems)
All ARM cores after ARM7TDMI include TDMI even if they do not
include TDMI in their labels
ARM9TDMI
Compatible with ARM7
5 stages (fetch/decode/execute/memory/write)
Separate instruction and data cache
ARM11
ARM family comparison
Processor family   Pipeline stages   Memory organization    Clock rate   MIPS/MHz
ARM6               3                 Von Neumann            25 MHz
ARM7               3                 Von Neumann            66 MHz       0.9
ARM8               5                 Von Neumann            72 MHz       1.2
ARM9               5                 Harvard                200 MHz      1.1
ARM10              6                 Harvard                400 MHz      1.25
StrongARM          5                 Harvard                233 MHz      1.15
ARM11              8                 Von Neumann/Harvard    550 MHz      1.2
ARM is a RISC
RISC: simple but powerful instructions that execute within a
single cycle at high clock speed.
Four major design rules:
Instructions: reduced set/single cycle/fixed length
Pipeline: decode in one stage/no need for microcode
Registers: a large set of general-purpose registers
Load/store architecture: data processing instructions apply to registers
only; load/store to transfer data from memory
Results in simple design and fast clock rate
The distinction has blurred because modern CISC processors also implement RISC concepts
ARM features
Different from pure RISC in several ways:
Variable cycle execution for certain instructions: multiple-register
load/store (faster/higher code density)
Inline barrel shifter leading to more complex instructions: improves
performance and code density
Thumb 16-bit instruction set: 30% code density improvement
Conditional execution: improves performance and code density by
reducing the number of branches
Enhanced instructions: DSP instructions
ARM architecture
32-bit RISC-processor core (32-bit instructions)
37 32-bit integer registers (16 visible at any one time)
Pipelined (ARM7: 3 stages)
Cached (depending on the implementation)
Von Neumann-type bus structure (ARM7), Harvard (ARM9)
8 / 16 / 32-bit data types
7 modes of operation (usr, fiq, irq, svc, abt, sys, und)
Simple structure -> reasonably good speed / power consumption
ratio
What is ARM Architecture
ARM architecture is a family of RISC-based processor
architectures
Well-known for its power efficiency;
Hence widely used in mobile devices, such as smartphones and tablets
Designed and licensed to a wide ecosystem by ARM
ARM Holdings
The company designs ARM-based processors;
Does not manufacture, but licenses designs to semiconductor partners
who add their own Intellectual Property (IP) on top of ARM's IP,
fabricate and sell to customers;
Also offers other IP apart from processors, such as physical IP,
interconnect IP, graphics cores, and development tools
ARM7 Architecture
Load/store architecture
Most instructions are RISC-like; some multi-register operations take multiple cycles
All instructions can be executed conditionally
ARM7 is a small, low-power, 32-bit microprocessor with a three-stage pipeline; each
stage takes one clock cycle
Instruction fetch from memory
Instruction decode
Instruction execution, which comprises three steps:
Register read
A shift applied to one operand and the ALU operation
Register write
Performing all three steps in a single stage limits the maximum CPU clock speed to
around 80 MHz on a 0.35-micron silicon process.
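The effect of the three-stage pipeline on throughput can be sketched with a small model. It assumes an ideal pipeline with no stalls or flushes, so it does not capture multi-cycle instructions or memory accesses; the function names are made up for illustration.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(num_instructions):
    """Return {cycle: {stage: instruction index}} for an ideal, stall-free pipeline."""
    schedule = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(STAGES):
            cycle = instr + offset + 1  # instruction i enters fetch in cycle i + 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule


def total_cycles(num_instructions):
    """Cycles to complete all instructions: fill latency plus one per instruction."""
    if num_instructions == 0:
        return 0
    return num_instructions + len(STAGES) - 1
```

Once the pipeline is full, one instruction completes per cycle, so ten instructions need 12 cycles in this model rather than 30.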
ARM CPU Core Organization
ARM7 Features
ARM7 uses a von Neumann memory architecture, in which instructions
and data occupy a single address space; this can limit performance
Instruction fetching (and execution) must stop for instructions that access
memory
The reduced cost of a single memory outweighs performance in many
embedded applications.
Although the pipeline stalls during load and store operations, the ARM7 can still
continue useful work.
ARM Architectures
Example: ARM926EJ-S
5 stage pipeline single issue core
Fetch, Decode, Execute, Memory, Writeback
Most common instructions take 1 cycle in each pipeline stage
Split instruction/data Level 1 caches, virtually tagged
MMU – hardware page table walk based
ARM1176JZF-S
8 stage pipeline single issue
Split instruction/data Level 1 caches, physically tagged
Two cycle memory latency
MMU – hardware page table walk based
Hardware branch prediction
ARM Processor Families
Cortex-A series (Application)
High performance processors capable of
full Operating System (OS) support;
Applications include smartphones,
digital TV, smart books, home gateways
etc.
Cortex-R series (Real-time)
High performance for real-time
applications;
High reliability
Applications include automotive
braking system, power-trains etc.
Cortex-M series (Microcontroller)
Cost-sensitive solutions for deterministic microcontroller applications;
Applications include microcontrollers, mixed signal devices, smart
sensors, automotive body electronics and airbags;
SecurCore series
High security applications.
Previous classic processors: Include ARM7, ARM9, ARM11
families
ARM Families and Architecture Over Time
ARM Cortex-M Series
Cortex-M series: Cortex-M0, M0+, M1, M3, M4, M7, M33.
Energy-efficiency
Lower energy cost, longer battery life
Smaller code
Lower silicon costs
Ease of use
Faster software development and reuse
Embedded applications
Smart metering, human interface devices, automotive and industrial control
systems, white goods, consumer products and medical instrumentation
ARM Processors vs. ARM Architectures
ARM architecture
Describes the details of instruction set, programmer’s model, exception
model, and memory map
Documented in the Architecture Reference Manual
ARM processor
Developed using one of the ARM architectures
More implementation details, such as timing information
Documented in processor’s Technical Reference Manual
Cortex-M4 Block Diagram
ARM Cortex-M4
A Cortex-M series CPU that combines efficient signal
processing with low power consumption.
Cortex-M4 Block Diagram (cont.)
Processor core
Contains internal registers, the ALU, data path, and some control logic
Registers include sixteen 32-bit registers for both general and special
usage
Processor pipeline stages
Three-stage pipeline: fetch, decode, and execution
Some instructions may take multiple cycles to execute, in which case the
pipeline will be stalled
The pipeline will be flushed if a branch instruction is executed
Up to two instructions can be fetched in one transfer (16-bit instructions)
Nested Vectored Interrupt Controller (NVIC)
Up to 240 interrupt request signals and a non-maskable interrupt (NMI)
Automatically handles nested interrupts, such as comparing priorities
between interrupt requests and the current priority level
Wakeup Interrupt Controller (WIC)
For low-power applications, the microcontroller can enter sleep mode by
shutting down most of the components.
When an interrupt request is detected, the WIC can inform the power
management unit to power up the system.
Memory Protection Unit (optional)
Used to protect memory content, e.g. making some memory regions
read-only or preventing user applications from accessing privileged
application data
Bus interconnect
Allows data transfer to take place on different buses simultaneously
Provides data transfer management, e.g. a write buffer, bit-oriented
operations (bit-band)
May include bus bridges (e.g. AHB-to-APB bus bridge) to connect
different buses into a network using a single global memory space
Cortex-M4 Processor Overview
Cortex-M4 Processor
Introduced in 2010
Designed with a large variety of highly efficient signal processing
features
Features extended single-cycle multiply accumulate instructions,
optimized SIMD arithmetic, saturating arithmetic and an optional
Floating Point Unit.
High Performance Efficiency
Low Power Consumption
Longer battery life – especially critical in mobile products
Cortex-M4 Processor Features
32-bit Reduced Instruction Set Computing (RISC) processor
Harvard architecture
Separated data bus and instruction bus
Instruction set
Includes the entire Thumb®-1 (16-bit) and Thumb®-2 (16/32-bit)
instruction sets
Supported Interrupts
Non-maskable Interrupt (NMI) + 1 to 240 physical interrupts
8 to 256 interrupt priority levels
Cortex-M4 Registers
R0 – R12: general purpose registers
Low registers (R0 – R7) can be accessed by any instruction
High registers (R8 – R12) sometimes cannot be accessed e.g. by some
Thumb (16-bit) instructions
R13: Stack Pointer (SP)
Holds the address of the current top of the stack
Used for saving the context of a program while switching between tasks
Cortex-M4 has two SPs: the Main SP (MSP), used in applications that require
privileged access, e.g. the OS kernel and exception handlers, and the Process SP (PSP),
used in base-level application code (when not running an exception
handler)
R15: Program Counter (PC)
Holds the address of the current
instruction
Automatically incremented by 4 after
each instruction (for 32-bit instruction
code), except for branches
A branch changes the PC to a specific
address; a branch with link, such as a
function call, also saves the return
address in the Link Register (LR)
R14: Link Register (LR)
The LR is used to store the return address of a subroutine or a function call
The program counter (PC) will load the value from LR after a function is
finished
xPSR, combined Program Status Register
Provides information about program execution and ALU flags
Application PSR (APSR)
Interrupt PSR (IPSR)
Execution PSR (EPSR)
APSR
N: negative flag – set to one if the result from ALU is negative
Z: zero flag – set to one if the result from ALU is zero
C: carry flag – set to one if an unsigned overflow occurs
V: overflow flag – set to one if a signed overflow occurs
Q: sticky saturation flag – set to one if saturation has occurred in saturating
arithmetic instructions, or overflow has occurred in certain multiply
instructions
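As a rough illustration of how the N, Z, C and V flags follow from a result, the sketch below models a flag-setting 32-bit addition. It is a software model rather than the processor's implementation; the function name add_with_flags is hypothetical and the Q flag is not modelled.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a, b):
    """Add two 32-bit values and return (result, flags) like a flag-setting ADD."""
    a &= MASK32
    b &= MASK32
    full = a + b              # unbounded Python integer, keeps the carry
    result = full & MASK32    # what fits in a 32-bit register
    flags = {
        "N": result >> 31,                 # bit 31 of the result
        "Z": int(result == 0),
        "C": int(full > MASK32),           # carry out of bit 31 (unsigned overflow)
        # Signed overflow: both operands share a sign that differs from the result's.
        "V": int(((a ^ result) & (b ^ result)) >> 31),
    }
    return result, flags
```

Adding 1 to 0x7FFFFFFF sets N and V (the largest positive value wraps to a negative one), while adding 1 to 0xFFFFFFFF sets Z and C.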
IPSR
ISR number – number of the currently executing interrupt service routine
EPSR
T: Thumb state – always one since Cortex-M4 only supports the Thumb
state (more on processor states in the next module)
ICI/IT: Interrupt-Continuable Instruction (ICI) bits, IF-THEN (IT) instruction
status bits
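The three views of the xPSR can be pictured as fields of one 32-bit word. The sketch below pulls out the fields named in these slides, using the bit positions from the ARMv7-M layout (flags in bits 31 to 27, T in bit 24, ISR number in bits 8 to 0). The function name decode_xpsr is hypothetical, and the ICI/IT and GE fields are left out to keep it short.

```python
def decode_xpsr(xpsr):
    """Extract the APSR flags, the EPSR T bit and the IPSR number from an xPSR value."""
    return {
        "N": (xpsr >> 31) & 1,
        "Z": (xpsr >> 30) & 1,
        "C": (xpsr >> 29) & 1,
        "V": (xpsr >> 28) & 1,
        "Q": (xpsr >> 27) & 1,
        "T": (xpsr >> 24) & 1,          # Thumb state bit (EPSR)
        "ISR_NUMBER": xpsr & 0x1FF,     # exception number (IPSR), 0 in thread mode
    }
```

A value of 0x01000000, for instance, decodes to T = 1 with all flags clear and ISR number 0.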
Interrupt mask registers
1-bit PRIMASK
When set to one, blocks all interrupts apart from the non-maskable interrupt
(NMI) and the hard fault exception
1-bit FAULTMASK
When set to one, blocks all interrupts apart from the NMI
BASEPRI (up to 8 bits)
When set to a non-zero priority level, blocks all interrupts of the same or lower
priority (only interrupts with higher priorities are allowed); zero disables the masking
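The combined effect of the three mask registers can be modelled as a priority threshold. This is a simplified sketch: the function name can_preempt is made up, the priority of the currently running code is ignored, and BASEPRI is treated as holding a priority level (with 0 meaning disabled), which is how the Cortex-M architecture defines it. Lower numbers mean higher priority, with NMI at -2 and hard fault at -1.

```python
NMI_PRIORITY = -2        # fixed priority
HARDFAULT_PRIORITY = -1  # fixed priority


def can_preempt(priority, primask=0, faultmask=0, basepri=0):
    """Return True if an exception of this priority is not masked.

    Only the mask registers are modelled; the priority of the code that is
    currently running is ignored.
    """
    threshold = 256  # nothing masked: every configurable priority (0..255) passes
    if basepri != 0:
        threshold = min(threshold, basepri)             # same or lower priority blocked
    if primask:
        threshold = min(threshold, 0)                   # only NMI and hard fault pass
    if faultmask:
        threshold = min(threshold, HARDFAULT_PRIORITY)  # only NMI passes
    return priority < threshold
```

With basepri set to 0x40, an interrupt at priority 0x30 still gets through, while interrupts at 0x40 or 0x50 are held off.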
CONTROL: special register
1-bit stack definition
Set to one: use the process stack pointer (PSP)
Clear to zero: use the main stack pointer (MSP)
ARM Cortex-M3
Introduced in 2004, the mainstream ARM processor developed specifically
with microcontroller applications in mind.
Implements the Thumb-2 instruction set.
Most Thumb-2 instructions are 16 bits wide and are expanded internally to full 32-bit
ARM instructions.
ARM CPUs are capable of performing multiple low-level operations in parallel.
A hardware sign extender converts 8-bit and 16-bit operands to 32 bits.
Load/store architecture.
The barrel shifter allows operand Rm to be shifted first, and then the ALU can perform another
operation (e.g. add, subtract, multiply etc.)
The barrel shifter can compute 5X = X + 2^2 X and -7X = X - 2^3 X.
MAC is the memory address calculator, used for different addressing of arrays and repetitive address
calculations.
R0-R12 are general-purpose registers; R13-R15 are special-purpose registers, i.e. SP, LR (which holds the return
address when a subroutine is called) and PC.
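The barrel-shifter arithmetic can be checked with a small model of the shift-then-ALU datapath. This sketch assumes 32-bit wrap-around arithmetic; the helper names are hypothetical, and the assembly in the comments shows one plausible encoding rather than the only one.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits, as the barrel shifter would do."""
    return (value << amount) & MASK32


def times5(x):
    # Corresponds to ADD Rd, Rn, Rn, LSL #2  ->  x + (x << 2)
    return (x + lsl(x, 2)) & MASK32


def times_minus7(x):
    # Corresponds to SUB Rd, Rn, Rn, LSL #3  ->  x - (x << 3)
    return (x - lsl(x, 3)) & MASK32


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value
```

Both multiplications take a single shift-and-ALU operation, which is why a constant multiply of this kind does not need the multiplier.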
ARM Cortex-M3 - Architecture
32-bit microprocessor
32-bit data path
32-bit register bank
32-bit memory interface
Harvard architecture
3-stage pipeline
separate instruction bus and data bus
share the same memory space, with different lengths for code and data
Interrupts
1 to 240 physical interrupts, plus NMI
12 cycle interrupt latency
Instruction Set
Thumb (entire)
Thumb-2 (entire)
Processor Modes
The ARM has seven basic operating modes
Each mode has access to its own stack space and a different
subset of registers
Some operations can only be carried out in a privileged mode
ARM Registers
Processor Register Set
Cortex-M3 core has 16 user-visible registers
All processing takes place in these registers
Three of these registers have dedicated functions
program counter (PC) - holds the address of the next instruction to execute
link register (LR) - holds the address from which the current procedure
was called
“the” stack pointer (SP) - holds the address of the current stack top (CM3
supports multiple execution modes, each with their own private stack
pointer).
The registers set
The Registers
ARM has 37 registers, all of which are 32 bits long.
1 dedicated program counter
1 dedicated current program status register
5 dedicated saved program status registers
30 general purpose registers
The current processor mode governs which of several banks is accessible.
Each mode can access
a particular set of r0-r12 registers
a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
the program counter, r15 (pc)
the current program status register, cpsr
Program Memory Model
RAM for an executing program is
divided into three regions
Data in RAM are allocated during the link
process and initialized by startup code at
reset
The (optional) heap is managed at runtime
by library code implementing functions
such as malloc and free, which are part
of the standard C library
The stack is managed at runtime by
compiler generated code which generates
per-procedure-call stack frames containing
local variables and saved registers
Cortex-M3 Memory Address Space
ARM Cortex-M3 processor has a
single 4 GB address space
The SRAM and Peripheral areas are
accessed through the System bus
The “Code” region is accessed through
the ICode (instructions) and DCode
(constant data) buses
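To make the single 4 GB address space concrete, the sketch below maps an address to its region. The region boundaries are those of the standard Cortex-M3 memory map, which also contains external RAM, external device and system regions not listed on this slide; the names REGIONS and region_of are hypothetical.

```python
# (start, end inclusive, name) for the predefined regions of the 4 GB map.
REGIONS = [
    (0x00000000, 0x1FFFFFFF, "Code"),
    (0x20000000, 0x3FFFFFFF, "SRAM"),
    (0x40000000, 0x5FFFFFFF, "Peripheral"),
    (0x60000000, 0x9FFFFFFF, "External RAM"),
    (0xA0000000, 0xDFFFFFFF, "External device"),
    (0xE0000000, 0xFFFFFFFF, "System"),
]


def region_of(address):
    """Return the name of the memory-map region that contains the address."""
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    raise ValueError("address outside the 32-bit address space")
```

A flash address such as 0x08000000 falls in the Code region, while a peripheral register address such as 0x40021000 falls in the Peripheral region.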
ARM Cortex-M3 Bus
Bit Banding
Memory-mapped I/O; the 4 GB memory address space is organized in bytes.
4 GB is very large for small embedded applications.
Bit-banding takes advantage of this large address space.
It uses two different regions of the address space to refer to the same physical
data in memory.
In the primary bit-band region, each address corresponds to a single data byte.
In the bit-band alias region, each word address corresponds to one bit of the same data.
This allows a single bit of data to be read or written by a single instruction.
LDR can load a single bit and STR can write a single bit of data.
The two bit-band alias regions can be used to access individual status and
control bits of I/O devices, or to implement a set of 1-bit Boolean flags that
can be used to implement a set of mutex objects.
The bit-band hardware performs the read-modify-write without allowing it to be interrupted.
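The mapping between a bit in the bit-band region and its alias word can be written as a short address calculation. The sketch uses the standard Cortex-M3 base addresses (1 MB bit-band regions at 0x20000000 and 0x40000000, with aliases at 0x22000000 and 0x42000000); the function name bitband_alias is hypothetical.

```python
SRAM_BITBAND_BASE, SRAM_ALIAS_BASE = 0x20000000, 0x22000000
PERIPH_BITBAND_BASE, PERIPH_ALIAS_BASE = 0x40000000, 0x42000000
BITBAND_SIZE = 0x00100000  # each bit-band region covers 1 MB


def bitband_alias(byte_address, bit):
    """Return the alias word address for one bit of a byte in a bit-band region."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    for base, alias in ((SRAM_BITBAND_BASE, SRAM_ALIAS_BASE),
                        (PERIPH_BITBAND_BASE, PERIPH_ALIAS_BASE)):
        if base <= byte_address < base + BITBAND_SIZE:
            byte_offset = byte_address - base
            # Each byte expands to 8 alias words (32 bytes); each bit to one word.
            return alias + byte_offset * 32 + bit * 4
    raise ValueError("address is not inside a bit-band region")
```

Writing 1 to the word at bitband_alias(0x20000000, 7), which is 0x2200001C, sets bit 7 of the byte at 0x20000000 without touching its other bits.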
Bit Banding Example
ARM7: Programming Model
Word is 32 bits long.
Word can be divided into four 8-bit bytes.
ARM addresses can be 32 bits long.
Address refers to byte.
Address 4 starts at byte 4.
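The relationship between byte addresses and 32-bit words can be sketched in a few lines. The helper names are hypothetical, and the least-significant-byte-first order in split_word is an assumption (little-endian), since the slide does not state a byte order.

```python
def word_address(byte_address):
    """Address of the aligned 32-bit word containing the given byte."""
    return byte_address & ~0x3


def byte_lane(byte_address):
    """Position (0..3) of the byte within its word."""
    return byte_address & 0x3


def split_word(word):
    """Split a 32-bit word into four bytes, least significant byte first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]
```

Byte addresses 4 to 7 all belong to the word that starts at address 4.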
ARM Cortex Status Registers (xPSR)
PSR: Program Status Register
Divided into three bit fields
Application Program Status Register (APSR)
Interrupt Program Status Register (IPSR)
Execution Program Status Register (EPSR)
The Q bit is the sticky saturation bit and supports two rarely used instructions
(SSAT and USAT), e.g. SSAT{cond} Rd, #sat, Rm{, shift}
IPSR holds the exception number during exception processing.
The ICI/IT bits hold the state information for IT block instructions or
instructions that are suspended during interrupt processing.
The T bit is always 1 to indicate Thumb instructions.
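What SSAT does to a value, and when it would set the Q bit, can be illustrated with a short model. This is a sketch only: the function name ssat is hypothetical, the optional shift operand is omitted, and the Q result is returned per call, so a caller modelling the sticky behaviour would OR it into a running flag.

```python
def ssat(value, sat):
    """Saturate a signed integer to a signed sat-bit range (1 <= sat <= 32).

    Returns (result, q), where q is 1 if the value had to be clamped.
    """
    if not 1 <= sat <= 32:
        raise ValueError("sat must be in the range 1..32")
    low = -(1 << (sat - 1))
    high = (1 << (sat - 1)) - 1
    result = max(low, min(high, value))
    return result, int(result != value)
```

Saturating 200 to 8 bits gives 127 with the Q indication set, whereas 100 passes through unchanged.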
Software Development Overview
The software development flow
At the file level the build process for Keil MDK is:
When using the GNU toolchain, compilation and linking
are merged
Given the limited capability of the STM32F4 on-board emulator, the debugging
approach we will follow is to use a UART
Software Development with MSP432 (ES-Lab)
Software Development (ES-Lab)
Software development is nowadays usually done with the support
of an IDE (Integrated Debugger and Editor / Integrated
Development Environment)
edit and build the code
debug and validate