ARM Processor Architecture Overview
Lecture Notes
UNIT II- ARM PROCESSOR AND PERIPHERALS
INTRODUCTION
This unit covers the ARM processor: its architecture, instruction sets and peripheral
interfaces. It starts with a brief introduction to the terminology of computer
architectures, followed by detailed descriptions of the ARM9 and ARM Cortex-M3
processors.
A reduced instruction set computer, or RISC, is a computer with a small, highly
optimized set of instructions, rather than the more specialized set often found in other
types of architecture, such as a complex instruction set computer (CISC).
RISC architecture is used in portable devices because of its power efficiency, for
example the Apple iPod and the Nintendo DS.
Version 1
26-bit addressing. No multiply or coprocessor support.
Version 2
32-bit result multiply and coprocessor support.
Version 3
32-bit addressing.
Version 4
Adds signed and unsigned half-word and signed byte load and store instructions.
Version 4T
16-bit Thumb compressed form of instructions.
Version 5T
Superset of 4T adding new instructions.
Version 5TE
Adds signal processing extensions.
Version 5TEJ
Jazelle DBX, which provides acceleration for the Java VM.
Version 6
Adds instructions for doing byte manipulations and graphics algorithms more
efficiently.
Version 7
Thumb-2 extension (with 32-bit instructions).
Jazelle RCT (Runtime Compiler Target), which provides support for interpreted
languages.
Architecture profiles
7A - Application profile
7R - Real-time profile
7M - Microcontroller profile
1. Application profile (ARMv7-A)
Memory management support
Highest performance at low power
Meets the requirements of application operating systems
TrustZone and Jazelle RCT for a safe, extensible system
e.g. Cortex-A5, Cortex-A9
Applications: smartphones, digital TV, servers and networking
2. Real-time profile (ARMv7-R)
Protected memory (MPU)
Low latency
Predictability for real-time needs
e.g. Cortex-R4
High-performance, real-time, safe and cost-effective
Applications: automobiles (ABS), cameras, disk drive controllers
ARMv8
• Adds a 64-bit architecture
ARM FAMILY  ARCHITECTURE  CORE    FEATURE                                             CACHE         TYPICAL MIPS
ARM1        ARMv1         ARM1    First implementation                                None
ARM2        ARMv2         ARM2    ARMv2 added the MUL (multiply) instruction          None          4 MIPS @ 8 MHz, 0.33 DMIPS/MHz
ARM2        ARMv2a        ARM250  Integrated MEMC (MMU), graphics and I/O processor.  None, MEMC1a  7 MIPS @ 12 MHz
                                  ARMv2a added the SWP and SWPB (swap) instructions
The ARM architecture has been under development since the 1980s and is the most widely
used 32-bit instruction set architecture.
ARM has several processors that are grouped into a number of families based on the
processor core they are implemented with. The architecture of ARM processors has
continued to evolve with every family. Some of the well-known ARM processor families
are ARM7, ARM9, ARM10 and ARM11. Table 2.1 shows some of the commonly found
ARM families along with their architectures.
ARM FAMILY          ARCHITECTURE
ARM7TDMI            ARMv4T
ARM9E               ARMv5TE
ARM11               ARMv6
Cortex-M            ARMv7-M
Cortex-R            ARMv7-R
Cortex-A (32-bit)   ARMv7-A
Cortex-A (64-bit)   ARMv8-A
M – Fast Multiplier
Older ARM processors used a small and simple multiplier unit, which required more
clock cycles to complete a single multiplication. With the introduction of the fast
multiplier unit, the clock cycles required for multiplication are significantly reduced,
and modern ARM processors are capable of calculating a 32-bit product in a single
cycle.
I – Embedded ICE
ARM processors have on-chip debug hardware that allows breakpoints and watchpoints
to be set.
E – Enhanced Instructions
ARM processors with this extension support the extended DSP instruction set for
high-performance DSP applications. With these extended DSP instructions, the DSP
performance of ARM processors can be increased without high clock frequencies.
J – Jazelle
ARM processors with Jazelle technology can be used for accelerated execution of Java
bytecodes. Jazelle DBX, or Direct Bytecode eXecution, is used in mobile phones and
other consumer devices for high-performance Java execution without a large cost in
memory or battery life.
The ARM core can be viewed as a set of functional units connected by data buses, where
Arrows represent the flow of data
Lines represent the buses
Boxes represent either an operation unit or a storage area
The functional units of the ARM architecture are:
Priority encoder: The encoder is used in the multiple load and store instructions to
indicate which register within the register file is to be loaded or stored.
Multiplexers: Several multiplexers are used to control the operation of the processor
buses.
The ALU has two 32-bit inputs. The first comes from the register file, whereas the
other comes from the shifter. The status register flags are modified by the ALU outputs.
The V-bit output goes to the V flag and the carry output goes to the C flag. The most
significant bit of the result represents the sign flag, and the ALU output bits are NORed
together to produce the Z flag. The ALU has a 4-bit function bus that permits up to 16
opcodes to be implemented.
Booth Algorithm
Barrel Shifter
The barrel shifter has a 32-bit input to be shifted. This input comes from the
register file, or it may be immediate data. The shifter has further control inputs
that come from the instruction register. The Shift field within the instruction
controls the operation of the barrel shifter. This field indicates the kind of
shift to be performed (logical left or right, arithmetic right, or rotate right).
The amount by which the register is to be shifted is contained in an immediate
field within the instruction, or it may be the lower 6 bits of a register within
the register file.
The shift value input bus is 6 bits wide, permitting shifts of up to 32 bits. The
shift type input selects the kind of shift: 00, 01, 10 and 11 correspond to shift
left, shift right, arithmetic shift right and rotate right, respectively. The barrel
shifter is built mainly from multiplexers.
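The four shift types described above can be modelled in software. The following is a minimal sketch in C, not a description of the hardware: the names `shift_type` and `barrel_shift` are hypothetical, and clamping shift amounts above 32 is an assumption made here so that the function is defined for every input.

```c
#include <stdint.h>

/* Shift-type encodings as listed in the text: 00, 01, 10, 11. */
enum shift_type { SHIFT_LSL = 0, SHIFT_LSR = 1, SHIFT_ASR = 2, SHIFT_ROR = 3 };

/* Software model of the barrel shifter for shift amounts 0..32.
   Amounts above 32 are clamped to 32 (an assumption of this sketch). */
uint32_t barrel_shift(uint32_t value, enum shift_type type, unsigned amount)
{
    if (amount == 0)
        return value;
    if (amount > 32)
        amount = 32;

    switch (type) {
    case SHIFT_LSL:                       /* zeroes enter from the right */
        return amount == 32 ? 0u : value << amount;
    case SHIFT_LSR:                       /* zeroes enter from the left */
        return amount == 32 ? 0u : value >> amount;
    case SHIFT_ASR: {                     /* copies of the sign bit enter from the left */
        uint32_t sign_fill = (value & 0x80000000u) ? 0xFFFFFFFFu : 0u;
        if (amount == 32)
            return sign_fill;
        return (value >> amount) | (sign_fill << (32 - amount));
    }
    case SHIFT_ROR:                       /* bits leaving on the right re-enter on the left */
        amount &= 31u;                    /* rotating by 32 leaves the value unchanged */
        if (amount == 0)
            return value;
        return (value >> amount) | (value << (32 - amount));
    }
    return value;
}
```

For example, `barrel_shift(0x80000000u, SHIFT_ASR, 4)` keeps the sign bit and yields `0xF8000000`, whereas the same call with `SHIFT_LSR` yields `0x08000000`.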
Control Unit
In any microprocessor, the control unit is the heart of the whole process and is
responsible for system operation, so the control unit design is the most important
part of the whole design. A control unit is sometimes a purely combinational circuit
design. Here, the control unit is implemented as a simple state machine. The processor
timing is also included within the control unit. Signals from the control unit are
connected to each component within the processor to supervise its operation.
The first 16 registers are accessible in user-level mode; the additional registers are
available in privileged software execution. These 16 registers can be split into two
groups: general purpose and special purpose registers.
R0-R12: can be used during common operations to store temporary values, pointers
(memory addresses), etc. R0, for example, can be regarded as an accumulator during
arithmetic operations, or used for storing the result of a previously called function.
R7 becomes useful while working with system calls, as it stores the system call number,
and R11 helps us to keep track of boundaries on the stack, serving as the frame pointer.
Moreover, the function calling convention on ARM specifies that the first four
arguments of a function are stored in the registers r0-r3.
Register Alias Purpose
R0 – General purpose
R1 – General purpose
R2 – General purpose
R3 – General purpose
R4 – General purpose
R5 – General purpose
R6 – General purpose
R8 – General purpose
R9 – General purpose
R13: SP (Stack Pointer): The Stack Pointer points to the top of the stack. The stack
is an area of memory used for function-specific storage, which is reclaimed when the
function returns.
R14: LR (Link Register): When a function call is made, the Link Register is updated
with the memory address of the instruction following the one from which the function
was called. This allows the program to return to the “parent” function that initiated
the “child” function call after the “child” function has finished.
R15: PC (Program Counter): The Program Counter is automatically incremented by
the size of the instruction executed. This size is always 4 bytes in ARM state and
2 bytes in Thumb state. When a branch instruction is executed, the PC holds the
destination address. During execution, the PC stores the address of the current
instruction plus 8 (two ARM instructions) in ARM state, and the current instruction plus
4 (two Thumb instructions) in Thumb (v1) state. This is different from x86, where the
PC always points to the next instruction to be executed.
1. Condition Bits
N
If the result is regarded as a two’s complement signed integer, then N = 1 if the
result is negative and N = 0 if it is positive or zero.
Z
Is set to 1 if the result of the instruction is zero, and to 0 otherwise.
C
Is set in one of four ways:
For an addition, including the comparison instruction CMN, C is set to 1 if the
addition produced a carry and to 0 otherwise.
For a subtraction, including the comparison instruction CMP, C is set to 0 if the
subtraction produced a borrow (that is, an unsigned underflow), and to 1
otherwise.
For non-addition/subtractions that incorporate a shift operation, C is set to the
last bit shifted out of the value by the shifter.
For other non-addition/subtractions, C is normally left unchanged.
V
Is set in one of two ways:
For an addition or subtraction, V is set to 1 if signed overflow
occurred, regarding the operands and result as two’s complement
signed integers.
For non-addition/subtractions, V is normally left unchanged.
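The rules for the condition bits can be checked with a small software model. The sketch below, in C, computes N, Z, C and V for a 32-bit addition and a 32-bit subtraction as described above. The names `nzcv`, `flags_add` and `flags_sub` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* One field per condition bit; each holds 0 or 1. */
struct nzcv { unsigned n, z, c, v; };

/* Flags produced by an addition a + b (as for ADDS or CMN). */
struct nzcv flags_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                        /* wraps modulo 2^32 */
    struct nzcv f;
    f.n = r >> 31;                             /* sign bit of the result */
    f.z = (r == 0);
    f.c = (r < a);                             /* 1 if the addition produced a carry */
    f.v = (~(a ^ b) & (a ^ r)) >> 31;          /* same-sign operands, different-sign result */
    return f;
}

/* Flags produced by a subtraction a - b (as for SUBS or CMP). */
struct nzcv flags_sub(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                        /* wraps modulo 2^32 */
    struct nzcv f;
    f.n = r >> 31;
    f.z = (r == 0);
    f.c = (a >= b);                            /* 0 if a borrow occurred, 1 otherwise */
    f.v = ((a ^ b) & (a ^ r)) >> 31;           /* different-sign operands, result sign differs from a */
    return f;
}
```

For example, `flags_sub(3, 5)` gives N = 1 and C = 0 (a borrow occurred), while `flags_add(0x7FFFFFFF, 1)` gives V = 1 because the signed result overflows.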
2. Interrupt bits
I - Disables IRQ interrupts when it is set.
F - Disables FIQ interrupts when it is set.
T - Thumb mode
4. Mode Bits
M [4:0] Mode
10000 User
10001 FIQ
10010 IRQ
10011 Supervisor
10111 Abort
11011 Undefined
11111 System
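The mode table above maps directly onto a lookup on the low five bits of the CPSR. The following C sketch decodes them; the function names `cpsr_mode_name` and `cpsr_is_mode` are hypothetical, and the CPSR value is passed in as a plain integer rather than read from the processor.

```c
#include <stdint.h>
#include <string.h>

/* Returns the mode name for the M[4:0] field of a CPSR value,
   following the encodings in the table above. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {            /* keep only M[4:0] */
    case 0x10: return "User";          /* 10000 */
    case 0x11: return "FIQ";           /* 10001 */
    case 0x12: return "IRQ";           /* 10010 */
    case 0x13: return "Supervisor";    /* 10011 */
    case 0x17: return "Abort";         /* 10111 */
    case 0x1B: return "Undefined";     /* 11011 */
    case 0x1F: return "System";        /* 11111 */
    default:   return "Unknown";       /* encoding not listed in the table */
    }
}

/* Convenience check: 1 if the CPSR value selects the named mode, else 0. */
int cpsr_is_mode(uint32_t cpsr, const char *name)
{
    return strcmp(cpsr_mode_name(cpsr), name) == 0;
}
```

For example, a CPSR value whose low five bits are 10011 decodes to Supervisor mode, whatever the other bits hold.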
The execution state bits, endianness state and current processor state can be
accessed from the SPSR in any exception mode, using the MSR and MRS
instructions.
3. SVC Mode: Supervisor mode is the mode the processor enters on start-up or
reset, and when a software interrupt is executed.
4. Undefined Mode: Undefined mode is entered when an illegal (undefined)
instruction is executed. The ARM core has a 32-bit data bus, which gives faster
data flow.
5. THUMB Mode: In Thumb state the processor executes a compressed 16-bit
instruction set instead of 32-bit ARM instructions, which improves code density.
Thumb EE: includes some changes and additions aimed at dynamically generated
code (code compiled on the device either shortly before or during execution).
{S} - An optional suffix. If S is specified, the condition flags are updated on the
result of the operation
While the MNEMONIC, S, Rd and Operand1 fields are straightforward, the condition
and Operand2 fields require a bit more clarification. The condition field is closely tied
to the CPSR register’s value or, to be precise, to the values of specific bits within the
register. Operand2 is called a flexible operand because it can take various forms: an
immediate value (with a limited set of values), a register, or a register with a shift.
1. DATA INSTRUCTION
ADD r0,r1,r2
This instruction sets register r0 to the sum of the values stored in r1 and r2.
In addition to specifying registers as sources for operands, instructions may also
provide immediate operands, which encode a constant value directly in the instruction.
For example,
ADD r0,r1,#2
sets r0 to r1 + 2.
2. ARITHMETIC INSTRUCTION
The arithmetic operations perform addition and subtraction; the with-carry versions
include the current value of the carry bit in the computation.
RSB performs a subtraction with the order of the two operands reversed, so that
RSB r0,r1,r2 sets r0 to r2 – r1.
ADD   Add
ADC   Add with carry
SUB   Subtract
SBC   Subtract with carry
RSB   Reverse subtract
RSC   Reverse subtract with carry
MUL   Multiply
MLA   Multiply and accumulate
3. LOGICAL INSTRUCTION
The bit-wise logical operations perform logical AND, OR, and XOR operations (the
exclusive or is called EOR).
The BIC instruction stands for bit clear: BIC r0,r1,r2 sets r0 to r1 AND NOT r2. This
instruction uses the second source operand as a mask: where a bit in the mask is 1,
the corresponding bit in the first source operand is cleared. The MUL instruction
multiplies two values, but with some restrictions: no operand may be an immediate,
and the two source operands must be different registers.
4. SHIFT INSTRUCTIONS
The shift operations are not separate instructions; rather, shifts can be applied to
arithmetic and logical instructions. The shift modifier is always applied to the second
source operand.
A left shift moves bits up toward the most-significant bits, while a right shift moves
bits down toward the least-significant bit in the word.
The LSL and LSR modifiers perform left and right logical shifts, filling the vacated
bits of the operand with zeroes.
The arithmetic shift left is equivalent to an LSL, but the ASR copies the sign bit: if the
sign is 0, a 0 is copied, while if the sign is 1, a 1 is copied.
The RRX modifier performs a 33-bit rotate, with the CPSR’s C bit being inserted above
the sign bit of the word; this allows the carry bit to be included in the rotation.
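The 33-bit rotate performed by RRX can be pictured as the carry bit followed by the 32-bit word, rotated right by one position. The C sketch below models this with two hypothetical helper functions: `rrx_result` returns the rotated word, and `rrx_carry_out` returns the bit that leaves the word (the last bit shifted out, as described for the C flag).

```c
#include <stdint.h>

/* Result of RRX: the old carry enters at bit 31, all other bits move right by one. */
uint32_t rrx_result(uint32_t value, unsigned carry_in)
{
    return (value >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}

/* Bit shifted out of the word by RRX (bit 0 of the original value). */
unsigned rrx_carry_out(uint32_t value)
{
    return (unsigned)(value & 1u);
}
```

For example, with the carry set, `rrx_result(0x00000003u, 1)` gives `0x80000001` and `rrx_carry_out(0x00000003u)` gives 1.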
6. COMPARE INSTRUCTIONS
Comparison instructions do not modify general purpose registers; they only set
the values of the NZCV bits of the CPSR register.
The compare instruction CMP r0, r1 computes r0 – r1, sets the status bits,
and throws away the result of the subtraction.
CMP Compare
7. MOVE INSTRUCTION
The instruction MOV r0,r1 sets the value of r0 to the current value of r1.
The MVN instruction complements the operand bits (one’s complement) during the
move.
MOV Move
LDRB and STRB load and store bytes rather than whole words. LDRH and STRH
operate on half-words.
An ARM address may be 32 bits long. The ARM load and store instructions do not
directly refer to main memory addresses, because a 32-bit address would not fit
into an instruction that also includes an opcode and operands. Instead, the ARM uses
register-indirect addressing. In register-indirect addressing, the value stored in the
register is used as the address to be fetched from memory; the result of that fetch
is the desired operand value.
LDR Load
STR Store
LDRH Load half-word
STRH Store half-word
LDRSH Load half-word signed
LDRB Load byte
STRB Store byte
ADR Set register to address
1. SUBROUTINES
Large programs are hard to handle and so are broken into smaller programs called
subroutines.
A subroutine is a block of code that can be called from different places within a main
program or other subroutines.
Figure 2.13 ARM subroutine.
WRITING SUBROUTINES
When using subroutines, it is necessary to know the following:
Where do we return to? (use RETURN) - When the subroutine is done, remember
to pop the saved information so that execution returns to the next instruction
immediately after the calling point.
For instance,
BL foo /* BL - branch and link instruction; foo is a subroutine/procedure name */
will perform a branch and link to the code starting at location foo.
The branch and link is much like a branch, except that before branching it stores the
current PC value in r14. Thus, to return from a procedure, simply move the value of
r14 (LR) to r15 (PC):
MOV r15,r14
But this mechanism only lets us call procedures one level deep. When subroutines are
nested, the contents of the link register must be saved on a stack by the subroutine.
Register R13, the Stack Pointer, is normally used as the pointer for this stack.
If, for example, we call a C function within another C function, the second function call
will overwrite r14, destroying the return address for the first function call. The standard
procedure for allowing nested procedure calls (including recursive procedure calls) is
to build a stack, as illustrated in Figure 2.14. The C code shows a series of functions
that call other functions: f1() calls f2(), which in turn calls f3(). The right side of the
figure shows the state of the procedure call stack during the execution of f3(). The
stack contains one activation record for each active procedure. When f3() finishes, it
can pop the top of the stack to get its return address, leaving the return address for
f2() waiting at the top of the stack for its return.
To pass parameters into a procedure, the values can be pushed onto the stack just
before the procedure call. Once the procedure returns, those values must be popped
off the stack by the caller, because they may hide a return address or other useful
information on the stack.
A procedure may also need to save register values for registers it modifies. The
registers can be pushed onto the stack upon entry to the procedure and popped off
the stack, restoring the previous values, before returning.
Procedure stacks are typically built to grow down from high addresses.
Assembly language programmers can use any means they want to pass parameters.
Compilers use standard mechanisms to ensure that any function may call any other.
The compiler passes parameters and return variables in a block of memory known as
a frame. The frame is also used to allocate local variables. The stack elements are
frames.
A stack pointer (sp) defines the end of the current frame, while a frame pointer (fp)
defines the end of the last frame. (The fp is technically necessary only if the stack
frame can be grown by the procedure during execution.)
The procedure can refer to an element in the frame by addressing relative to sp.
When a new procedure is called, the sp and fp are modified to push another frame
onto the stack.
The ARM Procedure Call Standard (APCS) is a good illustration of a typical procedure
linkage mechanism. Although the stack frames are in main memory, understanding
how registers are used is key to understanding the mechanism, as explained below.
• r0-r3 are used to pass the first four parameters into the procedure. r0 is also used
to hold the return value. If more than four parameters are required, they are put on
the stack frame.
void f2(int x) {
    int y;      /* local variable, kept in the stack frame */
    y = x+1;
}
void f1(int a) {
    f2(a);      /* a is passed to f2 in r0 */
}
This function has only one parameter, so x will be passed in r0. The variable y is local
to the procedure so it is put into the stack. The first part of the procedure sets up
registers to manipulate the stack, then the procedure body is implemented.
2.4.2 STACK
The stack is a data structure known as last in, first out (LIFO). In a stack, items enter
at one end and leave in the reverse order.
Stacks in microprocessors are implemented using a register called the stack pointer,
similar to the program counter (PC), to keep track of available stack locations. As items
are added to the stack (pushed), the stack pointer moves up, and as items are
removed from the stack (pulled or popped), the stack pointer moves down.
PUSH: Increments the memory address in the stack pointer (by one) and stores the
contents of the program counter on the top of the stack.
POP: Discards the address on the top of the stack and decrements the stack pointer by
one.
STACK TYPES
ARM stacks are very flexible since the implementation is left completely to the software.
The stack pointer is a register that points to the top of the stack. Stack implementations
are classified by the direction in which the stack grows (ascending or descending) and
by what the stack pointer points to (empty or full).
1. Ascending stack
An Ascending stack grows upwards. It starts from a low memory address and, as
items are pushed onto it, progresses to higher memory addresses.
2. Descending stack
A Descending stack grows downwards. It starts from a high memory address and, as
items are pushed onto it, progresses to lower memory addresses. The previous
examples have been of a Descending stack.
3. Empty stack
In an Empty stack, the stack pointer points to the next free (empty) location on the
stack, i.e. the place where the next item to be pushed onto the stack will be stored.
4. Full stack
In a Full stack, the stack pointer points to the topmost item in the stack, i.e. the
location of the last item to be pushed onto the stack.
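To make the terms concrete, the sketch below models a stack that is both Full and Descending: the stack pointer moves toward lower addresses on a push and always points at the last item pushed. It is a software model in C only; the names `fd_stack`, `fd_init`, `fd_push` and `fd_pop`, and the 16-word size, are assumptions made for this illustration.

```c
#include <stdint.h>

#define STACK_WORDS 16u

/* A small memory area plus a stack pointer stored as an array index.
   sp == STACK_WORDS means the stack is empty (one past the highest word). */
struct fd_stack {
    uint32_t mem[STACK_WORDS];
    unsigned sp;
};

void fd_init(struct fd_stack *s)
{
    s->sp = STACK_WORDS;           /* start just above the highest address */
}

/* Push: move sp down first (descending), then store, so sp points
   at the item just pushed (full). Returns -1 if the stack is full. */
int fd_push(struct fd_stack *s, uint32_t value)
{
    if (s->sp == 0)
        return -1;
    s->sp--;
    s->mem[s->sp] = value;
    return 0;
}

/* Pop: read the item sp points at, then move sp back up.
   Returns -1 if the stack is empty. */
int fd_pop(struct fd_stack *s, uint32_t *out)
{
    if (s->sp == STACK_WORDS)
        return -1;
    *out = s->mem[s->sp];
    s->sp++;
    return 0;
}
```

An Empty Ascending stack would reverse both choices: store first, then move the pointer toward higher addresses.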
Many embedded system developers advise that the LPC214X series is the best
processor with which to begin ARM-based application development. The LPC214X series
includes the LPC2141/42/44/46/48. We will be dealing with the LPC2148 processor.
2.5.1 LPC2148 PROCESSOR
The ARM7 is a 32-bit general-purpose microprocessor that offers features such as low
power consumption and high performance. The ARM architecture is based on the
principles of RISC. The RISC instruction set and the associated decode mechanism are
much simpler than those of microprogrammed CISC (Complex Instruction Set
Computer) designs.
A pipeline is used so that all the blocks in the architecture are kept busy. In general,
while one instruction is being executed, its successor is being decoded, and a third
instruction is being fetched from memory.
The LPC2148 is a 16-bit/32-bit ARM7-family microcontroller available in a small
LQFP64 package.
ISP (in-system programming) or IAP (in-application programming) using on-chip
boot loader software.
On-chip static RAM is 8 kB to 40 kB and on-chip flash memory is 32 kB to 512 kB; a
128-bit wide interface/accelerator allows high-speed 60 MHz operation.
It takes 400 milliseconds to erase the full chip and 1 millisecond to program
256 bytes.
Embedded Trace and EmbeddedICE-RT interfaces offer real-time debugging with
high-speed tracing of instruction execution and the on-chip RealMonitor software.
It has a USB 2.0 full-speed device controller with 2 kB of endpoint RAM.
Furthermore, this microcontroller offers 8 kB of on-chip RAM accessible to the USB
by DMA.
One or two 10-bit ADCs offer 6 or 14 analog inputs, with conversion times as low as
2.44 μs per channel.
Low-power RTC (real-time clock) with 32 kHz clock input.
Several serial interfaces, including two 16C550 UARTs and two I2C buses with
400 kbit/s speed.
The on-chip oscillator works with an external crystal in the range of 1 MHz to
25 MHz.
The power-saving modes comprise Idle and Power-down.
For extra power optimization, peripheral functions can be individually enabled or
disabled and the peripheral clock can be scaled.
2.5.3 ARCHITECTURE BLOCK DIAGRAM OF LPC2148
The LPC2148 microcontroller has three buses: the ARM7 local bus, the AHB
(Advanced High-performance Bus) and the VPB bus. These buses perform different
functions and connect different functional blocks, such as:
FLASH Memory System: The LPC2148 has 512 kB of flash memory. This memory may
be used for both code and data storage. The flash memory can be programmed in
various ways.
Every peripheral device has a single interrupt line connected to the VIC (vectored
interrupt controller). The VIC receives all interrupt requests and, according to its
programmed settings, classifies each one as a fast interrupt request (FIQ) or a
non-fast interrupt request.
This block permits selected pins of the ARM7-based LPC2148 microcontroller to have
several functions. The multiplexers are controlled by configuration registers to
establish the connection between a pin and the on-chip peripherals. Peripherals must be
connected to the appropriate pins before being activated, and before any related
interrupts are enabled. The pin control module, through its pin selection registers,
defines the functionality of the microcontroller in a given hardware environment.
After reset, all pins of port 0 and port 1 are configured as inputs, with the following
exceptions. If debug is enabled, the JTAG pins assume their JTAG functionality. If
trace is enabled, the Trace pins assume their trace functionality. The pins associated
with the I2C0 and I2C1 interfaces are open drain.
PERIPHERALS
GPIO (General Purpose Input Output)
The ARM-based LPC2148 microcontroller has 45 general purpose input/output pins.
These input/output pins are 5 V tolerant.
GPIO registers control the device pins that are not connected to a particular peripheral
function. The device pins can be configured as inputs or outputs. Separate registers
allow any number of outputs to be set or cleared simultaneously. The value of the
output register can be read back, as can the current state of the port pins.
The LPC2148 has two I/O ports, named P0 and P1, each 32 bits wide, although not all
64 port bits are available as pins. The pins of each port are labelled Px.y, where “x”
stands for the port number, 0 or 1, and “y” stands for the pin number, between 0 and 31.
Each pin can perform multiple functions. For example, pin no. 1, which is P0.21, serves
as GPIO as well as PWM5, AD1.6 (A/D converter 1, input 6) and CAP1.3 (capture input
for Timer1, channel 3).
The LPC2148 microcontroller has one 10-bit digital to analog converter (DAC), which
converts a digital input into an analog output. The maximum DAC output voltage is the
VREF voltage. A power-down mode and a buffered output are also available in this
digital to analog converter.
Microcontrollers such as the LPC2144/46/48 include two analog to digital converters,
ADC0 and ADC1. Both are 10-bit successive approximation ADCs. ADC0 has 6 channels
and ADC1 has 8 channels.
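A 10-bit converter produces results in the range 0 to 1023, which software normally scales to a voltage. The C sketch below shows one way to do this. The function name `adc10_to_millivolts` is hypothetical, the reference voltage is passed in by the caller, and treating one step as VREF / 1024 is an assumption of this sketch rather than a statement about the LPC2148.

```c
#include <stdint.h>

/* Converts a 10-bit ADC result to millivolts.
   Assumes one step equals vref_mv / 1024 (an assumption of this sketch). */
uint32_t adc10_to_millivolts(uint32_t raw, uint32_t vref_mv)
{
    raw &= 0x3FFu;                     /* keep only the 10 result bits */
    return (raw * vref_mv) / 1024u;    /* integer scaling, rounds down */
}
```

For example, with a hypothetical 3300 mV reference, a mid-scale reading of 512 corresponds to 1650 mV.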
The bus supports hot plugging, unplugging and dynamic configuration of the devices.
Every communication is initiated by the host controller. These microcontrollers are
equipped with a USB device controller that allows data to be exchanged with a USB
host controller at 12 Mbit/s.
UARTs
The LPC2148 contains two UARTs, named UART0 and UART1, with standard transmit
and receive data lines. In addition to these data lines, UART1 also provides a full
modem control handshake interface. The UARTs use 16-byte receive and transmit
FIFOs. To cover a wide range of baud rates, they also contain a built-in fractional
baud rate generator, so there is no need for an external crystal of a specific value.
Serial I/O Controller of I2C-bus
The LPC2148 includes two I2C bus controllers. The I2C bus is bidirectional, and inter-IC
control is done with two wires, SCL and SDA, where SCL is the serial clock line and
SDA is the serial data line.
Only a single master and a single slave can communicate over the interface during a
given data transfer. During a transfer, the master always sends a byte of data to the
slave, and the slave always sends data to the master.
These microcontrollers contain a single SSP controller, which is capable of operating
on an SPI, Microwire or 4-wire SSI bus. It can interact with several masters and
slaves on the bus. However, only a single master and a single slave can communicate
on the bus during a given data transfer. The controller supports full-duplex transfers,
with data frames of 4 to 16 bits flowing from the master to the slave and from the
slave to the master.
Timers/Counters
Timers and counters are designed to count PCLK (peripheral clock) cycles and
optionally generate interrupts based on four match registers.
The LPC2148 microcontroller has two timers/counters. These timers are 32 bits wide
and are programmable with a 32-bit prescaler value; each also has an external event
counter. Each timer has four 32-bit capture channels, which take a snapshot of the
timer value on a transition of an input signal. A capture event can also generate an
interrupt.
Watchdog Timer
RTC-Real-time Clock
The RTC in the LPC2148 provides a set of counters to measure time when the idle or
normal operating mode is selected. The RTC uses a small amount of power and is
suitable for battery-powered systems in which the central processing unit is not
running continuously.
Power Control
In Power-down mode, the oscillator is shut down and the chip receives no internal
clocks. The peripheral registers, the processor state and registers, and the internal
SRAM values are preserved during Power-down mode, and the logic levels of the chip
output pins remain fixed. This mode can be terminated, and normal operation resumed,
by specific interrupts that are able to function without clocks. Because all dynamic
operation of the chip is suspended, Power-down mode reduces chip power consumption
to almost zero.
PWM -Pulse Width Modulator
The PWM is based on the standard timer block and inherits all of its features,
although only the pulse width modulator function is pinned out on microcontrollers
such as the LPC2141/42/44/46/48.
The timer is designed to count PCLK (peripheral clock) cycles and optionally generate
interrupts when particular timer values occur, based on seven match registers. The
PWM function also depends on match register events.
The ability to control rising and falling edge positions individually allows the pulse
width modulator to be used for more applications. For example, multi-phase motor
control typically uses three non-overlapping PWM outputs with separate control of
each pulse width and position.
VPB Bus
The VPB divider determines the relationship between the CCLK (processor clock) and the
PCLK (clock used by peripheral devices). This divider serves two purposes. The first
is to supply peripherals with the desired PCLK over the VPB bus so that they can work
at the speed chosen for the ARM processor. To accomplish this, the bus speed can be
reduced to 1⁄2 or 1⁄4 of the processor clock rate.
Because this bus must work correctly at power-up, the default state at reset (RST) is
for the bus to run at 1⁄4 of the processor clock rate. The second purpose is to permit
power savings whenever an application does not need any peripherals to run at the
full processor rate. Since the VPB divider is connected to the output of the PLL, the
PLL remains active throughout idle mode.
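The relationship between CCLK and PCLK is a simple division, which the C sketch below illustrates. The function name `pclk_from_cclk` is hypothetical. Accepting a divider of 1 (PCLK equal to CCLK) in addition to the 1⁄2 and 1⁄4 ratios mentioned above is an assumption of this sketch, and the function works on plain numbers rather than on the divider's control register.

```c
#include <stdint.h>

/* Returns the peripheral clock in Hz for a given processor clock and
   VPB divider. Only dividers 1, 2 and 4 are accepted; any other value
   returns 0 to signal an invalid setting. */
uint32_t pclk_from_cclk(uint32_t cclk_hz, unsigned divider)
{
    if (divider != 1u && divider != 2u && divider != 4u)
        return 0u;
    return cclk_hz / divider;
}
```

For example, with the 60 MHz operation mentioned earlier and the reset default of 1⁄4, `pclk_from_cclk(60000000u, 4)` gives a 15 MHz peripheral clock.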
Time period of one clock = 1 / frequency
= 1 / 1000 Hz
= 1 millisecond
i.e. 1000 clock counts provide a time interval of 1 second, and hence we can provide a
1 second delay with these 1000 clock counts.
Now, once we have the time period of one clock, we can use this time period to
generate delays that are integer multiples of it. We can also use the time period to
measure the time interval between specific events of a received signal.
COUNTER
A counter is a unit similar to a timer, but it works in the reverse manner. It counts
external events, that is, external clock ticks. It is mostly used to measure frequency
from the count of clock ticks.
e.g. Suppose a counter is counting external clock ticks and its count reaches 2000 in
one second, i.e. 2000 clock ticks/second.
Then we can calculate the external clock frequency as,
External clock frequency = number of clock ticks / time interval
= 2000 / 1 second
= 2 kHz
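The timer and counter calculations above are the same relationship used in two directions: a timer turns a known clock frequency into a time interval, and a counter turns a count over a known interval into a frequency. The C sketch below reproduces both worked examples. The function names are hypothetical and the functions operate on plain numbers, not on LPC2148 registers.

```c
#include <stdint.h>

/* Period of one clock tick in microseconds (0 if the frequency is 0). */
uint32_t tick_period_us(uint32_t clock_hz)
{
    return clock_hz ? 1000000u / clock_hz : 0u;
}

/* Number of clock counts needed for a delay of delay_ms milliseconds. */
uint32_t ticks_for_delay_ms(uint32_t clock_hz, uint32_t delay_ms)
{
    /* 64-bit intermediate avoids overflow for fast clocks and long delays */
    return (uint32_t)(((uint64_t)clock_hz * delay_ms) / 1000u);
}

/* Frequency in Hz from a tick count observed over interval_ms milliseconds. */
uint32_t frequency_hz(uint32_t ticks, uint32_t interval_ms)
{
    return interval_ms ? (uint32_t)(((uint64_t)ticks * 1000u) / interval_ms) : 0u;
}
```

With a 1 kHz timer clock, `tick_period_us(1000)` is 1000 microseconds (1 millisecond) and `ticks_for_delay_ms(1000, 1000)` is 1000 counts for a 1 second delay. For the counter example, `frequency_hz(2000, 1000)` is 2000 Hz, i.e. 2 kHz.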
There are many applications for which we can use these timers and counters in real
world.
LPC TIMER/COUNTER
LPC2148 has two 32-bit timers/counters: Timer0/Counter0 & Timer1/Counter1.
LPC2148 has match registers that contain count value which is continuously
compared with the value of the Timer register. When the value in the Timer
register matches the value in the match register, specific action (timer reset,
or timer stop, or generate an interrupt) is taken.
Also, LPC2148 has capture registers which can be used to capture the timer
value on a specific external event on capture pins
TIMER 0 REGISTERS
Writing a 1 to any bit of this register will reset that interrupt. Writing a 0 has
no effect.
When in counter mode, it is used to select the pin and edges for
counting.
Figure 2.20 T0CTCR (Timer0 Counter Control Register)
Note: When TC overflow occurs, it does not generate any overflow interrupt.
Alternatively, we can use match register to detect overflow event if needed.
5. T0PR (Timer0 Prescale Register)
It is a 32-bit register.
MR1, MR2 and MR3 bits function in the same manner as MR0 bits.
A period of a pulse consists of an ON cycle (HIGH) and an OFF cycle (LOW). The
fraction of the period for which the signal is ON is known as the duty cycle.
E.g. consider a pulse with a period of 10 ms which remains ON (high) for [Link] The
duty cycle of this pulse is the ON time divided by the 10 ms period, expressed as a percentage.
Through the PWM technique, we can control the power delivered to the load by using
an ON-OFF signal.
Pulse Width Modulated signals with different duty cycles are shown in Figure 2.22.
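The duty-cycle definition above can be written as a short C helper. This is a minimal sketch: the function name is hypothetical, and the 2 ms ON time used in the usage note is an assumed value chosen for illustration, not a figure taken from these notes.

```c
#include <stdint.h>

/* Duty cycle in percent = (ON time / period) * 100.
 * Both arguments are in microseconds.
 * Returns 0 for invalid input (zero period, or ON time longer than the period). */
uint32_t duty_cycle_percent(uint32_t on_us, uint32_t period_us)
{
    if (period_us == 0u || on_us > period_us) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)on_us * 100u) / period_us);
}
```

For a 10 ms period and an assumed ON time of 2 ms, duty_cycle_percent(2000, 10000) evaluates to 20, i.e. a 20% duty cycle.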
LPC2148 has a PWM peripheral through which we can generate multiple PWM signals
on PWM pins. LPC2148 supports two types of controlled PWM outputs:
Single Edge Controlled PWM: All the rising (positive going) edges of the output
waveform are fixed at the beginning of the PWM period. Only the falling
(negative going) edge position can be controlled to vary the pulse width of the PWM.
Double Edge Controlled PWM: Both the rising (positive going) and falling (negative
going) edge positions can be controlled to vary the pulse width of the PWM, so both
edges can be positioned anywhere in the PWM period.
Also, we can scale these timer clock counts using the 32-bit PWM Prescale Register
(PWMPR).
The remaining 6 match registers are used to set the PWM width for 6 different PWM
signals in Single Edge Controlled PWM or 3 different PWM signals in Double
Edge Controlled PWM.
Whenever PWM Timer Counter (PWMTC) matches with these Match Registers
then, PWM Timer Counter resets, or stops, or generates match interrupt,
depending upon settings in PWM Match Control Register (PWMMCR).
PWM2 & PWM3 are configured as Single Edge Controlled PWM and PWM5 is
configured as Double Edge Controlled PWM.
The prescaler is set to increment the PWM Timer Counter after every two Peripheral
clocks (PCLK).
Match registers (PWMMR2 & PWMMR3) are used to set falling edge position for
PWM2 & PWM3.
PWMMR4 & PWMMR5 are used to set rising & falling edge positions
respectively for PWM5.
Figure 2.24 LPC2148 PWM signal
Table 2.11 given below shows when the PWM is Set (Rising Edge) and Reset
(Falling Edge) for different PWM channels using the 7 match registers.
PWM        Single Edge Controlled      Double Edge Controlled
Channel    Set by       Reset by       Set by       Reset by
1          Match 0      Match 1        Match 0      Match 1
2          Match 0      Match 2        Match 1      Match 2
3          Match 0      Match 3        Match 2      Match 3
4          Match 0      Match 4        Match 3      Match 4
5          Match 0      Match 5        Match 4      Match 5
6          Match 0      Match 6        Match 5      Match 6
Table 2.11 PWM set and reset for different PWM channels
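The pattern in Table 2.11 is regular enough to express as a small lookup function. The sketch below is illustrative only; the type and function names are hypothetical, and the caller is assumed to pass a channel number from 1 to 6.

```c
#include <stdint.h>

/* Which match registers set (rising edge) and reset (falling edge) a channel. */
typedef struct {
    uint8_t set_match;    /* match register that sets the output   */
    uint8_t reset_match;  /* match register that resets the output */
} pwm_edges_t;

/* channel must be 1..6 (assumed valid by this sketch).
 * Single edge: set by Match 0, reset by Match <channel>.
 * Double edge: set by Match <channel - 1>, reset by Match <channel>. */
pwm_edges_t pwm_edge_matches(unsigned channel, int double_edge)
{
    pwm_edges_t e;
    e.reset_match = (uint8_t)channel;
    e.set_match = double_edge ? (uint8_t)(channel - 1u) : (uint8_t)0u;
    return e;
}
```

For example, pwm_edge_matches(5, 1) reports Match 4 for the rising edge and Match 5 for the falling edge, matching the PWM5 row of the table.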
P0.8/TXD1/PWM4/AD1.1
P0.21/PWM5/AD1.6/CAP1.3
P0.9/RXD1/PWM6/EINT3
It is a 16-bit register.
It is an 8-bit register.
It is a 32-bit register.
It is a 32-bit register.
It is a 32-bit register.
When it reaches the value in PWM Prescale Register, the PWM Timer Counter
is incremented and PWM Prescale Counter is reset on next PCLK.
The values stored in these registers are continuously compared with the PWM
Timer Counter value.
When the two values are equal, the timer can be reset or stop or an interrupt
may be generated.
It is a 32-bit register.
It is a 16-bit register.
It is used to enable and select each type of PWM.
Bit 2 – PWMSEL2
0 = Single edge controlled mode for PWM2 1 = Double edge controlled
mode for PWM2
All other PWMSEL bits have similar operation as PWMSEL2 above.
Bit 10 – PWMENA2
0 = PWM2 output disabled 1 = PWM2 output enabled
It is an 8-bit register.
It is used to control the update of the PWM Match Registers when they are
used for PWM generation.
When a value is written to a PWM Match Register while the timer is in PWM
mode, the value is held in the shadow register. The contents of the shadow
register are transferred to the PWM Match Register when the timer resets
(PWM Match 0 event occurs) and if the corresponding bit in PWMLER is set.
Load PWMMR0 with a value corresponding to the time period of your PWM wave
Load any one of the remaining six match registers (two of the remaining six
match registers for double edge controlled PWM) with the ON duration of the
PWM cycle. (PWM will be generated on PWM pin corresponding to the match
register you load the value with).
Load PWMMCR with a value based on the action to be taken in the event of a
match between match register and PWM timer counter.
Enable PWM match latch for the match registers used with the help of
PWMLER
Select the type of PWM wave (single edge or double edge controlled) and which
PWMs to be enabled using PWMPCR
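The configuration steps above can be sketched in C as follows. To keep the sketch self-contained it writes to a plain struct (pwm_regs_t, a hypothetical stand-in) rather than to the memory-mapped registers; on real hardware the fields would correspond to PWMPR, PWMMR0 to PWMMR6, PWMMCR, PWMLER, PWMPCR and PWMTCR. The PWMPCR bit positions follow the description above (PWMENA2 is bit 10). The PWMMCR, PWMLER and PWMTCR bit positions, and the final step of starting the counter through PWMTCR, are assumptions that should be checked against the LPC2148 user manual.

```c
#include <stdint.h>

/* Hypothetical stand-in for the LPC2148 PWM register block. */
typedef struct {
    uint32_t pr;      /* PWMPR  : prescale register          */
    uint32_t mr[7];   /* PWMMR0..PWMMR6 : match registers    */
    uint32_t mcr;     /* PWMMCR : match control register     */
    uint32_t ler;     /* PWMLER : latch enable register      */
    uint32_t pcr;     /* PWMPCR : PWM control register       */
    uint32_t tcr;     /* PWMTCR : timer control register     */
} pwm_regs_t;

/* Configure one single edge controlled channel, ch = 1..6 (assumed valid).
 * period and on_time are in PWM timer counts. */
void pwm_single_edge_init(pwm_regs_t *pwm, unsigned ch,
                          uint32_t period, uint32_t on_time)
{
    pwm->pr     = 0u;                       /* count on every PCLK           */
    pwm->mr[0]  = period;                   /* PWMMR0 holds the PWM period   */
    pwm->mr[ch] = on_time;                  /* PWMMRch holds the ON duration */
    pwm->mcr    = (1u << 1);                /* assumed: reset counter on MR0 match */
    pwm->ler    = (1u << 0) | (1u << ch);   /* assumed: latch MR0 and MRch   */
    pwm->pcr    = (1u << (8u + ch));        /* PWMENAch = 1, PWMSELch = 0 (single edge) */
    pwm->tcr    = (1u << 0) | (1u << 3);    /* assumed: counter enable + PWM enable */
}
```

Calling pwm_single_edge_init(&regs, 2, 1000, 250) would describe a PWM2 output with a period of 1000 counts that is high for 250 counts, i.e. a 25% duty cycle.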
The UART serial communication protocol uses a defined frame structure for its data
bytes. The frame structure in asynchronous communication consists of:
START bit: A bit which indicates that serial communication has
started. It is always low.
STOP bit: This is usually one or two bits in length. It is sent after the data
bits packet to indicate the end of the frame. The stop bit is always logic high.
LPC2148 UART
LPC2148 has two inbuilt UARTs, UART0 and UART1. So, we can connect two
UART enabled devices (GSM module, GPS module, Bluetooth module etc.) with
LPC2148 at a time.
UART0 and UART1 are identical other than the fact that UART1 has a modem interface
included.
FEATURES OF UART0
FEATURES OF UART1
Standard modem interface signals included with flow control (auto-CTS/RTS) fully
supported in hardware
UART0:
UART1:
7. DCD1 (Input pin): Data Carrier Detect signal pin. Active low signal indicates if
the external modem has established a communication link with the UART1
and data may be exchanged.
8. RI1 (Input pin): Ring Indicator signal pin. Active low signal indicates that a
telephone ringing signal has been detected by the modem.
UART0 REGISTERS
UART1 can be used in a similar way by using the corresponding registers for UART1.
MULVAL should have a value in the range of 1 to 15 and DIVADDVAL a value in the
range of 0 to 15. If this is not ensured, the output of the fractional divider is undefined.
The value of the U0FDR should not be modified while transmitting/receiving
data. This may result in corruption of data.
It provides a status code that denotes the priority and source of a pending
interrupt.
It must be read before exiting the Interrupt Service Routine to clear the
interrupt.
Bit 0 - Interrupt Pending
0 = At least one interrupt is pending 1 = No interrupts pending
Bit 7 : TXEN
0 = Transmission disabled 1 = Transmission enabled
If this bit is cleared to 0 while a character is being sent, the transmission of
that character is completed, but no further characters are sent until this bit is
set again.
PROGRAMMING OF UART0
1. Initialization of UART0
Using U0LCR register, make DLAB = 1. Also, select 8-bit character length and
1 stop bit.
Set appropriate values in U0DLL and U0DLM depending on the PCLK value
and the baud rate desired. Fractional divider can also be used to get different
values of baud rate.
Example: PCLK = 15 MHz. For a baud rate of 9600, without using the fractional
divider register, from the baud rate formula we have
$$9600 = \frac{15000000}{16 \times (256 \times \mathrm{U0DLM} + \mathrm{U0DLL})} \times \frac{\mathrm{MulVal}}{\mathrm{MulVal} + \mathrm{DivAddVal}}$$
On reset, MulVal = 1 and DivAddVal = 0 in the Fractional Divider Register, so the
second factor is 1 and
$$256 \times \mathrm{U0DLM} + \mathrm{U0DLL} = \frac{15000000}{16 \times 9600} \approx 97.65$$
We can consider it to be 98 or 97. It will make the baud rate slightly less or
more than 9600. This small change is tolerable. We will consider 97. Since 97
is less than 256 and register values cannot contain fractions, we will take
U0DLM = 0. This will give U0DLL = 97.
2. Receiving character
Monitor the RDR bit in U0LSR register to see if valid data is available in
U0RBR register.
3. Transmitting character
Monitor the THRE bit in U0LSR register. When this bit becomes 1, it indicates
that U0THR register is empty and the transmission is completed.
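The three steps above can be sketched in C as shown below. This is a host-testable illustration under stated assumptions, not the routines used elsewhere in these notes: uart_regs_t is a hypothetical stand-in for the UART0 register block (on the real device U0RBR, U0THR and U0DLL share one address, selected by DLAB and by the access direction), the function names are made up for this sketch, and clearing DLAB after loading the divisor is an assumed final step of the initialization.

```c
#include <stdint.h>

#define LCR_8BIT_1STOP 0x03u        /* U0LCR: 8-bit characters, 1 stop bit   */
#define LCR_DLAB       (1u << 7)    /* U0LCR: divisor latch access bit       */
#define LSR_RDR        (1u << 0)    /* U0LSR: receiver data ready            */
#define LSR_THRE       (1u << 5)    /* U0LSR: transmit holding reg. empty    */

/* Hypothetical stand-in for the UART0 register block. */
typedef struct {
    volatile uint8_t rbr;   /* U0RBR */
    volatile uint8_t thr;   /* U0THR */
    volatile uint8_t dll;   /* U0DLL */
    volatile uint8_t dlm;   /* U0DLM */
    volatile uint8_t lcr;   /* U0LCR */
    volatile uint8_t lsr;   /* U0LSR */
} uart_regs_t;

/* Divisor 256*U0DLM + U0DLL for MulVal = 1, DivAddVal = 0 (truncated). */
uint32_t uart_divisor(uint32_t pclk_hz, uint32_t baud)
{
    return pclk_hz / (16u * baud);
}

/* Step 1: set DLAB, load the divisor, then clear DLAB (assumed). */
void uart_init(uart_regs_t *u, uint32_t pclk_hz, uint32_t baud)
{
    uint32_t div = uart_divisor(pclk_hz, baud);
    u->lcr = (uint8_t)(LCR_DLAB | LCR_8BIT_1STOP);
    u->dll = (uint8_t)(div & 0xFFu);
    u->dlm = (uint8_t)((div >> 8) & 0xFFu);
    u->lcr = (uint8_t)LCR_8BIT_1STOP;
}

/* Step 2: wait until RDR is set, then read the received character. */
uint8_t uart_rx_char(uart_regs_t *u)
{
    while ((u->lsr & LSR_RDR) == 0u) { /* busy-wait */ }
    return u->rbr;
}

/* Step 3: wait until THRE is set, then write the character to send. */
void uart_tx_char(uart_regs_t *u, uint8_t c)
{
    while ((u->lsr & LSR_THRE) == 0u) { /* busy-wait */ }
    u->thr = c;
}
```

With PCLK = 15 MHz and a baud rate of 9600, uart_divisor returns 97, so uart_init loads U0DLL = 97 and U0DLM = 0 as in the worked example.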
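int main(void)
{
    char receive;

    UART0_init();                        /* configure UART0 (baud rate, frame format) */
    while (1)
    {
        receive = UART0_RxChar();        /* wait for one received character */
        UART0_SendString("Received:");
        UART0_TxChar(receive);           /* echo the character back */
        UART0_SendString("\r\n");
    }
}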
This family enables a single processor solution for microcontroller, DSP & JAVA
applications, offering savings in chip area & complexity, power consumption & time to
market.
ARM9 – enhanced processors are well suited for applications requiring a mix
of DSP + Microcontroller performance
• MIPS/MHz: 1.1
• MMU/MPU: Present
MMU:
MPU:
Applications of ARM9
1. Consumer type: Smart phones, PDA, Set-Top box, Electronics Toys, Digital
Cameras, etc.
4. Embedded USB controllers, Bluetooth controllers, Medical scanners, etc.
1. ARM920T PROCESSOR
ARM9TDMI (core)
The ARM9TDMI processor core is a Harvard architecture device implemented using a
five-stage pipeline consisting of Fetch, Decode, Execute, Memory, and Write stages. It
can be provided as a standalone core that can be embedded into more complex
devices. The standalone core has a simple bus interface that allows you to design your
own caches and memory systems around it.
The ARM9TDMI family of microprocessors supports both the 32-bit ARM and 16-bit
Thumb instruction sets, allowing you to trade-off between high performance and high
code density.
The ARM920T processor supports the ARM debug architecture and includes logic to
assist in both hardware and software debug. The ARM920T processor also includes
support for coprocessors, exporting the instruction and data buses along with simple
handshaking signals.
The ARM920T interface to the rest of the system is over unified address and data
buses. This interface enables implementation of either an Advanced Microcontroller
Bus Architecture (AMBA) Advanced System Bus (ASB) or Advanced High-performance
Bus (AHB) bus scheme, either as a fully-compliant AMBA bus master, or as a slave for
production test. The ARM920T processor also has a TrackingICE mode which allows
an approach similar to a conventional ICE mode of operation.
The ARM920T processor incorporates the ARM9TDMI integer core, which implements
the ARM architecture v4T. It executes the ARM and Thumb instruction sets, and
includes EmbeddedICE JTAG software debug features.
The ARM920T processor also features an external coprocessor interface that allows
the attachment of a closely-coupled coprocessor on the same chip, for example, a
floating-point unit. Registers and operations provided by any coprocessors attached
to the external coprocessor interface can be accessed using appropriate
coprocessor instructions.
Memory accesses for instruction fetches and data loads and stores can be cached
or buffered.
The MMU page tables that reside in main memory describe the virtual to physical
address mapping, access permissions, and cache and write buffer configuration.
These are created by the operating system software and accessed automatically
by the ARM920T MMU hardware whenever an access causes a TLB miss.
The ARM920T has a Trace Interface Port that allows the use of Trace hardware and
tools for real-time tracing of instructions and data.
The ARM9TDMI processor core implements ARM architecture v4T, and executes the
ARM 32-bit instruction set and the compressed Thumb 16-bit instruction set.
ARM7TDMI    ARMv4T    Base updated     Address of instruction + 12
ARM9TDMI    ARMv4T    Base restored    Address of instruction + 12
The ARM9TDMI core implements the base restored Data Abort model. This
significantly simplifies the software Data Abort handler.
The base restored Data Abort model differs from the base updated Data Abort model
implemented by ARM7TDMI.
The difference in the Data Abort models affects only a very small section of
operating system code, the Data Abort handler. It does not affect user code. With
the base restored Data Abort model, when a Data Abort exception occurs during the
execution of a memory access instruction, the base register is always restored by
the processor hardware to the value the register contained before the instruction
was executed. This removes the requirement for the Data Abort handler to unwind
any base register update that might have been specified by the aborted
instruction.
All ARM processors implement the undefined instruction space as one of the entry
mechanisms for the undefined instruction exception.
ARMv4 and ARMv4T also introduce a number of instruction set extension spaces to
the ARM instruction set. These are:
standard ARMv4 MMU mapping sizes, domains, and access protection scheme
mapping sizes are 1MB (sections), 64KB (large pages), 4KB (small pages), and
1KB (tiny pages)
access permissions for large pages and small pages can be specified separately
for each quarter of the page (these quarters are called subpages)
Independent lockdown of instruction TLB and data TLB, using CP15 register 10.
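Since subpages are quarters of a page, the subpage size and the subpage that a given address falls into follow directly from the page sizes listed above. The sketch below only illustrates that arithmetic; the macro and function names are hypothetical and do not correspond to any MMU register or API.

```c
#include <stdint.h>

#define LARGE_PAGE_SIZE (64u * 1024u)   /* 64KB large page */
#define SMALL_PAGE_SIZE (4u * 1024u)    /* 4KB small page  */

/* A subpage is one quarter of the page. */
uint32_t subpage_size(uint32_t page_size)
{
    return page_size / 4u;
}

/* Index (0..3) of the subpage that vaddr falls into.
 * page_size must be a power of two, as both page sizes above are. */
unsigned subpage_index(uint32_t vaddr, uint32_t page_size)
{
    uint32_t offset = vaddr & (page_size - 1u);   /* offset inside the page */
    return (unsigned)(offset / subpage_size(page_size));
}
```

A large page therefore has 16KB subpages and a small page has 1KB subpages; an address at offset 0xC00 inside a small page lies in subpage 3.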
Architecture                    Armv7-M
Bus Interface                   3x AMBA AHB-Lite interface (Harvard bus architecture),
                                AMBA ATB interface for CoreSight debug components
ISA Support                     Thumb/Thumb-2 subset
Pipeline                        Three-stage
Memory Protection               Optional 8 region MPU with sub regions and background region
Bit Manipulation                Integrated Bit-field Processing Instructions and Bus Level Bit Banding
Interrupts                      Non-maskable Interrupt (NMI) + 1 to 240 physical interrupts
Interrupt Priority Levels       8 to 256 priority levels
Wake-up Interrupt Controller    Optional
The 32-bit Cortex-M3 is designed for low power operation, enabling longer battery life,
which is especially critical in portable products including wireless networking applications.
Cortex-M3 supports 16- and 32-bit instructions available in the Thumb-2 instruction
set. Both can be mixed without extra complexity and without reducing the Cortex-M3
performance. Hardware divide instructions and a number of multiply instructions give
EFM32 users high data-crunching throughput.
The ARM Cortex-M3 3-stage pipeline includes instruction fetch, instruction decode and
instruction execution. Cortex-M3 also has separate buses for instructions and data.
The Harvard architecture reduces bottlenecks common to shared data- and instruction
buses.
Quickly Servicing Critical Tasks and Interrupts
From the low energy modes,
EFM32's Cortex-M3 is active within 2 µs and delivers 1.25 DMIPS/MHz on the
Dhrystone 2.1 Benchmark.
The NVIC (Nested Vectored Interrupt Controller) is an integral part of the Cortex-M3
processor and ensures outstanding interrupt handling abilities. It is possible to configure
up to 240 physical interrupts with 1-256 levels of priority, and Non-Maskable
Interrupts further increase interrupt handling. For embedded systems this enhanced
determinism makes it possible to handle critical tasks in a known number of cycles.
The Cortex-M3 has a small footprint which reduces system cost. High 32-bit
performance reduces an application's active periods, the periods where the CPU is
handling data. Reducing the active periods increases the application's battery lifetime
significantly, and the EFM32 can spend most of the time in the efficient low energy
modes.
The ARM Cortex-M3 processor has been designed 'from the ground up' to provide
optimal performance and power consumption within a minimal memory system.
To achieve this the core executes only the Thumb-2 instruction set.
The design is based on a 3-stage pipeline Harvard architecture that maximizes memory
utilization through the support of unaligned data storage, and single cycle atomic bit
manipulation.
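The atomic bit manipulation mentioned above is provided on the Cortex-M3 through bit banding, where each bit of a bit-band region is mirrored as one word in an alias region. The address arithmetic can be sketched as below. The base addresses are those of the SRAM bit-band region in the Armv7-M memory map; the macro and function names are hypothetical, and a particular device should be checked for whether it implements bit banding at all.

```c
#include <stdint.h>

#define SRAM_BITBAND_BASE 0x20000000u   /* start of SRAM bit-band region */
#define SRAM_ALIAS_BASE   0x22000000u   /* start of SRAM bit-band alias  */

/* Alias word address for bit 'bit' (0..7) of the byte at byte_addr.
 * Each byte in the bit-band region spans 32 bytes of alias space,
 * and each bit occupies one 4-byte word within that span. */
uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    return SRAM_ALIAS_BASE
         + ((byte_addr - SRAM_BITBAND_BASE) * 32u)
         + ((uint32_t)bit * 4u);
}
```

Writing 1 or 0 to the word at the returned address sets or clears the single target bit without a read-modify-write sequence in software.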
Figure 2.42 Cortex M3 architecture
The highly revised architecture implements hardware divide and single-cycle multiply.
The ARM Cortex-M3 uses 33k gates for the processing core and 60k gates total,
including many closed system peripherals.
The ARM Cortex-M3 processor reduces the number of pins required for debug from
five to one, by implementing a Single Wire Debug.
For system trace, the processor integrates an optional ETM alongside data watch points
that can be configured to trigger on specific system events.
To enable simple and cost-effective profiling of these system events, a SWV (Serial Wire
Viewer) can export streams of standard ASCII data through a single pin.
Flash Patch technology offers device and system developers the ability to patch errors
in code from ROM to SRAM or Flash during both debug and run-time.
The Cortex-M3 processor integrates the core with a configurable interrupt controller
to improve interrupt processing performance. In its standard implementation the NVIC
(Nested Vectored Interrupt Controller) supplies an NMI (Non-Maskable Interrupt) plus
32 general purpose physical interrupts with 8 levels of pre-emption priority. However,
through simple synthesis choices the controller can be configured down to a single
physical interrupt or up to 240.
This means that no assembler stubs are required to handle the movement of registers.
Moving between active and pending interrupts has been simplified through the use
of Tail-Chaining technology to replace serial stack Pop and Push actions that normally
take over 30 clock cycles with a simple six cycle instruction fetch.
To enhance low power designs the NVIC integrates three sleep modes, including a
Deep Sleep function that may be exported to other system components to enable the
entire device to be rapidly powered down.
The ARM Cortex-M3 processor has two optional components, the MPU (Memory
Protection Unit) and the ETM (Embedded Trace Macrocell). The fine grain MPU design
enables applications to implement security privilege levels, separating code, data and
stack on a task-by-task basis.