Embedded Computing System
• Any device that includes a programmable computer but is not itself intended to be a general-purpose computer.
• A microprocessor is a single-chip CPU. VLSI has allowed us to put a complete CPU on a single chip.
• An 8-bit microcontroller is designed for low-cost applications and includes on-board memory and I/O devices.
E.g.: Intel MCS 51 series like 8051.
• A 16-bit microcontroller is used for longer word lengths or off-chip I/O and memory. E.g.: Intel MCS 96 series
like P8096.
• A 32-bit RISC microprocessor offers very high performance for computation-intensive applications.
RISC and CISC microprocessors compared:
RISC (Reduced Instruction Set Computer)
• Has a few simple instructions.
• Operates faster than CISC and needs less hardware.
• E.g.: Microchip PIC series like PIC16F877A.
CISC (Complex Instruction Set Computer)
• Has a large number of instructions.
• Allows flexible programming with short, effective programs.
• E.g.: Intel MCS 51 series like 8051.
Major Computer Architectures
[Figure: block diagrams of the Harvard architecture and the Von Neumann architecture. Picture courtesy: Google Images]
EMBEDDED SYSTEM DESIGN PROCESS
A design methodology is important for three reasons:
• It lets us check that everything needed has been done, such as optimizing performance or performing functional tests.
• It allows us to develop computer-aided design tools.
• It makes it much easier for members of a design team to communicate.
During design we need to follow five steps:
• Requirements
• Specifications
• Architecture
• Components
• System Integration
EMBEDDED SYSTEM DESIGN PROCESS (contd.)
Requirements:
• Requirements may be functional or nonfunctional.
• One way to handle the user interface portion of a system’s requirements is to build a mock-up.
• A mock-up may use canned data to simulate functionality in a restricted demonstration.
• Physical, nonfunctional models of devices can also give customers a better idea of characteristics such as size and weight.
• Requirements analysis for big systems can be complex and time consuming.
Specifications:
• The specification is more precise: it serves as the contract between the customer and the architects.
• It must accurately reflect the customer’s requirements.
• A good specification is essential to creating working systems with a minimum of designer effort.
• The specification should be understandable enough that someone can verify that it meets the system requirements.
• Designers can run into several different types of problems caused by unclear specifications.
EMBEDDED SYSTEM DESIGN PROCESS (contd.)
Architecture Design:
• The architecture is a plan for the overall structure of the system that will be used later to design the components that make up the architecture.
• Architectural descriptions must be designed to satisfy both functional and nonfunctional requirements.
• When creating the hardware and software architectures, one may pay attention to functional requirements first and then to nonfunctional ones.
• Estimating all nonfunctional constraints during the architecture phase is crucial.
Components and System Integration:
• The component design effort builds components in conformance with the architecture and specification.
• Some of the components will be ready-made and some need to be built.
• When creating embedded software modules, expertise is needed to ensure that the system runs properly.
• System integration is difficult because it usually uncovers problems.
• Careful attention to inserting appropriate debugging facilities is needed.
ARM (Advanced RISC Machines) Processors
• ARM is a family of RISC architectures.
• ARM does not manufacture its own VLSI devices; rather, it licenses its architecture to companies who either
manufacture the CPU itself or integrate the ARM processor into a larger system.
• Different versions of the ARM architecture are identified by different numbers. ARM7 is a Von Neumann
architecture machine, while ARM9 uses a Harvard architecture.
• However, this difference is invisible to the assembly language programmer, except for possible performance
differences.
• The ARM architecture supports two basic types of data: the standard ARM word, which is 32 bits long, and the 8-bit byte. A word may be divided into four 8-bit bytes.
• ARM7 allows addresses up to 32 bits long.
• The ARM processor can be configured at power-up to address the bytes in a word in either little-endian mode or
big-endian mode.
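The two byte orders can be illustrated on a host machine. The following is a minimal sketch, not ARM code: it uses Python's standard struct module to lay out one 32-bit word in both modes, and the value 0x12345678 is an arbitrary example.

```python
import struct

WORD = 0x12345678  # arbitrary 32-bit example value (an assumption, not from the slides)

# Pack the same 32-bit word using each byte order.
little = struct.pack("<I", WORD)  # little-endian: least significant byte at the lowest address
big = struct.pack(">I", WORD)     # big-endian: most significant byte at the lowest address


def byte_at(memory: bytes, offset: int) -> int:
    """Return the byte stored at the given offset within the word."""
    return memory[offset]


print("little-endian:", [hex(b) for b in little])
print("big-endian:   ", [hex(b) for b in big])
```

In both cases the word holds the same value; only the order in which its four bytes are addressed differs.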
ARM (Advanced RISC Machines) Processors (contd.)
• ARM processors are used both as high-performance microprocessors (in smartphones and laptops) and as low-power embedded microcontrollers (in embedded systems, IoT devices and automation).
• The ARM Cortex-M series is widely used as a 32-bit microcontroller.
• The ARM Cortex-A series is used as a microprocessor in systems that run complex operating systems such as Android.
• ARM processors can be found in Apple Silicon laptops and Qualcomm Snapdragon X Elite laptops.
• Intel Core i7 and i9 are high-performance microprocessors, NOT microcontrollers.
• Intel uses its own x86 (32-bit) and x86-64, also called x64 (64-bit), CPU architectures.
• Intel does not follow the ARM processor architecture, while several big companies like Apple, Qualcomm, Amazon (AWS), Samsung and Nvidia use ARM processor architectures.
Data Operations in ARM Processors
• ARM is a load-store architecture.
• Data operands must first be loaded into CPU registers, and results must then be stored back to main memory.
• Therefore, arithmetic and logical operations cannot be performed directly on memory locations.
• ARM has 16 general-purpose registers, r0 through r15; r15 also serves as the program counter, so it should not be overwritten for use in data operations.
• Another basic register in the programming model is the current program status register (CPSR).
• This register is updated by arithmetic, logical, or shifting operations (in ARM assembly, a data-processing instruction updates the flags when it carries the S suffix; compare instructions always update them).
• The top four bits of the CPSR hold useful information about the result of that operation: Negative (N), Zero (Z), Carry (C), and Overflow (V).
• The base-plus-offset addressing mode has two other variations: auto-indexing and post-indexing.
• Auto-indexing updates the base register, whereas post-indexing does not perform the offset calculation until after the fetch has been performed.
LDR r0,[r1,#20]! (auto-indexing example)
LDR r0,[r1],#20 (post-indexing example)
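The difference between the two indexing variations is easiest to see by tracking the base register. Below is a small sketch that models the two LDR forms on a host machine; the register file and memory are plain Python dictionaries, and the addresses and stored values are made-up examples.

```python
def ldr_auto_index(regs, mem, rd, rn, offset):
    """Model LDR rd,[rn,#offset]!: add the offset, fetch, then write the address back to rn."""
    address = regs[rn] + offset
    regs[rd] = mem[address]
    regs[rn] = address


def ldr_post_index(regs, mem, rd, rn, offset):
    """Model LDR rd,[rn],#offset: fetch from the unmodified base, then add the offset to rn."""
    regs[rd] = mem[regs[rn]]
    regs[rn] = regs[rn] + offset


# Hypothetical memory contents: one word at the base address, one at base + 20.
mem = {0x100: 11, 0x114: 22}

auto_regs = {"r0": 0, "r1": 0x100}
ldr_auto_index(auto_regs, mem, "r0", "r1", 20)

post_regs = {"r0": 0, "r1": 0x100}
ldr_post_index(post_regs, mem, "r0", "r1", 20)

print("auto-indexing:", auto_regs)
print("post-indexing:", post_regs)
```

Both forms leave r1 pointing 20 bytes past the original base, but they load different words: auto-indexing fetches from the updated address, post-indexing from the original one.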
Instructions in ARM Processors
Some other instructions (Compare and Data transfer)
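A compare instruction is a convenient way to see the CPSR condition flags at work: CMP subtracts its second operand from the first, discards the result, and sets N, Z, C and V. The sketch below models that behavior for 32-bit values on a host machine; the function name cmp_flags is hypothetical. On ARM, the carry flag after a subtraction is 1 when no borrow occurred.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a: int, b: int) -> dict:
    """Model CMP a, b: compute a - b on 32-bit values and return the NZCV flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                 # Negative: sign bit of the result
    z = int(result == 0)             # Zero: result is exactly zero
    c = int(a >= b)                  # Carry: 1 when the subtraction needed no borrow
    # Overflow: operands have different signs and the result's sign differs from a's.
    v = (((a ^ b) & (a ^ result)) >> 31) & 1
    return {"N": n, "Z": z, "C": c, "V": v}


print(cmp_flags(5, 5))
print(cmp_flags(3, 5))
print(cmp_flags(0x80000000, 1))
```

Conditional instructions such as BEQ or BLT then test these flags rather than repeating the comparison.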
TI C55x (DSP) Processors
• The Texas Instruments C55x is a family of digital signal processors, e.g. the TMS320C55x.
• It is designed for relatively high-performance signal processing.
• Like many other DSPs, it is an accumulator architecture, meaning that many arithmetic operations take the accumulator as one operand and store the result back in the accumulator (accumulator = operand + accumulator).
• Accumulator-oriented instructions are well suited to digital signal processing operations such as sums of products.
• Assembler mnemonics are case-insensitive. Instruction mnemonics are formed by combining a root with prefixes and/or suffixes.
• The C55x supports several data types, including the 16-bit word and the 32-bit longword.
• Instructions are byte-addressable; some instructions also operate on individually addressed bits in registers.
• It has memory-mapped registers, including a stack pointer, data pointer, single-repeat register and repeat counter.
• Addressing modes are Direct, Indirect and Absolute.
• The TI C55x provides two types of subroutine returns: fast-return and slow-return.
• These differ in where they store the return address and loop context.
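The accumulator style of computation can be sketched as a sum of products, the core of many filters. This is an illustrative host-side model in Python, not C55x code; the function name and the sample values are assumptions.

```python
def multiply_accumulate(coefficients, samples):
    """Compute a1*x1 + a2*x2 + ... by repeatedly updating a single accumulator."""
    acc = 0
    for a, x in zip(coefficients, samples):
        # Each step has the form: accumulator = operand + accumulator
        acc = acc + a * x
    return acc


# Made-up coefficients and samples for illustration.
result = multiply_accumulate([1, 2, 3], [4, 5, 6])
print(result)
```

Because one operand and the destination are always the accumulator, the instruction does not need to name them, which keeps the inner loop of a filter compact.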
TMS320C55x Address Space Architecture
CPU Programming: Input and Output
• The CPU communicates with the device by reading and writing the device’s registers.
• Data registers hold values that are treated as data by the device, such as the data read or written by a disk.
• Status registers provide information about the device’s operation, such as whether the current transaction has completed.
• A UART, for example, includes one 8-bit register that buffers characters between the UART and the CPU bus.
• The data bits are sent as high and low voltages at a uniform rate known as the baud rate.
• Microprocessors can provide programming support for input and output in two ways: I/O instructions and memory-mapped I/O.
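The role of the baud rate can be made concrete with a little arithmetic. The sketch below assumes the common frame of one start bit (low), eight data bits sent least significant bit first, and one stop bit (high); that framing and the 9600 baud figure are assumptions for illustration, not taken from the slides.

```python
def frame_bits(byte: int) -> list:
    """Return the line levels for one character: start bit, 8 data bits (LSB first), stop bit."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def char_time_seconds(baud_rate: int, bits_per_frame: int = 10) -> float:
    """Time to send one framed character when each bit lasts 1/baud_rate seconds."""
    return bits_per_frame / baud_rate


bits = frame_bits(ord("A"))      # "A" is 0x41
seconds = char_time_seconds(9600)
print(bits)
print(f"{seconds * 1000:.3f} ms per character at 9600 baud")
```

Under these assumptions a character costs ten bit times, so the character rate is one tenth of the baud rate.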
Input and Output Primitives
• Some architectures, such as the Intel x86, provide special instructions (in and out on the x86) for input and output. These instructions provide a separate address space for I/O devices.
• The most common way to implement I/O is by memory mapping.
• Even CPUs that provide I/O instructions can also implement memory-mapped I/O.
• Memory-mapped I/O provides addresses for the registers in each I/O device.
• Busy-wait I/O is the simplest way to use devices in a program.
• Devices are typically slower than the CPU, so when the CPU performs multiple operations on a single device, it must wait for one operation to complete before starting the next one.
• Asking an I/O device whether it is finished by reading its status register is often called polling.
• Busy-wait I/O is extremely inefficient: the CPU does nothing but test the device status while the I/O transaction is in progress.
• The interrupt mechanism allows devices to signal the CPU and to force execution of a particular piece of code.
• When an interrupt occurs, the program counter’s value is changed to point to an interrupt handler routine.
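Busy-wait I/O and its cost can be sketched with a simulated device. Everything here is hypothetical: the SimulatedDevice class stands in for a memory-mapped peripheral with one status register and one data register, and it is assumed to stay busy for three status reads after each write.

```python
class SimulatedDevice:
    """Hypothetical output device with a status register and a data register."""

    BUSY_POLLS = 3  # assumed number of status reads the device stays busy after a write

    def __init__(self):
        self.received = []
        self._busy = 0

    def read_status(self) -> int:
        """Return 1 while busy, 0 when ready; each read advances simulated time."""
        if self._busy > 0:
            self._busy -= 1
            return 1
        return 0

    def write_data(self, value: int) -> None:
        """Accept one value and become busy for a while."""
        self.received.append(value)
        self._busy = self.BUSY_POLLS


def send_string(device, text: str) -> int:
    """Send text one character at a time using busy-wait I/O; return the polls spent waiting."""
    wasted_polls = 0
    for ch in text:
        while device.read_status() != 0:  # polling: the CPU does nothing useful here
            wasted_polls += 1
        device.write_data(ord(ch))
    return wasted_polls


device = SimulatedDevice()
polls = send_string(device, "hi")
print(device.received, "wasted polls:", polls)
```

The wasted polls are the time an interrupt-driven design would give back to other work, since the device would signal the CPU only when it is ready.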
Interrupt Mechanism
Supervisor Mode
• Complex systems are often implemented as several programs that run under the command of an operating system, so it is desirable to provide hardware checks to ensure that the programs do not interfere with each other.
• In such cases it is often useful to have a supervisor mode provided by the CPU.
• Control of the memory management unit (MMU) is typically reserved for supervisor mode.
• Not all CPUs have a supervisor mode.
• Many DSPs, including the C55x, do not provide supervisor modes. The ARM, however, does have such
a mode.
• ARM uses the SWI instruction to enter supervisor mode.
• SWI causes the CPU to go into supervisor mode and sets the PC to 0x08.
• In supervisor mode, the bottom 5 bits of the CPSR (the mode bits) are set to 10011 to indicate that the CPU is in supervisor mode.
• The old value of the CPSR just before the SWI is stored in a register called the saved program status
register (SPSR).
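The state changes caused by SWI can be summarized in a short host-side model. This is a simplified sketch: the CPU is a plain dictionary, the pc field is assumed to hold the address of the SWI instruction itself, and other effects such as interrupt masking are left out. The mode encodings are the ARM architecture's values for the 5-bit mode field (10000 for user mode, 10011 for supervisor mode).

```python
MODE_MASK = 0x1F            # bottom 5 bits of the CPSR select the processor mode
MODE_USER = 0b10000
MODE_SUPERVISOR = 0b10011
SWI_VECTOR = 0x08           # address the PC is set to when SWI executes


def take_swi(cpu: dict) -> None:
    """Model SWI entry: save the CPSR, switch to supervisor mode, jump to the vector."""
    cpu["spsr_svc"] = cpu["cpsr"]     # old CPSR is kept in the saved program status register
    cpu["lr_svc"] = cpu["pc"] + 4     # return address: the instruction after the SWI
    cpu["cpsr"] = (cpu["cpsr"] & ~MODE_MASK) | MODE_SUPERVISOR
    cpu["pc"] = SWI_VECTOR


# Hypothetical starting state: user mode with the Z and C flags set, SWI located at 0x8000.
cpu = {"cpsr": 0x60000000 | MODE_USER, "pc": 0x8000}
take_swi(cpu)
print({name: hex(value) for name, value in cpu.items()})
```

Returning from the handler restores the CPSR from the SPSR and the PC from the saved return address, which puts the CPU back in its previous mode.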
Exceptions and Traps
• An exception is an internally detected error.
• The exception mechanism provides a way for the program to react to such unexpected events.
• As interrupts can be seen as an extension of the subroutine mechanism, exceptions are generally implemented as
a variation of an interrupt.
• However, exceptions are generated internally and in general require both prioritization and vectoring.
• Priority of exceptions is usually fixed by the CPU architecture.
• A trap, also known as a software interrupt, is an instruction that explicitly generates an exception condition.
• Most common use of a trap is to enter supervisor mode.
• Entry into supervisor mode must be controlled to maintain security.
• The ARM provides the SWI instruction for software interrupts.
• This instruction causes the CPU to enter supervisor mode.
Co-processors
• CPU architects often want to provide flexibility in which features are implemented in the CPU.
• The way to provide such flexibility at the instruction set level is to allow co-processors.
• To support co-processors, certain opcodes must be reserved in the instruction set for co-processor operations.
• A co-processor must be tightly coupled to the CPU.
• Most architectures use illegal instruction traps to handle the case where a co-processor instruction is executed with no co-processor installed.
• The trap handler can detect the co-processor instruction and, for example, execute it in software on the main CPU.
• Emulating co-processor instructions in software is slower but provides compatibility.
• E.g.: Floating-point arithmetic was introduced into the Intel architecture by providing separate chips that
implemented the floating-point instructions.
• The ARM architecture provides support for up to 16 co-processors.
• Here co-processors can perform load and store operations on their own registers.
• They can also move data between the co-processor registers and main ARM registers.
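The fallback from hardware co-processor to software emulation can be sketched as follows. This is a toy host-side model, not ARM code: the Cpu class, the FADD opcode name and the emulation table are all hypothetical.

```python
def fadd_software(a: float, b: float) -> float:
    """Software emulation of a hypothetical floating-point add instruction."""
    return a + b


class Cpu:
    """Toy CPU that forwards reserved opcodes to a co-processor when one is installed."""

    def __init__(self, coprocessor=None):
        self.coprocessor = coprocessor               # dict of opcode -> callable, or None
        self.emulation = {"FADD": fadd_software}     # routines used by the trap handler
        self.traps_taken = 0

    def execute(self, opcode: str, a: float, b: float) -> float:
        if self.coprocessor is not None and opcode in self.coprocessor:
            return self.coprocessor[opcode](a, b)    # handled by the co-processor
        return self.illegal_instruction_trap(opcode, a, b)

    def illegal_instruction_trap(self, opcode: str, a: float, b: float) -> float:
        """Trap handler: detect the co-processor instruction and emulate it in software."""
        self.traps_taken += 1
        handler = self.emulation.get(opcode)
        if handler is None:
            raise ValueError(f"unknown instruction: {opcode}")
        return handler(a, b)


with_fpu = Cpu(coprocessor={"FADD": lambda a, b: a + b})
without_fpu = Cpu()
print(with_fpu.execute("FADD", 1.5, 2.0), with_fpu.traps_taken)
print(without_fpu.execute("FADD", 1.5, 2.0), without_fpu.traps_taken)
```

Both machines produce the same result, so the same program runs on either; the one without the co-processor simply pays for a trap on every such instruction.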
Memory System Architecture
• Memory management units (MMUs) perform address
translations that provide a larger virtual memory space in a
small physical memory.
• A cache is a small, fast memory that holds copies of some of
the contents of main memory.
• Caches are widely used to speed up memory system
performance.
• A cache controller mediates between the CPU and the
memory system made up of the cache and main memory.
• The cache controller sends a memory request to the cache
and main memory.
• The simplest way to implement a cache is a direct-mapped
cache.
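A direct-mapped cache can be sketched in a few lines. The model below is a simplification under stated assumptions: each cache line holds a single word, the index is the address modulo the number of lines, the tag is the remaining upper part of the address, and main memory is a Python list.

```python
class DirectMappedCache:
    """Direct-mapped cache model: each address maps to exactly one cache line."""

    def __init__(self, num_lines: int, memory: list):
        self.num_lines = num_lines
        self.memory = memory
        self.tags = [None] * num_lines
        self.data = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        index = address % self.num_lines   # which line the address maps to
        tag = address // self.num_lines    # identifies which address occupies that line
        if self.tags[index] == tag:
            self.hits += 1                 # requested location is already in the cache
        else:
            self.misses += 1               # fetch from main memory and replace the line
            self.tags[index] = tag
            self.data[index] = self.memory[address]
        return self.data[index]


memory = list(range(100, 132))             # made-up contents: memory[a] == 100 + a
cache = DirectMappedCache(num_lines=4, memory=memory)
values = [cache.read(a) for a in (0, 0, 4, 0)]
print(values, "hits:", cache.hits, "misses:", cache.misses)
```

Addresses 0 and 4 map to the same line in this four-line cache, so alternating between them evicts the line each time. Such conflict misses are the price of the direct-mapped design's simplicity.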
CPU Pipelining
• Modern CPUs are designed as pipelined machines in which several instructions are executed in parallel.
• The ARM7 has a three-stage pipeline:
• Fetch: the instruction is fetched from memory.
• Decode: the instruction’s opcode and operands are decoded to determine what function to perform.
• Execute: the decoded instruction is executed.
• Each of these operations requires one clock cycle for typical instructions. Thus, a normal instruction requires
three clock cycles to completely execute, known as the latency.
• The DSP C55x includes a seven-stage pipeline: Fetch, Decode, Address, Access 1, Access 2, Read stage and
Execute.
• RISC machines are designed to keep the pipeline busy.
• CISC machines may display a wide variation in instruction timing.
• Pipelined RISC machines typically have more regular timing characteristics.
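The benefit of pipelining shows up in simple cycle counts. The sketch below assumes an ideal pipeline with one cycle per stage and no stalls or branches, which real programs do not fully achieve; the function names are hypothetical.

```python
def pipeline_cycles(num_instructions: int, num_stages: int = 3) -> int:
    """Cycles to finish a sequence on an ideal pipeline with no stalls."""
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles (the latency);
    # each later instruction completes one cycle after the previous one.
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, num_stages: int = 3) -> int:
    """Cycles needed if each instruction must finish before the next one starts."""
    return num_instructions * num_stages


for n in (1, 10):
    print(n, "instructions:", pipeline_cycles(n), "cycles pipelined,",
          unpipelined_cycles(n), "cycles unpipelined")
print("10 instructions, 7 stages:", pipeline_cycles(10, num_stages=7), "cycles")
```

The latency of a single instruction is unchanged by pipelining; what improves is throughput, which approaches one instruction per cycle as the sequence gets longer.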