ASSEMBLY LANGUAGE FOR X86
PROCESSORS 6TH EDITION
Kip Irvine
Chapter 2: x86 Processor Architecture
Slides prepared by the author
Revision date: 2/15/2010
(c) Pearson Education, 2010. All rights reserved. You may modify and copy this slide
show for your personal use, or for use in the classroom, as long as this copyright
statement, the author's name, and the title are not changed.
Basic Microcomputer Design
Central Processor Unit:
clock synchronizes CPU operations
control unit (CU) coordinates sequence of execution steps
ALU performs arithmetic and logic operations
[Figure: block diagram of a microcomputer: the CPU (registers, ALU, CU, clock), the memory storage unit, and I/O devices #1 and #2, all connected by the data bus, control bus, and address bus]
Bus: transfers data between different parts of the computer
Three buses: data bus, control bus, and address bus
CPU - an inside view
To put everything together and understand how a CPU works,
watch this video on The Central Processing Unit (CPU)
Clock
synchronizes all CPU and BUS operations
clock cycle measures time of a single operation
clock is used to trigger events
clock cycle is depicted as the time between one falling edge and the next
Clock cycle duration = 1/(clock speed in Hz)
Ex: if speed is 1GHz then duration = 1 nanosecond
Instruction execution: takes from 1 to 50+ clock cycles
Memory access may have empty clock cycles called wait states
[Figure: clock waveform, with one clock cycle marked]
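The cycle-duration formula above can be checked with a small sketch. The helper names `cycle_time_ns` and `instruction_time_ns` are hypothetical, introduced only for illustration; the second one simply multiplies a cycle count by the cycle duration and ignores wait states.

```python
def cycle_time_ns(clock_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds: 1 / (clock speed in Hz)."""
    return 1e9 / clock_hz

def instruction_time_ns(cycles: int, clock_hz: float) -> float:
    """Time for an instruction that needs `cycles` clock cycles (no wait states)."""
    return cycles * cycle_time_ns(clock_hz)

print(cycle_time_ns(1e9))            # 1 GHz clock -> 1.0 ns per cycle
print(instruction_time_ns(50, 2e9))  # 50-cycle instruction at 2 GHz -> 25.0 ns
```

Doubling the clock speed halves the cycle duration, so the same 50-cycle instruction would take 50 ns at 1 GHz.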
Instruction Execution Cycle
Loop:
Fetch next instruction then increment IP (the Instruction Pointer)
Decode the instruction
If memory operand needed then
◼ Fetch operand’s value from memory
Execute the instruction
◼ Update status flags such as Carry or Overflow
If result is memory operand then
◼ Store output to memory
Continue loop
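The loop above can be sketched as a tiny interpreter. The instruction set here (LOAD, ADD, STORE, HALT, one 8-bit accumulator, and only the Zero and Carry flags) is hypothetical and invented for illustration; it is not how real x86 instructions are encoded. Program and data share one memory, as in the stored-program model.

```python
def run(memory):
    """Fetch-decode-execute loop for a hypothetical toy machine."""
    ip = 0                        # instruction pointer
    acc = 0                       # single 8-bit accumulator register
    flags = {"ZF": 0, "CF": 0}    # subset of status flags
    while True:
        instr = memory[ip]        # fetch next instruction...
        ip += 1                   # ...then increment IP
        opcode, operand = instr   # decode
        if opcode == "HALT":
            return acc, flags
        if opcode in ("LOAD", "ADD"):
            value = memory[operand]          # fetch operand's value from memory
        if opcode == "LOAD":                 # execute
            acc = value
        elif opcode == "ADD":
            total = acc + value
            flags["CF"] = int(total > 0xFF)  # update status flags
            acc = total & 0xFF
            flags["ZF"] = int(acc == 0)
        elif opcode == "STORE":
            memory[operand] = acc            # result is a memory operand: store it
        else:
            raise ValueError(f"unknown opcode {opcode}")

program = [
    ("LOAD", 5),    # acc <- memory[5]
    ("ADD", 6),     # acc <- acc + memory[6]
    ("STORE", 7),   # memory[7] <- acc
    ("HALT", None),
    None,           # unused
    200,            # data at address 5
    100,            # data at address 6
    0,              # result goes to address 7
]
acc, flags = run(program)
```

Because 200 + 100 does not fit in 8 bits, the sum wraps to 44 and the Carry flag is set, the same kind of flag update the Execute step describes.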
Instruction Execution Cycle
Fetch → Decode → Fetch operands → Execute → Store output → (back to Fetch)
How does a computer go from a set of stored
instructions to running them?
✓ For more detail on the Instruction Cycle, see Fetch, Decode
and Execute. This is optional reading.
Reading from memory
1. Place the address of the value you want to read on the
address bus.
2. Assert (change the value of) the processor’s RD (read) pin.
3. Wait one clock cycle for the memory chips to respond.
4. Copy the data from the data bus into the destination operand.
Each of the above steps generally needs a single clock cycle
Reading from memory therefore takes about 4 clock cycles
This is slower than accessing CPU registers
Cache memory was introduced to hold recently used instructions and data in high-speed memory
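The benefit of a cache can be sketched by counting cycles for a sequence of reads. The 4-cycle miss cost comes from the read steps above; the 1-cycle hit cost and the least-recently-used (LRU) replacement policy are assumptions chosen for illustration, not properties of any particular processor.

```python
from collections import OrderedDict

MEMORY_READ_CYCLES = 4  # from the four-step memory read above
CACHE_HIT_CYCLES = 1    # assumed for illustration

def total_read_cycles(addresses, cache_size):
    """Count clock cycles for a sequence of reads through a small LRU cache (a sketch)."""
    cache = OrderedDict()   # cached addresses, least recently used first
    cycles = 0
    for addr in addresses:
        if addr in cache:
            cycles += CACHE_HIT_CYCLES
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cycles += MEMORY_READ_CYCLES
            cache[addr] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used address
    return cycles
```

Re-reading recently used addresses is where the savings come from: `total_read_cycles([0, 4, 0, 4], 2)` pays the full memory cost only for the first two reads.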
Why is cache memory faster than
conventional RAM?
For more detail on the computer's memory hierarchy,
see the How Stuff Works pages on computer memory.
This is optional reading.
Loading and Executing a Program
Steps:
OS searches for the program’s filename
If found, OS retrieves information about the file: file size and physical location
OS loads the program into the next available memory location
◼ Allocates a block of memory
◼ Enters information about the program’s size and location in the descriptor table
OS executes the program’s first machine instruction
◼ The program becomes a process
The process runs by itself. The OS tracks process execution and supplies system resources
When the process ends, it is removed from memory
In general, a program loader loads the program into memory before it can run. Then the OS points the CPU to the program’s entry point.
The Platform We Will Use
Assembly language and machine language are processor specific
We will write code for Intel’s x86
IA-32 family: Intel 80386, 486, … Pentium, …
The assembler places its machine code into an object file, whose format is OS-specific
Our code will run (only) on Windows
◼ It will crash on MS-DOS
Our programs will be Win32 console applications
◼ These are programs for which all I/O operations are character-based
◼ They run in an MS-DOS box (console window), but they are not DOS programs (they do not use DOS calls)
“Getting started with Assembly”: step-by-step instructions for the lab setup.
The Intel X86 Family
Pentium ...
80486
80386
80286
8086
The x86 instruction set is backward compatible with that of each of its
predecessors
New instructions are introduced with each new processor
IA-32 Processor Architecture
(Modes of Operation)
Protected mode:
programs are given separate memory areas called segments
Processor prevents programs from referencing memory outside their
assigned segments
native mode (Windows, Linux); supported by all x86 processors except the 8086
Real-address mode: 8086
implements the programming environment of an early Intel processor, plus extra features
such as the ability to switch to other modes
useful if a program requires direct access to system memory
native mode of MS-DOS; supported by all x86 processors
IA-32 Processor Architecture
(Modes of Operation)
❑ Virtual-8086 mode
❑ Protected, but can directly execute real-address mode software
❑ Such as MS-DOS programs, in a safe environment
❑ Each program runs as if it had its own 8086 computer
❑ A program crash does not affect other programs
❑ System management mode: lets the processor be customized for a particular system
❑ Power management, system security, diagnostics
Addressable Memory
Protected mode
4 GB
32-bit address
Real-address and Virtual-8086 modes
1 MB space
20-bit address
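The memory sizes above follow directly from the address widths: an n-bit address can select 2^n distinct bytes. The helper name `addressable_bytes` is hypothetical.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an n-bit address can select: 2**n."""
    return 1 << address_bits

print(addressable_bytes(32) // 2**30, "GB")  # protected mode: 4 GB
print(addressable_bytes(20) // 2**20, "MB")  # real-address and virtual-8086 modes: 1 MB
```

The highest 20-bit address is therefore FFFFFh, and the highest 32-bit address is FFFFFFFFh.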
Basic Program Execution Registers
Registers: high-speed memories located in the CPU
• Registers for 8086 and 80286 are 16 bits wide
• Registers for IA-32 family are 32 bits wide
32-bit General-Purpose Registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI
16-bit Segment Registers: CS, SS, DS, ES, FS, GS
Also: EFLAGS, EIP
General-Purpose Registers
8 registers used for arithmetic and data movement
Use 8-bit name, 16-bit name, or 32-bit name
◼ The 8-bit names apply to EAX, EBX, ECX, and EDX only
EAX: 32 bits
AX: lower 16 bits of EAX
AH, AL: upper and lower 8 bits of AX (8 bits + 8 bits)
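The overlapping names can be modeled with bit masks and shifts. This sketch treats a register as a plain integer; the helper names `split_eax` and `write_al` are hypothetical.

```python
def split_eax(eax: int) -> dict:
    """Return the 32-, 16-, and 8-bit views of the same register value."""
    ax = eax & 0xFFFF                   # AX is the lower 16 bits of EAX
    return {
        "EAX": eax & 0xFFFFFFFF,
        "AX": ax,
        "AH": ax >> 8,                  # upper 8 bits of AX
        "AL": ax & 0xFF,                # lower 8 bits of AX
    }

def write_al(eax: int, al: int) -> int:
    """Writing AL changes only bits 0-7; the upper 24 bits of EAX are preserved."""
    return (eax & 0xFFFFFF00) | (al & 0xFF)
```

For example, with EAX = 12345678h, AX is 5678h, AH is 56h, and AL is 78h; writing FFh to AL leaves EAX = 123456FFh.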
Index and Base Registers
Some registers have only a 16-bit name for their lower half: ESI (SI), EDI (DI), EBP (BP), ESP (SP)
EBP and ESP are used as pointers into the stack
ESI and EDI are used for fast memory indexing
Some Specialized Register Uses (1 of 2)
General-Purpose
EAX – accumulator
ECX – loop counter
ESP – stack pointer
ESI, EDI – index registers
EBP – extended frame pointer (stack)
Segment registers:
CS – code segment
DS – data segment
SS – stack segment
ES, FS, GS - additional segments
Segment Registers
Each program is subdivided into logical parts called SEGMENTS
Code segment (CS)
Stack segment (SS)
Data segments (DS, ES, FS, and GS)
Real-address mode: segment registers hold the “base address” of these program segments
Protected mode: segment registers hold pointers into the segment descriptor table
Segment registers are 16 bits wide
Some Specialized Register Uses (2 of 2)
EIP – instruction pointer
Stores the address of the next instruction to be executed
IP for 8086
EFLAGS
control flags:
◼ Controlling the operation of the CPU
status flags:
◼ Reflecting outcome of CPU operations
each flag is a single binary bit
◼ Set flag = 1 and Clear flag = 0
Status Flags
Carry (CF): unsigned arithmetic out of range
Overflow (OF): signed arithmetic out of range
Sign (SF): result is negative
Zero (ZF): result is zero
Auxiliary Carry (AF): carry from bit 3 to bit 4 in an 8-bit operand
Parity (PF): set if the least-significant byte of the result contains an even number of 1 bits
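To see how one operation sets all six flags at once, here is a minimal sketch of an 8-bit addition that computes them the way the x86 ADD instruction does. The function name `add8` is hypothetical, and only addition is modeled; other instructions affect the flags differently.

```python
def add8(a: int, b: int):
    """Add two 8-bit operands; return the 8-bit result and the six status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    r = total & 0xFF                                  # result truncated to 8 bits
    flags = {
        "CF": int(total > 0xFF),                      # unsigned result out of range
        # signed overflow: both operands share a sign and the result's sign differs
        "OF": int(((a ^ r) & (b ^ r) & 0x80) != 0),
        "SF": int((r & 0x80) != 0),                   # high bit set: result is negative
        "ZF": int(r == 0),                            # result is zero
        "AF": int(((a ^ b ^ r) & 0x10) != 0),         # carry from bit 3 into bit 4
        "PF": int(bin(r).count("1") % 2 == 0),        # even number of 1 bits
    }
    return r, flags
```

Adding 01h to FFh gives 00h with CF, ZF, AF, and PF set (an unsigned overflow), while adding 01h to 7Fh gives 80h with OF and SF set (a signed overflow) and CF clear.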
Floating-Point Unit, MMX, XMM Registers
Eight 80-bit floating-point data registers
◼ ST(0), ST(1), . . . , ST(7)
◼ arranged in a stack
◼ used for all floating-point arithmetic
Eight 64-bit MMX registers
Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations
◼ SIMD instructions operate in parallel on the multiple data values packed into an MMX or XMM register
[Figure: the FPU's eight 80-bit data registers ST(0) through ST(7), plus the opcode register]
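To make "operate in parallel on packed data" concrete, here is a sketch modeled on the MMX PADDW (packed add words) instruction. A 64-bit value holds four independent 16-bit lanes, and each lane wraps around on overflow with no carry into its neighbor. The function name `paddw` is hypothetical. The loop visits the lanes one at a time, whereas the hardware adds all four in a single instruction.

```python
def paddw(mm_a: int, mm_b: int) -> int:
    """Add four packed 16-bit lanes of two 64-bit values, wrapping each lane independently."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        x = (mm_a >> shift) & 0xFFFF     # extract lane from first operand
        y = (mm_b >> shift) & 0xFFFF     # extract lane from second operand
        result |= ((x + y) & 0xFFFF) << shift  # wrap to 16 bits, no carry between lanes
    return result
```

In `paddw(0x0001000200030004, 0x001000200030FFFF)`, the lowest lane computes 0004h + FFFFh and wraps to 0003h without disturbing the lane above it.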
Logical and Physical Addresses
Addresses specify the location of instructions and data
Addresses that specify an absolute location in main memory are
physical addresses
They appear on the address bus
Addresses that specify a location relative to a point in the program
are logical (or virtual) addresses
They are addresses used in the code and are independent of the structure
of main memory
Each logical address for the x86 consists of 2 parts:
A segment number used to specify a (logical) part of the program [it leads to the
physical address of the segment]
An offset number used to specify a location relative to the beginning of
the segment
Segmented Memory
Segmented memory addressing: the absolute (linear) address is formed by shifting
a 16-bit segment value left 4 bits (multiplying it by 16) and adding a 16-bit offset
[Figure: 1 MB memory map, 00000 to F0000 in 10000h steps, showing one segment that starts at linear address 80000: 8000:0000 is its first byte, 8000:0250 lies at offset 0250, and 8000:FFFF is its last byte]
Calculating Linear Addresses
Given a segment address, append a hexadecimal zero at the right-most
position (multiplying it by 16), then add the offset
Example: convert 08F1:0100 to a linear address
Adjusted segment value: 08F10
Add the offset: 0100
Linear address: 09010
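The calculation above reduces to one shift and one addition. A minimal sketch, with the hypothetical helper name `linear_address`:

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-address mode: shift the segment left 4 bits (append a hex 0), then add the offset."""
    return (segment << 4) + offset

print(f"{linear_address(0x08F1, 0x0100):05X}")  # 09010, the worked example above
print(f"{linear_address(0x028F, 0x0030):05X}")  # 02920
```

Shifting left by 4 bits is the same as appending one hexadecimal zero, which is why the manual method works.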
Your turn . . .
What linear address corresponds to the segment/offset
address 028F:0030?
028F0 + 0030 = 02920
Always use hexadecimal notation for addresses.
Your turn . . .
What segment addresses correspond to the linear address
28F30h?
Many different segment-offset addresses can produce the
linear address 28F30h. For example:
28F0:0030, 28F3:0000, 28B0:0430, . . .
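The "many different" segment-offset pairs can be enumerated directly: for each possible segment value, the offset is whatever remains after subtracting segment × 16, as long as it fits in 16 bits. The generator name `segment_offset_pairs` is hypothetical.

```python
def segment_offset_pairs(linear: int):
    """Yield every (segment, offset) pair that maps to the given linear address."""
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:    # the offset must fit in 16 bits
            yield segment, offset

pairs = list(segment_offset_pairs(0x28F30))
for seg, ofs in pairs[-3:]:
    print(f"{seg:04X}:{ofs:04X}")
```

For 28F30h this produces 4096 pairs, including 28F0:0030, 28F3:0000, and 28B0:0430 from the list above.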
IA-32 Memory Management
(Real-Address mode)
1 MB RAM maximum addressable
Application programs can access any area of memory
Single tasking
Supported by MS-DOS operating system
IA-32 Memory Management
(Protected Mode)
Protected mode is the most robust and powerful, but it does restrict
application programs from directly accessing system hardware.
4 GB addressable RAM
(00000000 to FFFFFFFFh)
Each program assigned a memory partition which is protected from
other programs
Designed for multitasking
Supported by Linux & MS-Windows
Protected mode (2 of 2)
Segment descriptor tables
Program structure
code, data, and stack areas
CS, DS, SS segment descriptors
global descriptor table (GDT)
MASM programs use the Microsoft flat memory model
Flat Segment Model
Single global descriptor table (GDT).
All segments mapped to entire 32-bit address space
[Figure: one segment descriptor in the Global Descriptor Table (base address 00000000, limit 00040, access ----) covers the 32-bit address space from 00000000 to FFFFFFFF (4 GB); physical RAM occupies 00000000 to 00040000 and the rest is not used]
Multi-Segment Model
Each program has a local descriptor table (LDT)
holds a descriptor for each segment used by the program
Local Descriptor Table:
base      limit
00026000  0010
00008000  000A
00003000  0002
[Figure: each descriptor points to its segment in RAM, at 26000, 8000, and 3000]
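The flat and multi-segment models differ only in what the descriptor tables hold. This sketch selects a descriptor by index, checks the offset against the limit, and adds the base. Several simplifications are assumptions made for illustration: the limit is treated as a segment size counted in 4 KB units, and a descriptor is just a (base, limit) pair. A real descriptor stores the highest valid offset plus a granularity bit and access rights, all omitted here. `Descriptor`, `to_linear`, and `ProtectionFault` are hypothetical names.

```python
from typing import NamedTuple

LIMIT_UNIT = 4096  # assumption: limits counted in 4 KB units (simplified format)

class ProtectionFault(Exception):
    """Raised when an offset falls outside its segment's limit."""

class Descriptor(NamedTuple):
    base: int   # 32-bit base address of the segment
    limit: int  # segment size in LIMIT_UNIT-byte units (simplified)

def to_linear(table: list, index: int, offset: int) -> int:
    """Select a descriptor by index, check the offset against its limit, then add the base."""
    desc = table[index]
    if not 0 <= offset < desc.limit * LIMIT_UNIT:
        raise ProtectionFault(f"offset {offset:#x} outside segment {index}")
    return desc.base + offset

# Flat model: a single GDT descriptor with base 0 spanning 4 GB, so linear == offset.
gdt = [Descriptor(base=0x00000000, limit=0x100000)]

# Multi-segment model: the program's LDT, using the base/limit values from the figure.
ldt = [
    Descriptor(base=0x00026000, limit=0x0010),
    Descriptor(base=0x00008000, limit=0x000A),
    Descriptor(base=0x00003000, limit=0x0002),
]
```

In the flat model every offset maps to the same linear address, while in the multi-segment model offset 100h in the first segment becomes linear address 26100h, and an offset past a segment's limit is rejected.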
Paging
Supported directly by the CPU
Divides each segment into 4096-byte blocks called pages
The combined size of all programs can be larger than physical memory
Part of running program is in memory, part is on disk
Virtual memory manager (VMM) – OS utility that manages the
loading and unloading of pages
Page fault – issued by CPU when a page must be loaded from disk
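A toy virtual memory manager shows how paging lets programs exceed physical memory. Each linear address is split into a page number and an offset within the 4096-byte page. When a page is not resident, the sketch counts a page fault and "loads" the page into a free frame. `ToyVMM` is a hypothetical class, and evicting the oldest resident page (first in, first out) is an assumed policy chosen for simplicity. Real operating systems use more elaborate policies and actually read the page from disk.

```python
PAGE_SIZE = 4096

class ToyVMM:
    """Hypothetical VMM: maps page numbers to physical frames, loading pages on demand."""

    def __init__(self, num_frames: int):
        self.page_table = {}                 # page number -> frame number (resident pages)
        self.free_frames = list(range(num_frames))
        self.page_faults = 0

    def translate(self, linear: int) -> int:
        page, offset = divmod(linear, PAGE_SIZE)
        if page not in self.page_table:
            self.page_faults += 1            # the CPU would issue a page fault here
            self._load_page(page)
        return self.page_table[page] * PAGE_SIZE + offset

    def _load_page(self, page: int) -> None:
        if not self.free_frames:
            victim = next(iter(self.page_table))          # oldest resident page (FIFO)
            self.free_frames.append(self.page_table.pop(victim))
        self.page_table[page] = self.free_frames.pop(0)
```

With only two frames, a program touching three different pages forces an eviction, and re-touching the evicted page causes another page fault.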
Address Translation in Protected Mode
The logical/virtual address of a referenced word is given by a pair of numbers
(segment, offset)
The segment number is contained in a segment register and is used to select (or
index) an entry in a segment table (called a descriptor table)
Hence, a segment register is also called a selector
The selected entry (the descriptor) contains the base address and length of the
referenced segment
The 32-bit base address is added to the 32-bit offset to form a 32-bit linear
address, which paging divides into three fields (P1, P2, D)
P1 indexes a page directory (in memory) to obtain the base address of
a page table, which is indexed by P2 to obtain the base address of the page frame;
adding the offset D gives the physical address of the referenced word
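The two-level lookup can be sketched with bit fields and dictionaries. This assumes the classic Intel 386 layout for 4 KB pages: P1 is the top 10 bits, P2 the next 10 bits, and D the low 12 bits. The page directory and page tables are modeled as Python dicts, so a missing entry raises `KeyError` where the CPU would raise a page fault. `split_linear` and `translate` are hypothetical names.

```python
def split_linear(linear: int):
    """Split a 32-bit linear address into (P1, P2, D): 10 + 10 + 12 bits."""
    p1 = (linear >> 22) & 0x3FF   # index into the page directory
    p2 = (linear >> 12) & 0x3FF   # index into the selected page table
    d = linear & 0xFFF            # offset within the 4096-byte page
    return p1, p2, d

def translate(linear: int, page_directory: dict) -> int:
    """Walk two levels: directory[P1] -> page table, table[P2] -> frame base, plus D."""
    p1, p2, d = split_linear(linear)
    page_table = page_directory[p1]
    frame_base = page_table[p2]
    return frame_base + d
```

For example, linear address 00403ABCh splits into P1 = 1, P2 = 3, D = ABCh; if directory entry 1 leads to a page table whose entry 3 holds frame base 75000h, the physical address is 75ABCh.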
Intel 386 Address Translation
[Figure: the linear address fields P1, P2, and D used to translate a linear address into a physical address]
Early Intel Microprocessors
Intel 8080
64K addressable RAM
8-bit registers
CP/M operating system
S-100 BUS architecture
8-inch floppy disks!
Intel 8086/8088
IBM-PC used the 8088
1 MB addressable RAM
16-bit registers
16-bit data bus (8-bit for 8088)
separate floating-point unit (8087)
Early Programming
Watch this video on Early Programming to learn how we got here and about
the common underlying computer architecture (not just Intel).
The IBM-AT
Intel 80286
16 MB addressable RAM
Protected memory
several times faster than 8086
introduced IDE bus architecture
80287 floating point unit
Intel IA-32 Family
Intel386
4 GB addressable RAM, 32-bit registers, paging
(virtual memory)
Intel486
instruction pipelining
Pentium
superscalar, 32-bit address bus, 64-bit internal
data path
64-bit Processors
Intel64
64-bit linear address space
Intel: Pentium Extreme, Xeon, Celeron D, Pentium D, Core 2,
and Core i7
IA-32e Mode
Compatibility mode for legacy 16- and 32-bit applications
64-bit Mode uses 64-bit addresses and operands
Intel Technologies
HyperThreading technology
two tasks execute on a single processor at the same time
Dual Core processing
multiple processor cores in the same IC package
each processor has its own resources and communication path
with the bus
Intel Processor Families
Currently Used:
Pentium & Celeron – dual core
Core 2 Duo - 2 processor cores
Core 2 Quad - 4 processor cores
Core i7 – 4 processor cores
Core i9 – 6 processor cores
CISC and RISC
CISC – complex instruction set
large instruction set
high-level operations
requires microcode interpreter
examples: Intel 80x86 family
RISC – reduced instruction set
simple, atomic instructions
small instruction set
directly executed by hardware
examples:
◼ ARM (Advanced RISC Machines)
◼ DEC Alpha (now Compaq)
Intel D850MD Motherboard
[Figure: Intel D850MD motherboard with labeled components: video, mouse, keyboard, parallel, serial, and USB connectors; audio chip; PCI slots; memory controller hub; Pentium 4 socket; AGP slot; dynamic RAM; firmware hub; I/O controller; speaker; power connector; battery; diskette connector; IDE drive connectors]
Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification
Evolution Of CPU Processing Power
Watch this video on Rise Of The x86 to understand how the
x86 architecture came to dominate the PC world, from Intel and IBM to
Microsoft, and remains the primary architecture of PCs
today.
Displaying a String of Characters
When a high-level language (HLL) program displays a string of characters,
the request passes down through the following levels:
Application Program (Level 3)
OS Function (Level 2)
BIOS Function (Level 1)
Hardware (Level 0)
Programming levels
Assembly language programs can perform input-output at each of
the following levels:
OS Function (Level 2)
BIOS Function (Level 1)
Hardware (Level 0)
42 69 6E 61 72 79
What does this say?
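One way to check your answer: each pair of hex digits is one byte, and each byte is the ASCII code of one character. The sketch below decodes the bytes twice, once by hand and once with Python's built-in `bytes.fromhex`.

```python
codes = "42 69 6E 61 72 79"

# By hand: convert each hex byte to an integer, then to its ASCII character.
by_hand = "".join(chr(int(h, 16)) for h in codes.split())

# Built-in: bytes.fromhex skips the spaces between byte values.
text = bytes.fromhex(codes).decode("ascii")

print(by_hand, text)
```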