
Desktop Computer Structure Overview

The document outlines the basic structure of desktop computers, detailing key components such as the CPU, motherboard, memory, input/output devices, and power supply. It also discusses computer architecture models, specifically Von Neumann and Harvard architectures, highlighting their features, advantages, and limitations. Additionally, it covers CPU organization, memory management, bus structures, and the role of various registers in data processing.

Uploaded by

yuvrajsharma2937
Copyright
© All Rights Reserved

COA

UNIT 1: BASIC STRUCTURE OF COMPUTER

1. Structure of Desktop Computers

A desktop computer comprises several key components:

 Central Processing Unit (CPU): The brain of the computer, responsible for executing instructions and
processing data.
 Motherboard: The main circuit board that houses the CPU, memory, and other essential components.
 Memory: Includes both primary memory (RAM) for temporary data storage and secondary memory
(HDD/SSD) for permanent data storage.
 Input Devices: Devices like keyboards and mice that allow users to interact with the computer.
 Output Devices: Devices such as monitors and printers that display or produce the results of
computations.
 Power Supply Unit (PSU): Provides the necessary electrical power to all components.

Main Components of a Desktop Computer

1. Central Processing Unit (CPU)

 The CPU is often referred to as the brain of the computer.


 It performs all the arithmetic, logical, and control operations required to execute programs.
 It consists of:
o ALU (Arithmetic Logic Unit): Performs calculations and logic operations.
o Control Unit: Directs the operations of the processor.
o Registers: Small memory locations inside the CPU that temporarily store data and instructions.

The performance of a CPU is influenced by factors such as clock speed, core count, and cache size.

2. Motherboard

 This is the main printed circuit board (PCB) that connects all components of the computer.
 It provides electrical connections so components like the CPU, RAM, storage, and I/O devices can
communicate.
 Contains chipsets, BIOS/UEFI firmware, expansion slots (PCIe), and power connectors.

3. Memory

Memory is divided into two types:

 Primary Memory (RAM - Random Access Memory):


o Temporary memory used to store data and instructions that are actively being used.
o Faster than secondary memory.
o Data is lost when the computer is turned off (volatile memory).
 Secondary Memory:
o HDD (Hard Disk Drive): Uses magnetic storage, larger capacity, slower speed.
o SSD (Solid State Drive): Uses flash memory, faster, more reliable, but often more expensive.

Secondary memory is non-volatile, meaning data remains even after power is turned off.

4. Input Devices

 Devices that allow the user to send data and commands to the computer.
 Examples:
o Keyboard: For typing text and commands.
o Mouse: For graphical navigation and interaction.
o Scanner, Webcam, Microphone: For various forms of input.

5. Output Devices

 These allow the computer to present results to the user.


 Examples:
o Monitor: Displays the user interface, videos, and images.
o Printer: Produces hard copies of digital content.
o Speakers: Provide audio output.

6. Other Components

 Expansion Cards: Such as graphics cards, sound cards, and network interface cards (NICs).
 Cooling Systems: Fans or liquid cooling for temperature control.
 Optical Drives: DVD/CD drives (becoming less common).

Computer Architecture Models

1. Von Neumann Architecture

Named after mathematician John von Neumann, this is the most widely used architecture in general-purpose
computers.

The Von Neumann Architecture is a design model where:

 Data and instructions are stored in the same memory.


 One single bus (pathway) is used to transfer both instructions and data.
 The CPU fetches one thing at a time — either an instruction or data — not both together.
Example:

Think of it like a one-lane road where only one car (either instruction or data) can move at a time. So, traffic
(processing) can get slow.

Key Features:

 Shared memory for data and instructions: both are stored in the same RAM.
 Single bus system: a common bus carries data, instructions, and addresses.
 Sequential execution: instructions are fetched and executed one by one, using the program counter.
 Used in most general-purpose computers, such as desktops and laptops.
 Bottleneck: because instruction fetches and data accesses share one path to memory, they cannot happen at
the same time. This limit on the transfer rate between CPU and memory is known as the Von Neumann bottleneck.

Diagram (conceptually):
Input → CPU ←→ Memory ←→ Output

Control Unit

Advantages:

 Simple and cost-effective.


 Suitable for general-purpose computing.

Limitations:

 Instruction and data share the same bus → slower execution.


 Parallel processing is harder to implement.

2. Harvard Architecture

In Harvard architecture, data and instructions are stored in separate memory units, and each memory has its
own bus.

 There are two buses: one for data and one for instructions.
 The CPU can fetch both an instruction and data at the same time.

Example:

It’s like having a two-lane road — one lane for cars (data) and one for bikes (instructions). So both can move
at the same time, making things faster.

Key Features:

 Separate memory for data and program.
 Separate pathways (two buses): one for data, one for instructions.
 Can fetch an instruction and data at the same time, which usually gives faster memory access than a single
shared bus.
 Commonly used in embedded systems and microcontrollers, such as those in microwaves and washing machines.

Diagram:
Input → CPU ←→ Instruction Memory ←→ Output

Data Memory

Advantages:

 Eliminates the Von Neumann bottleneck.


 Faster execution due to simultaneous fetch of instructions and data.

Limitations:

 More complex and expensive.


 Less flexible for general-purpose computing.

Feature | Von Neumann Architecture | Harvard Architecture
Memory for Instructions & Data | Same memory | Separate memories
Bus Structure | One bus (shared for both) | Two buses (one for each)
Speed | Slower due to single bus | Faster due to separate buses
Execution | Fetches data and instruction one at a time | Can fetch both at the same time
Complexity | Simple design | More complex design
Used In | General-purpose computers (PCs) | Embedded systems (microcontrollers)
Cost | Less expensive | More expensive
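
The Execution row of the comparison can be made concrete with a toy cycle count. This is only an illustrative model, not a measurement: it assumes every memory access costs exactly one bus cycle, and the function names are hypothetical.

```python
def von_neumann_cycles(instructions):
    """Shared bus: an instruction fetch and a data access can never overlap."""
    cycles = 0
    for needs_data in instructions:
        cycles += 1              # fetch the instruction over the shared bus
        if needs_data:
            cycles += 1          # then use the same bus for the operand
    return cycles


def harvard_cycles(instructions):
    """Separate buses: the data access overlaps with the instruction fetch."""
    # One cycle per instruction, because the operand (if any) travels on the
    # data bus while the instruction travels on the instruction bus.
    return len(instructions)


# True = the instruction also reads or writes a data operand in memory.
program = [True, False, True, True]
print(von_neumann_cycles(program))  # 7
print(harvard_cycles(program))      # 4
```

Under these assumptions the gap grows with the share of instructions that touch data memory. Real processors narrow it with caches and pipelining.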


2. CPU: General Register Organization

The CPU utilizes a set of general-purpose registers to perform operations efficiently:

 Accumulator (AC): Holds intermediate results of arithmetic and logic operations.


 Data Registers (R0–R7): Temporarily store data and addresses.
 Index Registers: Used for indexed addressing modes.
 Program Counter (PC): Holds the address of the next instruction to be executed.

These registers facilitate quick data access and manipulation, enhancing the CPU's performance.

Program Counter (PC) – Directing Execution

Every CPU operation starts with the Program Counter (PC), which contains the address of the next instruction
to be fetched from memory. This register ensures the sequential flow of the program unless a branch or jump
alters it. For example, if PC = 0x1000, the CPU fetches the instruction from memory location 0x1000 and
increments the PC to 0x1001 (or further, depending on instruction size) to point to the next instruction.

Memory Address Register (MAR) – Locating Data

Once the PC provides the address, the CPU places it in the Memory Address Register (MAR). The MAR acts
as the interface between the CPU and main memory, holding the specific address that the CPU wants to read
from or write to. For instance, if the CPU needs to read data from address 0x200, this value is first placed in the
MAR, which then sends the address to memory.

Memory Registers

Memory registers act as intermediaries between the CPU and memory:

 Memory Address Register (MAR): Holds the address of the memory location to be accessed.
 Memory Data Register (MDR): Contains the data to be written to or read from memory.

These registers ensure efficient data transfer between the CPU and memory.

Memory Buffer Register (MBR) – Holding Data Temporarily

After the MAR specifies the address, the memory system responds with the data stored at that location. This
data is temporarily held in the Memory Buffer Register (MBR) (also called the Memory Data Register or
MDR in some systems). The MBR ensures that data fetched from or written to memory is temporarily stored
until it's moved to its final destination, such as a general-purpose register or the accumulator.

Instruction Register (IR) – Decoding Instructions

The Instruction Register (IR) holds the instruction fetched from memory so that it can be decoded and
executed. For instance, an instruction like ADD R1, R2, R3 is fetched into the IR, and the control unit decodes
it to generate signals that direct the CPU to perform the addition of R2 and R3, placing the result in R1. The IR
ensures that instructions are held in place while being interpreted and executed.

General-Purpose Registers (R0–R7) – Working with Data

The decoded instruction often operates on general-purpose registers, such as R0 through R7, which store
temporary data and operands. These registers allow rapid data access during operations like addition,
subtraction, or logical comparison. For instance, an instruction might specify ADD R4, R2, R3, meaning the
CPU adds the contents of R2 and R3 and stores the result in R4.

Accumulator (AC) – Performing Arithmetic and Logic

For certain architectures, particularly accumulator-based designs, the Accumulator (AC) is used for most
arithmetic and logic operations. The AC holds operands and results for calculations. If the CPU is to perform
AC = AC + R1, it takes the current value in the accumulator, adds the value from R1, and stores the result back
in the AC. This register simplifies instruction formats and reduces the number of required operands.

Index Registers – Address Modification for Arrays

In operations involving arrays or loops, the index registers come into play. These registers store offset values
used to modify base addresses dynamically. For example, if you're accessing the 5th element of an array starting
at address 1000, and the index register contains 4 (assuming 0-based indexing and one addressable location per
element), the effective address becomes 1000 + 4 = 1004, enabling efficient traversal of data structures.

Stack Pointer (SP) – Managing the Call Stack

The Stack Pointer (SP) keeps track of the top of the stack in memory, which is used for function calls, returns,
and local variable storage. When a subroutine is called, the CPU pushes the return address and possibly some
register values onto the stack, adjusting the SP as it goes (decrementing it on machines whose stack grows
toward lower addresses). Upon return, these values are popped off the stack, restoring the previous state.
This mechanism is critical for supporting nested function calls and recursion.

Status Register / Flags – Recording Results

After an arithmetic or logic operation, the Status Register (or Flags Register) is updated to reflect the outcome.
It contains individual bits called flags that indicate conditions such as zero result (Z), negative result (N),
carry out (C), and overflow (V). For instance, after a subtraction operation, if the result is zero, the Zero flag
is set. These flags are often used in conditional instructions, like branching if the result was negative or zero.

Final Update and Next Cycle

Once the instruction has executed—using the IR, data registers, and possibly modifying memory—the CPU
updates the Program Counter again and begins the next fetch-decode-execute cycle. Throughout this process,
the tight cooperation among all these registers allows the CPU to operate efficiently, maintain data integrity,
and follow the logic of the program correctly.
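
The fetch phase described above can be sketched as four register transfers. This is a minimal model on a hypothetical machine: memory is a plain dictionary, every instruction is assumed to occupy one memory location, and the instruction strings are only placeholders.

```python
# Hypothetical machine: memory maps address -> contents, and each
# instruction occupies exactly one memory location (an assumption).
memory = {0x1000: "ADD R1, R2, R3", 0x1001: "SUB R4, R1, R2"}
regs = {"PC": 0x1000, "MAR": 0, "MBR": None, "IR": None}


def fetch(regs, memory):
    """One fetch phase, written as the register transfers described above."""
    regs["MAR"] = regs["PC"]            # MAR <- PC
    regs["MBR"] = memory[regs["MAR"]]   # MBR <- M[MAR]
    regs["IR"] = regs["MBR"]            # IR  <- MBR
    regs["PC"] += 1                     # PC  <- PC + 1
    return regs["IR"]


first = fetch(regs, memory)
second = fetch(regs, memory)
print(first, "| then:", second, "| next PC =", hex(regs["PC"]))
```

After two fetches the PC points at 0x1002, the MAR still holds the last address used, and the IR holds the second instruction.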

Summary of Additional Registers:

Register | Purpose
Instruction Register (IR) | Holds current instruction being executed
Memory Address Register (MAR) | Holds address of memory to be accessed
Memory Buffer Register (MBR) | Temporarily holds data read/written from memory
Stack Pointer (SP) | Points to the top of the stack
Status Register / Flags | Holds condition flags (zero, carry, overflow, etc.)

# Stack Organization

A stack is a data structure that operates on the Last In, First Out (LIFO) principle. It is used for:

 Function Calls: Storing return addresses.


 Interrupt Handling: Saving the state of the CPU during interrupts.
 Expression Evaluation: Storing operands and operators.

The Stack Pointer (SP) register keeps track of the top of the stack.
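
A minimal sketch of how the SP tracks the top of the stack during nested calls. The `Stack` class and the addresses are hypothetical, and the stack is assumed to grow toward lower addresses, which is common but not universal.

```python
class Stack:
    """Memory-resident stack; SP always points at the current top item."""

    def __init__(self, size=16):
        self.memory = [0] * size
        self.sp = size                  # empty stack: SP is just past the end

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1                    # assumption: stack grows downward
        self.memory[self.sp] = value

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("stack underflow")
        value = self.memory[self.sp]
        self.sp += 1
        return value


stack = Stack()
stack.push(0x2004)   # return address saved by an outer call
stack.push(0x3010)   # return address saved by a nested call
inner = stack.pop()  # the nested call returns first (LIFO)
outer = stack.pop()
print(hex(inner), hex(outer))  # 0x3010 0x2004
```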

# Control Word

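A Control Word is a binary word whose bits select one micro-operation for the CPU to perform, for example
which registers feed the ALU, which ALU operation runs, and which register receives the result. The control
unit supplies it so that the matching control signals are generated while an instruction executes.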
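
As an illustration, a control word can be packed from a few bit fields. The 14-bit layout below (two source selectors, one destination selector, one operation field) and the operation codes are assumptions made for this sketch, not a format given in these notes.

```python
# Hypothetical 14-bit control word: | SELA (3) | SELB (3) | SELD (3) | OPR (5) |
OPR_CODES = {"ADD": 0b00010, "SUB": 0b00101, "AND": 0b01000}  # made-up codes


def make_control_word(src_a, src_b, dest, operation):
    """Pack register numbers (0-7) and an ALU operation into one word."""
    for reg in (src_a, src_b, dest):
        if not 0 <= reg <= 7:
            raise ValueError("register number must fit in 3 bits")
    return (src_a << 11) | (src_b << 8) | (dest << 5) | OPR_CODES[operation]


def split_control_word(word):
    """Unpack the four fields again, as the selection hardware would."""
    return {
        "SELA": (word >> 11) & 0b111,
        "SELB": (word >> 8) & 0b111,
        "SELD": (word >> 5) & 0b111,
        "OPR": word & 0b11111,
    }


# R1 <- R2 + R3
word = make_control_word(src_a=2, src_b=3, dest=1, operation="ADD")
print(format(word, "014b"))  # 01001100100010
```

Reading the bits left to right gives 010 (R2), 011 (R3), 001 (R1), and 00010 (the assumed ADD code).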

# Arithmetic Logic Unit (ALU)

The ALU performs all arithmetic and logic operations. It takes inputs from registers, performs the specified
operation, and stores the result back in a register. The ALU also sets status flags (like Zero, Carry, Sign) that
influence conditional operations.
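
A minimal sketch of an ALU addition that also sets the status flags. It assumes an 8-bit word and two's-complement signed numbers; the function name `alu_add8` is hypothetical.

```python
def alu_add8(a, b):
    """Add two 8-bit values and return (result, flags)."""
    total = a + b
    result = total & 0xFF                       # keep the low 8 bits
    flags = {
        "Z": result == 0,                       # zero result
        "N": bool(result & 0x80),               # sign bit of the result
        "C": total > 0xFF,                      # carry out of bit 7
        # Signed overflow: both inputs share a sign that the result lacks.
        "V": bool((a ^ result) & (b ^ result) & 0x80),
    }
    return result, flags


result, flags = alu_add8(0x7F, 0x01)   # 127 + 1 in 8-bit two's complement
print(hex(result), flags)
```

Here 127 + 1 gives 0x80, so the Negative and Overflow flags are set while Zero and Carry stay clear. Adding 0xFF and 0x01 instead gives a zero result with the Carry flag set.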

# I/O System

The Input/Output (I/O) System manages data exchange between the computer and external devices. It
includes:

 I/O Interfaces: Hardware that connects external devices to the computer.


 I/O Controllers: Manage data transfer between devices and the CPU.
 Data Transfer Methods: Programmed I/O, Interrupt-driven I/O, and Direct Memory Access (DMA).

Efficient I/O systems are crucial for overall system performance.


# Bus Structure

The bus structure refers to the organization of buses in a computer system. It can be:

 Single Bus: All components share a single bus.


 Multiple Buses: Separate buses for data, address, and control signals.

Multiple buses can improve performance by reducing data congestion.

# Bus and Memory Transfer

Bus and memory transfer mechanisms describe how data moves between the CPU and memory through
buses. The Memory Address Register (MAR) holds the address, and the Memory Data Register (MDR)
carries the data. Control signals like Read and Write are used to coordinate these transfers.

What is a Bus in a Computer?

In a computer system, a bus is like a set of wires or pathways that connect different parts of the computer,
such as the CPU, memory, and input/output (I/O) devices. Think of it as a highway system that allows data to
travel between different components. Just like cars use roads to move from one city to another, information in a
computer uses buses to move between components.

Buses are made up of parallel lines, meaning multiple bits (0s and 1s) can travel at the same time, side by side.
The more lines a bus has, the more data it can carry at once — this is called the width of the bus.

There are three main types of buses in a computer:

1. Data Bus – Carries the Actual Information

The data bus is used to transfer actual data between the CPU, memory, and other devices. For example, when
you open a file, the data (like text or images) travels from the hard drive toward the CPU over the data bus.
If the data bus is 8 bits wide, it can move 8 bits of data at a time; if it's 32 bits wide, it can move 32 bits
per transfer, so the same amount of data needs fewer transfers.

2. Address Bus – Tells Where to Go

The address bus is used to carry memory addresses, not data. This tells the computer where the data should
go or come from. For example, if the CPU wants to read some data from memory, it uses the address bus to
send the location of that data. The memory then uses that location to find and send the correct data.

So, if the address bus is 16 bits wide, the CPU can address up to 2¹⁶ = 65,536 memory locations. A wider
address bus means the system can access more memory.
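
The relationship between address bus width and addressable locations is a power of two, as this short sketch shows. The function name is hypothetical, and the widths other than 16 are only sample values.

```python
def addressable_locations(address_bus_width):
    """Number of distinct addresses an n-bit address bus can express."""
    return 2 ** address_bus_width


for width in (16, 20, 32):
    print(width, "bits ->", addressable_locations(width), "locations")
```

Each extra address line doubles the number of locations the CPU can reach.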
3. Control Bus – Gives Instructions and Signals

The control bus carries control signals that manage how and when data is transferred. It tells the other
components what to do, like whether to read or write data, whether a device is ready, or when to stop a transfer.

For example, if the CPU wants to read data from memory, it will send a "Read" signal on the control bus. If it
wants to send data, it will send a "Write" signal.

Why Does Bus Width and Speed Matter?

 Bus Width: This refers to how many bits the bus can carry at one time. A wider bus (like 64 bits vs. 32
bits) means more data can travel at once, improving speed.
 Bus Speed: This refers to how fast the data moves. A faster bus allows more data to be transferred in
less time.
 Both width and speed of a bus affect overall computer performance. Faster and wider buses help the
CPU communicate with memory and other devices more efficiently, leading to quicker program
execution and better multitasking.
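
A rough way to see how width and speed combine is to multiply them. This sketch assumes one transfer per clock tick and ignores protocol overhead, so it gives a peak figure only; the function name and sample numbers are hypothetical.

```python
def peak_bandwidth_bytes_per_second(width_bits, transfers_per_second):
    """Peak rate = bytes moved per transfer x transfers per second."""
    return (width_bits // 8) * transfers_per_second


# Same transfer rate, different widths (100 million transfers per second).
narrow = peak_bandwidth_bytes_per_second(32, 100_000_000)
wide = peak_bandwidth_bytes_per_second(64, 100_000_000)
print(narrow, wide)  # 400000000 800000000
```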

Summary in Simple Terms

A bus is like a highway system in a computer.

 The data bus carries the actual "passengers" (data).


 The address bus tells the "destination" (memory location).
 The control bus gives the "traffic signals" (instructions on what to do).

Faster and wider buses mean the computer can do more in less time — just like having more lanes and higher
speed limits on a highway helps traffic move better.

# CPU and Memory Program Counter

The Program Counter (PC) holds the address of the next instruction to be executed. After each instruction
fetch, the PC is incremented unless modified by control transfer instructions like jumps or branches. This
ensures the sequential execution of instructions.

# Register Transfer Language (RTL)

Register Transfer Language (RTL) is used to describe operations in terms of data transfers between registers
and the operations performed on that data. It uses symbolic notations such as R1 ← R2 + R3, meaning the
contents of R2 and R3 are added and stored in R1.

What is RTL?

Register Transfer Language (RTL) is a way to describe how data moves between registers in a computer,
and what operations are done on that data.

Think of it as a simple language that tells the computer:

 Where to get the data from (which register).


 What to do with the data (like add or subtract).
 Where to store the result (another register).

What is a Register?

A register is a small, fast storage location inside the CPU. It temporarily holds data that the CPU is currently
working on.

Registers are like tiny boxes where you keep important things you need right now.

Examples: R1, R2, R3 are just names of registers.

What Does an RTL Statement Look Like?

Example:

R1 ← R2 + R3

This means:

 Take the value in register R2


 Add it to the value in register R3
 Store the result in register R1

So RTL describes a data transfer with an operation.

Common RTL Operations

Here are a few examples of RTL instructions and what they mean:

RTL Statement | Meaning
R1 ← R2 | Copy the value from R2 into R1
R1 ← R2 + R3 | Add R2 and R3, store in R1
R4 ← R4 - R1 | Subtract R1 from R4, store in R4
R5 ← R6 AND R7 | Bitwise AND of R6 and R7, store in R5
R3 ← R3 + 1 | Increment R3 by 1
R2 ← R2 - 1 | Decrement R2 by 1
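
To see these statements in action, here is a minimal sketch of an interpreter for this small subset of RTL. It is an illustration only: registers are modelled as a dictionary, the names `run_rtl` and `operand` are hypothetical, and only +, - and AND are supported.

```python
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
}


def operand(token, regs):
    """A token is either a register name or an integer literal."""
    return regs[token] if token in regs else int(token)


def run_rtl(statement, regs):
    """Execute 'Rd ← Rs' or 'Rd ← A op B' against a dict of registers."""
    dest, expr = [part.strip() for part in statement.split("←")]
    tokens = expr.split()
    if len(tokens) == 1:                       # plain transfer: R1 ← R2
        regs[dest] = operand(tokens[0], regs)
    else:                                      # operation: Rd ← A op B
        left, op, right = tokens
        regs[dest] = OPS[op](operand(left, regs), operand(right, regs))
    return regs


regs = {"R1": 0, "R2": 6, "R3": 3, "R4": 10, "R5": 0, "R6": 0b1100, "R7": 0b1010}
for stmt in ["R1 ← R2 + R3", "R4 ← R4 - R1", "R5 ← R6 AND R7", "R3 ← R3 + 1"]:
    run_rtl(stmt, regs)
print(regs["R1"], regs["R4"], regs["R5"], regs["R3"])  # 9 1 8 4
```

The source registers on the right-hand side keep their values; only the destination on the left changes.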

Why is RTL Important?

RTL is important because it:

 Helps computer engineers design CPUs.


 Makes it easier to visualize and plan how instructions work.
 Shows the flow of data at the hardware level.

It acts like a bridge between the software (like a programming language) and the hardware (the CPU circuits).

Imagine registers as small cups of water and operations as actions like pouring, mixing, or measuring:

 R1 ← R2 + R3 is like filling cup R1 with the combined amount held in cups R2 and R3, which themselves
stay full.
 R1 ← R2 is like copying the water from cup R2 into cup R1.

In short:

 RTL is a language used to describe how data is transferred between registers and what operations are
done.
 It looks like: R1 ← R2 + R3.
 It's mainly used in CPU design and hardware-level programming.

Instruction Register

The Instruction Register (IR) holds the current instruction being executed. It is loaded with the instruction
fetched from memory and is decoded to determine the operation to be performed. The IR plays a crucial role in
the fetch-decode-execute cycle.

# Instruction Types

Instructions can be categorized based on their function:


 Data Transfer Instructions: Move data between registers, memory, and I/O devices (e.g., MOV,
LOAD, STORE).
 Arithmetic Instructions: Perform mathematical operations (e.g., ADD, SUB).
 Logical Instructions: Perform bitwise operations (e.g., AND, OR).
 Control Instructions: Alter the sequence of execution (e.g., JUMP, CALL, RET).
 I/O Instructions: Manage data exchange with external devices (e.g., IN, OUT).

# Instruction Format

An instruction format defines the layout of bits in an instruction. Common fields include:

 Opcode: Specifies the operation to be performed.


 Operand(s): Specifies the data or address.
 Addressing Mode: Indicates how to interpret the operand.

Instruction formats vary based on the architecture and complexity of the CPU.

What is an Instruction Format?

An instruction is a command given to the CPU, telling it what to do.

The Instruction Format describes how the bits of an instruction are organized inside the CPU. You can think
of it like a sentence made of smaller parts (like subject, verb, and object).

Common Fields in an Instruction Format

Field | Meaning | Example
Opcode | Tells the CPU what operation to perform | e.g., ADD, SUB, LOAD
Operands | Tells what data to use or which registers/memory to use | e.g., R1, R2, 5000
Addressing Mode | Tells the CPU how to interpret the operands (like direct, indirect, immediate) | e.g., direct (use address), immediate (use value directly)

Simple Example

Let’s break this instruction:

ADD R1, 1000

What this means:

 ADD: Operation (Opcode)


 R1: Register that supplies one value and receives the result (Operand)
 1000: Memory address (Operand)
 Addressing Mode: Direct — get the value from memory address 1000.
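
The breakdown above can be sketched as packing fields into a binary word. The 32-bit layout and the numeric opcode and mode values below are assumptions for illustration only; real instruction sets define their own encodings.

```python
OPCODES = {"ADD": 0x01, "SUB": 0x02, "LOAD": 0x03}      # made-up encodings
MODES = {"immediate": 0x0, "direct": 0x1, "indirect": 0x2}


def encode(opcode, mode, reg, address):
    """Pack the fields into a 32-bit word: opcode(8) mode(4) reg(4) addr(16)."""
    if not 0 <= reg < 2 ** 4:
        raise ValueError("register number does not fit in 4 bits")
    if not 0 <= address < 2 ** 16:
        raise ValueError("address does not fit in 16 bits")
    return (OPCODES[opcode] << 24) | (MODES[mode] << 20) | (reg << 16) | address


def decode(word):
    """Split a 32-bit word back into its fields."""
    return {
        "opcode": (word >> 24) & 0xFF,
        "mode": (word >> 20) & 0xF,
        "reg": (word >> 16) & 0xF,
        "address": word & 0xFFFF,
    }


word = encode("ADD", "direct", reg=1, address=1000)   # ADD R1, 1000
print(hex(word))  # 0x11103e8
```

Decoding the word recovers the same four fields, which is what the control unit does in the decode step.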

Types of Instruction Formats

Instruction formats can vary in size (like 16-bit, 32-bit, or 64-bit instructions), but here are common types:

1. Zero-Address Instruction
o No operands. Often used in stack-based systems.
o Example: ADD (takes values from stack)
2. One-Address Instruction
o One operand plus an implicit register (like the accumulator).
o Example: LOAD 5000 (load from memory address 5000)
3. Two-Address Instruction
o Two operands. One is the source, one is the destination.
o Example: MOV R1, R2 (copy R2 into R1)
4. Three-Address Instruction
o Uses two source operands and one destination.
o Example: ADD R1, R2, R3 (R1 = R2 + R3)

Let’s explain these types of instruction formats — Zero, One, Two, and Three Address Instructions — in
simple and detailed terms. These formats tell us how many operands (data items or addresses) are used in a
single instruction, and how they're processed.

1. Zero-Address Instruction

A zero-address instruction doesn't use any explicit operands. Instead, it works with a stack, where data is
stored in a Last In First Out (LIFO) way.

 The CPU uses a stack to push and pop values.


 Instructions like ADD or MUL operate on the top two items on the stack.
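
A minimal sketch of a stack machine makes the idea concrete. The `PUSH` instruction and the function name are assumptions for this example; note that `PUSH` still names its operand, while `ADD` and `MUL` name none.

```python
def run_stack_program(program):
    """Run a list of stack-machine instructions and return the final stack."""
    stack = []
    for instruction in program:
        parts = instruction.split()
        if parts[0] == "PUSH":            # PUSH carries its operand explicitly
            stack.append(int(parts[1]))
        elif parts[0] == "ADD":           # no operands: uses the top two items
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif parts[0] == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return stack


# (2 + 3) * 4
print(run_stack_program(["PUSH 2", "PUSH 3", "ADD", "PUSH 4", "MUL"]))  # [20]
```
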
2. One-Address Instruction

A one-address instruction uses one explicit operand, and the other is implied (usually the accumulator, a
special register used for arithmetic).

 You don't need to write the accumulator in the instruction; it’s automatically used.

Example:

LOAD 5000

What it does:

 Loads the value from memory address 5000 into the accumulator.

Other examples:

ADD 6000 ; Acc = Acc + memory[6000]


STORE 7000 ; Store Acc to memory[7000]

Use Case:

 Simple to design and commonly used in early computers and microcontrollers.
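
The three one-address instructions above can be traced with a small accumulator machine. This is a sketch under simple assumptions: memory is a dictionary, the stored values are sample data, and the function name is hypothetical.

```python
def run_accumulator_program(program, memory):
    """Execute one-address instructions; the accumulator is the implied operand."""
    acc = 0
    for instruction in program:
        opcode, address = instruction.split()
        address = int(address)
        if opcode == "LOAD":
            acc = memory[address]             # Acc = memory[address]
        elif opcode == "ADD":
            acc = acc + memory[address]       # Acc = Acc + memory[address]
        elif opcode == "STORE":
            memory[address] = acc             # memory[address] = Acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


memory = {5000: 12, 6000: 30, 7000: 0}
final_acc = run_accumulator_program(["LOAD 5000", "ADD 6000", "STORE 7000"], memory)
print(final_acc, memory[7000])  # 42 42
```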

3. Two-Address Instruction

A two-address instruction uses two operands:

 One is usually the destination (where the result goes).


 The other is the source (where the data comes from).

Example:

MOV R1, R2

What it does:

 Copies the value from register R2 into register R1.

Other example:

ADD R1, R2 ; R1 = R1 + R2

This means:
 Take R1 and R2
 Add them
 Store the result back into R1

Use Case:

 Reduces instruction size compared to three-address instructions.


 More efficient in space but less flexible than three-address.

4. Three-Address Instruction

A three-address instruction uses three operands:

 Two source operands (data to be used)


 One destination operand (where result is stored)

Example:

ADD R1, R2, R3

What it does:

 Adds values in R2 and R3


 Stores result in R1

So:

R1 = R2 + R3

Other examples:

SUB R4, R5, R6 ; R4 = R5 - R6


MUL R7, R1, R2 ; R7 = R1 × R2

Use Case:

 More flexible and powerful


 Used in RISC (Reduced Instruction Set Computer) architectures

Summary Table

Instruction Type | # of Operands | Example | Description
Zero-Address | 0 | ADD | Uses stack; adds top two stack values
One-Address | 1 | LOAD 5000 | Uses one operand and accumulator
Two-Address | 2 | MOV R1, R2 | Moves data from source to destination
Three-Address | 3 | ADD R1, R2, R3 | Performs operation on two sources, stores in third

What is the Instruction Cycle?

Now that we know what an instruction looks like, let’s see how the CPU executes it step-by-step.

This is called the Instruction Cycle, and it has 4 main steps:

1. Fetch

 The CPU gets (fetches) the next instruction from memory.


 Uses a special register called the Program Counter (PC) to know where the next instruction is.

Example:

 PC = 2000 → fetch instruction from memory[2000]

2. Decode

 The CPU reads the instruction and figures out:


o What operation to do (from the opcode)
o Where to get the data (from operands and addressing mode)

Example:

 Instruction = ADD R1, 1000 → decode this to know it's an addition

3. Execute

 The CPU performs the actual operation.


o It could be adding two numbers, loading data, jumping to another instruction, etc.

Example:

 ADD R1, 1000 → load data from memory[1000], add to R1


4. Store (or Write Back)

 The CPU saves the result of the operation into a register or memory.

Example:

 Store the result of addition into R1

This Cycle Repeats

After finishing one instruction, the CPU moves to the next (by updating the Program Counter) and repeats:

Fetch → Decode → Execute → Store → (next instruction)
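
The four steps can be sketched as a loop. This is a toy model, not a real CPU: instructions are text strings with a register and a direct memory address, and the `HALT` instruction, the addresses, and the data values are assumptions added so the loop can stop and produce a result.

```python
def run(program_memory, data_memory, regs, pc):
    """Repeat fetch -> decode -> execute -> store until HALT is fetched."""
    while True:
        instruction = program_memory[pc]          # 1. Fetch
        pc += 1                                   #    PC now points at the next one
        opcode, *operands = instruction.replace(",", " ").split()  # 2. Decode
        if opcode == "HALT":
            return regs
        reg, address = operands[0], int(operands[1])
        if opcode == "LOAD":                      # 3. Execute
            result = data_memory[address]
        elif opcode == "ADD":
            result = regs[reg] + data_memory[address]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        regs[reg] = result                        # 4. Store (write back)


program_memory = {2000: "LOAD R1, 1004", 2001: "ADD R1, 1000", 2002: "HALT"}
data_memory = {1000: 7, 1004: 5}
final_regs = run(program_memory, data_memory, {"R1": 0}, pc=2000)
print(final_regs)  # {'R1': 12}
```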

Diagram of Instruction Cycle (Simplified)


+--------+
| Fetch  |
+---+----+
    |
    v
+--------+
| Decode |
+---+----+
    |
    v
+--------+
|Execute |
+---+----+
    |
    v
+--------+
| Store  |
+--------+

Summary

Term | Simple Meaning
Instruction Format | Layout or structure of an instruction (includes opcode, operands, etc.)
Opcode | Tells what action to perform (e.g., ADD, SUB)
Operand | The data or the address involved
Addressing Mode | Tells how to interpret operands
Instruction Cycle | Steps the CPU follows to process an instruction: Fetch → Decode → Execute → Store

15. Addressing Modes

Addressing modes define how operands are accessed in instructions. Common addressing modes include:

 Immediate: Operand is part of the instruction.


 Direct: Address of the operand is given in the instruction.
 Indirect: Instruction contains address of a memory location that holds the address of the operand.
 Register: Operand is in a CPU register.
 Indexed: Effective address is calculated by adding a constant value to the content of an index register.

These modes increase flexibility and efficiency in accessing data during program execution.

What Are Addressing Modes?

In every instruction, the CPU needs to know:

 What to do (the operation – from the opcode)


 Where to get the data from (this is where addressing modes come in!)

Addressing Modes tell the CPU how to interpret the operands: whether the operand is the actual value, a
memory address, in a register, or something else.

1. Immediate Addressing Mode

Meaning:

The operand (actual value) is directly given in the instruction itself.


Example:

MOV R1, #5

What happens:

 The value 5 is immediately loaded into register R1.

The # symbol usually means it's an immediate value.

Simple Explanation:

 "Don’t look anywhere—here’s the value directly!"

2. Direct Addressing Mode

Meaning:

The instruction gives the memory address where the data is located.

Example:

LOAD R1, 5000

What happens:

 Go to memory address 5000


 Load the data from there into R1

Simple Explanation:

 "Here’s the exact memory address where your data lives."

3. Indirect Addressing Mode

Meaning:

The instruction contains a memory address, but that address holds another address, and that second address
is where the actual data is.

Example:

LOAD R1, (5000)


What happens:

 Go to memory[5000], get the address stored there (say, 7000)


 Go to memory[7000], and load that data into R1

Simple Explanation:

 "Go to this address to find another address that tells you where the data is."

4. Register Addressing Mode

Meaning:

The operand is in a CPU register.

Example:

MOV R1, R2

What happens:

 Copy the value from register R2 into register R1

Simple Explanation:

 "The data is already in the CPU—just move it from one box (register) to another."

5. Indexed Addressing Mode

Meaning:

The address of the operand is calculated by adding a constant (offset) to the value in an index register.

Example:

LOAD R1, 1000(R2)

What happens:

 Add 1000 + R2
 Use the result as a memory address
 Load the data from that address into R1

Simple Explanation:

 "Start from a base (index register) and add an offset to find where the data is."

Common in:

 Arrays, loops, and accessing lists in memory.

Other Addressing Modes (Bonus)

Here are a few additional modes you might encounter:

Addressing Mode | Explanation | Example
Register Indirect | Register holds the memory address where data is stored | LOAD R1, (R2)
Relative | Used in branching; address = PC + offset | JMP +5 (jump 5 instructions ahead)
Base + Index | Adds base register + index register + offset | Used in complex array access

Summary Table

Mode | Where is the operand? | Example | Explanation
Immediate | Inside the instruction | MOV R1, #10 | Use value 10 directly
Direct | Memory address is given in the instruction | LOAD R1, 5000 | Go to address 5000
Indirect | Address points to another address | LOAD R1, (5000) | memory[5000] → 7000 → memory[7000]
Register | Operand is in a register | MOV R1, R2 | Use data from R2
Indexed | Address = base + offset | LOAD R1, 1000(R2) | Effective address = 1000 + R2
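
The five modes can be compared side by side in one small function. This is a sketch only: the function name, the dictionary-based memory, and the sample contents are assumptions chosen to match the examples in this section.

```python
def fetch_operand(mode, value, regs, memory):
    """Return the operand selected by an addressing mode."""
    if mode == "immediate":          # MOV R1, #5        -> the value itself
        return value
    if mode == "direct":             # LOAD R1, 5000     -> memory[5000]
        return memory[value]
    if mode == "indirect":           # LOAD R1, (5000)   -> memory[memory[5000]]
        return memory[memory[value]]
    if mode == "register":           # MOV R1, R2        -> contents of R2
        return regs[value]
    if mode == "indexed":            # LOAD R1, 1000(R2) -> memory[1000 + R2]
        offset, index_reg = value
        return memory[offset + regs[index_reg]]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R2": 4}
memory = {5000: 7000, 7000: 99, 1004: 55}
print(fetch_operand("direct", 5000, regs, memory))            # 7000
print(fetch_operand("indirect", 5000, regs, memory))          # 99
print(fetch_operand("indexed", (1000, "R2"), regs, memory))   # 55
```

The same address field (5000) yields different operands depending on the mode, which is exactly the flexibility addressing modes provide.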

18. Micro Instruction Formats

A microinstruction is a low-level instruction used in a microprogrammed control unit. The sections below
describe what it is and how its format is laid out.


What is a Microinstruction?

A microinstruction is a very basic, low-level instruction used inside the control unit of a CPU that has
microprogrammed control. These microinstructions control the internal hardware by specifying exact
operations like moving data between registers, controlling ALU operations, etc.

Microinstruction Format — Key Parts

A microinstruction usually contains two important fields:

1. Micro-Operation Fields

 These fields specify the micro-operations — the actual hardware-level operations the CPU should
perform during that cycle.
 Examples of micro-operations:
o Load register R1
o Add contents of R2 and R3
o Move data from one register to another
o Enable memory read/write
 There can be multiple micro-operations in one microinstruction, often combined using control signals.

2. Next Address Field

 This field tells the control unit which microinstruction to execute next.
 It directs the flow of microinstructions, allowing sequences or branching in microprograms.
 The next address can be:
o A fixed address (next microinstruction in sequence)
o An address decided by a condition (branching based on flags)
o An address from a register or counter

Why is Microinstruction Format Important?

 It helps the CPU’s control unit to generate the correct signals for each step in executing machine
instructions.
 Enables flexible and programmable control logic instead of hardwired logic.

Simple Example

Imagine a microinstruction might look like:

Micro-Operations | Next Microinstruction Address
Load R1 ← Memory[AR], ALU add R2 + R3 | 1005


Here:

 The CPU will load the content from memory address in AR into R1 and also add R2 and R3.
 After completing this microinstruction, control jumps to microinstruction at address 1005.
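
The two fields can be modelled as a small table lookup. This is a minimal sketch, assuming a control memory that maps each address to a list of micro-operations plus a next-address field; the addresses, the micro-operation strings and the function name run_microprogram are all made up for illustration.

```python
# Control memory: address -> (list of micro-operations, next address).
# Addresses and micro-operation names are illustrative assumptions.
control_memory = {
    1000: (["R1 <- Memory[AR]", "ALU: R2 + R3"], 1005),
    1005: (["R4 <- ALU result"], None),   # None marks the end of the routine
}

def run_microprogram(start):
    """Follow next-address fields from `start`, collecting the micro-operations issued."""
    issued = []
    address = start
    while address is not None:
        micro_ops, next_address = control_memory[address]
        issued.append((address, micro_ops))
        address = next_address
    return issued
```

Only the fixed next-address case is modelled here. Conditional branching would need the sequencer to choose between two candidate addresses based on a flag.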

Difference Between Hardwired Control Unit and Microprogrammed Control Unit

1. Hardwired Control Unit

What is it?

 A hardwired control unit is built using fixed electronic circuits like combinational logic gates, flip-flops, decoders, and multiplexers.
 The control signals are generated by fixed hardware logic.
 It uses logic gates and timing signals to directly produce control signals based on the current instruction.

Characteristics

Feature Description
Design Fixed, implemented using circuits and gates
Control Signal Generation Signals generated directly by combinational logic
Speed Very fast because control signals are generated by hardware instantly
Flexibility Not flexible — changing control logic requires redesigning the hardware
Complexity Can be complex for complex instruction sets
Examples Early computers and simple processors

Advantages

 Faster instruction execution.


 Efficient for simple and small instruction sets.

Disadvantages

 Difficult to modify or extend.


 Complex to design and troubleshoot for large instruction sets.

2. Microprogrammed Control Unit

 A microprogrammed control unit uses a microprogram (stored in control memory) to generate control signals.
 Instead of fixed hardware, it uses a sequence of microinstructions to define the control signals.
 It’s like a small program controlling the CPU's internal operations.

Characteristics

Feature Description

Design Control signals generated by executing microinstructions stored in control memory

Control Signal Generation Control signals generated sequentially by microinstructions

Speed Slower compared to hardwired control due to fetching microinstructions

Flexibility Highly flexible — changing control behavior requires changing microprogram only

Complexity Easier to design and modify; handles complex instruction sets easily

Examples Most modern CPUs and complex processors

Advantages

 Easy to modify and extend.


 Easier to implement complex instruction sets.
 Simplifies CPU design.

Disadvantages

 Slower execution speed compared to hardwired control.


 Requires extra memory for control storage.

Feature | Hardwired Control Unit | Microprogrammed Control Unit
Implementation | Fixed hardware circuits | Control memory storing microinstructions
Control Signal Source | Generated directly by combinational logic | Generated by microinstructions
Speed | Faster | Slower due to microinstruction fetch cycle
Flexibility | Not flexible; hardware changes needed | Highly flexible; change microprogram only
Complexity | Complex for large instruction sets | Simpler for complex instruction sets
Ease of Design | Difficult | Easier to design and modify
Cost | Cheaper for simple designs | Requires more memory and control hardware
Examples | Early CPUs, simple processors | Modern CPUs, complex processors


UNIT -2

1. Binary Addition and Subtraction (with Two’s Complement)


2. Booth’s Multiplication Algorithm
3. Restoring Division Algorithm
4. Floating Point Addition
5. Block Diagram of an Arithmetic Unit

1. Binary Addition and Subtraction Using Two’s Complement

Binary Addition Example: Add 13 and 9

 Convert decimal to 5-bit binary:


o 13 → 01101
o 9 → 01001

Add bit by bit:

Bit Position 4 3 2 1 0
13 (01101) 0 1 1 0 1
9 (01001) 0 1 0 0 1
Sum 1 0 1 1 0

 Addition with carry:


o bit 0: 1+1=0 carry 1
o bit 1: 0+0+1=1 carry 0
o bit 2: 1+0=1 carry 0
o bit 3: 1+1=0 carry 1
o bit 4: 0+0+1=1 carry 0

Result = 10110 = 22 decimal.

Problem: Add 13 and 9 using binary addition

Step 1: Convert Decimal Numbers to Binary

We use 5-bit binary representation to hold the value

Decimal | 5-bit Binary
13 | 01101
9 | 01001

Step 2: Write the Binary Numbers Vertically

Align them by bit positions (from bit 0 to bit 4):

Bit Pos: 4 3 2 1 0
-------------------
13 0 1 1 0 1
9 0 1 0 0 1

We’ll add them right to left (from LSB → MSB), keeping track of any carry.

Step 3: Add Bit by Bit with Carry

Bit Position | Bit Values (13 + 9) | Sum | Carry
0 (LSB) | 1 + 1 | 0 | 1
1 | 0 + 0 + 1 (carry) | 1 | 0
2 | 1 + 0 | 1 | 0
3 | 1 + 1 | 0 | 1
4 (MSB) | 0 + 0 + 1 (carry) | 1 | 0

Now we write the sum from bit 4 to bit 0:

 Bit 4: 1
 Bit 3: 0
 Bit 2: 1
 Bit 1: 1
 Bit 0: 0

Step 4: Final Binary Result

Result (binary) = 10110

Now convert it back to decimal:

(1 × 2^4) + (0 × 2^3) + (1 × 2^2) + (1 × 2^1) + (0 × 2^0) = 16 + 0 + 4 + 2 + 0 = 22

Final Answer: 10110 = 22
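
The bit-by-bit procedure can be written as a short loop. The sketch below assumes the operands arrive as equal-length bit strings; the function name add_binary is chosen only for this example.

```python
def add_binary(a_bits, b_bits):
    """Add two equal-length bit strings; return (sum_bits, carry_out)."""
    carry = 0
    result = []
    # Walk from the least significant bit (rightmost) to the most significant.
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        total = int(a) + int(b) + carry
        result.append(str(total % 2))   # sum bit for this position
        carry = total // 2              # carry into the next position
    return "".join(reversed(result)), carry

print(add_binary("01101", "01001"))  # 13 + 9 -> ('10110', 0)
```

A non-zero carry_out would mean the true sum does not fit in the chosen width.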

Binary Subtraction Example: Calculate 13 - 9

 Represent 13 and 9 in 5-bit binary:


o 13 → 01101
o 9 → 01001
 Find two’s complement of 9 (subtrahend):
o Invert bits of 01001 → 10110
o Add 1 → 10110 + 1 = 10111
 Add 13 + two’s complement of 9:
Bit Position 4 3 2 1 0
13 (01101) 0 1 1 0 1
-9 (10111) 1 0 1 1 1
Sum 0 0 1 0 0 (carry out of bit 4 = 1)

 Ignore the carry out of bit 4 (the full sum is 100100; only the low 5 bits are kept).
 Result = 00100 = 4 decimal (correct).
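
The same subtraction can be checked with a few lines of code. This is a sketch under the assumption that both operands are given as bit strings of the same width; the helper names are hypothetical.

```python
def twos_complement(bits):
    """Invert every bit and add 1, keeping the same width."""
    width = len(bits)
    inverted = int(bits, 2) ^ ((1 << width) - 1)
    return format((inverted + 1) % (1 << width), f"0{width}b")

def subtract(a_bits, b_bits):
    """Compute a - b as a + two's complement of b, discarding the carry out."""
    width = len(a_bits)
    total = int(a_bits, 2) + int(twos_complement(b_bits), 2)
    return format(total % (1 << width), f"0{width}b")
```

For 13 - 9 in 5 bits, twos_complement("01001") gives "10111" and subtract("01101", "01001") gives "00100", which is 4.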

2. Booth’s Multiplication Algorithm — Step-by-Step


Multiply -4 (multiplicand M = 1100) by 3 (multiplier Q = 0011) using 4-bit registers. Note that -M = 0100.

Rule for each step, based on the bit pair Q0 Q-1: 10 → A = A - M; 01 → A = A + M; 00 or 11 → no operation. Every step ends with an arithmetic right shift (ARS) of A, Q, Q-1.

Step | A (4 bits) | Q (4 bits) | Q-1 | Operation | Description
Init | 0000 | 0011 | 0 | none | A = 0, Q = 3 (multiplier), Q-1 = 0
1 | 0000 | 0011 | 0 | Q0 Q-1 = 10 → A = A - M = 0000 + 0100 = 0100 | ARS: A, Q, Q-1 = 0010 0001 1
2 | 0010 | 0001 | 1 | Q0 Q-1 = 11 → no operation | ARS: 0001 0000 1
3 | 0001 | 0000 | 1 | Q0 Q-1 = 01 → A = A + M = 0001 + 1100 = 1101 | ARS: 1110 1000 0
4 | 1110 | 1000 | 0 | Q0 Q-1 = 00 → no operation | ARS: 1111 0100 0

Final product: Combine A and Q → 11110100, which is -12 in 8-bit two's complement (expected product: 3 × -4 = -12).

Note: Booth's algorithm handles signed numbers directly, so the 8-bit result is already in two's complement and needs no separate sign correction.
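
A compact way to check a Booth trace is to run the register operations in code. The sketch below assumes the usual rule (Q0 Q-1 = 10 subtracts M, 01 adds M, then an arithmetic right shift). The function name and the default of 4-bit registers are choices made for this example. It does not handle the most negative multiplicand (for example -8 in 4 bits), whose negation does not fit in the register.

```python
def booth_multiply(multiplicand, multiplier, bits=4):
    """Multiply two signed integers with Booth's algorithm using `bits`-wide registers."""
    mask = (1 << bits) - 1
    m = multiplicand & mask          # M in two's complement
    a, q, q_1 = 0, multiplier & mask, 0
    for _ in range(bits):
        pair = (q & 1, q_1)
        if pair == (1, 0):           # 10 -> A = A - M
            a = (a - m) & mask
        elif pair == (0, 1):         # 01 -> A = A + M
            a = (a + m) & mask
        # Arithmetic right shift of the combined A, Q, Q-1
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & (1 << (bits - 1)))   # replicate the sign bit
    product = (a << bits) | q
    if product & (1 << (2 * bits - 1)):          # interpret 2*bits result as signed
        product -= 1 << (2 * bits)
    return product
```

Calling booth_multiply(-4, 3) ends with A = 1111 and Q = 0100, that is 11110100, which the function returns as -12.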

3. Restoring Division Algorithm — Step-by-Step


Divide 13 by 3 using 4-bit registers:

 Dividend: 1101 (13 decimal), loaded into Q
 Divisor: 0011 (3 decimal)
 Initialize:
o Q = 1101 (holds the dividend at the start and the quotient at the end)
o Remainder (R) = 0000

Each iteration shifts the pair (R, Q) left by one bit, subtracts the divisor from R, and then sets the new Q0: 1 if the result is non-negative, or 0 (with R restored) if it is negative.

Iteration 1:

 Shift left (R,Q) ← (0000, 1101) → (0001, 101_)
 Subtract divisor: R = 0001 - 0011 = negative → restore R by adding divisor back → R = 0001
 Set Q0 = 0 → Q = 1010

Iteration 2:

 Shift left (R,Q) ← (0001, 1010) → (0011, 010_)
 Subtract divisor: R = 0011 - 0011 = 0000 (≥ 0)
 Set Q0 = 1 → Q = 0101

Iteration 3:

 Shift left (R,Q) ← (0000, 0101) → (0000, 101_)
 Subtract divisor: R = 0000 - 0011 = negative → restore R → R = 0000
 Set Q0 = 0 → Q = 1010

Iteration 4:

 Shift left (R,Q) ← (0000, 1010) → (0001, 010_)
 Subtract divisor: R = 0001 - 0011 = negative → restore R → R = 0001
 Set Q0 = 0 → Q = 0100

Final quotient = 0100 (4 decimal), remainder = 0001 (1 decimal). Check: 3 × 4 + 1 = 13.
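
The shift, subtract and restore loop can be sketched in code. This assumes unsigned operands that fit in the chosen register width; the function name and the variable names a (partial remainder) and q (dividend, then quotient) are chosen for this example.

```python
def restoring_divide(dividend, divisor, bits=4):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    a, q = 0, dividend                      # a = partial remainder, q = dividend
    for _ in range(bits):
        # Shift the combined (a, q) pair left by one bit
        a = (a << 1) | ((q >> (bits - 1)) & 1)
        q = (q << 1) & ((1 << bits) - 1)
        a -= divisor                        # trial subtraction
        if a < 0:
            a += divisor                    # restore; quotient bit stays 0
        else:
            q |= 1                          # quotient bit = 1
    return q, a
```

For 13 divided by 3 this returns (4, 1): quotient 0100 and remainder 0001.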

4. Floating Point Addition — Step-by-Step

Add 6.5 + 2.25

Step 1: Represent numbers in binary floating form

Number Binary Normalized Form Exponent (base 2)

6.5 110.1 1.101 × 2^2 2

2.25 10.01 1.001 × 2^1 1

Step 2: Equalize exponents

Shift the mantissa of the number with the smaller exponent (2.25) right by 1 bit (exponent difference is 1), so that both numbers use exponent 2:

 1.001 × 2^1 → 0.1001 × 2^2

Step 3: Add mantissas

1.1010 (6.5 mantissa) + 0.1001 (2.25 mantissa, shifted) = 10.0011

Step 4: Normalize result

 10.0011 × 2^2 = 1.00011 × 2^3 (shift mantissa right by 1, increase exponent by 1)

Step 5: Result = 1.00011 × 2^3 = 1000.11 (binary) = 8.75 (decimal)
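
The align, add and normalize steps can be sketched as follows. This is a simplification that assumes positive, already normalized inputs and uses ordinary Python floats for the mantissas, so rounding to a fixed number of mantissa bits is not modelled. The function name fp_add is hypothetical.

```python
def fp_add(m1, e1, m2, e2):
    """Add m1 * 2**e1 and m2 * 2**e2 for positive mantissas in [1, 2).

    Returns a normalized (mantissa, exponent) pair.
    """
    # Align: make the first operand the one with the larger exponent,
    # then shift the smaller operand's mantissa right by the difference.
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2 / (2 ** (e1 - e2))
    # Add mantissas at the common exponent
    mantissa, exponent = m1 + m2, e1
    # Normalize back into [1, 2)
    while mantissa >= 2:
        mantissa /= 2
        exponent += 1
    return mantissa, exponent

mantissa, exponent = fp_add(1.625, 2, 1.125, 1)   # 1.101b x 2^2 + 1.001b x 2^1
```

Here 1.625 and 1.125 are the decimal values of the binary mantissas 1.101 and 1.001. The call yields mantissa 1.09375 (binary 1.00011) with exponent 3, and 1.09375 × 8 = 8.75.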

5. Block Diagram of Arithmetic Unit

+-----------------------+
| Control Unit |
+-----------------------+
|
v
+---------------------+
| Instruction Decoder |
+---------------------+
|
v
+---------------------+
| Multiplexer |<--- Inputs from registers
+---------------------+
|
v
+---------------------+
| Arithmetic Logic Unit|
| (ALU) |
+---------------------+
|
v
+---------------------+
| Result Register |
+---------------------+

 Control Unit: Provides signals to select operations (add, subtract, multiply, divide).
 Instruction Decoder: Decodes instruction to identify required arithmetic operation.
 Multiplexer: Selects which operands are fed to the ALU.
 ALU: Performs actual arithmetic and logic operations.
 Result Register: Stores the output/result of the ALU operation.

UNIT -3
I/O Organization - Detailed Student Notes

1. I/O Interface

Definition:

An Input/Output (I/O) interface is a communication bridge between the CPU and I/O devices such as keyboards, printers, and monitors. It handles data transfer, status monitoring, and the exchange of control signals between the internal system (CPU/memory) and external hardware.

Why Is It Needed?

 CPU and I/O devices differ in data formats, timing, and control methods.
 Devices may work at slower speeds than the CPU.
 I/O devices may use analog signals, while the CPU understands digital data.
 Coordination is needed for data integrity, device selection, and error detection.

Main Functions of I/O Interface

Functions:

 Communication Handling: Facilitates data exchange between CPU and I/O device.
 Signal Conversion: Converts between digital signals and analog voltages as needed.
 Speed Matching: Buffers are used to accommodate the speed mismatch between CPU and I/O.
 Control Signaling: Manages read/write operations, command triggers, and device acknowledgments.
 Error Detection: Detects parity errors, transmission errors, and device malfunctions.

Function Description

Communication Handling Manages the actual data transfer between CPU and I/O devices.

Signal Conversion Converts signal formats (analog to digital or vice versa).

Speed Matching Uses buffers or FIFO to synchronize fast CPUs and slow I/O devices.

Control Signaling Generates necessary signals like READ, WRITE, ACK, etc.

Error Detection Detects transmission faults using parity checks, CRC, etc.

Core Components of an I/O Interface:


Component Description
Data Register Holds data temporarily during input/output operations.
Control Register Holds the control word that determines the operation (e.g., READ/WRITE).
Status Register Indicates device conditions: ready, error, busy, etc.
Address Decoder Identifies the I/O device selected using address lines.

Diagram: I/O Interface Block Diagram

CPU
|
-----------------
| Control Bus |<----------------------> Control Register
| Address Bus |<----------------------> Address Decoder
| Data Bus |<----------------------> Data Register
-----------------
|
[ I/O Interface ]
|
[ I/O Device ]

 CPU sends control signals, address, and data.


 Address Decoder identifies which device to communicate with.
 Control Register interprets instructions (READ/WRITE).
 Data Register is used for actual data transmission.
 Status Register signals when the device is ready, busy, or in error.

Working Example: Keyboard I/O Interface

Operation:

1. Key is pressed on keyboard.


2. Keyboard controller generates scan code.
3. Data is loaded into the data register of the interface.
4. Status register is updated to indicate “data ready”.
5. CPU polls the status register.
6. If ready, CPU reads the data from the data register.
7. Control register is used to reset or re-enable the device.
Block Diagram:
Keyboard
|
Scan Code

[Data Register] ---> CPU (Reads data)

[Status Register] ---> CPU (Checks ready flag)

[Control Register] <--- CPU (Send commands)
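
The polling sequence in steps 1 to 7 can be sketched as a small simulation. The register layout (a single "data ready" bit in the status register), the class name and the method names are assumptions made for this example, not a description of any real keyboard controller.

```python
class KeyboardInterface:
    """Toy model of the interface registers; names and bit layout are assumptions."""
    READY = 0x01                      # hypothetical "data ready" flag in the status register

    def __init__(self):
        self.data_register = 0
        self.status_register = 0

    def key_pressed(self, scan_code):
        """Device side: latch the scan code and raise the ready flag."""
        self.data_register = scan_code
        self.status_register |= self.READY

    def read(self):
        """CPU side: read the data register and clear the ready flag."""
        self.status_register &= ~self.READY
        return self.data_register

def poll(interface, max_polls=1000):
    """Programmed I/O: check the status register until data is ready."""
    for _ in range(max_polls):
        if interface.status_register & interface.READY:
            return interface.read()
    return None                       # gave up: no key arrived within max_polls checks
```

The loop in poll is where programmed I/O spends CPU time: every check of the status register is work the CPU could have spent elsewhere.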

Types of I/O Interface Techniques:

Technique Description
Programmed I/O CPU controls all data transfer. Polling-based.
Interrupt-Driven I/O Device interrupts CPU when ready for communication.
DMA (Direct Memory Access) I/O controller transfers data directly to memory without CPU.

Error Detection Features:

 Parity Bits: Simple error checking.


 Checksums: More comprehensive error validation.
 Timeouts: Detects non-responsive devices.
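
As a rough illustration of the first two features, the sketch below computes an even-parity bit for one byte and a simple 8-bit additive checksum for a block. Both schemes are assumptions picked for the example; real interfaces may use odd parity, CRCs, or other checksum variants.

```python
def even_parity_bit(byte):
    """Return the bit that makes the total number of 1s even."""
    return bin(byte).count("1") % 2

def check_even_parity(byte, parity_bit):
    """True when data plus parity bit contain an even number of 1s."""
    return (bin(byte).count("1") + parity_bit) % 2 == 0

def checksum(data):
    """Simple 8-bit additive checksum (one of many checksum schemes)."""
    return sum(data) % 256
```

A single flipped bit changes the count of 1s by one, so the parity check fails. Two flipped bits cancel out, which is why parity is only a simple check.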

2. Types of I/O Buses

What is an I/O Bus?

An I/O Bus is a communication pathway that connects the CPU and memory subsystem to input/output
devices. It carries data, addresses, and control signals between the components. Each I/O bus supports different
speeds, topologies, and device types.

A. PCI Bus (Peripheral Component Interconnect)

PCI is a high-speed parallel bus used for connecting internal hardware components to the motherboard.

Key Features:

 Bus Width: 32 or 64 bits.


 Clock Speed: Typically 33 MHz or 66 MHz.
 Plug-and-Play: Devices auto-configure using BIOS/OS support.
 Bus Mastering: Devices can directly initiate transactions.
 Shared Bus: Multiple devices share the same bus but use individual configuration space.
Textual Diagram:
[CPU]
|
[Northbridge / Chipset]
|
[PCI Bus]---[Sound Card]
├──[Network Interface Card (NIC)]
└──[Modem]

Real-World Example:

 A user adds a Gigabit Ethernet NIC to a PC using a PCI slot on the motherboard. The OS
automatically detects and configures the device using plug-and-play.

B. SCSI Bus (Small Computer System Interface)

SCSI is a parallel bus architecture designed for high-speed and reliable connection of multiple peripherals,
commonly in enterprise settings.

Key Features:

 Parallel Communication: Sends multiple bits simultaneously.


 Command-Based: Uses protocols like INITIATE, EXECUTE, COMPLETE.
 Daisy-Chained Topology: Devices are linked one after the other.
 Device ID Assignment: Each device gets a unique SCSI ID.
 Device Limit: Typically up to 8 (Narrow SCSI) or 16 (Wide SCSI) devices.

Textual Diagram:
[Host Adapter]
|
[SCSI Bus]---[HDD 1]---[HDD 2]---[Tape Drive]---[CD-ROM]
Real-World Example:

 A file server uses SCSI to connect to multiple hard drives and a backup tape drive, enabling fast,
simultaneous data transactions for enterprise operations.

C. USB (Universal Serial Bus)

USB is a serial communication standard developed to simplify and unify peripheral connections for short-
distance communication.

Key Features:

 Serial Bus: Transfers data one bit at a time.


 Topology: Tiered-Star using hubs.
 Plug-and-Play & Hot-Swapping: Auto-recognition and no reboot needed for device connections.
 Power Delivery: Supplies 5V power along with data (for small devices).
 Backward Compatibility: Each new version supports older ones.
 Transfer Types:
o Control
o Isochronous
o Bulk
o Interrupt

USB Versions and Speeds:


USB Version Speed
USB 1.1 12 Mbps
USB 2.0 480 Mbps
USB 3.0 5 Gbps
USB 3.1 10 Gbps
USB 3.2 20 Gbps
USB4 40 Gbps

Textual Diagram (Tiered-Star Topology):


[CPU]
|
[USB Controller]
|
[USB Hub]
/ | \
[Mouse] [Keyboard] [Pen Drive]

Real-World Example:

 A USB flash drive is inserted into a laptop. It draws power and communicates with the CPU for data
transfer. No restart is needed due to hot-plugging support.
Comparison Table: PCI vs SCSI vs USB

Feature | PCI | SCSI | USB
Type | Parallel | Parallel | Serial
Bus Width | 32 / 64 bits | 8 / 16 bits | 1 bit
Speed (Typical) | Up to 533 MB/s | Up to 320 MB/s | Up to 40 Gbps (USB4)
Devices Supported | Limited by slots | 8–16 per controller | 127 per host controller
Topology | Shared bus | Daisy chain | Tiered-star
Hot Plug Support | No | No | Yes
Power Delivery | No | No | Yes
Usage Area | Internal desktop cards | Enterprise storage, servers | General-purpose devices

3. Data Transfer Modes


Data transfer is the method by which digital data moves between devices or components. It is essential for
communication between CPU, memory, and I/O devices. The transfer mode chosen affects speed,
accuracy, and distance of data communication.

a. Serial Data Transfer:

In serial transfer, data bits are sent one after another, one bit at a time, over a single wire or channel.

Advantages:

 Requires fewer wires → Low cost and simple design.


 Better suited for long-distance communication.
 Less chance of crosstalk and signal skew.

Limitations:

 Slower than parallel transfer for large data volumes.

Example:

 RS-232 port used in older modems and terminals.

Textual Diagram:
Transmitter --[bit1]--[bit2]--[bit3]--> Receiver
(1 wire/channel)
b. Parallel Data Transfer

In parallel transfer, multiple bits (usually 8, 16, or 32) are transmitted simultaneously over separate lines, one line per bit.

Advantages:

 High-speed transfer.
 Effective for short distances (within PC boards).

Limitations:

 Signal degradation, skew, and interference in longer cables.


 More expensive due to multiple wires.

Example:

 Data transfer between CPU and RAM using a system bus.

Textual Diagram:

[bit1] [bit2] [bit3] ... [bit8]


| | | |
--------------------------
Parallel Bus

c. Synchronous Data Transfer

Definition:

In synchronous transfer, the sender and receiver share a common clock signal and transfer data in lockstep with the clock pulses.

Advantages:

 Fast and efficient due to continuous transfer.


 No overhead of start/stop bits.

Drawbacks:

 Requires precise clock synchronization hardware.


 Sensitive to timing mismatches.

Example:

 RAM and CPU use synchronous communication.


Textual Diagram:
Clock: |---|---|---|---|---
Data: | D1| D2| D3| D4| D5|
Sender ↔ Receiver (Using shared clock)

d. Asynchronous Data Transfer

Definition:

In asynchronous transfer, each data unit (typically 1 byte) is sent independently, framed by start and stop bits that let the receiver synchronize without a shared clock.

Advantages:

 Simple and cost-effective.


 Works well with intermittent data transmission.

Drawbacks:

 Slower due to overhead from start and stop bits.


 Inefficient for high-speed continuous transfers.

Example:

 Keyboard to CPU using UART (Universal Asynchronous Receiver-Transmitter).

Textual Diagram:
[Start] [Data: 8 bits] [Stop]
| | |
Send →→→→→→→→→→→→→→→→→→→→ Receive
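
The framing in the diagram can be sketched in code. The example assumes one start bit (0), 8 data bits sent least significant bit first, no parity, and one stop bit (1), which is a common UART setting. The function names are hypothetical.

```python
def frame_byte(byte):
    """Frame one byte as start bit (0), 8 data bits LSB first, stop bit (1)."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]

def unframe(bits):
    """Recover the byte from a 10-bit frame, checking the start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[-1] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))
```

Each 8-bit byte costs 10 bits on the wire under these assumptions, so 20% of the line time is start/stop overhead. This is the overhead listed under Drawbacks.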

Comparison Table:

Feature | Serial Transfer | Parallel Transfer | Synchronous Transfer | Asynchronous Transfer
Transfer Path | One bit per line | Multiple bits in parallel | One bit per clock pulse | Byte with start/stop bits
Speed | Moderate | Fast (short distances) | Fastest | Slower
Distance Suitability | Long distance | Short distance | Short (high sync needed) | Medium (simple devices)
Hardware Complexity | Low | High | Medium-High | Low
Example Device | RS-232 modem | CPU–RAM | DDR RAM–CPU | Keyboard via UART

4. Direct Memory Access (DMA)

Direct Memory Access (DMA) is a method that allows I/O devices to transfer data directly to/from
memory without continuous CPU involvement. It significantly improves system performance by freeing the
CPU from routine data movement tasks.

Components of a DMA System:

Component Description
DMA Controller A dedicated hardware unit or chip that manages DMA operations.

Control Logic Coordinates timing, signaling, and bus arbitration.


Address Register Holds the memory address where data is read/written.

Byte Count Register Indicates the number of bytes to be transferred.


Control Register Holds mode flags (read/write, priority, interrupt settings, etc.).

Working of DMA (Step-by-Step):

1. DMA Request (DREQ):


The I/O device sends a request to the DMA controller to initiate a transfer.
2. Bus Request (BR):
The DMA controller sends a Bus Request signal to the CPU to gain control of the system bus.
3. Bus Grant (BG):
The CPU finishes its current bus cycle and responds with a Bus Grant signal, temporarily releasing the
bus.
4. Data Transfer:
The DMA controller takes control and transfers data directly between memory and device.
5. Interrupt to CPU:
After the transfer is complete, the DMA controller sends an interrupt to notify the CPU that the task is
done.

Textual Diagram:

[Device] -- (DREQ) --> [DMA Controller] -- (BR) --> [CPU]
[CPU] -- (BG) --> [DMA Controller]
[Device] <--> [DMA Controller] <--> [Main Memory]
||
[System Bus Control]
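
The roles of the address and byte count registers can be seen in a toy model. This is a sketch only: the class and method names are hypothetical, bus arbitration (BR/BG) is not modelled, and the transfer is done in one go.

```python
class DMAController:
    """Toy DMA controller; register names follow the component table, the API is hypothetical."""

    def __init__(self, memory):
        self.memory = memory
        self.address_register = 0
        self.byte_count_register = 0
        self.interrupt_pending = False

    def program(self, start_address, byte_count):
        """CPU side: set up the transfer, then carry on with other work."""
        self.address_register = start_address
        self.byte_count_register = byte_count
        self.interrupt_pending = False

    def transfer(self, device_data):
        """Move bytes from the device into memory, then raise the interrupt flag."""
        for byte in device_data[: self.byte_count_register]:
            self.memory[self.address_register] = byte
            self.address_register += 1       # next memory location
            self.byte_count_register -= 1    # one fewer byte remaining
        self.interrupt_pending = True        # step 5: notify the CPU
```

The CPU only calls program once and later sees interrupt_pending. The byte-by-byte work happens inside the controller, which is the point of DMA.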
DMA Transfer Modes:

Mode | Description | Example Use Case
Burst Mode | Entire block of data is transferred in one go without CPU access. | High-speed transfers like disk-to-RAM
Cycle Stealing | DMA takes one bus cycle at a time, interleaved with CPU access. | Printer or sound card data output
Transparent Mode | DMA only transfers when CPU is idle, completely non-intrusive. | Background data loading in video playback

Example:

In video rendering, high-resolution image files are fetched from the SSD into RAM using Burst Mode DMA,
allowing the CPU to focus solely on rendering logic and UI response.

Comparison: CPU-Controlled I/O vs. DMA:

Feature CPU-Controlled I/O DMA-Based I/O

Who handles transfer? CPU DMA controller

CPU involvement Fully engaged Minimal

Transfer speed Slower Faster

Efficiency Low (CPU cycles wasted) High (CPU free for other tasks)

Interrupt generation Frequent One-time (at end)

Advantages of DMA:

 Faster data transfer without CPU overhead


 Frees CPU for other operations
 Reduces interrupt frequency
 Ideal for large and repetitive data transfers
5. I/O Processor (IOP)

An I/O Processor (IOP) is a dedicated processing unit designed specifically to manage I/O device operations, leaving the CPU free to execute computation instructions. Unlike the CPU, which is optimized for logic and arithmetic, the IOP focuses on controlling peripherals and handling I/O data traffic.

Structure of an IOP:

Component Function

I/O Instruction Set Executes special I/O-specific commands like device select, read, write.

Internal Memory Holds temporary data, status info, or buffered I/O tasks.

Communication Bus Facilitates interaction between IOP, I/O devices, and system memory.

How IOP Works (Steps):

1. CPU Delegation:
The CPU writes the I/O instructions or a small I/O program to main memory.
2. IOP Fetch:
The IOP reads these instructions independently of the CPU.
3. Execution:
IOP executes the I/O task (e.g., read from disk, write to printer).
4. Interrupt Generation:
After completing the task, the IOP generates an interrupt to inform the CPU.

Advantages of Using an IOP:

 Offloads CPU: Frees up CPU cycles for program execution.


 Efficient Parallelism: CPU and IOP can operate concurrently.
 Improved Performance: Especially in systems with high I/O activity.
 Dedicated Control: Handles complex I/O without frequent CPU interruptions.

Real-Life Example:

IBM System/370 used multiple IOPs (called I/O channels in IBM terminology) to handle:

 Card Readers
 Magnetic Tape Drives
 Printers

These were managed independently of the main CPU, allowing the system to support multiple I/O tasks simultaneously.

Complete Summary Table: I/O System Components

Component Description

I/O Interface Connects CPU with I/O devices. Handles data, control, and status signals.

PCI Bus High-speed parallel bus used for internal components like NICs, GPUs.

SCSI Bus Command-based parallel bus for daisy-chained high-speed devices.

USB Universal serial bus supporting hot-plug and device power delivery.

Serial Transfer Transfers 1 bit at a time → cost-efficient, suitable for long distances.

Parallel Transfer Transfers multiple bits at once → faster, but limited to short distances.

Synchronous Transfer Uses shared clock → high-speed but requires sync hardware.

Asynchronous Transfer No shared clock → uses start/stop bits, simpler but slower.

DMA Peripheral directly transfers data to memory without CPU → boosts speed.

IOP A dedicated processor managing I/O tasks → improves multitasking.


UNIT -4

Memory Organization –

1. Main Memory

Main memory is a critical component of a computer's architecture. It is the primary working memory that stores both the data and the instructions the CPU needs for processing. It is much faster than secondary storage but slower than CPU registers or cache.

Random Access Memory (RAM)

RAM is volatile memory: it loses its contents when the power is turned off. It allows both read and write operations and acts as the working area for the data and programs the CPU is currently using. RAM is divided into Static RAM (SRAM), which is faster but costlier, and Dynamic RAM (DRAM), which is slower but cheaper and used in main system memory.

Characteristics:

 High-speed memory located near the CPU.


 Direct access to any memory location (hence "random").
 Temporarily stores data and instructions that are currently in use.
 Faster than secondary storage but slower than cache and registers.
 Volatile: data is lost when the system shuts down.

Types of RAM:

1. Static RAM (SRAM):


o Uses flip-flop circuits to store each bit.
o Faster access time compared to DRAM.
o More expensive and consumes more power.
o Used in cache memory (L1, L2).
o Does not need refreshing like DRAM.

Structure Diagram (Textual):

+-----------+ +-----------+
| Bit Cell | ---- | Flip-Flop |
+-----------+ +-----------+
2. Dynamic RAM (DRAM):
o Stores each bit as a charge in a capacitor.
o Slower than SRAM and requires refresh cycles to maintain data.
o Denser and cheaper than SRAM.
o Used in main system memory.
o Typically found in DDR, DDR2, DDR3, DDR4, and DDR5 modules.

Working Note:
Because the capacitors leak charge, every memory cell must be refreshed periodically, typically once every 64 ms (about 16 times per second).

Read-Only Memory (ROM)

ROM is non-volatile memory: it retains its data even when the power is off. It is read-only under normal operation and is used to store firmware, the essential software required to boot and initialize hardware. Variants such as PROM, EPROM, and EEPROM allow programming or reprogramming under specific conditions.

Characteristics:

 Contents are written once (during manufacturing or programming) and then read-only.
 Not used for general storage.
 Typically stores BIOS or bootloader code.
 No data loss during shutdown.
 Cannot be used to store dynamic programs or data.

Types of ROM:
Type Description

PROM Programmable ROM – can be written once using a special device (PROM burner).

EPROM Erasable PROM – can be erased by exposing to UV light, then reprogrammed.

EEPROM Electrically Erasable PROM – erased and written using electrical signals, used in BIOS.
Comparison Between RAM and ROM:

Feature RAM ROM

Volatility Volatile (data lost on power off) Non-volatile (data retained)

Accessibility Read/Write Read-only (or limited write)

Usage Main memory during program execution Boot instructions, firmware

Speed Fast (DRAM slower than SRAM) Slower than RAM

Modifiability Can be modified easily Modification is limited or not allowed

Real-Life Use Cases:

 RAM: Running a browser, editing documents, loading software applications.


 ROM: Storing BIOS in computers, firmware in microwaves, printers, washing machines.

2. Secondary Memory

Secondary memory refers to non-volatile, long-term storage that holds data permanently, even when the computer is powered off. It is used to store the operating system, software programs, files, documents, and media. Unlike primary memory (RAM), it is not directly accessed by the CPU: data must be transferred to RAM before the CPU can process it.

Key Characteristics:

 Non-volatile: Retains data without power.


 High capacity: Can store terabytes (TB) of data.
 Slower than RAM: But much cheaper.
 Permanent storage: Used for data archiving, backups, and regular storage.
 Portable options: Some types can be removed and transported (e.g., USB drives, DVDs).

Types of Secondary Memory

 Magnetic Tape: Sequential-access storage used mostly for backups and archiving. It is cheap and can
store large amounts of data but is very slow.
 Magnetic Disk (Hard Disk Drives): Uses spinning disks coated with magnetic material to store data in
concentric tracks. It allows random access and is used widely for general data storage.
 Optical Storage (CD/DVD/Blu-ray): Data is stored using laser technology. These are portable, have
moderate capacity, and are used for distribution of software, movies, etc.
1. Magnetic Tape

Magnetic tape is one of the oldest forms of secondary storage, used mainly for sequential data access. It stores
data on a plastic ribbon coated with magnetic material.

Working Principle:

 Data is written in linear order along the tape.


 To read data, the tape must be moved to the correct position (slow).
 It uses sequential access, unlike disks that offer random access.

Advantages:

 Very cheap per GB.


 Excellent for long-term backup.
 High storage capacity (often used for server backups).

Limitations:

 Very slow access time.


 Requires special drives (tape readers).
 Not ideal for everyday use or frequent access.

Real-Life Use:

 Archiving enterprise databases.


 Government and research backups.

2. Magnetic Disk – Hard Disk Drive (HDD)

A hard disk drive is a widely used random-access secondary storage device. It stores data on a stack of
rotating disks (platters) coated with magnetic material.

Working Principle:

 A read/write head moves across spinning platters to read or write data.


 Data is stored in concentric tracks, divided into sectors.

Advantages:

 High-speed data access (compared to tapes).


 Large storage capacity (from 500GB to several TBs).
 Reliable and durable for long-term use.
Limitations:

 Mechanical parts are prone to wear and tear.


 Slower than SSDs (Solid State Drives).
 Can be damaged by physical shocks.

3. Optical Storage (CD/DVD/Blu-ray)

Optical storage uses laser technology to read and write data. Data is encoded on a reflective disc surface using
pits (low) and lands (high).

Working Principle:

 A laser beam reflects off the disc surface and is interpreted as binary data.
 CD stores about 700 MB, DVD about 4.7–9.4 GB, and Blu-ray up to 25–50 GB or more.

Advantages:

 Portable and inexpensive.


 Great for media distribution (music, movies).
 Long shelf life if stored properly.

Limitations:

 Limited storage capacity.


 Slower than HDD and SSD.
 Easily scratched or damaged.

Comparison Table

Feature Magnetic Tape Magnetic Disk (HDD) Optical Disk

Access Type Sequential Random Sequential/Random

Speed Very slow Moderate Moderate

Cost per GB Very low Low Moderate

Storage Capacity Very High (TBs) High (up to 20TB) Low to Moderate

Portability Low Medium High

Common Use Backups Everyday storage Software/media distro

3. Cache Memory

Cache memory is a small, high-speed memory unit that sits between the CPU and the main memory (RAM), close to the CPU. Its primary purpose is to reduce the time the CPU takes to access data by storing frequently accessed data and instructions.

Structure and Design of Cache Memory

 Divided into multiple levels (L1, L2, L3) with L1 being fastest and smallest.
 Stores frequently used instructions and data for quick access.

Levels of Cache

Cache is typically organized into multiple levels:

 L1 (Level 1) Cache:
o Closest to the CPU.
o Very small (typically 32KB–128KB).
o Very fast access.
 L2 (Level 2) Cache:
o Larger than L1 (256KB–1MB).
o Slower than L1 but still faster than RAM.
 L3 (Level 3) Cache:
o Shared among CPU cores.
o Larger (2MB–64MB).
o Slower than L2 but still faster than RAM.

Cache Structure Diagram

+---------------------+
|         CPU         |
+---------------------+
           |
+----------v----------+
|      L1 Cache       | --- Smallest and fastest
+---------------------+
           |
+----------v----------+
|      L2 Cache       | --- Larger and slightly slower
+---------------------+
           |
+----------v----------+
|      L3 Cache       | --- Shared and even larger
+---------------------+
           |
+----------v----------+
|     Main Memory     |
+---------------------+

Each level acts as a buffer for the next, storing recently accessed memory blocks.

Mapping Schemes in Cache Memory

Mapping defines how memory blocks from the main memory are placed into cache.

 Direct Mapping: Each block of main memory maps to exactly one cache line.
 Associative Mapping: Any block can be placed in any cache line. Flexible but expensive.
 Set-Associative Mapping: A compromise where blocks are mapped to a set of lines.

1. Direct Mapping

 Each memory block is mapped to a specific cache line.


 Simple and fast.
 Problem: If multiple blocks map to the same line, frequent replacement occurs.

Formula:

Cache Line Number = (Main Memory Block Number) % (Number of Cache Lines)

Example:
For example, with 8 cache lines, block 10 and block 18 both map to line 2 (10 % 8 = 2 and 18 % 8 = 2), so
one must be replaced when the other is accessed.
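
The formula can be checked with a short Python sketch. The cache size of 8 lines is an assumption chosen to reproduce the example (4 lines would give the same result), and cache_line is only an illustrative helper name.

```python
NUM_CACHE_LINES = 8  # assumed cache size; the notes do not state it


def cache_line(block_number: int, num_lines: int = NUM_CACHE_LINES) -> int:
    """Direct mapping: each memory block maps to exactly one cache line."""
    return block_number % num_lines


for block in (2, 10, 18):
    print(f"block {block} -> line {cache_line(block)}")
```

Blocks 2, 10 and 18 all land on the same line, so accessing them in turn keeps evicting the previous one even if the rest of the cache is empty.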

2. Associative Mapping

 Any memory block can be placed anywhere in the cache.


 Very flexible, but requires complex hardware to search all cache lines in parallel.

3. Set-Associative Mapping

 A compromise between direct and associative mapping.


 Cache is divided into sets, and each block maps to a specific set, but can go in any line within that set.

Example: 4-way set-associative cache → each set has 4 lines.
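
A minimal sketch of how a block finds its set, assuming a cache with 16 lines in total (that size and the helper names set_index and candidate_lines are assumptions for illustration, not from the notes):

```python
NUM_LINES = 16                # assumed total number of cache lines
WAYS = 4                      # 4-way set-associative: 4 lines per set
NUM_SETS = NUM_LINES // WAYS  # 4 sets


def set_index(block_number: int) -> int:
    """A block maps to exactly one set."""
    return block_number % NUM_SETS


def candidate_lines(block_number: int) -> list[int]:
    """Lines that may hold the block (lines are numbered set by set)."""
    first = set_index(block_number) * WAYS
    return list(range(first, first + WAYS))


print(set_index(10), candidate_lines(10))
print(set_index(18), candidate_lines(18))
```

Blocks 10 and 18 still map to the same set, but because the set has four lines they can both stay in the cache at the same time, unlike in direct mapping.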

Cache Replacement Algorithms

When the cache is full, and a new block needs to be loaded, one of the existing blocks must be replaced.

 LRU (Least Recently Used): Replaces the block that hasn't been used for the longest time.
 FIFO (First In First Out): Replaces the oldest block in cache.
 Random: Randomly replaces a block, used for simplicity.

1. LRU (Least Recently Used)

 Replaces the block not used for the longest time.


 Keeps track of access history (complex in hardware).

2. FIFO (First In First Out)

 Replaces the oldest block in the cache.


 Simple and easy to implement.

3. Random

 Chooses a block randomly to replace.


 Simplest, but not optimal in performance.
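
The difference between LRU and FIFO can be seen by counting misses on a small access trace. This is a software sketch only; the trace, the capacity of 3 blocks and the function names are assumptions, and real caches implement these policies in hardware.

```python
from collections import OrderedDict, deque


def count_misses_lru(accesses, capacity):
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return misses


def count_misses_fifo(accesses, capacity):
    cache = set()
    arrival_order = deque()
    misses = 0
    for block in accesses:
        if block not in cache:
            misses += 1
            if len(cache) == capacity:
                cache.discard(arrival_order.popleft())  # evict the oldest
            cache.add(block)
            arrival_order.append(block)
    return misses


trace = [1, 2, 3, 1, 4, 1, 2]
print("LRU misses: ", count_misses_lru(trace, capacity=3))
print("FIFO misses:", count_misses_fifo(trace, capacity=3))
```

On this trace LRU keeps block 1 because it was just reused, while FIFO evicts it because it arrived first, so FIFO takes one more miss.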

Improving Cache Performance

To enhance cache efficiency and reduce CPU wait time:

 Use of prefetching.
 Increasing cache size.
 Using write-back instead of write-through policies.
 Employing multi-level caches.

1. Prefetching: Load data into cache before it is actually needed.


2. Increasing Cache Size: More space reduces misses but increases cost and access time.
3. Write-Back vs Write-Through:
o Write-Through: Updates both cache and memory (slower but safer).
o Write-Back: Updates only cache and writes to memory later (faster but riskier).
4. Using Multi-Level Cache: L1 for speed, L2/L3 for size – balances speed and capacity.
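
The trade-off between the two write policies can be sketched by counting how many writes reach main memory. The CacheLine class below is a hypothetical model of a single cached block, not a description of real hardware.

```python
class CacheLine:
    """One cached block plus a counter of writes that reach main memory."""

    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.value = None
        self.dirty = False
        self.memory_writes = 0

    def store(self, value):
        self.value = value
        if self.write_back:
            self.dirty = True        # defer the memory update
        else:
            self.memory_writes += 1  # write-through: update memory now

    def evict(self):
        if self.dirty:
            self.memory_writes += 1  # write-back: flush once on eviction
            self.dirty = False


def memory_writes_for(write_back: bool, stores: int) -> int:
    line = CacheLine(write_back)
    for value in range(stores):
        line.store(value)
    line.evict()
    return line.memory_writes


print("write-through:", memory_writes_for(write_back=False, stores=10))
print("write-back:   ", memory_writes_for(write_back=True, stores=10))
```

Ten stores to the same block cost ten memory writes with write-through but only one with write-back. The price is that memory is out of date until the dirty block is flushed.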

4. Virtual Memory

Virtual memory is a memory management technique that gives an application the illusion of a large,
continuous memory space, even if the physical RAM is smaller. It uses secondary storage (such as a hard
disk or SSD) as an extension of RAM, which allows programs to use more memory than is physically
available.

How It Works:

 Programs are written as if they have access to a large, continuous block of memory.
 The operating system (OS) and hardware manage the translation between virtual addresses (used by
programs) and physical addresses (actual RAM).
 The extra space required is taken from secondary storage (called the swap space or page file).

Key Concepts:
 Divides memory into pages.
 Uses paging or segmentation techniques.
 Page Table maps virtual addresses to physical addresses.
 Page Faults occur when data is not in RAM and must be fetched from disk.

Advantages include better multitasking, isolation between programs, and execution of large programs.

1. Paging

 Memory is divided into fixed-size pages (commonly 4 KB).


 Virtual memory is divided into virtual pages, and physical memory into frames.
 A page table is used to map virtual pages to physical frames.

2. Segmentation

 Divides memory into variable-size segments based on logical divisions (code, data, stack).
 Less common than paging, but sometimes combined with it (segmented paging).

3. Page Table

 Maintains a mapping between virtual pages and physical frames.


 Also includes flags like:
o Valid/Invalid Bit
o Access permissions
o Modified bit

4. Page Fault

 Occurs when a program accesses a page not currently in RAM.


 The OS pauses execution, fetches the page from the disk, updates the page table, and resumes
execution.
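
Paging, the page table and page-fault handling can be put together in a small sketch. The page table contents, the free frame number and the names translate and access are assumptions for illustration; a real OS would also choose a victim frame and read the page from disk.

```python
PAGE_SIZE = 4096  # 4 KB pages

# Hypothetical page table: virtual page number -> physical frame number.
# Pages missing from the dict are treated as not resident in RAM.
page_table = {0: 5, 1: 2}


class PageFault(Exception):
    """Raised when the referenced page is not in RAM."""


def translate(virtual_address: int) -> int:
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise PageFault(page)
    return page_table[page] * PAGE_SIZE + offset


def access(virtual_address: int, free_frame: int) -> int:
    """Translate an address, loading the page on a fault and then retrying."""
    try:
        return translate(virtual_address)
    except PageFault as fault:
        page_table[fault.args[0]] = free_frame  # stands in for the disk read
        return translate(virtual_address)


print(translate(4100))                         # page 1, offset 4
print(access(3 * PAGE_SIZE + 10, free_frame=7))  # page 3 faults, then maps to frame 7
```

The offset within the page is never changed by translation; only the page number is replaced by a frame number.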

5. Memory Management Hardware

Memory management hardware assists in address translation and protection. It provides the mechanisms
needed to translate addresses, protect memory, and allocate RAM efficiently, and it supports features
such as virtual memory and process isolation.

 MMU (Memory Management Unit): Translates virtual addresses to physical addresses.


 TLB (Translation Lookaside Buffer): A cache for storing recent address translations.
 Base and Limit Registers: Protect memory by ensuring programs only access their own memory space.

 Page Table: Maps virtual pages to physical frames.

1. MMU (Memory Management Unit)

Function:
 Converts virtual addresses (used by programs) into physical addresses (used by RAM).
 Works closely with the CPU and page table.

Location:

 Typically integrated within the CPU chip.

How it works:

 When the CPU generates a virtual address, the MMU uses the page table and possibly the TLB to
translate it.

2. TLB (Translation Lookaside Buffer)

Function:

 A special cache that stores recent virtual-to-physical address translations.


 Speeds up address translation by avoiding full page table lookups.

Working:

 When a virtual address is referenced:
o MMU checks the TLB first.
o If TLB has the translation (TLB hit), use it.
o If not (TLB miss), fetch from page table and update the TLB.
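
The hit/miss sequence can be sketched as a dictionary that is checked before the page table. The page table contents and the address list are assumptions, and this toy TLB never evicts entries, which a real fixed-size TLB must do.

```python
PAGE_SIZE = 4096
page_table = {0: 5, 1: 2, 2: 9}  # hypothetical page -> frame mapping
tlb = {}                         # cache of recent translations
stats = {"hits": 0, "misses": 0}


def lookup_frame(page: int) -> int:
    if page in tlb:              # TLB hit: no page table access needed
        stats["hits"] += 1
        return tlb[page]
    stats["misses"] += 1         # TLB miss: walk the page table
    frame = page_table[page]
    tlb[page] = frame            # remember the translation for next time
    return frame


for address in (0, 100, 4096, 200, 5000):
    lookup_frame(address // PAGE_SIZE)

print(stats)
```

Only the first reference to each page misses; later references to the same page are served from the TLB.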

3. Base and Limit Registers

Function:

 Provide memory protection by ensuring that a process can access only its own allocated memory.

How they work:

 Base Register: Holds the starting address of a program’s memory.


 Limit Register: Specifies the size of the addressable memory range.
 If a process tries to access memory outside this range, a trap or interrupt occurs.
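
A minimal sketch of the check, assuming the program issues addresses relative to its own start (the exception stands in for the hardware trap, and the function name and numbers are illustrative only):

```python
class MemoryProtectionError(Exception):
    """Stands in for the trap raised on an out-of-range access."""


def to_physical(logical_address: int, base: int, limit: int) -> int:
    # Legal addresses are 0 .. limit - 1, relative to the base register.
    if not 0 <= logical_address < limit:
        raise MemoryProtectionError(logical_address)
    return base + logical_address


print(to_physical(100, base=3000, limit=500))  # inside the range

try:
    to_physical(500, base=3000, limit=500)     # first address past the range
except MemoryProtectionError as error:
    print("trap on address", error.args[0])
```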

4. Page Table

Function:

 Used by the MMU to map virtual pages to physical frames.


 Each entry holds:
o Frame number
o Valid/invalid bit

UNIT - 5

1. Characteristics of Multiprocessor Systems

Multiprocessor systems have two or more CPUs that share a common physical memory and are interconnected.
Key characteristics:

 Increased Throughput: Multiple processors can handle more processes simultaneously.


 Fault Tolerance: If one processor fails, others can continue execution.
 Resource Sharing: Processors share memory, I/O devices, and interconnection.
 Scalability: Additional processors can be added to enhance performance.
 Parallel Processing: Tasks can be divided among processors for faster execution.

Multiprocessor systems consist of two or more processors (CPUs) working together in a tightly coupled
architecture. These processors are connected through a common system bus and share resources such as main
memory, I/O devices, and peripheral hardware.

Each processor may be assigned different tasks, or they may work on the same task in parallel, depending on
the system design. Below are the key characteristics elaborated in detail:

1. Increased Throughput

Throughput refers to the number of tasks a system can complete in a given time.

 In a multiprocessor system, multiple processors work simultaneously, so more instructions can be
executed in parallel.
 It reduces the processing time for large tasks, especially those that can be divided into smaller sub-tasks
(parallelizable).
 Ideal for multi-user environments like servers or scientific computations.

Example: In a quad-core processor, four independent threads or applications can run concurrently, improving
performance for multitasking and background processes.

2. Fault Tolerance / Reliability

Fault Tolerance means the system's ability to continue functioning even if one or more components fail.

 If one processor fails, the remaining processors can take over its workload (load redistribution), ensuring
system continuity.
 This redundancy increases system reliability and is essential in critical applications (e.g., aerospace,
defense, and banking).
 Software mechanisms (like failover routines) and hardware mechanisms (like redundant paths) are often
implemented to handle such failures.

Example: In a server with two processors, if one fails, the other can continue running critical services with
minimal downtime.

3. Resource Sharing

Processors in a multiprocessor system share hardware resources, particularly:

 Main Memory (RAM): Accessible by all processors, ensuring shared data visibility.
 I/O Devices: Like printers, disk drives, and network interfaces.
 Interconnection Network: The bus or crossbar switch that links the components.

This sharing leads to efficient resource utilization, but it also requires synchronization mechanisms (like
semaphores or mutexes) to prevent conflicts or inconsistencies during access.

Note: Shared resources add complexity (such as cache coherency issues), which modern systems manage
through dedicated hardware and protocols.

4. Scalability

Scalability refers to the system's ability to grow or expand by adding more processors without degrading
performance significantly.

 Multiprocessor systems can be scaled up to increase performance when workload increases.


 Hardware and software must support dynamic addition/removal of processors.
 High scalability is important for cloud computing, enterprise servers, and scientific applications.

Challenge: As more processors are added, managing resource contention, bus traffic, and synchronization
overhead becomes more complex.

Solution: Use advanced interconnects (e.g., crossbar switches), NUMA memory organization, and scalable
software architecture.

5. Parallel Processing

Parallel Processing is the simultaneous execution of multiple tasks to reduce the overall processing time.

 Tasks can be broken down into subtasks that execute concurrently on multiple processors.
 There are different types of parallelism:
o Data Parallelism: Same operation on different data (e.g., array processing).
o Task Parallelism: Different operations on different or shared data.
 Leads to substantial performance improvements for large-scale problems such as simulations, machine
learning, and video rendering.

Example: Video editing software can render different sections of a video simultaneously using multiple
processors.

Additional Advantages of Multiprocessor Systems:

 Better Utilization of Hardware: Idle CPU cycles are minimized.


 Concurrent Execution: Enhances responsiveness in multi-user and real-time systems.
 Reduced Cost per CPU: Compared to using multiple single-CPU systems.
 Support for Multithreaded Applications: Threads can run on separate processors for true
concurrency.

2. Structure of Multiprocessor

a. Interprocessor Arbitration

It refers to the mechanism by which multiple processors coordinate their access to shared resources (like
memory or bus):

 Bus Arbitration: Determines which processor controls the system bus.


o Centralized Arbitration: A single arbiter decides access.
o Distributed Arbitration: All processors participate in decision-making.

b. Interprocessor Communication

Processors must communicate for coordination and data sharing:

 Shared Memory Communication: Processors communicate by reading/writing in shared memory.


 Message Passing: Processors send and receive messages (often used in loosely coupled systems).
 Interrupts & Signals: Used to notify processors about events or synchronize actions.

c. Synchronization

Ensures that concurrent processes/threads do not interfere with each other while accessing shared resources:

 Semaphores: Variables used to control access.


 Locks/Mutex: Used to protect critical sections.
 Barriers: Force all processors to reach a point before continuing.

Multiprocessor systems have multiple CPUs that are interconnected and work together to execute programs. To
ensure efficient operation, the system must manage how processors access shared resources, communicate
with each other, and synchronize operations.

a. Interprocessor Arbitration
Interprocessor arbitration is the method used to manage access to shared resources (such as the system bus,
memory, or I/O devices) when multiple processors compete for control.

1. Bus Arbitration

The system bus is a critical shared resource. When more than one processor wants to use the bus (for memory
or I/O access), arbitration determines who gets control.

There are two main types of arbitration:

i. Centralized Arbitration

 A single control unit (arbiter) is responsible for deciding which processor gets bus access.
 Processors send a request signal to the arbiter.
 The arbiter grants access based on a predefined algorithm (priority-based, round-robin, etc.).

Advantages:

 Simple design
 Easy to manage

Disadvantages:

 Single point of failure


 Limited scalability

ii. Distributed Arbitration

 All processors participate in arbitration using a coordination protocol.


 Each processor has its own arbiter logic.
 Coordination is done without a central controller, often via polling or priority resolution.

Advantages:

 Better scalability
 No single point of failure

Disadvantages:

 More complex to design and synchronize

b. Interprocessor Communication
Definition:
To execute tasks cooperatively, processors must exchange data, control information, or synchronization
signals. Communication can occur in multiple ways:

1. Shared Memory Communication

 Processors communicate by reading and writing to shared areas in RAM.


 Easy to implement in tightly coupled systems (where processors share physical memory).
 Requires synchronization mechanisms to avoid data corruption.

Example: Two processors updating a shared variable must use locks to prevent race conditions.

2. Message Passing

 Data and control messages are explicitly sent and received between processors.
 Often used in loosely coupled systems (e.g., clusters, distributed systems).
 Each processor has a local memory, and messages are sent via interconnects like Ethernet or custom
buses.

Advantages:

 Clear data ownership


 Scales well for distributed systems

Disadvantages:

 Higher latency than shared memory


 Requires a communication protocol

3. Interrupts & Signals

 Interrupts allow one processor to notify another of an event or request attention.


 Signals are used to manage synchronization or trigger specific actions.

Example: CPU A sends an interrupt to CPU B to indicate that shared data is ready for use.

c. Synchronization

Definition:
Synchronization ensures that multiple processors do not interfere with each other when accessing shared data
or resources, maintaining consistency and correctness.

1. Semaphores

 A semaphore is a special variable used to manage concurrent access.


 It can be binary (0 or 1) or counting.
 Operated using wait() and signal() functions.

Example: Before a processor writes to shared memory, it calls wait(). After it's done, it calls signal() to allow
others access.
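
The wait()/signal() pattern can be tried with Python's threading.Semaphore, where acquire() plays the role of wait() and release() the role of signal(). Threads stand in for processors here, and the shared list and writer function are assumptions for illustration.

```python
import threading

buffer_access = threading.Semaphore(1)  # binary semaphore: one writer at a time
shared_buffer = []


def writer(item):
    buffer_access.acquire()      # wait(): block until the buffer is free
    try:
        shared_buffer.append(item)
    finally:
        buffer_access.release()  # signal(): let the next writer in


threads = [threading.Thread(target=writer, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(shared_buffer))
```

Initialising the semaphore with a larger count would turn it into a counting semaphore that admits that many writers at once.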

2. Locks / Mutex (Mutual Exclusion)

 A lock or mutex allows only one processor at a time to access a critical section of code or data.
 Other processors must wait until the lock is released.

Deadlock and priority inversion are common issues if locks are not managed properly.

3. Barriers

 A barrier ensures that all processors or threads reach a certain point in execution before any can
proceed.
 Useful in parallel algorithms where phases must complete in sync.

Example: In matrix multiplication, all processors must complete one stage before proceeding to the next.
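
A barrier between two phases can be sketched with Python's threading.Barrier. The three workers, the phase names and the log list are assumptions for illustration; threads stand in for processors.

```python
import threading

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS)
log = []
log_lock = threading.Lock()  # protects the shared log


def worker(worker_id):
    with log_lock:
        log.append(("phase 1", worker_id))
    barrier.wait()  # nobody starts phase 2 until all have finished phase 1
    with log_lock:
        log.append(("phase 2", worker_id))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print([phase for phase, _ in log])
```

The order of workers within a phase may vary from run to run, but every "phase 1" entry appears before any "phase 2" entry.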

✅ Summary Table:

Component | Description | Example/Use Case
Bus Arbitration | Manages access to system bus | Centralized or distributed control logic
Shared Memory | Common area for all processors to read/write | Tightly coupled systems
Message Passing | Communication via messages, often in distributed systems | MPI in HPC or inter-node communication
Interrupts/Signals | Notification mechanism between processors | Task completion or alert signals
Semaphores | Counters to manage access to shared resources | Synchronizing access to shared buffer
Locks/Mutex | Exclusive control over critical section | Protecting shared data structure
Barriers | Force processors to sync at checkpoints | Phase completion in parallel computing

3. Memory in Multiprocessor Systems

Memory organization is crucial in multiprocessor systems:

 Shared Memory Architecture: All CPUs access a common physical memory.


 Distributed Memory: Each processor has its own local memory.
 NUMA (Non-Uniform Memory Access): Memory access time depends on the memory location
relative to the processor.

Cache Coherency:

 Ensures consistency of data stored in caches of multiple processors.


 Techniques include:
o Write-through / Write-back
o Snoopy Protocol
o Directory-based Protocol

In multiprocessor systems, the design of memory architecture is critical to ensure high performance, efficient
data sharing, and synchronization between processors. Depending on how memory is structured and accessed,
multiprocessor systems use different memory models.

A. Memory Architectures

1. Shared Memory Architecture

 In this model, all processors access a single, shared physical memory.


 It is typical in tightly coupled systems.
 Memory is globally addressable — any processor can read or write to any memory location.
 Synchronization is required to prevent data conflicts during concurrent access.

Advantages:

 Easy data sharing and programming model.


 Lower communication overhead compared to message passing.

Disadvantages:

 Memory contention: Multiple processors accessing memory simultaneously can cause bottlenecks.
 Cache coherence issues arise when each processor has a private cache.

Example: Multi-core processors in desktops and laptops.

2. Distributed Memory Architecture

 In this model, each processor has its own private memory.


 Processors cannot access each other’s memory directly; communication is done via message passing.
 This model is common in loosely coupled systems such as clusters or massively parallel systems.

Advantages:

 Scalable: Easy to add more processors.


 Less memory contention.

Disadvantages:

 Data sharing is complex due to message-passing overhead.


 Programmers must explicitly handle communication and synchronization.

Example: High-performance computing (HPC) systems using MPI (Message Passing Interface).

3. NUMA (Non-Uniform Memory Access)

 A hybrid model where processors have local memory but can also access shared global memory.
 Memory access time varies depending on whether the data is in local memory or remote memory.
 NUMA systems are designed to minimize access to remote memory and optimize local access.

Advantages:

 Combines scalability with a shared memory view.


 Reduces memory bottleneck seen in pure shared memory systems.

Disadvantages:

 Complex memory management.


 Requires NUMA-aware operating systems and applications to perform efficiently.

Example: Modern server-class systems (e.g., AMD EPYC and Intel Xeon).

B. Cache Coherency in Multiprocessor Systems

When multiple processors have private caches and access shared memory, inconsistencies can occur. For
example, if Processor A updates a variable in its cache, Processor B may still see an old value in its cache.

Cache Coherency ensures that all processors have a consistent view of shared memory.

Techniques to Maintain Cache Coherency

1. Write-through and Write-back

 Write-through: Data is written to both the cache and main memory simultaneously.
o Ensures consistency but generates more memory traffic.
 Write-back: Data is written only to the cache initially and later to main memory.
o Reduces memory traffic but needs coherence control to update other caches.

2. Snoopy Protocol

 All caches watch (snoop) the bus to monitor read/write operations by other processors.
 Used in systems with shared bus architecture.

Two types:

 Write-invalidate: When a processor writes to a cache line, it invalidates that line in other caches.
 Write-update: The new value is broadcast to all caches to update their copy.
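
The write-invalidate variant can be sketched with two caches attached to a shared bus. The Bus and SnoopyCache classes are hypothetical models, and the sketch writes through to memory on every store to stay short; real protocols such as MESI track more states per line.

```python
class Bus:
    """Shared bus that every cache snoops."""

    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, address, writer):
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(address, None)  # drop the stale copy


class SnoopyCache:
    def __init__(self, bus, memory):
        self.lines = {}
        self.bus = bus
        self.memory = memory
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:  # miss: fetch from main memory
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.bus.broadcast_invalidate(address, writer=self)
        self.lines[address] = value
        self.memory[address] = value   # write-through keeps the sketch simple


memory = {0x10: 1}
bus = Bus()
cache_a = SnoopyCache(bus, memory)
cache_b = SnoopyCache(bus, memory)

cache_a.read(0x10)
cache_b.read(0x10)                    # both caches now hold the old value
cache_a.write(0x10, 42)               # invalidates the copy in cache_b
stale_copy_removed = 0x10 not in cache_b.lines
value_seen_by_b = cache_b.read(0x10)  # miss, so the new value is fetched
print(stale_copy_removed, value_seen_by_b)
```

A write-update protocol would instead push the new value into cache_b during the write, trading extra bus traffic for fewer later misses.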

Advantages:

 Fast and efficient for small-scale systems.

Disadvantages:

 Does not scale well to large systems due to bus saturation.

3. Directory-based Protocol

 Maintains a centralized or distributed directory that keeps track of which caches have copies of a
memory block.
 When a processor wants to read or write, it checks with the directory to maintain coherency.

Advantages:

 More scalable than snoopy protocols.


 Suitable for systems with large numbers of processors.

Disadvantages:

 Increases memory overhead due to directory storage.


 Slightly more complex communication logic.

Summary Table: Memory Architectures & Cache Coherency

Feature | Shared Memory | Distributed Memory | NUMA
Memory Access | Global | Local only | Local & remote
Communication | Shared memory | Message passing | Mixed
Scalability | Moderate | High | High
Programming Model | Simple | Complex | Moderate

Cache Coherency Technique | Description | Suitable For
Write-through | Write to cache and memory | Simple systems
Write-back | Write to cache, update memory later | Systems with coherence control
Snoopy Protocol | Caches monitor bus for consistency | Small-scale shared bus
Directory-based Protocol | Central or distributed directory tracks cache copies | Large multiprocessor systems

4. Concept of Pipelining

Pipelining is a technique where multiple instruction phases are overlapped in execution:

 Stages of Pipeline:
o Instruction Fetch (IF)
o Instruction Decode (ID)
o Execute (EX)
o Memory Access (MEM)
o Write Back (WB)
 Types:
o Instruction Pipeline
o Arithmetic Pipeline
 Hazards:
o Structural: Resource conflicts.
o Data: Data dependency.
o Control: Branch instructions affect flow.

Pipelining is a technique in processor design that allows the overlapping execution of multiple instructions
to improve instruction throughput (number of instructions executed per unit time). It’s similar to an assembly
line in a factory — different stages of instruction execution are divided and handled simultaneously.

Basic Idea

Instead of executing one instruction at a time, pipelining breaks the instruction cycle into separate stages, each
of which is handled by a different unit of the processor. While one instruction is being decoded, another can be
fetched, a third executed, and so on.

A. Stages of Instruction Pipeline

A typical instruction pipeline consists of the following five stages:


Stage Name Description

1 Instruction Fetch (IF) Fetch the instruction from memory.

2 Instruction Decode (ID) Decode the fetched instruction to determine operation and operands.

3 Execute (EX) Perform arithmetic or logical operations in the ALU.

4 Memory Access (MEM) Read/write from/to memory if needed (for load/store).

5 Write Back (WB) Write results back to registers.

Each stage works in parallel with others, processing a different instruction during each clock cycle.

B. Types of Pipelining

1. Instruction Pipeline

 Focuses on speeding up the execution of multiple instructions.


 It handles the fetch → decode → execute → memory → write process.
 Widely used in general-purpose processors (like Intel, AMD CPUs).

2. Arithmetic Pipeline

 Optimizes the execution of complex arithmetic operations, such as multiplication, division, floating-
point calculations.
 Each stage of the pipeline handles part of the arithmetic operation.
 Used in vector processors, DSPs (Digital Signal Processors), and some GPU cores.

C. Pipeline Hazards

Pipelining can improve performance only when instructions flow smoothly through the pipeline. However,
certain issues (called hazards) can disrupt this flow.

1. Structural Hazards

 Occur when hardware resources are insufficient to handle all operations simultaneously.
 Example: If only one memory unit exists for both instruction fetch and data access, a conflict arises.

Solution: Use separate instruction and data caches (Harvard architecture).


2. Data Hazards

 Happen when instructions depend on the results of previous instructions that haven’t completed yet.
 Types of data hazards:
o RAW (Read After Write): Current instruction needs a value that’s still being computed.
o WAR (Write After Read) and WAW (Write After Write): rare in simple in-order pipelines; they
mainly arise with out-of-order execution.

Solution: Data forwarding (bypassing), inserting no-operation (NOP) instructions, or stalling the pipeline.
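
RAW hazards can be found by comparing each instruction's destination register with the source registers of the instructions that follow it. The sketch below assumes a hazard distance of two instructions and a made-up four-instruction program; it ignores forwarding and does not check for registers that are overwritten in between.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    dest: str
    sources: tuple


def raw_hazards(program, window=2):
    """Return (producer, consumer) index pairs with a read-after-write dependency.

    window is an assumed hazard distance: how many following instructions
    could still read a stale register value without forwarding or stalls.
    """
    hazards = []
    for i, producer in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].sources:
                hazards.append((i, j))
    return hazards


program = [
    Instr("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instr("SUB R4, R1, R5", "R4", ("R1", "R5")),
    Instr("OR  R6, R7, R8", "R6", ("R7", "R8")),
    Instr("AND R9, R4, R1", "R9", ("R4", "R1")),
]

for i, j in raw_hazards(program):
    print(f"{program[j].text!r} needs the result of {program[i].text!r}")
```

Each reported pair is a place where the pipeline needs forwarding, a stall, or a NOP between the two instructions.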

3. Control Hazards

 Occur due to branching instructions (e.g., if-else or loops), which can change the instruction flow.
 The next instruction to fetch may not be known until the branch decision is made.

Solution:

 Branch prediction: Guess the branch direction.


 Delayed branching: Reorder instructions.
 Pipeline flushing: Cancel incorrectly fetched instructions after the branch is resolved.

D. Performance of Pipelining

Element Description

Pipelining Overlapping execution of instruction stages

Stages IF → ID → EX → MEM → WB

Instruction Pipeline Speeds up general instruction execution

Arithmetic Pipeline Optimizes arithmetic/complex calculations

Structural Hazards Resource conflicts

Data Hazards Data dependencies between instructions

Control Hazards Branch instructions affecting instruction flow

Ideal Speedup:
If there are n stages in the pipeline, the ideal speedup approaches n when many instructions flow
through it.

However, due to pipeline hazards and overhead, real-world speedup is less than ideal.

Example: A 5-stage pipeline may provide a 3.5–4x improvement instead of full 5x.
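
These figures can be reproduced with the standard cycle-count model: without pipelining, each instruction takes one cycle per stage; with pipelining, the first instruction fills the pipeline and each later one completes a cycle after the previous. The stall count of 300 cycles per 1000 instructions is an assumption picked for illustration.

```python
def pipeline_speedup(stages: int, instructions: int, stall_cycles: int = 0) -> float:
    """Speedup of a pipeline over non-pipelined execution, in clock cycles."""
    unpipelined = instructions * stages
    pipelined = stages + (instructions - 1) + stall_cycles
    return unpipelined / pipelined


print(round(pipeline_speedup(5, 1000), 2))                    # close to the ideal 5
print(round(pipeline_speedup(5, 1000, stall_cycles=300), 2))  # hazards reduce it
```

With no stalls the speedup for 1000 instructions is about 4.98, and with the assumed 300 stall cycles it drops to about 3.83, which falls in the 3.5–4x range quoted above.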
5. Vector Processing

Vector processing deals with operations on vectors (one-dimensional arrays):

 Executes the same instruction on multiple data elements.


 Used in scientific computations, simulations.
 Vector instructions reduce the instruction fetch and decode time.

Example: Adding two arrays A and B to get C:


C[i] = A[i] + B[i]

Vector processing is a computing technique in which a single instruction operates on multiple data
elements simultaneously — usually elements of arrays or vectors. This is part of the SIMD (Single
Instruction, Multiple Data) model, ideal for data-parallel operations.

Key Characteristics:

1. Executes the Same Instruction on Multiple Data Elements

 In vector processing, operations are performed in parallel across entire arrays of data.
 For example, instead of performing 100 separate additions for 100 elements, a vector processor can
perform them with one vector instruction.

2. Suitable for Scientific and Engineering Applications

 Vector processors are highly efficient in scientific computing, weather simulations, matrix
operations, image/video processing, and machine learning tasks.
 Such applications often involve large datasets and repetitive numerical operations.

3. Vector Instructions Improve Efficiency

 Traditional scalar processors fetch and decode one instruction per operation.
 Vector processors reduce instruction fetch and decode time by applying a single instruction to
multiple data elements.
 This minimizes control overhead and enhances throughput.

4. Specialized Hardware Support

 Vector processors use vector registers and vector functional units.


 These are capable of pipelining operations across vector elements, increasing parallelism and
performance.

Example: Vector Addition

Suppose you have three vectors:

A = [2, 4, 6]
B = [1, 3, 5]
C = [?, ?, ?]

Using vector processing:

C[i] = A[i] + B[i] for i = 0 to 2

Instead of using three individual addition instructions:

ADD A[0], B[0] → C[0]
ADD A[1], B[1] → C[1]
ADD A[2], B[2] → C[2]

The vector processor executes one vector instruction:

VADD A, B → C

This adds all corresponding elements in parallel, producing:

C = [3, 7, 11]
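
The saving in instruction count can be modelled in a few lines. Python itself still adds the elements one after another, so this sketch only counts how many instructions each approach would issue; the function names are illustrative.

```python
A = [2, 4, 6]
B = [1, 3, 5]


def scalar_add(a, b):
    """One ADD per element: returns the result and the instruction count."""
    result = []
    issued = 0
    for x, y in zip(a, b):
        result.append(x + y)
        issued += 1
    return result, issued


def vector_add(a, b):
    """Models a single VADD that covers every element."""
    return [x + y for x, y in zip(a, b)], 1


print(scalar_add(A, B))
print(vector_add(A, B))
```

Both produce the same C, but the scalar version issues one instruction per element while the vector version issues one in total, which is where the reduced fetch and decode overhead comes from.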

Advantages of Vector Processing

Feature Benefit
Parallel Data Processing Increases computational speed
Reduced Control Overhead Fewer instructions fetched/decoded
High Throughput Multiple operations completed in fewer clock cycles
Predictable Performance Suitable for structured, large-scale numerical tasks

Limitations

 Not efficient for irregular or non-uniform data structures.


 Requires data to be organized in vector form.
 Works best for applications with a high degree of data-level parallelism.

Comparison with Scalar Processing


Aspect Scalar Processing Vector Processing
Data Handling One element at a time Entire array at once
Instruction Count High Low
Performance Lower for bulk operations Higher for structured tasks
Complexity Simpler Requires specialized hardware

6. Array Processing

Array processors use multiple processing elements to perform parallel operations on data arrays:

 SIMD Architecture (Single Instruction, Multiple Data)


 Good for tasks like image processing, matrix operations.
 Each processing element performs the same operation on different data.

Array processing refers to a type of parallel computing where multiple processing elements (PEs)
simultaneously perform the same operation on different elements of a large data set, such as an array or matrix.

It follows the SIMD (Single Instruction, Multiple Data) architecture — ideal for highly regular and repetitive
computations.

Architecture: SIMD-Based Array Processors

 SIMD Architecture:
o One control unit sends the same instruction to many processing elements.
o Each PE executes the same instruction but on different data.
 The PEs may share common memory or have local memory.
 Communication between PEs is either direct (neighbor-based) or via interconnection networks.

How It Works:

Imagine you want to apply a filter to an image:

 Each pixel in the image can be processed independently.


 A SIMD array processor can assign each pixel to a separate PE, applying the same filter algorithm to
all pixels in parallel.

Characteristics of Array Processing:

Feature Description
Parallel Processing Elements Many simple processors work in parallel.
Same Operation, Different Data All PEs execute the same instruction on different parts of data.
Central Control Unit Single instruction stream broadcasted to all processors.
Synchronous Execution All PEs execute in lockstep (same clock, same instruction).

Applications of Array Processors

Array processors are especially effective in scientific, engineering, and multimedia fields where tasks are
data-parallel.

Examples:

 Image and Video Processing: Each PE processes one pixel or frame portion.
 Matrix Operations: Matrix multiplication, inversion, etc.
 Signal Processing: Fourier transforms, convolution.
 Weather Simulation: Grid-based climate data.
 Neural Network Computations: Matrix-heavy computations in AI models.

Advantages of Array Processing

Advantage | Explanation
High Performance | Multiple data items processed simultaneously.
Scalability | More PEs can be added for larger workloads.
Efficient for Regular Tasks | Ideal for tasks with repetitive data structures such as arrays or matrices.
Simplified Control Flow | Single instruction stream reduces control logic complexity.

Limitations

 Not suitable for irregular or dynamic control flows.


 Underutilization: If data size is smaller than number of processors, some PEs remain idle.
 Programming can be less flexible due to uniform instruction requirement.
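
The underutilization point can be quantified with a short calculation. The sketch below assumes a simplified model in which the data is processed in lockstep passes of one item per PE; the function name pe_utilization is hypothetical.

```python
import math


def pe_utilization(num_items: int, num_pes: int) -> float:
    """Fraction of PE slots doing useful work, summed over all lockstep passes."""
    if num_items <= 0 or num_pes <= 0:
        raise ValueError("num_items and num_pes must be positive")
    passes = math.ceil(num_items / num_pes)  # full passes over the PE array
    return num_items / (passes * num_pes)


if __name__ == "__main__":
    print(pe_utilization(16, 64))  # fewer items than PEs: most PEs idle
    print(pe_utilization(64, 64))  # perfect fit
    print(pe_utilization(65, 64))  # one extra item forces a nearly empty pass
```

Under this model, 16 items on 64 PEs keep only a quarter of the array busy, and 65 items need a second pass in which 63 PEs sit idle.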

Comparison: Vector vs. Array Processing

Aspect | Vector Processing | Array Processing
Execution Model | SIMD over vector registers | SIMD over multiple physical processors
Hardware | Uses vector registers and pipelines | Uses an array of PEs
Best For | Repetitive computations on vectors | Regular, structured computations on arrays
Control Mechanism | Control per vector instruction | Central control unit for all PEs

Real-World Examples

 ILLIAC IV: Early supercomputer built as an array of processing elements driven by a single control unit.
 GPUs (Graphics Processing Units): Modern example of SIMD-like array processors optimized for graphics and AI.
 Google TPU: Uses systolic arrays, a specialized form of array processing, for deep learning tasks.

7. RISC and CISC

RISC (Reduced Instruction Set Computer)


 Fewer, simpler instructions.
 Uniform instruction format.
 Emphasis on software.
 High performance due to pipelining.

Examples: ARM, MIPS

CISC (Complex Instruction Set Computer)

 Large set of complex instructions.


 Instruction length varies.
 Emphasis on hardware.
 Fewer instructions per program, but each does more.

Examples: Intel x86

Feature | RISC | CISC
Instruction Set | Small, simple | Large, complex
Performance | Typically faster per instruction (pipelines well) | Typically slower per instruction
Code Size | Larger | Smaller
Example | ARM, MIPS | Intel x86

Modern computer processors are designed based on two fundamental architectural philosophies:

 RISC (Reduced Instruction Set Computer)


 CISC (Complex Instruction Set Computer)

Each has distinct design goals, instruction sets, performance characteristics, and hardware-software balance.

RISC (Reduced Instruction Set Computer)

RISC is designed with the philosophy that a small set of simple instructions can execute operations more
efficiently and quickly.

Key Features:

 Fewer, simpler instructions: Each instruction performs a basic operation.


 Uniform instruction format: All instructions are the same size, simplifying decoding and pipelining.
 Emphasis on software: Complex tasks are handled by multiple simpler instructions, often managed
by the compiler.
 Faster execution: Due to simple instructions and efficient pipelining.
 Registers used more than memory to store intermediate results.

Examples:

 ARM (used in most smartphones)


 MIPS
 RISC-V (emerging open architecture)
 SPARC

Advantages:

 Better performance via pipelining


 Easier to design, test, and optimize hardware
 Lower power consumption
 Simpler instruction decoding logic

Disadvantages:

 Larger code size (more instructions for complex operations)


 Heavier burden on compilers and programmers

CISC (Complex Instruction Set Computer)

CISC architectures aim to accomplish tasks with fewer instructions, each capable of performing multiple low-
level operations (memory access, arithmetic, etc.).

Key Features:

 Large and complex instruction set: Instructions can perform multiple operations.
 Variable instruction length: Different instructions have different sizes and formats.
 Emphasis on hardware: Hardware handles complex instructions, reducing software burden.
 Fewer instructions per program: Since each instruction does more.
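
The trade-off between the two styles can be seen in a toy interpreter. Everything here is a simplified assumption for illustration: the instruction names (LOAD, STORE, ADD, ADDM) and the tuple encoding do not correspond to any real instruction set. The RISC-style program uses a load/store sequence, while the CISC-style program does the same job with one memory-to-memory instruction.

```python
from typing import Dict, List, Tuple

Instruction = Tuple[str, ...]


def run(program: List[Instruction], memory: Dict[str, int]) -> Dict[str, int]:
    """Execute a toy program and return the resulting memory contents."""
    regs: Dict[str, int] = {}
    mem = dict(memory)  # work on a copy so the caller's memory is untouched
    for op, *args in program:
        if op == "LOAD":      # LOAD reg, addr   (RISC style)
            reg, addr = args
            regs[reg] = mem[addr]
        elif op == "STORE":   # STORE reg, addr  (RISC style)
            reg, addr = args
            mem[addr] = regs[reg]
        elif op == "ADD":     # ADD dst, src1, src2, registers only
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "ADDM":    # ADDM dst, src1, src2, memory to memory (CISC style)
            dst, a, b = args
            mem[dst] = mem[a] + mem[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem


# c = a + b written in both styles
RISC_PROGRAM: List[Instruction] = [
    ("LOAD", "r1", "a"),
    ("LOAD", "r2", "b"),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", "c"),
]
CISC_PROGRAM: List[Instruction] = [("ADDM", "c", "a", "b")]

if __name__ == "__main__":
    start = {"a": 2, "b": 3, "c": 0}
    print(run(RISC_PROGRAM, start)["c"], len(RISC_PROGRAM))
    print(run(CISC_PROGRAM, start)["c"], len(CISC_PROGRAM))
```

Both programs produce the same result. The RISC version needs four simple instructions, while the CISC version needs one instruction that hides two memory reads, an addition, and a memory write.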

Examples:

 Intel x86
 AMD64
 VAX

Advantages:

 Smaller code size


 Rich instruction set makes programming easier (in assembly)
 Legacy compatibility (especially for x86)

Disadvantages:

 Complex instruction decoding increases hardware complexity


 Slower performance due to complex instructions and variable-length decoding
 More power-hungry than RISC

Feature-Wise Comparison:

Feature | RISC | CISC
Instruction Set | Small, simple | Large, complex
Instruction Length | Fixed | Variable
Execution Speed | Typically faster per instruction (pipelines well) | Typically slower per instruction
Code Size | Larger (more instructions needed) | Smaller (each instruction does more)
Control Unit | Hardwired (faster, simpler) | Microprogrammed (slower, flexible)
Emphasis | Software (compiler optimization) | Hardware (hardware executes complex tasks)
Memory Access | Load/store architecture | Memory access within many instructions
Examples | ARM, MIPS, RISC-V, SPARC | Intel x86, AMD64, VAX

Modern Perspective:

 Modern processors often blend both philosophies.
o Intel x86 is fundamentally CISC but internally translates complex instructions into RISC-like micro-operations.
o ARM processors are classic RISC but now include more complex instructions (for example, SIMD extensions) to improve performance.
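
The idea of translating a complex instruction into simpler micro-operations can be sketched as follows. This is a toy model only: the ADDM opcode, the temporary register names (t0, t1, t2), and the micro-operation format are all hypothetical, and the decoders in real x86 processors are far more involved.

```python
from typing import List, Tuple

Instruction = Tuple[str, ...]


def decode_to_micro_ops(instruction: Instruction) -> List[Instruction]:
    """Split a hypothetical memory-to-memory add into RISC-like micro-ops."""
    op, *args = instruction
    if op == "ADDM":  # ADDM dst_addr, src_addr1, src_addr2
        dst, a, b = args
        return [
            ("LOAD", "t0", a),          # read first operand from memory
            ("LOAD", "t1", b),          # read second operand from memory
            ("ADD", "t2", "t0", "t1"),  # register-to-register add
            ("STORE", "t2", dst),       # write the result back to memory
        ]
    # Simple instructions pass through as a single micro-op.
    return [instruction]


if __name__ == "__main__":
    for micro_op in decode_to_micro_ops(("ADDM", "c", "a", "b")):
        print(micro_op)
```

The program still contains one compact instruction, but the execution core only ever sees simple, uniform steps that are easier to pipeline.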

Conclusion:

RISC | CISC
Fast, efficient for pipelined execution, but requires more instructions | Rich, compact code with powerful instructions, but slower and more complex
Better suited for embedded, mobile, low-power devices | Common in desktops, servers, and legacy software environments

8. Study of Multicore Processor – Intel, AMD

Intel Multicore Processors

 Examples: Core i3, i5, i7, i9, Xeon.
 Use Hyper-Threading (each physical core appears as two logical processors).
 Include integrated graphics and a shared Smart Cache.
 Technologies:
o Turbo Boost: Dynamically raises clock frequency when power and thermal limits allow.
o Intel VT: For virtualization support.

AMD Multicore Processors

 Examples: Ryzen, EPYC, Threadripper.
 Use Simultaneous Multithreading (SMT), similar to Intel’s Hyper-Threading.
 Infinity Fabric: Connects cores and I/O.
 Focus on power efficiency and performance per watt.

Feature | Intel | AMD
SMT Technology | Hyper-Threading | SMT
Performance | High single-core performance | High multi-core efficiency
Use Cases | General & gaming | Multithreaded & professional

Multicore processors contain multiple processing cores on a single chip. Each core can independently execute
instructions, enabling parallel processing, better multitasking, and improved performance.
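
A minimal sketch of how software divides work so that several cores can help is shown below. The function names are hypothetical. It uses Python's standard ThreadPoolExecutor to keep the example short; note that in standard CPython, threads mainly illustrate the decomposition, and a process pool would usually be needed to get a real speedup on CPU-bound work across cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def split_into_chunks(data: List[int], parts: int) -> List[List[int]]:
    """Split data into at most `parts` contiguous chunks of near-equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data: List[int], workers: Optional[int] = None) -> int:
    """Sum data by giving each worker one chunk, then combining partial sums."""
    if not data:
        return 0
    # os.cpu_count() reports logical processors, so SMT threads are included.
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(list(range(101)), workers=4))
```

The pattern of splitting the data, computing partial results independently, and combining them at the end is the same whether the workers are threads, processes, or physical cores.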

Both Intel and AMD design powerful multicore CPUs for various computing needs—from everyday use to
high-performance computing.

Intel Multicore Processors

Intel is a leading CPU designer whose processors are widely used in personal computers, servers, and laptops.

Common Series:

 Core i3 / i5 / i7 / i9 – Used in desktops and laptops


 Xeon – Used in servers, workstations, data centers

Key Technologies:

Technology | Description
Hyper-Threading (HT) | Allows one physical core to appear as two logical processors, improving parallelism.
Turbo Boost | Dynamically increases clock speed of cores when fewer cores are active or thermal limits allow.
Smart Cache | Shared L3 cache that adapts dynamically among cores for better performance.
Intel VT (Virtualization Technology) | Provides hardware support for running virtual machines efficiently while isolating guest and host systems.
Integrated Graphics (Intel Iris, UHD) | Eliminates the need for a separate GPU in most basic tasks.

Strengths:

 High single-core performance (better for games and real-time apps)


 Strong power management features
 Wide compatibility and ecosystem support

AMD Multicore Processors

AMD offers a highly competitive lineup for both consumer and enterprise use, often focusing on more cores
and threads per processor.

Common Series:

 Ryzen – Mainstream desktops/laptops


 Threadripper – High-end desktop (HEDT)
 EPYC – Servers and enterprise systems

Key Technologies:

Technology | Description
SMT (Simultaneous Multithreading) | AMD’s version of hyper-threading, allowing each core to run two threads.
Infinity Fabric | A high-speed interconnect that links cores, memory controllers, and I/O devices across the chip.
Chiplet Architecture | AMD uses multiple small dies (chiplets) instead of one large monolithic die, improving yield and scalability.
Overclocking Support | Most Ryzen CPUs are unlocked, allowing manual tuning for extra performance.

Strengths:

 Better multi-core performance at lower price points


 Great performance per watt (power efficiency)
 Ideal for content creation, 3D rendering, video editing, virtualization

Intel vs AMD: Feature Comparison Table


Feature | Intel | AMD
SMT Technology | Hyper-Threading | SMT
Single-Core Performance | Excellent (especially for gaming) | Competitive, improving rapidly
Multi-Core Performance | Good | Often better (more cores/threads)
Clock Speeds | Turbo Boost support | Precision Boost, usually unlocked
Architecture | Traditionally monolithic die | Chiplet + Infinity Fabric
Virtualization | Intel VT | AMD-V
Integrated Graphics | Intel Iris/UHD (in most chips) | Only in select models (e.g., G-series)
Price/Performance Ratio | Typically higher-priced | More cores for lower cost
Power Efficiency | Good | Excellent in newer Ryzen/EPYC chips

Use Case Comparison:

Use Case | Best Choice
General Home/Office Use | Intel or AMD (Ryzen 3 / i3)
Gaming | Intel (slightly better single-core), AMD (great for multithreaded games)
Content Creation | AMD (Threadripper, Ryzen 9)
Professional Workstations | AMD EPYC / Intel Xeon
Virtualization & Servers | AMD EPYC (more cores), Intel Xeon (robust ecosystem)

Conclusion:

 Intel focuses on high clock speed, lower latency, and mature integration.
 AMD excels in multi-threaded tasks, cost-effectiveness, and modular scalability.

Both companies are now close in terms of performance and innovation, making the choice depend more on
specific use case and budget.
