Desktop Computer Structure Overview
Central Processing Unit (CPU): The brain of the computer, responsible for executing instructions and
processing data.
Motherboard: The main circuit board that houses the CPU, memory, and other essential components.
Memory: Includes both primary memory (RAM) for temporary data storage and secondary memory
(HDD/SSD) for permanent data storage.
Input Devices: Devices like keyboards and mice that allow users to interact with the computer.
Output Devices: Devices such as monitors and printers that display or produce the results of
computations.
Power Supply Unit (PSU): Provides the necessary electrical power to all components.
The performance of a CPU is influenced by factors such as clock speed, core count, and cache size.
2. Motherboard
This is the main printed circuit board (PCB) that connects all components of the computer.
It provides electrical connections so components like the CPU, RAM, storage, and I/O devices can
communicate.
Contains chipsets, BIOS/UEFI firmware, expansion slots (PCIe), and power connectors.
3. Memory
Secondary memory is non-volatile, meaning data remains even after power is turned off.
4. Input Devices
Devices that allow the user to send data and commands to the computer.
Examples:
o Keyboard: For typing text and commands.
o Mouse: For graphical navigation and interaction.
o Scanner, Webcam, Microphone: For various forms of input.
5. Output Devices
7. Other Components
Expansion Cards: Such as graphics cards, sound cards, and network interface cards (NICs).
Cooling Systems: Fans or liquid cooling for temperature control.
Optical Drives: DVD/CD drives (becoming less common).
Named after mathematician John von Neumann, this is the most widely used architecture in general-purpose
computers.
Think of it like a one-lane road where only one car (either instruction or data) can move at a time. So, traffic
(processing) can get slow.
Key Features:
Diagram (conceptually):
Input → CPU ←→ Memory ←→ Output
↑
Control Unit
Advantages:
Limitations:
2. Harvard Architecture
In Harvard architecture, data and instructions are stored in separate memory units, each with its own
bus.
There are two buses: one for data and one for instructions.
The CPU can fetch both an instruction and data at the same time.
Example:
It’s like having a two-lane road — one lane for cars (data) and one for bikes (instructions). So both can move
at the same time, making things faster.
Key Features:
Diagram:
Input → CPU ←→ Instruction Memory ←→ Output
↑
Data Memory
Advantages:
Limitations:
Feature         Von Neumann                                   Harvard
Bus Structure   One bus (shared for both)                     Two buses (one for each)
Execution       Fetches data and instruction one at a time    Can fetch both at the same time
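The execution difference between a shared bus and separate buses can be sketched by counting bus cycles. This is a deliberately idealized model, not a description of any real processor: it assumes every instruction needs one fetch, some instructions also need one data access, and each bus completes one transfer per cycle. The function name `bus_cycles` is made up for this illustration.

```python
def bus_cycles(needs_data, shared_bus):
    """Count bus cycles for a list of instructions.

    needs_data: list of booleans, True if the instruction also accesses data.
    shared_bus: True for one shared bus, False for separate
                instruction and data buses.
    """
    cycles = 0
    for uses_data in needs_data:
        if shared_bus:
            # The instruction fetch and the data access take turns on one bus.
            cycles += 1 + (1 if uses_data else 0)
        else:
            # The data access overlaps with the instruction fetch.
            cycles += 1
    return cycles


program = [True, True, False]  # hypothetical three-instruction program
print(bus_cycles(program, shared_bus=True))   # 5
print(bus_cycles(program, shared_bus=False))  # 3
```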
These registers facilitate quick data access and manipulation, enhancing the CPU's performance.
Every CPU operation starts with the Program Counter (PC), which contains the address of the next instruction
to be fetched from memory. This register ensures the sequential flow of the program unless a branch or jump
alters it. For example, if PC = 0x1000, the CPU fetches the instruction from memory location 0x1000 and
increments the PC to 0x1001 (or further, depending on instruction size) to point to the next instruction.
Once the PC provides the address, the CPU places it in the Memory Address Register (MAR). The MAR acts
as the interface between the CPU and main memory, holding the specific address that the CPU wants to read
from or write to. For instance, if the CPU needs to read data from address 0x200, this value is first placed in the
MAR, which then sends the address to memory.
Memory Registers
Memory Address Register (MAR): Holds the address of the memory location to be accessed.
Memory Data Register (MDR): Contains the data to be written to or read from memory.
These registers ensure efficient data transfer between the CPU and memory.
After the MAR specifies the address, the memory system responds with the data stored at that location. This
data is temporarily held in the Memory Buffer Register (MBR) (also called the Memory Data Register or
MDR in some systems). The MBR ensures that data fetched from or written to memory is temporarily stored
until it's moved to its final destination, such as a general-purpose register or the accumulator.
The Instruction Register (IR) holds the instruction fetched from memory so that it can be decoded and
executed. For instance, an instruction like ADD R1, R2, R3 is fetched into the IR, and the control unit decodes
it to generate signals that direct the CPU to perform the addition of R2 and R3, placing the result in R1. The IR
ensures that instructions are held in place while being interpreted and executed.
The decoded instruction often operates on general-purpose registers, such as R0 through R7, which store
temporary data and operands. These registers allow rapid data access during operations like addition,
subtraction, or logical comparison. For instance, an instruction might specify ADD R4, R2, R3, meaning the
CPU adds the contents of R2 and R3 and stores the result in R4.
For certain architectures, particularly accumulator-based designs, the Accumulator (AC) is used for most
arithmetic and logic operations. The AC holds operands and results for calculations. If the CPU is to perform
AC = AC + R1, it takes the current value in the accumulator, adds the value from R1, and stores the result back
in the AC. This register simplifies instruction formats and reduces the number of required operands.
In operations involving arrays or loops, the index registers come into play. These registers store offset values
used to modify base addresses dynamically. For example, if you're accessing the 5th element of an array starting
at address 1000, and the index register contains 4 (assuming 0-based indexing), the effective address becomes
1004, enabling efficient traversal of data structures.
The Stack Pointer (SP) keeps track of the top of the stack in memory, which is used for function calls, returns,
and local variable storage. When a subroutine is called, the CPU pushes the return address and possibly some
register values onto the stack by decrementing the SP. Upon return, these values are popped off the stack,
restoring the previous state. This mechanism is critical for supporting nested function calls and recursion.
After every operation, the Status Register (or Flags Register) is updated to reflect the outcome. It contains
individual bits called flags that indicate conditions such as zero result (Z), negative result (N), carry out (C), and
overflow (V). For instance, after a subtraction operation, if the result is zero, the Zero flag is set. These flags are
often used in conditional instructions, like branching if the result was negative or zero.
Once the instruction has executed—using the IR, data registers, and possibly modifying memory—the CPU
updates the Program Counter again and begins the next fetch-decode-execute cycle. Throughout this process,
the tight cooperation among all these registers allows the CPU to operate efficiently, maintain data integrity,
and follow the logic of the program correctly.
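The register walk-through above can be sketched as a tiny simulation. It is only an illustration under stated assumptions: every instruction occupies one memory address, instructions are stored as text, and only ADD and a hypothetical HALT instruction are supported. The register values and the `fetch` and `run` helpers are invented for the example.

```python
def fetch(cpu, memory):
    """One fetch step, moving the instruction through the registers."""
    cpu["MAR"] = cpu["PC"]           # MAR <- PC
    cpu["MBR"] = memory[cpu["MAR"]]  # MBR <- M[MAR]
    cpu["IR"] = cpu["MBR"]           # IR  <- MBR
    cpu["PC"] += 1                   # PC  <- PC + 1 (one address per instruction)


def run(cpu, memory):
    """Repeat fetch, decode, execute until HALT is reached."""
    while True:
        fetch(cpu, memory)
        opcode, _, operands = cpu["IR"].partition(" ")  # decode
        if opcode == "HALT":
            break
        if opcode == "ADD":
            dst, src1, src2 = [name.strip() for name in operands.split(",")]
            cpu[dst] = cpu[src1] + cpu[src2]             # execute
        else:
            raise ValueError(f"unsupported instruction: {cpu['IR']}")
    return cpu


memory = {0x1000: "ADD R1, R2, R3", 0x1001: "HALT"}
cpu = {"PC": 0x1000, "MAR": 0, "MBR": None, "IR": None,
       "R1": 0, "R2": 5, "R3": 7}
run(cpu, memory)
print(cpu["R1"], hex(cpu["PC"]))  # 12 0x1002
```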
Summary of Additional Registers:
Register Purpose
Instruction Register (IR) Holds current instruction being executed
Memory Address Register (MAR) Holds address of memory to be accessed
Memory Buffer Register (MBR) Temporarily holds data read/written from memory
# Stack Organization
A stack is a data structure that operates on the Last In, First Out (LIFO) principle. It is used for:
The Stack Pointer (SP) register keeps track of the top of the stack.
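A minimal sketch of LIFO behaviour with a stack pointer follows. It assumes a stack that grows downward (SP is decremented on push), which matches the subroutine-call description earlier in these notes. The class name, the memory size, and the pushed values are hypothetical.

```python
class Stack:
    """Fixed-size stack that grows toward lower addresses."""

    def __init__(self, size=16):
        self.memory = [0] * size
        self.sp = size  # empty stack: SP points just past the last cell

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1                  # SP <- SP - 1
        self.memory[self.sp] = value  # M[SP] <- value

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("stack underflow")
        value = self.memory[self.sp]  # value <- M[SP]
        self.sp += 1                  # SP <- SP + 1
        return value


stack = Stack()
stack.push(0x1004)       # e.g. a return address
stack.push(42)           # e.g. a saved register value
print(stack.pop())       # 42
print(hex(stack.pop()))  # 0x1004
```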
# Control Word
A Control Word is a binary code that specifies a micro-operation to be performed by the CPU. It directs the
control unit to generate the appropriate control signals for executing instructions.
The ALU performs all arithmetic and logic operations. It takes inputs from registers, performs the specified
operation, and stores the result back in a register. The ALU also sets status flags (like Zero, Carry, Sign) that
influence conditional operations.
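How an ALU might derive its status flags can be sketched for 8-bit operands. This is an illustrative model, not a specific CPU: the word size, the `alu` function, and the convention that subtraction is performed as a + (NOT b) + 1 (so the carry flag means "no borrow") are all assumptions.

```python
MASK = 0xFF  # assumed 8-bit word


def alu(op, a, b):
    """Return (result, flags) for an 8-bit ADD or SUB."""
    a &= MASK
    b &= MASK
    if op == "ADD":
        operand = b
        raw = a + operand
    elif op == "SUB":
        operand = (~b) & MASK   # two's complement subtraction: a + ~b + 1
        raw = a + operand + 1
    else:
        raise ValueError(f"unsupported operation: {op}")

    result = raw & MASK
    sign_a, sign_b, sign_r = a >> 7, operand >> 7, result >> 7
    flags = {
        "Z": result == 0,                            # zero result
        "N": sign_r == 1,                            # negative (sign bit set)
        "C": raw > MASK,                             # carry out of bit 7
        "V": sign_a == sign_b and sign_r != sign_a,  # signed overflow
    }
    return result, flags


result, flags = alu("SUB", 5, 5)
print(result, flags["Z"])       # 0 True
result, flags = alu("ADD", 0x7F, 1)
print(hex(result), flags["V"])  # 0x80 True
```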
# I/O System
The Input/Output (I/O) System manages data exchange between the computer and external devices. It
includes:
The bus structure refers to the organization of buses in a computer system. It can be:
Bus and memory transfer mechanisms describe how data moves between the CPU and memory through
buses. The Memory Address Register (MAR) holds the address, and the Memory Data Register (MDR)
carries the data. Control signals like Read and Write are used to coordinate these transfers.
In a computer system, a bus is like a set of wires or pathways that connect different parts of the computer,
such as the CPU, memory, and input/output (I/O) devices. Think of it as a highway system that allows data to
travel between different components. Just like cars use roads to move from one city to another, information in a
computer uses buses to move between components.
Buses are made up of parallel lines, meaning multiple bits (0s and 1s) can travel at the same time, side by side.
The more lines a bus has, the more data it can carry at once — this is called the width of the bus.
The data bus is used to transfer actual data between the CPU, memory, and other devices. For example, when
you open a file, the data (like text or images) travels from the hard drive to the CPU through the data bus. If the
data bus is 8 bits wide, it can move 8 bits of data at a time; if it's 32 bits wide, it can move more data in one go,
making the computer faster.
The address bus is used to carry memory addresses, not data. This tells the computer where the data should
go or come from. For example, if the CPU wants to read some data from memory, it uses the address bus to
send the location of that data. The memory then uses that location to find and send the correct data.
So, if the address bus is 16 bits wide, the CPU can address up to 2¹⁶ = 65,536 memory locations. A wider
address bus means the system can access more memory.
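The relationship between address bus width and addressable locations is a power of two, which a short calculation makes concrete. The helper name is hypothetical, and the 20-bit and 32-bit widths are just additional sample values.

```python
def addressable_locations(address_bits):
    """Number of distinct addresses an n-bit address bus can express."""
    return 2 ** address_bits


for bits in (16, 20, 32):
    print(bits, addressable_locations(bits))
# 16 65536
# 20 1048576
# 32 4294967296
```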
3. Control Bus – Gives Instructions and Signals
The control bus carries control signals that manage how and when data is transferred. It tells the other
components what to do, like whether to read or write data, whether a device is ready, or when to stop a transfer.
For example, if the CPU wants to read data from memory, it will send a "Read" signal on the control bus. If it
wants to send data, it will send a "Write" signal.
Bus Width: This refers to how many bits the bus can carry at one time. A wider bus (like 64 bits vs. 32
bits) means more data can travel at once, improving speed.
Bus Speed: This refers to how fast the data moves. A faster bus allows more data to be transferred in
less time.
Both width and speed of a bus affect overall computer performance. Faster and wider buses help the
CPU communicate with memory and other devices more efficiently, leading to quicker program
execution and better multitasking.
Faster and wider buses mean the computer can do more in less time — just like having more lanes and higher
speed limits on a highway helps traffic move better.
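As a rough illustration of how width and speed combine, the peak transfer rate of a bus can be estimated as width times transfer frequency. The sketch assumes one transfer per clock cycle and ignores all protocol overhead; the 100 MHz clock is an arbitrary example value, not a figure from these notes.

```python
def peak_bytes_per_second(width_bits, clock_hz):
    """Idealized peak rate: one full-width transfer per clock cycle."""
    return (width_bits // 8) * clock_hz


print(peak_bytes_per_second(32, 100_000_000))  # 400000000
print(peak_bytes_per_second(64, 100_000_000))  # 800000000
```

Doubling either the width or the clock doubles the idealized peak rate, which is the point of the highway analogy.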
The Program Counter (PC) holds the address of the next instruction to be executed. After each instruction
fetch, the PC is incremented unless modified by control transfer instructions like jumps or branches. This
ensures the sequential execution of instructions.
Register Transfer Language (RTL) is used to describe operations in terms of data transfers between registers
and the operations performed on that data. It uses symbolic notations such as R1 ← R2 + R3, meaning the
contents of R2 and R3 are added and stored in R1.
Let's break down Register Transfer Language (RTL) in a very simple and detailed way:
What is RTL?
Register Transfer Language (RTL) is a way to describe how data moves between registers in a computer,
and what operations are done on that data.
What is a Register?
A register is a small, fast storage location inside the CPU. It temporarily holds data that the CPU is currently
working on.
Registers are like tiny boxes where you keep important things you need right now.
Example:
R1 ← R2 + R3
This means:
Here are a few examples of RTL instructions and what they mean:
RTL Statement       Meaning
R1 ← R2             Copy the value from R2 into R1
R1 ← R2 + R3        Add R2 and R3, store in R1
R4 ← R4 - R1        Subtract R1 from R4, store in R4
R5 ← R6 AND R7      Bitwise AND of R6 and R7, store in R5
R3 ← R3 + 1         Increment R3 by 1
R2 ← R2 - 1         Decrement R2 by 1
It acts like a bridge between the software (like a programming language) and the hardware (the CPU circuits).
Imagine registers as small cups of water and operations as actions like pouring, mixing, or measuring:
R1 ← R2 + R3 is like pouring water from cup R2 and R3 into cup R1 after mixing.
R1 ← R2 is like just copying the water from cup R2 to cup R1.
RTL is a language used to describe how data is transferred between registers and what operations are
done.
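RTL statements like the ones above can be tried out with a very small interpreter. This is a sketch for study purposes only: it assumes statements of the form `DEST ← A`, or `DEST ← A op B` with spaces around the operator, and supports just `+`, `-`, and `AND`. The function name and register values are invented.

```python
import operator

OPERATIONS = {"+": operator.add, "-": operator.sub, "AND": operator.and_}


def execute_rtl(statement, registers):
    """Apply one RTL statement to a dict of register values."""
    dest, expression = [part.strip() for part in statement.split("←")]
    tokens = expression.split()

    def value(token):
        # A token is either a register name or an integer constant.
        return registers[token] if token in registers else int(token)

    if len(tokens) == 1:                  # plain transfer, e.g. R1 ← R2
        result = value(tokens[0])
    else:                                 # e.g. R1 ← R2 + R3
        left, op, right = tokens
        result = OPERATIONS[op](value(left), value(right))
    registers[dest] = result
    return registers


registers = {"R1": 0, "R2": 5, "R3": 7, "R6": 0b1100, "R7": 0b1010, "R5": 0}
execute_rtl("R1 ← R2 + R3", registers)
execute_rtl("R5 ← R6 AND R7", registers)
execute_rtl("R3 ← R3 + 1", registers)
print(registers["R1"], registers["R5"], registers["R3"])  # 12 8 8
```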
Instruction Register
The Instruction Register (IR) holds the current instruction being executed. It is loaded with the instruction
fetched from memory and is decoded to determine the operation to be performed. The IR plays a crucial role in
the fetch-decode-execute cycle.
# Instruction Types
# Instruction Format
An instruction format defines the layout of bits in an instruction. Common fields include:
Instruction formats vary based on the architecture and complexity of the CPU.
The Instruction Format describes how the bits of an instruction are organized inside the CPU. You can think
of it like a sentence made of smaller parts (like subject, verb, and object).
Opcode            Tells the CPU what operation to perform                                         e.g., ADD, SUB, LOAD
Operands          Tells what data to use or which registers/memory to use                         e.g., R1, R2, 5000
Addressing Mode   Tells the CPU how to interpret the operands (like direct, indirect, immediate)  e.g., direct (use address), immediate (use value directly)
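To see how these fields might sit inside an instruction word, here is a sketch that packs and unpacks a made-up 16-bit format: 4 bits of opcode, 2 bits of addressing mode, and 10 bits of operand. The field widths, the opcode numbers, and the mode numbers are all assumptions chosen for illustration; real architectures define their own layouts. The operand 500 is used because an address like 5000 would not fit in 10 bits.

```python
OPCODES = {"LOAD": 0x1, "ADD": 0x2, "SUB": 0x3}              # hypothetical codes
MODES = {"immediate": 0b00, "direct": 0b01, "indirect": 0b10}
OPCODE_NAMES = {code: name for name, code in OPCODES.items()}
MODE_NAMES = {code: name for name, code in MODES.items()}


def encode(opcode, mode, operand):
    """Pack the three fields into one 16-bit word."""
    if not 0 <= operand < (1 << 10):
        raise ValueError("operand must fit in 10 bits")
    return (OPCODES[opcode] << 12) | (MODES[mode] << 10) | operand


def decode(word):
    """Split a 16-bit word back into (opcode, mode, operand)."""
    opcode = OPCODE_NAMES[(word >> 12) & 0xF]
    mode = MODE_NAMES[(word >> 10) & 0x3]
    operand = word & 0x3FF
    return opcode, mode, operand


word = encode("LOAD", "direct", 500)
print(format(word, "016b"))  # 0001010111110100
print(decode(word))          # ('LOAD', 'direct', 500)
```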
Simple Example
Instruction formats can vary in size (like 16-bit, 32-bit, or 64-bit instructions), but here are common types:
1. Zero-Address Instruction
o No operands. Often used in stack-based systems.
o Example: ADD (takes values from stack)
2. One-Address Instruction
o One operand plus an implicit register (like the accumulator).
o Example: LOAD 5000 (load from memory address 5000)
3. Two-Address Instruction
o Two operands. One is the source, one is the destination.
o Example: MOV R1, R2 (copy R2 into R1)
4. Three-Address Instruction
o Uses two source operands and one destination.
o Example: ADD R1, R2, R3 (R1 = R2 + R3)
Let’s explain these types of instruction formats — Zero, One, Two, and Three Address Instructions — in
simple and detailed terms. These formats tell us how many operands (data items or addresses) are used in a
single instruction, and how they're processed.
1. Zero-Address Instruction
A zero-address instruction doesn't use any explicit operands. Instead, it works with a stack, where data is
stored in a Last In First Out (LIFO) way.
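A small stack-machine sketch shows how a zero-address ADD finds its operands. The instruction names PUSH and MUL and the program itself are assumptions for the example. Note that PUSH carries a value, so only ADD and MUL are truly zero-address here.

```python
def run_stack_program(program):
    """Execute a list of stack-machine instructions and return the stack."""
    stack = []
    for instruction in program:
        opcode, *args = instruction.split()
        if opcode == "PUSH":
            stack.append(int(args[0]))
        elif opcode == "ADD":      # zero-address: operands come from the stack
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unsupported instruction: {instruction}")
    return stack


# Computes (2 + 3) * 4
print(run_stack_program(["PUSH 2", "PUSH 3", "ADD", "PUSH 4", "MUL"]))  # [20]
```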
A one-address instruction uses one explicit operand, and the other is implied (usually the accumulator, a
special register used for arithmetic).
You don't need to write the accumulator in the instruction; it’s automatically used.
Example:
LOAD 5000
What it does:
Loads the value from memory address 5000 into the accumulator.
Other examples:
Use Case:
3. Two-Address Instruction
Example:
MOV R1, R2
What it does:
Other example:
ADD R1, R2 ; R1 = R1 + R2
This means:
Take R1 and R2
Add them
Store the result back into R1
Use Case:
4. Three-Address Instruction
Example:
What it does:
So:
R1 = R2 + R3
Other examples:
Use Case:
Summary Table
Three-Address 3 ADD R1, R2, R3 Performs operation on two sources, stores in third
Now that we know what an instruction looks like, let’s see how the CPU executes it step-by-step.
1. Fetch
Example:
2. Decode
Example:
3. Execute
Example:
The CPU saves the result of the operation into a register or memory.
Example:
After finishing one instruction, the CPU moves to the next (by updating the Program Counter) and repeats:
Instruction Cycle Steps the CPU follows to process an instruction: Fetch → Decode → Execute → Store
Addressing modes define how operands are accessed in instructions. Common addressing modes include:
These modes increase flexibility and efficiency in accessing data during program execution.
Addressing Modes tell the CPU how to interpret the operands: whether the operand is the actual value, a
memory address, in a register, or something else.
Meaning:
MOV R1, #5
What happens:
Simple Explanation:
Meaning:
The instruction gives the memory address where the data is located.
🔸 Example:
What happens:
Simple Explanation:
Meaning:
The instruction contains a memory address, but that address holds another address, and that second address
is where the actual data is.
🔸 Example:
🔹 Simple Explanation:
"Go to this address to find another address that tells you where the data is."
Meaning:
🔸 Example:
MOV R1, R2
What happens:
🔹 Simple Explanation:
"The data is already in the CPU—just move it from one box (register) to another."
Meaning:
The address of the operand is calculated by adding a constant (offset) to the value in an index register.
🔸 Example:
What happens:
Add 1000 + R2
Use the result as a memory address
Load the data from that address into R1
Simple Explanation:
"Start from a base (index register) and add an offset to find where the data is."
📌 Common in:
Summary Table
Mode Where is the operand? Example Explanation
Immediate   Inside the instruction                       MOV R1, #10       Use value 10 directly
Direct      Memory address is given in the instruction   LOAD R1, 5000     Go to address 5000
Indirect    Address points to another address            LOAD R1, (5000)   memory[5000] → 7000 → memory[7000]
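The addressing modes described in this section can be compared side by side with a short operand-resolution sketch. The function name, the memory contents, and the register value are hypothetical; the addresses 5000 and 7000 and the offset 1000 follow the examples used in the notes.

```python
def fetch_operand(mode, operand, memory, registers, index_register=None):
    """Return the operand value according to the addressing mode."""
    if mode == "immediate":
        return operand                              # the value itself
    if mode == "direct":
        return memory[operand]                      # memory[address]
    if mode == "indirect":
        return memory[memory[operand]]              # memory[memory[address]]
    if mode == "register":
        return registers[operand]                   # value held in a register
    if mode == "indexed":
        return memory[operand + registers[index_register]]  # memory[offset + index]
    raise ValueError(f"unknown addressing mode: {mode}")


memory = {5000: 7000, 7000: 99, 1004: 55}
registers = {"R2": 4}

print(fetch_operand("immediate", 10, memory, registers))           # 10
print(fetch_operand("direct", 5000, memory, registers))            # 7000
print(fetch_operand("indirect", 5000, memory, registers))          # 99
print(fetch_operand("register", "R2", memory, registers))          # 4
print(fetch_operand("indexed", 1000, memory, registers, "R2"))     # 55
```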
A microinstruction is a low-level instruction used in a microprogrammed control unit. Its format includes:
A microinstruction is a very basic, low-level instruction used inside the control unit of a CPU that has
microprogrammed control. Microinstructions control the internal hardware by specifying exact
operations such as moving data between registers or controlling ALU operations.
1. Micro-Operation Fields
These fields specify the micro-operations — the actual hardware-level operations the CPU should
perform during that cycle.
Examples of micro-operations:
o Load register R1
o Add contents of R2 and R3
o Move data from one register to another
o Enable memory read/write
There can be multiple micro-operations in one microinstruction, often combined using control signals.
This field tells the control unit which microinstruction to execute next.
It directs the flow of microinstructions, allowing sequences or branching in microprograms.
The next address can be:
o A fixed address (next microinstruction in sequence)
o An address decided by a condition (branching based on flags)
o An address from a register or counter
It helps the CPU’s control unit to generate the correct signals for each step in executing machine
instructions.
Enables flexible and programmable control logic instead of hardwired logic.
Simple Example
The CPU will load the content from memory address in AR into R1 and also add R2 and R3.
After completing this microinstruction, control jumps to microinstruction at address 1005.
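The idea of micro-operation fields plus a next-address field can be sketched with a toy control store. Everything concrete here is an assumption made for illustration: the starting address 1004, the `SUM` register that receives R2 + R3, and the use of Python functions to stand in for control signals. Only the jump to address 1005 comes from the example above.

```python
def load_r1_from_memory(state):
    """Micro-operation: R1 <- M[AR]."""
    state["R1"] = state["memory"][state["AR"]]


def add_r2_r3(state):
    """Micro-operation: SUM <- R2 + R3 (destination register is assumed)."""
    state["SUM"] = state["R2"] + state["R3"]


# address -> (micro-operations, next address); None ends the microprogram
CONTROL_STORE = {
    1004: ([load_r1_from_memory, add_r2_r3], 1005),
    1005: ([], None),
}


def run_microprogram(control_store, start, state):
    """Follow next-address fields, applying each microinstruction's micro-ops."""
    visited = []
    address = start
    while address is not None:
        micro_ops, next_address = control_store[address]
        for micro_op in micro_ops:
            micro_op(state)
        visited.append(address)
        address = next_address
    return visited


state = {"AR": 200, "memory": {200: 42}, "R1": 0, "R2": 5, "R3": 7, "SUM": 0}
print(run_microprogram(CONTROL_STORE, 1004, state))  # [1004, 1005]
print(state["R1"], state["SUM"])                     # 42 12
```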
What is it?
A hardwired control unit is built using fixed electronic circuits like combinational logic gates, flip-
flops, decoders, and multiplexers.
The control signals are generated by fixed hardware logic.
It uses logic gates and timing signals to directly produce control signals based on the current
instruction.
Characteristics
Feature Description
Design Fixed, implemented using circuits and gates
Control Signal Generation Signals generated directly by combinational logic
Speed Very fast because control signals are generated by hardware instantly
Flexibility Not flexible — changing control logic requires redesigning the hardware
Complexity Can be complex for complex instruction sets
Examples Early computers and simple processors
Advantages
Disadvantages
Characteristics
Feature Description
Flexibility Highly flexible — changing control behavior requires changing microprogram only
Complexity Easier to design and modify; handles complex instruction sets easily
Advantages
Disadvantages
Flexibility Not flexible; hardware changes needed Highly flexible; change microprogram only
Complexity Complex for large instruction sets Simpler for complex instruction sets
Cost Cheaper for simple designs Requires more memory and control hardware
Bit Position 4 3 2 1 0
13 (01101) 0 1 1 0 1
9 (01001) 0 1 0 0 1
Sum 1 0 1 1 0
Bit Pos: 4 3 2 1 0
-------------------
13 0 1 1 0 1
9 0 1 0 0 1
We’ll add them right to left (from LSB → MSB), keeping track of any carry.
Bit 4: 1
Bit 3: 0
Bit 2: 1
Bit 1: 1
Bit 0: 0
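The right-to-left addition with carry can be checked with a short sketch. It assumes both operands are given as equal-length bit strings with the most significant bit first; the function name is hypothetical.

```python
def add_binary(a_bits, b_bits):
    """Add two equal-length bit strings; return (sum_bits, carry_out)."""
    carry = 0
    result = []
    # Walk from the least significant bit (rightmost) to the most significant.
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        total = int(a) + int(b) + carry
        result.append(str(total % 2))  # sum bit for this position
        carry = total // 2             # carry into the next position
    return "".join(reversed(result)), carry


print(add_binary("01101", "01001"))  # ('10110', 0)
```

The result 10110 is 22 in decimal, which matches 13 + 9.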
Note: Since Booth’s algorithm handles signed numbers, the final 8-bit result is in two’s complement
representing -12.
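For reference, Booth's algorithm for signed multiplication can be sketched as below. The operands that produce -12 are not preserved in these notes, so the example uses 3 × (-4) with 4-bit operands purely as an assumption. The register names A, Q, and Q-1 follow the usual textbook presentation. The sketch does not handle the most negative multiplicand (-8 for 4 bits).

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Multiply two n-bit two's complement integers using Booth's algorithm."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    a = 0                   # accumulator A
    q = multiplier & mask   # multiplier register Q
    q_minus_1 = 0           # extra bit Q-1

    for _ in range(n):
        pair = (q & 1, q_minus_1)
        if pair == (1, 0):
            a = (a - m) & mask   # A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask   # A <- A + M
        # Arithmetic shift right of the combined A, Q, Q-1.
        q_minus_1 = q & 1
        q = ((q >> 1) | ((a & 1) << (n - 1))) & mask
        sign = a >> (n - 1)
        a = (a >> 1) | (sign << (n - 1))

    product = (a << n) | q          # 2n-bit result held in A:Q
    if product >> (2 * n - 1):      # interpret as signed two's complement
        product -= 1 << (2 * n)
    return product


result = booth_multiply(3, -4)
print(result, format(result & 0xFF, "08b"))  # -12 11110100
```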
Iteration 2:
Iteration 3:
Iteration 4:
1.001 → 0.1001
+-----------------------+
| Control Unit |
+-----------------------+
|
v
+---------------------+
| Instruction Decoder |
+---------------------+
|
v
+---------------------+
| Multiplexer |<--- Inputs from registers
+---------------------+
|
v
+---------------------+
|Arithmetic Logic Unit|
| (ALU) |
+---------------------+
|
v
+---------------------+
| Result Register |
+---------------------+
Control Unit: Provides signals to select operations (add, subtract, multiply, divide).
Instruction Decoder: Decodes instruction to identify required arithmetic operation.
Multiplexer: Selects which operands are fed to the ALU.
ALU: Performs actual arithmetic and logic operations.
Result Register: Stores the output/result of the ALU operation.
UNIT -3
I/O Organization - Detailed Student Notes
1. I/O Interface
Definition:
An Input/Output (I/O) interface is a communication bridge between the CPU and I/O devices such as keyboards,
printers, and monitors. It handles data transfer, status monitoring, and control signal exchange between
the internal system (CPU/memory) and external hardware.
Why Is It Needed?
CPU and I/O devices differ in data formats, timing, and control methods.
Devices may work at slower speeds than the CPU.
I/O devices may use analog signals, while the CPU understands digital data.
Coordination is needed for data integrity, device selection, and error detection.
Functions:
Communication Handling: Facilitates data exchange between CPU and I/O device.
Signal Conversion: Converts between digital signals and analog voltages as needed.
Speed Matching: Buffers are used to accommodate the speed mismatch between CPU and I/O.
Control Signaling: Manages read/write operations, command triggers, and device acknowledgments.
Error Detection: Detects parity errors, transmission errors, and device malfunctions.
Function Description
Communication Handling Manages the actual data transfer between CPU and I/O devices.
Speed Matching Uses buffers or FIFO to synchronize fast CPUs and slow I/O devices.
Control Signaling Generates necessary signals like READ, WRITE, ACK, etc.
Error Detection Detects transmission faults using parity checks, CRC, etc.
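Parity checking, the simplest of the error detection methods mentioned, can be sketched in a few lines. The sketch assumes even parity (the total number of 1 bits, including the parity bit, must be even); the function names and the sample byte are made up.

```python
def even_parity_bit(data):
    """Parity bit that makes the total count of 1 bits even."""
    return bin(data).count("1") % 2


def check_even_parity(data, parity_bit):
    """True if data plus parity bit still contain an even number of 1 bits."""
    return (bin(data).count("1") + parity_bit) % 2 == 0


sent = 0b1011001                  # four 1 bits
parity = even_parity_bit(sent)
print(parity)                                  # 0
print(check_even_parity(sent, parity))         # True

corrupted = sent ^ 0b0000100      # one bit flipped in transit
print(check_even_parity(corrupted, parity))    # False
```

A single parity bit detects any odd number of flipped bits but misses an even number, which is why stronger checks such as CRC are also used.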
CPU
|
-----------------
| Control Bus |<----------------------> Control Register
| Address Bus |<----------------------> Address Decoder
| Data Bus |<----------------------> Data Register
-----------------
|
[ I/O Interface ]
|
[ I/O Device ]
Operation:
Technique Description
Programmed I/O CPU controls all data transfer. Polling-based.
Interrupt-Driven I/O Device interrupts CPU when ready for communication.
DMA (Direct Memory Access) I/O controller transfers data directly to memory without CPU.
An I/O Bus is a communication pathway that connects the CPU and memory subsystem to input/output
devices. It carries data, addresses, and control signals between the components. Each I/O bus supports different
speeds, topologies, and device types.
PCI is a high-speed parallel bus used for connecting internal hardware components to the motherboard.
Key Features:
Real-World Example:
A user adds a Gigabit Ethernet NIC to a PC using a PCI slot on the motherboard. The OS
automatically detects and configures the device using plug-and-play.
SCSI is a parallel bus architecture designed for high-speed and reliable connection of multiple peripherals,
commonly in enterprise settings.
Key Features:
Textual Diagram:
[Host Adapter]
|
[SCSI Bus]---[HDD 1]---[HDD 2]---[Tape Drive]---[CD-ROM]
Real-World Example:
A file server uses SCSI to connect to multiple hard drives and a backup tape drive, enabling fast,
simultaneous data transactions for enterprise operations.
USB is a serial communication standard developed to simplify and unify peripheral connections for short-
distance communication.
Key Features:
Real-World Example:
A USB flash drive is inserted into a laptop. It draws power and communicates with the CPU for data
transfer. No restart is needed due to hot-plugging support.
Comparison Table: PCI vs SCSI vs USB
Devices Supported Limited by slots 8–16 per controller 127 per host controller
Usage Area Internal desktop cards Enterprise storage, servers General-purpose devices
In serial transfer, data bits are sent one after another over a single wire or channel, one bit at a
time.
Advantages:
Limitations:
Example:
Textual Diagram:
Transmitter --[bit1]--[bit2]--[bit3]--> Receiver
(1 wire/channel)
b. Parallel Data Transfer
In parallel transfer, multiple bits (usually 8, 16, or 32) are transmitted simultaneously over separate
lines.
Advantages:
High-speed transfer.
Effective for short distances (within PC boards).
Limitations:
Example:
Textual Diagram:
Definition:
In synchronous transfer, the sender and receiver share a common clock signal and transfer data in lockstep
with its pulses.
Advantages:
Drawbacks:
Example:
Definition:
In asynchronous transfer, each data unit (typically 1 byte) is sent independently, framed by start and
stop bits for synchronization.
Advantages:
Drawbacks:
Example:
Textual Diagram:
[Start] [Data: 8 bits] [Stop]
| | |
Send →→→→→→→→→→→→→→→→→→→→ Receive
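The start/data/stop framing in the diagram can be sketched as follows. The sketch assumes the common UART convention of a low (0) start bit, eight data bits sent least significant bit first, no parity, and a high (1) stop bit; the function names are hypothetical.

```python
def frame_byte(byte):
    """Wrap one byte in a start bit (0) and a stop bit (1), LSB first."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def unframe(bits):
    """Check the framing bits and rebuild the byte."""
    if len(bits) != 10 or bits[0] != 0 or bits[-1] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frame = frame_byte(0x41)     # ASCII 'A'
print(frame)                 # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(chr(unframe(frame)))   # A
```

Each byte costs 10 bits on the wire, so the framing overhead is one reason asynchronous transfer is slower than synchronous transfer.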
Comparison Table:
Distance Suitability   Long distance   Short distance   Short (high sync needed)   Medium (simple devices)
Hardware Complexity    Low             High             Medium-High                Low
Example Device         RS-232 modem    CPU–RAM          DDR RAM–CPU                Keyboard via UART
4. Direct Memory Access (DMA)
Direct Memory Access (DMA) is a method that allows I/O devices to transfer data directly to/from
memory without continuous CPU involvement. It significantly improves system performance by freeing the
CPU from routine data movement tasks.
Component Description
DMA Controller A dedicated hardware unit or chip that manages DMA operations.
Textual Diagram:
Burst Mode         Entire block of data is transferred in one go without CPU access.   High-speed transfers like disk-to-RAM
Cycle Stealing     DMA takes one bus cycle at a time, interleaved with CPU access.     Printer or sound card data output
Transparent Mode   DMA only transfers when CPU is idle, completely non-intrusive.      Background data loading in video playback
Example:
In video rendering, high-resolution image files are fetched from the SSD into RAM using Burst Mode DMA,
allowing the CPU to focus solely on rendering logic and UI response.
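The difference between burst mode and cycle stealing can be pictured as a timeline of who owns the bus in each cycle. This is a highly simplified sketch: it assumes the CPU wants the bus on every cycle and that each transfer takes exactly one cycle. Transparent mode is left out because it would need a model of CPU idle time. The function name and the letters D (DMA) and C (CPU) are invented for the illustration.

```python
def bus_schedule(mode, dma_words, cpu_cycles):
    """Return a string with one letter per bus cycle: D = DMA, C = CPU."""
    timeline = []
    if mode == "burst":
        # The DMA controller holds the bus until the whole block is moved.
        timeline = ["D"] * dma_words + ["C"] * cpu_cycles
    elif mode == "cycle_stealing":
        # The DMA controller takes one cycle at a time between CPU accesses.
        while dma_words or cpu_cycles:
            if cpu_cycles:
                timeline.append("C")
                cpu_cycles -= 1
            if dma_words:
                timeline.append("D")
                dma_words -= 1
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return "".join(timeline)


print(bus_schedule("burst", 4, 4))           # DDDDCCCC
print(bus_schedule("cycle_stealing", 4, 4))  # CDCDCDCD
```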
Efficiency Low (CPU cycles wasted) High (CPU free for other tasks)
Advantages of DMA:
An I/O Processor (IOP) is a dedicated processing unit designed specifically to manage I/O device
operations, leaving the CPU free to execute computation instructions. Unlike the CPU, which is
optimized for logic and arithmetic, the IOP focuses on controlling peripherals and handling I/O data
traffic.
Structure of an IOP:
Component Function
I/O Instruction Set Executes special I/O-specific commands like device select, read, write.
Internal Memory Holds temporary data, status info, or buffered I/O tasks.
Communication Bus Facilitates interaction between IOP, I/O devices, and system memory.
1. CPU Delegation:
The CPU writes the I/O instructions or a small I/O program to main memory.
2. IOP Fetch:
The IOP reads these instructions independently of the CPU.
3. Execution:
IOP executes the I/O task (e.g., read from disk, write to printer).
4. Interrupt Generation:
After completing the task, the IOP generates an interrupt to inform the CPU.
Real-Life Example:
Card Readers
Magnetic Tape Drives
Printers
These were managed entirely independently of the main CPU, allowing the system to support multiple
I/O tasks simultaneously.
Component Description
I/O Interface Connects CPU with I/O devices. Handles data, control, and status signals.
PCI Bus High-speed parallel bus used for internal components like NICs, GPUs.
USB Universal serial bus supporting hot-plug and device power delivery.
Serial Transfer Transfers 1 bit at a time → cost-efficient, suitable for long distances.
Parallel Transfer Transfers multiple bits at once → faster, but limited to short distances.
Synchronous Transfer Uses shared clock → high-speed but requires sync hardware.
Asynchronous Transfer No shared clock → uses start/stop bits, simpler but slower.
DMA Peripheral directly transfers data to memory without CPU → boosts speed.
Memory Organization –
1. Main Memory
Main memory is a critical component of a computer's architecture. It is the primary working memory that
stores both data and instructions that the CPU needs for processing. It is much faster than secondary storage but
slower than CPU registers or cache.
RAM is a volatile memory used to store data and programs that are currently being used by the CPU. It allows
both read and write operations, acts as the working area for the CPU during program execution, and loses its
contents when the power is turned off. RAM is divided into Static RAM (SRAM), which is faster but costlier,
and Dynamic RAM (DRAM), which is slower but cheaper and used in main system memory.
Characteristics:
Types of RAM:
+-----------+ +-----------+
| Bit Cell | ---- | Flip-Flop |
+-----------+ +-----------+
2. Dynamic RAM (DRAM):
o Stores each bit as a charge in a capacitor.
o Slower than SRAM and requires refresh cycles to maintain data.
o Denser and cheaper than SRAM.
o Used in main system memory.
o Typically found in DDR, DDR2, DDR3, DDR4, and DDR5 modules.
Working Note:
Each memory cell must be refreshed periodically (typically every few tens of milliseconds), as capacitors leak charge.
ROM is a non-volatile memory, meaning it retains its data even when the power is off. It is read-only under
normal operation and is used to store firmware: the essential software required to boot and initialize
hardware. Variants include PROM, EPROM, and EEPROM, which allow programming or reprogramming under
specific conditions.
Characteristics:
Contents are written once (during manufacturing or programming) and then read-only.
Not used for general storage.
Typically stores BIOS or bootloader code.
No data loss during shutdown.
Cannot be used to store dynamic programs or data.
Types of ROM:
Type Description
PROM Programmable ROM – can be written once using a special device (PROM burner).
EEPROM Electrically Erasable PROM – erased and written using electrical signals, used in BIOS.
Comparison Between RAM and ROM:
2. Secondary Memory
Secondary memory refers to non-volatile, long-term storage that holds data permanently, even when the
computer is powered off. It is used to store the operating system, software programs, files, documents, and
media. Unlike primary memory (RAM), it is not directly accessed by the CPU: data must be transferred to
RAM before the CPU can process it.
Key Characteristics:
Magnetic Tape: Sequential-access storage used mostly for backups and archiving. It is cheap and can
store large amounts of data but is very slow.
Magnetic Disk (Hard Disk Drives): Uses spinning disks coated with magnetic material to store data in
concentric tracks. It allows random access and is used widely for general data storage.
Optical Storage (CD/DVD/Blu-ray): Data is stored using laser technology. These are portable, have
moderate capacity, and are used for distribution of software, movies, etc.
1. Magnetic Tape
Magnetic tape is one of the oldest forms of secondary storage, used mainly for sequential data access. It stores
data on a plastic ribbon coated with magnetic material.
Working Principle:
Advantages:
Limitations:
Real-Life Use:
A hard disk drive is a widely used random-access secondary storage device. It stores data on a stack of
rotating disks (platters) coated with magnetic material.
Working Principle:
Advantages:
Optical storage uses laser technology to read and write data. Data is encoded on a reflective disc surface using
pits (low) and lands (high).
Working Principle:
A laser beam reflects off the disc surface and is interpreted as binary data.
CD stores about 700 MB, DVD about 4.7–9.4 GB, and Blu-ray up to 25–50 GB or more.
Advantages:
Limitations:
Comparison Table
Storage Capacity: Magnetic Tape: Very High (TBs). Hard Disk: High (up to 20TB). Optical: Low to Moderate.
3. Cache Memory
Cache memory is a small, high-speed memory unit located close to the CPU, between the CPU and the main
memory (RAM). Its primary purpose is to reduce the time the CPU takes to access data from main memory by
storing frequently accessed data and instructions.
Divided into multiple levels (L1, L2, L3) with L1 being fastest and smallest.
Stores frequently used instructions and data for quick access.
Levels of Cache
L1 (Level 1) Cache:
o Closest to the CPU.
o Very small (typically 32KB–128KB).
o Very fast access.
L2 (Level 2) Cache:
o Larger than L1 (256KB–1MB).
o Slower than L1 but still faster than RAM.
L3 (Level 3) Cache:
o Shared among CPU cores.
o Larger (2MB–64MB).
o Slower than L2 but still faster than RAM.
+---------------------+
|         CPU         |
+---------------------+
           |
+----------v----------+
|      L1 Cache       | --- Smallest and fastest
+---------------------+
           |
+----------v----------+
|      L2 Cache       | --- Larger and slightly slower
+---------------------+
           |
+----------v----------+
|      L3 Cache       | --- Shared and even larger
+---------------------+
           |
+----------v----------+
|     Main Memory     |
+---------------------+
Each level acts as a buffer for the next, storing recently accessed memory blocks.
Mapping Schemes in Cache Memory
Mapping defines how memory blocks from the main memory are placed into cache.
Direct Mapping: Each block of main memory maps to exactly one cache line.
Associative Mapping: Any block can be placed in any cache line. Flexible but expensive.
Set-Associative Mapping: A compromise where blocks are mapped to a set of lines.
1. Direct Mapping
Formula:
Cache Line Number = (Main Memory Block Number) % (Number of Cache Lines)
Example:
If block 10 and block 18 both map to line 2, one must be replaced when the other is accessed.
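The direct-mapping formula can be checked with a short Python sketch. The cache size of 8 lines is an assumption (the example does not state one), chosen so that blocks 10 and 18 collide on line 2. The name cache_line_for is illustrative only.

```python
NUM_CACHE_LINES = 8  # assumed cache size; the example above does not state one


def cache_line_for(block_number: int, num_lines: int = NUM_CACHE_LINES) -> int:
    """Return the single cache line a main-memory block maps to under direct mapping."""
    return block_number % num_lines


for block in (10, 18):
    print(f"block {block} -> line {cache_line_for(block)}")
```

Because both blocks land on the same line, loading one evicts the other even if the rest of the cache is empty. This conflict is the main cost of direct mapping.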
2. Associative Mapping
3. Set-Associative Mapping
When the cache is full, and a new block needs to be loaded, one of the existing blocks must be replaced.
LRU (Least Recently Used): Replaces the block that hasn't been used for the longest time.
FIFO (First In First Out): Replaces the oldest block in cache.
Random: Randomly replaces a block, used for simplicity.
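As a rough illustration of LRU replacement, the sketch below simulates a small fully associative cache in Python. The function name simulate_lru, the access sequence, and the capacity of 3 blocks are all assumptions made for the example; real caches implement LRU (or an approximation of it) in hardware.

```python
from collections import OrderedDict


def simulate_lru(accesses, capacity):
    """Count hits and misses for a fully associative cache with LRU replacement."""
    cache = OrderedDict()  # block numbers, ordered least -> most recently used
    hits = misses = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits, misses, list(cache)


hits, misses, resident = simulate_lru([1, 2, 3, 1, 4, 2], capacity=3)
print(hits, misses, resident)
```

Changing popitem(last=False) to evict without ever calling move_to_end would turn the same sketch into FIFO, which shows how closely the two policies are related.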
3. Random
Use of prefetching.
Increasing cache size.
Using write-back instead of write-through policies.
Employing multi-level caches.
4. Virtual Memory
Virtual memory is a memory management technique that gives an application the illusion of a large,
continuous memory space, even if the physical RAM is smaller. It uses secondary storage (like a hard disk
or SSD) as an extension of RAM, which allows programs to use more memory than is physically available.
How It Works:
Programs are written as if they have access to a large, continuous block of memory.
The operating system (OS) and hardware manage the translation between virtual addresses (used by
programs) and physical addresses (actual RAM).
The extra space required is taken from secondary storage (called the swap space or page file).
Key Concepts:
Divides memory into pages.
Uses paging or segmentation techniques.
Page Table maps virtual addresses to physical addresses.
Page Faults occur when data is not in RAM and must be fetched from disk.
Advantages include better multitasking, isolation between programs, and execution of large programs.
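The translation step can be sketched in Python as follows. This is a simplified model under stated assumptions: a 4096-byte page size, a single-level page table held in a dictionary, and hypothetical names (translate, PageFault, page_table). A real system performs this in hardware and the OS handles the fault by loading the page from disk.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when the requested page is not resident in RAM."""


def translate(virtual_address, page_table):
    """Translate a virtual address to a physical address using a page table.

    page_table maps a virtual page number to a physical frame number,
    or to None when the page currently lives on disk.
    """
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page_number)
    if frame is None:
        raise PageFault(f"page {page_number} is not in RAM")
    return frame * PAGE_SIZE + offset


page_table = {0: 5, 1: None, 2: 9}  # hypothetical mapping; page 1 is swapped out
print(translate(8200, page_table))  # page 2, offset 8 -> frame 9
```

Calling translate(4096, page_table) raises PageFault, which corresponds to the case where the OS must fetch the page from swap space before the access can complete.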
1. Paging
2. Segmentation
Divides memory into variable-size segments based on logical divisions (code, data, stack).
Less common than paging, but sometimes combined with it (segmented paging).
3. Page Table
4. Page Fault
Memory management hardware provides the mechanisms needed to translate addresses, protect memory, and
ensure efficient allocation of RAM. It supports features like virtual memory and process isolation.
Function:
Converts virtual addresses (used by programs) into physical addresses (used by RAM).
Works closely with the CPU and page table.
Location:
How it works:
When the CPU generates a virtual address, the MMU uses the page table and possibly the TLB to
translate it.
Function:
Working:
Function:
Provide memory protection by ensuring that a process can access only its own allocated memory.
4. Page Table
Function:
Multiprocessor systems have two or more CPUs that share a common physical memory and are interconnected.
Key characteristics:
Multiprocessor systems consist of two or more processors (CPUs) working together in a tightly coupled
architecture. These processors are connected through a common system bus and share resources such as main
memory, I/O devices, and peripheral hardware.
Each processor may be assigned different tasks, or they may work on the same task in parallel, depending on
the system design. Below are the key characteristics elaborated in detail:
1. Increased Throughput
Throughput refers to the number of tasks a system can complete in a given time.
Example: In a quad-core processor, four independent threads or applications can run concurrently, improving
performance for multitasking and background processes.
2. Fault Tolerance
Fault tolerance is the system's ability to continue functioning even if one or more components fail.
If one processor fails, the remaining processors can take over its workload (load redistribution), ensuring
system continuity.
This redundancy increases system reliability and is essential in critical applications (e.g., aerospace,
defense, and banking).
Software mechanisms (like failover routines) and hardware mechanisms (like redundant paths) are often
implemented to handle such failures.
Example: In a server with two processors, if one fails, the other can continue running critical services with
minimal downtime.
3. Resource Sharing
Main Memory (RAM): Accessible by all processors, ensuring shared data visibility.
I/O Devices: Like printers, disk drives, and network interfaces.
Interconnection Network: The bus or crossbar switch that links the components.
This sharing leads to efficient resource utilization, but it also requires synchronization mechanisms (like
semaphores or mutexes) to prevent conflicts or inconsistencies during access.
Note: While shared resources can increase complexity (like cache coherency issues), modern systems manage it
efficiently through dedicated hardware and protocols.
4. Scalability
Scalability refers to the system's ability to grow or expand by adding more processors without degrading
performance significantly.
Challenge: As more processors are added, managing resource contention, bus traffic, and synchronization
overhead becomes more complex.
Solution: Use of NUMA memory designs, advanced interconnects (e.g., crossbar switches), and scalable software
architecture.
5. Parallel Processing
Parallel Processing is the simultaneous execution of multiple tasks to reduce the overall processing time.
Tasks can be broken down into subtasks that execute concurrently on multiple processors.
There are different types of parallelism:
o Data Parallelism: Same operation on different data (e.g., array processing).
o Task Parallelism: Different operations on different or shared data.
Leads to substantial performance improvements for large-scale problems such as simulations, machine
learning, and video rendering.
Example: A video editing software can render different sections of a video simultaneously using multiple
processors.
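The two kinds of parallelism can be illustrated with Python's standard concurrent.futures module. This is only a sketch: the brighten function and the pixel values are made up for the example, and worker threads stand in for processors. In CPython, threads do not run CPU-bound code truly in parallel, so a ProcessPoolExecutor would be the closer analogue for real workloads.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel):
    """Hypothetical per-element operation: raise brightness, capped at 255."""
    return min(pixel + 40, 255)


pixels = [10, 100, 200, 250]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same operation applied to different data elements.
    brightened = list(pool.map(brighten, pixels))

    # Task parallelism: different operations submitted as independent tasks.
    total_future = pool.submit(sum, pixels)
    largest_future = pool.submit(max, pixels)
    total, largest = total_future.result(), largest_future.result()

print(brightened, total, largest)
```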
2. Structure of Multiprocessor
a. Interprocessor Arbitration
It refers to the mechanism by which multiple processors coordinate their access to shared resources (like
memory or bus):
b. Interprocessor Communication
c. Synchronization
Ensures that concurrent processes/threads do not interfere with each other while accessing shared resources:
Multiprocessor systems have multiple CPUs that are interconnected and work together to execute programs. To
ensure efficient operation, the system must manage how processors access shared resources, communicate
with each other, and synchronize operations.
a. Interprocessor Arbitration
Interprocessor arbitration is the method used to manage access to shared resources (such as the system bus,
memory, or I/O devices) when multiple processors compete for control.
1. Bus Arbitration
The system bus is a critical shared resource. When more than one processor wants to use the bus (for memory
or I/O access), arbitration determines who gets control.
i. Centralized Arbitration
A single control unit (arbiter) is responsible for deciding which processor gets bus access.
Processors send a request signal to the arbiter.
The arbiter grants access based on a predefined algorithm (priority-based, round-robin, etc.).
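A round-robin policy for a centralized arbiter might look like the following Python sketch. The class name RoundRobinArbiter and the set-of-IDs request format are assumptions for illustration; a hardware arbiter would use request and grant lines rather than method calls.

```python
class RoundRobinArbiter:
    """Grant the bus to requesting processors in rotating order."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.last_granted = num_processors - 1  # so processor 0 is checked first

    def grant(self, requests):
        """Return the ID of the processor granted the bus, or None if nobody asked.

        requests is a set of processor IDs currently asserting a request signal.
        """
        for step in range(1, self.num_processors + 1):
            candidate = (self.last_granted + step) % self.num_processors
            if candidate in requests:
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(num_processors=4)
grants = [arbiter.grant({0, 2, 3}) for _ in range(4)]
print(grants)
```

Starting the search just after the last winner means no requesting processor waits more than one full rotation, which a fixed-priority scheme cannot guarantee.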
Advantages:
Simple design
Easy to manage
Disadvantages:
Advantages:
Better scalability
No single point of failure
Disadvantages:
b. Interprocessor Communication
Definition:
To execute tasks cooperatively, processors must exchange data, control information, or synchronization
signals. Communication can occur in multiple ways:
Example: Two processors updating a shared variable must use locks to prevent race conditions.
2. Message Passing
Data and control messages are explicitly sent and received between processors.
Often used in loosely coupled systems (e.g., clusters, distributed systems).
Each processor has a local memory, and messages are sent via interconnects like Ethernet or custom
buses.
Advantages:
Disadvantages:
Example: CPU A sends an interrupt to CPU B to indicate that shared data is ready for use.
c. Synchronization
Definition:
Synchronization ensures that multiple processors do not interfere with each other when accessing shared data
or resources, maintaining consistency and correctness.
1. Semaphores
Example: Before a processor writes to shared memory, it calls wait(). After it's done, it calls signal() to allow
others access.
2. Locks (Mutexes)
A lock or mutex allows only one processor at a time to access a critical section of code or data.
Other processors must wait until the lock is released.
Deadlock and priority inversion are common issues if not managed properly.
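A minimal sketch of a lock protecting a shared counter, using Python's threading module. Threads stand in for processors here, and the names counter, counter_lock, and increment are illustrative only.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread may be inside this block at a time
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

The with statement releases the lock even if the critical section raises an exception, which helps avoid one common source of deadlock.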
3. Barriers
A barrier ensures that all processors or threads reach a certain point in execution before any can
proceed.
Useful in parallel algorithms where phases must complete in sync.
Example: In matrix multiplication, all processors must complete one stage before proceeding to the next.
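Barrier behavior can be sketched with Python's threading.Barrier. The two "phases" and the worker count of 3 are assumptions for the example, and threads stand in for processors.

```python
import threading

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS)
events = []
events_lock = threading.Lock()


def worker(worker_id):
    with events_lock:
        events.append(("phase1", worker_id))
    barrier.wait()  # block until every worker has finished phase 1
    with events_lock:
        events.append(("phase2", worker_id))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [phase for phase, _ in events]
print(phases)
```

The order of worker IDs within a phase may vary from run to run, but no phase2 entry can appear before all three phase1 entries have been recorded.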
Summary Table:
Cache Coherency:
In multiprocessor systems, the design of memory architecture is critical to ensure high performance, efficient
data sharing, and synchronization between processors. Depending on how memory is structured and accessed,
multiprocessor systems use different memory models.
A. Memory Architectures
Advantages:
Disadvantages:
Memory contention: Multiple processors accessing memory simultaneously can cause bottlenecks.
Cache coherence issues arise when each processor has a private cache.
Advantages:
Disadvantages:
Example: High-performance computing (HPC) systems using MPI (Message Passing Interface).
A hybrid model where processors have local memory but can also access shared global memory.
Memory access time varies depending on whether the data is in local memory or remote memory.
NUMA systems are designed to minimize access to remote memory and optimize local access.
Advantages:
Disadvantages:
Example: Modern server-class systems (e.g., AMD EPYC and Intel Xeon).
When multiple processors have private caches and access shared memory, inconsistencies can occur. For
example, if Processor A updates a variable in its cache, Processor B may still see an old value in its cache.
Cache Coherency ensures that all processors have a consistent view of shared memory.
Write-through: Data is written to both the cache and main memory simultaneously.
o Ensures consistency but generates more memory traffic.
Write-back: Data is written only to the cache initially and later to main memory.
o Reduces memory traffic but needs coherence control to update other caches.
2. Snoopy Protocol
All caches watch (snoop) the bus to monitor read/write operations by other processors.
Used in systems with shared bus architecture.
Two types:
Write-invalidate: When a processor writes to a cache line, it invalidates that line in other caches.
Write-update: The new value is broadcast to all caches to update their copy.
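The write-invalidate variant can be modeled with a toy Python simulation. Everything here is an assumption made for illustration: the SnoopingCache class, the list used as a "bus", and the choice to write through to memory on every store so the sketch stays short. Real protocols such as MESI track per-line states in hardware.

```python
class SnoopingCache:
    """A private cache that invalidates its copy when another cache writes."""

    def __init__(self, name, bus):
        self.name = name
        self.lines = {}   # address -> cached value
        self.bus = bus
        bus.append(self)  # the "bus" is simply a list of attached caches

    def read(self, address, memory):
        if address not in self.lines:  # miss: fetch from main memory
            self.lines[address] = memory[address]
        return self.lines[address]

    def write(self, address, value, memory):
        for cache in self.bus:  # every other cache snoops the write...
            if cache is not self:
                cache.lines.pop(address, None)  # ...and invalidates its copy
        self.lines[address] = value
        memory[address] = value  # write-through, to keep the sketch short


memory = {0x10: 1}
bus = []
cache_a, cache_b = SnoopingCache("A", bus), SnoopingCache("B", bus)

cache_b.read(0x10, memory)         # B caches the old value 1
cache_a.write(0x10, 42, memory)    # A's write invalidates B's copy
print(cache_b.read(0x10, memory))  # B misses and reloads the new value
```

A write-update version would replace the pop call with an assignment of the new value into each other cache that holds the line.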
Advantages:
Disadvantages:
3. Directory-based Protocol
Maintains a centralized or distributed directory that keeps track of which caches have copies of a
memory block.
When a processor wants to read or write, it checks with the directory to maintain coherency.
Advantages:
Disadvantages:
Snoopy Protocol: Caches monitor the bus for consistency. Suited to small-scale shared-bus systems.
4. Concept of Pipelining
Stages of Pipeline:
o Instruction Fetch (IF)
o Instruction Decode (ID)
o Execute (EX)
o Memory Access (MEM)
o Write Back (WB)
Types:
o Instruction Pipeline
o Arithmetic Pipeline
Hazards:
o Structural: Resource conflicts.
o Data: Data dependency.
o Control: Branch instructions affect flow.
Pipelining is a technique in processor design that allows the overlapping execution of multiple instructions
to improve instruction throughput (number of instructions executed per unit time). It’s similar to an assembly
line in a factory — different stages of instruction execution are divided and handled simultaneously.
Basic Idea
Instead of executing one instruction at a time, pipelining breaks the instruction cycle into separate stages, each
of which is handled by a different unit of the processor. While one instruction is being decoded, another can be
fetched, a third executed, and so on.
Stage 2, Instruction Decode (ID): Decode the fetched instruction to determine operation and operands.
Each stage works in parallel with others, processing a different instruction during each clock cycle.
B. Types of Pipelining
1. Instruction Pipeline
2. Arithmetic Pipeline
Optimizes the execution of complex arithmetic operations, such as multiplication, division, and
floating-point calculations.
Each stage of the pipeline handles part of the arithmetic operation.
Used in vector processors, DSPs (Digital Signal Processors), and some GPU cores.
C. Pipeline Hazards
Pipelining can improve performance only when instructions flow smoothly through the pipeline. However,
certain issues (called hazards) can disrupt this flow.
1. Structural Hazards
Occur when hardware resources are insufficient to handle all operations simultaneously.
Example: If only one memory unit exists for both instruction fetch and data access, a conflict arises.
2. Data Hazards
Occur when instructions depend on the results of previous instructions that have not completed yet.
Types of data hazards:
o RAW (Read After Write): Current instruction needs a value that’s still being computed.
o WAR (Write After Read) and WAW (Write After Write) (less common in simple pipelines).
Solution: Data forwarding (bypassing), inserting no-operation (NOP), or stalling the pipeline.
3. Control Hazards
Occur due to branching instructions (e.g., if-else or loops), which can change the instruction flow.
The next instruction to fetch may not be known until the branch decision is made.
Solution:
D. Performance of Pipelining
Element Description
Stages IF → ID → EX → MEM → WB
Ideal Speedup:
If there are n stages in the pipeline, the ideal speedup is close to n.
However, due to pipeline hazards and overhead, real-world speedup is always less than ideal.
Example: A 5-stage pipeline may provide a 3.5–4x improvement instead of full 5x.
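The gap between ideal and real speedup can be estimated with a small Python calculation. It assumes every stage takes one clock cycle, so an unpipelined processor needs stages × instructions cycles while a pipeline needs stages + (instructions − 1) cycles plus any stall cycles. The function name pipeline_speedup and the figure of 300 stall cycles are hypothetical.

```python
def pipeline_speedup(stages, instructions, stall_cycles=0):
    """Speedup of a pipeline over unpipelined execution, assuming one cycle per stage."""
    unpipelined_cycles = stages * instructions
    # The first instruction fills the pipeline; each later one completes a cycle apart.
    pipelined_cycles = stages + (instructions - 1) + stall_cycles
    return unpipelined_cycles / pipelined_cycles


ideal = pipeline_speedup(stages=5, instructions=1000)
with_stalls = pipeline_speedup(stages=5, instructions=1000, stall_cycles=300)
print(round(ideal, 2), round(with_stalls, 2))
```

With no stalls the result approaches the stage count as the instruction count grows. Adding stall cycles for hazards pulls it down into the range quoted in the example.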
5. Vector Processing
Vector processing is a computing technique in which a single instruction operates on multiple data
elements simultaneously — usually elements of arrays or vectors. This is part of the SIMD (Single
Instruction, Multiple Data) model, ideal for data-parallel operations.
Key Characteristics:
In vector processing, operations are performed in parallel across entire arrays of data.
For example, instead of performing 100 separate additions for 100 elements, a vector processor can
perform them with one vector instruction.
Vector processors are highly efficient in scientific computing, weather simulations, matrix
operations, image/video processing, and machine learning tasks.
Such applications often involve large datasets and repetitive numerical operations.
Traditional scalar processors fetch and decode one instruction per operation.
Vector processors reduce instruction fetch and decode time by applying a single instruction to
multiple data elements.
This minimizes control overhead and enhances throughput.
A = [2, 4, 6]
B = [1, 3, 5]
C = [?, ?, ?]
VADD A, B → C
C = [3, 7, 11]
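The same VADD example can be written as a short Python sketch. The function name vadd is illustrative, and plain Python evaluates the additions one after another. The sketch shows only the semantics of the instruction (one operation applied across whole vectors), not the hardware parallelism.

```python
def vadd(a, b):
    """Element-wise addition of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


A = [2, 4, 6]
B = [1, 3, 5]
C = vadd(A, B)
print(C)  # [3, 7, 11]
```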
Feature Benefit
Parallel Data Processing Increases computational speed
Reduced Control Overhead Fewer instructions fetched/decoded
High Throughput Multiple operations completed in fewer clock cycles
Predictable Performance Suitable for structured, large-scale numerical tasks
Limitations
Array processors use multiple processing elements to perform parallel operations on data arrays:
Array processing refers to a type of parallel computing where multiple processing elements (PEs)
simultaneously perform the same operation on different elements of a large data set, such as an array or matrix.
It follows the SIMD (Single Instruction, Multiple Data) architecture — ideal for highly regular and repetitive
computations.
SIMD Architecture:
o One control unit sends the same instruction to many processing elements.
o Each PE executes the same instruction but on different data.
The PEs may share common memory or have local memory.
Communication between PEs is either direct (neighbor-based) or via interconnection networks.
How It Works:
Feature Description
Parallel Processing Elements Many simple processors work in parallel.
Same Operation, Different Data All PEs execute the same instruction on different parts of data.
Central Control Unit Single instruction stream broadcast to all processors.
Synchronous Execution All PEs execute in lockstep (same clock, same instruction).
Array processors are especially effective in scientific, engineering, and multimedia fields where tasks are
data-parallel.
Examples:
Image and Video Processing: Each PE processes one pixel or frame portion.
Matrix Operations: Matrix multiplication, inversion, etc.
Signal Processing: Fourier transforms, convolution.
Weather Simulation: Grid-based climate data.
Neural Network Computations: Matrix-heavy computations in AI models.
Advantage Explanation
Efficient for Regular Tasks Ideal for tasks with repetitive data structure like arrays or matrices.
Simplified Control Flow Single instruction stream reduces control logic complexity.
Limitations
Real-World Examples
Feature          RISC                        CISC
Instruction Set  Small, simple               Large, complex
Performance      Faster (due to pipelining)  Slower
Code Size        Larger                      Smaller
Example          ARM, MIPS                   x86 (Intel, AMD)
Modern computer processors are designed based on two fundamental architectural philosophies: RISC (Reduced
Instruction Set Computer) and CISC (Complex Instruction Set Computer). Each has distinct design goals,
instruction sets, performance characteristics, and hardware-software balance.
RISC is designed with the philosophy that a small set of simple instructions can execute operations more
efficiently and quickly.
Key Features:
Examples:
Advantages:
Disadvantages:
CISC architectures aim to accomplish tasks with fewer instructions, each capable of performing multiple
low-level operations (memory access, arithmetic, etc.).
Key Features:
Large and complex instruction set: Instructions can perform multiple operations.
Variable instruction length: Different instructions have different sizes and formats.
Emphasis on hardware: Hardware handles complex instructions, reducing software burden.
Fewer instructions per program: Since each instruction does more.
Examples:
Intel x86
AMD64
VAX
Advantages:
Disadvantages:
Feature-Wise Comparison:
Modern Perspective:
Conclusion:
RISC: Fast and efficient for pipelined execution, but requires more instructions. Better suited for embedded,
mobile, and low-power devices.
CISC: Rich, compact code with powerful instructions, but slower and more complex. Common in desktops,
servers, and legacy software environments.
Multicore processors contain multiple processing cores on a single chip. Each core can independently execute
instructions, enabling parallel processing, better multitasking, and improved performance.
Both Intel and AMD design powerful multicore CPUs for various computing needs—from everyday use to
high-performance computing.
Intel is a leader in CPU innovation, widely used in personal computers, servers, and laptops.
Common Series:
Key Technologies:
Technology Description
Hyper-Threading (HT): Allows one physical core to appear as two logical threads, improving parallelism.
Turbo Boost: Dynamically increases clock speed of cores when fewer threads are active or thermal limits
allow.
Smart Cache: Shared L3 cache that adapts dynamically among cores for better performance.
Intel VT (Virtualization Technology): Enables efficient virtual machine operation by isolating guest and
host systems.
Integrated Graphics (Intel Iris, UHD): Eliminates the need for a separate GPU in most basic tasks.
Strengths:
AMD offers a highly competitive lineup for both consumer and enterprise use, often focusing on more cores
and threads per processor.
Common Series:
Key Technologies:
Technology Description
SMT (Simultaneous Multithreading): AMD's version of hyper-threading, allowing each core to run two threads.
Infinity Fabric: A high-speed interconnect that links cores, memory controllers, and I/O devices across the
chip.
Chiplet Architecture: AMD uses multiple small dies (chiplets) instead of one large monolithic die, improving
yield and scalability.
Overclocking Support: Most Ryzen CPUs are unlocked, allowing manual tuning for extra performance.
Strengths:
Integrated Graphics: Intel: Iris/UHD (in most chips). AMD: only in select models (e.g., G-series).
Gaming: Intel (slightly better single-core), AMD (great for multithreaded games).
Virtualization & Servers: AMD EPYC (more cores), Intel Xeon (robust ecosystem).
Conclusion:
Intel focuses on high clock speed, lower latency, and mature integration.
AMD excels in multi-threaded tasks, cost-effectiveness, and modular scalability.
Both companies are now close in terms of performance and innovation, making the choice depend more on
specific use case and budget.