Topic 1 – Introductory Concepts
1.1 Introduction to Computer Organization
1. Definition – Computer Organization: The operational structure describing how hardware
components interact to implement architectural functions.
2. Architecture vs Organization Principle: Architecture → what the computer does (ISA);
Organization → how it is built to do it.
3. Organization deals with internal details like buses, registers, memory hierarchy, and control
signals that realize instruction execution.
4. Efficient organization reduces delay and cost while increasing throughput.
5. Example: Two systems may share the same ISA but differ in pipeline depth, cache size, or
control logic—giving different performance.
6. The key principle is abstraction—separating logical design from physical implementation.
Fig. 1 – Basic Computer Organization Block Diagram (Input, Output, Memory, ALU, Control Unit
connected via buses)
1.2 Basic Structure of a Computer
1. A computer performs data processing through five major units—Input, Output, Memory,
ALU, and CU.
2. Input Unit: Converts external data into machine-readable binary and transfers it to main
memory.
3. Output Unit: Translates processed binary information into user-understandable form.
4. Memory Unit: Temporarily or permanently stores instructions & data; divided into primary
(RAM, ROM) and secondary (storage drives).
5. ALU (Arithmetic Logic Unit): Executes all arithmetic and logic operations; sets condition flags
(zero, carry, overflow).
6. CU (Control Unit): Fetches, decodes, and issues control signals to coordinate operations.
7. The three bus types—data, address, control—enable communication between all units.
8. Principle of Stored Program: Instructions are stored in memory and executed sequentially by
fetching and decoding them.
Fig. 2 – Basic Computer System Interconnection Diagram
1.3 Von Neumann and Harvard Architectures
1. Von Neumann Architecture: Single shared memory for both instructions and data.
Sequential fetch–decode–execute cycle.
2. The shared bus restricts simultaneous access, creating the Von Neumann bottleneck—a key
limiting principle.
3. Harvard Architecture: Uses separate instruction and data memories with individual buses,
enabling parallel access.
4. Improves speed but increases hardware complexity.
5. Modified Harvard: Combines shared main memory with separate caches for code & data
(used in modern CPUs).
6. Principle of Parallelism: Speed improves when instruction fetch and data access occur
simultaneously.
Fig. 3 – Comparison of Von Neumann and Harvard Architectures
1.4 Performance Metrics
1. Performance Definition: The speed at which a computer completes tasks; inversely
proportional to execution time.
2. Clock Speed (f): Number of clock cycles per second (Hz). Higher f → faster processing if CPI
and instruction count stay constant.
3. CPI (Cycles per Instruction): Average cycles to execute one instruction. Lower CPI → better
efficiency.
4. Execution Time Principle:
$T = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$
5. MIPS: $\text{MIPS} = \dfrac{\text{Instruction Count}}{\text{Execution Time} \times 10^{6}}$. Simple but ISA-dependent.
6. Throughput: Total work done per unit time; useful for parallel systems.
7. Speedup Principle: $S = \dfrac{T_{\text{old}}}{T_{\text{new}}}$.
8. Efficiency: How well resources are utilized for performance gain.
Fig. 4 – Relationship between Clock Speed, CPI, and Execution Time
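The relations in this section can be checked with a short calculation. The sketch below is illustrative only: the function names and the workload figures (2 million instructions, 1 GHz clock) are assumptions, not values from these notes.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """T = Instruction Count x CPI x Clock Cycle Time, where cycle time = 1 / f."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, exec_time_s):
    """MIPS = Instruction Count / (Execution Time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


def speedup(t_old, t_new):
    """S = T_old / T_new."""
    return t_old / t_new


# Hypothetical workload: same program and clock, CPI improved from 2.0 to 1.0.
t_old = execution_time(2_000_000, cpi=2.0, clock_rate_hz=1e9)
t_new = execution_time(2_000_000, cpi=1.0, clock_rate_hz=1e9)
print(t_old, t_new, speedup(t_old, t_new), mips(2_000_000, t_old))
```

Halving CPI with the instruction count and clock rate fixed halves the execution time, so the speedup is 2.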
1.5 Number Representation
1. Definition: Method of representing data and numbers in binary form for digital processing.
2. Binary System: Base-2 (0 & 1); foundation for all computation.
3. Octal / Hexadecimal: Compact binary forms used for addressing and debugging.
4. Fixed-Point: Representation with the binary point at a fixed position; integers are the special case with the point after the last bit.
5. Floating-Point: For real numbers with variable exponent; follows IEEE 754 standard.
6. Single Precision: 1 sign, 8 exponent, 23 mantissa bits; Double Precision: 1 + 11 + 52 bits.
7. Normalization Principle: Keeps mantissa in range [1, 2) to maintain accuracy.
8. Rounding Modes: Nearest-even, toward zero, etc. defined by IEEE 754.
9. Errors: Rounding & overflow handled by exception flags.
Fig. 5 – IEEE 754 Single-Precision Floating-Point Format
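The single-precision layout (1 sign, 8 exponent, 23 fraction bits) can be inspected directly. The following is a minimal sketch using Python's standard `struct` module; the helper names `decode_float32` and `normalized_value` are chosen here for illustration, and the second helper only covers normalized numbers (not zero, subnormals, infinity, or NaN).

```python
import struct


def decode_float32(x):
    """Split x into IEEE 754 single-precision (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, bias 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


def normalized_value(sign, exponent, fraction):
    """Rebuild a normalized value: (-1)^s x 1.f x 2^(e - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


# -6.5 = -1.625 x 2^2, so the biased exponent is 129 and the fraction encodes 0.625.
print(decode_float32(-6.5))
```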
1.6 Booth’s Multiplication Algorithm
1. Definition: Algorithm for signed binary multiplication using 2’s complement encoding.
2. Reduces arithmetic operations by encoding consecutive 1’s in the multiplier.
3. Rule: Pair (current, previous) bits: 01 → Add; 10 → Subtract; 00/11 → No op.
4. After each operation, perform Arithmetic Right Shift (ARS).
5. Handles positive and negative operands uniformly.
6. Principle of Encoding Efficiency: Replace multiple operations by simpler equivalent
sequences.
Fig. 6 – Flow of Booth’s Multiplication Algorithm
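The add/subtract/shift rule can be sketched in software with the usual A, Q, and Q-1 registers. This is an illustrative model, not a hardware description: the function name and register variables are assumptions, and with an n-bit A register the sketch does not handle the most negative multiplicand (for example -8 when n = 4).

```python
def booth_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit two's-complement integers with Booth's algorithm."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    a, q, q_minus1 = 0, multiplier & mask, 0
    for _ in range(n):
        pair = (q & 1, q_minus1)          # (current bit, previous bit)
        if pair == (0, 1):
            a = (a + m) & mask            # 01 -> add multiplicand
        elif pair == (1, 0):
            a = (a - m) & mask            # 10 -> subtract multiplicand
        # Arithmetic right shift of the combined A:Q:Q-1 register.
        q_minus1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        sign = a >> (n - 1)
        a = (a >> 1) | (sign << (n - 1))
    product = (a << n) | q                # 2n-bit two's-complement result
    if product >> (2 * n - 1):
        product -= 1 << (2 * n)           # reinterpret as a signed integer
    return product


print(booth_multiply(7, -3, 4))
```

For 7 × (-3) with n = 4 the registers end as A = 1110 and Q = 1011, which is -21 in 8-bit two's complement.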
1.7 Division Algorithms
1. Restoring Division: Subtract divisor; if remainder < 0 → restore (previous value) & record 0;
else record 1.
2. Non-Restoring Division: If remainder negative → add divisor next cycle instead of restoring;
reduces steps.
3. Hardware Principle: Use shift-subtract operations for bit-wise division.
4. Both algorithms yield quotient and remainder efficiently in binary hardware.
Fig. 7 – Restoring and Non-Restoring Division Process Flow
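Both schemes can be modelled with an accumulator A and a quotient register Q. The sketch below assumes unsigned n-bit operands and a non-zero divisor; the function names are illustrative, and A is kept as an unbounded Python integer rather than a fixed-width register.

```python
def restoring_divide(dividend, divisor, n):
    """Unsigned restoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q = 0, dividend & mask
    for _ in range(n):
        a = (a << 1) | (q >> (n - 1))     # shift A:Q left by one bit
        q = (q << 1) & mask
        a -= divisor                      # trial subtraction
        if a < 0:
            a += divisor                  # restore, quotient bit stays 0
        else:
            q |= 1                        # quotient bit 1
    return q, a


def non_restoring_divide(dividend, divisor, n):
    """Unsigned non-restoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q = 0, dividend & mask
    for _ in range(n):
        msb = q >> (n - 1)
        q = (q << 1) & mask
        shifted = (a << 1) | msb
        # Subtract if A was non-negative, otherwise add instead of restoring.
        a = shifted - divisor if a >= 0 else shifted + divisor
        if a >= 0:
            q |= 1
    if a < 0:
        a += divisor                      # single final correction step
    return q, a


print(restoring_divide(7, 3, 4), non_restoring_divide(7, 3, 4))
```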
1.8 Moore’s Law
1. Definition: Observation by Gordon Moore that transistor density on chips doubles
approximately every two years.
2. Principle of Exponential Growth: Transistor count grows exponentially; performance has historically risen with it but is not strictly proportional to transistor count.
3. Resulted in faster processors and smaller devices at lower cost.
4. Physical limits (heat, quantum effects) now slow this trend.
5. Drove shift to multi-core and parallel architectures for continued performance scaling.
Fig. 8 – Moore’s Law Graph (Transistor Count vs Year)
1.9 Floating-Point Arithmetic
1. Principle: Align exponents → operate on mantissas → normalize → round result.
2. Hardware FPU (Floating Point Unit) executes these operations through pipelines.
3. Handles overflow, underflow, NaN, and rounding errors.
4. Essential for scientific calculations requiring wide dynamic range and precision.
Fig. 9 – Floating-Point Operation Stages
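The align, operate, normalize, round sequence can be illustrated on a toy format. Everything in this sketch is an assumption made for illustration: an 8-bit significand (`P = 8`), positive normalized operands only, and a representation as `(sig, exp)` pairs. Real FPUs shift the smaller operand right and keep guard bits; here the larger operand is shifted left so the integer sum is exact before rounding.

```python
P = 8  # toy significand width in bits, including the leading 1 (an assumption)


def to_value(sig, exp):
    """Value of a toy float whose significand is a P-bit integer."""
    return sig * 2.0 ** (exp - (P - 1))


def fp_add(a, b):
    """Add two positive, normalized toy floats given as (sig, exp) pairs."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:                              # let a hold the larger exponent
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    # 1. Align exponents, 2. add significands (exact integer sum).
    total = (sa << (ea - eb)) + sb
    exp = eb
    # 3. Normalize back to P significant bits.
    extra = total.bit_length() - P
    if extra > 0:
        remainder = total & ((1 << extra) - 1)
        half = 1 << (extra - 1)
        total >>= extra
        exp += extra
        # 4. Round to nearest, ties to even.
        if remainder > half or (remainder == half and total & 1):
            total += 1
            if total.bit_length() > P:       # rounding carried out
                total >>= 1
                exp += 1
    return total, exp


# 3.0 + 0.5: (1.1000000b x 2^1) + (1.0000000b x 2^-1)
print(fp_add((0b11000000, 1), (0b10000000, -1)))
```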
1.10 Summary
• Organization = how architecture is implemented.
• Five basic units work via bus systems under stored-program principle.
• Harvard architecture mitigates the Von Neumann bottleneck by separating instruction and data paths.
• Performance depends on Clock Rate, CPI, Instruction Count.
• Binary and floating-point representations form numerical basis of computing.
• Booth’s and division algorithms optimize arithmetic.
• Moore’s Law explains technology scaling and parallel growth.
Important Definitions & Principles
• Computer Organization: Physical structure implementing architecture.
• Computer Architecture: Logical specification of system capabilities (ISA).
• Stored Program Concept: Instructions stored in memory and executed sequentially.
• Von Neumann Bottleneck: Single bus limits instruction/data throughput.
• CPI (Cycles per Instruction): Average cycles to complete one instruction.
• Throughput: Total work done per unit time.
• IEEE 754 Standard: Defines floating-point representation & rounding.
• Booth’s Principle: Encode runs of 1’s to reduce operations.
• Moore’s Law: Transistor count doubles ≈ every two years.
• Normalization Principle: Maintain mantissa in fixed range for precision.
Important Questions – Topic 1
1. Explain the functional units of a computer with a neat diagram.
2. Differentiate between computer architecture and organization.
3. Discuss the Von Neumann and Harvard architectures and explain the bottleneck.
4. Derive the relation among instruction count, CPI, and clock rate.
5. Define and compute MIPS; list its limitations.
6. Explain IEEE 754 single precision floating-point format with example.
7. Describe Booth’s multiplication algorithm step by step.
8. Explain restoring and non-restoring division methods.
9. State Moore’s Law and its impact on computer design.
10. Write short notes on throughput, latency, and speedup.
11. Explain the stored-program principle with a diagram.
12. Define normalization and rounding in floating-point arithmetic.
13. Explain binary, octal, and hexadecimal number systems.
14. Write differences between fixed and floating-point numbers.
15. What are the main principles that affect CPU performance?
Topic 2 – Processor Organization
(From Hamacher | Stallings | Patterson & Hennessy | Tanenbaum)
2.1 Introduction to Processor Organization
1. The processor (CPU) is the brain of the computer that fetches, decodes, and executes
instructions.
2. It coordinates all hardware components and manages data transfer between memory and
I/O devices.
3. The CPU internally consists of functional blocks such as the Arithmetic Logic Unit (ALU),
Control Unit (CU), and Register File.
4. A processor executes a sequence of micro-operations each cycle, controlled by timing and
control signals.
5. Organization of these blocks determines execution speed, pipeline efficiency, and
instruction throughput.
6. Principle – Hierarchical Design: Divide CPU into smaller modules (ALU, CU, Registers) for
better control and parallelism.
7. The processor also manages data paths, the routes by which operands travel between
registers, ALU, and memory.
Fig 2.1 – Simplified Processor Organization (Block Diagram showing Registers, ALU, Control Unit
connected via Internal Buses)
2.2 CPU Functional Units
1. Arithmetic Logic Unit (ALU): Performs arithmetic (add, subtract, multiply) and logic (AND,
OR, NOT, XOR) operations.
2. ALU also sets status flags (Zero, Carry, Sign, Overflow) stored in a Program Status Word
(PSW) register.
3. Control Unit (CU): Directs the entire operation by decoding instructions and issuing control
signals.
4. Registers: Temporary storage inside CPU for fast data access. Includes general-purpose,
special, and status registers.
5. Data Bus: Transfers operands between registers and memory.
6. Address Bus: Carries address of memory location being accessed.
7. Control Bus: Carries signals like read/write, interrupt request, and clock.
8. Principle – Locality of Reference: Frequently accessed data kept in registers to minimize
memory access delay.
Fig 2.2 – CPU Functional Unit Block Diagram
2.3 Registers and Register Organization
1. Registers are high-speed memory cells inside CPU; faster than cache and main memory.
2. General Purpose Registers (GPRs): Hold intermediate arithmetic or logic results.
3. Special Purpose Registers: Include Program Counter (PC), Instruction Register (IR), Memory
Address Register (MAR), Memory Data Register (MDR), and PSW.
4. Program Counter (PC): Contains the address of the next instruction to be fetched.
5. Instruction Register (IR): Holds the current instruction being decoded/executed.
6. Stack Pointer (SP): Points to top of stack used for subroutine calls and returns.
7. Base and Index Registers: Used for address calculation in complex addressing modes.
8. Register design follows Principle of Speed Hierarchy – keep frequently used data closest to
ALU.
9. Larger register sets increase performance but complicate instruction decoding.
Fig 2.3 – Register Organization with Data and Address Paths
2.4 Instruction Execution Cycle
1. Every instruction undergoes the Fetch–Decode–Execute sequence.
2. Fetch: Control Unit copies PC to MAR → memory read → instruction copied (via MDR) to IR → PC incremented to the next instruction.
3. Decode: CU interprets opcode and determines the required operands and operations.
4. Execute: ALU performs operation; results stored in register or memory.
5. Memory Access: For load/store instructions, MDR interfaces between CPU and memory.
6. Write-Back: Final result returned to destination register.
7. Modern processors overlap these phases through pipelining to improve throughput.
8. Principle – Temporal Parallelism: While one instruction executes, another is being fetched.
Fig 2.4 – Instruction Cycle Flow Diagram (Showing Fetch, Decode, Execute, Write-Back Stages)
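The cycle can be made concrete with a toy accumulator machine. The instruction set below (`LOAD`, `ADD`, `STORE`, `HALT`) is hypothetical and exists only for this sketch; it does not correspond to any real ISA.

```python
def run(program, data):
    """Run a toy accumulator machine; returns the data memory after HALT."""
    pc, acc = 0, 0
    while True:
        ir = program[pc]          # fetch: instruction at PC is copied into IR
        pc += 1                   # PC now points to the next instruction
        opcode, operand = ir      # decode: split into opcode and operand field
        if opcode == "LOAD":      # execute
            acc = data[operand]
        elif opcode == "ADD":
            acc += data[operand]
        elif opcode == "STORE":
            data[operand] = acc   # write the result back to memory
        elif opcode == "HALT":
            return data
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
print(run(program, [5, 7, 0]))
```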
2.5 Instruction Formats and Addressing Modes
1. Instruction Format: Defines layout – opcode + operand fields + address specifiers.
2. Common types – 0-address (stack), 1-address, 2-address, and 3-address formats.
3. Addressing Modes: Define how operand addresses are computed.
o Immediate: Operand in instruction itself.
o Register: Operand in CPU register.
o Direct: Address part of instruction.
o Indirect: Address field points to another address.
o Indexed: Effective Address = Base + Index + Displacement.
o Relative: Address = PC + Offset (for branches).
4. Principle – Flexibility in Addressing: Enables compact code and faster execution.
Fig 2.5 – Example Instruction Format with Opcode and Operand Fields
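The addressing modes listed above differ only in how the operand's location is derived. A minimal sketch, assuming a register file modelled as a dictionary and memory as a list; the mode names as strings and both function names are illustrative choices.

```python
def effective_address(mode, field, regs, memory, pc=0):
    """Compute the effective address for the memory-referencing modes."""
    if mode == "direct":
        return field                          # address is in the instruction
    if mode == "indirect":
        return memory[field]                  # field points to the real address
    if mode == "indexed":
        base, index, disp = field
        return regs[base] + regs[index] + disp
    if mode == "relative":
        return pc + field                     # PC + offset (branch target)
    raise ValueError(f"no effective address for mode {mode!r}")


def fetch_operand(mode, field, regs, memory, pc=0):
    """Return the operand value for any of the listed modes."""
    if mode == "immediate":
        return field                          # operand is the field itself
    if mode == "register":
        return regs[field]
    return memory[effective_address(mode, field, regs, memory, pc)]


memory = [0] * 16
memory[3], memory[9] = 9, 42
regs = {"R1": 4, "R2": 5}
print(fetch_operand("indirect", 3, regs, memory))
```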
2.6 Data Path and Control Path
1. Data Path: Hardware that moves and processes data (registers, ALU, multiplexers, buses).
2. Control Path: Generates timing and control signals directing the data path.
3. Data Path performs actual computation; Control Path ensures correct sequencing.
4. Control Unit can be Hardwired (fixed logic) or Microprogrammed (firmware).
5. Hardwired Control: Fast but inflexible; changes require redesign.
6. Microprogrammed Control: Uses control memory storing microinstructions; slower but
easier to modify.
7. Principle – Separation of Data and Control Flow improves modularity and debugging.
Fig 2.6 – Data Path and Control Path Interaction Diagram
2.7 Micro-operations and Control Signals
1. A micro-operation is a basic operation on data stored in registers (transfer, shift, add, etc.).
2. Example: R2 ← R1 + R3 represents three micro-operations (fetch operands, add, store).
3. Control signals trigger specific micro-operations every clock cycle.
4. Micro-operations classified as:
o Register Transfer (e.g., R1 ← R2)
o Arithmetic / Logic (ADD, AND)
o Shift (Right/Left)
o Memory Transfer (Read/Write)
5. Principle – Synchronization: All micro-operations occur in fixed time slots defined by clock
pulses.
Fig 2.7 – Sequence of Micro-operations per Clock Cycle
2.8 Control Unit Organization
1. Hardwired Control Unit: Implemented using combinational logic – fast and suitable for RISC.
2. Microprogrammed Control Unit: Control signals stored as microinstructions in control
memory.
3. Microinstruction = control word specifying all signals for one micro-operation.
4. Horizontal Microprogramming: Many bits (one per signal) → parallel control.
5. Vertical Microprogramming: Encoded signals → compact but needs decoder.
6. Principle – Trade-off between Speed and Flexibility.
Fig 2.8 – Microprogrammed Control Unit Structure (Control Memory, Decoder, Sequencer)
2.9 Pipeline Organization
1. Pipelining: Technique of overlapping execution of multiple instructions.
2. Each instruction divided into stages (fetch, decode, execute, memory, write-back).
3. While one instruction is in the execute stage, the fetch stage is already working on a later instruction.
4. Ideal speedup ≈ number of stages (k); throughput approaches one instruction per clock cycle.
5. Hazards: Data (operand dependency), Control (branch delay), and Structural (resource
conflict).
6. Solutions: Forwarding, branch prediction, and pipeline stalling.
7. Principle – Temporal Overlapping: Execution speed increases by concurrent stage
processing.
Fig 2.9 – 5-Stage Instruction Pipeline Diagram
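The benefit of overlapping can be estimated with a simple cycle count. This sketch assumes an idealized pipeline in which every stage takes one clock cycle and the unpipelined machine needs k cycles per instruction; the function names and the `stall_cycles` parameter are illustrative.

```python
def pipeline_cycles(n_instructions, k_stages, stall_cycles=0):
    """Cycles to finish n instructions: k to fill, then one per instruction."""
    return k_stages + (n_instructions - 1) + stall_cycles


def pipeline_speedup(n_instructions, k_stages, stall_cycles=0):
    """Speedup over an unpipelined machine needing n * k cycles."""
    unpipelined = n_instructions * k_stages
    return unpipelined / pipeline_cycles(n_instructions, k_stages, stall_cycles)


# 100 instructions on a 5-stage pipeline: 104 cycles instead of 500.
print(pipeline_cycles(100, 5), pipeline_speedup(100, 5))
```

As the instruction count grows the speedup approaches k, and every stall cycle caused by a hazard pulls it below that bound.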
2.10 Interrupt and Exception Handling
1. Interrupt: External event requesting CPU attention (e.g., I/O completion).
2. Exception: Internal event like division by zero or page fault.
3. On interrupt, CPU saves current state (PC, flags) and transfers control to ISR (Interrupt
Service Routine).
4. After servicing, CPU restores state and resumes execution.
5. Vectored Interrupts: Each device has a unique service address.
6. Principle – Asynchronous Event Handling ensures system responsiveness.
Fig 2.10 – Interrupt Cycle Flow Diagram
Important Definitions & Principles
• Processor Organization: Internal arrangement of CPU functional units.
• ALU: Performs arithmetic and logical operations.
• Control Unit: Directs operation of processor by issuing control signals.
• Register File: Set of high-speed storage elements for temporary data.
• Instruction Cycle: Sequence Fetch–Decode–Execute–Write Back.
• Addressing Mode: Method to calculate effective operand address.
• Hardwired Control: Fixed logic circuit for control signals.
• Microprogramming: Control through microinstructions stored in ROM.
• Pipelining Principle: Overlap execution to increase throughput.
• Interrupt: Mechanism for asynchronous event handling by CPU.
Important Questions – Topic 2 (Processor Organization)
1. Draw and explain the functional block diagram of a CPU.
2. Describe the role and types of registers in processor organization.
3. Explain the fetch–decode–execute cycle with neat diagram.
4. Differentiate between hardwired and microprogrammed control units.
5. Define micro-operations and classify them with examples.
6. What is pipelining? Explain its stages and hazards.
7. Discuss various addressing modes with examples.
8. Explain the functions of control path and data path in a processor.
9. Describe horizontal and vertical microprogramming schemes.
10. Explain interrupt handling cycle with flow diagram.
11. What is the difference between instruction format and addressing mode?
12. Write short notes on pipeline hazards and their remedies.
13. Explain the principle of locality and its impact on CPU design.
14. Define control signals and their role in micro-operations.
15. Compare RISC and CISC organization in terms of control unit design.
Topic 3 – Memory Organization
3.1 Introduction to Memory Organization
1. The memory unit is a core component of computer architecture responsible for storing data,
instructions, and intermediate results.
2. It acts as a bridge between the CPU and input/output devices.
3. The organization of memory defines its structure, hierarchy, access time, and capacity.
4. Efficient memory organization ensures the CPU gets data quickly, minimizing idle time.
5. The Principle of Locality (temporal and spatial) is key — programs access a small portion of
memory repeatedly over short intervals.
6. The memory system is arranged hierarchically to optimize speed and cost.
7. Fast memory (registers, cache) is small and expensive, while large memory (disk, secondary)
is slower but cheaper.
Fig 3.1 – General Memory Organization (CPU ↔ Cache ↔ Main Memory ↔ Secondary
Storage)
3.2 Memory Hierarchy
1. Definition: Memory hierarchy arranges storage devices based on speed, cost, and capacity.
2. Typical hierarchy: Registers → Cache → Main Memory (RAM) → Secondary (HDD/SSD) →
Tertiary (Optical/Tape).
3. As we go down the hierarchy, capacity increases, cost per bit decreases, but access time
increases.
4. Registers: Fastest and smallest; located inside CPU.
5. Cache: Stores frequently accessed data; invisible to user.
6. Main Memory (RAM): Volatile and directly accessible by CPU.
7. Secondary Memory: Non-volatile, stores OS, programs, and data.
8. Principle – Speed vs Cost Trade-off: Achieve best performance using multi-level storage.
9. Each level acts as a buffer for the level below it.
Fig 3.2 – Memory Hierarchy Pyramid Diagram
3.3 Cache Memory Organization
1. Cache memory is a small, high-speed memory placed between the CPU and main memory.
2. Stores copies of frequently used instructions and data to reduce access time.
3. Hit: Requested data found in cache; Miss: Not found, fetched from main memory.
4. Cache performance measured by Hit Ratio = Hits / (Hits + Misses).
5. Cache can be L1 (internal), L2, or L3 (external/shared).
6. Mapping Techniques:
o Direct Mapping: Each block maps to one cache line (simple, fast).
o Associative Mapping: Block can go anywhere in cache (flexible, slower).
o Set-Associative: Hybrid of the two (e.g., 4-way associative).
7. Replacement Policies: LRU (Least Recently Used), FIFO, Random.
8. Write Policies: Write-through (updates both cache & memory) or Write-back (update later).
9. Principle – Temporal & Spatial Locality: Cache effectiveness relies on repeated access to
nearby data.
Fig 3.3 – Cache Memory Structure and Mapping Example
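Direct mapping and the hit ratio can be demonstrated with a small simulation. The sketch assumes byte addresses, a cache that starts empty, and an illustrative function name; only tags are tracked, not the cached data itself.

```python
def simulate_direct_mapped(addresses, num_lines, block_size):
    """Return the hit ratio of a direct-mapped cache for an address trace."""
    lines = [None] * num_lines            # tag currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_size        # memory block number
        index = block % num_lines         # the one line this block may use
        tag = block // num_lines          # identifies which block is resident
        if lines[index] == tag:
            hits += 1
        else:
            lines[index] = tag            # miss: fetch block, replace line
    return hits / len(addresses)


# Blocks 0 and 4 both map to line 0 and evict each other (a conflict miss).
trace = [0, 4, 8, 12, 0, 4, 64, 0]
print(simulate_direct_mapped(trace, num_lines=4, block_size=16))
```

Nearby addresses hit after the first miss (spatial locality), while address 64 conflicts with block 0 on the same line.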
3.4 Main Memory Organization
1. Main Memory (Primary Memory): Directly accessible by CPU; volatile storage used for active
processes.
2. RAM (Random Access Memory):
o Static RAM (SRAM): Uses flip-flops, faster, used in cache.
o Dynamic RAM (DRAM): Uses capacitors, slower but higher density.
3. ROM (Read Only Memory): Non-volatile, stores firmware (BIOS). Types: PROM, EPROM,
EEPROM, Flash ROM.
4. Memory Cell Organization: Each cell stores 1 bit; arranged in matrix form (rows = word lines,
columns = bit lines).
5. Principle – Random Access: Any location can be accessed directly by its address.
6. Word: Fixed number of bits processed together by CPU (e.g., 32-bit word).
7. Byte Addressable Memory: Each byte has unique address.
Fig 3.4 – Basic Memory Cell Array Diagram
3.5 Memory Addressing and Interleaving
1. Addressing: Process of identifying and accessing specific memory locations.
2. Physical address lines are used to locate bytes/words in memory modules.
3. Memory Interleaving: Technique to increase throughput by dividing memory into banks
accessed simultaneously.
4. High-order interleaving: High-order address bits select the bank, so consecutive addresses fall in the same bank.
5. Low-order interleaving: Consecutive addresses go to consecutive banks.
6. Advantage: Allows parallel access, increasing effective bandwidth.
7. Principle – Parallel Access: Multiple memory modules working together reduce access time.
Fig 3.5 – Interleaved Memory Bank Diagram
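The two interleaving schemes differ in which address bits select the bank. A minimal sketch, assuming word addresses and equally sized banks; the function names are illustrative.

```python
def low_order_bank(addr, num_banks):
    """Low-order bits select the bank: returns (bank, offset within bank)."""
    return addr % num_banks, addr // num_banks


def high_order_bank(addr, bank_size):
    """High-order bits select the bank: returns (bank, offset within bank)."""
    return addr // bank_size, addr % bank_size


# 4 banks of 4 words each, addresses 0..3.
print([low_order_bank(a, 4)[0] for a in range(4)])    # one bank per address
print([high_order_bank(a, 4)[0] for a in range(4)])   # all in the same bank
```

With low-order interleaving a sequential access stream spreads across all banks, which is what allows the accesses to overlap.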
3.6 Virtual Memory
1. Definition: A memory management technique that gives the illusion of a large contiguous memory
using secondary storage.
2. Uses paging or segmentation to map virtual addresses to physical ones.
3. Page Table: Maintains mapping between virtual pages and physical frames.
4. When a required page isn’t in memory, a page fault occurs — OS loads it from disk.
5. Principle – Address Translation: Each virtual address is converted to a real memory address
using hardware support (MMU).
6. Advantages: Efficient use of memory, allows running large programs.
7. Disadvantages: Higher latency due to swapping.
8. TLB (Translation Lookaside Buffer): Cache for page table entries to speed up translation.
Fig 3.6 – Virtual Memory with Page Table and TLB
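Address translation with a TLB in front of the page table can be sketched as follows. The 4 KiB page size, the dictionary-based page table and TLB, and the `PageFault` exception are assumptions made for this example; a real MMU does this in hardware and the OS handles the fault by loading the page.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when the requested virtual page is not resident in memory."""


def translate(vaddr, page_table, tlb):
    """Translate a virtual address to a physical address."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number, page offset
    if vpn in tlb:                           # TLB hit: no page-table access
        frame = tlb[vpn]
    elif vpn in page_table:                  # TLB miss: consult the page table
        frame = page_table[vpn]
        tlb[vpn] = frame                     # cache the entry for next time
    else:
        raise PageFault(vpn)                 # OS would load the page from disk
    return frame * PAGE_SIZE + offset


page_table = {0: 5, 1: 2}
tlb = {}
print(translate(4100, page_table, tlb), tlb)
```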
3.7 Memory Mapping Techniques
1. Mapping defines how data from main memory is placed into cache.
2. Direct Mapping: Each block maps to one fixed cache location.
o Simple, fast lookup but prone to conflicts.
3. Associative Mapping: Block can be stored anywhere in cache; flexible but slower search.
4. Set-Associative Mapping: Cache divided into sets, each holding multiple blocks (balanced
approach).
5. Replacement Policies: Decide which block to replace on a miss.
o LRU (Least Recently Used): Replaces the block that has gone unused for the longest time.
o FIFO (First In, First Out): Oldest block replaced.
o Random: Simple but less efficient.
6. Principle – Optimal Replacement: Choose block whose next use is farthest in future.
Fig 3.7 – Mapping Methods Comparison Diagram
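The difference between LRU and FIFO shows up on traces that reuse an old block. The sketch below counts misses for a fully associative cache of a given capacity; the function names and the reference string are illustrative assumptions.

```python
from collections import OrderedDict, deque


def count_misses_lru(refs, capacity):
    """Misses under LRU: evict the block unused for the longest time."""
    cache = OrderedDict()
    misses = 0
    for block in refs:
        if block in cache:
            cache.move_to_end(block)          # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)     # drop least recently used
            cache[block] = True
    return misses


def count_misses_fifo(refs, capacity):
    """Misses under FIFO: evict the block that was loaded first."""
    cache, order = set(), deque()
    misses = 0
    for block in refs:
        if block not in cache:
            misses += 1
            if len(cache) == capacity:
                cache.discard(order.popleft())
            cache.add(block)
            order.append(block)
    return misses


refs = [1, 2, 3, 1, 4, 1, 2]
print(count_misses_lru(refs, 3), count_misses_fifo(refs, 3))
```

On this trace FIFO evicts block 1 even though it was just used, so it takes one more miss than LRU.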
3.8 Memory Performance
1. Performance measured using Access Time, Cycle Time, and Bandwidth.
2. Access Time: Time between read request and data availability.
3. Cycle Time: Time before next operation can start.
4. Effective Access Time (EAT):
$EAT = (\text{Hit Ratio} \times \text{Cache Time}) + (\text{Miss Ratio} \times \text{Memory Time})$
5. Bandwidth: Data transfer rate between CPU and memory (bytes/sec).
6. To improve memory speed, designers use interleaving, cache, and TLB.
7. Principle – Performance Optimization: Minimize average access time through hierarchy and
caching.
Fig 3.8 – Memory Performance Metrics Illustration
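A quick numeric check of the EAT formula, using Miss Ratio = 1 - Hit Ratio. The timings below (10 ns cache, 100 ns main memory, 95% hit ratio) are assumed example values, and the function name is illustrative.

```python
def effective_access_time(hit_ratio, cache_time, memory_time):
    """EAT = hit_ratio * cache_time + (1 - hit_ratio) * memory_time."""
    return hit_ratio * cache_time + (1 - hit_ratio) * memory_time


# 0.95 * 10 ns + 0.05 * 100 ns, roughly 14.5 ns
print(effective_access_time(0.95, 10, 100))
```

Raising the hit ratio from 0.95 to 0.99 in this example cuts the EAT to roughly 10.9 ns, which is why small gains in hit ratio matter.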
3.9 Secondary Memory
1. Definition: Non-volatile storage used for permanent data (HDDs, SSDs).
2. Accessed through I/O controllers; not directly by CPU.
3. Hard Disk Drive (HDD): Uses magnetic storage on rotating platters.
4. Solid-State Drive (SSD): Uses NAND flash memory; faster, no moving parts.
5. Optical Discs: CDs, DVDs for backup and distribution.
6. Principle – Non-Volatility: Data retained even after power off.
7. Secondary memory complements main memory by providing long-term storage.
Fig 3.9 – HDD and SSD Structure Comparison Diagram
3.10 Memory Protection and Reliability
1. Memory Protection: Ensures one process cannot access another’s memory space.
2. Implemented by base and limit registers or page protection bits.
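3. Parity Bit: Detects single-bit errors (more generally, any odd number of bit errors) by ensuring even/odd parity; it cannot correct them.
4. Error Correction Codes (ECC): Detect and correct errors; common SECDED codes correct single-bit errors and detect double-bit errors.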
5. Dynamic Refresh: Required for DRAM cells to maintain charge.
6. Principle – Reliability and Integrity: Maintain data correctness during storage and transfer.
Fig 3.10 – ECC and Parity Bit Logic Diagram
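Parity detects an error; an error-correcting code also locates it. The sketch below uses the Hamming(7,4) code as one small example of an ECC. It is chosen here for illustration and is not claimed to be the scheme used by any particular memory system; the function names are assumptions.

```python
def even_parity_bit(bits):
    """Parity bit that makes the total number of 1s even."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(code):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3    # 1-based position of the bad bit, or 0
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                           # inject a single-bit error
print(hamming74_correct(word))
```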
Important Definitions & Principles
• Memory Organization: Arrangement of memory for efficient data storage and retrieval.
• Memory Hierarchy: Arrangement based on speed, cost, and capacity.
• Cache Memory: High-speed buffer between CPU and RAM.
• Virtual Memory: Technique to simulate larger memory using secondary storage.
• Page Table: Structure mapping virtual to physical addresses.
• Hit Ratio: Fraction of memory accesses found in cache.
• TLB: Translation Lookaside Buffer; speeds up address translation.
• Interleaving: Parallel access through multiple memory modules.
• Parity & ECC: Error detection and correction methods.
• Principle of Locality: Recently/frequently used data is likely to be reused soon.
Important Questions – Topic 3 (Memory Organization)
1. Explain the memory hierarchy in detail with a neat diagram.
2. Define cache memory. Describe its mapping techniques and replacement policies.
3. Differentiate between SRAM and DRAM with applications.
4. Explain the concept of virtual memory and its working with a block diagram.
5. What is TLB? How does it improve virtual memory performance?
6. Describe the structure and working of main memory cell array.
7. Explain interleaved memory organization and its advantages.
8. Define memory hit, miss, and effective access time. Derive the formula for EAT.
9. Write short notes on: (a) Cache write policies (b) Memory protection methods.
10. Explain parity bit and ECC memory systems.
11. Discuss the principle of locality of reference and its importance.
12. Compare direct, associative, and set-associative mapping techniques.
13. Explain the structure and functions of RAM and ROM.
14. What is memory bandwidth? How can it be improved?
15. Describe different types of secondary memories with examples.
Topic 4 – Input/Output Organization
4.1 Introduction to I/O Organization
1. The Input/Output (I/O) system allows a computer to communicate with the external
environment.
2. Input devices (keyboard, mouse, sensors) provide data; output devices (display, printer)
present results.
3. Since CPU and peripherals operate at different speeds, I/O organization handles
synchronization and data transfer efficiently.
4. I/O operations are managed using I/O controllers, which act as intermediaries between CPU
and devices.
5. I/O systems are crucial for overall performance — poor I/O handling can cause CPU idling.
6. Principle – Parallel Device Management: Separate I/O modules allow multiple devices to
operate concurrently.
7. I/O interfaces convert signals, handle interrupts, and perform buffering for smooth data flow.
Fig 4.1 – Basic I/O System Block Diagram (CPU ↔ I/O Module ↔ Devices)
4.2 I/O Interface and Modules
1. The I/O Interface provides a communication link between CPU and peripheral devices.
2. Each interface includes data registers, status/control registers, and a controller.
3. Data Register: Temporarily stores data being transferred.
4. Status Register: Indicates device condition (ready, busy, error).
5. Control Register: Holds control information (e.g., start, stop, read, write).
6. I/O Module: Contains interface logic and controller to manage one or more devices.
7. Principle – Device Independence: CPU interacts with I/O devices using standard protocols
regardless of device type.
8. Communication is usually done via memory-mapped I/O or isolated I/O.
Fig 4.2 – I/O Interface Block Diagram Showing Registers and Control Signals
4.3 I/O Techniques
1. I/O operations can be handled using three major techniques:
o Programmed I/O (Polling): CPU waits until device is ready.
o Interrupt-Driven I/O: Device interrupts CPU when ready, saving CPU time.
o Direct Memory Access (DMA): Data transferred directly between memory and
device without CPU intervention.
2. Programmed I/O: Simple but inefficient since CPU remains busy checking device status.
3. Interrupt-Driven I/O: CPU executes other instructions until interrupt received; more
efficient.
4. DMA: Uses a DMA controller (DMAC) that transfers blocks of data autonomously.
5. Principle – CPU Offloading: Delegate repetitive I/O tasks to hardware (DMA) for better
performance.
6. DMA increases throughput especially in high-speed data transfers like disks and networks.
Fig 4.3 – Comparison of Programmed, Interrupt, and DMA I/O
4.4 Interrupts and Priority
1. Interrupt: A signal that temporarily halts CPU execution to service an event.
2. Interrupt sources: I/O devices, hardware faults, timers, software traps.
3. CPU saves current state (PC, flags) and transfers control to Interrupt Service Routine (ISR).
4. Vectored Interrupts: Each device has a unique ISR address.
5. Non-Vectored: Common ISR entry point; device identified via polling.
6. Interrupt Priority: Determines order of servicing multiple simultaneous requests.
7. Priority handled by hardware (priority encoder) or software (masking).
8. Principle – Fast Event Handling: Quick response to asynchronous events ensures real-time
behavior.
Fig 4.4 – Interrupt Processing Sequence Diagram
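Priority resolution with masking can be sketched as a software priority encoder. The conventions here are assumptions for illustration: bit i of each word stands for request line IRQ i, a lower line number means higher priority, and a set bit in `enabled` means the line is not masked.

```python
def highest_priority_irq(pending, enabled):
    """Return the number of the highest-priority unmasked request, or None."""
    active = pending & enabled             # masked requests are ignored
    if active == 0:
        return None
    lowest_set = active & -active          # isolate the lowest set bit
    return lowest_set.bit_length() - 1


# IRQ 2 and IRQ 4 are pending; masking IRQ 2 lets IRQ 4 through.
print(highest_priority_irq(0b10100, 0b11111))
print(highest_priority_irq(0b10100, 0b11011))
```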
4.5 Direct Memory Access (DMA)
1. Definition: Technique allowing peripherals to read/write memory directly, bypassing CPU.
2. DMA Controller: Handles address generation, data transfer, and synchronization.
3. During a DMA transfer the DMA controller takes over the bus; in cycle-stealing mode the CPU is held off the bus for one cycle at a time.
4. DMA operates in three modes:
o Burst Mode: Transfers entire block in one go.
o Cycle Stealing Mode: Alternates between CPU and DMA access.
o Transparent Mode: Operates only when CPU is idle.
5. Improves performance in high-volume data operations like disk I/O.
6. Principle – Autonomous Data Transfer: Peripheral devices transfer data independently of
CPU.
Fig 4.5 – DMA Data Transfer Mechanism Diagram
4.6 Input/Output Processor (IOP)
1. IOP is a specialized processor that handles complex I/O operations without CPU assistance.
2. It has its own local memory, instruction set, and control logic.
3. CPU issues commands to IOP which performs I/O tasks concurrently.
4. Example: IBM mainframes use IOPs for parallel peripheral control.
5. Principle – I/O Parallelism: Separate processors handle I/O while CPU executes computation
tasks.
6. Improves system throughput by reducing CPU-I/O dependency.
Fig 4.6 – I/O Processor and CPU Communication Diagram
4.7 Memory-Mapped and Isolated I/O
1. Memory-Mapped I/O: I/O device registers share the same address space as memory.
o Device control via normal load/store instructions.
2. Isolated I/O: Uses separate address space; special IN/OUT instructions used.
3. Memory-mapped I/O provides uniform addressing and easier programming.
4. Isolated I/O avoids address conflicts between devices and memory.
5. Principle – Unified Addressing: Simplify control by treating I/O devices like memory
locations.
Fig 4.7 – Comparison of Memory-Mapped and Isolated I/O Addressing
4.8 I/O Bus and Interface Standards
1. I/O Bus: Connects CPU, memory, and peripherals; allows data and control signal
transmission.
2. Bus lines: Data Bus, Address Bus, and Control Bus.
3. Common standards:
o PCI (Peripheral Component Interconnect)
o USB (Universal Serial Bus)
o SCSI (Small Computer System Interface)
o SATA (Serial ATA)
4. Each standard defines speed, protocol, and voltage levels for compatibility.
5. Principle – Modularity: Use of standard interfaces ensures device interchangeability.
6. Bus arbitration manages multiple device access requests.
Fig 4.8 – Common I/O Bus Structure
4.9 Asynchronous and Synchronous Data Transfer
1. Synchronous Transfer: Data transfer synchronized by a common clock signal.
o Faster and predictable but needs tight timing.
2. Asynchronous Transfer: Each transfer initiated by request/acknowledge signals; allows
variable timing.
3. Asynchronous transfer is more flexible but slower, since each transfer must wait for the handshake to complete.
4. Used where devices operate at different speeds.
5. Principle – Handshaking Protocol: Ensures reliable data transfer between unsynchronized
devices.
Fig 4.9 – Synchronous vs Asynchronous Data Transfer (Handshake Signals)
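The request/acknowledge exchange can be traced step by step. The sketch below assumes a four-phase (return-to-zero) handshake, which is one common variant; the notes only say that request and acknowledge signals are used. The function name and the trace format are hypothetical.

```python
def four_phase_handshake(items):
    """Simulate a sender passing items to a receiver over REQ/ACK lines.

    Returns (received, trace), where trace holds (REQ, ACK, data_bus)
    after each signal change.
    """
    received = []
    trace = []
    for item in items:
        bus = item
        trace.append((1, 0, bus))   # 1. sender drives data and raises REQ
        received.append(bus)
        trace.append((1, 1, bus))   # 2. receiver latches data and raises ACK
        trace.append((0, 1, bus))   # 3. sender sees ACK and drops REQ
        trace.append((0, 0, bus))   # 4. receiver sees REQ low and drops ACK
    return received, trace
```

Each item costs four signal changes regardless of how fast either side is, which is why the scheme tolerates devices of different speeds but is slower than a clocked transfer.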
4.10 Error Detection in I/O Communication
1. Transmission errors can occur due to noise, signal distortion, or interference.
2. Parity Bit: Adds 1 bit to data for simple error detection (even/odd parity); it detects any odd number of flipped bits but misses an even number.
3. Checksum: Sum of data blocks used to detect errors in data transmission.
4. CRC (Cyclic Redundancy Check): Polynomial division method for strong error detection.
5. Principle – Data Integrity: Ensures accurate and reliable I/O communication.
6. In modern systems, network and storage links (e.g., Ethernet, USB, SATA) commonly use CRC-based verification.
Fig 4.10 – Error Detection Techniques in I/O Communication
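The three techniques can be compared on small inputs. This is a minimal sketch: the helper names are hypothetical, the checksum is assumed to be a plain 8-bit additive sum (real protocols vary), and the CRC uses the CRC-32 provided by Python's standard `zlib` module rather than any specific I/O standard's polynomial.

```python
import zlib


def even_parity_bit(value):
    """Return the bit that makes the total number of 1s even."""
    return bin(value).count("1") % 2


def checksum8(data):
    """Simple additive checksum: sum of all bytes modulo 256 (assumed form)."""
    return sum(data) % 256


def crc32(data):
    """CRC-32 of a bytes object, using the standard library."""
    return zlib.crc32(data)
```

For example, `0b1011001` has four 1s, so its even-parity bit is 0; flipping one bit changes the parity and exposes the error. Swapping two bytes (`b"AB"` versus `b"BA"`) leaves the additive checksum unchanged but changes the CRC, which illustrates why CRC is the stronger check.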
Important Definitions & Principles
• I/O Organization: Method of connecting peripherals for efficient data exchange.
• I/O Module: Hardware interface between CPU and external devices.
• Programmed I/O: CPU-controlled I/O via polling.
• Interrupt: Signal to CPU requesting service from I/O device.
• DMA (Direct Memory Access): Transfers data directly between memory and device.
• IOP (Input/Output Processor): Processor dedicated to managing peripheral devices.
• Memory-Mapped I/O: Devices share memory address space.
• Synchronous Transfer: Data transfer with shared clock.
• Asynchronous Transfer: Transfer controlled by handshake signals.
• Principle of Parallelism: Multiple I/O operations executed concurrently for efficiency.
Important Questions – Topic 4 (I/O Organization)
1. Explain the structure and function of an I/O module with a neat block diagram.
2. Compare programmed I/O, interrupt-driven I/O, and DMA in detail.
3. What is DMA? Explain its working with different modes of data transfer.
4. Define interrupts. Describe interrupt processing and priority mechanism.
5. Discuss the role of I/O Processor (IOP) in computer systems.
6. Differentiate between memory-mapped and isolated I/O.
7. Explain synchronous and asynchronous data transfer with timing diagrams.
8. What is bus arbitration? Explain various bus standards (PCI, USB, SCSI).
9. Describe handshaking in asynchronous communication.
10. Explain error detection techniques used in I/O communication.
11. What are the advantages of DMA over interrupt-driven I/O?
12. Explain the principle of device independence in I/O system design.
13. Write short notes on:
• (a) I/O interface registers
• (b) Control and status flags
• (c) I/O synchronization methods.
14. Explain the working of an interrupt-driven I/O cycle with flowchart.
15. Compare synchronous vs asynchronous data transfer with advantages.
Topic 5 – Parallel Processing
5.1 Introduction to Parallel Processing
1. Parallel Processing refers to the simultaneous execution of multiple instructions or
operations to increase system performance.
2. It utilizes multiple processors or functional units working together to solve a problem faster
than a single processor.
3. Modern computing systems—from desktops to supercomputers—use parallelism for high
performance and efficiency.
4. Parallel processing exploits concurrency in programs to reduce total execution time.
5. Principle – Divide and Conquer: Divide a task into subtasks that can be processed
concurrently.
6. Parallelism can be implemented at instruction, data, or processor level.
7. Benefits include higher throughput, reduced latency, and better resource utilization.
8. Used in scientific computing, machine learning, simulations, and large database systems.
Fig 5.1 – Basic Parallel Processing Concept (Multiple Processors Executing Concurrent Tasks)
5.2 Types of Parallelism
1. Bit-Level Parallelism: Increases word size of processor (e.g., 8-bit → 16-bit → 64-bit) to
process more bits per instruction.
2. Instruction-Level Parallelism (ILP): Multiple instructions executed simultaneously via
pipelining or superscalar design.
3. Data-Level Parallelism (DLP): Same operation applied to multiple data elements (SIMD).
4. Task-Level Parallelism (TLP): Independent tasks executed concurrently (MIMD).
5. Thread-Level Parallelism: Programs divided into multiple threads running in parallel on
different cores.
6. Principle – Concurrency: Utilize multiple execution paths to increase system throughput.
Fig 5.2 – Levels of Parallelism (Bit, Instruction, Data, Task)
5.3 Flynn’s Taxonomy
1. Flynn’s Classification categorizes computer architectures based on instruction and data
streams.
2. SISD (Single Instruction Single Data): Sequential processing; traditional uniprocessor
systems.
3. SIMD (Single Instruction Multiple Data): Executes one instruction on multiple data (e.g.,
vector processors, GPUs).
4. MISD (Multiple Instruction Single Data): Rare; multiple instructions operate on the same
data stream.
5. MIMD (Multiple Instruction Multiple Data): Each processor executes different instructions
on different data.
6. MIMD includes shared-memory (tightly coupled) and distributed-memory (loosely coupled)
systems.
7. Principle – Structural Classification: Defines how processors and data interact in a parallel
system.
Fig 5.3 – Flynn’s Taxonomy Diagram
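The classification depends only on whether there is one or more than one instruction stream and data stream. A minimal sketch (the function name is hypothetical):

```python
def flynn_class(instruction_streams, data_streams):
    """Return the Flynn category for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"
```

For example, one instruction stream driving eight data lanes gives `flynn_class(1, 8) == "SIMD"`, and four independent cores each working on their own data give `flynn_class(4, 4) == "MIMD"`.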
5.4 Pipelining and Vector Processing
1. Pipelining: Technique where multiple instruction stages (fetch, decode, execute, etc.) overlap
in time.
2. Improves throughput but not individual instruction latency.
3. Vector Processing: Operates on entire arrays of data using a single instruction (SIMD).
4. Used in scientific and graphical computations where the same operation applies to many data
elements.
5. Vector Registers: Store arrays of data for fast arithmetic operations.
6. Principle – Temporal Overlapping: Multiple stages execute concurrently in pipeline for
efficiency.
7. Pipelining can be scalar (per instruction) or vector (per data element).
Fig 5.4 – Instruction Pipeline and Vector Processing Structure
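The claim that pipelining improves throughput but not single-instruction latency can be checked with the standard timing model. The sketch assumes an ideal pipeline: every stage takes one cycle and there are no stalls or hazards. The function names are hypothetical.

```python
def pipeline_cycles(num_instructions, num_stages):
    """Cycles to finish n instructions on an ideal k-stage pipeline.

    The first instruction needs k cycles; each later one completes
    one cycle after its predecessor.
    """
    return num_stages + num_instructions - 1


def pipeline_speedup(num_instructions, num_stages):
    """Speedup over a non-pipelined design that needs n * k cycles."""
    unpipelined = num_instructions * num_stages
    return unpipelined / pipeline_cycles(num_instructions, num_stages)
```

With 5 stages, a single instruction still takes 5 cycles, but 100 instructions finish in 104 cycles instead of 500, a speedup of about 4.8 that approaches the stage count as the instruction count grows.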
5.5 Multiprocessor Systems
1. A multiprocessor system has two or more CPUs sharing memory and peripherals.
2. Processors communicate through a common bus or interconnection network.
3. Types:
o Symmetric Multiprocessing (SMP): All processors share the same memory and I/O
equally.
o Asymmetric Multiprocessing (AMP): One master CPU controls other slave CPUs.
4. Advantages: High reliability, scalability, and throughput.
5. Shared memory allows faster inter-processor communication.
6. Principle – Cooperative Execution: Multiple CPUs execute tasks collaboratively for better
performance.
Fig 5.5 – Symmetric Multiprocessing System Diagram
5.6 Interconnection Networks
1. Interconnection networks link multiple processors and memory modules for communication.
2. Classified as:
o Static Networks: Fixed connections (e.g., mesh, ring, hypercube).
o Dynamic Networks: Switch-based (crossbar, multistage, bus).
3. Bus Interconnection: Simple but prone to contention as CPUs increase.
4. Crossbar Switch: Provides a dedicated path between each CPU and each memory module.
5. Multistage Networks: Combine performance and cost efficiency (e.g., Omega network).
6. Principle – Communication Efficiency: Design topology to minimize latency and contention.
Fig 5.6 – Common Interconnection Network Topologies (Bus, Ring, Mesh, Crossbar)
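The cost trade-off between a crossbar and a multistage network, and the fixed wiring of a hypercube, can be computed directly. The sketch assumes an Omega network built from 2x2 switches with a power-of-two port count; the function names are hypothetical.

```python
def crossbar_crosspoints(num_cpus, num_memories):
    """A crossbar needs one crosspoint per CPU/memory pair."""
    return num_cpus * num_memories


def omega_switch_count(num_ports):
    """Number of 2x2 switches in an N-port Omega network: (N/2) * log2(N)."""
    if num_ports < 2 or num_ports & (num_ports - 1):
        raise ValueError("num_ports must be a power of two >= 2")
    stages = num_ports.bit_length() - 1      # log2 of a power of two
    return stages * (num_ports // 2)


def hypercube_neighbors(node, dimensions):
    """Neighbors of a hypercube node differ from it in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimensions)]
```

For 64 ports, the crossbar needs 4096 crosspoints while the Omega network needs 192 switches, which shows the cost saving of the multistage design. In a 3-dimensional hypercube, node 5 (binary 101) is wired to nodes 4, 7, and 1.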
5.7 Shared Memory and Distributed Memory
1. Shared Memory Systems: All processors share a common memory space; communicate via
shared variables.
2. Easy to program, but synchronization is needed to prevent data conflicts.
3. Distributed Memory Systems: Each processor has its own local memory; communicate via
message passing.
4. Suitable for large-scale parallel systems and clusters.
5. Principle – Memory Access Synchronization: Manage shared resources efficiently to avoid
inconsistency.
6. Hybrid systems combine both shared and distributed features for better scalability.
Fig 5.7 – Shared vs Distributed Memory System Diagram
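The two communication styles can be sketched side by side. Python threads stand in for processors here, which is an assumption made only for illustration (they model the programming style, not true hardware parallelism). In the first function, workers update one shared variable under a lock; in the second, each worker keeps its result local and sends it as a message. The function names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Workers add partial sums into one shared variable, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:                  # prevent conflicting updates
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Workers share nothing; each sends its partial sum as a message."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))     # send result to the coordinator

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)
```

Both versions return 45 for the chunks `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`; the difference lies in whether correctness depends on a lock around shared state or on explicit messages.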
5.8 Synchronization and Communication
1. Synchronization ensures correct ordering and timing among concurrent processes.
2. Barriers: Force all processors to reach a point before continuing.
3. Locks/Semaphores: Used for mutual exclusion in shared resources.
4. Message Passing: Exchange of data among processors in distributed systems.
5. Principle – Coordination: Necessary to maintain data consistency and prevent race
conditions.
6. Efficient synchronization mechanisms minimize idle time in parallel systems.
Fig 5.8 – Synchronization Using Barrier and Message Passing
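A barrier can be demonstrated with Python's standard `threading.Barrier`. In this sketch, threads again stand in for processors (an assumption for illustration), and the function name and log format are hypothetical. No worker may start phase 2 until every worker has finished phase 1.

```python
import threading


def run_two_phase(num_workers):
    """Run two phases separated by a barrier and return the event log."""
    barrier = threading.Barrier(num_workers)
    log = []
    log_lock = threading.Lock()     # mutual exclusion for the shared log

    def worker(worker_id):
        with log_lock:
            log.append(("phase1", worker_id))
        barrier.wait()              # block until all workers arrive
        with log_lock:
            log.append(("phase2", worker_id))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The order of workers within a phase may vary between runs, but every `phase1` entry always precedes every `phase2` entry, because each worker logs phase 1 before it reaches the barrier.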
5.9 Parallel Computer Architecture Models
1. Shared-Memory Model: All processors access the same memory; used in SMP systems.
2. Distributed-Memory Model: Each processor has local memory; communicates via network
links.
3. Hybrid (NUMA) Model: Combination of both; memory is shared but physically distributed, so access time is non-uniform and depends on where the data resides.
4. Dataflow Architecture: Execution driven by data availability, not control flow.
5. Multithreaded Architecture: Multiple threads share CPU resources to hide latency.
6. Principle – Parallel Design Variety: Different models suit different application workloads.
Fig 5.9 – Classification of Parallel Computer Architectures
5.10 Performance Metrics for Parallel Systems
1. Speedup (S): Ratio of execution time on a single processor (T_1) to execution time on p processors (T_p):
$$S = \frac{T_1}{T_p}$$
2. Efficiency (E): Ratio of speedup to number of processors:
$$E = \frac{S}{p}$$
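3. Amdahl’s Law: Limits speedup based on the sequential fraction of code. With f the parallelizable fraction (so 1 − f is the sequential fraction) and p processors:
$$S_{max} = \frac{1}{(1 - f) + \frac{f}{p}}$$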
4. Scalability: Ability of system performance to improve as processors are added.
5. Principle – Parallel Performance Trade-off: Parallel systems achieve diminishing returns
beyond certain processor counts.
Fig 5.10 – Amdahl’s Law Speedup Graph
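The three metrics can be evaluated numerically to see the diminishing returns described in item 5. This is a minimal sketch with hypothetical function names; following the formula in item 3, `f` is taken to be the parallelizable fraction of the program, so `1 - f` is the sequential part.

```python
def speedup(t1, tp):
    """S = T1 / Tp."""
    return t1 / tp


def efficiency(s, p):
    """E = S / p."""
    return s / p


def amdahl_speedup(f, p):
    """Maximum speedup with parallelizable fraction f on p processors."""
    if not 0.0 <= f <= 1.0 or p < 1:
        raise ValueError("need 0 <= f <= 1 and p >= 1")
    return 1.0 / ((1.0 - f) + f / p)
```

With f = 0.9, ten processors give a speedup of about 5.26 and a thousand give about 9.91; the speedup can never exceed 1 / (1 − f) = 10, however many processors are added.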
Important Definitions & Principles
• Parallel Processing: Simultaneous execution of multiple operations to increase performance.
• Pipelining: Overlapping instruction execution stages.
• Flynn’s Taxonomy: Classification of architectures based on instruction/data streams.
• SISD, SIMD, MISD, MIMD: Fundamental categories of computer architectures.
• Multiprocessor System: Multiple CPUs sharing memory and I/O.
• Synchronization: Coordination among concurrent tasks to maintain correctness.
• Amdahl’s Law: Theoretical limit on parallel speedup.
• Shared vs Distributed Memory: Different models for data sharing in parallel systems.
• Scalability: Measure of how well performance scales with added resources.
• Principle of Concurrency: Perform multiple computations simultaneously to reduce
execution time.
Important Questions – Topic 5 (Parallel Processing)
1. Define parallel processing. Explain its need and advantages.
2. Discuss various levels of parallelism with suitable examples.
3. Explain Flynn’s taxonomy and classify different architectures.
4. What is pipelining? Explain with neat diagram and advantages.
5. Differentiate between SIMD and MIMD systems.
6. Describe the structure and working of a multiprocessor system.
7. Explain the types and significance of interconnection networks.
8. Compare shared-memory and distributed-memory systems.
9. Explain synchronization techniques used in parallel processing.
10. Derive Amdahl’s Law and explain its significance.
11. What are the main performance metrics of parallel systems?
12. Describe vector processing and its applications.
13. Write short notes on:
• (a) Speedup and efficiency
• (b) Scalability in parallel systems
• (c) Dataflow and multithreaded architectures.
14. Explain the concept of message passing and synchronization barriers.
15. Discuss the trade-off between parallelism and overhead in large systems.