COMPUTER SYSTEM DESIGN
Computer system design is the process of structuring and organizing the hardware and software components
of a computer so that they work together efficiently. It involves balancing performance, cost, power
consumption, and reliability while meeting user requirements.
Purpose
To ensure optimal performance of applications and processes.
To design a system that is scalable (can be upgraded or expanded).
To make computing systems reliable and fault-tolerant.
To provide a user-friendly and programmable environment.
Key Considerations
Performance
o Faster execution of instructions, lower latency.
Cost
o Hardware affordability and cost-efficiency.
Reliability
o Protection against hardware/software failures.
Energy Efficiency
o Lower power consumption (important in mobile devices).
Flexibility & Scalability
o Ability to support new features and growth.
COMPONENTS OF COMPUTER ARCHITECTURE
The components of computer architecture are as follows:
1. CPU (Central Processing Unit)
The brain of the computer that executes instructions and performs calculations.
Contains registers for temporary storage of data and instructions.
Fetches instructions from memory, decodes, and executes them.
Coordinates all operations of the computer system.
It includes:
o ALU (Arithmetic Logic Unit) to perform arithmetic and logical operations.
o CU (Control Unit) to control the sequence of operations and manage data flow.
2. Memory Unit
Stores data and instructions needed by the CPU.
Works closely with the CPU during instruction fetch and execution.
Helps improve overall system speed by providing quick access to data.
It includes:
o Primary memory (RAM, cache) provides fast access for current tasks.
o Volatile memory: Loses data when power is off (e.g., RAM).
o Non-volatile memory: Retains data permanently (e.g., ROM).
3. Input/Output (I/O) Devices
Input devices bring data and instructions into the computer (e.g., keyboard, mouse).
Output devices display or provide results from the computer (e.g., monitor, printer).
Allows the computer to interact with the user and external devices.
Some devices can function as both input and output (e.g., touch screen).
Managed through I/O controllers to coordinate data transfer.
Improves usability and interaction with the computer system.
4. Buses
Acts as a communication pathway between CPU, memory, and peripherals.
Ensures synchronized and proper data transfer.
Critical for overall system operation and performance.
It includes:
o Data Bus: Transfers actual data between CPU, memory, and I/O devices.
o Address Bus: Carries the memory or I/O address to specify where data should go.
o Control Bus: Carries control signals to manage operations (e.g., read, write).
5. Secondary Storage / Components
Provides permanent storage for programs and data.
Includes devices like hard drives, SSDs, optical discs, and flash drives.
Stores large amounts of data compared to primary memory.
Non-volatile → retains data even when power is off.
Acts as a backup or long-term storage for important data.
Works with the memory unit to load data into RAM when needed by the CPU.
PROCESSOR DESIGN
Processor design is the process of creating the CPU’s internal architecture, deciding how it will execute
instructions, handle data, and communicate with memory and I/O devices.
Key Aspects of Processor Design
1. Datapath Design:
Determines how data flows inside the CPU.
Includes ALU, registers, buses, and interconnections.
2. Control Unit Design:
Directs execution of instructions by generating control signals.
Two approaches:
Hardwired Control:
o Control signals are generated by fixed logic circuits (gates, flip-flops, and decoders).
o Fast but not flexible.
o Typically used in RISC processors.
Microprogrammed Control:
o Control signals are stored as microinstructions in a special memory called the control memory.
o Each instruction is carried out by a sequence of microinstructions (a microprogram).
o Typically used in CISC architectures.
3. Pipeline Design:
Breaks instruction execution into stages (fetch, decode, execute, memory access, write-back).
Increases instruction throughput.
4. Parallelism:
Superscalar processors:
o Execute multiple instructions per cycle.
Multicore processors:
o Multiple CPU cores integrated on one chip.
5. Performance Considerations:
Clock speed (GHz):
o Sets the number of clock cycles per second, and so how quickly instructions can be executed.
CPI (Cycles Per Instruction):
o Lower CPI means faster execution.
Power efficiency:
o Balancing performance with energy use.
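Clock speed and CPI combine in the standard CPU performance equation: execution time = instruction count x CPI / clock rate. The sketch below only illustrates that relationship; the function name and the sample numbers are assumptions chosen for the example, not figures from these notes.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """Execution time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_hz

# Hypothetical program: 1 billion instructions on a 2 GHz clock.
slow = cpu_time_seconds(1_000_000_000, cpi=2.0, clock_hz=2_000_000_000)
fast = cpu_time_seconds(1_000_000_000, cpi=1.0, clock_hz=2_000_000_000)
print(slow, fast)  # halving the CPI halves the execution time
```

With the clock rate and instruction count held fixed, the lower CPI gives the shorter execution time, which is the point made in the list above.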
INSTRUCTION SET ARCHITECTURE
ISA is the part of the computer architecture that defines the set of instructions a processor can execute,
including operations, data types, addressing modes, registers, and memory architecture. It acts as an
interface between software and hardware, allowing programmers to write programs without worrying about
the internal hardware design.
Example of Instruction Set Architecture (ISA):
1. Arithmetic Instructions:
o ADD, SUB, MUL, DIV → perform basic math operations.
2. Data Transfer Instructions:
o MOV, LOAD, STORE → move data between memory and registers.
3. Control Instructions:
o JMP, CALL, RET → control the flow of program execution.
4. Logical Instructions:
o AND, OR, NOT, XOR → perform logical operations on data.
Working:
Programmer writes instructions in assembly or high-level language.
ISA defines how instructions are interpreted by the CPU.
CPU decodes instructions according to ISA rules.
CPU executes the operations on data in registers or memory.
Compiler converts high-level programs into machine language instructions compatible with the ISA.
Provides an interface between software and hardware, allowing programs to run without knowing
CPU internals.
Operates in the Fetch-Decode-Execute cycle:
o Fetch: CPU retrieves the instruction from memory.
o Decode: CPU interprets the instruction to understand the operation and operands.
o Execute: CPU performs the operation on data in registers or memory
o Result: CPU stores the result of the operation in a register or memory location, or sends it to
an output device if required.
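The fetch, decode, execute, and result steps above can be sketched as a small interpreter loop. This is a toy model under stated assumptions: the tuple instruction format, the register names, and the LOADI (load immediate) and HALT opcodes are hypothetical and do not belong to any real ISA.

```python
def run(program, registers):
    """Run a list of (opcode, dest, a, b) tuples until HALT."""
    pc = 0  # program counter
    while True:
        instruction = program[pc]          # Fetch
        pc += 1
        opcode, dest, a, b = instruction   # Decode
        if opcode == "HALT":
            break
        if opcode == "LOADI":              # Execute: load the immediate value a
            result = a
        elif opcode == "ADD":
            result = registers[a] + registers[b]
        elif opcode == "SUB":
            result = registers[a] - registers[b]
        else:
            raise ValueError(f"unknown opcode {opcode}")
        registers[dest] = result           # Result: store in the destination register
    return registers

program = [
    ("LOADI", "R1", 7, None),
    ("LOADI", "R2", 5, None),
    ("ADD", "R3", "R1", "R2"),
    ("SUB", "R4", "R1", "R2"),
    ("HALT", None, None, None),
]
final = run(program, {})
print(final)
```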
Types
1. RISC
Reduced Instruction Set Computer
Uses a small, simple set of instructions.
Most instructions are designed to execute in one clock cycle.
Emphasizes register-to-register operations rather than memory operations.
Simpler hardware design.
Easier to implement pipelining
Example: ARM, MIPS processors.
2. CISC
Complex Instruction Set Computer
Has a large set of complex instructions.
Instructions can perform multiple operations in one instruction.
Some instructions take multiple clock cycles.
Reduces the number of instructions per program.
More complex hardware design.
Example: Intel x86 processors.
3. VLIW
Very Long Instruction Word
Packs multiple operations into a single long instruction.
The operations in each instruction word execute in parallel on multiple functional units.
Relies on compiler to schedule instructions efficiently.
Simple CPU hardware → complexity shifted to compiler.
Supports parallel execution of instructions.
Example: Texas Instruments TMS320C6000 DSPs; Intel Itanium (EPIC) builds on the VLIW approach.
4. EPIC
Explicitly Parallel Instruction Computing
Compiler specifies which instructions run in parallel.
Enhances parallelism and performance.
Supports speculation and predication.
Reduces CPU hardware complexity.
Enables efficient instruction-level parallelism.
Example: Intel Itanium EPIC architecture.
ADDRESSING MODE
When the CPU executes an instruction like ADD or MOV, it needs to know where the data is.
Addressing tells the CPU where to find that data.
What is addressing mode?
It is a specific technique used to specify the operand location, that is where the data is:
Inside the instruction itself
In a register
In main memory
Computed using some combination of registers and memory
Types
1. Immediate Addressing Mode
The operand is given directly in the instruction itself.
No need to access memory or registers.
CPU doesn’t fetch data from memory.
The constant value is put inside the instruction.
Fastest addressing mode.
Example: MOV R1, 5
2. Register Addressing Mode
The operand is stored inside a CPU register.
Both operands are in registers.
No memory access, so it is very fast.
Often used for arithmetic and logical operations.
Example: MOV R1, R2
3. Direct Addressing Mode
The memory address of the operand is given directly in the instruction.
Example: MOV R1, [1000]
CPU reads the operand directly from memory location 1000.
The address is directly specified.
Slower than register and immediate addressing modes.
Accesses memory.
4. Indirect Addressing Mode
The instruction names a register or memory location that holds the address of the operand (not the
operand itself).
Example: MOV R1, [R2]
R2 contains the memory address where the actual operand is stored.
If R2 = 2000 and Memory[2000] = 50, then after execution R1 = 50.
Adds one level of indirection (the instruction points to the address, not to the data).
Used for accessing data structures such as arrays and linked lists.
5. Indexed Addressing mode
The effective address of the operand is calculated by adding a constant (index) to a base register.
Formula: EA = Base Register + Index Value
Example: MOV R1, [R2 + 5]
Commonly used for array elements and similar data structures.
The index allows moving from one element to the next.
The operand is in memory at the address given by the contents of R2 + 5.
If R2 = 1000, then EA = 1005.
6. Base Register Addressing Mode
Similar to indexed addressing mode, but the base register contains the starting address of the memory
block, and a displacement (offset) is added.
Formula: EA = Base Register + Displacement
Example: MOV R1, [BX + 10]
Accesses the memory address equal to the value in BX + 10.
Commonly used in systems with segmented memory and for relocatable code
(programs that can be moved in memory).
7. Relative Addressing Mode
Relative addressing mode is an addressing mode in which the effective address (EA) is calculated by:
EA = Program Counter + Offset
The offset (also called displacement) is given inside the instruction.
Instead of specifying an absolute memory address, it tells the CPU to move relative to the current
instruction.
Use of Relative Addressing Mode
Branch instructions (conditional or unconditional)
Loops
Function calls
It allows jumps such as:
o Jump forward (+ offset)
o Jump backward (− offset)
o Jump conditionally
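Example: JMP +10, a 2-byte instruction stored at address 1000.
Steps:
Current address = 1000
PC = 1002 (the instruction is 2 bytes long, so the PC already points to the next instruction)
Offset = +10
EA = PC + Offset
EA = 1002 + (+10) = 1012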
8. Register Indirect Addressing Mode
Similar to indirect addressing, but a register (not a memory location) holds the address of the operand.
Example: ADD R1, (R2)
Adds the value pointed to by R2 to R1.
If R2 = 3000 and Memory[3000] = 25, then 25 is added to R1; if R1 was 0 beforehand, the result is
R1 = 25.
The same mode with a data transfer instruction: MOV R1, [R2]
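The effective-address rules for the modes above can be collected in one small function. This is a sketch under stated assumptions: the mode names, the function signature, and the dictionary used as a register file are hypothetical, chosen only to mirror the examples in this section.

```python
def effective_address(mode, regs, pc=0, reg=None, address=None, offset=0):
    """Return the operand's memory address, or None when the operand is not in memory."""
    if mode in ("immediate", "register"):
        return None                      # operand is in the instruction or in a register
    if mode == "direct":
        return address                   # MOV R1, [1000]
    if mode == "register_indirect":
        return regs[reg]                 # MOV R1, [R2]
    if mode in ("indexed", "base"):
        return regs[reg] + offset        # MOV R1, [R2 + 5]
    if mode == "relative":
        return pc + offset               # EA = PC + offset
    raise ValueError(f"unknown mode {mode}")

regs = {"R2": 1000}
print(effective_address("direct", regs, address=1000))          # 1000
print(effective_address("register_indirect", regs, reg="R2"))   # 1000
print(effective_address("indexed", regs, reg="R2", offset=5))   # 1005
print(effective_address("relative", regs, pc=1002, offset=10))  # 1012
```

Immediate and register modes return None because no memory access is needed, which is why the notes describe them as the fastest modes.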
CONTROL STRUCTURE AND MICROPROGRAMMING
What is control unit?
The control unit is a part of CPU that directs the operations of the processor.
It interprets instructions and generates control signals to coordinate data movement and operations
among:
o ALU
o Register
o Memory
o I/O Device
o Buses
Types
1. Hardwired CU
Control signals are generated by fixed logic circuits (gates, flip-flops, and decoders).
Characteristics
Fast but not flexible
Used in RISC processor
Pros
Very high speed
Low control memory requirement
Cons
Difficult to modify
Complex logic for large instruction set
2. Microprogrammed CU
Control signals are stored in special memory called the control memory (CM).
Characteristics
Each instruction is carried out by a sequence of micro-instructions (a microprogram).
Typically used in CISC architectures.
Pros
Easy to modify
Easier to design
Cons
Slower than hardwired control
Requires control memory
Microprogramming
Writing small programs (microprograms) that control how each machine instruction works internally.
Micro-instructions
A micro-instruction is the smallest operation in the control unit.
It contains bits (fields) that directly generate control signals.
It may control:
o ALU operations
o Register transfer
o Memory read/write
o Program control update
o Condition checking
Example micro-operation sequence:
MAR ← R1
MDR ← Memory[MAR]
ALU ← R2 + MDR
R1 ← ALU Output
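Reading the four lines above as register transfers (MAR takes R1, MDR takes Memory[MAR], the ALU adds R2 and MDR, and R1 takes the ALU output), the sequence can be traced step by step. This reading of the transfer direction is an interpretation, and the function name and sample values are assumptions made for illustration.

```python
def add_from_memory(regs, memory):
    """Carry out R1 <- R2 + Memory[R1] as four register-transfer steps."""
    mar = regs["R1"]              # MAR <- R1
    mdr = memory[mar]             # MDR <- Memory[MAR]
    alu_out = regs["R2"] + mdr    # ALU <- R2 + MDR
    regs["R1"] = alu_out          # R1  <- ALU output
    return regs

regs = add_from_memory({"R1": 100, "R2": 8}, {100: 34})
print(regs)
```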
CACHE MEMORY
Cache memory is a small, high-speed memory located between the CPU and main memory. It stores
frequently used data and instructions to provide faster access for the CPU and improve overall system
performance. Modern CPUs may have separate caches for data and instructions.
Processor ↔ Cache Memory ↔ RAM
Types/Levels of Cache Memory
CPU cache memory is categorized into three levels:
1. L1 Cache
Fastest cache level, located closest to the CPU core.
Stores the CPU's frequently accessed data and instructions.
Size determined by CPU, varies by processor.
High-speed but expensive and limited in size.
Typically divided into two sections:
o Data Cache: Stores the data that instructions operate on.
o Instruction Cache: Stores the instructions the CPU executes.
2. L2 Cache
Second level of cache.
Larger but slower than L1 cache.
May be inside or outside the CPU.
If it sits outside the cores, it can be shared between cores.
Connected to CPU via high-speed bus.
Size typically 256KB to 32MB.
Provides intermediate storage for frequently used data.
3. L3 Cache
Third level of cache
Largest but slowest
Usually outside CPU cores and shared by all cores
Helps improve performance of L1 and L2 caches
Size ranges from 1MB to 128MB depending on the CPU.
Modern CPUs have on-chip L3 cache, earlier CPUs had it on the motherboard.
Cache miss/hit
If the requested data is found in the cache (at any level), it is called a hit.
Otherwise it is called a miss.
On a miss, the request moves on to the next level.
Hit ratio = Hits / (Hits + Misses)
Example:
Total memory accesses = 100, Hits = 80, Misses = 20.
Hit ratio = 80 / (20 + 80)
Hit ratio = 80 / 100
Hit ratio = 0.8
Miss ratio = 1 - Hit ratio
Miss ratio = 1 - 0.8
Miss ratio = 0.2
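The worked example above can be checked with two one-line functions. The function names are illustrative assumptions; the numbers are the ones used in the example (80 hits, 20 misses).

```python
def hit_ratio(hits, misses):
    """Hit ratio = hits / (hits + misses)."""
    return hits / (hits + misses)

def miss_ratio(hits, misses):
    """Miss ratio = misses / (hits + misses), which equals 1 - hit ratio."""
    return misses / (hits + misses)

print(hit_ratio(80, 20))   # 0.8
print(miss_ratio(80, 20))  # 0.2
```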
MEMORY HIERARCHY
The memory hierarchy organizes different types of memory into layers based on speed, size, and cost to
balance performance and efficiency in a computer system.
Register
It is the top layer in memory hierarchy.
Registers are very small memory units inside the CPU, each typically holding 32 or 64 bits of data.
They are the fastest and most expensive memory used in very small amounts for quick data access.
Types include:
o General purpose register
o Special purpose register etc.
Cache
Cache memory is built from SRAM (static RAM).
It is measured in KB or MB.
It stores frequently used data to speed up CPU tasks.
It is built into the CPU, making it a very fast internal memory.
It has three levels:
o L1 Cache
o L2 Cache
o L3 Cache
Main Memory
Main memory is built from DRAM (dynamic RAM).
It is the most commonly used memory.
It stores data and instructions that the CPU needs while working.
It is slower and cheaper than cache and is measured in GB.
RAM connects directly with the CPU and other devices.
Secondary Memory
It is the bottom layer.
Secondary memory is a non-volatile storage used to store data and programs permanently.
It has the largest storage capacity and is the slowest memory in the hierarchy but is also the cheapest.
It is measured in GB or TB
It is not directly accessed by the CPU; data must first be loaded into RAM before processing.
Examples
o Hard Disk Drives (HDD)
o Solid State Drives (SSD)
Why do we need a memory hierarchy?
Because of the very large gap between CPU speed and memory speed.
The CPU operates in nanoseconds (ns).
DRAM responds in tens to hundreds of nanoseconds.
Disk operates in milliseconds (ms).
Without a hierarchy, the CPU would be idle most of the time, waiting for data.
The memory hierarchy reduces waiting time using:
o Locality of reference
o Caching
o Multilevel memories
Registers are located inside the CPU.
Cache memory is located inside the CPU.
RAM is located near the CPU.
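One common way to quantify how caching reduces waiting time is the average memory access time: hit time + miss ratio x miss penalty. The sketch below assumes a 1 ns cache hit time and a 100 ns penalty for going to main memory; these latencies and the function name are illustrative assumptions, not measurements.

```python
def average_access_time_ns(hit_time_ns, miss_ratio, miss_penalty_ns):
    """Average access time = hit time + miss ratio x miss penalty."""
    return hit_time_ns + miss_ratio * miss_penalty_ns

# Assumed latencies: 1 ns cache hit, 100 ns extra on a miss to main memory.
with_cache = average_access_time_ns(1.0, miss_ratio=0.2, miss_penalty_ns=100.0)
no_cache = 100.0  # every access goes to main memory
print(with_cache, no_cache)
```

Under these assumed numbers, a cache with a miss ratio of 0.2 brings the average access time down from 100 ns to about 21 ns.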
INTERRUPT
An interrupt is a signal sent to the CPU by hardware or software indicating that immediate attention is
required. The CPU temporarily halts the current process, saves its state, and executes the interrupt service
routine.
Classes of Interrupt
1. Supervisor Call (Software Interrupt)
Generated intentionally by a program to request operating system services.
Example:
o System calls like reading from a file or printing to a screen.
2. I/O Interrupt
Generated by input/output devices to signal the CPU that an operation is complete or needs attention.
Example:
o Keyboard key press, disk read/write completion.
3. External Interrupt
Caused by external hardware devices outside the CPU.
Example:
o Timer interrupts or signals from peripheral devices.
4. Restart (Reset Interrupt)
Occurs when the system needs to restart or reset operations due to failures or manual reset.
Helps in reinitializing the system to a known state.
5. Program Check (Exceptions / Traps)
Generated by the CPU when a program executes an illegal or exceptional instruction.
Example:
o Division by zero, invalid memory access.
6. Machine Check (Hardware Error Interrupt)
Generated by the CPU or system hardware when a hardware failure or malfunction occurs.
Example:
o Parity error, bus error, or memory fault.
7. Timer Interrupt
Generated by a hardware timer at regular time intervals to notify the CPU.
Example:
o After a fixed time slice expires, the timer generates an interrupt so the OS can switch from
one process to another.
Procedure of Interrupt
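The device raises an interrupt. The processor finishes the current instruction, saves its state, and provides
the requested service by executing the appropriate interrupt service routine. It then restores the saved state
and resumes the interrupted program.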
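The procedure can be sketched as a loop that checks for a pending interrupt between instructions. Everything here is a simplified model: the `pending` mapping (number of completed instructions to device name), the handler table, and the log strings are hypothetical and stand in for real hardware signals.

```python
def run_with_interrupts(program, pending, handlers):
    """Execute program steps; between steps, service any pending interrupt."""
    log = []
    pc = 0
    while pc < len(program):
        log.append(f"run {program[pc]}")
        pc += 1
        if pc in pending:                         # a device raised an interrupt
            saved_pc = pc                         # save the current state
            log.append(handlers[pending[pc]]())   # execute the service routine
            pc = saved_pc                         # restore state and resume
    return log

handlers = {"keyboard": lambda: "ISR keyboard"}
log = run_with_interrupts(["I1", "I2", "I3"], {2: "keyboard"}, handlers)
print(log)
```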
I/O STRUCTURE
I/O (Input/Output) structure defines how a computer system communicates with external devices such as
keyboard, disk, printer, and network devices. It explains the interaction between CPU, main memory, and
I/O devices.
Since I/O devices are much slower than the CPU, efficient I/O mechanisms are required to prevent the CPU
from remaining idle and to improve overall system performance.
I/O Techniques
1. Programmed I/O (Polling)
In programmed I/O, the CPU actively checks the status of the I/O device in a loop to see whether the device
is ready.
CPU directly manages data transfer.
CPU wastes time if the device is slow.
Suitable for simple and low-speed devices.
2. Interrupt-Driven I/O
In interrupt-driven I/O, the I/O device sends an interrupt signal to the CPU when it is ready.
CPU can perform other tasks while waiting.
Reduces CPU waiting time.
More efficient than programmed I/O.
3. Direct Memory Access (DMA)
In DMA, data is transferred directly between memory and the I/O device without continuous CPU
involvement.
CPU only initializes the transfer.
DMA controller handles data movement.
Suitable for high-speed devices like disks and network interfaces.
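The cost of programmed I/O (technique 1 above) can be seen in a busy-wait sketch: every status check that finds the device busy is CPU time that interrupt-driven I/O or DMA would leave free for other work. The `Device` class and its tick counter are hypothetical stand-ins for a real device status register.

```python
class Device:
    """Hypothetical device that becomes ready after a fixed number of ticks."""

    def __init__(self, ready_after):
        self.ticks_left = ready_after

    def tick(self):
        if self.ticks_left > 0:
            self.ticks_left -= 1

    def ready(self):
        return self.ticks_left == 0

def programmed_io(device):
    """Poll the status flag in a loop; return how many checks found the device busy."""
    busy_checks = 0
    while not device.ready():
        busy_checks += 1      # CPU time spent only on checking
        device.tick()
    return busy_checks

print(programmed_io(Device(ready_after=5)))  # 5 wasted checks
```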
I/O Bus and Controllers
I/O Bus
o The I/O bus is a communication pathway that connects the CPU, memory, and I/O devices.
o Examples: PCIe, USB
I/O Controller
o An I/O controller is a hardware interface that manages data exchange between I/O devices
and CPU/memory.
o Examples: Disk controller, Network Interface Card (NIC)
I/O Structure Types
1. Memory-Mapped I/O
I/O devices share the same address space as memory.
CPU uses normal load/store instructions for I/O.
Simple and uniform design.
2. Isolated I/O (Port-Mapped I/O)
Separate address space is used for I/O devices.
Special instructions like IN and OUT are used.
Memory address space is not reduced.
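Memory-mapped I/O can be sketched as a bus that routes ordinary store operations to a device when the address falls in a reserved range. The `Bus` class, the 0xF000 boundary, and the list standing in for an output device are assumptions made for illustration; reads from the device region are not modelled.

```python
class Bus:
    """Toy memory-mapped bus: one address range is routed to a device register."""

    DEVICE_BASE = 0xF000          # hypothetical start of the I/O region

    def __init__(self):
        self.ram = {}
        self.device_output = []   # stands in for an output device

    def store(self, address, value):
        if address >= self.DEVICE_BASE:
            self.device_output.append(value)   # the write reaches the device
        else:
            self.ram[address] = value          # ordinary memory write

    def load(self, address):
        return self.ram.get(address, 0)        # memory reads only in this sketch

bus = Bus()
bus.store(0x0010, 42)     # normal memory
bus.store(0xF000, 65)     # same store operation, but it drives the device
print(bus.load(0x0010), bus.device_output)
```

In isolated I/O the second call would instead be a separate operation (such as OUT) using its own address space, so no memory addresses would be given up to devices.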
Performance Considerations
Efficient I/O structure improves system performance by considering:
Latency:
o Time delay between request and completion.
Throughput:
o Amount of data transferred per unit time.
CPU Utilization:
o CPU should remain busy and not wait unnecessarily for I/O operations.
PIPELINING
Pipelining is a technique in processor design where instruction execution is divided into multiple stages
(e.g., Fetch → Decode → Execute → Memory Access → Write Back), allowing multiple instructions to be
processed simultaneously in an assembly-line fashion.
Increases instruction throughput (number of instructions executed per unit time).
Does not reduce the execution time of a single instruction, but multiple instructions overlap in
execution.
Instead of completing one instruction at a time, the processor begins a new instruction before the
previous one finishes.
Example
Imagine making burgers:
1. One person grills the patty.
2. One adds vegetables.
3. One wraps the burger.
All workers work at the same time on different burgers.
More burgers made in less time.
Same idea in CPU pipelining.
Stages of a Typical Instruction Pipeline
1. Instruction Fetch (IF):
o Fetch instruction from memory.
2. Instruction Decode (ID):
o Decode instruction, read registers and identify operands.
3. Execute (EX):
o Perform the required operation in ALU.
4. Memory Access (MEM):
o Access memory if needed (load/store).
5. Write Back (WB):
o Store result in register.
How Pipeline Works?
Clock Cycle   Instruction 1   Instruction 2   Instruction 3
1             IF              -               -
2             ID              IF              -
3             EX              ID              IF
4             MEM             EX              ID
5             WB              MEM             EX
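The timing pattern in the table can be generated for any number of instructions. The sketch assumes an ideal pipeline with no stalls, in which n instructions on a k-stage pipeline finish in k + n - 1 cycles; the function name and the dictionary layout are illustrative choices.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: [stage of instr 1, stage of instr 2, ...]} for an ideal pipeline."""
    total_cycles = len(stages) + num_instructions - 1
    schedule = {}
    for cycle in range(1, total_cycles + 1):
        row = []
        for instr in range(num_instructions):
            stage_index = cycle - 1 - instr      # instruction i enters IF in cycle i + 1
            in_flight = 0 <= stage_index < len(stages)
            row.append(stages[stage_index] if in_flight else "-")
        schedule[cycle] = row
    return schedule

for cycle, row in pipeline_schedule(3).items():
    print(cycle, *row)
```

For three instructions this takes 7 cycles in total, compared with 15 cycles if each instruction had to finish all five stages before the next one started.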
Components
The components are given below:
1. Data In:
o This is the input data that enters the pipeline.
2. Stages:
o Each stage performs part of the operation. The pipeline is divided into multiple stages (S1,
S2, ..., Sm), and each stage handles a specific operation.
3. Registers:
o These are pipeline registers. They temporarily hold data between stages to ensure smooth
transfer and isolation between operations.
4. Computation Units:
o These perform the actual processing (like arithmetic or logical operations). Each computation
unit corresponds to a specific stage.
5. Control Unit:
o Manages the timing and control signals for each stage. Ensures that each stage operates in
sync and processes the correct data at the right time.
6. Data Out:
o The final output after processing is complete across all pipeline stages.
Hurdles in Pipeline Processor
1. Structural Hazards
Cause:
o Occur when hardware resources are insufficient to handle multiple instructions
simultaneously.
Example:
o If CPU has only 1 memory unit, simultaneous instruction fetch and data access may conflict.
Solution:
o Duplicate resources (separate instruction and data caches).
o Use efficient resource allocation.
2. Data Hazards
Cause: Occur when instructions depend on the results of previous instructions still in the pipeline.
Types of Data Hazards:
1. RAW (Read After Write):
Instruction needs a value that has not yet been written.
Example:
I1: ADD R1, R2, R3
I2: SUB R4, R1, R5 ← depends on R1 result
Solution: Forwarding/Bypassing, Pipeline stalls.
2. WAR (Write After Read)
Instruction writes to a register before a previous instruction has read it.
Rare in simple pipelines.
3. WAW (Write After Write)
Two instructions write to the same register, and the writes may complete in the wrong order.
Common in superscalar processors.
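The three dependence types can be detected by comparing the destination and source registers of two instructions. This sketch only classifies potential hazards; whether one causes a stall depends on the pipeline. The (dest, src1, src2) tuple format and the function name are assumptions made for illustration.

```python
def data_hazards(first, second):
    """Classify dependences between two instructions given as (dest, src1, src2)."""
    d1, *srcs1 = first
    d2, *srcs2 = second
    hazards = []
    if d1 in srcs2:
        hazards.append("RAW")   # second reads what first writes
    if d2 in srcs1:
        hazards.append("WAR")   # second writes what first reads
    if d1 == d2:
        hazards.append("WAW")   # both write the same register
    return hazards

i1 = ("R1", "R2", "R3")   # ADD R1, R2, R3
i2 = ("R4", "R1", "R5")   # SUB R4, R1, R5
print(data_hazards(i1, i2))  # ['RAW']
```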
3. Control Hazards (Branch Hazards)
Cause:
o Occur when the pipeline fetches the wrong instructions because the outcome of a branch
instruction (e.g., an if statement or a loop) is not yet known.
Problem:
o The next instruction address is not known until the branch is resolved.
Solution:
o Branch Prediction:
CPU guesses the outcome of the branch.
o Delayed Branching:
Execute next instruction regardless of branch.
o Speculative Execution:
Fetch and execute multiple possible paths.
4. Pipeline Stalls (Bubbles)
Definition:
o Idle cycles inserted into the pipeline to resolve hazards.
Problem:
o Reduce efficiency and throughput.
Solution:
o Advanced techniques like forwarding, prediction, and out-of-order execution.
Techniques to Overcome Pipeline Issues
Operand Forwarding (Bypassing):
o Use result directly from pipeline stage instead of waiting for write-back.
Dynamic Scheduling:
o Reorder instructions to avoid stalls.
Out-of-Order Execution:
o Execute independent instructions before dependent ones finish.
Superscalar Design:
o Multiple instructions issued per cycle.
Branch Prediction Algorithms:
o Static prediction, dynamic (history-based) prediction.
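As one example of dynamic (history-based) prediction, a 2-bit saturating counter is a widely used scheme; it is shown here as an illustration and is not named in these notes. The class name, the initial state, and the sample branch history are assumptions.

```python
class TwoBitPredictor:
    """Saturating counter: states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, staying within 0..3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)

loop_branch = [True] * 9 + [False]   # loop branch: taken 9 times, then falls through
print(accuracy(loop_branch))         # 0.9
```

The only misprediction in this history is the final loop exit, so nine of ten guesses are correct.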
EXCEPTION HANDLING
Exception handling in computer architecture refers to the mechanism by which a computer detects and
responds to abnormal or special conditions that occur during program execution. These conditions
interrupt the normal flow of instructions and require attention from the hardware or the OS.
What is exception?
It is an event that occurs during program execution that alters the normal sequence of instruction execution.
Example
Divide by zero
Accessing invalid memory
Page fault
Arithmetic overflow
Types of Exception Handling
1. Hardware Exceptions
Hardware exceptions occur due to hardware failures or malfunctions in the system.
Caused by faults in physical components.
Usually critical and may stop system execution.
Examples:
o Power failure
o Hardware malfunction
o Memory parity error
2. Software Exceptions
Software exceptions occur when a program executes an illegal or abnormal instruction.
Generated by the CPU during program execution.
Handled by the operating system.
Examples:
o Divide by zero
o Invalid opcode
o Arithmetic overflow
3. Interrupt Exceptions
Interrupt exceptions are generated by external events or devices to get CPU attention.
Occur asynchronously.
Used to handle I/O and timing events.
Examples:
o Keyboard input
o I/O device signals
o Timer interrupt
Purposes of Exception Handling
1. Error Detection
To identify abnormal conditions such as divide-by-zero, invalid memory access, or hardware faults.
2. System Stability
To prevent system crashes by handling errors in a controlled manner.
3. Error Recovery
To allow the system or program to recover from errors and continue execution when possible.
4. Resource Management
To enable the operating system to manage CPU, memory, and I/O devices efficiently.
5. Program Control
To transfer control to the operating system or exception handler for proper action.
6. Security and Protection
To prevent illegal operations and protect system resources from unauthorized access.
Synchronous Exceptions
These occur as a direct result of instruction execution, for example division by zero or page fault.
Asynchronous Exceptions
These occur independently of program execution, such as keyboard input or I/O device interrupts.
Exception Handling Process
The CPU detects the exception
The current program state is saved
Control is transferred to the exception handler
The exception is handled
Execution resumes or the program is terminated.
Exception Handler
An exception handler is a special routine (usually in the operating system) that determines the cause of the
exception and decides the appropriate response.
Interrupt Vector Table (IVT)
The IVT stores the addresses of exception and interrupt handlers, allowing the CPU to quickly locate the
correct handler.
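The lookup role of the IVT can be sketched as a table that maps a vector number to a handler routine. The vector numbers (0 and 32), the handler names, and the saved-state dictionary are hypothetical; real architectures define their own vector assignments and state layout.

```python
def divide_error_handler(state):
    return f"divide error at pc={state['pc']}: terminate program"

def timer_handler(state):
    return f"timer tick at pc={state['pc']}: resume"

# Hypothetical vector numbers; real architectures define their own.
VECTOR_TABLE = {0: divide_error_handler, 32: timer_handler}

def dispatch(vector, pc):
    """Save state, look up the handler in the table, and transfer control to it."""
    saved_state = {"pc": pc}            # save the current program state
    handler = VECTOR_TABLE[vector]      # locate the handler for this vector
    return handler(saved_state)         # run the handler

print(dispatch(0, pc=0x400))
```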
Exceptions vs Interrupts
Exceptions are caused by internal CPU events, while interrupts are caused by external hardware devices.
Importance
Exception handling ensures reliable execution, system protection, efficient resource management, and
smooth multitasking.
PARALLELISM
Parallelism is a technique of executing multiple instructions or operations at the same time to improve
performance, speed, and efficiency. Instead of relying on sequential execution, modern computers exploit
parallelism at different levels of hardware and software.
Need of Parallelism:
Increasing CPU clock speed has physical limits.
Applications require faster processing of large data.
Better utilization of available hardware resources.
Improve throughput and reduce execution time.
Level/Types of Parallelism:
1. Bit level (BLP)
Increasing the number of bits processed per instruction (a wider word size).
2. Instruction level (ILP)
Allows multiple instructions to be executed simultaneously within a single processor.
Pipelining
Superscalar Processors
Out-of-order Execution (OoO)
3. Data level (DLP):
Perform the same operation on multiple data items simultaneously.
SIMD
Vector Processors
GPUs (Graphics Processing Units)
Used in image processing, machine learning, and scientific computation.
4. Task level (TLP)
Execute multiple independent tasks or threads at the same time.
Multicore Processors
Multithreading
Multiprocessor Systems
Parallel Processing Architectures:
Multicore
Multiprocessors
Distributed Systems
GPU
Advantages of Parallelism
Faster execution
Higher throughput
Efficient resource utilization
Scalability
Supports modern applications (AI, big data)
Challenges of Parallelism
Complex hardware design
Synchronization and communication overhead
Load balancing issues
Difficult programming models
Increased cost and power consumption
Importance in Modern Computer Systems
Parallelism is fundamental in:
High-performance computing
Cloud and data centers
Artificial intelligence and deep learning
Real-time and embedded systems
MULTIPROCESSOR
A multiprocessor is a computer system with two or more CPUs that work together to share the workload.
Its main aim is to increase processing speed and overall performance. Multiprocessors are widely used in
servers and high-performance systems.
Example
A server handling many users at the same time.
Modern laptops and smartphones with multicore processors.
How it works?
Tasks are divided into smaller parts.
Each processor works on a part simultaneously.
Results are combined to complete the job faster.
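The three steps above (divide, work simultaneously, combine) can be sketched with a pool of workers. Threads stand in for processors here; in CPython this shows the structure of the approach rather than a real speedup for CPU-bound work. The function name and the chunking scheme are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(numbers, workers=4):
    """Split the list into chunks, sum each chunk on a worker, combine the results."""
    chunk = max(1, (len(numbers) + workers - 1) // workers)
    parts = [numbers[i:i + chunk] for i in range(0, len(numbers), chunk)]  # divide
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, parts))                          # work in parallel
    return sum(partial_sums)                                               # combine

print(parallel_sum(list(range(1, 101))))  # 5050
```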
Types
1. Based on Control
There are two types of multiprocessor based on control.
I. Symmetric Multiprocessors (SMP)
All processors are equal.
Share the same memory.
Same OS runs on each CPU.
Processors work together on same task.
Better load balancing.
Example: UNIX systems.
II. Asymmetric Multiprocessors (AMP)
One processor acts as master while other processors act as slaves.
Master processor controls task allocation.
Each processor performs specific task.
Simple and low cost.
Used in embedded systems.
Example: Old mainframe system
2. Based on Memory
There are two types of multiprocessor based on memory.
I. Shared Memory Multiprocessor
All processors use a single common main memory.
Any CPU can directly access shared data.
Communication between processors is fast.
Performance may reduce under heavy memory access.
Example: Multi-core computers.
II. Distributed Memory Multiprocessor
Each processor has its own private memory.
CPUs communicate through message passing.
Suitable for large-scale systems.
Memory access time is not uniform.
Example: Computer clusters.
Interconnection Methods
Interconnection models describe how processors, memory, and I/O devices are connected and how they
communicate with each other in a multiprocessor system. The main models are described below:
1. Time Shared Common Bus
All processors share a single common bus to communicate with memory.
One common bus connects all CPUs and memory.
Only one processor can use the bus at a time.
Other processors must wait.
Simple and low cost.
Slow data transfer.
Example: Small shared memory multiprocessors.
2. Multiport Memories
Memory provides separate ports for processors.
Many processors access the same memory simultaneously.
Each processor is connected to the memory through its own port.
Higher cost than common bus.
Improves parallel processing.
Priority resolves memory conflicts.
Example: Cache memory shared by multiple CPUs.
3. Crossbar Switch Network
Connect multiple processors to multiple memory modules.
Supports parallel CPU–memory access.
Each processor can connect to any memory module.
Multiple connections can occur at the same time.
High data transfer rate.
Very flexible architecture.
Complex and expensive.
Example: Supercomputers
Memory Access Models
1. Shared Memory (UMA)
All CPUs share one memory.
Equal access time.
Easy to program.
Limited scalability.
Suitable for small systems.
Example: Small shared memory multiprocessor systems.
2. Distributed Memory (NUMA)
Each CPU has local memory.
Access time varies by location.
Faster local access.
Scalable design.
Used in high-performance computing.
Example: Large servers
Applications
SISD (Single Instruction, Single Data).
o Early computers
SIMD (Single Instruction, Multiple Data).
o GPU
MISD (Multiple Instruction, Single Data).
o Pipelined systems
MIMD (Multiple Instruction, Multiple Data).
o Multicore servers