Introduction: Memory Systems
Memory systems in computer organization refer to the storage hierarchy used to store data and
instructions required for program execution. Since different types of memory vary in speed, cost,
and capacity, a hierarchical arrangement is used to achieve an optimal balance between
performance and cost. The memory hierarchy typically includes registers, cache memory, main
memory (RAM), and secondary storage. Registers are the fastest and smallest storage units located
inside the CPU, while secondary storage devices such as hard disks provide large capacity but
slower access.
Cache memory plays a crucial role in improving system performance by reducing the average
memory access time. It is a small, high-speed memory located between the CPU and main memory.
Frequently accessed data and instructions are stored in the cache, thereby minimizing the need to
access slower main memory. Cache performance is described in terms of hits, misses, and the
hit ratio (the fraction of accesses served from the cache). Different cache mapping techniques,
including direct mapping, associative mapping,
and set-associative mapping, determine how data is placed in the cache.
Another important concept in memory systems is virtual memory, which allows the execution of
large programs that may not fit entirely into main memory. It provides an illusion of a large
memory by using secondary storage as an extension of RAM. Virtual memory uses techniques
such as paging and segmentation to manage memory efficiently. Overall, the memory system is
designed to ensure high speed, efficient utilization, and seamless data access for the CPU.
Introduction: Input/Output (I/O) Systems
Input/Output systems are responsible for communication between the computer system and
external devices such as keyboards, monitors, printers, and storage devices. Since I/O devices
operate at different speeds compared to the CPU, special mechanisms are required to ensure
efficient data transfer. The I/O system consists of I/O devices, I/O interfaces, and I/O controllers,
which manage the flow of data between the CPU and peripherals.
There are three main techniques for performing I/O operations. In Programmed I/O, the CPU
continuously checks the status of the device and directly controls data transfer, which leads to
inefficient CPU utilization. In Interrupt-driven I/O, the CPU initiates the I/O operation and
continues executing other tasks until the device sends an interrupt signal indicating completion.
This method improves efficiency compared to programmed I/O. For bulk transfers, the most
efficient technique is
Direct Memory Access (DMA), where data is transferred directly between memory and the I/O
device without continuous CPU involvement. DMA significantly reduces CPU overhead and
increases system performance.
Interrupts are an essential part of I/O systems, allowing devices to signal the CPU when they
require attention. Interrupts are classified as maskable or non-maskable, depending on
whether the CPU can temporarily disable (mask) them. Proper handling of interrupts ensures smooth and
efficient operation of the system.
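The difference between maskable and non-maskable interrupts can be sketched with a toy model. The class name InterruptController, its methods, and the line names are hypothetical and chosen only for illustration; real hardware usually keeps a masked request pending rather than discarding it, which this sketch does not model.

```python
class InterruptController:
    """Toy model: tracks which interrupt lines are currently masked."""

    def __init__(self):
        self.masked_lines = set()

    def mask(self, line):
        self.masked_lines.add(line)

    def unmask(self, line):
        self.masked_lines.discard(line)

    def is_delivered(self, line, non_maskable=False):
        """Return True if the CPU would be interrupted by this request."""
        if non_maskable:
            return True  # a non-maskable interrupt cannot be disabled
        return line not in self.masked_lines


controller = InterruptController()
controller.mask("keyboard")
controller.mask("power_fail")

print(controller.is_delivered("keyboard"))                       # False
print(controller.is_delivered("timer"))                          # True
print(controller.is_delivered("power_fail", non_maskable=True))  # True
```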
Memory Systems
A computer system consists of:
CPU (processing)
Memory system (storage)
I/O system (interaction with external devices)
The performance of a computer depends heavily on how efficiently the CPU communicates with
memory and I/O devices.
A memory system stores data and instructions required by the CPU for execution. In a computer
system, different types of memory are used because no single memory can satisfy all the following
at the same time:
Very fast
Very large capacity
Very low cost
To overcome this limitation, computer systems use a Memory Hierarchy, which organizes memory
into multiple levels based on speed, cost, and capacity.
Key Objectives of Memory Hierarchy
Reduce average memory access time
Bridge the speed gap between CPU and memory
Provide large memory capacity at low cost
Memory Hierarchy
Memory hierarchy is the organization of storage levels in a computer system such that the fastest,
smallest, and most expensive memory is closest to the CPU, while slower, larger, and cheaper
memory is farther away. In simple terms, it organizes memory so that:
Frequently used data is kept in faster, smaller memory
Less frequently used data is kept in slower, larger memory
General Hierarchy (Top → Bottom)
1. Registers
2. Cache Memory
3. Main Memory (RAM)
4. Secondary Memory (Disk/SSD, which also backs virtual memory)
Characteristics
Upper levels: Faster, smaller, expensive
Lower levels: Slower, larger, cheaper
Relies on Locality of Reference
Memory isn't a single entity; it’s a system designed to balance cost, capacity, and speed.
Registers: Fastest, smallest, located inside the CPU.
Cache (L1, L2, L3): High-speed SRAM; stores frequently used data to reduce access time.
Main Memory (RAM): DRAM; the primary workspace for the CPU.
Secondary Storage: HDD/SSD; non-volatile and large capacity.
Memory hierarchy works efficiently because of the Principle of Locality.
Types of Locality
1. Temporal Locality
o Recently accessed data is likely to be accessed again
o Example: Loop variables
2. Spatial Locality
o Nearby memory locations are likely to be accessed
o Example: Arrays
Cache memory exploits both types of locality.
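A rough way to see why spatial locality matters is to count how many cache-sized blocks an access pattern touches. The sketch below assumes a 64-byte block and 4-byte array elements; both values and the function name blocks_touched are illustrative assumptions, not figures from these notes.

```python
def blocks_touched(addresses, block_size=64):
    """Return how many distinct blocks of block_size bytes the addresses fall into."""
    return len({address // block_size for address in addresses})


# 16 consecutive 4-byte array elements: byte addresses 0, 4, ..., 60.
sequential = [i * 4 for i in range(16)]

# 16 elements spaced 256 bytes apart: byte addresses 0, 256, 512, ...
strided = [i * 256 for i in range(16)]

print(blocks_touched(sequential))  # 1
print(blocks_touched(strided))     # 16
```

Under these assumptions the sequential scan needs one block to be brought into the cache, while the strided scan needs a different block for every element.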
Cache Memory: The Speed Demon
Cache memory is a small, high-speed storage area located directly on or very close to the CPU. It
utilizes Static RAM (SRAM), which is significantly faster than the DRAM used in main memory.
Purpose: To bridge the "processor-memory gap." CPUs operate at GHz speeds, while main
memory responds far more slowly. Cache holds the data the CPU is likely to need next.
Principle of Locality:
o Temporal Locality: If data is used once, it will likely be used again soon
(e.g., a loop counter).
o Spatial Locality: If data is used, data at nearby addresses will likely be used soon
(e.g., an array).
Levels: Most modern CPUs use a multi-level approach (L1, L2, and L3) to balance speed
and size.
Cache Hit → Data found in cache → Fast access
Cache Miss → Data fetched from main memory → Slower access
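The effect of hits and misses on speed is commonly summarized as: average access time = hit time + miss rate × miss penalty. The sketch below applies that formula. The numbers (950 hits, 50 misses, a 2-cycle hit time, a 100-cycle miss penalty) are made-up values for illustration only.

```python
def hit_ratio(hits, misses):
    """Fraction of accesses that were served from the cache."""
    return hits / (hits + misses)


def average_access_time(hit_time, miss_penalty, ratio):
    """Average access time = hit time + miss rate * miss penalty."""
    miss_rate = 1.0 - ratio
    return hit_time + miss_rate * miss_penalty


# Hypothetical workload: 1000 accesses, 950 of them hits.
ratio = hit_ratio(hits=950, misses=50)
amat = average_access_time(hit_time=2, miss_penalty=100, ratio=ratio)

print(f"hit ratio = {ratio:.2f}, average access time = {amat:.1f} cycles")
```

With these assumed values the average comes out at about 7 cycles, far closer to the cache's speed than to main memory's.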
Main Memory (RAM)
Main Memory is the "Primary Memory" of the system, typically implemented using Dynamic
RAM (DRAM).
Function: It stores the programs and data currently being executed by the CPU. While much
slower than cache, it is much faster than secondary storage (SSD/HDD).
Characteristics: It is volatile, meaning it loses its contents when power is removed. It is
byte-addressable, allowing the CPU to access specific pieces of data directly.
Virtual Memory: The Great Illusion
Virtual Memory is a memory management technique that provides an "idealized" view of storage
to a process. It makes a process think it has a contiguous, private block of memory, even if that
memory is fragmented across Physical RAM and the Hard Disk.
The Mechanism: It uses a hardware unit called the Memory Management Unit (MMU) to
map Virtual Addresses (used by the software) to Physical Addresses (the actual hardware
location).
Paging: Virtual memory is divided into fixed-size blocks called "pages," and physical RAM
into frames of the same size. When the RAM is full, the OS moves inactive pages to the disk
(swap space) to free up room for active ones.
Key Benefit: It allows systems to run programs larger than the actual physical RAM
available.
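The address translation described above can be sketched in a few lines. This is a simplified model, not how any particular MMU is implemented: the 4096-byte page size, the contents of page_table, and the names translate and PageFault are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical page table: virtual page number -> physical frame number.
# None marks a page that is currently not resident in RAM.
page_table = {0: 5, 1: 2, 2: None}


class PageFault(Exception):
    """Raised when the requested page is not in RAM."""


def translate(virtual_address):
    """Map a virtual address to a physical address using the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(f"page {page} is not in RAM")
    return frame * PAGE_SIZE + offset


# Virtual address 4100 lies in page 1 at offset 4; page 1 maps to frame 2.
print(translate(4100))  # 8196
```

Only the page number is translated; the offset within the page is carried over unchanged. An address in page 2 would raise PageFault here, which is the point at which a real OS would load the page from disk.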
Summary Comparison Table
Feature      Cache Memory      Main Memory (RAM)        Virtual Memory
Technology   SRAM              DRAM                     Hardware + Software (Disk-backed)
Size         KBs to MBs        GBs                      Theoretically limited by CPU bit-width
Latency      1-10 cycles       approx. 100-200 cycles   Milliseconds (if a "Page Fault" occurs)
Managed by   Hardware (CPU)    Hardware/OS              OS and MMU
When the CPU needs a piece of data, the following sequence occurs:
1. Check Cache: If found (Cache Hit), the CPU processes it immediately.
2. Check RAM: If not in cache (Cache Miss), the CPU looks in Main Memory.
3. Check Disk: If the page holding the address isn't in RAM, a Page Fault occurs. The OS fetches the required
page from the Disk, places it in RAM, and updates the Page Table.
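The three steps above can be sketched as a single lookup function. This is a deliberately simplified model: each level is a plain dictionary keyed by address, whereas real systems move whole cache blocks and pages and consult a page table. The function name read and the sample addresses are assumptions for illustration.

```python
def read(address, cache, ram, disk):
    """Return (value, outcome) for one read, filling the faster levels on the way."""
    if address in cache:
        return cache[address], "cache hit"
    if address in ram:
        cache[address] = ram[address]   # copy into the cache for next time
        return ram[address], "cache miss"
    ram[address] = disk[address]        # page fault: bring the data into RAM
    cache[address] = ram[address]
    return ram[address], "page fault"


cache = {}
ram = {0x10: "a"}
disk = {0x10: "a", 0x20: "b"}

outcomes = [read(addr, cache, ram, disk)[1] for addr in (0x10, 0x10, 0x20, 0x20)]
print(outcomes)  # ['cache miss', 'cache hit', 'page fault', 'cache hit']
```

The second access to each address is a hit because the first access left a copy in the cache, which is temporal locality at work.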
Cache Mapping Techniques
Cache mapping techniques define how blocks of main memory are placed into cache memory.
Since cache is much smaller than main memory, an efficient mapping technique is required to
decide the location of data in cache. The three primary mapping techniques are Direct Mapping,
Associative Mapping, and Set-Associative Mapping, each offering different trade-offs between
speed, cost, and flexibility.
Direct Mapping
In direct mapping, each block of main memory is mapped to exactly one specific cache line. The
mapping is performed using a simple modulo operation, where the cache line number is obtained
by dividing the main memory block number by the total number of cache lines and taking the
remainder. This technique is very simple and allows fast access because the cache line is
predetermined. However, it suffers from a major drawback known as conflict miss, where multiple
memory blocks compete for the same cache line.
For example, suppose a cache has 8 lines. If we want to map main memory block 10, the cache
line is calculated as: 10 mod 8 = 2. So, block 10 will be stored in cache line 2. Similarly, block 18
will also map to line 2 (18 mod 8 = 2), causing replacement of block 10. This frequent replacement
reduces cache efficiency, especially when such blocks are repeatedly accessed.
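The example above can be reproduced with a small simulation of an 8-line direct-mapped cache. The function name access_direct is a hypothetical helper; the cache stores only block numbers, not data.

```python
NUM_LINES = 8
cache_lines = [None] * NUM_LINES  # each entry holds the block number stored in that line


def access_direct(block):
    """Access a memory block; return (cache line, whether it was a hit)."""
    line = block % NUM_LINES
    hit = cache_lines[line] == block
    if not hit:
        cache_lines[line] = block  # replace whatever was in the line
    return line, hit


# Blocks 10 and 18 both map to line 2 and keep evicting each other.
results = [access_direct(block) for block in (10, 18, 10)]
print(results)  # [(2, False), (2, False), (2, False)]
```

Every access misses even though the other seven lines are empty, which is exactly the conflict-miss behavior described above.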
Associative Mapping (Fully Associative)
In associative mapping, a block of main memory can be placed in any cache line, providing
maximum flexibility. Unlike direct mapping, there is no fixed position for a memory block.
Instead, the cache controller searches all cache lines simultaneously using associative (parallel)
comparison to find a match. This eliminates conflict misses and improves cache utilization.
For example, if a cache has 8 lines and we want to store block 10, it can be placed in any of the 8
lines. If block 18 also needs to be stored, it can be placed in another free line instead of replacing
block 10. However, when the cache becomes full, a replacement policy such as LRU (Least
Recently Used) is applied to decide which block to remove. Although associative mapping
provides better performance, it is expensive to implement due to the need for complex hardware
for parallel searching.
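A minimal sketch of a fully associative cache with LRU replacement follows. The class name FullyAssociativeCache is an assumption for illustration, and a dictionary lookup stands in for the parallel hardware comparison; only the placement and replacement behavior is modeled.

```python
from collections import OrderedDict


class FullyAssociativeCache:
    """Any block may occupy any line; the least recently used block is evicted."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # block numbers, ordered from least to most recently used

    def access(self, block):
        """Access a block; return True on a hit, False on a miss."""
        if block in self.lines:
            self.lines.move_to_end(block)    # mark as most recently used
            return True
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)   # evict the least recently used block
        self.lines[block] = None
        return False


cache = FullyAssociativeCache(num_lines=8)
hits = [cache.access(block) for block in (10, 18, 10, 18)]
print(hits)  # [False, False, True, True]
```

Blocks 10 and 18 now coexist, so after the two initial misses the repeated accesses hit.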
Set-Associative Mapping
Set-associative mapping is a compromise between direct and associative mapping. In this
technique, the cache is divided into several sets, and each set contains a fixed number of cache
lines. A memory block is first mapped to a specific set using a modulo operation, but within that
set, it can be placed in any available line. This reduces conflict misses while keeping the hardware
complexity manageable.
For example, consider a cache with 8 lines organized as a 2-way set-associative cache. This means
there are 4 sets, each containing 2 lines. If we want to map block 10, the set number is calculated
as 10 mod 4 = 2. Therefore, block 10 can be placed in either of the two lines in set 2. If another
block, say block 14 (14 mod 4 = 2), also maps to set 2, it can occupy the second line instead of
replacing block 10 immediately. Only when both lines are full will a replacement policy be applied.
This approach significantly reduces conflict misses compared to direct mapping and is widely used
in modern processors.
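The 2-way example above can be sketched as follows, using LRU as the replacement policy within each set. The helper name access_set_associative and the extra accesses to blocks 18 and 14 at the end of the trace are additions for illustration.

```python
from collections import OrderedDict

NUM_SETS = 4
WAYS = 2
# One ordered dict per set, ordered from least to most recently used block.
sets = [OrderedDict() for _ in range(NUM_SETS)]


def access_set_associative(block):
    """Access a block; return (set index, whether it was a hit)."""
    index = block % NUM_SETS
    lines = sets[index]
    if block in lines:
        lines.move_to_end(block)      # mark as most recently used
        return index, True
    if len(lines) >= WAYS:
        lines.popitem(last=False)     # set is full: evict its least recently used block
    lines[block] = None
    return index, False


trace = [access_set_associative(block) for block in (10, 14, 10, 18, 14)]
print(trace)
# [(2, False), (2, False), (2, True), (2, False), (2, False)]
```

Blocks 10 and 14 share set 2 without conflict, so the second access to block 10 hits. Block 18 also maps to set 2, and since both lines are full it evicts block 14, the least recently used entry.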
Associative Memory (Content Addressable Memory - CAM)
Associative Memory, also known as Content Addressable Memory (CAM), is a special type of
memory that is accessed based on content rather than address. Unlike conventional memory where
we provide an address to retrieve data, in CAM we provide the data (or part of it), and the memory
searches all stored entries simultaneously to find a match. This parallel searching capability makes
CAM extremely fast and efficient for applications requiring quick lookups.
For example, consider a CAM storing multiple data words such as 1010, 1100, and 1111. If we
input the value 1100, the CAM will compare it with all stored entries at once and immediately
return the matching location. This eliminates the need for sequential searching, which is common
in regular memory. CAM is widely used in cache memory (for tag comparison), networking
devices (like routers for routing tables), and TLB (Translation Lookaside Buffer). However, due
to its parallel comparison hardware, CAM is expensive and consumes more power, which limits
its size and usage.
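The lookup above can be sketched in software, with the caveat that a loop compares entries one after another while real CAM hardware compares all of them at once. The function name cam_search and the mask parameter (used to match on only part of the word) are assumptions for illustration.

```python
stored_words = [0b1010, 0b1100, 0b1111]


def cam_search(key, mask=0b1111):
    """Return the locations of all stored words that match key on the bits set in mask."""
    return [
        location
        for location, word in enumerate(stored_words)
        if ((word ^ key) & mask) == 0
    ]


print(cam_search(0b1100))               # [1]
print(cam_search(0b1100, mask=0b1100))  # [1, 2]
```

The first search matches the full word 1100 at location 1. The second compares only the two high-order bits, so both 1100 and 1111 match.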
Overview of I/O Interfaces
I/O (Input/Output) interfaces are the communication links between the CPU and external devices
such as keyboards, printers, disks, and network devices. Since these devices operate at different
speeds and data formats compared to the CPU, an I/O interface acts as a bridge to ensure proper
data transfer and synchronization.
An I/O interface typically consists of data registers, control registers, and status registers. The CPU
communicates with the I/O device by sending commands through control registers and checking
device status through status registers. There are different methods of data transfer using I/O
interfaces, such as Programmed I/O, where the CPU actively waits for the device, and Interrupt-
driven I/O, where the device interrupts the CPU when it is ready. For example, when a user types
on a keyboard, the keystroke is sent through the I/O interface to the CPU, which processes it and
may display it on the screen.
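Programmed I/O through status and data registers can be sketched with a simulated device. Everything here is hypothetical: the SimulatedDevice class, its READY bit, and the assumption that the device needs three polls before each byte becomes available.

```python
class SimulatedDevice:
    """Toy device exposing a status register and a data register."""

    READY = 0x01  # assumed status bit: data register holds a valid byte

    def __init__(self, data, polls_until_ready=3):
        self._data = list(data)
        self._delay = polls_until_ready
        self._countdown = polls_until_ready

    def read_status(self):
        if not self._data:
            return 0
        if self._countdown > 0:
            self._countdown -= 1      # device is still busy
            return 0
        return self.READY

    def read_data(self):
        self._countdown = self._delay  # device becomes busy again
        return self._data.pop(0)


def programmed_io_read(device, count):
    """Busy-wait on the status register; count must not exceed the data the device holds."""
    received, polls = [], 0
    while len(received) < count:
        polls += 1
        if device.read_status() & SimulatedDevice.READY:
            received.append(device.read_data())
    return received, polls


received, polls = programmed_io_read(SimulatedDevice(b"OK"), count=2)
print(received, polls)  # [79, 75] 8
```

Six of the eight polls return "not ready". That wasted work is what interrupt-driven I/O avoids by letting the device notify the CPU instead.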
I/O interfaces play a crucial role in ensuring efficient communication between the system and
peripherals, handling issues like speed mismatch, data conversion, and device control.
Direct Memory Access (DMA)
Direct Memory Access (DMA) is a technique that allows data transfer between I/O devices and
main memory without continuous involvement of the CPU. In traditional I/O operations, the CPU
is responsible for transferring each byte of data, which consumes significant processing time.
DMA overcomes this limitation by using a DMA controller, which takes control of the system bus
to perform data transfer directly.
For example, when a large file is being read from a disk, instead of the CPU transferring each byte,
the DMA controller directly moves the data from the disk to main memory. The CPU only
initializes the DMA operation by providing details such as memory address, data size, and
direction of transfer. Once the transfer is complete, the DMA controller sends an interrupt to the
CPU to indicate completion.
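The setup-transfer-interrupt sequence above can be sketched as follows. The DMAController class and its start method are hypothetical, and the copy runs synchronously here, whereas a real controller transfers data while the CPU continues with other work.

```python
class DMAController:
    """Toy DMA controller that copies a block into memory, then signals completion."""

    def __init__(self, memory):
        self.memory = memory

    def start(self, source, dest_address, size, on_complete):
        """Copy size bytes from source to memory at dest_address.

        The source must hold at least size bytes. on_complete stands in for
        the completion interrupt sent to the CPU.
        """
        self.memory[dest_address:dest_address + size] = source[:size]
        on_complete(size)


memory = bytearray(16)        # simulated main memory
disk_block = b"DATA"          # simulated data coming from the disk
completed_transfers = []      # records each "interrupt" the CPU receives

dma = DMAController(memory)
# The CPU only supplies the parameters; it does not move the bytes itself.
dma.start(disk_block, dest_address=4, size=4, on_complete=completed_transfers.append)

print(bytes(memory[4:8]))     # b'DATA'
print(completed_transfers)    # [4]
```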
DMA significantly improves system performance by reducing CPU overhead and letting the CPU
execute other work while the transfer proceeds. It is commonly used in high-speed data transfer
devices such as disk drives, graphics
cards, and network interfaces.