Computer Architecture Lec 7
Internal Memory
The basic element of a semiconductor memory is the memory cell. Although a
variety of electronic technologies are used, all semiconductor memory cells share
certain properties:
• They exhibit two stable (or semistable) states, which can be used to
represent binary 1 and 0.
• They are capable of being written into (at least once), to set the state.
• They are capable of being read, to sense the state.
The cell has three functional terminals capable of carrying an electrical signal.
The select terminal, as the name suggests, selects a memory cell for a read or write
operation.
The control terminal indicates read or write. For writing, the other terminal
provides an electrical signal that sets the state of the cell to 1 or 0. For reading,
that terminal is used for output of the cell’s state.
For our purposes, we will take it as given that individual cells can be selected for
reading and writing operations.
DRAM and SRAM
All of the memory types that we will explore in this lecture are random access. That
is, individual words of memory are directly accessed through wired-in addressing
logic.
The most common semiconductor memory is referred to as random-access memory
(RAM). This is, of course, a misuse of the term, because all of the memory types
considered here are random access. One distinguishing characteristic of
RAM is that it is possible both to read data from the memory and to write new data
into the memory easily and rapidly. Both the reading and writing are accomplished
through the use of electrical signals.
The other distinguishing characteristic of RAM is that it is volatile. A RAM must be
provided with a constant power supply. If the power is interrupted, then the data
are lost. Thus, RAM can be used only as temporary storage. The two traditional
forms of RAM used in computers are DRAM and SRAM.
DYNAMIC RAM: RAM technology is divided into two technologies: dynamic and
static. A dynamic RAM (DRAM) is made with cells that store data as charge on
capacitors. The presence or absence of charge in a capacitor is interpreted as a
binary 1 or 0.
Because capacitors have a natural tendency to discharge, dynamic RAMs require
periodic charge refreshing to maintain data storage. The term dynamic refers to this
tendency of the stored charge to leak away, even with power continuously applied.
Figure 7.1a is a typical DRAM structure for an individual cell that stores 1 bit.
The address line is activated when the bit value from this cell is to be read or written.
The transistor acts as a switch that is closed (allowing current to flow) if a voltage is
applied to the address line and open (no current flows) if no voltage is present on
the address line.
For the write operation, a voltage signal is applied to the bit line; a high voltage
represents 1, and a low voltage represents 0. A signal is then applied to the address
line, allowing a charge to be transferred to the capacitor.
For the read operation, when the address line is selected, the transistor turns on and
the charge stored on the capacitor is fed out onto a bit line and to a sense amplifier.
The sense amplifier compares the capacitor voltage to a reference value and
determines if the cell contains a logic 1 or a logic 0. The readout from the cell
discharges the capacitor, which must be restored to complete the operation.
Figure 7.1 typical memory cell structures
Although the DRAM cell is used to store a single bit (0 or 1), it is essentially an analog
device. The capacitor can store any charge value within a range; a threshold value
determines whether the charge is interpreted as 1 or 0.
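The leak, threshold, and restore behaviour described above can be sketched as a toy model. This is only an illustration: the class name DramCell and the constants FULL, THRESHOLD, and LEAK are assumptions chosen for the example, not values of any real device.

```python
class DramCell:
    """Toy model of a one-transistor, one-capacitor DRAM cell (illustrative only)."""

    FULL = 1.0        # assumed normalized charge for a stored 1
    THRESHOLD = 0.5   # assumed reference level used by the sense amplifier
    LEAK = 0.9        # assumed fraction of charge remaining after one time step

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        # The bit line is driven high or low while the address line is active.
        self.charge = self.FULL if bit else 0.0

    def tick(self, steps=1):
        # Charge leaks away even with power continuously applied.
        self.charge *= self.LEAK ** steps

    def read(self):
        # Sensing compares the charge with the reference value. The readout
        # drains the capacitor, so the sensed bit is written back (restore).
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.charge = 0.0
        self.write(bit)
        return bit

    def refresh(self):
        # In this model a refresh is simply a read with its built-in restore.
        self.read()


cell = DramCell()
cell.write(1)
cell.tick(3)              # 0.9**3 = 0.729, still above the threshold
refreshed = cell.read()   # senses 1 and restores the full charge
cell.tick(10)             # 0.9**10 is about 0.349, below the threshold
lost = cell.read()        # without a timely refresh the stored 1 reads as 0
```

In this sketch the value of refreshed is 1 and the value of lost is 0, which is why the refresh interval must be shorter than the time the charge takes to fall below the threshold.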
STATIC RAM: In contrast, a static RAM (SRAM) is a digital device that uses the same
logic elements used in the processor. In an SRAM, binary values are stored using
traditional flip-flop logic-gate configurations.
A static RAM will hold its data as long as power is supplied to it. Figure 7.1b is a
typical SRAM structure for an individual cell. Four transistors (T1, T2, T3, T4) are cross
connected in an arrangement that produces a stable logic state.
In logic state 1, point C1 is high and point C2 is low; in this state, T1 and T4 are off
and T2 and T3 are on. In logic state 0, point C1 is low and point C2 is high; in this
state, T1 and T4 are on and T2 and T3 are off. Both states are stable as long as the
direct current (dc) voltage is applied. Unlike the DRAM, no refresh is needed to
retain data.
As in the DRAM, the SRAM address line is used to open or close a switch. The
address line controls two transistors (T5 and T6). When a signal is applied to this line,
the two transistors are switched on, allowing a read or write operation. For a write
operation, the desired bit value is applied to line B, while its complement is applied
to line B̄ (the complement of B). This forces the four transistors (T1, T2, T3, T4) into
the proper state.
For a read operation, the bit value is read from line B.
SRAM VERSUS DRAM: Both static and dynamic RAMs are volatile; that is, power must
be continuously supplied to the memory to preserve the bit values. A dynamic
memory cell is simpler and smaller than a static memory cell. Thus, a DRAM is more
dense (smaller cells mean more cells per unit area) and less expensive than a
corresponding SRAM. On the other hand, a DRAM requires the supporting refresh
circuitry.
For larger memories, the fixed cost of the refresh circuitry is more than
compensated for by the smaller variable cost of DRAM cells. Thus, DRAMs tend to be
favored for large memory requirements. A final point is that SRAMs are generally
somewhat faster than DRAMs. Because of these relative characteristics, SRAM is
used for cache memory (both on and off chip), and DRAM is used for main memory.
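The cost argument above can be made concrete with a small calculation. The figures below (SRAM_CELL, DRAM_CELL, REFRESH_FIXED) are hypothetical cost units chosen only to show the shape of the trade-off; they are not real prices.

```python
# Hypothetical cost units, assumed for illustration only.
SRAM_CELL = 6.0         # variable cost of one SRAM cell
DRAM_CELL = 1.0         # variable cost of one DRAM cell
REFRESH_FIXED = 1000.0  # fixed cost of the DRAM refresh circuitry


def total_cost(cells, cost_per_cell, fixed_cost=0.0):
    """Fixed cost plus a per-cell variable cost."""
    return fixed_cost + cells * cost_per_cell


def break_even_cells():
    """Memory size at which SRAM and DRAM cost the same in this model."""
    return REFRESH_FIXED / (SRAM_CELL - DRAM_CELL)


small = (total_cost(100, SRAM_CELL), total_cost(100, DRAM_CELL, REFRESH_FIXED))
large = (total_cost(1000, SRAM_CELL), total_cost(1000, DRAM_CELL, REFRESH_FIXED))
# small -> (600.0, 1100.0): SRAM is cheaper for the small memory
# large -> (6000.0, 2000.0): DRAM is cheaper for the large memory
```

With these assumed numbers the break-even point is 200 cells; beyond it the smaller per-cell cost of DRAM outweighs the fixed refresh cost.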
Types of ROM
As the name suggests, a read-only memory (ROM) contains a permanent pattern of
data that cannot be changed. A ROM is nonvolatile; that is, no power source is
required to maintain the bit values in memory. While it is possible to read a ROM, it
is not possible to write new data into it. An important application of ROMs is
microprogramming.
For a modest-sized requirement, the advantage of ROM is that the data or program
is permanently in main memory and need never be loaded from a secondary storage
device.
A ROM is created like any other integrated circuit chip, with the data actually wired
into the chip as part of the fabrication process. This presents two problems:
1- The data insertion step includes a relatively large fixed cost, whether one or
thousands of copies of a particular ROM are fabricated.
2- There is no room for error. If one bit is wrong, the whole batch of ROMs must
be thrown out.
When only a small number of ROMs with a particular memory content is needed, a
less expensive alternative is the programmable ROM (PROM). Like the ROM, the PROM is
nonvolatile and may be written into only once.
Another variation on read-only memory is the read-mostly memory, which is useful
for applications in which read operations are far more frequent than write
operations but for which nonvolatile storage is required. There are three common
forms of read-mostly memory: EPROM, EEPROM, and flash memory.
The optically erasable programmable read-only memory (EPROM) is read and
written electrically, as with PROM. However, before a write operation, all the
storage cells must be erased to the same initial state by exposure of the packaged
chip to ultraviolet radiation. A more attractive form of read-mostly memory is
electrically erasable programmable read-only memory (EEPROM). This is a
read-mostly memory that can be written into at any time without erasing prior contents;
only the byte or bytes addressed are updated.
Another form of semiconductor memory is flash memory (so named because of the
speed with which it can be reprogrammed). Like EEPROM, flash memory uses an
electrical erasing technology. An entire flash memory can be erased in one or a few
seconds, which is much faster than EPROM.
Error correction:
A semiconductor memory system is subject to errors. These can be categorized as
hard failures and soft errors. A hard failure is a permanent physical defect so that
the memory cell or cells affected cannot reliably store data but become stuck at 0 or
1 or switch erratically between 0 and 1. Hard failures can be caused by harsh
environmental abuse, manufacturing defects, and wear. A soft error is a random,
nondestructive event that alters the contents of one or more memory cells without
damaging the memory. Soft errors can be caused by power supply problems or alpha
particles.
These particles result from radioactive decay and are distressingly common because
radioactive nuclei are found in small quantities in nearly all materials. Both hard and
soft errors are clearly undesirable, and most modern main memory systems include
logic for both detecting and correcting errors.
Figure 7.2 illustrates in general terms how the process is carried out. When data are
to be written into memory, a calculation, depicted as a function f, is performed on the
data to produce a code. Both the code and the data are stored. Thus, if an M-bit
word of data is to be stored and the code is of length K bits, then the actual size of
the stored word is M + K bits.
When the previously stored word is read out, the code is used to detect and possibly
correct errors. A new set of K code bits is generated from the M data bits and
compared with the fetched code bits. The comparison yields one of three results:
• No errors are detected. The fetched data bits are sent out.
• An error is detected, and it is possible to correct the error. The data bits plus error
correction bits are fed into a corrector, which produces a corrected set of M bits to
be sent out.
• An error is detected, but it is not possible to correct it. This condition is reported.
Codes that operate in this fashion are referred to as error-correcting codes. A code is
characterized by the number of bit errors in a word that it can correct and detect.
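The store-and-compare flow described above can be sketched with the simplest possible stand-in for the function f: a single even-parity bit (K = 1). This choice is an assumption made for illustration; a single parity bit can only detect a single-bit error, not correct it, so the sketch shows the first and third outcomes but not the correction path. The names f, store, and fetch are hypothetical.

```python
def f(data, m=8):
    """Stand-in code function: one even-parity bit over the M data bits."""
    return bin(data & ((1 << m) - 1)).count("1") % 2


def store(data):
    """Keep both the data and the code, as an (M + K)-bit stored word."""
    return (data, f(data))


def fetch(stored):
    """Recompute the code from the fetched data and compare with the stored code."""
    data, code = stored
    if f(data) == code:
        return ("no error detected", data)
    return ("error detected, not correctable", None)


word = store(0b00111001)           # four 1s, so the parity bit is 0
good = fetch(word)                 # codes match
flipped = (word[0] ^ 0b100, word[1])
bad = fetch(flipped)               # one bit changed, so the codes differ
```

A code with more check bits, such as the Hamming code, adds the middle outcome in which the error can also be located and corrected.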
Figure 7.2 error-correcting code function
The simplest of the error-correcting codes is the Hamming code devised by Richard
Hamming at Bell Laboratories. Figure 7.3 uses Venn diagrams to illustrate the use of
this code on 4-bit words (M = 4). With three intersecting circles, there are seven
compartments. We assign the 4 data bits to the inner compartments (Figure 7.3a).
The remaining compartments are filled with what are called parity bits.
Each parity bit is chosen so that the total number of 1s in its circle is even (Figure
7.3b). Thus, because circle A includes three data 1s, the parity bit in that circle is set
to 1. Now, if an error changes one of the data bits (Figure 7.3c), it is easily found.
By checking the parity bits, discrepancies are found in circle A and circle C but not in
circle B. Only one of the seven compartments is in A and C but not B. The error can
therefore be corrected by changing that bit.
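The Venn-diagram procedure can be sketched in code by naming each of the seven compartments after the circles that contain it. The compartment names and the example data values below are assumptions, since the figure is not reproduced here; the data were chosen so that circle A holds three data 1s, as in the description above.

```python
# Each compartment is named by the circles that contain it (assumed naming).
DATA = ("AB", "AC", "BC", "ABC")   # the four inner compartments hold data bits
PARITY = ("A", "B", "C")           # the three outer compartments hold parity bits


def encode(data):
    """data maps each name in DATA to 0 or 1; returns all seven compartments."""
    word = dict(data)
    for circle in PARITY:
        # The parity bit makes the number of 1s in its circle even.
        word[circle] = sum(v for name, v in data.items() if circle in name) % 2
    return word


def failing_circles(word):
    """Circles whose total number of 1s is odd, for example 'AC'."""
    return "".join(
        c for c in PARITY
        if sum(v for name, v in word.items() if c in name) % 2
    )


def correct(word):
    """Flip the one compartment that lies in exactly the failing circles."""
    bad = failing_circles(word)
    fixed = dict(word)
    if bad:
        fixed[bad] ^= 1
    return fixed


word = encode({"AB": 1, "AC": 1, "BC": 0, "ABC": 1})   # parity in A becomes 1
damaged = dict(word)
damaged["AC"] ^= 1                  # single-bit error in a data compartment
where = failing_circles(damaged)    # 'AC': circles A and C fail, B does not
repaired = correct(damaged)
```

Because every compartment lies in a different combination of circles, the set of failing circles names the erroneous bit directly.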
To clarify the concepts involved, we will develop a code that can detect and correct
single-bit errors in 8-bit words.
To start, let us determine how long the code must be. Referring to Figure 7.2, the
comparison logic receives as input two K-bit values. A bit-by-bit comparison is done
by taking the exclusive-OR of the two inputs. The result is called the syndrome word.
Thus, each bit of the syndrome is 0 if the two inputs match in that bit position
and 1 if they do not.
The syndrome word is therefore K bits wide and has a range between 0 and 2^K - 1. The
value 0 indicates that no error was detected, leaving 2^K - 1 values to indicate, if there
is an error, which bit was in error. Now, because an error could occur on any of the
M data bits or K check bits, we must have
2^K - 1 ≥ M + K
Figure 7.3 Hamming Error-Correcting Code
This inequality gives the number of bits needed to correct a single bit error in a word
containing M data bits. For example, for a word of 8 data bits (M= 8), we have
K = 3: 2^3 - 1 < 8 + 3
K = 4: 2^4 - 1 > 8 + 4
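The same search for the smallest K can be written as a short loop. This is a minimal sketch; the function name min_check_bits is hypothetical.

```python
def min_check_bits(m):
    """Smallest K such that 2**K - 1 >= M + K (single-error correction)."""
    k = 1
    while 2 ** k - 1 < m + k:
        k += 1
    return k


# Check bits needed for some common data word sizes.
table = {m: min_check_bits(m) for m in (4, 8, 16, 32, 64)}
# table -> {4: 3, 8: 4, 16: 5, 32: 6, 64: 7}
```

Each doubling of the data word size in this range adds only one check bit, so the relative overhead shrinks as words get wider.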
Thus, eight data bits require four check bits. For convenience, we would like to
generate a 4-bit syndrome for an 8-bit data word with the following characteristics:
• If the syndrome contains all 0s, no error has been detected.
• If the syndrome contains one and only one bit set to 1, then an error has occurred
in one of the 4 check bits. No correction is needed.
• If the syndrome contains more than one bit set to 1, then the numerical value of
the syndrome indicates the position of the data bit in error. This data bit is inverted
for correction.
To achieve these characteristics, the data and check bits are arranged into a 12-bit
word as depicted in Figure 7.4. The bit positions are numbered from 1 to 12.
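Those bit positions whose position numbers are powers of 2 are designated as check
bits. The check bits are calculated as follows, where the symbol ⊕ designates the
exclusive-OR operation:
C1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7
C2 = D1 ⊕ D3 ⊕ D4 ⊕ D6 ⊕ D7
C4 = D2 ⊕ D3 ⊕ D4 ⊕ D8
C8 = D5 ⊕ D6 ⊕ D7 ⊕ D8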
Figure 7.4 layout of Data Bits and Check Bits
Each check bit operates on every data bit whose position number contains a 1 in the
same bit position as the position number of that check bit. Thus, data bit positions
3, 5, 7, 9, and 11 (D1, D2, D4, D5, D7) all contain a 1 in the least significant bit of their
position number as does C1; bit positions 3, 6, 7, 10, and 11 all contain a 1 in the
second bit position, as does C2; and so on. Looked at another way, bit position n is
checked by those bits Ci such that ∑i = n. For example, position 7 is checked by bits
in positions 4, 2, and 1; and 7 = 4 + 2 + 1.
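Let us verify that this scheme works with an example. Assume that the 8-bit input
word is 00111001, with data bit D1 in the rightmost position. The calculations are as
follows:
C1 = D1 ⊕ D2 ⊕ D4 ⊕ D5 ⊕ D7 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
C2 = D1 ⊕ D3 ⊕ D4 ⊕ D6 ⊕ D7 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
C4 = D2 ⊕ D3 ⊕ D4 ⊕ D8 = 0 ⊕ 0 ⊕ 1 ⊕ 0 = 1
C8 = D5 ⊕ D6 ⊕ D7 ⊕ D8 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0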
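Suppose now that data bit 3 sustains an error and is changed from 0 to 1. When the
check bits are recalculated, we have
C1 = 1 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 = 1
C2 = 1 ⊕ 1 ⊕ 1 ⊕ 1 ⊕ 0 = 0
C4 = 0 ⊕ 1 ⊕ 1 ⊕ 0 = 0
C8 = 1 ⊕ 1 ⊕ 0 ⊕ 0 = 0
When the new check bits are compared with the old check bits, the syndrome word
is formed:
       C8 C4 C2 C1
old     0  1  1  1
new     0  0  0  1
XOR     0  1  1  0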
The result is 0110, indicating that bit position 6, which contains data bit 3, is in error.
Figure 7.5 illustrates the preceding calculation. The data and check bits are
positioned properly in the 12-bit word. Four of the data bits have a value 1 (shaded
in the table), and their bit position values are XORed to produce the Hamming code
0111, which forms the four check digits.
Figure 7.5 check bit calculation
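The position-XOR method for the 12-bit word can be sketched as follows. The data bit positions come from the layout described above (D1 to D8 at positions 3, 5, 6, 7, 9, 10, 11, 12); the function names check_bits and correct are hypothetical, and the sketch handles only the single-error cases discussed in this lecture.

```python
# Positions of D1..D8 in the 12-bit word; positions 1, 2, 4, 8 hold check bits.
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]


def check_bits(data):
    """Return C8 C4 C2 C1 as a 4-bit number for an 8-bit data word.

    The position numbers of all data bits equal to 1 are XORed together.
    """
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:        # bit i of data is data bit D(i+1)
            code ^= pos
    return code


def correct(data, stored_code):
    """Return (corrected data, syndrome) for a fetched data word and code."""
    syndrome = check_bits(data) ^ stored_code
    if syndrome in DATA_POSITIONS:
        # More than one syndrome bit set: invert the data bit at that position.
        data ^= 1 << DATA_POSITIONS.index(syndrome)
    # Syndrome 0 means no error; a single 1 bit means a check bit was hit,
    # so the data need no correction. Other values are not handled here.
    return data, syndrome


word = 0b00111001
code = check_bits(word)               # 0b0111, the stored check bits
damaged = word ^ (1 << 2)             # data bit D3 changes from 0 to 1
fixed, syndrome = correct(damaged, code)
# syndrome -> 0b0110 (position 6, which holds D3); fixed -> 0b00111001
```

XORing position numbers gives the same check bits as the four per-bit equations, because each check bit is just one binary digit of that XOR.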
ADVANCED DRAM ORGANIZATION
As discussed earlier, one of the most critical system bottlenecks when using high-
performance processors is the interface to main internal memory. This interface is
the most important pathway in the entire computer system. The basic building block
of main memory remains the DRAM chip, as it has for decades; until recently, there
had been no significant changes in DRAM architecture since the early 1970s.
The traditional DRAM chip is constrained both by its internal architecture and by its
interface to the processor’s memory bus.
We have seen that one attack on the performance problem of DRAM main memory
has been to insert one or more levels of high-speed SRAM cache between the DRAM
main memory and the processor. But SRAM is much costlier than DRAM, and
expanding cache size beyond a certain point yields diminishing returns.
Synchronous DRAM
One of the most widely used forms of DRAM is the synchronous DRAM (SDRAM).
Unlike the traditional DRAM, which is asynchronous, the SDRAM exchanges data
with the processor synchronized to an external clock signal and running at the full
speed of the processor/memory bus without imposing wait states.
In a typical DRAM, the processor presents addresses and control levels to the
memory, indicating that a set of data at a particular location in memory should be
either read from or written into the DRAM. After a delay, the access time, the DRAM
either writes or reads the data. During the access-time delay, the DRAM performs
various internal functions, such as activating the high capacitance of the row and
column lines, sensing the data, and routing the data out through the output buffers.
The processor must simply wait through this delay, slowing system performance.
With synchronous access, the DRAM moves data in and out under control of the
system clock. The processor or other master issues the instruction and address
information, which is latched by the DRAM. The DRAM then responds after a set
number of clock cycles. Meanwhile, the master can safely do other tasks while the
SDRAM is processing the request.
Rambus DRAM
RDRAM, developed by Rambus, has been adopted by Intel for its Pentium and
Itanium processors. It has become the main competitor to SDRAM. RDRAM chips are
vertical packages, with all pins on one side. The chip exchanges data with the
processor over 28 wires no more than 12 centimeters long. The bus can address up
to 320 RDRAM chips and is rated at 1.6 GBps.
The special RDRAM bus delivers address and control information using an
asynchronous block-oriented protocol. After an initial 480 ns access time, this
produces the 1.6 GBps data rate. What makes this speed possible is the bus itself,
which defines impedances, clocking, and signals very precisely. Rather than being
controlled by the explicit RAS, CAS, R/W, and CE signals used in conventional DRAMs,
an RDRAM gets a memory request over the high-speed bus. This request contains
the desired address, the type of operation, and the number of bytes in the
operation.
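The effect of the initial access time on a block transfer can be estimated with a short calculation using the two figures quoted above (480 ns and 1.6 GBps). This is a rough sketch that assumes 1 GB = 10^9 bytes and that the full rate applies immediately after the initial access; the function names are hypothetical.

```python
ACCESS_NS = 480.0          # initial access time from the text
RATE_BYTES_PER_NS = 1.6    # 1.6 GBps, assuming 1 GB = 10**9 bytes


def transfer_time_ns(nbytes):
    """Initial access time plus streaming time for a block of nbytes."""
    return ACCESS_NS + nbytes / RATE_BYTES_PER_NS


def effective_rate_gbps(nbytes):
    """Average rate over the whole request (bytes per ns equals GBps)."""
    return nbytes / transfer_time_ns(nbytes)


short_block = transfer_time_ns(64)       # about 520 ns, mostly access time
long_rate = effective_rate_gbps(65536)   # close to, but below, 1.6 GBps
```

Under these assumptions, short requests are dominated by the access time, which is why the protocol is block oriented.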
DDR SDRAM
SDRAM is limited by the fact that it can only send data to the processor once per bus
clock cycle. A newer version of SDRAM, referred to as double-data-rate SDRAM (DDR SDRAM),
can send data twice per clock cycle, once on the rising edge of the clock pulse and once
on the falling edge.
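The doubling can be expressed as a simple peak-rate formula. The clock frequency and bus width below are hypothetical values chosen for the example, not figures from any particular DDR standard.

```python
def peak_rate_mbps(clock_mhz, bus_bytes, transfers_per_cycle):
    """Peak transfer rate in MB/s: clock rate x bus width x transfers per cycle."""
    return clock_mhz * bus_bytes * transfers_per_cycle


# Assumed example: a 100 MHz bus clock and a bus 8 bytes wide.
sdr = peak_rate_mbps(100, 8, 1)   # one transfer per clock cycle -> 800
ddr = peak_rate_mbps(100, 8, 2)   # rising and falling edge -> 1600
```

For the same clock and bus width, the peak rate doubles; sustained rates in practice also depend on access latency.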
DDR DRAM was developed by the JEDEC Solid State Technology Association, the
Electronic Industries Alliance’s semiconductor-engineering-standardization body.
Numerous companies make DDR chips, which are widely used in desktop computers
and servers.
Cache DRAM
Cache DRAM (CDRAM), developed by Mitsubishi, integrates a small SRAM cache (16
Kb) onto a generic DRAM chip. The SRAM on the CDRAM can be used in two ways.
First, it can be used as a true cache, consisting of a number of 64-bit lines. The cache
mode of the CDRAM is effective for ordinary random access to memory.
The SRAM on the CDRAM can also be used as a buffer to support the serial access of
a block of data. For example, to refresh a bit-mapped screen, the CDRAM can
prefetch the data from the DRAM into the SRAM buffer. Subsequent accesses to the
chip result in accesses solely to the SRAM.