QP 2021
QP 2021
Answer any five of the following. Each question carries six marks : (5×6=30)
1. 1D8A(16)
2. 117(10)
3. 171647(8)
The NOR gate is known as a "universal gate" because it can be used to construct any
other basic logic gate (NOT, AND, OR). This is done by combining the NOR gates in
specific configurations.
A NOT gate outputs the opposite of the input. To make a NOT gate from a NOR, you
simply tie the two inputs of the NOR gate together.
An OR gate outputs 1 if at least one input is 1. To make an OR gate, you use two NOR
gates.
Circuit: Feed inputs AA and BB into the first NOR gate. Then, feed the output of
that NOR into a second NOR gate configured as a NOT gate (inputs tied together).
Operation:
An AND gate outputs 1 only if both inputs are 1. To make an AND gate, you need three
NOR gates.
Circuit: First, pass AA and BB through NOT gates (NORs with tied inputs) to
get A‾A and B‾B. Then, feed these inverted signals into a NOR gate.
Operation:
CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer)
are two different design philosophies for computer architecture. Here are the key
differences:
Micro-programmed (slow).
Complex instructions are Hard-wired (fast). Simple instructions are
Control Unit
broken down into executed directly by hardware.
microcode.
Fewer general-purpose
Register Large number of general-purpose registers
registers (due to memory
Usage (typically 32 to 128).
operands).
complex instructions.
An instruction format defines the layout of the bits of an instruction, specifying how
the opcode and operands are organized. The most common classification is based on
the number of addresses (operands) an instruction contains.
Structure: OPCODE A, B, C
Meaning: This format performs an operation on two operands (B and C) and
stores the result in a third operand (A).
Operation: A = B OP C
Advantage: It is very flexible and requires the fewest number of instructions to
perform complex tasks.
Disadvantage: It requires longer instruction words to accommodate three
addresses, leading to larger program size.
Example (Assembly-like):
Structure: OPCODE A, B
Meaning: This format performs an operation using two operands. Typically, the
result overwrites one of the operands (usually the first one).
Operation: A = A OP B or A = B OP A (depending on architecture).
Advantage: It reduces the instruction length compared to three-address formats.
Disadvantage: It destroys the original value of one of the operands, which may
require a copy instruction if the original value is needed later.
Example (Assembly-like):
o ADD R1, R2 This means: R1 = R1 + R2 (R1 is both a source and the destination).
o MOV R3, R4 This means: R3 = R4 (copy value).
Structure: OPCODE A
Meaning: This format uses a single explicit address. It implicitly uses a special
register called the accumulator (ACC) for all operations.
Operation: ACC = ACC OP A
Advantage: Very short instruction length.
Disadvantage: Program length increases significantly because many instructions
are needed to load and store values from the accumulator.
Example (Accumulator-based Machine):
o LOAD X This means: ACC = X (load value from memory location X into accumulator).
o ADD Y This means: ACC = ACC + Y (add value at Y to accumulator).
o STORE Z This means: Z = ACC (store accumulator value to memory location Z).
Structure: OPCODE
Meaning: This format has no address fields. It is used in Stack-based
architectures. All operations implicitly use the top of the stack.
Operation: TOS = (TOS-1) OP TOS (Pop two values from stack, apply operation,
push result back).
Advantage: Instructions are extremely short, allowing for very dense code.
Disadvantage: It is inefficient for random data access and requires many
push/pop instructions.
Example (Stack Machine):
This is the most fundamental limitation. If instruction B needs the result of instruction A,
it cannot execute at the same time as A or before A. It must wait.
Example:
Even if instructions are independent, they may compete for the same hardware
resources at the same time.
Example: A CPU might have only one memory port. If one instruction needs to
read data from memory and another needs to write data to memory
simultaneously, they cannot both proceed. One must stall.
Limitation: Hardware designers try to mitigate this by adding more resources
(e.g., multiple ALUs, multiple memory ports), but there are physical and cost limits
(power consumption, chip area).
Beyond the classical hazards, there are theoretical limits described by researchers.
Amdahl's Law: The speedup from parallel processing is limited by the sequential
portion of the code. Even in highly parallel programs, there is always some part
that must run in order.
Memory Wall: The speed gap between the processor and main memory is huge.
If an instruction needs data from memory, the CPU often stalls waiting for it,
regardless of how many execution units are free. This is often the biggest
bottleneck in modern systems.
Power Wall: Increasing ILP requires more complex hardware (more execution
units, complex schedulers), which consumes exponentially more power and
generates more heat, making further scaling impractical.
6. DMA has Priority over the CPU when both request a memory
transfer. Justify your answer.
DMA (Direct Memory Access) is a feature that allows hardware devices (like disk
drives, graphics cards, network cards) to transfer data directly to and from the main
memory without involving the CPU for every byte.
When both the CPU and a DMA controller request access to the memory bus at the
same time, the DMA controller is generally given priority. This is justified by the
following reasons:
I/O devices that use DMA (e.g., a hard disk or network interface) operate at a fixed
speed determined by the hardware. They have a limited buffer to hold incoming or
outgoing data.
The Risk: If the CPU is using the memory bus and the DMA controller has to wait,
the I/O device's buffer may overflow (during input) or underflow (during output).
This results in data loss.
The Justification: The CPU can wait for a few cycles without losing its state (it is
just paused), but an I/O device cannot pause the physical flow of data. Therefore,
DMA must be serviced immediately to maintain data integrity.
The Benefit: If the DMA gets priority, it can quickly grab the bus, transfer the
entire block at high speed, and then release the bus back to the CPU.
Comparison: If the CPU held priority, it would intersperse its own memory
accesses, dragging out the DMA transfer time and keeping the I/O device busy for
longer, which degrades overall system performance.
The CPU is a programmable unit that can save its current state (registers, program
counter) and resume later. When the CPU requests memory, it is usually to fetch an
instruction or data for the current process. If the CPU is forced to wait (insert "wait
states"), it simply stalls temporarily.
The Trade-off: Giving priority to DMA causes a slight, temporary slowdown for
the CPU (known as cycle stealing). However, this small delay is a worthwhile
trade-off to prevent a much more catastrophic failure (data loss from an I/O
device).
Conclusion: DMA gets priority because I/O devices have real-time constraints and
physical buffers that cannot wait, whereas the CPU is a flexible, stateful machine that can
tolerate brief stalls without losing data.
Characteristic: Multiple processors can execute multiple tasks (or threads of the
same task) simultaneously.
Explanation: This allows the system to handle more work in a given amount of
time. For example, one processor can handle a database query while another
renders a video, vastly improving overall system throughput.
2. Shared Memory
3. Cache Coherence
4. Interconnection Network
Characteristic: The system can continue operating even if one processor fails.
Explanation: In a multiprocessor system, if one CPU fails, the operating system
can typically reassign its tasks to another functioning CPU. The system might run
slower (graceful degradation), but it does not crash entirely, which is crucial for
servers and critical systems.
7. Load Balancing
Description: This is the original type of ROM. The data is physically "hard-wired"
into the chip during the manufacturing process using a photomask.
Programming: Programmed at the factory. It cannot be altered or
reprogrammed after manufacturing.
Advantages: Very inexpensive for mass production.
Disadvantages: Inflexible. If there is an error in the data, the entire batch of chips
is useless. High initial cost for creating the mask.
Usage: Used in devices with absolutely no need for updates (e.g., simple
calculators, legacy automotive controllers).
SECTION – B
9. a) Minimize the following expression using K-Map
Y=A‾BC‾D‾+A‾BCD‾+ABC‾D‾+AB‾CD‾+ABCD‾+A‾BCD+ABCDY=ABCD+
ABCD+ABCD+ABCD+ABCD+ABCD+ABCD
Note: I have rewritten the expression as it appeared to have some typographical errors in
the original transcription (duplicate terms). This corrected version contains all 7 unique
minterms.
The expression is in Sum of Products (SOP) form. Let's convert each term to its minterm
number (assuming A is the MSB and D is the LSB):
1. A‾BC‾D‾ABCD = 0 1 0 0 = 4₁₀
2. A‾BCD‾ABCD = 0 1 1 0 = 6₁₀
3. ABC‾D‾ABCD = 1 0 1 0 = 10₁₀
4. AB‾CD‾ABCD = 1 0 1 0 = 10₁₀ (Wait, this seems to be a duplicate of term 3—likely a
transcription error. Let's check carefully.)
A‾BC‾D‾ABCD 0100 4
A‾BCD‾ABCD 0110 6
A‾B‾A‾BABAB‾C‾D‾04128C‾D15139CD371511CD‾261410CDCDCDCD
AB0132AB4576AB12131514AB891110
0001111000011001000011011010011100011110000000011011111011100001
Step 3: Grouping
Row 00 cols 01 and 11 have 1 s → 4 and 12 → both have = , =1, =0, =0 Wait, 4 is
0100 (A=0, B=1, C=0, D=0), 12 is 1100 (A=1, B=1, C=0, D=0). So group: B=1, C=0, D=0
→ term BC‾D‾BCD (A cancels)
Row 10 cols 01, 11, 10 have 1 s → 6 (0110 , 14 (1110 , 10 (1010 . ctually 10 is col 10,
row 10 → yes. So in row 10 ( =1, =0 , cols 01, 11, 10 have 1 s → that s 01 =1, 11 =1,
10: B=0. So in this row, all three columns with 1's share C=1, D=0, but B varies. So
group =1, =0 → term CD‾CD (A and B cancel? Wait, check:
6: A=0,B=1,C=1,D=0
14: A=1,B=1,C=1,D=0
10: A=1,B=0,C=1,D=0
So yes, only C=1 and D=0 common. So group: CD‾CD
Row 11 cols 01 and 11 have 1 s → 7 (0111 and 15 (1111 → share =1, =1, =1 Wait,
7 =0, =1, =1, =1; 15 =1, =1, =1, =1 → share =1, =1, =1 →
term: BCDBCD (A cancels)
1. BC‾D‾BCD
2. CD‾CD
3. BCDBCD
So minimized expression:
Y=BC‾D‾+CD‾+BCDY=BCD+CD+BCD
This is a Full Adder circuit. A full adder adds three bits (A, B, and Carry-in Cin) and
produces a Sum (S) and a Carry-out (Cout).
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
Sum = A ⊕ B ⊕ Cin
(Sum is 1 when odd number of inputs are 1)
Sum = A ⊕ B ⊕ Cin
This requires two XOR gates:
First XOR: A ⊕ B
Second XOR: (A ⊕ B) ⊕ Cin
A -----\
XOR -----\
B -----/ XOR ----- Sum
/
Cin ------------/
A -----\
AND \
B -----/ \
OR ----- Cout
A -----\ /
XOR --/
B -----/ \
AND
Cin ---------/
Sum = A ⊕ B ⊕ Cin
Cout = A·B + B·Cin + A·Cin
Truth Table
S1S1 S0S0 YY
0 0 I0I0
0 1 I1I1
S1S1 S0S0 YY
1 0 I2I2
1 1 I3I3
Boolean Expression
From the truth table, the output can be written as:
Y=S1‾⋅S0‾⋅I0+S1‾⋅S0⋅I1+S1⋅S0‾⋅I2+S1⋅S0⋅I3Y=S1⋅S0⋅I0+S1⋅S0⋅I1+S1⋅S0⋅I2+S1⋅S0
⋅I3
Working Procedure
Result: The MUX acts as a digitally controlled switch, routing one of four inputs to the
output based on the binary value on the select lines.
Answer 10(a): A 4-to-1 MUX is constructed using 4 AND gates, 1 OR gate, and 2 NOT
gates. It works by enabling exactly one AND gate based on the select lines, passing the
corresponding input to the output.
4 variables: A,B,C,DA,B,C,D
Minterms: 1, 3, 5, 6
8:1 MUX has 3 select lines and 8 data inputs
We can use A,B,CA,B,C as select lines, and DD as a variable input to the data lines.
Step 2: Truth table with A, B, C as select lines
List all combinations of A,B,CA,B,C (000 to 111) and for each, determine FF as a
function of DD:
Step 4: Implementation
Connect:
I0,I1,I2I0,I1,I2 to DD
I3I3 to D‾D (use a NOT gate on D)
I4,I5,I6,I7I4,I5,I6,I7 to logic 0 (ground)
Select lines A,B,CA,B,C are connected directly to the MUX select inputs.
No memory reference is needed to fetch the operand (other than the instruction fetch
itself).
Fast execution because the data is immediately available.
Limitation: The operand value is fixed at compile/assembly time and cannot be
changed dynamically; only constants can be used.
Example:
text
Another Example:
text
Requires one memory access to fetch the operand (after fetching the instruction).
Simple to implement and understand.
Limitation: The address is fixed, limiting the range of addressable memory and making
dynamic data structures harder to implement. The address space is limited by the
number of bits in the address field.
Example:
text
Another Example:
text
Comparison Summary
Speed Fastest (no extra memory access) Slower (requires memory access)
Distributed Memory MIMD is a type of MIMD architecture where each processor has
its own private memory and there is no shared global memory. Processors communicate
by passing messages over an interconnection network.
Key Characteristics
1. Private Memory: Each processor can access only its own local memory directly.
Accessing another processor's memory requires explicit communication.
2. Message Passing: Processors communicate and synchronize by sending and receiving
messages through the interconnection network.
3. Scalability: This architecture scales very well to hundreds or thousands of processors
because there is no shared memory bottleneck.
4. No Cache Coherence Problem: Since there is no shared memory, the hardware does
not need to maintain cache coherence across processors (this is handled by the
programmer or software).
5. Non-Uniform Memory Access (NUMA): Accessing local memory is fast, but accessing
remote memory (via messages) is very slow.
Diagram
text
+-------------+ +-------------+
| Processor 1 | | Processor 2 |
+-------------+ +-------------+
| |
+-------------+ +-------------+
| Memory 1 | | Memory 2 |
+-------------+ +-------------+
\ /
\ /
\ /
+-------------------------+
| Interconnection Network |
| (e.g., Mesh, Hypercube, |
| Ring, Torus) |
+-------------------------+
/ \
/ \
+-------------+ +-------------+
| Processor N | | Processor 3 |
+-------------+ +-------------+
| |
+-------------+ +-------------+
| Memory N | | Memory 3 |
+-------------+ +-------------+
Working Procedure
1. Local Computation: Each processor works independently on the data stored in its own
local memory.
2. Data Sharing: If Processor 1 needs data that resides in Memory 2, it must:
Advantages
Disadvantages
A Three-State Buffer (or Tri-state Buffer) is a digital logic device that has three possible
output states: logic 0, logic 1, and High-Impedance (Hi-Z) . The Hi-Z state effectively
disconnects the output from the circuit, making it appear as if the device is not present.
Logic Symbol:
text
A ------|>o----- Y
|
E
(The small circle (bubble) indicates active-low enable, though active-high enables are also
common).
1 0 0
1 1 1
When E = 1: The buffer is enabled. Output Y = A (the buffer acts like a normal buffer).
When E = 0: The buffer is disabled. Output Y is disconnected (Hi-Z). It acts like an open
circuit.
Vdd
|
|
+--o--+
| PMOS|
+--|--+
|
A ----+--------o|----+---- Y
| NMOS |
| | |
+-------o|----+
NMOS
|
GND
The enable signal controls the gates of the output transistors, either connecting the
output to the input logic or disconnecting both pull-up and pull-down paths, resulting
in Hi-Z.
Diagram:
text
The primary goal is to hide latency (e.g., cache misses, pipeline stalls) by utilizing the
processor's resources with work from another thread when one thread is stalled.
Part 2: Multithreaded Processor Architecture
1. Multiple Register Sets: The processor has multiple copies of the architectural registers
(one set per hardware thread).
2. Multiple Program Counters (PCs): It maintains a separate PC for each thread.
3. Thread Selector Logic: Hardware logic that decides which thread's instructions to fetch
and execute at any given time.
Types of Multithreading
1. The Fetch unit, in a single cycle, may fetch instructions from multiple threads (e.g.,
Thread 1 and Thread 2).
2. The Decode unit prepares these instructions for execution.
3. The Scheduler, in the same cycle, dispatches these independent instructions to the
available functional units.
o Cycle N: Thread 1's ADD goes to ALU 1, Thread 3's MUL goes to FPU.
o Cycle N+1: Thread 2's LOAD goes to Load/Store unit, Thread 4's SUB goes to ALU 2.
4. By filling the execution units with instructions from different threads, the processor
maximizes the utilization of its hardware, significantly improving overall throughput
compared to a single-threaded processor.