
Superscalar CPU Architecture Overview

The document discusses superscalar architecture in computer architecture, highlighting its ability to execute multiple instructions per clock cycle through parallel execution units. It outlines key features, advantages, and limitations of superscalar CPUs, including dynamic instruction scheduling and the challenges of complex hardware design. Additionally, it touches on instruction-level parallelism (ILP) and various hardware techniques for performance enhancement, concluding that while superscalar processors improve performance, they face limitations due to data dependencies and hardware complexity.

Uploaded by kashinjeelias136
Copyright © All Rights Reserved

SUPERSCALAR, INSTRUCTION-LEVEL & MACHINE PARALLELISM
COMPUTER ARCHITECTURE AND ORGANIZATION (CT 211)
UNIVERSITY OF DODOMA
GROUP INFORMATION

• COURSE: COMPUTER ARCHITECTURE AND ORGANIZATION (CT 211)
• GROUP NUMBER: 09
• PROGRAMME: CE2
• FACILITATOR: MR. BAKII JUMA
• ACADEMIC YEAR: 2025/2026
HARDWARE SUPPORT AND SUPERSCALAR

SUPERSCALAR
• In computer architecture, superscalar refers to a type of CPU design that can
execute more than one instruction per clock cycle by using multiple execution
units (ALUs, FPUs, load/store units) in parallel.

• A superscalar processor fetches, decodes and executes multiple instructions
simultaneously during a single clock cycle.
A SUPERSCALAR CPU HAS

• Multiple execution units
• Advanced instruction scheduling
• Instruction-level parallelism (ILP) detection
KEY FEATURES OF SUPERSCALAR CPUS

• Multiple instructions per cycle: a superscalar CPU can fetch, decode, and execute more than one
instruction in a single clock cycle. It issues independent instructions simultaneously to different
execution units.
• Dynamic instruction scheduling: the CPU decides at runtime the order in which instructions are
executed.
• Out-of-order execution (in many designs): instructions are executed as soon as their operands
are available, rather than strictly in program order.
• Register renaming: to avoid data hazards, superscalar CPUs use register renaming to eliminate false
dependencies (WAR and WAW hazards).
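To make register renaming concrete, here is a minimal Python sketch. It assumes an unlimited pool of physical registers named p0, p1, ... and instructions written as (destination, sources) pairs; `Instr` and `rename` are hypothetical names for illustration. Real CPUs do this in hardware with a finite free list and a register alias table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str               # register written
    srcs: tuple[str, ...]   # registers read


def rename(program: list[Instr], arch_regs: list[str]) -> list[Instr]:
    """Give every register write its own physical register.

    Assumption: unlimited physical registers p0, p1, ... (real hardware
    recycles a finite pool through a free list).
    """
    # Register alias table: architectural name -> current physical register.
    rat = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    next_free = len(arch_regs)
    renamed = []
    for ins in program:
        # Sources use the current mapping, so true (RAW) dependences survive.
        srcs = tuple(rat[s] for s in ins.srcs)
        # The destination gets a fresh register, removing WAR and WAW conflicts.
        rat[ins.dest] = f"p{next_free}"
        next_free += 1
        renamed.append(Instr(rat[ins.dest], srcs))
    return renamed


if __name__ == "__main__":
    regs = [f"R{i}" for i in range(1, 8)]  # R1..R7 start in p0..p6
    program = [
        Instr("R1", ("R2", "R3")),  # R1 = R2 + R3
        Instr("R2", ("R4", "R5")),  # R2 = R4 + R5  (WAR on R2 with the first)
        Instr("R1", ("R6", "R7")),  # R1 = R6 + R7  (WAW on R1 with the first)
    ]
    for ins in rename(program, regs):
        print(ins.dest, "<-", ins.srcs)
```

After renaming, the three writes target p7, p8 and p9, and the first instruction still reads p1 while the second writes p8, so none of the three conflicts with another and they could issue in the same cycle.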
ADVANTAGES OF SUPERSCALAR ARCHITECTURE

• Higher performance: a superscalar CPU can execute multiple instructions in a single clock cycle
instead of just one. By issuing instructions in parallel to different execution units (such as ALU,
FPU, and load/store units), the CPU completes more work per cycle, significantly increasing
overall performance.

• Better use of hardware resources: superscalar processors have multiple execution units. Instead
of leaving these units idle, the CPU schedules independent instructions to run
simultaneously. This maximizes hardware utilization and reduces wasted processing power.

• Faster program execution: because instructions are executed in parallel, programs finish in
fewer clock cycles. This leads to faster execution of applications, improved responsiveness, and
better performance for compute-intensive tasks such as multimedia processing, scientific
computing, and gaming.
LIMITATIONS OF SUPERSCALAR ARCHITECTURE

• Complex hardware design: superscalar CPUs must analyze multiple instructions at the same
time to decide which can run in parallel.

• Higher power consumption: to execute multiple instructions per clock cycle, superscalar
processors include:
• Multiple execution units (ALUs, FPUs, load/store units)
• Large instruction windows and buffers
• Sophisticated scheduling and prediction hardware
All this extra hardware consumes more power, generates more heat, and reduces battery life in
mobile devices.

• Diminishing returns if instructions are not independent: superscalar performance depends
heavily on instruction-level parallelism (ILP). When instructions are not independent, the CPU
cannot issue them in the same cycle, so performance gains become limited.
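The effect of dependences on a wide machine can be seen with a toy model. This Python sketch assumes an in-order machine that issues up to `width` instructions per cycle, a one-cycle latency for every instruction, and no WAR/WAW conflicts (as if registers were renamed); `issue_cycles` is a hypothetical helper, not a model of any real CPU.

```python
def issue_cycles(program: list[tuple[str, tuple[str, ...]]], width: int) -> int:
    """Cycles needed to issue `program` in order, up to `width` per cycle.

    Assumption: 1-cycle latency, so a result can be used in the next cycle
    but not by another instruction issued in the same cycle.
    """
    cycles, i, n = 0, 0, len(program)
    while i < n:
        cycles += 1
        written_this_cycle = set()
        slots = 0
        while i < n and slots < width:
            dest, srcs = program[i]
            if any(s in written_this_cycle for s in srcs):
                break  # RAW dependence on an instruction issued this cycle
            written_this_cycle.add(dest)
            slots += 1
            i += 1
    return cycles


# (destination, sources) pairs, e.g. ("A", ("B", "C")) means A = B + C.
independent = [("A", ("B", "C")), ("D", ("E", "F")), ("G", ("H", "I")), ("J", ("K", "L"))]
chain = [("A", ("B", "C")), ("D", ("A", "F")), ("G", ("D", "I")), ("J", ("G", "L"))]

for width in (1, 2, 4):
    print(width, issue_cycles(independent, width), issue_cycles(chain, width))
```

In this model, widening the machine from 1 to 4 issue slots cuts the independent sequence from 4 cycles to 1, while the dependent chain needs 4 cycles at every width: extra hardware buys nothing when the instructions are not independent.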
INSTRUCTION-LEVEL PARALLELISM (ILP)
Ability to execute multiple independent instructions simultaneously.

Example:
• A = B + C and D = E + F do not depend on each other, so they can run in parallel.

• In contrast, in A = B + C and D = A + F the second instruction reads A, which the first
instruction produces, so the two cannot run in parallel.

• ILP depends on the program structure:
• Independent instructions → high ILP
• Dependent instructions → low ILP
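A dependence check like the one in these examples can be sketched in a few lines of Python. The sketch assumes statements of the simple form `X=Y+Z` (only `+` is handled), and `parse` and `hazards` are hypothetical helpers. It reports a read-after-write (RAW) dependence when the second statement reads what the first writes, plus the false WAR and WAW dependences that register renaming removes.

```python
def parse(stmt: str) -> tuple[str, set[str]]:
    """Split 'A=B+C' into ('A', {'B', 'C'}). Assumes only '+' operators."""
    dest, expr = stmt.replace(" ", "").split("=")
    return dest, set(expr.split("+"))


def hazards(first: str, second: str) -> list[str]:
    """Dependences of `second` on an earlier statement `first`."""
    d1, s1 = parse(first)
    d2, s2 = parse(second)
    found = []
    if d1 in s2:
        found.append("RAW")  # true dependence: second reads first's result
    if d2 in s1:
        found.append("WAR")  # second overwrites a value first still reads
    if d1 == d2:
        found.append("WAW")  # both write the same register
    return found


print(hazards("A=B+C", "D=E+F"))  # [] -> independent, can run in parallel
print(hazards("A=B+C", "D=A+F"))  # ['RAW'] -> must wait for A
```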
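HOW ILP WORKS

Modern CPUs divide instruction execution into several stages, such as:

• Instruction fetch (IF): read the instruction from memory.

• Instruction decode (ID): the CPU analyzes the instruction and generates the
necessary control signals to execute it.

• Execute (EX): perform the operation (arithmetic, logic, etc.).

• Memory access (MEM): read or write data (if required).

• Write back (WB): write the result to the register file.

Because each instruction occupies only one stage at a time, different instructions can be in
different stages at once, which is how the CPU overlaps independent instructions.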
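The overlap between stages can be shown as a space-time table. This Python sketch assumes the classic five-stage pipeline (IF, ID, EX, MEM plus a write-back, WB, stage), one cycle per stage, and no stalls; `pipeline_table` is a hypothetical helper.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")  # assumed classic 5-stage pipeline


def pipeline_table(n_instr: int, stages: tuple[str, ...] = STAGES) -> list[list[str]]:
    """Row i shows which stage instruction i occupies in each cycle.

    Assumes an ideal pipeline: instruction i enters stage s in cycle i + s.
    """
    total_cycles = len(stages) + n_instr - 1
    table = []
    for i in range(n_instr):
        row = ["."] * total_cycles  # "." = instruction not in the pipeline
        for s, name in enumerate(stages):
            row[i + s] = name
        table.append(row)
    return table


table = pipeline_table(4)
for i, row in enumerate(table, start=1):
    print(f"I{i}: " + " ".join(f"{c:>3}" for c in row))
print("cycles:", len(table[0]))  # 5 + 4 - 1 = 8, versus 4 * 5 = 20 unpipelined
```

In cycles 4 and 5, four instructions are in flight at once, each in a different stage.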
MACHINE (HARDWARE) PARALLELISM
Ability of the CPU hardware to execute multiple instructions at once.

• Depends on:
CPU design (cores): whether the core is superscalar
Issue width: how many instructions can issue per cycle
Number of execution units
• Independent of the program logic; it depends only on the CPU design.
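How program ILP and machine parallelism combine can be sketched with a simple bound. Assuming one-cycle latency and no resource conflicts, a program cannot finish in fewer cycles than its longest dependence chain, nor in fewer than (instructions ÷ issue width). Its instructions per cycle (IPC) is therefore at most min(ILP, issue width), where ILP here means instructions divided by critical-path length. `critical_path` and `ipc_upper_bound` are hypothetical helper names, and real IPC can be lower than this bound.

```python
def critical_path(program: list[tuple[str, tuple[str, ...]]]) -> int:
    """Longest chain of RAW dependences, assuming 1-cycle latency."""
    ready = {}  # register -> cycle in which its value is produced
    longest = 0
    for dest, srcs in program:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + 1
        longest = max(longest, start + 1)
    return longest


def ipc_upper_bound(program: list[tuple[str, tuple[str, ...]]], issue_width: int) -> float:
    ilp = len(program) / critical_path(program)  # set by the program
    return float(min(ilp, issue_width))           # capped by the hardware


independent = [("A", ("B", "C")), ("D", ("E", "F")), ("G", ("H", "I")), ("J", ("K", "L"))]
chain = [("A", ("B", "C")), ("D", ("A", "F")), ("G", ("D", "I")), ("J", ("G", "L"))]

print(ipc_upper_bound(independent, 2))  # 2.0 -> limited by the 2-wide machine
print(ipc_upper_bound(chain, 2))        # 1.0 -> limited by the program's dependences
```

The issue width comes only from the hardware, matching the point that machine parallelism is independent of the program, while the ILP term comes only from the program.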
HOW MACHINE-LEVEL PARALLELISM (MLP) WORKS

• Multi-core processors: each core can execute its own instruction stream
independently.

• Simultaneous multithreading (SMT) / Hyper-Threading: a single CPU core runs multiple
threads concurrently, making use of execution units that would otherwise sit idle.

• Multiple processors (SMP, symmetric multiprocessing): two or more physical CPUs work
together to execute multiple tasks in parallel.
HARDWARE TECHNIQUES FOR PERFORMANCE ENHANCEMENT

• PIPELINING: breaks instruction execution into stages (fetch, decode, execute, etc.), allowing
multiple instructions to be processed at once, each in a different stage.
Example: splitting the pipeline into 10 stages instead of 5 lets the CPU run at a higher clock
rate, because each stage does less work per cycle.

• SUPERSCALAR EXECUTION: the CPU issues multiple instructions per clock cycle using several
parallel execution units (ALU, FPU, load/store unit), allowing true parallel instruction execution.
Example: two ALUs allow two arithmetic instructions to execute at once.

• OUT-OF-ORDER EXECUTION: the CPU does not wait for stalled instructions; it executes other
independent instructions first, not necessarily in program order.
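To show why not waiting for stalled instructions helps, here is a toy single-issue simulation in Python. It assumes every instruction has a fixed latency, each destination register is written only once (as if renamed), every source is either a program input or produced by an earlier instruction, and the out-of-order mode issues the oldest ready instruction from the whole program each cycle. `simulate` is a hypothetical helper, not a model of any real scheduler.

```python
import math


def simulate(program: list[tuple[str, tuple[str, ...], int]], out_of_order: bool) -> int:
    """Return the cycle in which the last result becomes available.

    program: (dest, srcs, latency) triples. One instruction issues per cycle.
    """
    # Values produced by the program are unavailable until their producer issues;
    # registers never written by the program are inputs, ready at cycle 0.
    ready_at = {dest: math.inf for dest, _, _ in program}
    pending = list(range(len(program)))
    cycle = finish = 0
    while pending:
        # In-order may only consider the oldest instruction; out-of-order considers all.
        candidates = pending if out_of_order else pending[:1]
        for idx in candidates:
            dest, srcs, latency = program[idx]
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                pending.remove(idx)
                break  # single issue: at most one instruction per cycle
        cycle += 1
    return finish


program = [
    ("A", ("X",), 3),       # e.g. a load with a 3-cycle latency
    ("B", ("A",), 1),       # needs A, so it cannot issue before cycle 3
    ("C", ("Y", "Z"), 1),   # independent
    ("D", ("Y", "W"), 1),   # independent
]
print("in-order:", simulate(program, out_of_order=False))     # in-order: 6
print("out-of-order:", simulate(program, out_of_order=True))  # out-of-order: 4
```

The in-order machine sits idle while B waits for A; the out-of-order machine fills those cycles with C and D, finishing two cycles earlier.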
CONT.
• BRANCH PREDICTION: predicts the outcome of conditional branches to avoid pipeline stalls.
• SPECULATIVE EXECUTION: executes instructions ahead of time based on branch
prediction.

If the prediction is correct → the results are kept. If it is wrong → the results are discarded.

This improves performance by not wasting pipeline cycles waiting for the branch to resolve.
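One widely used branch-prediction scheme is the 2-bit saturating counter. The slides do not name a specific predictor, so this Python sketch is only one illustration. States 0 and 1 predict not taken, states 2 and 3 predict taken, and each actual outcome moves the counter one step toward itself, so a single surprise does not flip the prediction. `predict_accuracy` is a hypothetical helper.

```python
def predict_accuracy(outcomes: list[bool], initial_state: int = 2) -> float:
    """Fraction of branches a single 2-bit saturating counter predicts correctly."""
    state = initial_state  # 0-1: predict not taken, 2-3: predict taken
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# A loop branch that is taken 9 times and then falls through, run 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(predict_accuracy(loop_branch))  # 0.9: only each loop exit is mispredicted
```

Each misprediction here is a case where speculatively executed instructions after the branch would be discarded.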

• CACHING & MEMORY HIERARCHY IMPROVEMENTS

• CACHING: uses small, fast memory between the CPU and main memory, reducing memory
access time.
• MEMORY HIERARCHY: faster RAM types (DDR3 → DDR4 → DDR5).

• WIDER MEMORY BUSES: more data moved per transfer, increasing memory bandwidth.
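The benefit of caching is usually quantified with the average memory access time formula, AMAT = hit time + miss rate × miss penalty. The Python sketch below uses made-up, illustrative numbers (a 1-cycle cache hit, a 5% miss rate, a 100-cycle main-memory access); they are assumptions, not measurements of any real system.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers, in CPU cycles.
with_cache = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
without_cache = 100  # every access goes to main memory
print(f"with cache: {with_cache:.1f} cycles, without: {without_cache} cycles")
```

With these assumed numbers, the average access drops from 100 cycles to 1 + 0.05 × 100 = 6 cycles, even though 5% of accesses still pay the full main-memory cost.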


CONCLUSION

• Superscalar processors improve performance by executing multiple instructions per cycle using
parallel hardware units and advanced techniques like pipelining, branch prediction, and
out-of-order execution. However, data dependencies, hardware complexity, and prediction
failures limit the achievable speedup.
