Introduction To Floating Point Representation: CS453 Computer System Design

The document provides an overview of floating-point representation in computer systems, detailing the differences between integers, fixed-point, and floating-point numbers. It explains the IEEE 754 standard for single precision floating-point numbers, including the formats for representation, conversion methods, and special cases like positive and negative zero and infinity. Additionally, it covers operations such as addition, subtraction, and multiplication of floating-point numbers.

Uploaded by

raghavmour

CS453 Computer System Design

Introduction to Floating Point Representation

John Jose
Associate Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati
Integers and fixed-point numbers
• Integers: the universe is infinite but discrete
• No numbers between consecutive integers, e.g., 5 and 6
• A countable (finite) number of items in a finite range
• Referred to as fixed-point numbers
Real Numbers and floating-point numbers
• Real numbers – the universe is infinite and continuous
• Fractions represented by decimal notation
• Rational numbers, e.g., 5/2 = 2.5
• Irrational numbers, e.g., π = 3.14159265 . . . (22/7 is only a rational approximation)
• Infinitely many numbers exist even in the smallest range
• Referred to as floating-point numbers
• A large number: 976,000,000,000,000 = $9.76 \times 10^{14}$
• A small number: 0.0000000000000976 = $9.76 \times 10^{-14}$
Standard Scientific Notation
• Decimal numbers
• $0.513 \times 10^5$, $5.13 \times 10^4$ and $51.3 \times 10^3$ are all written in scientific notation.
• $5.13 \times 10^4$ is the normalized scientific notation.
• Binary numbers
• Base 2
• Binary point: multiplication by 2 moves the point one place to the right.
• Normalized scientific notation, e.g., $1.0_{two} \times 2^{-1}$
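The normalization rule for binary numbers can be tried out in a few lines of Python. This is only a sketch: the helper name `normalize_binary` is made up for illustration, and it assumes a nonzero, finite input. It relies on the standard `math.frexp`, which returns a mantissa in [0.5, 1), so the result is rescaled to the 1.xxx form used on this slide.

```python
import math

def normalize_binary(x):
    """Return (significand, exponent) with 1 <= |significand| < 2 and
    x == significand * 2**exponent. Assumes x is nonzero and finite."""
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    return mantissa * 2, exponent - 1   # rescale to the 1.xxx normalized form

print(normalize_binary(0.5))     # (1.0, -1), i.e. 1.0_two x 2^-1
print(normalize_binary(85.125))  # (1.330078125, 6)
```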
Floating Point Numbers
• General format: $\pm 1.bbbbb_{two} \times 2^{eeee}$
or $(-1)^S \times (1 + F) \times 2^E$
• Where
• S = sign, 0 for positive, 1 for negative
• F = fraction (or mantissa) as a binary integer; 1 + F is called the significand
• E = exponent as a binary integer, positive or negative (two's complement)
Numbers in 32-bit Formats
• Two's complement integers: expressible numbers run from $-2^{31}$ through 0 to $2^{31} - 1$.
• Floating point numbers
[Figure: number line from –∞ to +∞. From left to right: negative overflow, expressible negative numbers, negative underflow, –0 and +0, positive underflow, expressible positive numbers, positive overflow.]
IEEE 754 Floating Point Standards
IEEE 754 Floating Point Standard
• Single Precision Floating point numbers
• Biased exponent: true exponent range [-126, 127] is changed to [1, 254]
• Biased exponent is an 8-bit positive binary integer.
• True exponent obtained by subtracting $127_{ten}$ or $01111111_{two}$
• First bit of significand is always 1
• $\pm 1.bbbb \ldots b \times 2^E$
• 1 before the binary point is implicitly assumed.
• Significand field represents 23-bit fraction after the binary point.
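To see the three fields of a single precision number, one option is to reinterpret the 32 bits with Python's standard `struct` module. The helper name `float_to_fields` is an assumption made for this sketch, not something from the slides.

```python
import struct

def float_to_fields(x):
    """Split x (rounded to single precision) into (sign, biased exponent, fraction)."""
    # Pack as big-endian IEEE 754 single precision, then reread as a 32-bit unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # bit 31
    biased_exp = (bits >> 23) & 0xFF  # bits 23-30
    fraction = bits & 0x7FFFFF        # bits 0-22
    return sign, biased_exp, fraction

sign, biased_exp, fraction = float_to_fields(-6.5)  # -6.5 = -1.101_two x 2^2
true_exp = biased_exp - 127                         # remove the bias of 127
print(sign, biased_exp, true_exp, format(fraction, "023b"))
```

For -6.5 the fields come out as sign 1, biased exponent 129 (true exponent 2) and a fraction that starts with 101.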
IEEE 754 Floating Point Standard
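• Single Precision Floating point numbers (normalized)
Sign bit S: bit 31
Biased exponent: bits 23-30
Fraction F: bits 0-22
Example fields: 1 1011001 01001100000000010001101
True exponent E = (biased exponent read as a positive integer) – 127
[Figure: number line from –∞ to +∞. From left to right: negative overflow, expressible negative numbers up to $-2^{-126}$, negative underflow, –0 and +0, positive underflow, expressible positive numbers from $2^{-126}$, positive overflow.]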
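The boundaries of the normalized range can be checked numerically. The sketch below assumes the biased exponent range [1, 254] stated earlier and builds the two extreme bit patterns by hand; `bits_to_float` is an illustrative helper name.

```python
import struct

INT32_MIN = -2**31       # smallest 32-bit two's complement integer
INT32_MAX = 2**31 - 1    # largest 32-bit two's complement integer

def bits_to_float(word):
    """Reinterpret a 32-bit integer as an IEEE 754 single precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

smallest_normal = bits_to_float(0x00800000)  # biased exponent 1, fraction all zeros
largest_normal = bits_to_float(0x7F7FFFFF)   # biased exponent 254, fraction all ones

print(smallest_normal == 2.0**-126)                # True
print(largest_normal == (2 - 2**-23) * 2.0**127)   # True
```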
Decimal Fraction to Binary (IEEE 754) conversion
• Represent 85.125 in IEEE 754 format
• $85 = 1010101_{two}$ and $0.125 = 0.001_{two}$
• $85.125 = 1010101.001_{two} = 1.010101001_{two} \times 2^6$ [sign = 0]
• Biased exponent: $127 + 6 = 133 = 10000101_{two}$
• Normalised mantissa = 010101001 (pad with 0's on the right to complete the 23 bits)
• The IEEE 754 single precision representation is: 0 10000101 01010100100000000000000
• Hexadecimal form: 0100 0010 1010 1010 0100 0000 0000 0000 = 0x42AA4000
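The manual steps above can be mirrored in code and cross-checked against Python's own single precision encoder. This is a minimal sketch: `encode_single` is a hypothetical helper, it handles only nonzero values in the normalized range, and it truncates the fraction instead of rounding.

```python
import struct

def encode_single(value):
    """Encode a nonzero, normal-range value as a 32-bit IEEE 754 word (truncating)."""
    if value == 0:
        raise ValueError("zero has a special encoding and is not handled here")
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    exponent = 0
    # Normalize so that 1 <= magnitude < 2, tracking the true exponent.
    while magnitude >= 2.0:
        magnitude /= 2.0
        exponent += 1
    while magnitude < 1.0:
        magnitude *= 2.0
        exponent -= 1
    biased = exponent + 127                        # add the bias
    fraction = int((magnitude - 1.0) * (1 << 23))  # 23 fraction bits, truncated
    return (sign << 31) | (biased << 23) | fraction

word = encode_single(85.125)
print(hex(word))                         # 0x42aa4000
print(struct.pack(">f", 85.125).hex())   # 42aa4000
```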
Binary to Decimal Fraction Conversion
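Binary: $(-1)^S \times (1.b_1 b_2 b_3 b_4)_{two} \times 2^E$

Decimal: $(-1)^S \times (1 + b_1 \times 2^{-1} + b_2 \times 2^{-2} + b_3 \times 2^{-3} + b_4 \times 2^{-4}) \times 2^E$

Example: $-1.1100_{two} \times 2^{-2}$

$= -(1 + 2^{-1} + 2^{-2}) \times 2^{-2}$

$= -(1 + 0.5 + 0.25)/4$

$= -1.75/4$

$= -0.4375$ (decimal)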
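The weighted sum in the decimal formula translates directly into a loop over the fraction bits. The function name and its string-based interface are assumptions chosen for this sketch.

```python
def binary_scientific_to_decimal(sign, fraction_bits, exponent):
    """Evaluate (-1)^S x (1 + b1*2^-1 + b2*2^-2 + ...) x 2^E.

    fraction_bits is a string of '0'/'1' characters after the binary point.
    """
    significand = 1.0  # the leading 1 before the binary point
    for i, bit in enumerate(fraction_bits, start=1):
        significand += int(bit) * 2.0 ** -i
    return (-1) ** sign * significand * 2.0 ** exponent

print(binary_scientific_to_decimal(1, "1100", -2))  # -0.4375
```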
Conversion From Hex to Decimal
• R1 = 0x42220000
0 10000100 01000100000000000000000
→ E' = $10000100_{two}$ = 132 → E = 132 - 127 = 5 → $2^5$
= $+1.0100010_{two} \times 2^5 = 101000.10_{two}$ = +40.5
• R2 = 0xC12E0000
1 10000010 01011100000000000000000
→ E' = $10000010_{two}$ = 130 → E = 130 - 127 = 3 → $2^3$
= $-1.0101110_{two} \times 2^3 = -1010.1110_{two}$ = -(10 + 0.5 + 0.25 + 0.125) = -10.875
• R3 = 0xC0800000
1 10000001 00000000000000000000000
→ E' = $10000001_{two}$ = 129 → E = 129 - 127 = 2 → $2^2$
= $-1.00_{two} \times 2^2 = -100_{two}$ = -4
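The three register values can be checked by letting Python reinterpret each 32-bit word as a single precision number. The helper name `hex_to_float` is illustrative; `struct.unpack` with the `">f"` format is the standard library call that does the decoding.

```python
import struct

def hex_to_float(word):
    """Reinterpret a 32-bit integer as an IEEE 754 single precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

print(hex_to_float(0x42220000))  # 40.5
print(hex_to_float(0xC12E0000))  # -10.875
print(hex_to_float(0xC0800000))  # -4.0
```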
Positive Zero in IEEE 754
0 00000000 00000000000000000000000
Fields: sign | biased exponent | fraction

• Read as a normalized number, this pattern would be $+1.0 \times 2^{-127}$
• Smaller than the smallest positive normalized number ($2^{-126}$) in the single-precision IEEE 754 standard.
• Interpreted as positive zero.
• True exponent less than –126 is positive underflow
Negative Zero in IEEE 754
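1 00000000 00000000000000000000000
Fields: sign | biased exponent | fraction
• Read as a normalized number, this pattern would be $-1.0 \times 2^{-127}$
• Greater than the largest (closest to zero) negative normalized number ($-2^{-126}$) in the single-precision IEEE 754 standard.
• Interpreted as negative zero.
• True exponent less than –126 is negative underflow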
Positive Infinity in IEEE 754
0 11111111 00000000000000000000000
Fields: sign | biased exponent | fraction
• Read as a normalized number, this pattern would be $+1.0 \times 2^{128}$
• Greater than the largest positive number in the single-precision IEEE 754 standard.
• Interpreted as +∞
• A true exponent > 127 cannot be represented (positive overflow). The biased exponent 11111111 with a nonzero fraction is reserved for "not a number" (NaN), which is distinct from +∞.
Negative Infinity in IEEE 754
1 11111111 00000000000000000000000
Fields: sign | biased exponent | fraction
• Read as a normalized number, this pattern would be $-1.0 \times 2^{128}$
• Smaller than the most negative number in the single-precision IEEE 754 standard.
• Interpreted as –∞
• A true exponent > 127 with a negative sign is negative overflow. The biased exponent 11111111 with a nonzero fraction encodes "not a number" (NaN), not –∞.
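The special bit patterns for ±0 and ±∞ can be decoded directly to confirm how they are interpreted. The sketch also decodes 0x7FC00000, a pattern with biased exponent 255 and a nonzero fraction, which IEEE 754 defines as NaN; that constant and the helper name `bits_to_float` are choices made for this example.

```python
import math
import struct

def bits_to_float(word):
    """Reinterpret a 32-bit integer as an IEEE 754 single precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

pos_zero = bits_to_float(0x00000000)  # 0 00000000 000...0
neg_zero = bits_to_float(0x80000000)  # 1 00000000 000...0
pos_inf = bits_to_float(0x7F800000)   # 0 11111111 000...0
neg_inf = bits_to_float(0xFF800000)   # 1 11111111 000...0
nan = bits_to_float(0x7FC00000)       # 0 11111111 with a nonzero fraction

print(pos_zero, neg_zero)             # 0.0 -0.0
print(pos_inf, neg_inf)               # inf -inf
print(math.isnan(nan))                # True
```

Note that +0 and –0 compare equal even though their sign bits differ; `math.copysign` is one way to tell them apart.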
FP Addition and Subtraction
1. Significand alignment: Right shift the significand of the number with the smaller exponent until the two exponents match.
2. Addition: Add the significands and report an error if overflow occurs. If the resulting significand = 0, return the result as 0.
3. Normalization: Shift the significand bits to normalize, and report overflow or underflow if the exponent goes out of range.
4. Rounding
Example (4 Significant Bits)
• Subtraction: $0.5_{ten} - 0.4375_{ten}$
• Floating point numbers to be added: $1.000_{two} \times 2^{-1}$ and $-1.110_{two} \times 2^{-2}$
• Significand of lesser exponent is shifted right until exponents match: $-1.110_{two} \times 2^{-2} \rightarrow -0.111_{two} \times 2^{-1}$
• Add significands, $1.000_{two} + (-0.111_{two})$, using 2's complement addition with one bit added for sign:
    01000
  + 11001
  = 00001 (carry out of the sign position discarded)
• Result is $0.001_{two} \times 2^{-1}$
• Normalize: $1.000_{two} \times 2^{-4}$. No overflow/underflow since 127 ≥ exponent ≥ –126
• $1.000_{two} \times 2^{-4} = (1 + 0)/16 = 0.0625_{ten}$
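The align, add and normalize steps can be sketched with integer significands. This is an illustrative model, not a full IEEE 754 adder: the function names are hypothetical, the significand 1.fff is held as an integer scaled by 2^3, bits shifted out during alignment are simply dropped, and the rounding and overflow/underflow checks are left out.

```python
def fp_add(sign_a, sig_a, exp_a, sign_b, sig_b, exp_b, frac_bits=3):
    """Add two normalized numbers given as (sign, integer significand, exponent)."""
    # Step 1: align. Make operand a the one with the larger exponent,
    # then right shift the other significand until the exponents match.
    if exp_a < exp_b:
        sign_a, sig_a, exp_a, sign_b, sig_b, exp_b = sign_b, sig_b, exp_b, sign_a, sig_a, exp_a
    sig_b >>= exp_a - exp_b  # shifted-out bits are discarded
    # Step 2: add the signed significands.
    total = (-sig_a if sign_a else sig_a) + (-sig_b if sign_b else sig_b)
    if total == 0:
        return 0, 0, 0       # a zero significand means the result is zero
    sign = 1 if total < 0 else 0
    sig, exp = abs(total), exp_a
    # Step 3: normalize back to the 1.fff form.
    while sig >= (2 << frac_bits):
        sig >>= 1
        exp += 1
    while sig < (1 << frac_bits):
        sig <<= 1
        exp -= 1
    return sign, sig, exp

def to_float(sign, sig, exp, frac_bits=3):
    """Convert (sign, integer significand, exponent) to a Python float."""
    return (-1) ** sign * sig / (1 << frac_bits) * 2.0 ** exp

# 1.000_two x 2^-1 plus -1.110_two x 2^-2, i.e. 0.5 - 0.4375
result = fp_add(0, 0b1000, -1, 1, 0b1110, -2)
print(result, to_float(*result))  # (0, 8, -4) 0.0625
```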
FP Multiplication

1. Separate the signs
2. Add exponents (integer addition)
3. Multiply significands (integer multiplication)
4. Normalize, round, check overflow/underflow
5. Attach the sign of the result (negative when the operand signs differ)
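The multiplication steps can be sketched in the same style. As assumptions for this example, `fp_multiply` is a hypothetical name, operands are nonzero and normalized with the significand 1.fff held as an integer scaled by 2^3, exponents are unbiased, and the product is truncated with no rounding or range checks.

```python
def fp_multiply(sign_a, sig_a, exp_a, sign_b, sig_b, exp_b, frac_bits=3):
    """Multiply two normalized numbers given as (sign, integer significand, exponent)."""
    sign = sign_a ^ sign_b   # result is negative when the signs differ
    exp = exp_a + exp_b      # add the (unbiased) exponents
    product = sig_a * sig_b  # integer product with 2 * frac_bits fraction bits
    # Normalize: the product of two values in [1, 2) lies in [1, 4).
    if product >= (2 << (2 * frac_bits)):
        product >>= 1
        exp += 1
    sig = product >> frac_bits  # truncate back to frac_bits fraction bits
    return sign, sig, exp

# 1.000_two x 2^-1 times -1.110_two x 2^-2, i.e. 0.5 x -0.4375 = -0.21875
print(fp_multiply(0, 0b1000, -1, 1, 0b1110, -2))  # (1, 14, -3), i.e. -1.110_two x 2^-3
# 1.100_two times 1.100_two, i.e. 1.5 x 1.5 = 2.25
print(fp_multiply(0, 0b1100, 0, 0, 0b1100, 0))    # (0, 9, 1), i.e. 1.001_two x 2^1
```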
johnjose@[Link]