CS453 Computer System Design
Introduction to Floating Point Representation
John Jose
Associate Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati
Integers and fixed-point numbers
• Integers: the universe is infinite but discrete
• No numbers between consecutive integers, e.g., 5 and 6
• A countable (finite) number of items in a finite range
• Referred to as fixed-point numbers
Real Numbers and floating-point numbers
• Real numbers – the universe is infinite and continuous
• Fractions represented by decimal notation
• Rational numbers, e.g., 5/2 = 2.5
• Irrational numbers, e.g., π = 3.14159265 . . . (22/7 is only a rational approximation)
• Infinitely many numbers exist even in the smallest range
• Referred to as floating-point numbers
• A large number: 976,000,000,000,000 = 9.76 × 10^14
• A small number: 0.0000000000000976 = 9.76 × 10^-14
Standard Scientific Notation
• Decimal numbers
• 0.513 × 10^5, 5.13 × 10^4 and 51.3 × 10^3 are written in
scientific notation.
• 5.13 × 10^4 is the normalized scientific notation.
• Binary numbers
• Base 2
• Binary point – multiplication by 2 moves the point to
the right.
• Normalized scientific notation, e.g., 1.0_two × 2^-1
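Normalized binary scientific notation can be checked numerically. The sketch below assumes Python's standard math.frexp, which returns a significand in [0.5, 1); the helper name normalize_binary is illustrative and not part of the lecture. It rescales the result so that the significand lies in [1, 2), matching the 1.xxx_two × 2^E form.

```python
import math

def normalize_binary(x):
    """Return (significand, exponent) with 1 <= |significand| < 2
    and x == significand * 2**exponent."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1    # move one factor of 2 from the exponent into m

print(normalize_binary(0.5))      # (1.0, -1)
print(normalize_binary(85.125))   # (1.330078125, 6)
```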
Floating Point Numbers
• General format: ±1.bbbbb_two × 2^eeee
or (-1)^S × (1 + F) × 2^E
• Where
• S = sign, 0 for positive, 1 for negative
• F = fraction (or mantissa), the bits after the binary point, so 0 ≤ F < 1;
1 + F is called the significand
• E = exponent as a binary integer, positive or negative (two’s
complement)
Numbers in 32-bit Formats
• Two's complement integers
• Expressible numbers: -2^31 to 2^31 - 1
• Floating point numbers
• Number line, from -∞ to +∞: negative overflow | expressible negative numbers | negative underflow | -0, +0 | positive underflow | expressible positive numbers | positive overflow
IEEE 754 Floating Point Standards
• Single Precision Floating point numbers
• Biased exponent: the true exponent range [-126, 127] is changed to [1, 254]
• Biased exponent is an 8-bit positive binary integer.
• True exponent is obtained by subtracting 127_ten or 01111111_two
• First bit of significand is always 1
• ±1.bbbbb . . . b × 2^E
• 1 before the binary point is implicitly assumed.
• Significand field represents 23-bit fraction after the binary point.
• Normalized single-precision layout (32 bits)
Field: Sign bit S | Biased exponent | Fraction F
Bits: 31 | 23-30 | 0-22
Example: 1 1011001 01001100000000010001101
• True exponent E = biased exponent (a positive integer) - 127
• Number line, from -∞ to +∞: negative overflow | expressible negative numbers (up to -2^-126) | negative underflow | -0, +0 | positive underflow | expressible positive numbers (from 2^-126) | positive overflow
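The S, biased-exponent and F fields can be extracted from a 32-bit word with shifts and masks. This is a minimal sketch, assuming Python's standard struct module, whose ">f" format packs a value as IEEE 754 single precision; the function name float_fields is illustrative.

```python
import struct

def float_fields(x):
    """Split a number, stored as IEEE 754 single precision, into
    (sign, biased_exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # bit 31
    biased_exp = (bits >> 23) & 0xFF   # bits 23-30
    fraction = bits & 0x7FFFFF         # bits 0-22
    return sign, biased_exp, fraction

s, e, f = float_fields(-2.0)
print(s, e, f)      # 1 128 0
print(e - 127)      # true exponent: 1
```

Subtracting 127 from the middle field gives the true exponent, as described in the bullets above.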
Decimal Fraction to Binary (IEEE 754) conversion
• Represent 85.125 in IEEE 754 format
• 85 = 1010101_two; 0.125 = 0.001_two
• 85.125 = 1010101.001_two = 1.010101001_two × 2^6 [sign = 0]
• Biased exponent: 127 + 6 = 133 = 10000101_two
• Normalised mantissa = 010101001 (padded with trailing 0s to fill the 23 bits)
• IEEE 754 single precision: 0 10000101 01010100100000000000000
• Hexadecimal form: 0100 0010 1010 1010 0100 0000 0000 0000 = 0x42AA4000
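The conversion steps above can be sketched in Python. The helper manual_encode is a hypothetical name; it assumes a positive, normalized input whose fraction fits exactly in 23 bits, and does no rounding. The second helper, machine_hex, cross-checks the result against the encoding produced by the standard struct module.

```python
import struct

def manual_encode(x):
    """Encode a positive, exactly representable number as 'S EEEEEEEE F...F'."""
    exponent = 0
    sig = x
    while sig >= 2:            # normalize to 1.xxx by halving
        sig /= 2
        exponent += 1
    while sig < 1:             # or by doubling, for values below 1
        sig *= 2
        exponent -= 1
    biased = exponent + 127                # add the bias
    fraction = int((sig - 1) * 2**23)      # 23 bits after the binary point
    return f"0 {biased:08b} {fraction:023b}"

def machine_hex(x):
    """Hexadecimal form of the machine's single-precision encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{bits:08X}"

print(manual_encode(85.125))   # 0 10000101 01010100100000000000000
print(machine_hex(85.125))     # 42AA4000
```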
Binary to Decimal Fraction Conversion
Binary: (-1)^S × (1.b1b2b3b4)_two × 2^E
Decimal: (-1)^S × (1 + b1×2^-1 + b2×2^-2 + b3×2^-3 + b4×2^-4) × 2^E
Example: -1.1100_two × 2^-2
= -(1 + 2^-1 + 2^-2) × 2^-2
= -(1 + 0.5 + 0.25)/4
= -1.75/4
= -0.4375 (decimal)
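The decimal expansion above translates directly into a loop over the fraction bits. This is a small sketch; the function name and its string-based argument are illustrative choices, not from the lecture.

```python
def binary_fp_to_decimal(sign, fraction_bits, exponent):
    """Evaluate (-1)^S x (1 + b1*2^-1 + b2*2^-2 + ...) x 2^E.

    fraction_bits is a string such as "1100" holding b1, b2, b3, ...
    """
    significand = 1.0
    for i, bit in enumerate(fraction_bits, start=1):
        significand += int(bit) * 2.0 ** -i
    return (-1) ** sign * significand * 2.0 ** exponent

print(binary_fp_to_decimal(1, "1100", -2))   # -0.4375
```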
Conversion From Hex to Decimal
• R1 = 0x42220000
0 100 0010 0010 0010 0000 0000 0000 0000
→ E' = 100 0010 0 → 132 → E = 132 - 127 = 5 → 2^5
= +1.0100010_two × 2^5 = +101000.10_two = +40.5
• R2 = 0xC12E0000
1 100 0001 0010 1110 0000 0000 0000 0000
→ E' = 100 0001 0 → 130 → E = 130 - 127 = 3 → 2^3
= -1.0101110_two × 2^3 = -1010.1110_two = -(10 + 0.5 + 0.25 + 0.125) = -10.875
• R3 = 0xC0800000
1 100 0000 1000 0000 0000 0000 0000 0000
→ E' = 100 0000 1 → 129 → E = 129 - 127 = 2 → 2^2
= -1.00_two × 2^2 = -100_two = -4
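The three register values can be decoded mechanically. The sketch below assumes a normalized number (biased exponent between 1 and 254) and applies (-1)^S × (1 + F) × 2^(E' - 127); decode_hex is a hypothetical helper name, and machine_decode cross-checks it using the standard struct module.

```python
import struct

def decode_hex(word):
    """Decode a 32-bit IEEE 754 single-precision word (normalized case only)."""
    sign = word >> 31
    biased = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (biased - 127)

def machine_decode(word):
    """Same decoding, done by the machine's own float support."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

for r in (0x42220000, 0xC12E0000, 0xC0800000):
    print(hex(r), decode_hex(r), machine_decode(r))
# 0x42220000 40.5 40.5
# 0xc12e0000 -10.875 -10.875
# 0xc0800000 -4.0 -4.0
```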
Positive Zero in IEEE 754
0 00000000 00000000000000000000000
Fields: sign | biased exponent | fraction
• +1.0 × 2^-127
• Smaller than the smallest positive number in the single-precision IEEE 754
standard.
• Interpreted as positive zero.
• A true exponent less than -126 is positive underflow.
Negative Zero in IEEE 754
1 00000000 00000000000000000000000
Fields: sign | biased exponent | fraction
• -1.0 × 2^-127
• Greater than the largest negative number in the single-precision IEEE 754
standard.
• Interpreted as negative zero.
• A true exponent less than -126 is negative underflow.
Positive Infinity in IEEE 754
0 11111111 00000000000000000000000
Fields: sign | biased exponent | fraction
• +1.0 × 2^128
• Greater than the largest positive number in the single-precision IEEE 754
standard.
• Interpreted as +∞
• A result whose true exponent exceeds 127 is a positive overflow and is
represented as +∞.
• The same biased exponent (11111111) with a nonzero fraction does not
represent a number; it is called "not a number" or NaN.
Negative Infinity in IEEE 754
1 11111111 00000000000000000000000
Fields: sign | biased exponent | fraction
• -1.0 × 2^128
• Smaller than the smallest negative number in the single-precision IEEE 754
standard.
• Interpreted as -∞
• A result whose true exponent exceeds 127 is a negative overflow and is
represented as -∞.
• The same biased exponent (11111111) with a nonzero fraction does not
represent a number; it is called "not a number" or NaN.
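The special bit patterns for ±0 and ±∞ can be checked against the machine's own IEEE 754 support. This is a sketch assuming Python's standard struct and math modules; bits_to_float is an illustrative helper. The word 0x7FC00000 (biased exponent 255, nonzero fraction) is added here as one example of a NaN encoding and is not taken from the slides.

```python
import math
import struct

def bits_to_float(word):
    """Interpret a 32-bit word as an IEEE 754 single-precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

pos_zero = bits_to_float(0x00000000)   # 0 00000000 000...0
neg_zero = bits_to_float(0x80000000)   # 1 00000000 000...0
pos_inf = bits_to_float(0x7F800000)    # 0 11111111 000...0
neg_inf = bits_to_float(0xFF800000)    # 1 11111111 000...0
nan = bits_to_float(0x7FC00000)        # exponent 255, nonzero fraction

print(pos_zero, neg_zero, pos_inf, neg_inf, nan)   # 0.0 -0.0 inf -inf nan
print(pos_zero == neg_zero)                        # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))                # -1.0: but the sign bit is kept
```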
FP Addition and Subtraction
1. Significand alignment: Right-shift the significand of the number with the
smaller exponent until the two exponents match.
2. Addition: Add the significands and report an error if overflow occurs.
If the significand is 0, return the result as 0.
3. Normalization: Shift the significand bits to normalize, and report
overflow or underflow if the exponent goes out of range.
4. Rounding
Example (4 Significant Fraction Bits)
• Subtraction: 0.5_ten - 0.4375_ten
• Floating point numbers to be added:
1.000_two × 2^-1 and -1.110_two × 2^-2
• Significand of the lesser exponent is shifted right until the exponents match:
-1.110_two × 2^-2 → -0.111_two × 2^-1
• Add significands: 1.000_two + (-0.111_two)
2's complement addition, with one bit added for the sign:
  01000
+ 11001
= 00001
Result is 0.001_two × 2^-1
• Normalize: 1.000_two × 2^-4
No overflow/underflow, since 127 ≥ exponent ≥ -126
• 1.000_two × 2^-4 = (1 + 0)/16 = 0.0625_ten
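The addition steps and the worked example can be traced in code. The sketch below assumes a toy format with 3 fraction bits (1.bbb) and represents each operand as a hypothetical (sign, magnitude, exponent) triple, where magnitude is the significand scaled by 2^3. Rounding is left out: bits shifted out during alignment are simply truncated.

```python
FRAC_BITS = 3   # significand 1.bbb, as in the example

def fp_add(a, b):
    """Add two toy floating-point numbers given as (sign, magnitude, exponent)."""
    sign_a, mag_a, exp_a = a
    sign_b, mag_b, exp_b = b
    # Step 1: align by right-shifting the significand with the smaller exponent.
    if exp_a < exp_b:
        mag_a >>= exp_b - exp_a
        exp = exp_b
    else:
        mag_b >>= exp_a - exp_b
        exp = exp_a
    # Step 2: add the signed significands; a zero sum is returned as 0.
    total = (-mag_a if sign_a else mag_a) + (-mag_b if sign_b else mag_b)
    if total == 0:
        return (0, 0, 0)
    sign, mag = (1 if total < 0 else 0), abs(total)
    # Step 3: normalize so that 1.000 <= significand < 10.000 (binary).
    while mag >= 2 ** (FRAC_BITS + 1):
        mag >>= 1
        exp += 1
    while mag < 2 ** FRAC_BITS:
        mag <<= 1
        exp -= 1
    if not -126 <= exp <= 127:
        raise OverflowError("exponent out of range")
    # Step 4: rounding is omitted in this sketch.
    return (sign, mag, exp)

def to_float(n):
    sign, mag, exp = n
    return (-1) ** sign * mag / 2 ** FRAC_BITS * 2.0 ** exp

result = fp_add((0, 0b1000, -1), (1, 0b1110, -2))   # 0.5 + (-0.4375)
print(result, to_float(result))                     # (0, 8, -4) 0.0625
```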
FP Multiplication
1. Separate sign
2. Add exponents (integer addition)
3. Multiply significands (integer multiplication)
4. Normalize, round, check overflow/underflow
5. Replace sign
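The multiplication steps can be sketched in the same toy format. The (sign, magnitude, exponent) triple and the 3-bit fraction are assumptions made for illustration; inputs are assumed to be normalized and nonzero, and the low product bits are truncated rather than rounded.

```python
FRAC_BITS = 3   # significand 1.bbb

def fp_mul(a, b):
    """Multiply two normalized toy floating-point numbers (sign, magnitude, exponent)."""
    sign_a, mag_a, exp_a = a
    sign_b, mag_b, exp_b = b
    sign = sign_a ^ sign_b          # steps 1 and 5: sign handled separately
    exp = exp_a + exp_b             # step 2: add exponents
    product = mag_a * mag_b         # step 3: integer multiply, 2*FRAC_BITS fraction bits
    product >>= FRAC_BITS           # back to FRAC_BITS fraction bits (truncating)
    # Step 4: the product of two values in [1, 2) lies in [1, 4),
    # so at most one right shift is needed to normalize.
    if product >= 2 ** (FRAC_BITS + 1):
        product >>= 1
        exp += 1
    if not -126 <= exp <= 127:
        raise OverflowError("exponent out of range")
    return (sign, product, exp)

def to_float(n):
    sign, mag, exp = n
    return (-1) ** sign * mag / 2 ** FRAC_BITS * 2.0 ** exp

p = fp_mul((0, 0b1000, -1), (1, 0b1110, -2))   # 0.5 * (-0.4375)
print(p, to_float(p))                          # (1, 14, -3) -0.21875
```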