CMSC216: Binary Floating Point Numbers
Chris Kauffman
Last Updated: Thu Feb 29 09:14:52 AM EST 2024
Logistics
Reading Bryant/O’Hallaron
▶ Ch 2.1-3: Integers
▶ Ch 2.4-5: Floats
▶ 2021 Quick Guide to GDB
Assignments
▶ Lab05: Bits and GDB
  ▶ Minor bugs in Makefile and Quiz code line numbers
▶ HW05: Assembly Intro
▶ Project 2: Bitwise Ops, GDB, C Application
  ▶ Delayed for release, out later today
  ▶ 10 days to complete
Goals
▶ Finish Ints / Bitwise Ops
▶ gdb introduction
▶ Floating Point layout
▶ Thu: Assembly
Grading on Exam 1 / Project 1 is ongoing; grades will be released towards
the end of the week
Don’t Give Up, Stay Determined!
▶ If Project 1 / Exam 1 went awesome, count yourself lucky
▶ If things did not go well, Don’t Give Up
▶ Spend some time contemplating why things didn’t go well,
talk to course staff about it, learn from any mistakes
▶ There is a LOT of semester left and plenty of time to recover
from a bad start
GDB: The GNU Debugger
▶ P2 will include a “debugging problem” called puzzlebox
▶ Easiest to solve this problem using GDB (or some other
debugger)
▶ Debuggers allow one to stop time in a program, inspect
variables, pause execution at certain points and skip forwards
▶ If you’ve added tons of printf()’s to your code and still can’t
figure out what’s going on, a Debugger is your next option
▶ Phase 03 of puzzlebox is part of Lab05, makes a good demo
of GDB basics
▶ Associated Reading: 2021 Quick Guide to GDB
Note on Float Coverage
▶ Floating point layout is complex and interesting but. . .
▶ It’s not a core topic that will appear on any exams, only
tangentially on assignments
▶ Our coverage will be brief, examine slides / textbook if you
want more depth
▶ GOAL: Demonstrate that (1) Real numbers can be
approximated and (2) doing so uses bits in a very different
way than integer representations
Parts of a Fractional Number
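The meaning of the “decimal point” is as follows:
\[
\begin{aligned}
123.406_{10} &= 1 \times 10^{2} + 2 \times 10^{1} + 3 \times 10^{0} && (123 = 100 + 20 + 3) \\
&\quad + 4 \times 10^{-1} + 0 \times 10^{-2} + 6 \times 10^{-3} && (0.406 = \tfrac{4}{10} + \tfrac{6}{1000}) \\
&= 123.406_{10}
\end{aligned}
\]
Changing to base 2 induces a “binary point” with similar meaning:
\[
\begin{aligned}
110.101_{2} &= 1 \times 2^{2} + 1 \times 2^{1} + 0 \times 2^{0} && (6 = 4 + 2) \\
&\quad + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} && (0.625 = \tfrac{1}{2} + \tfrac{1}{8}) \\
&= 6.625_{10}
\end{aligned}
\]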
One could represent fractional numbers with a fixed point e.g.
▶ 32 bit fractional number with
▶ 10 bits left of Binary Point (integer part)
▶ 22 bits right of Binary Point (fractional part)
BUT most applications require a more flexible scheme
Scientific Notation for Numbers
“Scientific” or “Engineering” notation for numbers with a
fractional part is shown below alongside the matching printf() output
Standard   Scientific          printf("%.4e",x);
123.456    1.23456 × 10^2      1.2346e+02
50.01      5.001 × 10^1        5.0010e+01
3.14159    3.14159 × 10^0      3.1416e+00
0.54321    5.4321 × 10^-1      5.4321e-01
0.00789    7.89 × 10^-3        7.8900e-03
▶ Always includes one non-zero digit left of decimal place
▶ Has some significant digits after the decimal place
▶ Multiplies by a power of 10 to get actual number
Binary Floating Point Layout Uses Scientific Convention
▶ Some bits for integer/fractional part
▶ Some bits for exponent part
▶ All in base 2: 1’s and 0’s, powers of 2
Conversion Example
The steps below convert a decimal number to its fractional binary
equivalent, then adjust it to scientific representation.
float fl = -248.75;
             7   6  5  4 3 2 1 0   -1  -2
-248.75 = -(128+64+32+16+8+0+0+0).(1/2+1/4)
        = -11111000.11 *2^0
           76543210 12
        = -1111100.011 *2^1
           6543210 123
        = -111110.0011 *2^2
           543210 1234
        ...
           MANTISSA    EXPONENT
        = -1.111100011 * 2^7
           0 123456789
Mantissa ≡ Significand ≡ Fractional Part
Principle and Practice of Binary Floating Point Numbers
▶ In early computing, computer manufacturers used similar
principles for floating point numbers but varied specifics
▶ Example of Early float data/hardware
▶ Univac: 36 bits, 1-bit sign, 8-bit exponent, 27-bit significand [1]
▶ IBM: 32 bits, 1-bit sign, 7-bit exponent, 24-bit significand [2]
▶ Manufacturers implemented circuits with different rounding
behavior, with/without infinity, and other inconsistencies
▶ Troublesome for reliability: code produced different results on
different machines
▶ This was resolved with the adoption of the IEEE 754 Floating
Point Standard which specifies
▶ Bit layout of 32-bit float and 64-bit double
▶ Rounding behavior, special values like Infinity
▶ Turing Award to William Kahan for his work on the standard
[1] Floating Point Arithmetic
[2] IBM Hexadecimal Floats
IEEE 754 Format: The Standard for Floating Point
float   double  Property
32      64      Total bits
1       1       Bits for sign (1 neg / 0 pos)
8       11      Bits for Exponent multiplier (power of 2)
23      52      Bits for Fractional part or mantissa
7.22    15.95   Decimal digits of accuracy [3]
▶ Most commonly implemented format for floating point
numbers in hardware to do arithmetic: processor has physical
circuits to add/mult/etc. for this bit layout of floats
▶ Numbers/Bit Patterns divided into three categories
Category      Description                             Exponent
Normalized    most common like 1.0 and -9.56e37       mixed 0/1
Denormalized  very close to zero and 0.0              all 0’s
Special       extreme/error values like Inf and NaN   all 1’s
[3] Wikipedia: IEEE 754
Example float Layout of -248.75: float_examples.c
[Figure: 32-bit layout of -248.75, colored in 8-bit blocks. Source: IEEE-754 Tutorial]
▶ Negative: highest bit, leading 1
▶ Exponent: high 8 bits, 2^7 encoded with bias of -127
    1000_0110 - 0111_1111
    = 128+4+2 - 127
    = 134 - 127
    = 7
▶ Fractional/Mantissa portion is 1.111100011...
  ▶ The implied leading 1 is not in the binary layout
  ▶ The digits after the binary point are the explicit low 23 bits
Normalized Floating Point: General Case
▶ A “normalized” floating point number is in the standard range
for float/double, bit layout follows previous slide
▶ Example: -248.75 = -1.111100011 * 2^7
Exponent is in Bias Form (not Two’s Complement)
▶ Stored as an unsigned integer; subtracting a constant bias
(127 for float) gives the actual exponent
▶ Consequence: exponent of 0 is not bitstring of 0’s
▶ Consequence: tiny exponents like -125 are stored close to a bitstring of
0’s; this makes the resulting number close to 0
▶ 8-bit exponent 1000 0110 = 128+4+2 = 134
so exponent value is 134 - 127 = 7
Integer and Mantissa Parts
▶ The leading 1 before the binary point is implied so does not
show up in the bit string
▶ Remaining fractional/mantissa portion shows up in the
low-order bits
Fixed Bit Standards for Floating Point
IEEE Standard Layouts
Kind    Sign Bit  Exponent Bits     Bias   Exp Range       Mantissa Bits
float   31 (1)    30-23 (8 bits)    -127   -126 to +127    22-0 (23 bits)
double  63 (1)    62-52 (11 bits)   -1023  -1022 to +1023  51-0 (52 bits)
Standard allows hardware to be created that is as efficient as
possible to do calculation on these numbers
Consequences of Fixed Bits
▶ Since a fixed # of bits is used, some numbers cannot be
exactly represented; this happens in any numbering system:
▶ Base 10 and Base 2 cannot represent 1/3 in finite digits
▶ Base 2 cannot represent 1/10 in finite digits
float f = 0.1;
printf("0.1 = %.20e\n",f);
0.1 = 1.00000001490116119385e-01
Try show_float.c to see this in action
Exercise: Quick Checks
1. What distinct parts are represented by bits in a floating point
number (according to IEEE)?
2. What is the “bias” of the exponent for 32-bit floats?
3. Represent 7.125 in binary using “binary point” notation
4. Lay out 7.125 in IEEE-754 format
5. What does the number 1.0 look like as a float?
[Figure: IEEE 754 float bit layout diagram. Source: IEEE-754 Tutorial]
The diagram above may help in recalling IEEE 754 layout
Special Cases: See float_examples.c
Special Values
▶ Infinity: exponent bits all 1, fraction all 0, sign bit indicates
+∞ or −∞
▶ Infinity results from overflow or certain ops like
float x = 1.0 / 0.0;
▶ #include <math.h> gets macro INFINITY; negate it for -INFINITY
▶ NaN: not a number, exponent bits all 1, fraction has some 1s
▶ Results from invalid floating point operations like 0.0 / 0.0
Denormalized values: Exponent bits all 0
▶ Fractional/Mantissa portion evaluates without implied leading
one, still an unsigned integer though
▶ Exponent is Bias + 1: 2^-126 for float
▶ Result: very small numbers close to zero, smaller than any
normalized representation, degrade uniformly to 0
▶ Zero: bit string of all 0s, optional leading 1 (negative zero)
Other Float Notes
[Figure: comic. Source: XKCD #217]
Approximations and Roundings
▶ Approximate 2/3 with 4 digits, usually 0.6667 with standard
rounding in base 10
▶ Similarly, some numbers cannot be exactly represented with a fixed
number of bits: 1/10 is approximated
▶ IEEE 754 specifies various rounding modes to approximate
numbers
Clever Engineering
▶ IEEE 754 allows floating point numbers to sort using signed
integer sorting routines
▶ Bit patterns for float are ordered nearly the same as bit
patterns for signed int
▶ Integer comparisons are usually fewer clock cycles than floating
comparisons
Sidebar: The Weird and Wonderful Union
▶ Bitwise operations like & are not valid for float/double
▶ Can use pointers/casting to get around this OR...
▶ Use a union: somewhat unique construct to C
▶ Defined like a struct with several fields
▶ BUT fields occupy the same memory location (!?!)
▶ Allows one to treat a byte position as multiple different
types, ex: int / float / char[]
▶ Memory size of the union is the max of its fields

// union.c
#include <stdio.h>
typedef union {            // shared memory
  float fl;                // a float
  int in;                  // an int
  char ch[4];              // char array
} flint_t;                 // 4 bytes total
int main(){
  flint_t flint;
  flint.in = 0xC378C000;
  printf("%.4f\n", flint.fl);
  printf("%08x %d\n",flint.in,flint.in);
  for(int i=0; i<4; i++){
    unsigned char c = flint.ch[i];
    printf("%d: %02x '%c'\n",i,c,c);
  }
}

| Symbol            | Mem   | Val  |
|-------------------+-------+------|
| flint.ch[3]       | #1027 | 0xC3 |
| flint.ch[2]       | #1026 | 0x78 |
| flint.ch[1]       | #1025 | 0xC0 |
| flint.in/fl/ch[0] | #1024 | 0x00 |
| i                 | #1020 | ?    |
Floating Point Operation Efficiencies
▶ Floating Point Operations per Second, FLOPS is a major
measure for numerical code/hardware efficiency
▶ Often used to benchmark and evaluate scientific computer
resources, (e.g. top super computers in the world)
▶ Tricky to evaluate because of
▶ A single FLOP (add/sub/mul/div) may take 3 clock cycles to
finish: latency 3
▶ Another FLOP can start before the first one finishes:
pipelined
▶ Enough FLOPs lined up can get average 1 FLOP per cycle
▶ FP Instructions may automatically operate on multiple FPs
stored in memory to feed pipeline: vectorized ops
▶ Generally referred to as superscalar
▶ Processors schedule things out of order too
▶ All of this makes micro-evaluation error-prone and pointless
▶ Run a real application like an N-body simulation and compute
\[ \text{FLOPS} = \frac{\text{number of floating ops done}}{\text{time taken in seconds}} \]
Top 5 Super Computers Worldwide, June 2023
Rmax Rpeak Power*
Rank System #Cores (TFlop/s) (TFlop/s) (kW)
1 Frontier, USA / Oak Ridge 8,699,904 1,194,000.0 1,679,820.0 22,703
Cray EX235a, AMD EPYC 2GHz
(x86-64)
2 Fugaku, Japan / Fujitsu 7,630,848 442,010.0 537,212.0 29,899
Fujitsu A64FX 2.2GHz
(Arm)
3 LUMI Finland / EuroHPC 2,220,288 309,100.0 428,700.0 6,016
Cray EX235a, AMD EPYC 2GHz
(x86-64)
4 Leonardo Italy / EuroHPC 1,824,768 238,700.0 304,470.0 7,404
5 Summit United States 2,414,592 148,600.0 200,794.9 10,096
IBM POWER9 22C 3.07GHz
(Power)
*: An average US Home uses 909 kWh of energy per month
Top 5 Super Computers Worldwide, June 2022
Rmax Rpeak Power*
Rank System #Cores (TFlop/s) (TFlop/s) (kW)
1 Frontier, USA / Oak Ridge 8,730,112 1,102,000.0 1,685,650.0 21,100
Cray EX235a, AMD EPYC 2GHz
(x86-64)
2 Fugaku, Japan / Fujitsu 7,630,848 442,010.0 537,212.0 29,899
Fujitsu A64FX 2.2GHz
(Arm)
3 LUMI Finland / EuroHPC 1,110,144 151,900.0 214,350.0 2,942
Cray EX235a, AMD EPYC 2GHz
(x86-64)
4 Summit United States 2,414,592 148,600.0 200,794.9 10,096
IBM POWER9 22C 3.07GHz
(Power)
5 Sierra United States 1,572,480 94,640.0 125,712.0 7,438
IBM POWER9 22C 3.1GHz
(Power)
*: An average US Home uses 909 kWh of energy per month
Top 5 Super Computers Worldwide, June 2021
Rmax Rpeak Power
Rank System #Cores (TFlop/s) (TFlop/s) (kW)
1 Fugaku, Japan / Fujitsu 7,630,848 442,010.0 537,212.0 29,899
Fujitsu A64FX 2.2GHz
(Arm)
2 Summit United States 2,414,592 148,600.0 200,794.9 10,096
IBM POWER9 22C 3.07GHz
(Power)
3 Sierra United States 1,572,480 94,640.0 125,712.0 7,438
IBM POWER9 22C 3.1GHz
(Power)
4 Sunway TaihuLight China 10,649,600 93,014.6 125,435.9 15,371
Sunway SW26010
(custom RISC)
5 Perlmutter, United States 706,304 64,590.0 89,794.5 2,528
AMD EPYC 2.45GHz, Cray
(x86-64)
Top 5 Super Computers Worldwide, Nov 2020
Rmax Rpeak Power
Rank System #Cores (TFlop/s) (TFlop/s) (kW)
1 Fugaku, Japan / Fujitsu 7,299,072 415,530.0 513,854.7 28,335
Fujitsu A64FX 2.2GHz
(Arm)
2 Summit United States 2,397,824 143,500.0 200,794.9 10,096
IBM POWER9 22C 3.07GHz
(Power)
3 Sierra United States 1,572,480 94,640.0 125,712.0 7,438
IBM POWER9 22C 3.1GHz
(Power)
4 Sunway TaihuLight China 10,649,600 93,014.6 125,435.9 15,371
Sunway SW26010
(custom RISC)
5 Selene USA, NVIDIA/AMD 555,520 63,460.0 79,215.0 2,646
AMD EPYC 7742 64C 2.25GHz
(x86-64)
Top 5 Super Computers Worldwide, June 2020
Rmax Rpeak Power
Rank System #Cores (TFlop/s) (TFlop/s) (kW)
1 Fugaku, Japan / Fujitsu 7,299,072 415,530.0 513,854.7 28,335
Fujitsu A64FX 2.2GHz
(Arm)
2 Summit United States 2,397,824 143,500.0 200,794.9 10,096
IBM POWER9 22C 3.07GHz
(Power)
3 Sierra United States 1,572,480 94,640.0 125,712.0 7,438
IBM POWER9 22C 3.1GHz
(Power)
4 Sunway TaihuLight China 10,649,600 93,014.6 125,435.9 15,371
Sunway SW26010
(custom RISC)
5 Tianhe-2A China 4,981,760 61,444.5 100,678.7 18,482
Intel Xeon 2.2GHz
(x86-64)
Top 5 Super Computers Worldwide, Nov 2019
Rmax Rpeak Power
Rank System #Cores (TFlop/s) (TFlop/s) (kW)
1 Summit United States 2,397,824 143,500.0 200,794.9 9,783
IBM POWER9 22C 3.07GHz
2 Sierra United States 1,572,480 94,640.0 125,712.0 7,438
IBM POWER9 22C 3.1GHz
3 Sunway TaihuLight China 10,649,600 93,014.6 125,435.9 15,371
Sunway MPP
4 Tianhe-2A China 4,981,760 61,444.5 100,678.7 18,482
Xeon 2.2GHz
5 Frontera, United States 448,448 23,516.4 38,745.9 ??
Dell 6420, Xeons 2.7GHz
Top 5 Super Computers Worldwide, Nov 2018
Rmax Rpeak Power
Rank System #Cores (TFlop/s) (TFlop/s) (kW)
1 Summit United States 2,397,824 143,500.0 200,794.9 9,783
IBM POWER9 22C 3.07GHz
2 Sierra United States 1,572,480 94,640.0 125,712.0 7,438
IBM POWER9 22C 3.1GHz
3 Sunway TaihuLight China 10,649,600 93,014.6 125,435.9 15,371
Sunway MPP
4 Tianhe-2A China 4,981,760 61,444.5 100,678.7 18,482
TH-IVB-FEP Cluster
5 Piz Daint Switzerland 387,872 21,230.0 27,154.3 2,384
Cray XC50, Xeon E5-2690v3
Top 5 Super Computers Worldwide, Nov 2017
Rmax Rpeak Power
Rank System #Cores (TFlop/s) (TFlop/s) (kW)
1 Sunway TaihuLight China 10,649,600 93,014.6 125,435.9 15,371
Sunway MPP
2 Tianhe-2 (MilkyWay-2) China 3,120,000 33,862.7 54,902.4 17,808
TH-IVB-FEP Cluster
3 Piz Daint Switzerland 361,760 19,590.0 25,326.3 2,272
Cray XC50
4 Gyoukou Japan 19,860,000 19,135.8 28,192.0 1,350
ZettaScaler-2.2 HPC system
5 Titan USA 560,640 17,590.0 27,112.5 8,209
Cray XK7