CNIT 141
Cryptography for Computer Networks
6. Hash Functions
Updated 10-1-2020
Topics
• Secure Hash Functions
• Building Hash Functions
• The SHA Family
• BLAKE2
• How Things Can Go Wrong
Use Cases of Hash
Functions
• Digital signatures
• Public-key encryption
• Message authentication
• Integrity verification
• Password storage
Fingerprinting
• Hash function creates a fixed-length "fingerprint"
• From input of any length
• Not encryption
• No key; cannot be reversed
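A minimal sketch of the fingerprint idea, using Python's standard `hashlib` with SHA-256 as an example hash (the choice of SHA-256 and the sample inputs are assumptions for illustration):

```python
import hashlib

# Two inputs of very different lengths
short_msg = b"a"
long_msg = b"a" * 1_000_000

short_digest = hashlib.sha256(short_msg).digest()
long_digest = hashlib.sha256(long_msg).digest()

# Both fingerprints are 32 bytes (256 bits), whatever the input length
for msg, digest in ((short_msg, short_digest), (long_msg, long_digest)):
    print(len(msg), "bytes in ->", len(digest), "bytes out:", digest.hex())
```

No key is involved anywhere, which is one way to see that this is not encryption.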
Non-Cryptographic Hash
Functions
• Provide no security; easily fooled by crafted
input
• Used in hash tables
• To make table lookups faster
• To detect accidental errors
• Ex: CRC (Cyclic Redundancy Check)
• Used at layer 2 by Ethernet and Wi-Fi
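One way to see why a CRC gives no security: CRC-32 is an affine function of its input, so the checksum of a combined message can be predicted from other checksums without computing it. The sketch below uses `zlib.crc32` from the standard library; the three sample messages are made up for illustration.

```python
import zlib

def xor_bytes(*parts: bytes) -> bytes:
    """XOR several equal-length byte strings together."""
    assert len({len(p) for p in parts}) == 1, "inputs must have equal length"
    out = bytearray(len(parts[0]))
    for p in parts:
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)

# Hypothetical equal-length messages
a = b"pay alice $0100"
b = b"pay bob   $0100"
c = b"pay carol $9999"

# Predict the CRC of (a XOR b XOR c) from the three individual CRCs
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
actual = zlib.crc32(xor_bytes(a, b, c))
print(hex(predicted), hex(actual))
```

A secure hash function should show no such algebraic relationship between inputs and outputs.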
Secure Hash Functions
Digital Signature
• Most common use case for hashing
• Message M hashed, then the hash is signed
• Hash is smaller so the signature computation is
faster
• But the hash function must be secure
Unpredictability
• Two different inputs should, in practice, always give
different hashes
• Changing one bit in the input makes the hash
totally different
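The one-bit-change behavior can be observed directly. This sketch assumes SHA-256 and an arbitrary sample message; it flips a single input bit and counts how many of the 256 output bits differ (about half is expected).

```python
import hashlib

def bit_difference(x: bytes, y: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return bin(int.from_bytes(x, "big") ^ int.from_bytes(y, "big")).count("1")

msg = bytearray(b"Cryptography for Computer Networks")
h1 = hashlib.sha256(msg).digest()

msg[0] ^= 0x01            # flip the lowest bit of the first byte
h2 = hashlib.sha256(msg).digest()

changed = bit_difference(h1, h2)
print(changed, "of 256 output bits changed")
```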
Non-Uniqueness
• Consider an input message 1024 bits long
• And a hash 256 bits long
• There are 2^1024 different messages
• But only 2^256 different hashes
• So many different messages have the same
hash
One-Wayness
• Hashing cannot be reversed
• An attacker cannot find M from H
• In principle, always true
• Many different messages have the same
hash
Preimage Resistance
• An attacker given H cannot find any M with
that hash
• This is preimage resistance
• Also called first preimage resistance
The Cost of Preimages
• Attacker hashes many messages
• Hunting for a match
• Will take about 2^N guesses for a hash of length N bits
• For N = 256, that's forever
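A brute-force preimage search is only practical against a very short hash. The sketch below assumes a toy 16-bit hash made by truncating SHA-256 (`tiny_hash` is a hypothetical helper, not a real algorithm), so about 2^16 guesses are expected.

```python
import hashlib
from itertools import count

def tiny_hash(msg: bytes, nbits: int = 16) -> int:
    """Toy hash: the top nbits of SHA-256 (illustration only)."""
    full = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    return full >> (256 - nbits)

def find_preimage(target: int, nbits: int = 16):
    """Hash the counting numbers until one matches the target."""
    for guess in count():
        candidate = str(guess).encode()
        if tiny_hash(candidate, nbits) == target:
            return candidate, guess + 1   # message found, number of tries

target = tiny_hash(b"secret message")
preimage, tries = find_preimage(target)
print("found", preimage, "after", tries, "tries")
```

Each extra bit of hash length doubles the expected work, which is why the same loop is hopeless for N = 256.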
Second Preimage
• Given M1 and H1
• Attacker tries to find another message M2
• That hashes to the same H1
• Essentially the same as first preimage resistance
• Unless the hash function has a certain type of
mathematical defect
• Like length extension attacks
Collision Resistance
• Every hash function has collisions
• Two messages with the same hash
• Because there are more possible messages
than hashes
• Pigeonhole principle
• If you have 10 holes and 11 pigeons, at least
one hole contains more than one pigeon
Collision Resistance
• Collisions should be hard to find
• In practice, impossible
Birthday Attack
• Get each person in a group to say the month
and day they were born
• With 23 people, the probability of two having
the same birthday is 51%
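The 51% figure can be checked with a short calculation. This sketch assumes 365 equally likely birthdays and ignores leap years.

```python
def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

p23 = birthday_collision_probability(23)
print(f"23 people: {p23:.1%}")
```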
Finding Collisions
• If you search for collisions
• And have enough RAM to remember each
hash you calculate
• It only takes about 2^(N/2) hashes to find a collision
• N is the number of bits in the hash
Naive Birthday Attack
• Requires 2^(N/2) calculations to create the list
• And 2^(N/2) units of storage to hold it
• Sorting the list takes even more calculations
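A sketch of the store-everything birthday search, assuming a toy 24-bit hash made by truncating SHA-256 (`tiny_hash` is a hypothetical helper). It uses a Python dict instead of a sorted list, which is a swapped-in shortcut for the lookup step; roughly 2^12 hashes are expected before a collision appears.

```python
import hashlib
from itertools import count

def tiny_hash(msg: bytes, nbits: int = 24) -> int:
    """Toy hash: the top nbits of SHA-256 (illustration only)."""
    full = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    return full >> (256 - nbits)

def birthday_collision(nbits: int = 24):
    """Hash counting numbers, remembering every hash, until one repeats."""
    seen = {}                          # hash value -> message that produced it
    for i in count():
        msg = str(i).encode()
        h = tiny_hash(msg, nbits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

m1, m2, hashes_computed = birthday_collision()
print(m1, m2, "collide after", hashes_computed, "hashes")
```

The `seen` table is the storage cost: it grows by one entry per hash computed.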
MD5 Example
Preimage Attack
• MD5 is 128 bits long
• Given an MD5 hash, can we find an input with
that hash value?
• Calculating every possible hash (nearly) would
require 2^128 calculations
• But we don't need to store them: we just
compare each one with the target
• How hard is that?
72 Bit Brute-Force
• RSA Security made several challenges to test
brute-force attacks
• Distributed.net has been attacking these
challenges since 1997
• Cracked a 56-bit key in 250 days in 1997
• Cracked a 64-bit key in 5 years in 2002
• 72-bit key attack is in progress
• So far they've tested 5.6% of the keyspace
• It will take 127 years to test all keys
• Link Ch 6g
MD5 Preimage Attack
Safety Margin
• Max. computing available to any attacker is 2^96
calculations
• Preimage attack requires 2^128 calculations
• Safety margin is a factor of 2^128 / 2^96 = 2^32
• 2^10 is 1024
• 2^30 is 1024 x 1024 x 1024 = about 1 billion
• So 2^32 is about 4 billion
Naive Birthday Attack on
MD5
• Take 2^64 inputs, such as counting numbers
from 1 to 2^64
• Calculate the hashes, requiring 2^64
calculations
• Store them, using 2^64 units of RAM
• What's the safety margin?
Requirements for Naive
Birthday Attack on MD5
• 2^64 calculations is not impossible
• But 2^64 units of RAM?
• 1 GB = 10^9 bytes, about 2^30
• 1 TB is about 2^40
• 1 million TB is about 2^60
• 16 million TB is about 2^64 = 16,000 PB
• (1 PB (petabyte) = 1000 TB = 1 million GB)
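The storage arithmetic can be checked in a few lines. This sketch counts one byte per "unit" as the estimate above does, and also shows the larger figure if each stored unit is a full 16-byte MD5 digest (that second figure is an added assumption, not part of the slide's estimate).

```python
units = 2**64

# Slide-style estimate: treat 2^10 as 1000, so 2^64 = 16 * (2^10)^6 ~ 16 * 1000^6 bytes
slide_estimate_pb = 16 * 1000**6 / 10**15

# Exact value of 2^64 bytes in decimal petabytes
exact_pb = units / 10**15

# If every stored unit is a full 16-byte MD5 digest
md5_digest_bytes = 16
full_digest_pb = units * md5_digest_bytes / 10**15

print(slide_estimate_pb, round(exact_pb), round(full_digest_pb))
```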
Worldwide Data Center Storage
Capacity
MD5 Naive Birthday Attack
Safety Margin
• Attack requires 16,000 PB
• Whole world has less than 3,000 PB
• Safety margin is a factor of 5
• Not very large
Preimage Attack
• MD5 is 128 bits long
• Calculating every possible hash (nearly) would
require 2^128 calculations
• Storing them would take 2^128 units of RAM
• What's the safety margin?
Low-Memory Collision
Search: The Rho Method
• Don't just choose sequential inputs to hash
• Calculate the hash of the previous hash
• Start with any random seed s
• H1 = hash(s)
• H2 = hash(H1)
• H3 = hash(H2)
• etc...
Rho Method
• Tail and circle are approximately 2^(N/2) long
• Since the hashes are going in a circle
• You don't need to store all the hash values in
memory to detect a collision
• You store only hashes starting with many
zeroes ("Distinguished Points")
• This requires more calculations but less
storage -- time-memory trade-off
• Link Ch 6k
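A low-memory collision search can be sketched with Floyd's cycle-finding ("tortoise and hare"), which is a simpler stand-in for the distinguished-points technique described above: it stores only two hash values at a time. The 24-bit toy hash `f` (truncated SHA-256) and the seed are assumptions for illustration.

```python
import hashlib

def f(x: bytes, nbytes: int = 3) -> bytes:
    """Toy 24-bit hash: first 3 bytes of SHA-256 (illustration only)."""
    return hashlib.sha256(x).digest()[:nbytes]

def rho_collision(seed: bytes):
    """Find x != y with f(x) == f(y), keeping only two values in memory."""
    # Phase 1: the hare moves twice as fast; they meet somewhere on the circle
    tortoise, hare = f(seed), f(f(seed))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
    # Phase 2: restart the tortoise at the seed and step both at the same speed.
    # They converge where the tail joins the circle; the two values just
    # before that point are different inputs with the same hash.
    tortoise = seed
    while f(tortoise) != f(hare):
        tortoise = f(tortoise)
        hare = f(hare)
    if tortoise == hare:
        return None                    # seed was already on the circle
    return tortoise, hare

x, y = rho_collision(b"seed")          # 4-byte seed, so it cannot lie on the circle
print(x.hex(), y.hex(), "->", f(x).hex())
```

The price of the small memory footprint is extra hashing: each loop iteration computes up to three hashes instead of one.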
Rho Method for 16-bit
Hash
Building Hash Functions
Iterative Hashing
• A long message is broken into blocks
• Each block is processed consecutively using
compression or permutation
Compression-Based Hash Functions:
the Merkle-Damgard Construction
• Used in MD4, MD5, SHA-1, and SHA-2
• Also RIPEMD and Whirlpool
• H0 is an initial value
• H1, H2, ... are chaining values
• The final H is the output
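The chaining structure can be sketched as a loop. Everything here is a toy: the 16-byte block size, the 8-byte chaining value, and the `compress` stand-in (truncated SHA-256 of chaining value plus block) are assumptions chosen to keep the example short, not the internals of any real hash.

```python
import hashlib

BLOCK_SIZE = 16          # toy block size in bytes
H0 = bytes(8)            # toy initial value: eight zero bytes

def compress(chaining: bytes, block: bytes) -> bytes:
    """Stand-in compression function (illustration only)."""
    return hashlib.sha256(chaining + block).digest()[:8]

def md_pad(message: bytes) -> bytes:
    """Append a 1 bit, zero bits, then the 64-bit message length in bits."""
    zeros = (-(len(message) + 1 + 8)) % BLOCK_SIZE
    return message + b"\x80" + bytes(zeros) + (8 * len(message)).to_bytes(8, "big")

def toy_md_hash(message: bytes) -> bytes:
    h = H0                                        # H0
    padded = md_pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):   # H1, H2, ... chaining values
        h = compress(h, padded[i:i + BLOCK_SIZE])
    return h                                      # the final H is the output

print(toy_md_hash(b"a message longer than one block").hex())
```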
Padding Blocks
• Pad with 1, then zeroes, finally the message
length
• SHA-256 uses 512-bit blocks
• To hash the eight-bit message 10101010: append a 1 bit,
then 439 zero bits, then the 64-bit length (8), filling
one 512-bit block
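The padding rule for 512-bit blocks can be written out for byte-aligned messages. This is a sketch of the padding step only (`sha256_pad` is a hypothetical helper name), applied to the eight-bit message 10101010.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a byte string to a multiple of 64 bytes (512 bits)."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"                   # a 1 bit followed by seven 0 bits
    padded += bytes((-(len(padded) + 8)) % 64)   # zero bytes up to the length field
    padded += bit_len.to_bytes(8, "big")         # 64-bit big-endian message length
    return padded

block = sha256_pad(bytes([0b10101010]))
print(len(block) * 8, "bits:", block.hex())
```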
Building Compression Functions:
The Davies-Meyer Construction
• Uses a block cipher to build a compression
function
• Use message blocks as keys
• The XOR feedback makes it secure against
decryption
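A sketch of the Davies-Meyer idea: the message block is the cipher key, the chaining value is the plaintext, and the result is XORed back with the chaining value. The cipher here is a small XTEA-style function written for illustration; it has not been checked against official test vectors and is only a placeholder for a real block cipher.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9

def toy_encrypt(key_words, v0, v1, rounds=32):
    """XTEA-style toy cipher: four 32-bit key words, 64-bit block (v0, v1)."""
    total = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key_words[total & 3]))) & MASK32
        total = (total + DELTA) & MASK32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key_words[(total >> 11) & 3]))) & MASK32
    return v0, v1

def davies_meyer(chaining: bytes, block: bytes) -> bytes:
    """Compress an 8-byte chaining value with a 16-byte message block."""
    key_words = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 16, 4)]
    h0 = int.from_bytes(chaining[:4], "big")
    h1 = int.from_bytes(chaining[4:8], "big")
    e0, e1 = toy_encrypt(key_words, h0, h1)      # message block used as the key
    # XOR feedback: without it, anyone could decrypt with the known block
    return (e0 ^ h0).to_bytes(4, "big") + (e1 ^ h1).to_bytes(4, "big")

out = davies_meyer(bytes(8), b"sixteen byte msg")
print(out.hex())
```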
Other Compression
Functions
• Less popular because
• They are more complex, or
• Require message block to be same length
as chaining value
Permutation-Based Hash
Functions: Sponge Functions
• Simpler than using a cipher
• No key
• Keccak is the most famous sponge function
• Also known as SHA-3
The SHA Family
MD5
• Broken in 2005
• Fast attacks now to generate MD5 collisions
• It still cannot be reversed in general
SHA-0
• Approved by NIST in 1993
• Replaced in 1995 by SHA-1
• To fix an unidentified security issue
• 160 bits long, so it should take 2^80 calculations
to find a collision
• In 1998 an attack was found requiring only
2^60 calculations to find a SHA-0 collision
• Later attacks work in 2^33 calculations--less
than an hour of computing (for SHA-0)
SHA-1
• Uses an encryption function called SHACAL
• Repeats this calculation for every 512-bit block
in the message M
Compression Function
• 160-bit chaining value broken into 5 32-bit
words a b c d e
Attacks on SHA-1
• In 2005 an attack was found that would find a
collision in 2^63 calculations instead of 2^80
• In 2017 a SHA-1 collision was found
• No longer trusted
SHA-2
• Four versions
• SHA-224, SHA-256, SHA-384, SHA-512
• SHA-256 uses
• Eight 32-bit words
• 64 rounds of calculation with a more
complex expand function
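The four SHA-2 versions are available in Python's standard `hashlib`. This sketch just lists their digest and block sizes; the input `b"abc"` is an arbitrary sample.

```python
import hashlib

names = ("sha224", "sha256", "sha384", "sha512")

sizes = {}
for name in names:
    h = hashlib.new(name, b"abc")
    sizes[name] = h.digest_size * 8              # digest length in bits
    print(name, "digest:", h.digest_size * 8, "bits,",
          "block:", h.block_size * 8, "bits")
```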
Security of SHA-2
• All four SHA-2 algorithms are still considered
strong
• But researchers and NIST grew concerned
because its algorithm is close to SHA-1's
SHA-3 Finalists
• BLAKE
• Grøstl
• JH
• Keccak
• Skein
Keccak (SHA-3)
• Completely different from SHA-1 and SHA-2
• Sponge function with a 1600-bit state
• Includes four hashes:
• SHA3-224, SHA3-256, SHA3-384, SHA3-512
• And two extendable-output functions (XOFs)
• Can produce hashes of any length
• Called SHAKE128 and SHAKE256
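Both the fixed-length SHA-3 hashes and the SHAKE functions are in Python's standard `hashlib`. A short sketch, with an arbitrary sample message, showing that a SHAKE function takes the output length as an argument:

```python
import hashlib

msg = b"CNIT 141"

fixed = hashlib.sha3_256(msg).digest()           # always 32 bytes

# SHAKE128: the caller chooses how many output bytes to request
short = hashlib.shake_128(msg).digest(16)
long = hashlib.shake_128(msg).digest(64)

print(len(fixed), len(short), len(long))
print(long[:16] == short)                        # shorter output is a prefix of the longer
```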
BLAKE2
Speed
• BLAKE2 is as secure as SHA-3
• But much faster to compute
• Faster than MD5 or SHA-1
• Two functions
• BLAKE2 (also called BLAKE2b) optimized for
64-bit platforms, produces digests from 1 to 64
bytes
• BLAKE2s optimized for 8-bit to 32-bit
platforms, produces digests from 1 to 32 bytes
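Python's standard `hashlib` includes both BLAKE2 functions. This sketch, with an arbitrary sample message, shows the selectable digest sizes:

```python
import hashlib

msg = b"CNIT 141"

b_full = hashlib.blake2b(msg).digest()                    # default: 64 bytes
b_short = hashlib.blake2b(msg, digest_size=20).digest()   # any size from 1 to 64
s_full = hashlib.blake2s(msg).digest()                    # default: 32 bytes

print(len(b_full), len(b_short), len(s_full))
print(hashlib.blake2b.MAX_DIGEST_SIZE, hashlib.blake2s.MAX_DIGEST_SIZE)
```

The requested digest size is a parameter of the hash itself, so a 20-byte BLAKE2b digest is a different value from the first 20 bytes of the 64-byte digest.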
Usage
• BLAKE2 is the fastest secure hash available
today
• The most popular non-NIST-standard hash
• Used in many apps and in major libraries such
as OpenSSL and Sodium
BLAKE's Compression
Function
• Parameters: a counter and a flag
• Block cipher is based on ChaCha
• Which is based on Salsa20
BLAKE2b's Core
Operations
• Transforms the state
of four 64-bit words
• a b c d
• Using two message
words
• Mi Mj
How Things Can Go
Wrong
Misuse
• Using weak checksum algorithms like CRC32
for file integrity
• Instead of a cryptographic hash algorithm
The Length-Extension
Attack
• If you know the hash H(M)
• You can add more data to the right and
calculate the hash of the longer message
• Without knowing the original message
Padding
• If the original message is properly padded, it's
actually M1 || M2
• M1 is message; M2 is padding
• || means concatenating text
• The extended message is M1 || M2 || M3
• Won't affect most applications
• But some applications do get fooled
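The attack can be demonstrated on a toy Merkle-Damgard hash. The hash below is an assumption made for illustration (16-byte blocks, 8-byte state, truncated SHA-256 as the compression function); the same steps apply to real hashes whose output is the full final chaining value.

```python
import hashlib

BLOCK = 16
IV = bytes(8)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function (illustration only)."""
    return hashlib.sha256(state + block).digest()[:8]

def pad(total_len: int) -> bytes:
    """Padding for a message of total_len bytes: 0x80, zeros, 64-bit bit length."""
    zeros = (-(total_len + 1 + 8)) % BLOCK
    return b"\x80" + bytes(zeros) + (8 * total_len).to_bytes(8, "big")

def md_hash(msg: bytes, state: bytes = IV, prior_len: int = 0) -> bytes:
    """Hash msg, optionally resuming from a state that already absorbed prior_len bytes."""
    data = msg + pad(prior_len + len(msg))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

# The victim hashes a secret M1; the attacker learns only H(M1) and len(M1)
m1 = b"secret message the attacker never sees"
known_hash, known_len = md_hash(m1), len(m1)

# Attacker: rebuild the padding M2, pick an extension M3, resume from H(M1)
m2 = pad(known_len)
m3 = b"&admin=true"
forged = md_hash(m3, state=known_hash, prior_len=known_len + len(m2))

# Check using the secret: the forged value is the real hash of M1 || M2 || M3
print(forged == md_hash(m1 + m2 + m3))
```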
SHA-2 Length Extension
• SHA-2 is vulnerable
• Could have easily been avoided by making
the last compression function different
• BLAKE2 does that
Fooling Proof-of-Storage
Protocols
• A cloud server verifies that it has stored a
message M
• Server can cheat by keeping only the chain
value from hashing the message
• And discarding the message
• Server can still calculate the response by
length extension
• Works for SHA-1, SHA-2, SHA-3, and BLAKE
• Cure: hash C||M instead of M||C
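The cheat can be sketched with the incremental interface in Python's `hashlib`, assuming the protocol asks for hash(M || C) with SHA-256 (the message and challenge values are made up). The saved hash object plays the role of the chain value.

```python
import hashlib

message = b"large file contents " * 1000
challenge = b"random-challenge-1234"

# Honest server: keeps M and hashes M || C on request
honest = hashlib.sha256(message + challenge).digest()

# Cheating server: absorbs M once, keeps only the internal state,
# and could now throw the message away
saved_state = hashlib.sha256(message)

# On each challenge, resume from a copy of the saved state
resumed = saved_state.copy()
resumed.update(challenge)
cheated = resumed.digest()

print(cheated == honest)
```

With C || M the challenge is absorbed first, so nothing useful can be computed before C is known and the server has to keep M.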