Congestion Control in TCP
Introduction
Many very important networking issues exist at the TCP layer:
For instance:
• How to reliable communicate over an unreliable channel
• How should a sender restrain itself: flow and congestion control
• How to agree on the action to take for lost, delayed, reordered
messages
• How to deal with large data: fragmentation and reassembly
• How to guarantee performance when resources shared
TCP segment
32 bits
URG: urgent data counting
(generally not used) source port # dest port #
by bytes
sequence number of data
ACK: ACK #
valid acknowledgement (not segments!)
U A Pnumber
head not
PSH: push data now len used
R S F Receive window
(generally not used) # bytes
checksum Urg data pnter
rcvr willing
RST, SYN, FIN: to accept
Options (variable length)
connection estab
(setup, teardown
commands)
application
Internet data
checksum (variable length)
(as in UDP)
Congestion Control:
• Congestion: informally: “too many sources sending too much
data too fast for network to handle”
• Different from flow control!
• Manifestations:
– Packet loss (buffer overflow at routers)
– Increased end-to-end delays (queuing in router buffers)
• Results in unfairness and poor utilization of network resources
– Resources used by dropped packets (before they were lost)
– Retransmissions
– Poor resource allocation at high load
Historical Perspective:
• October 1986, Internet had its first congestion collapse
• Link LBL to UC Berkeley
– 400 yards, 3 hops, 32 Kbps
– throughput dropped to 40 bps
– factor of ~1000 drop!
• Van Jacobson proposes TCP Congestion Control:
– Achieve high utilization
– Avoid congestion
– Share bandwidth
Congestion Control: Approaches
• Goal: Throttle senders as needed to ensure load on the
network is “reasonable”
• End-end congestion control:
– No explicit feedback from network
– Congestion inferred from end-system observed loss,
delay
– This is the approach taken by TCP
• Network-assisted congestion control:
– routers provide feedback to end systems
– single bit indicating congestion (e.g., ECN)
– explicit rate sender should send at
TCP Congestion Control: Overview
• End-end control (no help from the network)
• Limit the number of packets in the network to window W
• Roughly,
W
rate = Bytes/sec
RTT
• W is dynamic, function of perceived network congestion
TCP Congestion Controls
• Tahoe (Jacobson 1988)
– Slow Start
– Congestion Avoidance
– Fast Retransmit
• Reno (Jacobson 1990)
– Fast Recovery
• SACK
• Vegas (Brakmo & Peterson 1994)
– Delay and loss as indicators of congestion
Slow Start:
• “Slow Start” is used to reach the receiver
sender
equilibrium state cwnd
• Initially: W = 1 (slow start) 1 data
segment
• On each successful ACK:
ACK
WW+1 2
• Exponential growth of W
each RTT: W 2 x W
3
• Enter CA when 4
W >= ssthresh
• ssthresh: window size after which 5
6
TCP cautiously probes for bandwidth 7
8
Congestion Avoidance:
• Starts when sender receiver
W ssthresh 1 data
• On each successful ACK: segment
ACK
2
W W+ 1/W
• Linear growth of W each
RTT:
3
WW+1
4
CA: Additive Increase, Multiplicative Decrease
• “additive increase” in the absence of loss events
• After loss event, decrease congestion window by half –
“multiplicative decrease”
– ssthresh = W/2
– Enter Slow Start
Detecting Packet Loss:
• Assumption: loss indicates 1
0
1
congestion 1
1
2
X
1
3
1
4
1
• Option 1: time-out 5
1
6
1
– Waiting for a time-out can 1
0
1
7
be long! 1
1
1
1
1
1
• Option 2: duplicate ACKs 1
– How many? At least 3.
Sender Receiver
Fast Retransmit:
• Wait for a timeout is quite long
• Immediately retransmits after 3 dupACKs without waiting
for timeout
• Adjusts ssthresh
ssthresh W/2
• Enter Slow Start
W=1
How to Set the TCP Timeout Value?
• Longer than RTT
– but RTT varies
• Too short: premature timeout
– unnecessary retransmissions
• Too long: slow reaction to segment loss
How to Estimate RTT?
• SampleRTT: measured time from segment
transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary, want estimated RTT
“smoother”
– average several recent measurements, not just
current SampleRTT
TCP Round-Trip Time and Timeout:
EstimatedRTT = (1- )*EstimatedRTT + *SampleRTT
• EWMA
• influence of past sample decreases
exponentially fast
• typical value: = 0.125