After surviving for over four decades, TCP appears to be facing the choice sometimes given to an unreliable employee: “change or you’re outta here.” Two transport protocols carry virtually all Internet and IP network exchanges: TCP and UDP. The Transmission Control Protocol carries 80-90% of IP traffic in all networks; the User Datagram Protocol carries the rest. However, the very fact that TCP was designed to work over all sorts of Internet connections has turned out to be the source of its current problems.
TCP was proposed and adopted in 1981 with the purpose of delivering traffic reliably and as fast as possible. Doing so required a significant degree of overhead: it had to formally exchange connection parameters with the partner device, sequence packets, indicate how much data it could receive, and, most importantly, sense when congestion had developed. This congestion control mechanism has been the focus of intense study for several decades. Since its conception, TCP has watched for indications that packets are being dropped as a sign of network congestion. By default, the network carries as much traffic as it can until buffers fill; then, packets arriving at a full buffer are simply dropped.
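That “fill up, then drop” behavior is known as tail drop. A minimal sketch (the class and numbers here are illustrative, not any real router's implementation) shows the idea:

```python
# Illustrative tail-drop buffer: packets queue until capacity is
# reached, after which new arrivals are simply discarded.
from collections import deque

class TailDropBuffer:
    def __init__(self, capacity):
        self.capacity = capacity   # max packets the buffer can hold
        self.queue = deque()
        self.dropped = 0           # count of discarded arrivals

    def enqueue(self, packet):
        if len(self.queue) >= self.capacity:
            self.dropped += 1      # buffer full: tail drop
            return False
        self.queue.append(packet)
        return True

buf = TailDropBuffer(capacity=3)
results = [buf.enqueue(p) for p in range(5)]
print(results, buf.dropped)   # [True, True, True, False, False] 2
```

Those dropped packets are the only congestion signal a traditional TCP sender ever receives.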
To understand why TCP appears to need changing, I’ll explain a few ideas about how it works. Let’s say a server has been asked to transfer a “chunk” of adaptive bit rate (ABR) video. Such a chunk often represents about 10 seconds of video and amounts to roughly 30-50 Mbits of data. The sending TCP initially enters a phase called slow start. In this phase, it sends a block consisting of a few packets and awaits acknowledgement. Each time the entire block is acknowledged, the sender doubles the size of the outgoing block and sends twice as many packets. At some point, the sending rate becomes too high for the network to absorb and a packet is dropped. The receiver detects this when subsequent packets arrive out of order, and it signals the condition by sending a duplicate acknowledgement (dup ack) of the last packet that was received in order. When the sender gets this dup ack, it concludes that congestion has developed and cuts its sending rate, expecting that this will relieve the congestion.
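The doubling-until-loss behavior described above can be sketched in a few lines. This is not a real TCP stack — the window sizes and the fixed “network limit” are illustrative assumptions — but it shows how slow start inevitably overshoots and then backs off:

```python
# Illustrative sketch of slow start: the congestion window doubles
# each round trip until a drop (signaled by a dup ack) is detected,
# at which point the sender cuts its rate.
def slow_start(initial_window, network_limit):
    """Return the window sent each round until a loss is detected,
    plus the reduced window after the sender reacts to the dup ack."""
    window = initial_window
    history = []
    while True:
        history.append(window)
        if window > network_limit:   # network can't absorb this rate:
            window = window // 2     # a packet drops; sender cuts back
            break
        window *= 2                  # whole block ACKed: double it
    return history, window

history, cwnd_after_loss = slow_start(initial_window=4, network_limit=50)
print(history)            # [4, 8, 16, 32, 64] -- 64 overshoots the limit
print(cwnd_after_loss)    # 32
```

The key point for what follows: the sender learns the network’s capacity only by exceeding it, which means some buffer somewhere had to fill and drop a packet first.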
Since TCP’s inception, every version of TCP has used a mechanism like this; only the details of the method have varied among the four or five popular versions of the protocol. However, around 2010, a researcher at Bell Labs named Jim Gettys caught the world’s attention when he discovered that the mechanism was introducing very high levels of latency in TCP transfers. These latencies could reach 10 seconds or more. Under such conditions, web pages load very slowly, videos pause, and even the Google search page comes up as if it had been asleep. This started a period of intense study by researchers from the Internet community, Microsoft, Cisco, Google, Comcast, and others. How could this latency be reduced while at the same time preserving congestion notification?
A few years ago, these researchers agreed that the problem was due to oversized buffers in the Internet and suggested active management of the buffer that was creating the choke point. Google, however, is proposing a radically different approach: change TCP itself. Its idea is referred to as TCP BBR (Bottleneck Bandwidth and Round-trip propagation time). In this approach, the bandwidth the network can safely handle at the bottleneck is based on an estimate of the delivery rate, not on loss in the network. The data Google has made available seems to indicate that with TCP BBR, the client and server can get about the same throughput but with latencies reduced by 80% or more.
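BBR’s core measurement can be sketched simply. The idea, in hedged form (function names, sample values, and units below are my own illustration, not Google’s code), is to track how many bytes are actually delivered per unit of time as acknowledgements come back, and use the best observed rate as the bottleneck-bandwidth estimate — no packet loss required:

```python
# Illustrative sketch of delivery-rate estimation, the measurement at
# the heart of BBR: rate = bytes newly ACKed / elapsed time, with the
# maximum observed rate taken as the bottleneck-bandwidth estimate.
def delivery_rate(ack_samples):
    """ack_samples: list of (timestamp_sec, total_bytes_delivered).
    Returns the highest delivery rate observed, in bytes/sec."""
    best = 0.0
    for (t0, d0), (t1, d1) in zip(ack_samples, ack_samples[1:]):
        if t1 > t0:
            best = max(best, (d1 - d0) / (t1 - t0))
    return best

# Hypothetical ACK trace: cumulative bytes delivered every 100 ms.
samples = [(0.0, 0), (0.1, 125_000), (0.2, 240_000), (0.3, 365_000)]
print(delivery_rate(samples))   # ~1.25e6 bytes/sec, i.e. about 10 Mbit/s
```

A sender pacing itself to that measured rate never needs to fill the bottleneck buffer to discover the limit — which is how BBR aims to keep throughput high while avoiding the queueing latency that loss-based TCP creates.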
Now, this new approach may seem like a minor change, but almost everyone is asking the same critical question: “Do I need to change my TCP stack?” What does this mean for clients that have the traditional version of TCP installed? Fortunately, TCP BBR is compatible with previous versions. If its major proponent, Google, is correct, devices using TCP BBR will benefit more than devices using the traditional version of TCP. What is very evident is that a considerable amount of energy is now being expended researching whether the Google method is really as good as claimed.
So, those of us who rely on ABR video and file transfers to video storage need to keep on top of whether this change happens. Both of these critical functions depend heavily on TCP’s operating characteristics.
Phil Hippensteel, PhD, is a regular contributor to AV Technology.