As video streaming over IP continues to grow as an application, the conversation typically turns to bandwidth and compression. The general expectation when streaming outside of a LAN is a 1Gb/s pipe. Even full HD (1920 x 1080), with only a quarter of the pixels of 4K, requires over 2Gb/s to run without compression. 4K60 4:2:2 requires 10.6Gb/s, and 4K60 4:4:4 needs 12.7Gb/s to run without compression. While certainly possible with a dedicated network and the budget for the switches to do it, that isn't practical for multi-room applications or for streaming outside of a single LAN.
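A quick back-of-the-envelope sketch shows where figures of this magnitude come from. The function below counts pixel payload only and assumes 10-bit samples; published link rates also include blanking and interface overhead, so they run somewhat higher than these results.

```python
# Uncompressed video bitrate estimator (pixel payload only; assumes
# 10-bit samples and ignores blanking/interface overhead).
def raw_bitrate_gbps(width, height, fps, bits_per_sample, samples_per_pixel):
    return width * height * fps * bits_per_sample * samples_per_pixel / 1e9

# 4:2:2 averages 2 samples per pixel (Y every pixel, Cb/Cr shared);
# 4:4:4 carries all 3 samples for every pixel.
hd      = raw_bitrate_gbps(1920, 1080, 60, 10, 2)
uhd_422 = raw_bitrate_gbps(3840, 2160, 60, 10, 2)
uhd_444 = raw_bitrate_gbps(3840, 2160, 60, 10, 3)
print(f"1080p60 4:2:2 ~ {hd:.1f} Gb/s")       # ~ 2.5 Gb/s
print(f"4K60 4:2:2    ~ {uhd_422:.1f} Gb/s")  # ~ 10.0 Gb/s
print(f"4K60 4:4:4    ~ {uhd_444:.1f} Gb/s")  # ~ 14.9 Gb/s
```

Every result lands far above a 1Gb/s link, which is the gap compression has to close.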
Video compression is the answer to getting the stream to run on lower-bandwidth networks, but that raises questions about compression algorithms. We will explore H.264, H.265, JPEG2000, and M-JPEG in order to provide a practical understanding of each of them.
The first distinction is between lossless and lossy compression. Lossless compression preserves all of the original data while representing it with fewer bits. Lossy compression goes further, discarding some of the data outright to shrink the stream even more. Lossy compression may sound bad right off the bat, but it has its uses and allows for very small data streams with low latency.
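A minimal sketch of the distinction, using Python's zlib as a stand-in codec. The "lossy" step here is purely illustrative quantization, not any real video codec, but it shows the core trade: the quantized stream is smaller, and the original values are gone for good.

```python
import random
import zlib

# Deterministic pseudo-noise as a stand-in for raw pixel data.
random.seed(0)
frame = bytes(random.randrange(256) for _ in range(20000))

# Lossless: zlib round-trips the original bytes exactly.
packed = zlib.compress(frame)
assert zlib.decompress(packed) == frame   # every byte recovered

# "Lossy" (illustrative only): drop the low 4 bits of each sample
# before compressing. Fewer distinct values means a smaller stream,
# but the discarded bits can never be recovered.
quantized = bytes(b & 0xF0 for b in frame)
lossy_packed = zlib.compress(quantized)
print(len(packed), len(lossy_packed))     # lossy stream is smaller
```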
JPEG2000 and M-JPEG compress each original frame independently. The frame is encoded on one side and reconstructed on the receiving side — exactly, in lossless modes, or very closely, in lossy modes — without reference to any other frame. This is called intra-frame compression and is free of temporal artifacts. It is a relatively fast approach and tends to come with lower latencies, but it can't achieve the higher compression ratios that inter-frame algorithms can. The original focus of the JPEG algorithms was the transmission of high-resolution still images over email or on the Internet.
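The per-frame independence can be sketched like this, again with zlib standing in for the DCT or wavelet coding that M-JPEG and JPEG2000 actually use. The point is structural: any encoded frame decodes on its own, with no neighbors needed.

```python
import zlib

# Intra-frame sketch: each frame is compressed on its own, so the
# receiver can decode any frame without its neighbors -- no temporal
# artifacts, but no chance to exploit frame-to-frame similarity either.
frames = [bytes([i]) * 4096 for i in range(5)]   # toy "video"
encoded = [zlib.compress(f) for f in frames]

# Decode frame 3 in isolation -- no other frame is required.
frame3 = zlib.decompress(encoded[3])
assert frame3 == frames[3]
```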
H.264 and H.265 are inter-frame compression algorithms that use predictive frames between the actual original image frames. These algorithms deconstruct the original frame and then construct predictive frames based on a starting point and an ending point within the original frames. Depending on how much motion there is in the images, this can go mostly undetected, or it can produce visible temporal artifacts such as jerking, floating, or flickering. In use cases such as streaming internet video, this can result in very high compression ratios, but the algorithm takes time to process and comes with higher latencies than intra-frame techniques.
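A toy sketch of the inter-frame idea, with a simple per-sample difference standing in for the motion-compensated prediction real codecs perform. The first frame is sent whole (an "I-frame"); each later frame sends only its residual against the previous one (a "P-frame"). The decode-side dependency chain is exactly where the extra latency comes from.

```python
import random
import zlib

# A slowly changing scene: each frame is the previous one shifted by 1.
random.seed(1)
base = [random.randrange(256) for _ in range(4096)]
frames = [bytes((v + t) % 256 for v in base) for t in range(5)]

# Encode: I-frame whole, then P-frames as residuals only.
encoded = [zlib.compress(frames[0])]
for prev, cur in zip(frames, frames[1:]):
    delta = bytes((c - p) % 256 for p, c in zip(prev, cur))
    encoded.append(zlib.compress(delta))

# Decode: every P-frame needs the reconstructed previous frame first.
recon = zlib.decompress(encoded[0])
for e in encoded[1:]:
    delta = zlib.decompress(e)
    recon = bytes((p + d) % 256 for p, d in zip(recon, delta))
assert recon == frames[-1]
print(len(encoded[0]), len(encoded[1]))  # P-frame is far smaller
```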
There is no single answer to which compression scheme is the "best," in part because hardware manufacturers can apply each algorithm in slightly different ways. The question revolves around how much tolerance there is for latency and visible artifacts. When evaluating AV-over-IP solutions, note that greater compression becomes crucial compared to an in-room application: for multi-room distribution, the network bandwidth is more variable and likely much narrower. The key is to evaluate the hardware you are considering and investigate the compression techniques being used. Counterbalance this information with the bandwidth you need and consider the compression ratio specifications provided by the manufacturer. Finally, decide which is most important: the size of the data, or the preservation of the image.
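That bandwidth-versus-ratio comparison reduces to a quick feasibility check. The numbers below are hypothetical examples, not vendor specs, and the 20% headroom figure is an assumption chosen for illustration.

```python
# Feasibility check: does a stream fit the pipe once the manufacturer's
# quoted compression ratio is applied? Headroom (assumed 20% here)
# leaves room for control traffic and additional streams.
def fits(raw_gbps, compression_ratio, link_gbps, headroom=0.8):
    return raw_gbps / compression_ratio <= link_gbps * headroom

print(fits(10.6, 2, 1))    # 4K60 4:2:2 at a light 2:1 on 1 Gb/s -> False
print(fits(10.6, 20, 1))   # the same stream at 20:1 -> True
```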
AV Technology magazine's technical advisor, Justin O'Connor, has spent nearly 20 years as a product manager, bringing many hit products to the professional AV industry. He earned his Bachelor's degree in Music Engineering Technology from the Frost School of Music at The University of Miami. Follow him at @JOCAudioPro.