What is Latency? How to get Low Latency?
December 26, 2022
Traditional TV broadcasting is progressively being replaced by multiscreen streaming, even for live content. But one frequent drawback is the additional latency induced by streaming.
Latency is the time it takes for the video to be distributed from its capture until it reaches end users. This means that at one specific moment, the images displayed on a screen are not the live action but images from the past captured several seconds before.
Streaming latency is generally between 30 and 40 seconds, versus around 4 seconds for traditional broadcast. Live events such as football games can therefore be spoiled when viewers get notifications from friends, or hear loud neighbors celebrating a goal, 30 seconds before they can see it themselves.
Latency comes from 3 main stages in the streaming chain: encoding, packaging and delivery.
Encoding and packaging account for about 3 seconds each. Delivery typically adds up to 25 seconds, a margin sized for the worst-case delivery speed on highly variable HTTP networks. The latency at each stage is a trade-off accepted in order to improve some other aspect of the user experience.
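As a rough illustration of the figures quoted above, the short Python sketch below adds up the per-stage values and shows each stage's share of the total. The dictionary and function names are illustrative only, and the numbers are the approximate ones from the text rather than measurements.

```python
# Approximate per-stage latency in seconds, as quoted in the text.
# Delivery uses the "up to 25 seconds" upper figure.
STAGE_LATENCY_S = {"encoding": 3.0, "packaging": 3.0, "delivery": 25.0}


def total_latency(stages):
    """Sum the latency contributed by every stage, in seconds."""
    return sum(stages.values())


def latency_shares(stages):
    """Return each stage's fraction of the total latency."""
    total = total_latency(stages)
    return {name: value / total for name, value in stages.items()}


if __name__ == "__main__":
    print(f"total: {total_latency(STAGE_LATENCY_S):.0f} s")
    for name, share in latency_shares(STAGE_LATENCY_S).items():
        print(f"{name}: {share:.0%}")
```

With these figures the total lands at the low end of the 30 to 40 second range, and delivery accounts for most of it.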
The goal of low-latency streaming is to reduce the number of seconds of delay and match the broadcast standard while preserving these other aspects of the user experience.
Reducing latency at the encoding stage is not usually considered the right approach. The less time the encoder is given to do its job, the less efficient it is at video compression, so cutting latency at this stage means that a higher video bitrate is needed for a given quality. With each codec generation, such as H.264 and then H.265, the trend is toward higher encoding efficiency, which tends to increase latency.
Moreover, as streaming shares similar encoding principles with broadcast, generally favoring compression efficiency, it already achieves a similar latency to broadcast at the encoding stage.
Delivery, on the other hand, generally offers a great opportunity for optimization, with more to gain and little to no trade-off on quality.
Streaming relies on HTTP networks which, by nature, deliver content at a variable speed, whereas live video is consumed at a fixed pace by the player. To compensate and make sure that the player is constantly fed, the player is provided with a buffer that withholds content for a configurable amount of time. The network latency is equal to this buffer time. The shorter it is, the smaller the latency.
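The relationship between delivery jitter and buffer size can be sketched with a simplified model. The Python example below assumes that the player starts playback a fixed buffer time after the first segment arrives, then consumes one segment per segment duration without ever shifting its schedule. The function names and the arrival times are hypothetical, chosen only to illustrate the idea.

```python
def min_safe_buffer(arrival_times, segment_duration):
    """Smallest start-up buffer (seconds) for which no segment arrives late.

    arrival_times[i] is the time at which segment i is fully available
    to the player; segment i is due i * segment_duration after playback starts.
    """
    first = arrival_times[0]
    lateness = [
        t - first - i * segment_duration
        for i, t in enumerate(arrival_times)
    ]
    return max(0.0, max(lateness))


def late_segments(arrival_times, segment_duration, buffer_s):
    """Count segments that miss their playback deadline for a given buffer."""
    start = arrival_times[0] + buffer_s
    return sum(
        1
        for i, t in enumerate(arrival_times)
        if t > start + i * segment_duration
    )


if __name__ == "__main__":
    # Hypothetical arrival times for 2-second segments; the fourth one is delayed.
    arrivals = [0.0, 2.5, 4.2, 9.0, 8.5]
    print(min_safe_buffer(arrivals, 2.0))     # buffer needed to absorb the jitter
    print(late_segments(arrivals, 2.0, 1.0))  # a 1-second buffer is too short
```

In this model a single badly delayed segment dictates the buffer size, which is why steadier delivery allows a shorter buffer.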
The best way to reduce latency is therefore to optimize the CDN and create conditions in which the buffer can be safely shortened, without the risk that it ever empties and the player stalls. One particularly efficient solution is to distribute the video segments through multicast ABR, a technology that enables operators to send a single copy of live content to all users at once, using a very small amount of reserved bandwidth. Multicast ABR ensures secure and steady delivery, with almost no constraint on the buffer.
Optimizing the CDN reduces the latency delta between broadcast and streaming, typically down to around 5 seconds. It is possible to lower it even further by addressing another challenge: the segmented nature of streaming, which affects both the packaging and delivery stages.
A video stream is made of a succession of segments, each representing 2 to 6 seconds of content. Each segment has to be fully downloaded at every step of the delivery chain before it can be made available downstream, so a significant latency has accumulated by the time it reaches the player.
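The accumulation effect can be approximated with a simple store-and-forward model. The sketch below assumes that a unit of video must first be fully produced (which takes its own duration for live content) and is then transferred in full across each hop before moving on. The `hops` and `speed_ratio` parameters are hypothetical simplifications, not values taken from a real deployment.

```python
def store_and_forward_delay(unit_duration_s, hops, speed_ratio=1.0):
    """Delay (seconds) before a unit reaches the end of the chain.

    unit_duration_s: seconds of video that must be complete before forwarding
    hops: number of full transfers between the packager and the player
    speed_ratio: link throughput divided by video bitrate (assumed constant)
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    production = unit_duration_s              # live content is produced in real time
    per_hop = unit_duration_s / speed_ratio   # time to download the whole unit once
    return production + hops * per_hop


if __name__ == "__main__":
    # Compare the 2-second and 6-second segment sizes mentioned in the text,
    # assuming 3 hops and links four times faster than the video bitrate.
    for duration in (2.0, 6.0):
        print(duration, store_and_forward_delay(duration, hops=3, speed_ratio=4.0))
```

Under these assumptions the delay grows in direct proportion to the size of the unit that has to be completed at each step.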
One efficient way around this, available with both the HLS and DASH protocols, is to combine the CMAF media format with HTTP chunked transfer encoding (CTE). Both rely on the same principle of subdividing segments into smaller chunks. These chunks are downloaded and forwarded faster, in a continuous flow, which nearly cancels the download delay and allows the player buffer to be reduced further. However, this reduction is only possible once CDN delivery has been secured; otherwise it causes frequent player rebuffering. CMAF and CTE are therefore relevant only as a second step, after CDN optimization.
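To make the chunking principle concrete, here is a minimal sketch of how HTTP/1.1 chunked transfer encoding frames a body on the wire: each chunk is sent as its size in hexadecimal, a CRLF, the data, and another CRLF, and a zero-size chunk ends the body. The payloads below are placeholder bytes, not real CMAF chunks, and the decoder ignores trailers for brevity.

```python
def encode_chunked(chunks):
    """Yield each chunk framed for HTTP/1.1 chunked transfer encoding."""
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would signal the end of the body
        yield f"{len(chunk):x}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk marker followed by an empty trailer section


def decode_chunked(data):
    """Reassemble the body from a complete chunked payload (no trailers)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Ignore any chunk extensions after ';' on the size line.
        size = int(data[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


if __name__ == "__main__":
    # Placeholder payloads standing in for the pieces of one media segment.
    pieces = [b"moof", b"mdat-data"]
    wire = b"".join(encode_chunked(pieces))
    print(wire)
    print(decode_chunked(wire))
```

Because each framed chunk is complete on its own, a sender can forward it as soon as it is available instead of waiting for the whole segment.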
Ultimately, by combining an optimized CDN with CMAF and CTE, video streaming can reach a latency similar to broadcast, with a delta of only about 1 second. This solution enables a full and flawless streaming experience for end users, with no notable regression in latency, even when viewers are watching their favorite live events.