Diving into ultra-low latency for live streaming using MPEG-DASH
Low latency with HTTP streaming (segment-based) technologies is a challenge. MPEG-DASH in particular is gaining adoption among international industry consortia such as DVB with DVB DASH or ETSI with HbbTV 2.0 (not yet published), with a growing focus on live use cases. For such applications, latency is a concern.
In this article, we will show how the GPAC team studied the overhead of very-low-latency HD streaming with MPEG-DASH and demonstrated that the overhead on the transport side is negligible, on the order of 1%. At this 1% overhead, we could demonstrate a latency of 240 ms. With such a low latency, interactive or bidirectional applications such as video conferencing or live streaming with voting become possible.
Of course, such a low latency can only be safely reproduced under local network conditions. Yet it shows that the latency is not due to the MPEG-DASH technology itself but rather to the network conditions. It also shows that a few technical choices can dramatically reduce the latency.
All the tools used for this demonstration are available as free software. Feel free to try them out and contact us if you have any questions.
Sources of latency
Here is a list of latency sources as perceived at the client side:
- The video encoding: traditional encoding relies on a pattern with Random Access Points (RAP). While this pattern is very bitrate-efficient, it creates a variable latency ranging from 0 frames (if the first retrieved frame is a RAP) to the length of the GoP (if the first retrieved frame just follows a RAP); see the sketch after this list.
- The segment caching and buffering: most players buffer many segments (or seconds of data, depending on the buffering policy). For example, the Apple HLS implementation adds at least 10 seconds of buffering at this stage.
- MPEG-DASH parameters: most parameters are not tuned at all. The presence of intermediary entities such as CDNs, which do not yet rewrite MPEG-DASH manifests to reflect the latency added by caching servers, forces content generators to add latency. This has another unwanted consequence: some players, aware that the signalled buffering is probably excessive, act aggressively by retrieving segments in advance. This can lead to starvation and an unpleasant experience for the viewer.
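To give a feel for how these sources add up, here is a minimal Python sketch of a client-side latency budget. The frame rate, GoP length, segment duration and buffer depth below are illustrative assumptions, not values taken from the article:

```python
# Hypothetical latency-budget sketch (not part of the GPAC tools):
# estimates the worst-case client-side latency from the sources above.

FRAME_RATE = 25.0          # frames per second (assumed)
GOP_LENGTH = 50            # frames between Random Access Points (assumed)
SEGMENT_DURATION = 2.0     # seconds per DASH segment (assumed)
BUFFERED_SEGMENTS = 3      # segments kept in the player buffer (assumed)

# 1. Encoding: tuning in just after a RAP means waiting up to a full GoP
#    before the first decodable picture.
worst_case_rap_wait = GOP_LENGTH / FRAME_RATE            # seconds

# 2. Buffering: most players hold several full segments before playback.
buffering_delay = BUFFERED_SEGMENTS * SEGMENT_DURATION   # seconds

print(f"Worst-case RAP wait   : {worst_case_rap_wait:.2f} s")
print(f"Player buffering      : {buffering_delay:.2f} s")
print(f"Total (excl. network) : {worst_case_rap_wait + buffering_delay:.2f} s")
```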
Solutions
The article takes advantage of several mechanisms to address the concerns exposed above:
- The video encoding: H.264 Gradual Decoding Refresh (GDR). GDR spreads the intra-refresh process over a constant number of frames. Put simply: when you start decoding, you know how many frames you need to decode before getting a fully decodable picture, and this number is constant, so the latency is constant (see the sketch below). The x264 free software encoder has supported this feature for years, and conformant decoders handle it as well. The bitrate overhead of this choice is estimated at 13% for HD content and 30% for CIF content: the lower the latency, the higher the overhead.
- The segment caching and buffering: first, the player should be able to use segments before they are entirely downloaded. This is made possible by using HTTP/1.1 chunked transfer encoding to get the smallest valid part of an ISOBMFF/MP4 file, called a fragment. Previously the client had no knowledge of how segments were produced and sent requests only when entire segments had been generated. Now, with a proper download strategy (sketched below), the latency only depends on the duration of the HTTP chunks (i.e. it no longer depends on the segment duration). The overhead of HTTP and of the aggressive use of fragments is detailed in the article.
- Optimizations of some DASH MPD attributes: availabilityStartTime and minBufferTime.
- availabilityStartTime contains the UTC time at which data is ready to be processed. The GPAC team advocated for a new availabilityStartTimeOffset attribute (which became available as availabilityTimeOffset and availabilityTimeComplete in the second edition of the DASH standard). This allows clients to take advantage of the smaller fragment entities of ISOBMFF, with a finer granularity than segments (see the availability computation sketched below).
- minBufferTime describes the buffer the DASH client should maintain. It should be adjusted depending on the network conditions. The structure of the Internet makes it difficult to evaluate this value when generating the DASH content, since it depends on network metrics that are unique to each user. The Internet also suffers from "jitter", the deviation around a mean latency, which forces an increase of the buffer size. While the Internet offers no guarantee on video delivery, legacy broadcast networks (terrestrial, satellite, ...) provide their own guarantees. It is up to the intermediaries that deliver the content to handle this value properly.
Note: the latency and jitter of the Internet have led video industry actors to pay ISPs for better network conditions. This was recently the case between Netflix and Comcast. It has raised concerns about Net Neutrality, but that is beyond this article's scope. What is interesting is that this mechanism is similar to what happens with IPTV: IPTV benefits from dedicated links provided by the network operators.
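A hedged sketch of the GDR idea from the first item above: with classic RAP/GoP coding the tune-in delay depends on where you land in the GoP, whereas with GDR the picture is refreshed band by band over a fixed number of frames, so the delay is constant. The refresh period and GoP length below are illustrative assumptions:

```python
# Conceptual sketch of Gradual Decoding Refresh (GDR) vs. classic RAP coding.
# Parameters are illustrative, not values from the article.

REFRESH_PERIOD = 8  # frames needed to intra-refresh the whole picture (assumed)

def frames_until_clean_picture_gdr(first_decoded_frame: int) -> int:
    """With GDR the wait is constant: wherever we tune in, the refresh wave
    needs one full period to sweep the picture."""
    return REFRESH_PERIOD

def frames_until_rap(first_decoded_frame: int, gop_length: int) -> int:
    """With classic RAP/GoP coding, the wait depends on the tune-in position:
    0 frames on a RAP, up to almost a full GoP just after one."""
    offset = first_decoded_frame % gop_length
    return 0 if offset == 0 else gop_length - offset

for tune_in in (0, 1, 17, 42):
    print(f"tune-in frame {tune_in:2d} ->",
          "GDR wait:", frames_until_clean_picture_gdr(tune_in),
          "| RAP wait:", frames_until_rap(tune_in, gop_length=50))
```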
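The chunked-transfer download strategy from the buffering item can be sketched as follows in Python, using the third-party requests library. The URL and the demuxer hand-off are hypothetical placeholders, not part of GPAC:

```python
# Sketch of the low-latency download strategy: consume an ISOBMFF segment as
# HTTP/1.1 chunks arrive instead of waiting for the complete segment.
import requests  # third-party HTTP client, used here for brevity

SEGMENT_URL = "http://example.com/live/seg_42.m4s"  # hypothetical live segment

def feed_to_demuxer(data: bytes) -> None:
    # Placeholder: a real player would push each fragment (moof + mdat)
    # to its ISOBMFF demuxer/decoder as soon as the bytes are available.
    print(f"received {len(data)} bytes")

with requests.get(SEGMENT_URL, stream=True) as response:
    response.raise_for_status()
    # With stream=True and chunk_size=None, iter_content yields data as it
    # is received, so playback can start before the segment is complete.
    for chunk in response.iter_content(chunk_size=None):
        feed_to_demuxer(chunk)
```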
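Finally, a sketch of how a client could compute when a live segment becomes requestable, with and without the availability offset discussed above. The MPD values (availabilityStartTime, segment duration, offset, segment number) are illustrative assumptions:

```python
# Sketch of segment availability computation for a live MPD.
# Attribute values below are illustrative, not taken from the article.
from datetime import datetime, timedelta, timezone

availability_start_time = datetime(2014, 1, 1, 12, 0, 0,
                                   tzinfo=timezone.utc)  # MPD@availabilityStartTime
segment_duration = 2.0          # seconds (SegmentTemplate@duration / @timescale)
availability_time_offset = 1.5  # seconds (availabilityTimeOffset, assumed value)
segment_number = 10             # 0-based index of the live segment

# Classic rule: a segment is available once it has been fully produced.
classic_availability = availability_start_time + timedelta(
    seconds=(segment_number + 1) * segment_duration)

# With availabilityTimeOffset the client may request earlier, because the
# first fragments of the segment already exist and can be delivered with
# HTTP chunked transfer (availabilityTimeComplete=false).
early_availability = classic_availability - timedelta(
    seconds=availability_time_offset)

print("Full-segment availability   :", classic_availability.isoformat())
print("Early (chunked) availability:", early_availability.isoformat())
```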
To go further
By tuning these parameters, the authors showed experimentally that it is possible to achieve a latency below 6 frames (240 ms at 25 frames per second). The overhead is around 1% on the transport side and 13% on the encoding side for HD content. The experiments were conducted using the GPAC open-source tools.
This post is an explanation of the research article published by researchers from the GPAC team at Telecom ParisTech. The research article contains all the data needed to reproduce the experiments and figures cited here. If you have any questions, please feel free to contact the authors.