How does Netflix ensure quality? Netflix technologies unveiled.
February 28, 2023
With about 200 million subscribers worldwide, Netflix is the number one SVOD platform. Its success results not only from its investment in content production and marketing but also from the quality of its video streams. This has been achieved thanks to a smart mix of technology and innovative architecture, deployed as a service on the platform. With the help of Guillaume Bichot, head of our Exploration Team, we will analyze the different tools deployed by Netflix to achieve a high level of Quality of Experience.
Let’s start from the beginning. What is Netflix’s strategy when it comes to encoding video content?
Netflix content is video on demand delivered through adaptive bitrate streaming. In 2015, they decided to re-encode the entire catalog for a better bitrate/quality ratio. Before 2015, a single bitrate ladder was used for all titles.
In 2015, they re-encoded all titles with different bitrate ladders on a segment basis. They could reach a threefold improvement, meaning that they reduced the bandwidth needed for the same quality.
In 2016, they addressed the mobile market. Until then, they had been using only H.264/AVC encoders for mobile. They added VP9 and AVC High Profile and re-encoded all the titles again, this time on a per-shot, per-scene basis. That was another significant level of optimization. In 2020, they applied the same process to the entire Ultra HD and 4K content base.
To fine-tune the encoding, they designed a new in-house perceptual quality metric called VMAF, which estimates the mean opinion that users would have of the video experience. You can see in the following table the same content encoded in AVC Main (1.2 Mbps) and re-encoded on a per-shot, per-scene basis at 400 kbps for the same quality of experience (VMAF = 80). That is about three times more efficient, quite a huge achievement.
This was possible thanks to a step-by-step approach (2015, 2016, 2018, and 2020, when they addressed the UHD content) that followed the evolution of the technology: starting with AVC Main Profile, then adopting VP9, AVC High Profile, and new UHD codecs.
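The selection logic behind per-shot encoding can be sketched roughly as follows: among several candidate encodes of the same shot, keep the cheapest one that still reaches the target VMAF. This is only an illustration, not Netflix's actual pipeline. The `Encode` class, the `cheapest_encode` helper and the 250 and 600 kbps candidates are hypothetical; only the 1.2 Mbps and 400 kbps points at VMAF 80 come from the interview.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Encode:
    """One candidate encode of a shot (hypothetical structure)."""
    label: str
    bitrate_kbps: int
    vmaf: float


def cheapest_encode(candidates: Iterable[Encode], target_vmaf: float) -> Optional[Encode]:
    """Return the lowest-bitrate candidate that reaches the target VMAF, or None."""
    good_enough = [c for c in candidates if c.vmaf >= target_vmaf]
    if not good_enough:
        return None
    return min(good_enough, key=lambda c: c.bitrate_kbps)


# Reference point quoted in the interview: AVC Main at 1.2 Mbps for VMAF 80.
fixed_ladder = Encode("avc-main-fixed-ladder", 1200, 80.0)

# Candidate per-shot encodes. The 250 and 600 kbps rows are made up for the example.
per_shot_candidates = [
    Encode("per-shot-low", 250, 71.0),   # hypothetical
    Encode("per-shot-mid", 400, 80.0),   # figure quoted in the interview
    Encode("per-shot-high", 600, 86.0),  # hypothetical
]

best = cheapest_encode(per_shot_candidates, target_vmaf=80.0)
saving_factor = fixed_ladder.bitrate_kbps / best.bitrate_kbps
print(f"{best.label}: {best.bitrate_kbps} kbps, {saving_factor:.1f}x less bandwidth")
```

With these numbers, the 400 kbps encode is selected and the bandwidth ratio against the fixed ladder is 1200 / 400 = 3.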
Netflix has set up what is called the Open Connect program. It allows network operators who justify a certain amount of Netflix traffic to request the installation of Netflix caches inside their own network. Could you tell us a bit more about this initiative and how it helps Netflix with improving video quality?
Open Connect is the basis, the root, of the Netflix CDN (Content Delivery Network). By the way, Netflix started to operate its own CDN in 2007. They switched to a third-party CDN in 2009 and went back to their own CDN in 2011, with the OCA (Open Connect Appliance) initiative.
Open Connect is based on the Open Connect Appliance (OCA), which is a physical machine. There are two types of machines. The first type is full of disks and contains the entire Netflix catalog. The second type is full of SSDs and RAM disks and holds nearly 60% of the catalog: the most popular content is stored on this kind of machine.
These machines are deployed either in Internet exchanges (public facilities where all the players in the Internet ecosystem can interconnect with each other) for peering with ISPs, or within the ISP’s own network premises. Having a server deployed in the ISP/telecom operator network reduces the physical distance between the terminal and the server. The Open Connect machine is basically a cache full of content that greatly improves the streaming experience (faster start, less or no rebuffering) while decreasing the pressure on the ISP network (the content does not need to flow across the entire ISP’s network). They have around 14,000 appliances distributed around the world, in many ISP networks or in Internet exchanges.
Are there any specific technologies or protocols that are used by Netflix to enhance their quality level?
Let’s first describe the control plane, which is the Netflix front-end application located in the cloud (hosted by AWS). This application receives the first request coming from the terminal and steers it to one or more selected Open Connect servers, which are then going to serve the terminal.
In addition, Netflix has made a lot of optimizations in the OCA platform and in the protocol itself. The latter is a proprietary adaptive streaming protocol that works roughly like MPEG-DASH or Apple HLS.
Several optimizations have been developed. For example, when starting a streaming session, the terminal gets the addresses of several OCA machines, not only one, and even if it selects only one machine, it will use parallel TCP connections in order to exploit the diversity of the connections. That helps make the connections between the terminal and the server more robust.
In the machine itself, there are a lot of optimizations. It is full of open-source software, but that software is very finely tuned to the operating system and even to the hardware, such as the network cards. Overall, all these optimizations make the system very robust and very effective in terms of quality of experience.
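The idea of receiving several server addresses at session start can be illustrated with a small sketch: the client probes all candidates in parallel, drops the unreachable ones and orders the rest from fastest to slowest, so that a fallback is always at hand. This is an assumption-laden illustration, not Netflix's proprietary client logic. The host names, the simulated round-trip times and the `rank_servers` and `fake_probe` helpers are all hypothetical, and the probe is simulated so that the example runs without any network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def rank_servers(addresses: List[str], probe: Callable[[str], float]) -> List[str]:
    """Probe every candidate in parallel and return reachable ones, fastest first."""

    def safe_probe(address: str) -> Tuple[str, Optional[float]]:
        try:
            return address, probe(address)
        except OSError:  # covers ConnectionError and timeouts
            return address, None

    with ThreadPoolExecutor(max_workers=max(1, len(addresses))) as pool:
        results = list(pool.map(safe_probe, addresses))

    reachable = [(addr, rtt) for addr, rtt in results if rtt is not None]
    return [addr for addr, _ in sorted(reachable, key=lambda item: item[1])]


# Simulated round-trip times in milliseconds; None means the server is unreachable.
SIMULATED_RTT_MS = {
    "oca-a.example": 18.0,
    "oca-b.example": 9.5,
    "oca-c.example": None,
}


def fake_probe(address: str) -> float:
    """Stand-in for a real connection attempt (hypothetical)."""
    rtt = SIMULATED_RTT_MS[address]
    if rtt is None:
        raise ConnectionError(f"{address} is unreachable")
    return rtt


ranked = rank_servers(list(SIMULATED_RTT_MS), fake_probe)
primary, fallbacks = ranked[0], ranked[1:]
print("primary:", primary, "fallbacks:", fallbacks)
```

A real player would replace `fake_probe` with an actual connection attempt and would keep re-evaluating the ranking during playback.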
One thing that is very interesting about Netflix is its ability to leverage what is called “A/B testing” to find out which technologies should be activated, and for which use case. Could you elaborate a bit on what A/B testing is about and how Netflix uses it?
Remember that around 2016, when they started to re-encode every piece of content for the mobile market, they tried different codecs and different settings, and they needed a new metric. That was VMAF (Video Multimethod Assessment Fusion), a user perception metric built with machine learning and subjective test campaigns.
They selected different features, and the weight attached to each feature was derived from the results of those subjective tests. At the end of the day, they got a model capable of computing a VMAF score that indicates the quality of user experience (QoE) by comparing a reference video stream with a transcoded stream.
VMAF measurement is present throughout the encoding pipeline. Combining this VMAF score with other metrics attached to the adaptive streaming experience, for example the start time and the rebuffering/stall events, is the basis of the A/B testing framework. A/B testing is about statistically selecting some users to try out a new feature and comparing their experience with that of another, similar set of users who do not use the feature. The framework, which covers metrics collection, processing and presentation, is mostly deployed in AWS.
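The "fusion" idea can be pictured with a deliberately simplified sketch: several per-frame features are combined into a single score using weights. Everything here is an assumption made for illustration. The feature names, the weights and the `fuse` function are hypothetical, and a plain weighted sum stands in for the trained machine-learning model, whose weights come from subjective test data rather than being chosen by hand.

```python
from typing import Dict


def fuse(features: Dict[str, float], weights: Dict[str, float], bias: float = 0.0) -> float:
    """Combine elementary features into one score, clamped to the 0-100 range."""
    raw = bias + sum(weights[name] * value for name, value in features.items())
    return max(0.0, min(100.0, raw))


# Hand-picked weights for the example only; a real model learns them from viewer ratings.
HYPOTHETICAL_WEIGHTS = {"detail": 50.0, "fidelity": 40.0, "motion": 0.5}

# Hypothetical feature values measured by comparing a reference frame with its encode.
frame_features = {"detail": 0.9, "fidelity": 0.8, "motion": 6.0}

score = fuse(frame_features, HYPOTHETICAL_WEIGHTS)
print(f"fused quality score: {score:.1f}")
```

The point of the sketch is only the structure: elementary measurements go in, one perceptual score comes out, and the weights are what the subjective tests calibrate.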
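A minimal sketch of the two halves of such an A/B test is given below: users are assigned to a group in a stable way, and a metric (here, the share of sessions with at least one rebuffering event) is compared between the groups with a standard two-proportion z-test. This is not Netflix's framework. The `assign_group` and `two_proportion_z` helpers, the experiment name and the session counts are all hypothetical.

```python
import hashlib
import math
from typing import Tuple


def assign_group(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control' by hashing."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for the difference between two proportions (B minus A)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p_value


# The same user always lands in the same group for a given experiment.
group = assign_group("user-123", "new-codec-trial")

# Made-up counts: sessions with at least one rebuffering event, per group.
z, p_value = two_proportion_z(hits_a=500, n_a=10_000, hits_b=400, n_b=10_000)
print(f"group={group}, z={z:.2f}, p={p_value:.4f}")
```

With these invented counts the treatment group rebuffers less often (4% against 5%), and the small p-value suggests that the difference is unlikely to be due to chance alone.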
Thank you very much for this in-depth analysis. As we have seen, the Open Connect program is the cornerstone of Netflix’s strategy to provide a superior level of Quality of Experience, because it gives them control over the delivery. At Broadpeak, we think that all content providers who start to have a serious amount of traffic with specific network operators should be able to access an equivalent type of service.
Therefore, we have created the BroadCache Box, which allows content owners to have their own managed caches deployed on the operator’s premises. This solution can be further enhanced with cloud PVR, ad insertion or the nanoCDN multicast ABR capability, which adds support for live channels and enhancements, something that Netflix does not provide. These new resources are deployed in the operators’ network or put at the service of the content providers.
They benefit from all the unique features available in the Broadpeak CDN, including, as with Netflix, the possibility of A/B testing options, policies and protocols in order to find the best setup for their specific use case. Comprehensive analytics and a monitoring solution are also available to provide deep insight into what is going on inside the system.
This interview was recorded on 21 November 2020, during Broadpeak Open House and updated on 28 February 2023.