Predictive VMAF Review
When you work in video, quality is always on your mind. It’s why a Demuxed presentation titled “Measuring Live Video Quality with Minimal Complexity” caught my eye [1]. The talk showcased an open-source x264 encoder from Quortex that uses features computed within the encoder as inputs to a machine learning model to predict the encoded video’s VMAF score. Not only does this allow for live video quality analysis, but Quortex claims the inference is 35x faster than running VMAF itself [2].
As an aside, VMAF (Video Multi-Method Assessment Fusion) is a visual quality assessment tool, developed by Netflix [3], that has become an industry standard for measuring perceptual video quality. Scores range from 0 to 100, with 100 being the highest quality, and map linearly to human opinion scores. For example, a VMAF score of 70 would be rated between “good” and “fair” by an average viewer on a 1080p display [4].
I came away excited about two main implications. First, this represented a significantly easier way of collecting VMAF information, eliminating many of the complexities of running VMAF, like scaling videos to match resolutions or dealing with misaligned frames. Second, it seemed like a significantly more time- and cost-effective way to calculate scores. VMAF calculations are notoriously slow, and if pVMAF delivered on using less compute, it would become feasible to calculate VMAF scores on larger swaths of video where cost had previously been a major hindrance. With these in mind, I set out to implement a proof of concept and try the encoder during a Hudl innovation week.
Installation
I began by compiling and testing the pVMAF encoder on my local machine. Quortex provides detailed documentation about configuring the encoder that served as a helpful resource [5]. One important note: I wasn’t sure whether the following configure options from the documentation were recommended or required, but they appear to be required for encoding with `--pvmaf` enabled; without them, I received failed assertions.

```
--bit-depth=8 --chroma-format=420 --disable-interlaced
```
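For reference, a minimal local build looked something like this. The three flags above come from the documentation [5]; anything else here (such as `--enable-static`) is my own assumption and may not match Quortex’s exact recommendations:

```
# Clone and build the pVMAF x264 fork with the options that --pvmaf
# appears to require; --enable-static is my own (assumed) addition.
git clone https://github.com/quortex/x264-pVMAF.git
cd x264-pVMAF
./configure --bit-depth=8 --chroma-format=420 --disable-interlaced --enable-static
make -j"$(nproc)"
```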
Although pVMAF is meant for encoding 1080p video, I also tested the encoder on 720p and 4K videos. The 720p videos processed as expected, though the predicted VMAF seemed a bit inflated, while encoding 4K videos caused segmentation faults. This was only ever an issue with the `--pvmaf` flag enabled; 4K videos encoded fine without it.
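For context, this is roughly how I invoked the encoder during testing; `--pvmaf` is the fork’s flag, while the preset and rate-control options are just illustrative x264 defaults:

```
# Encode a 1080p source with VMAF prediction enabled. Everything besides
# --pvmaf is an ordinary, illustrative x264 option.
./x264 --pvmaf --preset medium --crf 23 -o output.264 input_1080p.y4m
```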
Compiling Into FFmpeg
Using the raw encoder was enough for a simple proof of concept, but I really wanted to test the encoder within the context of FFmpeg commands we use at Hudl.
Compiling the pVMAF encoder into FFmpeg was not very different from compiling its stock x264 counterpart. In the FFmpeg build script, I simply cloned the repository, navigated into it, ran `configure` with the options specified in the README, and finally ran `make`. With the `--enable-libx264` configure option set during the FFmpeg build, the pVMAF encoder was linked properly.
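As a sketch, the FFmpeg side of the build looked something like the following; the install step and the extra configure flags are illustrative rather than our exact production script:

```
# Install the pVMAF-enabled x264 so FFmpeg's standard libx264 integration
# can find it, then configure FFmpeg against it. --enable-gpl is required
# whenever libx264 is enabled.
cd x264-pVMAF && sudo make install && cd ..
cd ffmpeg
./configure --enable-gpl --enable-libx264
make -j"$(nproc)"
```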
However, I did find one bug that prevented successful compilation: the x264.h file in the repository uses the `FILE` type without including the header that defines it. I was able to work around this by adding the include during my build process with the following command:

```
sed -i '45i#include <stdio.h>' x264.h
```

I opened a GitHub issue on the repository and a maintainer promised to include this in an upcoming fix, but as of this writing, the code doesn’t appear to have been updated [6]. So for now, you may need to use this patching method.
The Review
Testing pVMAF against a production command proved less accurate than I had imagined: the predicted score came in 10+ points higher than the actual one. This could have been due to a variety of factors, including encoding options or source videos that differ from what the model was trained on, but with innovation week coming to a close, I didn’t have time to dig into why. Even if the predicted scores aren’t precise, I still believe there could be great value in collecting them to track and mark trends, so long as they are only compared against other pVMAF scores.
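For anyone wanting to reproduce the comparison, this is roughly how the reference score can be computed with FFmpeg’s libvmaf filter; it assumes an FFmpeg build with `--enable-libvmaf`, and the file names are placeholders:

```
# Compute an actual VMAF score to compare against the prediction. The first
# input is the distorted (encoded) video, the second is the pristine source.
ffmpeg -i encoded.mp4 -i source.mp4 \
  -lavfi libvmaf=log_path=vmaf.json:log_fmt=json -f null -
```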
The main consideration I had to make at the end of this project was deciding how valuable encode-time VMAF scores are. Hudl uses encoders made by Visionular that provide improved visual quality at reduced bitrates [7]. In our case, the opportunity cost of giving up the Visionular encoders wasn’t worth the value pVMAF provided. Had we chosen libx264 as our encoder, implementing pVMAF and enabling the flag on a sample of encodes would have been an obvious choice.
Citations:
1. https://www.youtube.com/watch?v=ZgM4KawB5tg
2. https://github.com/quortex/x264-pVMAF?tab=readme-ov-file#inference-time-of-various-vqm-metrics
3. https://github.com/Netflix/vmaf
4. https://netflixtechblog.com/vmaf-the-journey-continues-44b51ee9ed12
5. https://github.com/quortex/x264-pVMAF?tab=readme-ov-file#steps-to-configure-x264-with-pvmaf
6. https://github.com/quortex/x264-pVMAF/issues/4
7. https://www.linkedin.com/posts/visionular_how-hudl-does-video-compression-with-visionular-activity-7328803867872108544-Ag6Z