UCSD and Meta AI Develop Video Frame Interpolation for VR and AR Headsets

Computer scientists at the University of California, San Diego (UCSD) and Meta AI have developed a new video frame interpolation method that achieves smoother slow-motion video progression using something called Flow-Agnostic Video Representations for Fast Frame Interpolation (FLAVR). One of the possible uses for this technology is in virtual reality (VR) and augmented reality (AR) headsets. FLAVR’s ability to create smooth, fluid video by interpolating additional animation frames between existing ones could greatly enhance the visual experience for users of VR and AR headsets.

An axe striking a Sprite can, the progression captured in extreme slow motion using a new video frame interpolation framework called FLAVR (Source: UCSD)

In these immersive environments, video frames must keep up with the user's movements to maintain the illusion of existing within a virtual or augmented world. Traditional video frame interpolation methods may struggle with non-linear motion patterns and occlusions, causing visual glitches that disrupt the user's experience. FLAVR's 3D space-time convolutions and machine learning approach can better account for non-linear motion in the video, minimizing spatio-temporal glitches and providing a more seamless experience for VR and AR headset users.

One area Meta has been researching is foveated rendering, which improves performance in VR headsets by rendering only the area where the user's eye is focused in high resolution while rendering the peripheral regions at lower resolutions. This approach is related to interpolation techniques in the sense that both aim to optimize the visuals displayed on VR headsets, but it is not itself a form of video frame interpolation.

Traditional Interpolation
1. Optical flow estimation: Algorithms like Lucas-Kanade, Farnebäck, or deep learning-based methods estimate motion vectors between frames.
2. Frame warping: The estimated optical flow is used to warp the input frames toward the desired in-between time, producing two candidate intermediate frames.
3. Blending: The two intermediate frames are combined, often by taking a weighted average, to create the final interpolated frame.
FLAVR
1. End-to-end trainable architecture: FLAVR is designed as an end-to-end trainable deep learning model that learns to interpolate frames directly from input video data.
2. 3D space-time convolutions: These convolutions model the temporal-spatial relationships between video frames, enabling the model to learn complex motion patterns and handle occlusions better.
3. Flow-agnostic approach: FLAVR learns motion implicitly from training data, generalizing better to unseen motion patterns.
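The core idea behind point 2 can be sketched in a few lines of PyTorch. This is a minimal toy, not the FLAVR architecture: `TinySpaceTimeNet` and its layer sizes are invented for illustration. The point is that a 3D convolution's kernel spans the time axis as well as the spatial axes, so motion is learned implicitly from data, with no explicit optical-flow step.

```python
import torch
import torch.nn as nn

class TinySpaceTimeNet(nn.Module):
    """Toy sketch (not FLAVR): 3D convolutions mix information across
    frames as well as across pixels, then collapse the time axis to
    predict a single in-between frame."""
    def __init__(self, channels=3, hidden=16, frames=4):
        super().__init__()
        # Kernel (3, 3, 3) spans (time, height, width) simultaneously.
        self.encode = nn.Conv3d(channels, hidden,
                                kernel_size=(3, 3, 3), padding=(1, 1, 1))
        # Kernel depth == number of input frames, collapsing time to 1.
        self.decode = nn.Conv3d(hidden, channels,
                                kernel_size=(frames, 3, 3), padding=(0, 1, 1))

    def forward(self, clip):  # clip: (batch, channels, frames, H, W)
        h = torch.relu(self.encode(clip))
        return self.decode(h).squeeze(2)  # -> (batch, channels, H, W)
```

Feeding a 4-frame clip of shape (1, 3, 4, 32, 32) yields one predicted frame of shape (1, 3, 32, 32).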

Traditional interpolation and FLAVR can complement displays with higher refresh rates and faster response times by generating additional frames, which leads to a smoother and more visually appealing experience. However, it’s important to note that these interpolation techniques do not directly affect the display’s refresh rate or response time; they work on the video content itself to create a more fluid motion.

The refresh rate of a display is the number of times per second the screen updates its image. A higher refresh rate can provide a smoother visual experience, as it reduces motion blur and screen tearing. When traditional interpolation or FLAVR is used to increase the frame rate of a video, it can better align with the higher refresh rates of modern displays, resulting in a smoother and more fluid viewing experience.
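The arithmetic behind this alignment is simple: to match a display's refresh rate, the interpolator must insert a fixed number of frames between each pair of source frames. The helper below is a hypothetical illustration (not part of any library) and assumes the refresh rate is an integer multiple of the source frame rate.

```python
def frames_to_insert(source_fps, target_hz):
    """Frames to interpolate between each pair of source frames so the
    output frame rate matches the display refresh rate. Assumes target_hz
    is an integer multiple of source_fps."""
    if target_hz % source_fps != 0:
        raise ValueError("target refresh rate must be a multiple of source fps")
    return target_hz // source_fps - 1
```

For example, driving a 120 Hz display from 30 fps video requires inserting three interpolated frames between each pair of originals.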

The response time of a display is the time it takes for a pixel to change from one color to another. Faster response times help to reduce ghosting and motion blur. While video frame interpolation techniques like traditional interpolation and FLAVR do not directly impact response times, they can enhance the perceived smoothness of the video content by adding extra frames. This can make the display’s performance appear more fluid, especially when combined with a display that has fast response times.

The work on FLAVR was done by Manmohan Chandraker, associate professor of computer science and engineering (CSE), affiliated with the Center for Visual Computing at the UC San Diego Jacobs School of Engineering, in collaboration with CSE PhD student and first author Tarun Kalluri and research scientists Du Tran and Deepak Pathak from Meta AI (formerly Facebook AI Research). The code for this project is open-sourced and available at