Bringing Greater Reality to VR

What They Say

Researchers at Stanford University have published a paper describing a client (VR headset)/server system that feeds details of the user’s gaze to the server, enabling frame-by-frame foveated rendering of the video.

The team’s “gaze-contingent” foveated compression and streaming system has a total latency, from eye movement to generated image, of about 14 milliseconds. By comparison, the article says current VR systems have an end-to-end latency of between 45 and 81 milliseconds, a figure that does not include the time it takes to transmit data over the network. The team’s 14 ms budget comprises eye tracking (~1.5 ms), video encoding and decoding (~5 ms), HDMI video output (~3.5 ms), and the physical transition of the LCD (~4 ms); it likewise excludes network transmission time.
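As a sanity check, the quoted component timings do sum to the reported total. A trivial sketch (the dictionary keys are illustrative labels, not terms from the paper):

```python
# Component timings quoted from the paper, in milliseconds.
# Key names are illustrative, not the paper's terminology.
latency_ms = {
    "eye_tracking": 1.5,     # gaze estimation on the headset
    "encode_decode": 5.0,    # video compression and decompression
    "hdmi_output": 3.5,      # scan-out over HDMI
    "lcd_transition": 4.0,   # physical pixel response of the LCD
}

total = sum(latency_ms.values())
print(total)  # 14.0
```

Note that none of these terms covers the network, which is why the authors report the budget with transmission time excluded.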

The prototype employs a two-stream approach in which two versions of the current frame are compressed. The first stream, covering the less acuity-critical area outside the fovea, is compressed at significantly lower resolution. The second stream remains at full resolution but is cropped to a small area around the gaze location.
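The splitting step described above can be sketched in a few lines of NumPy. This is a hypothetical illustration of the idea, not the paper's implementation; the function name, fovea size, and downscale factor are all assumptions:

```python
import numpy as np

def split_streams(frame, gaze, fovea=64, downscale=4):
    """Split a frame into a low-res background and a full-res foveal crop.

    Illustrative sketch only: `frame` is an HxW grayscale array and `gaze`
    is a (row, col) pixel position. The paper's actual encoder is not shown.
    """
    # Background stream: aggressively subsampled version of the whole frame.
    background = frame[::downscale, ::downscale]

    # Foveal stream: full-resolution crop centred on the gaze position,
    # clamped so the crop stays inside the frame.
    r, c = gaze
    h, w = frame.shape
    top = max(0, min(r - fovea // 2, h - fovea))
    left = max(0, min(c - fovea // 2, w - fovea))
    foveal = frame[top:top + fovea, left:left + fovea]
    return background, foveal, (top, left)
```

For a 256x256 frame with the defaults above, the background stream shrinks to 64x64 (a 16x pixel reduction) while the foveal crop keeps 64x64 pixels at full resolution, which is where the bandwidth saving in the paper's title comes from.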

The server then encodes both streams and sends them to the client, where the process is reversed. The client upscales the lower-resolution background frame to display size, then decodes the high-resolution foveal frame and positions it precisely within the viewer’s gaze. The two images are overlaid and blended seamlessly to produce a single foveated frame.
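The client-side reconstruction can be sketched as an upscale plus an alpha-blended overlay. Again this is a hypothetical illustration under assumed names and parameters; the paper does not specify this exact blend:

```python
import numpy as np

def reconstruct(background, foveal, offset, full_shape, blend=8):
    """Rebuild a foveated frame from the two decoded streams.

    Illustrative sketch: upscale the low-res background to display size,
    then composite the full-resolution foveal crop over it with a soft
    linear ramp at the edges so the seam between the two is not visible.
    """
    h, w = full_shape
    # Nearest-neighbour upscale of the low-res background to display size.
    ys = np.arange(h) * background.shape[0] // h
    xs = np.arange(w) * background.shape[1] // w
    frame = background[np.ix_(ys, xs)].astype(float)

    top, left = offset
    fh, fw = foveal.shape
    # Alpha mask: 1 in the centre, ramping linearly to ~0 over `blend`
    # pixels at the crop's borders.
    row = np.minimum(np.arange(fh) + 1, fh - np.arange(fh)) / blend
    col = np.minimum(np.arange(fw) + 1, fw - np.arange(fw)) / blend
    alpha = np.clip(np.minimum.outer(row, col), 0.0, 1.0)

    region = frame[top:top + fh, left:left + fw]
    frame[top:top + fh, left:left + fw] = alpha * foveal + (1 - alpha) * region
    return frame
```

The soft ramp is what makes the overlay read as a single image: a hard-edged paste of the foveal crop onto the upscaled background would leave a visible rectangle at the boundary.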

Original Article: Luke Hsiao, Brooke Krajancich, Philip Levis, Gordon Wetzstein, and Keith Winstein. 2022. Towards retina-quality VR video streaming: 15ms could save you 80% of your bandwidth. SIGCOMM Comput. Commun. Rev. 52, 1 (January 2022), 10–19.

What We Think

As we have discussed regularly, latency in passthrough mixed reality is a big deal. Not adopting foveated rendering would be a strange decision, I think, although it requires more processing, and we already know that the sheer level of processing needed in HMDs is a challenge. I have heard that one reason gaze tracking did not catch on in notebooks was that the power needed for the recognition process was too high, reducing battery life. (BR)