The terms “scene referred” and “display referred” have come up a lot lately in discussions about HDR workflows and the transforms needed to capture and display this wider dynamic range content. A paper presented by Tim Borer and Andrew Cotton of BBC Research & Development at the SMPTE fall conference shed more light on what these terms mean.
In his presentation, Borer said that the definitions and meaning of scene referred and display referred remain an area of debate, but that he and Cotton would offer their own view on the topic. Borer thinks that “scene referred” is typically best suited to real-time television, while “display referred” works fine for non-real-time or file-based workflows.
Modern digital camera-to-display systems are now characterized by an Opto-Electronic Transfer Function (OETF) or camera gamma, an Electro-Optical Transfer Function (EOTF) or display gamma, and an Opto-Optical Transfer Function (OOTF) or system gamma. The OETF maps scene luminance to digital code values, while the EOTF maps digital code values to display luminance. The system gamma is the concatenation of the OETF and the EOTF; it is non-linear, both to minimize banding from quantization errors and to reproduce the director’s, or rendering, intent.
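The relationship between the three transfer functions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: both curves are modeled as pure power laws, and the exponents (`CAMERA_GAMMA`, `DISPLAY_GAMMA`) and the 10-bit `quantize` helper are hypothetical choices, not values taken from Borer's paper or from any standard.

```python
# Illustrative power-law model of a camera-to-display chain.
# All exponents are assumptions chosen for the example only.
CAMERA_GAMMA = 0.5    # hypothetical OETF exponent
DISPLAY_GAMMA = 2.4   # hypothetical EOTF exponent


def oetf(scene_light: float) -> float:
    """Map normalized scene light (0..1) to a normalized signal (0..1)."""
    return scene_light ** CAMERA_GAMMA


def eotf(signal: float) -> float:
    """Map a normalized signal (0..1) to normalized display light (0..1)."""
    return signal ** DISPLAY_GAMMA


def ootf(scene_light: float) -> float:
    """System response: the OETF followed by the EOTF."""
    return eotf(oetf(scene_light))


def quantize(signal: float, bits: int = 10) -> int:
    """Round a normalized signal to an integer code value."""
    levels = (1 << bits) - 1
    return round(signal * levels)


# With pure power laws, the system gamma is the product of the exponents.
SYSTEM_GAMMA = CAMERA_GAMMA * DISPLAY_GAMMA

if __name__ == "__main__":
    mid_grey = 0.18
    print("code value:", quantize(oetf(mid_grey)))
    print("display light:", ootf(mid_grey))
    print("system gamma:", SYSTEM_GAMMA)
```

In this toy model the end-to-end exponent is greater than 1, which is the kind of non-linear system gamma the text describes; real OETFs and EOTFs are not simple power laws.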
Borer defines the difference between “scene referred” and “display referred” according to where the rendering intent is implemented. In a scene referred system, for example, the rendering intent is implemented in the EOTF at the display, after the inverse OETF has been applied. The rendering intent stage is needed to account for the psychovisual effects of viewing a display in a dim environment.
In display referred content, the rendering intent is decided at the camera, then the inverse EOTF is applied. At the display, the EOTF then undoes the non-linearity introduced by the inverse EOTF and recreates a perceptually accurate image. In theory, both should work identically and produce the same non-linear system gamma or rendering intent. These are shown schematically below.
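The two signal chains described above can be compared with a small Python sketch. It assumes, purely for illustration, that the rendering intent is a simple power function applied to linear light and that the OETF and EOTF are pure power laws; the constants `RENDERING_GAMMA`, `OETF_EXPONENT` and `EOTF_EXPONENT` are hypothetical and do not come from the paper.

```python
# Toy comparison of where the rendering intent is applied.
# Every constant below is an assumption made for the example.
RENDERING_GAMMA = 1.2   # hypothetical rendering intent (system gamma)
OETF_EXPONENT = 0.5     # hypothetical camera curve for the scene referred chain
EOTF_EXPONENT = 2.4     # hypothetical display curve for the display referred chain


def rendering_intent(linear_light: float) -> float:
    """Apply the rendering intent to normalized linear light."""
    return linear_light ** RENDERING_GAMMA


def scene_referred(scene_light: float) -> float:
    """Rendering intent applied at the display."""
    signal = scene_light ** OETF_EXPONENT        # camera: OETF
    linear = signal ** (1.0 / OETF_EXPONENT)     # display: inverse OETF
    return rendering_intent(linear)              # display: rendering intent


def display_referred(scene_light: float) -> float:
    """Rendering intent applied at the camera."""
    intended = rendering_intent(scene_light)       # camera: rendering intent
    signal = intended ** (1.0 / EOTF_EXPONENT)     # camera: inverse EOTF
    return signal ** EOTF_EXPONENT                 # display: EOTF


if __name__ == "__main__":
    for light in (0.01, 0.18, 0.5, 1.0):
        print(light, scene_referred(light), display_referred(light))
```

Under these assumptions both functions return the same display light for any input, which matches the point that the two approaches should, in theory, yield the same end-to-end system gamma. The difference lies in which end of the chain needs to know about the display.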
Borer makes the point that with a display referred solution, the inverse EOTF must be applied in the camera, so the reference display must be known in advance. That can work well for movie production, but not so well for other forms of content consumption, where display brightness and viewing conditions can vary widely. Showing the images properly on these other displays requires knowledge of the display characteristics and local re-rendering. This is what the Dolby Vision solution does.
In scene referred solutions, the rendering intent is applied at the display, not the camera. The BBC/NHK Hybrid Log Gamma solution for HDR capture and display is a “scene referred” solution, which Borer believes is simpler to implement and meets the needs of broadcasters.
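For concreteness, the Hybrid Log Gamma OETF can be written as a short Python function. The formula and the constant `a` below follow the HLG definition published in ITU-R BT.2100, not anything stated in Borer's presentation, and the function name `hlg_oetf` is just a label chosen for this sketch. The input is normalized linear scene light in the range 0 to 1.

```python
import math

# HLG OETF constants as defined in ITU-R BT.2100 (not from the presentation).
A = 0.17883277
B = 1.0 - 4.0 * A                    # approximately 0.28466892
C = 0.5 - A * math.log(4.0 * A)      # approximately 0.55991073


def hlg_oetf(scene_light: float) -> float:
    """Map normalized linear scene light (0..1) to an HLG signal (0..1)."""
    if scene_light <= 1.0 / 12.0:
        # Square-root segment for the darker part of the range.
        return math.sqrt(3.0 * scene_light)
    # Logarithmic segment for highlights.
    return A * math.log(12.0 * scene_light - B) + C


if __name__ == "__main__":
    for light in (0.0, 1.0 / 12.0, 0.5, 1.0):
        print(light, hlg_oetf(light))
```

Because the curve depends only on scene light, the camera side needs no knowledge of the eventual display, which is the property that makes the approach scene referred.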