Well, they are closer than I thought anyway. While talking to possible presenters for the upcoming Light Field and Holographic Display Summit, I spoke with Digital Domain’s John Canning, who pointed me to a TED Talk (Digital Humans that Look Just Like Us). You might want to view the full talk (12:35), which shows a real human (Doug Roble) driving a digital version of himself (DigiDoug) with a high level of fidelity.
Such fidelity has been routinely achieved by special effects houses like Digital Domain for movies, but the real revelation for me was that this can now be done in real time with almost no latency!
This capability is the result of multiple technologies developed over many years. It starts with a volumetric video capture of Doug’s face and body. While Doug was scanned about a year and a half ago in USC’s Light Stage, volumetric capture stages are now moving out of R&D labs and universities and into commercial installations.
For example, Microsoft has developed mixed reality volumetric capture technology that uses 60 to 100 visible-light cameras arranged in a hemisphere around the subject. There is also an array of IR lasers that captures point cloud data used in creating the 3D model of the subject. With 106 cameras in total, the capture is output at 600 GB/minute, but because of the inherent redundancy between the images, the data can be highly compressed.
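For a sense of scale, here is a quick back-of-the-envelope calculation of what those figures imply per camera. The 106-camera count and the 600 GB/minute aggregate rate come from above; the even per-camera split is simply illustrative.

```python
# Rough per-camera bandwidth implied by the capture figures quoted above.
# Assumes the data is spread evenly across the cameras, which is only an
# approximation (color and IR streams will differ in practice).

TOTAL_RATE_GB_PER_MIN = 600   # aggregate output of the capture stage
NUM_CAMERAS = 106             # total camera count quoted for the stage

per_camera_gb_per_min = TOTAL_RATE_GB_PER_MIN / NUM_CAMERAS
per_camera_mb_per_sec = per_camera_gb_per_min * 1000 / 60

print(f"~{per_camera_gb_per_min:.1f} GB/min per camera "
      f"(~{per_camera_mb_per_sec:.0f} MB/s), before compression")
# -> roughly 5.7 GB/min, or about 94 MB/s, per camera, which is why the
#    redundancy between overlapping views matters so much for compression.
```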
Next, the video and point cloud data can be input into a game engine like Unity or Unreal, where the 3D model can be built and textured. Since the market today for such characters is mobile AR and VR headsets, this very rich data set is dramatically scaled down to 2D or stereo-pair views of the 3D model. Characters for VR headsets are typically modeled with 20K polygons and 2K textures, while mobile devices use 10K polygons and 1K textures. Once we have real light field displays, we will want to see the whole model with as much fidelity as the display can produce. But we aren’t there yet, which is why we are organizing this version of Display Summit.
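To give a rough idea of what those per-platform budgets mean in raw asset size, the sketch below computes uncompressed texture memory for the two targets. The four-bytes-per-pixel figure (uncompressed RGBA) is my assumption; production pipelines use compressed GPU texture formats, so real footprints are considerably smaller.

```python
# Illustrative asset budgets for the two delivery targets mentioned above.
# Polygon and texture figures are from the article; byte counts assume
# uncompressed RGBA (4 bytes per pixel), which real pipelines would compress.

def texture_bytes(side_px: int, bytes_per_pixel: int = 4) -> int:
    """Raw size of a square texture in bytes."""
    return side_px * side_px * bytes_per_pixel

budgets = {
    "VR headset":    {"polygons": 20_000, "texture_px": 2048},
    "Mobile device": {"polygons": 10_000, "texture_px": 1024},
}

for platform, b in budgets.items():
    mb = texture_bytes(b["texture_px"]) / 1_000_000
    print(f"{platform}: {b['polygons']:,} polygons, "
          f"{b['texture_px']}x{b['texture_px']} texture "
          f"= about {mb:.0f} MB uncompressed")
```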
To create DigiDoug, much emphasis was placed on his face, as this is the most important thing to get right if you are to fool someone into thinking they are looking at a real person. This requires the capture of all kinds of facial expressions under many lighting conditions to understand how Doug’s skin, pores, eyebrows, etc. react to light from various directions. They even captured IR images to see how blood flow changed with various expressions. The result is shown in the image below.
The next step was to get the face and body to move naturally, which is where a deep learning neural network came into play. They started with conventional motion capture technology, with a lot of markers on Doug’s face, to capture point clouds. The neural network can now figure out how all of the skin moves, including wrinkles, blood flow and even his eyelashes. This can be calculated and rendered with very high fidelity in 16 ms (about 60 frames per second), essentially in real time. The TED Talk, given in April 2019, was the first time Digital Domain had shown such a capability outside of its lab.
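Digital Domain has not published the details of its network, but a toy sketch of the underlying idea, mapping a sparse set of facial markers to dense per-vertex offsets on the face mesh with a learned model, might look like the following. The marker count, vertex count and layer sizes are placeholder assumptions, not the studio’s actual configuration.

```python
# A minimal sketch (not Digital Domain's system) of regressing dense mesh
# deformation from sparse facial mocap markers. All sizes are assumptions.
import torch
import torch.nn as nn

NUM_MARKERS = 150       # sparse mocap markers on the performer's face (assumed)
NUM_VERTICES = 20_000   # vertices of the digital face mesh (assumed)

class MarkerToMesh(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_MARKERS * 3, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, NUM_VERTICES * 3),  # per-vertex xyz offsets
        )

    def forward(self, markers: torch.Tensor) -> torch.Tensor:
        # markers: (batch, NUM_MARKERS, 3) -> offsets: (batch, NUM_VERTICES, 3)
        offsets = self.net(markers.flatten(start_dim=1))
        return offsets.view(-1, NUM_VERTICES, 3)

model = MarkerToMesh().eval()
with torch.no_grad():
    frame = torch.randn(1, NUM_MARKERS, 3)  # one frame of marker positions
    deformation = model(frame)              # must finish well inside the 16 ms budget
print(deformation.shape)  # torch.Size([1, 20000, 3])
```

In a real pipeline such a model would be trained on the scanned expression data described earlier and would feed a renderer rather than a print statement, but the sketch shows why inference can fit inside a 16 ms frame: it is a single forward pass per frame.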
But a different character can be rendered just as easily, with the actions, expressions and performance still being driven by Doug in real time. Elbor is the name of the elfish character that was created with the touch of a button.
Clearly, such technology is great for storytelling, but it can have a dark side as well. Deep fakes are doctored videos that insert characters that were not originally there. Digital human technology enables this at an even higher level of fidelity because it is built on a true 3D model.
What the TED Talk demonstrated was a human driving a very lifelike digital avatar in real time. But that is not the same as an autonomous avatar that has no human driver. That will take the next big leap in AI and neural network processing, but it will allow virtual assistants to have a face and body, movie stars to deliver performances they never actually acted, and sports or entertainment figures to come alive on stage.
Today, characters are coming to life on stage and being called “holograms”. These are not holograms but simply 2D videos played back in a Pepper’s Ghost configuration (basically a big beam splitter that projects the image onto the stage). But light field displays are in development that will allow the rendering of DigiDoug in all his volumetric glory, with all of the nuances described above. At that point, it will indeed be extremely hard to tell the real thing from the model. (CC)