
VEFXi Shows 2D to 3D Conversion Using a Neural Net

A Display Daily colleague, Bob Raikes, visited the VEFXi booth at Mobile World Congress (MWC) recently to see a demonstration of their new 3rd generation 2D to 3D ASIC introduced last month. In the brief demo possible at MWC, the images looked good but Bob wasn’t ready to render a final judgement on the company. After MWC was over, he passed the company on to me to look deeper into their technology. The technology appears to have broad application, including to VR headsets.

I got in touch with Craig Peterson, President, CEO and Founder of VEFXi. The company name VEFXi is derived from “Video Effects.” The fabless semiconductor company was founded in December 2010 and is headquartered in Portland, Oregon. Peterson has 37 years of industry experience, including 29 years at Intel. Since retiring from Intel in 2003, he has been a busy man, including work at six high-tech startups. This includes founding two startups in Shanghai and more recently founding VEFXi. In 2012 VEFXi was invited to join the International 3D Society (I3DS) and Peterson was asked to join the board of governors. Now known as the I3DS & Advanced Imaging Society or just the Advanced Imaging Society, the group has gone beyond just 3D to include other cutting-edge television and display technologies such as high dynamic range (HDR), virtual reality (VR), high frame rate (HFR), and 4K.

Craig Peterson, President, CEO and Founder of VEFXi

I had a chance to talk to Peterson on the phone and he gave me a good briefing on VEFXi’s technology. Their current product, as introduced publicly at MWC, is something they have been working on for about 2½ years. Instead of a conventional processor/software design, it uses what Peterson calls “Application-specific Neural Net [ASNN] technology for live video processing.” Peterson called the individual nodes or operators in the neural net “synapses.”

Rather than a conventional processor with its memory fetches for both data and instructions, the ASIC uses data flow technology, also known as a neural net, where the video data moves automatically from one step to the next. There is no need to fetch data from memory or cache because the data arrives automatically from the previous step. There is also no need to fetch instructions because each synapse in the neural net performs the same operation repetitively. This operation is similar to what a human neuron does: a multiply and add.
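As a rough illustration of that multiply-and-add dataflow (my own toy sketch, not VEFXi's actual design), each synapse can be pictured as a weighted sum whose output feeds the next stage directly, with no instruction or data fetches in between:

```python
# Toy sketch of a multiply-and-add "synapse" in a dataflow pipeline.
# The stage structure and weights below are illustrative assumptions only.

def synapse(inputs, weights, bias):
    """One node: a weighted sum of its inputs plus a bias (multiply and add)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def pipeline(pixel_values, stages):
    """Each stage's output flows directly into the next stage; nothing is
    fetched from memory or an instruction store between steps."""
    data = pixel_values
    for weights, bias in stages:
        data = [synapse(data, weights, bias)]
    return data[0]

stages = [
    ([0.5, 0.25, 0.25], 0.0),  # stage 1: blend three neighboring pixels
    ([1.0], 0.1),              # stage 2: scale and offset the result
]
result = pipeline([0.2, 0.4, 0.6], stages)  # ~0.45
```

Every node does the same trivial operation; all the "intelligence" lives in the weights and biases, which is exactly what makes the design trainable rather than programmable.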

The system is not “programmed” by what Peterson calls “an army of software engineers.” Instead, the neural net is trained by putting in sample images and looking at the 3D results. Training consists of adjusting the parameters for the individual synapses in the neural net associated with the add and multiply functions.
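To make the train-don't-program idea concrete, here is a deliberately tiny sketch of what "adjusting the parameters for the individual synapses" could look like. This is my assumption of the general approach (a random-perturbation search on a one-synapse net); VEFXi has not disclosed its actual training method:

```python
# Hedged sketch of training-by-adjustment: nudge synapse parameters at
# random and keep an adjustment only if the output gets closer to the
# sample targets. Not VEFXi's disclosed method.
import random

def net(x, w, b):
    return w * x + b  # a single multiply-and-add "synapse"

# (input, desired output) training samples
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

def error(w, b):
    return sum((net(x, w, b) - y) ** 2 for x, y in samples)

random.seed(0)
w, b = 0.0, 0.0
for _ in range(5000):
    dw, db = random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)
    if error(w + dw, b + db) < error(w, b):
        w, b = w + dw, b + db  # keep only improvements

# The parameters should settle near w=2, b=1, matching the samples.
```

The point of the analogy: no one writes code describing *how* to fit the samples; the behavior emerges from parameter adjustment against example outputs, which is what Peterson contrasts with "an army of software engineers."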

Peterson declined to say exactly how many synapses his ASIC used, although he used 1000 nodes as an example. He said that the clock speed was roughly 1/10 the clock speed of a conventional GPU or CPU. With 1000 nodes, even with the low clock speed, his ASIC could perform roughly 100x the number of operations that the GPU or CPU could perform in the same time. These operations are, of course, very simple, even compared to the operations a GPU can perform. A major added benefit of the VEFXi design is the low power consumption, roughly 1/100 the power consumption of a conventional processor. Because of this low power consumption, the VEFXi ASIC could not only be used in televisions but also in battery-powered mobile devices such as smartphones, tablets and VR headsets.
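Peterson's arithmetic checks out using his own illustrative figures. Normalizing the conventional processor to one operation per cycle (my simplifying assumption for a scalar core):

```python
# Back-of-envelope check of the throughput claim, using Peterson's own
# illustrative numbers: 1000 synapses at 1/10 the conventional clock.
cpu_clock = 1.0                  # normalized CPU/GPU clock rate
asic_clock = cpu_clock / 10      # ASIC runs at roughly 1/10 the clock
synapses = 1000                  # every synapse works in parallel each cycle

asic_ops = synapses * asic_clock  # multiply-adds per unit time
cpu_ops = 1 * cpu_clock           # one op per cycle assumed for the CPU

speedup = asic_ops / cpu_ops      # = 100.0, the "roughly 100x" figure
```

Real GPUs are of course also parallel, so the comparison only holds for the simple scalar baseline Peterson seems to have in mind; the more defensible claim is the 1/100 power figure, since each synapse does far less work per cycle than a general-purpose core.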

According to Peterson, VEFXi initially used a general purpose neural net design to develop its technology but switched to a neural net design specifically optimized for 3D image processing. This not only increases the stability of the process but also reduces the number of synapses needed in the neural net. Hence the term “Application Specific Neural Net.”

The 2D to 3D conversion by the VEFXi ASIC is purely intra-frame. In particular, it does not use motion data between frames to help generate the 3D depth map. This intra-frame design allows for very short latency in the processing. Peterson said latency is about ½ the frame time, even at 120Hz frame rates. He compared this to the multi-frame latency of other 2D to 3D conversion systems, typically on the order of 300 ms. For gaming, VR and other interactive applications, a 300 ms latency is simply not acceptable.
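In concrete numbers, half a frame at 120Hz works out to roughly 4 ms, about 70x faster than the multi-frame converters Peterson cites:

```python
# Latency figures implied by the article: half a frame at 120 Hz
# versus the ~300 ms typical of multi-frame 2D-to-3D converters.
frame_rate_hz = 120
frame_time_ms = 1000 / frame_rate_hz   # ~8.33 ms per frame
asnn_latency_ms = frame_time_ms / 2    # ~4.17 ms: half a frame
other_latency_ms = 300                 # typical multi-frame converter

ratio = other_latency_ms / asnn_latency_ms  # ~72x lower latency
```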

Peterson said the ASNN does not just do 2D to 3D conversion; the same ASNN will also convert stereoscopic 3D to multi-view 3D. Loading a new set of parameters for the synapses is all that is needed for this second application.

VEFXi already sells set-top boxes called 3D Bee into the consumer 3D market. These boxes are based on 1st or 2nd generation VEFXi data flow technology. For the 3rd generation, as represented by the ASNN technology shown at MWC, Peterson said, “we embedded the application-specific neural net technology in our pixel depth synthesis, and we innovated advanced no-glasses 3D optical mathematics models in our render engine to support the extreme 3D with low artifact rendering.”

The ASNN is designed to take as input uncompressed 2D video, which is output by the system host video processor, and then output uncompressed 3D video in the same format. According to Peterson, the VEFXi ASNN can handle all of the common uncompressed formats currently in use in the industry, including LVDS, MIPI, Embedded DisplayPort (eDP) and V-by-One HS from THine Electronics. He said LVDS was commonly used in “low resolution” systems such as 1080p TVs but the other interfaces were gaining ground in the higher resolution mobile market. V-by-One HS is used increasingly in 4K TVs. Peterson said the ASNN acts as a video cable with the same input and output formats, making it basically transparent to the system designer. Since the ASNN deals only with the uncompressed video ready for display, it is independent of the original compressed consumer video format.

When asked about what cues his technology looks at to generate the depth map in the 2D to 3D conversion, Peterson said he was not prepared to talk about this now. He did say that his system used many more cues than are commonly considered by processor-based 2D to 3D conversion systems. Of course, since it is an intra-frame system, motion is not one of the cues taken into consideration.

Peterson emphasized the ability of his technology to produce out-of-screen effects (“3D Popout”), with objects in the 3D image coming 75% of the way from the display to the viewer. He said other 2D to 3D systems typically produced only relatively shallow depth of the 3D image and effects are typically all behind the screen.

Peterson sees a number of applications for his company’s technology. First is 2D to 3D conversion for 3D TV sets, including 4K sets. The second application is professional-quality 2D to 3D conversion. Peterson says they can equal the quality of $25,000 professional conversion boxes. VEFXi has taken 2D video clips from studios, converted them to 3D and returned them to the studios. Peterson says VEFXi has received very positive feedback from the studios on the quality of the conversion.

Another possible application is in the mobile smartphone and tablet market, where glasses-based 3D is probably not usable. The low power consumption and the ability to convert input 2D or 3D content to multi-view 3D will both be important properties for this market.

VR, perhaps, will be the key market for this technology, as I see it. One issue with 3D VR is the extreme difficulty in generating content corresponding to live-action scenes. You can’t simply put two VR camera systems side-by-side to generate stereoscopic views because interpupillary distance will vary with angle and when the viewer is looking to the side, he will see the second VR camera, not the desired scene. Peterson says it takes more complex acquisition systems and considerable image stitching to generate native 3D 360° VR.

On the other hand, if you acquire and produce the VR content in 2D and then convert to 3D in the VR headset, this problem is eliminated. When the viewer moves his head to look to the side, the sensors in the headset sense this motion, the VR video processor shifts the 2D view appropriately and the headset converts it to 3D with a low enough latency to not be noticeable to the viewer. For this to work, you must have a 2D to 3D converter that produces very good results on all types of content with very low latency. These are all properties the VEFXi ASNN is said to have.

VEFXi is planning on selling the ASNN as an OEM product and not a consumer product like the 3D Bee. Peterson said the company had a very successful MWC. “We came back from Mobile World Congress a week ago with 34 new companies that want to look at incorporating our microchip into their designs. Several of those companies are large companies. A number of them are VR companies. We also have engagements with other large companies that are under NDA.” He added that the company has already signed MOUs that could result in as many as 7.2 million chips sold in the 2017–2018 time frame.

Peterson says VEFXi has 15 patent applications in the pipeline, two of which have been published. The two published applications are US 2015/0116458 and US 2015/0116457, both listing Javed Sabir Barkatullah as sole inventor.

I also found one additional published application, US 2015/0371450, titled “Real-time stereo 3D and autostereoscopic 3D video and image editing.” This application shows Peterson as the sole inventor. The abstract is brief and reads in full: “A system that provides control over three-dimensional image content.” The claims are equally brief, in fact there is only one claim. It reads in full:

“1. A method for controlling a three dimensional image comprising:

“(a) receiving a two dimensional input image;

“(b) processing said two dimensional input image to determine a depth map;

“(c) generating a three dimensional image based upon said two dimensional input image and said depth map.”

This is an extremely broad claim and I suspect that other companies in the 3D conversion industry are likely to object to it, saying their technology and patents represent prior art to this claim. For that matter, I believe either of the Barkatullah applications, with their earlier application dates, would be prior art invalidating this claim. However, I’ll let the Patent Office, VEFXi and VEFXi’s competitors sort this out.

Unfortunately, I was not at MWC and have not been able to see the VEFXi demo. 2D to 3D conversion and stereoscopic 3D to multi-view 3D are very visual things and it’s the visible results that count, not a marketing discussion of technology. While it appears that VEFXi has a very promising approach to these conversions, I won’t commit myself until I actually get a chance to see a demo that involves a very broad range of content types as input. Peterson said if they do an East Coast demo, he will be sure to let me know. –Matthew Brennesholtz