Mumbai: Kiran Bhat, co-founder and chief technology officer of Loom.ai, a company that hopes to revolutionize the world of communications with its artificial intelligence (AI) three-dimensional (3D) selfies, was previously head of the computer vision research and development team at US film and TV production firm Lucasfilm Ltd. He pioneered Industrial Light and Magic’s (ILM’s) facial capture system that was used to create the Hulk in The Avengers, the turtles in Teenage Mutant Ninja Turtles and the orcs in Warcraft. He won a sci-tech Oscar in 2017 for his work in facial performance capture for the movie Rogue One. Bhat, a graduate of BITS Pilani, holds a PhD in robotics from Carnegie Mellon University. In an interview in Mumbai, he spoke about his vision for the company and the trends that he expects in virtual reality (VR), augmented reality (AR) and mixed reality (MR). Edited excerpts:
Tell us in brief about your innovation and its unique proposition.
We are building the next-gen technology for people to communicate with each other virtually. And our secret sauce is to build 3D avatars of you from photographs—think of it as digital puppet versions of you. The trick is that they are alive—it’s not a scan. It is 3D. It can be customized. The idea is that bringing humans into VR is the only way to help VR take off. The reason: Producing content for VR is crazy-hard. 3D is hard to create. So if people become the content, and they use it as a medium to communicate, then it becomes a much more powerful medium that can drive innovation.
How did you come up with this idea of 3D selfies?
I was the head of computer vision and performance capture at Lucasfilm for almost 10 years. My co-founder Mahesh (Ramasubramanian, CEO), who studied with me at BITS Pilani, was at DreamWorks for almost 16 years. When we used to meet at conferences, we would exchange notes about where the industry was headed and where we stood. A couple of things happened when we met at the end of 2015. We realized that we had developed a bunch of characters that we had moved to the big screen and we knew the elements that you need to produce a compelling 3D experience for the audience. Mahesh, for instance, worked on Madagascar 3. So all these characters are very powerful in conveying an emotion. We wondered how we could bring some of that magic to the consumer and in a way that is not too hard for them to use. Mahesh was the visual effects (VFX) supervisor. Even in the VFX world, our role was to apply technology to animation.
My earlier work at ILM was to produce a performance capture pipeline that would analyse a human face and use computer vision algorithms to recreate that performance for the Hulk. The most recent work was in Rogue One, the Star Wars film where we created a digital version of Peter Cushing (who died in 1994) as Grand Moff Tarkin. To create a human in a computer and, then, tell a story and still have a very iconic face like Cushing’s is a very tall order. The core technology that I used in the movie is what got me the Academy Award (for sci-tech) as well.
How did your employer react when you resigned?
The company realized that I wanted to do different things now. But the technology was in such solid shape that they knew they could continue their work for many years to come. We (Kiran and Mahesh) still maintain a very good relationship with Lucasfilm and Disney.
But why did you leave? ILM was exciting enough…
I was somewhat frustrated with the level of machinery (cameras, etc.) needed to produce digital characters. This is perfectly fine for the big screen for a character like the Hulk. But it is overkill if consumers just want to capture their face. My interest was in bringing digital capture to the masses. That said, what we do at Loom.ai is no less challenging or interesting than what we did at ILM. The main difference is that it requires a completely different technology stack to produce a digital human. That is what we are building. For now we are focusing on messaging and VR, which do not have the kind of fidelity that you see on the big screen. Over time, we will have a learning platform that will take away a lot of the effort needed to do this.
How has your academic background—robotics to animation—helped you build your company?
As an undergrad, I was interested in robotics. My interest has always been in understanding nature’s movement: how robots balance, for instance. Our brains keep us upright through dynamic balance. This was not the case with robots for a very long time. I wanted to understand locomotion, but then I became more interested in how robots perceive the world (vision). So I got into computer vision and started dabbling in vision algorithms, which led to animation. My thesis was that if you want something to look real, which is the main problem in computer graphics, you must look at nature. So you must try to make your simulations match nature by looking at videos and matching them. It is very hard to mimic nature completely. You can only have a good approximation of nature. So I was trying to come up with perceptually salient optimization algorithms that would make simulations feel like the real thing.
Tell us more about how the core technology works.
We have an app internally, but the core technology requires you to upload your photo to the cloud, and it produces a 3D animated rig (think of it as facial musculature) which helps create realistic movements. Essentially, you can take a single photograph and create a 3D animated avatar. The 3D image will be accurate when you see it facing you. Perceptually, you need to recognize the face. You need to capture the eyes perfectly, following which you can capture proportions of the face. Algorithms pick and choose pronounced features from the photograph. The promise is that you can start training the 3D avatar and make it heed your commands: you can ask the live 3D avatar to look at you or follow your movements. You can also trigger emotions like anger in the 3D avatar with the touch sensors. The idea is to use this system to produce very compelling 3D content. This whole exercise takes just 90 seconds. Essentially, we have put together an AI engine that automates most of the functions, like recognizing images, and a production-grade facial rig that industries are already familiar with.
Where do you see the applications of this technology?
Let’s think of having a conversation two years from now. We could be doing it with VR goggles while communicating from different geographies. Your head motions will be captured by the VR headset, which will track you. And your 3D avatar will make it a social conversation. We also see a lot of use in messaging: millennials send a lot of emojis (emoticons) but those are not personal. Imagine, then, that you could customize and program your 3D selfies. But these are all fun applications.
Our long-term vision is visual teleportation. We want to enable meaningful long-distance conversations in VR. It will change the way we communicate. In four to five years’ time, I believe, we will have holographic versions of people. This is true 3D where you will see a hologram that will react to your movement. Students could use it in online classrooms to have a meaningful interaction with the 3D avatars of their teachers.
How do you plan to commercialize this technology? What’s the business model?
We already have an API (application programming interface) that runs on the cloud and we are working on pilots with our partners. We will have some announcements soon. Our dream is for this technology to reach five billion people, since the market is anyone who has a smartphone with a camera and can use it to create a 3D avatar. We will not be building apps for different sectors. We want to provide the technology that empowers a lot of different versions.
What trends do you see in this field?
AR, VR and MR are some of the most exciting fields in computer science in terms of both research and investment. There are a lot of interesting start-ups in this space. We are not in the content space. We build the core technology. But there are some real hardware challenges, including weight, battery life and nausea, that our community will have to address.