Imagine speaking to someone across the world and seeing their facial expressions in real time, as if they were sitting right in front of you. That’s what researchers at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) are promising with their latest virtual reality innovations.
During a demonstration to Khaleej Times at the university’s Data Observatory, Hao Li, Professor of Computer Vision and Director of MBZUAI’s Metaverse Centre, and PhD student Ariana Bermudez showcased two technologies developed at the university — Voodoo XP and XMem++ — which offer new possibilities for virtual communication and digital interaction.
Voodoo XP allows for realistic, real-time facial reenactment using just a single photo. “We’re doing this live,” said Professor Li, as he controlled a digital avatar that mirrored his every movement. “What you’re seeing is me controlling the avatar in real time, with no special hardware — just an ordinary camera.” The technology aims to bring people together in virtual spaces without the constraints of physical presence.
Bermudez demonstrated how Voodoo XP simplifies what has traditionally been an extremely complex process. She compared it to Meta’s Codec Avatar system, which requires an elaborate setup of 171 cameras and hours of training to create a 3D model. “But with our version, you just need one webcam, and it creates an avatar in seconds,” she explained.
The system captures fine movements and expressions in real time, creating avatars that can be instantly animated inside virtual environments. Bermudez emphasized the technology’s accessibility: it requires no complicated equipment or data capture. “Even small movements like blinking or smiling are captured,” she added. The researchers behind Voodoo XP include Phong Tran (MBZUAI student), Egor Zakharov (ETH Zurich), Long-Nhat Ho (MBZUAI student), Anh Tuan Tran (VinAI Research), Liwen Hu (Pinscreen), and Professor Li.
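For readers curious about the mechanics, the loop below sketches what a one-photo, one-webcam reenactment pipeline of this kind implies: grab frames from an ordinary camera, estimate expression parameters, and use them to drive an avatar built from a single photo. The `estimate_expression` and `render_avatar` functions are placeholders standing in for the neural networks; this is an illustration of the idea, not MBZUAI’s actual code.

```python
# Minimal sketch of a one-photo, one-webcam reenactment loop.
# The two model functions are stubs; a real system would run
# face-tracking and neural-rendering networks in their place.
import cv2
import numpy as np

def estimate_expression(frame: np.ndarray) -> np.ndarray:
    # Placeholder: a real tracker would return expression/pose
    # coefficients (e.g. blendshape-style weights) for this frame.
    return np.zeros(52, dtype=np.float32)

def render_avatar(photo: np.ndarray, expression: np.ndarray) -> np.ndarray:
    # Placeholder: a real renderer would reenact the photo so it
    # blinks, smiles, and turns as the coefficients dictate.
    return photo

source = cv2.imread("portrait.jpg")  # the single input photo (path is illustrative)
if source is None:
    source = np.full((256, 256, 3), 128, dtype=np.uint8)  # fallback test image

cap = cv2.VideoCapture(0)  # an ordinary webcam, no special rig
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    coeffs = estimate_expression(frame)     # capture blinks, smiles, head pose
    avatar = render_avatar(source, coeffs)  # transfer them to the avatar
    cv2.imshow("avatar", avatar)
    if cv2.waitKey(1) == 27:  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```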
Bermudez also showed XMem++, an enhanced video object segmentation method. It improves memory efficiency and segmentation accuracy by introducing refined memory management strategies and lightweight attention mechanisms. Designed for long video sequences, XMem++ balances real-time performance with high-quality mask propagation, making it suitable for applications like video editing, augmented reality, and autonomous systems.
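At the core of memory-based segmentation methods in this family is an attention read-out: features of the current frame query a bank of features stored from already-segmented frames, and the matching mask features are read back to predict the new mask. The snippet below illustrates that mechanism on random tensors; the shapes and function names are our own simplification, not XMem++’s code.

```python
# Illustrative space-time memory read-out, the building block that
# XMem-style video segmentation refines. Dimensions are toy-sized.
import torch
import torch.nn.functional as F

def memory_readout(query_key, memory_keys, memory_values):
    # query_key:     (C, HW)     features of the frame being segmented
    # memory_keys:   (C, T*HW)   features of T previously segmented frames
    # memory_values: (Cv, T*HW)  mask features paired with those keys
    affinity = memory_keys.T @ query_key  # (T*HW, HW) pairwise similarity
    weights = F.softmax(affinity, dim=0)  # each query pixel attends over memory
    return memory_values @ weights        # (Cv, HW) read-out mask features

C, Cv, HW, T = 64, 32, 16 * 16, 2  # toy sizes: 2 memory frames, 16x16 maps
query = torch.randn(C, HW)
keys = torch.randn(C, T * HW)
values = torch.randn(Cv, T * HW)
print(memory_readout(query, keys, values).shape)  # torch.Size([32, 256])
```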
“This is something that is tedious for many VFX artists, like the ones who do special effects for movies, because they usually have to refine a lot of the details,” she explained. “The tool is very complete in the sense that you can stop the generation, do the fixes, and propagate. So, it will adjust accordingly.”
The technology has already been adopted by the visual effects community. “When it was launched in 2023, the community immediately started using it in Nuke,” she said, referring to the industry-standard compositing software. Pulling up a user’s workflow on her phone, she showed how artists have incorporated the tool into the software: “It basically helps them to do these kinds of effects, like making someone disappear… This is how this user is using our tool.” XMem++ is open source and freely available. “It has all these features that they can use for tracking and refining,” she added. “As you see, the user is going back and forth. He’s selecting the person, and then it just propagates. Then he gets the masks, and he can do the effects of the tracking.”
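The stop-fix-propagate workflow she describes can be pictured as a simple control loop: masks flow forward frame by frame, and wherever the artist supplies a corrected mask, the loop adopts it and continues propagating from there. The sketch below, with a stub in place of the real model, shows only that control flow, not XMem++’s implementation.

```python
# Toy stop-fix-propagate loop. propagate_one() is a stub; a real
# model would predict the next frame's mask from the previous one.
import numpy as np

def propagate_one(prev_mask: np.ndarray) -> np.ndarray:
    return prev_mask.copy()  # stand-in for the segmentation model

def propagate(masks, corrections, num_frames):
    for t in range(1, num_frames):
        if t in corrections:
            masks[t] = corrections[t]  # artist stopped and fixed this frame
        else:
            masks[t] = propagate_one(masks[t - 1])  # normal propagation
    return masks

first = np.zeros((4, 4), dtype=np.uint8)
first[1:3, 1:3] = 1                           # initial object mask at frame 0
fixes = {5: np.ones((4, 4), dtype=np.uint8)}  # user correction at frame 5
result = propagate({0: first}, fixes, num_frames=10)
print(sorted(result))  # masks now cover all 10 frames: [0, 1, ..., 9]
```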
In addition to Li and Bermudez, XMem++ was developed by Maksym Bekuzarov (MBZUAI alumnus) and Joon-Young Lee (Adobe).