ByteDance Unveils AI Tool Transforming Photos into Lifelike Videos
ByteDance, the parent company of TikTok, has introduced OmniHuman-1, an AI system that produces realistic videos of people talking, gesturing, singing, and playing instruments from a single photo. According to a research paper published on arXiv, OmniHuman outperforms existing methods at generating realistic human videos, particularly when driven by audio input. The tool accepts images of any aspect ratio, whether portrait, half-body, or full-body, and produces lifelike, high-quality results across a range of scenarios.
Researchers have showcased the capabilities of OmniHuman-1 through sample videos, illustrating hand and body movements from different perspectives, animated characters, animals, and even historical figures brought back to life. For instance, a video featuring Albert Einstein speaking in front of a blackboard with hand gestures and facial expressions appears remarkably realistic, as if the renowned physicist were delivering a lecture today.
Freddy Tran Nager, a clinical associate professor at the University of Southern California, said he was impressed by the sample videos, especially when viewed on smaller screens like phones. The tool's introduction puts ByteDance and TikTok in the race to create the most authentic AI-generated human footage, which is increasingly common in uses ranging from virtual influencers to imitations of celebrities.
Nager envisions educational applications for tools like OmniHuman, such as historical figures or celebrities virtually teaching subjects like statistics. He also speculates that content creators who want a break could use virtual versions of themselves. Samantha G. Wolfe, an adjunct professor at NYU, finds the technology for creating lifelike videos from images fascinating but raises concerns about negative consequences, such as misinformation spread through AI-generated content.
As AI-generated videos become more sophisticated, the risks of misinformation and deception also escalate, warns Wolfe. ByteDance's team trained OmniHuman on more than 18,700 hours of human video data, incorporating inputs such as text, audio, and physical poses. ByteDance has not provided specific details on the training data, but according to Nager, what sets OmniHuman apart is the vast amount of training data available to the company, potentially including data from TikTok users.
ByteDance unveils OmniHuman-1, an AI tool creating lifelike videos from a single photo.
According to the research paper, the tool outperforms existing methods, particularly with audio input, and supports images of any aspect ratio.
Researchers showcase the tool's capabilities through sample videos, including bringing historical figures back to life.
Source: FORBES