Share this article

Latest news

With KB5043178 to Release Preview Channel, Microsoft advises Windows 11 users to plug in when the battery is low

Copilot in Outlook will generate personalized themes for you to customize the app

Microsoft will raise the price of its 365 Suite to include AI capabilities

Death Stranding Director’s Cut is now Xbox X|S at a huge discount

Outlook will let users create custom account icons so they can tell their accounts apart easier

VLOGGER AI: Now create a lifelike avatar from a photo & use your voice to control it

And, it looks kind a real

3 min. read

Published onMarch 18, 2024

published onMarch 18, 2024

Share this article

Read our disclosure page to find out how can you help Windows Report sustain the editorial teamRead more

Researchers at Google are working on making its AI technology smarter every day. One of the latest research projects that they are working on is VLOGGER.

VLOGGER is explained as

A method for text and audio driven talking human video generation from a single input image of a person, which builds on the success of recent generative diffusion models.

Our method consists of 1) a stochastic human-to-3d-motion diffusion model, and 2) a novel diffusion based architecture that augments text-to-image models with both temporal and spatial controls.

According to theblog post onGitHub, this approach allows for the generation of high-quality videos of desirable length. Without the need for training for every person, it doesn’t depend on face detection cropping; it can generate the complete image and take a broad spectrum into account, which is important for synthesizing humans who communicate.

As of now, VLOGGER is not available for use as it is still under development, but when it hits the market, it could be a great way to communicate in a video conference on Skype, Teams, or Slack.

Google is quite confident in the project and has tested it through various benchmarks; here is what it says:

We evaluate VLOGGER on three different benchmarks and show that the proposed model surpasses other state-of-the-art methods in image quality, identity preservation and temporal consistency. We collect a new and diverse dataset MENTOR one order of magnitude bigger than previous ones (2,200 hours and 800,000 identities, and a test set of 120 hours and 4,000 identities) on which we train and ablate our main technical contributions. We report the performance of VLOGGER with respect to multiple diversity metrics, showing that our architectural choices benefit training a fair and unbiased model at scale.

How does VLOGGER work?

How does VLOGGER work?

VLOGGER is a framework that acts as a two-stage pipeline and works on stochastic diffusion architecture that powers text to image, video & even 3D models but also adds a control mechanism.

The first step is to take an audio waveform as an input to generate intermediate body motion controls like gaze and facial expressions. Next, the second network uses a temporal image-to-image translation model to identify these movements and a reference image of a person to generate frames for the video.

Another important feature of VLOGGER is the ability to edit existing videos. It can take a video and change the expression of the person in the video.

Furthermore, the VLOGGER can also help in video translation; it can take an existing video in a specific language and edit the lips & face area to make it consistent with new audio or different languages.

As VLOGGER is not a product but a research project and is not complete, it can’t be relied upon. Yes, it can create realistic-like motion, but it may not be able to match how a person really moves. Given its diffusion model, it could show unusual behavior.

Its team also mentioned that it is not versed in large motions or diverse environments and can only work for short videos.

According to the features mentioned on the GitHub page, it can help create talking and moving virtual avatars, chatbots, and more.

Do you think it could be helpful for you? Share your opinions with our readers in the comments section below

More about the topics:Google

Srishti Sisodia

Windows Software Expert

Srishti Sisodia is an electronics engineer and writer with a passion for technology. She has extensive experience exploring the latest technological advancements and sharing her insights through informative blogs.

Her diverse interests bring a unique perspective to her work, and she approaches everything with commitment, enthusiasm, and a willingness to learn. That’s why she’s part of Windows Report’s Reviewers team, always willing to share the real-life experience with any software or hardware product. She’s also specialized in Azure, cloud computing, and AI.

User forum

0 messages

Sort by:LatestOldestMost Votes

Comment*

Name*

Email*

Commenting as.Not you?

Save information for future comments

Comment

Δ

Srishti Sisodia

Windows Software Expert

She is an electronics engineer and writer with a passion for technology. Srishti is specialized in Azure, cloud computing, and AI.