
Visual language models might soon use LLMs to improve prompt learning

Soon, VLMs might learn how to recognize and use the data from our prompts to generate better visuals

Updated on February 29, 2024


AI can create visual content from our prompts. However, the result is not always accurate, especially if we use free visual language models (VLMs). Moreover, when we ask free VLMs for intricate details, they fail to produce high-quality results. Thus, there is a need for visual language models that can generate better-quality content. For example, we have Sora AI, which is excellent at creating visuals that a Chinese firm already wants to use.

How will the LLMs improve the visual language models?

According to a Microsoft Research blog, researchers are trying to find a way to use large language models (LLMs) to generate structured graphs for visual language models. To do this, they ask the AI questions, restructure the information, and then generate structured graphs. The process needs some organization: the graphs must feature the entity, its attributes, and the relationships between them.

To understand the process better, think of a specific animal. Ask the AI questions about that animal to get descriptions; you will then have more information about the animal you thought of. Afterward, ask the AI to restructure and categorize that information.
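The question-then-restructure step above can be sketched in a few lines. This is a hypothetical illustration, not the researchers' actual code: `ask_llm` is a stand-in for a real LLM call (here it returns canned answers so the example is self-contained), and the triple format is one plausible way to represent an entity, its attributes, and their values.

```python
# Hypothetical sketch: ask an LLM questions about an entity, then
# restructure the free-form answers into (entity, attribute, value) triples.

def ask_llm(question: str) -> str:
    """Stand-in for a real LLM call; returns canned answers for the demo."""
    canned = {
        "What is a cheetah's most notable attribute?": "speed",
        "How fast can a cheetah run?": "up to 120 km/h",
        "What category does a cheetah belong to?": "big cat",
    }
    return canned[question]

def build_structured_graph(entity: str, questions: dict) -> list:
    """Restructure answers into triples linking the entity to its attributes."""
    return [(entity, attribute, ask_llm(q)) for attribute, q in questions.items()]

graph = build_structured_graph(
    "cheetah",
    {
        "notable_attribute": "What is a cheetah's most notable attribute?",
        "top_speed": "How fast can a cheetah run?",
        "category": "What category does a cheetah belong to?",
    },
)
print(graph)
```

In a real pipeline, the canned dictionary would be replaced by calls to an actual model, and a second prompt would ask the LLM to do the categorization itself.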

After getting the results, researchers applied Hierarchical Prompt Tuning (HPT), a framework that organizes content. With it, visual language models learn to discern different kinds of data in a prompt, such as specific details, categories, and themes. This method improves the VLMs' capability to understand and process varied queries.
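The layered structure described above can be illustrated with a toy example. This is an assumption-laden sketch of the general idea, not HPT's real implementation: the level names and the `flatten_hierarchy` helper are invented here to show how a single prompt can be split into coarse-to-fine levels.

```python
# Hypothetical illustration: split one prompt into theme / category / detail
# levels, mirroring the layered organization HPT is described as learning.

hierarchy = {
    "theme": "wildlife photography",   # broadest level
    "category": "big cat in motion",   # mid-level grouping
    "details": ["photorealistic", "sprinting", "savanna", "sunset"],  # fine-grained
}

def flatten_hierarchy(h: dict) -> str:
    """Join the levels back into a single conditioning string, coarse to fine."""
    return "; ".join([h["theme"], h["category"], ", ".join(h["details"])])

print(flatten_hierarchy(hierarchy))
```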

Once the last step is complete, the visual language models can generate more accurate images from your prompts. Additionally, the next time you need to analyze an image, you could use the VLM to generate descriptions of it.

In a nutshell, the main goal of the research is to use an LLM to teach a visual language model how to understand the details in a prompt and generate more accurate, realistic pictures. The second goal is to teach the VLM to identify the elements of a picture and create descriptions.

If you want to learn more about the research, check their GitHub page.

What are your thoughts? Are you excited about this research? Let us know in the comments.

Sebastian Filipoiu

Sebastian is a content writer with a desire to learn everything new about AI and gaming. So, he spends his time writing prompts on various LLMs to understand them better. Additionally, Sebastian has experience fixing performance-related problems in video games and knows his way around Windows. Also, he is interested in anything related to quantum technology and becomes a research freak when he wants to learn more.

