Google Unveils Gemini Omni Multimodal Video Generation Model at I/O Conference

Google has announced Gemini Omni, a new family of multimodal AI models designed to generate and edit video from text, images, audio, and existing video.

Capabilities and Functionality

According to Google, the Gemini Omni model can process combined inputs to produce consistent outputs. CEO Sundar Pichai stated that the system is designed to "create anything from any input." The model is intended to understand physical forces such as gravity and fluid dynamics and incorporates real-world knowledge from Google's existing Gemini AI system.

Key features demonstrated include:

  • Altering the environment of a selfie video, such as placing the user on a different planet or in a forest.
  • Generating educational content, such as claymation-style explanations of scientific concepts.
  • Creating digital avatars using a user's own voice.
  • Editing photos using plain text commands.

Google noted that the model is designed to maintain consistency across sequential edits. The company also cautioned that editing prompts must be specific to avoid unintended alterations to the output.

Availability and Access

The first version of the model, named Gemini Omni Flash, is available today to subscribers of the Google AI Plus, Pro, and Ultra tiers. It is being rolled out to YouTube Shorts, the Google Flow platform, and the YouTube Create app this week. The avatar tool is available only on YouTube Shorts.

Google stated that an application programming interface (API) for Omni will be released in the coming weeks. A more advanced Omni Pro model has not yet been released.

The current version can generate up to 10 seconds of video, with longer durations reportedly planned for future updates.

Watermarking and Safety

All videos generated with Gemini Omni will include Google's SynthID digital watermark to verify their origin. Google acknowledged the potential for misuse, including the creation of deepfakes. The company stated it has policies and a product onboarding process in place to prevent harmful use, particularly regarding the creation of digital avatars.

Comparison to Previous Models

Unlike Google's previous video generator, Veo 3.1, which was limited to processing prompts and images, Gemini Omni accepts a wider range of input types. The company frames Omni as a tool for adding fictional elements to personal photos and videos, a framing it says is intended to help avoid the legal issues that have affected other video generation tools.