Google’s Omni Model Promises ‘Anything-to-Anything’ Generation, But Still Struggles With the Details

The Ambition of Anything-to-Anything
At Google I/O 2026, the company unveiled its most ambitious leap in multimodal AI yet: Omni. Positioned as an “anything-to-anything” model, Omni is designed to eventually bridge the gap between every form of digital input (text, image, audio, and video), allowing users to transform one into another with fluid precision. While the full scope of this vision remains on the roadmap, the first tangible iteration, Omni Flash, has arrived within Google’s Flow platform.
For those familiar with Veo, Google’s previous foray into high-end video generation, Omni represents a strategic shift. Rather than relying solely on text-to-video prompts, Omni allows users to upload existing video as a foundational reference. The goal is better character consistency and a more nuanced understanding of real-world physics, addressing the “hallucinations” that often plague generative video.
The Consistency Gap
Testing Omni’s ability to maintain character stability yields mixed results. In a series of experiments involving a stuffed animal “vacationing” in various locales, Omni adhered to prompts markedly better than Veo. However, the “AI jump scares” (abrupt shifts in orientation or unexpected physical mutations) persist.
The model’s struggle with object permanence is particularly evident when narrative complexity increases. In one sequence where a character packs a jar of honey for a trip, the object morphed unpredictably, transitioning from a traditional jar to a clear squirt bottle and back again within the same scene. While the AI managed to execute a coherent narrative beat—having the character mistake the honey for sunscreen—the visual execution remained erratic.
Editing is another area where Omni shows progress, though not total mastery. The ability to use text-based prompts to alter a generated scene is more responsive than in previous versions. Yet, this precision is often offset by contradictory results; for instance, prompting the AI to remove a specific artifact from a character’s head occasionally resulted in the model adding that same artifact to every other scene in the sequence.
The Uncanny Valley of Personal Deepfakes
Where Omni becomes truly disruptive is in its application to real-world footage. One of the model’s primary selling points is the ability to seamlessly integrate AI-generated elements into existing video. When used to create deepfakes of a human subject—placing them in settings like an airplane or in front of the Eiffel Tower—the results are startlingly convincing.
While technical tells remain—such as synthetic audio artifacts and repeating background characters—the visual fidelity is high enough to deceive those closest to the subject. In a blind test, a spouse was unable to distinguish between a real video and an Omni-generated clip of the subject eating pasta, noting only that the bowl looked unfamiliar. This level of realism suggests that Omni is moving past the “cartoonish” phase of generative AI and entering a territory where social media deception becomes trivial.
The Cost of Precision
High-fidelity generation comes with a significant credit cost. Under the $20-per-month AI Pro plan, users are allotted 1,000 credits monthly. However, the “ingredients” of a video—length and input complexity—can drain these quickly, with single clips costing between 15 and 40 credits. Edits are billed at a flat 40 credits per round.
For professional creators or those with a specific vision, this pricing structure creates a tension between experimentation and budget. The necessity of multiple iterations to remove glitches means that a single polished clip can consume a substantial portion of a monthly allotment, highlighting the gap between the AI’s current capabilities and a truly streamlined production tool.
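To put rough numbers on that tension, here is a minimal back-of-the-envelope sketch in Python using only the figures quoted above: 1,000 credits a month, 15 to 40 credits per clip, and a flat 40 credits per edit round. The function names and the edit-round counts are hypothetical, chosen for illustration, and the sketch assumes that edit rounds are the only cost of iterating; regenerating a clip from scratch would add further generation costs.

```python
# Figures quoted in the article for the $20-per-month AI Pro plan.
MONTHLY_CREDITS = 1000
MIN_CLIP_COST, MAX_CLIP_COST = 15, 40  # credits per generated clip
EDIT_COST = 40                          # flat credits per edit round


def polished_clip_cost(generation_cost: int, edit_rounds: int) -> int:
    """Total credits for one clip plus a number of edit rounds.

    Hypothetical helper: assumes each fix is a billed edit round and
    that no clip is regenerated from scratch.
    """
    if not MIN_CLIP_COST <= generation_cost <= MAX_CLIP_COST:
        raise ValueError("generation cost outside the quoted 15-40 range")
    if edit_rounds < 0:
        raise ValueError("edit_rounds must be non-negative")
    return generation_cost + EDIT_COST * edit_rounds


def clips_per_month(generation_cost: int, edit_rounds: int) -> int:
    """How many such clips fit in one monthly allotment."""
    return MONTHLY_CREDITS // polished_clip_cost(generation_cost, edit_rounds)


if __name__ == "__main__":
    # An illustrative spread: untouched cheap clip vs. heavily edited clip.
    for cost, rounds in [(15, 0), (40, 0), (40, 3)]:
        total = polished_clip_cost(cost, rounds)
        share = 100 * total / MONTHLY_CREDITS
        print(f"{cost}-credit clip, {rounds} edit rounds: "
              f"{total} credits ({share:.1f}% of the month), "
              f"{clips_per_month(cost, rounds)} clips per month")
```

Under these assumptions, a top-priced clip that needs three rounds of edits costs 160 credits, or 16 percent of the monthly allotment, which leaves room for only six such clips a month. The same budget covers 66 of the cheapest clips if none of them needs fixing.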