Google’s Omni Model Promises ‘Anything-to-Anything’ AI, but Consistency Remains Elusive

The Ambition of ‘Anything-to-Anything’
Google has long played catch-up in the high-stakes race of generative video, but its latest move at Google I/O 2026 signals a shift in strategy. The company has introduced Omni, a family of generative models designed with a massive, flexible scope: the ability to eventually convert any input—text, image, or video—into any other format. It is a vision of a truly multimodal AI, though for now, the practical application is focused on the volatile world of video synthesis.
The first iteration, Omni Flash, has arrived within Flow, Google’s AI video generation and editing platform. While the previous Veo model remains available, Omni Flash is positioned as the smarter, more context-aware successor. Google claims the model incorporates deeper real-world knowledge and offers significantly better character consistency, addressing the ‘shimmering’ or mutating figures that have plagued early AI video tools.
The Consistency Gap
In practice, however, the transition from Veo to Omni is less of a leap and more of a stumble. Testing the model’s ability to maintain a character, specifically a stuffed deer used as a recurring subject, reveals that while visual fidelity is higher, the underlying logic is still fragile. There are bright spots: prompting the model to create a playful montage of the character packing for a cruise produced a surprising level of narrative coherence, with the AI introducing a jar of honey that reappeared later in the clip.
Yet technical glitches are still pervasive. Throughout the same sequence, the honey jar switched back and forth between a glass jar and a plastic squeeze bottle, a telltale sign that the model still struggles to keep an object’s identity stable across frames. More jarring were the ‘AI jump scares’: sudden shifts in orientation, or the spontaneous appearance of antlers on a character specifically prompted to be a baby deer without them.
The Cost of Iteration
One of the more pressing concerns for creators is the economic barrier to entry. Omni is not a free playground; it operates on a credit system where generating a scene can cost anywhere from 15 to 40 credits depending on the complexity. Editing a clip—a necessary step given the model’s tendency to hallucinate details—costs another 40 credits.
For users on the $20-per-month AI Pro plan, which provides 1,000 credits, the math becomes precarious quickly. After roughly 20 clips and a handful of edits, a significant portion of the monthly allotment vanishes. This creates a frustrating loop: the tool requires heavy iteration to achieve a professional result, but the pricing structure penalizes the very experimentation needed to overcome the model’s current limitations.
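To make that math concrete, here is a rough back-of-the-envelope sketch using the figures quoted above (15 to 40 credits per scene, 40 credits per edit, 1,000 credits per month). The count of five edits is an assumption standing in for "a handful," and the function name is purely illustrative, not part of any Google tooling.

```python
# Figures from the article; EDITS is an assumed stand-in for "a handful".
MONTHLY_CREDITS = 1_000                  # AI Pro plan allotment
SCENE_COST_MIN, SCENE_COST_MAX = 15, 40  # credits per generated scene
EDIT_COST = 40                           # credits per edit


def credits_used(clips: int, edits: int, scene_cost: int) -> int:
    """Total credits spent on a given number of clips and edits."""
    return clips * scene_cost + edits * EDIT_COST


CLIPS = 20
EDITS = 5  # assumption: the article only says "a handful"

low = credits_used(CLIPS, EDITS, SCENE_COST_MIN)
high = credits_used(CLIPS, EDITS, SCENE_COST_MAX)

print(f"Cheapest case: {low} credits ({low / MONTHLY_CREDITS:.0%} of the plan)")
print(f"Priciest case: {high} credits ({high / MONTHLY_CREDITS:.0%} of the plan)")
```

Under these assumptions, 20 clips and five edits consume somewhere between half and all of the monthly allotment, depending on how complex each scene is.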
The Uncanny Valley of Deepfakes
Where Omni truly asserts its power is in its ability to integrate AI-generated elements into existing real-world footage. By using a neutral selfie video as a base, the model can transpose a user into entirely new environments with startling accuracy. Experiments placing a subject in an airplane seat or in front of the Eiffel Tower demonstrate a level of photorealism that is increasingly difficult to debunk.
While audio artifacts—such as a manufactured-sounding clink of a fork against a bowl—and background repetitions remain, the visual output is often convincing enough to fool those closest to the subject. In one test, a spouse was unable to detect that a video of the user eating pasta was an AI synthesis, noting only that the bowl looked unfamiliar. It is a testament to Google’s progress in diffusion and rendering, but it also highlights the growing ease with which high-fidelity deepfakes can be produced by non-experts.
Omni represents a significant step toward a seamless generative interface, but it remains a tool in transition, oscillating between genuine breakthrough and the erratic behavior typical of current AI-generated video.