Google’s Omni Model Promises ‘Anything-to-Anything’ AI, but the Reality Is a Mixed Bag

The Ambition of Anything-to-Anything
At Google I/O 2026, the company unveiled Omni, a new family of generative models designed with a singular, sweeping ambition: to create an “anything-to-anything” interface. In theory, Omni is built to ingest any form of data—be it a photograph, a snippet of video, or a block of text—and transform it into any other medium. While the full scope of this versatility is still being rolled out, the first tangible manifestation is Omni Flash, now integrated into Google’s AI video generation and editing platform, Flow.
For users who have experimented with Google’s previous model, Veo, the shift to Omni is positioned as an upgrade in both world-knowledge and character consistency. The goal is to move past the erratic “dream-state” quality of early generative video and toward a tool capable of maintaining a coherent subject across multiple scenes. However, in practice, the transition from Veo to Omni feels less like a leap and more like a series of inconsistent steps.
Testing Character Consistency
Testing the model’s ability to maintain a consistent subject (in this case, a child’s stuffed deer) exposed the gap between Google’s claims and the model’s execution. Omni allows users to upload a reference video and a text prompt to guide the generation. In some instances, the results were remarkably stable, far surpassing what Veo could manage five months ago. But these successes are often punctuated by “AI jump scares.” In one sequence involving a skydiving plush toy, the character abruptly shifted orientation mid-air, a jarring glitch that reveals the model’s underlying struggle to map 3D space.
The model’s struggle with object permanence is even more evident when prompts involve complex interactions. In a montage where the character packs for a cruise, Omni successfully generated a narrative beat involving a jar of honey. However, as the clip progressed, the honey jar morphed into a clear squirt bottle of water, then back into a honey bottle. The final frame of the sequence devolved into a fragmented mess of visual elements, as if the model had simply collapsed under the weight of its own generated logic.
The Cost of Precision
Beyond the visual glitches is the economic reality of using Omni. Video generation is not a free utility; it operates on a credit system. Depending on the scene length and the complexity of the starting materials, a single generation can cost between 15 and 40 credits. Editing a clip consumes another 40 credits.
For those on the $20-per-month AI Pro plan, which provides 1,000 credits, the burn rate is surprisingly high. After generating roughly 20 clips and applying a few edits, the balance dropped to 145 credits, meaning 855 of the 1,000 credits were already spent. For creators seeking a specific vision, the costly cycle of prompting and refining suggests that Omni is currently more of a lottery than a precision tool.
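To make the burn rate concrete, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not Google's billing logic: the constants are the figures quoted above, while the function names and the idea of valuing a credit at the plan price divided by the allowance are assumptions made here for the sake of the example.

```python
# Figures quoted in the article; the helper names below are hypothetical.
PLAN_CREDITS = 1000         # credits included in the AI Pro plan
PLAN_PRICE_USD = 20.0       # AI Pro price per month
GENERATION_COST = (15, 40)  # credits per generation (cheapest, priciest)
EDIT_COST = 40              # credits per edit


def usd_per_credit() -> float:
    """Assumed value of one credit: plan price spread over the allowance."""
    return PLAN_PRICE_USD / PLAN_CREDITS


def spend_range(generations: int, edits: int) -> tuple[int, int]:
    """Lowest and highest credit spend for a given number of clips and edits."""
    edit_total = edits * EDIT_COST
    low = generations * GENERATION_COST[0] + edit_total
    high = generations * GENERATION_COST[1] + edit_total
    return low, high


def generations_left(credits: int) -> tuple[int, int]:
    """(Fewest, most) further generations a balance can buy, with no edits."""
    return credits // GENERATION_COST[1], credits // GENERATION_COST[0]


if __name__ == "__main__":
    print(spend_range(20, 3))        # 20 clips plus three edits
    print(generations_left(145))     # what the remaining balance buys
    print(generations_left(PLAN_CREDITS))
```

Under these assumptions, 20 clips and three edits cost somewhere between 420 and 920 credits, which brackets the 855 credits actually spent, and the remaining 145 credits cover only three to nine more generations. A full allowance buys between 25 and 66 generations if nothing is ever edited.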
The Uncanny Valley of Deepfakes
While the creative experiments with toys were whimsical, the model’s ability to integrate AI-generated elements into real-world footage is where Omni becomes genuinely disruptive. By using a simple selfie video as a base, the model can convincingly place a user in entirely new environments.
The results are startlingly effective. In tests involving eating spaghetti or standing before the Eiffel Tower, the videos were convincing enough to fool a spouse—someone intimately familiar with the subject’s real-world appearance. While technical “tells” remain—such as a manufactured sound of a fork hitting a bowl or background characters appearing twice—the visual fidelity is high enough to pose a significant challenge to digital authenticity on social media.
Google is pushing the boundaries of what is possible with multimodal AI, but Omni remains a work in progress. It exists in the tension between a tool for harmless creativity and a potent engine for hyper-realistic synthetic media, all while operating on a credit system that makes perfection a pricey pursuit.