Google describes Omni as “Nano Banana for video”. This means a single AI model handles text, images, and audio together. Previously, Google used separate models for each, which limited consistency and made editing across multiple clips difficult.
With Omni, you can edit and remix videos through natural conversation. For example, you can tell it to “reduce the number of people in the crowd” or “make it a daytime scene”. On a smartphone, you can give these instructions by voice if you prefer. Previously, each change meant rewriting the prompt, spending credits, and hoping the next generation landed closer to what we wanted.
Omni also accepts any combination of text, images, audio, and video as input. So you can, for example, sketch out the movements you want and hand that sketch directly to the model.
Choosing a theme
The theme for our video was “protect.”
The idea came from a podcast featuring Donald Miller and Amy Porterfield, in which Miller argued that the word “protecting” triggers curiosity in almost everyone, because “the human brain is designed to survive and it’s designed to overindex on potential threats. So when someone says the word protect, you become alert.”
Applied to Cherryleaf’s services, this gave us five natural scenes:
- Protect your users from confusion.
- Protect your team from repeated questions.
- Protect product adoption with clearer documentation.
- Protect customer trust with documentation people can rely on.
- Protect your business from unclear information.
How Omni compared to Veo 3.1
When we previously used Google Veo 3.1 to create multi-clip videos, maintaining consistency across scenes was a recurring challenge. A character might speak in a Scottish accent in one clip and an American accent in the next, because Veo 3.1 would often ignore the accent instructions in the prompts. Image references helped with visual consistency, but getting the result you were aiming for could still be frustrating.
Omni produced noticeably more consistent results across clips. We didn’t test voice consistency for this video, as our characters were silent. Visually, the people looked less “waxy” than in Veo 3.1. The overall result was that we could produce the video in significantly less time.
What we’d flag
The video isn’t perfect. In the call centre scene, the staff look suspiciously similar to one another, and the text on one of the floating knowledge base graphics is imperfectly rendered. Google has also reduced the download resolution to 720p on some plans that previously supported 1080p. Since most viewers won’t watch in full screen at maximum detail, we consider these acceptable trade-offs.
We still used Camtasia and Audiate for editing: adjusting clip lengths, adding narration, colour grading, fades, and the final card. There’s still a place for those tools in the workflow.
Watch the finished video here:
