Veo 3 is Magic: We Are All Filmmakers Now (and Really Great Ones)
Plus my Hollywood-style commercials
AI is moving fast. Text-to-video AI is magic, but until now, we were limited to silent AI videos. We could create stunning visuals by describing scenes, but the audio had to be added separately. Veo 3, Google's latest text-to-video model, shatters this limitation entirely.
With a single prompt, you can now generate videos with synchronized audio: dialogue, background music, ambient sounds, and sound effects, all created from a written description.
Everyone’s a Filmmaker Now
I recently created Grant Orb commercials using Veo 3 and carefully crafted prompts (see the prompts below). The results? Hollywood-level polish in just a couple of minutes.
The dialogue delivery, the accents, character performances, the whole scene setting – it’s unmatched.
The fact that I, just me, could create something like this with just words is still keeping me up at night.
Traditionally, creating a commercial like this would have required props, casting, location scouting, perfect timing, filming equipment, post-production, sound design, and potentially thousands of dollars. Now, it takes a great prompt and a few minutes.
Veo 3 is currently available to paid Gemini subscribers, with videos limited to 8 seconds in length, which is perfect for social media.
I’m also really excited about Google Flow, which lets us put together full movies by maintaining character and visual consistency from one clip to the next.
As changemakers, our stories are our most powerful tool. They’re how we connect, how we inspire, how we move people to action. But we’ve never had the big budget to tell those stories with the visual punch they deserve. Well, guess what? AI fixes that.
Just last week, I conducted an AI and Visual Storytelling workshop for TechSoup members. We created this video live, about the impact of plastics on our oceans. We were still in the silent AI video era then. To leap from that to what Veo 3 is doing now with its integrated audio? It’s just… wow.
If you'd like to level up, subscribe to my LinkedIn newsletter where I write about AI and Social Good.
Prompts
Video 1: Cinematic scene. Outside a small Italian café on a cobblestone street. Midday sun filters through striped awnings. Espresso steam curls into the warm air. An Italian man, mid-60s, in a wool coat and fedora, sits back at a round metal table. A tiny coffee and a half-eaten cannoli rest in front of him. He exudes calm, wisdom, and old-school swagger. He leans forward and says in Italian style - “I ain’t got time to sit 'round writin’ grants. We need money for the community. So I use this thing — Grant Orb. I hit the button — boom — out pops a grant."
Camera: As soon as the camera rolls, he starts talking. Slight zoom-in as he leans forward slowly, eyes locked on the camera — like he’s sharing something just with you. Minimal movement, but the gravitas is thick.
Video 2: Wide, slightly low-angle tracking shot. A cartoon-style scene set at mid-day New York City sidewalk. A tired dog walker struggles to manage five leashed dogs weaving through a noisy, chaotic city: a nervous chihuahua dodging trash, a grumpy bulldog dragging its feet, a fussy poodle hopping around a puddle, a beagle sniffing everything in sight, and a golden retriever in glasses looking increasingly fed up. The sidewalk is cluttered with garbage bags, a hot dog cart, and a “No Dogs Allowed” sign posted near a locked green space. The group trudges past a pizza box and honking yellow cab. Suddenly, the golden retriever plants its paws and stops the group. The camera pushes in on the retriever’s face as it turns toward the viewer with calm, dry wit and says, “We need a dog park. Time to use Grant Orb.” The other dogs freeze mid-step, eyebrows raised, quietly impressed. End with a wide shot of the dogs paused in the middle of the sidewalk. (sounds: barking, leash jingles, distant honking horns)
Video 3: Split screen (left and right, both full vertical halves). Left Side (Online Shopping Zoom-In): Close-up zoom on a computer screen. A finger clicks a bright “BUY NOW” button over and over. Each click triggers a cheerful “ching!” sound like a cash register or digital ka-ching. The hand and cursor repeat rapidly, showing impulse buying. SFX: Ching! Ching! Ching! with each click.
Right Side (Landfill Disaster): For each “ching,” a new piece of trash (shrink-wrapped product, cardboard box, plastic gadget) falls from the sky into an ever-growing mountain of garbage in a vast landfill. A dull thud or squelch sound as it hits. Background: Overcast sky, birds circling above, and a muffled wind blowing. SFX: Thud. Thud. Thud. + faint caws and wind whoosh. End with a slightly zoomed-out view of both sides: Left: the user still clicking mindlessly. Right: the garbage now spilling out of frame.
Video 4: Begin with a serene, wide-angle shot of a beautiful, crystal-clear ocean, with a low horizon eye-level view or a split over/under water shot showcasing vibrant marine life. Sunlight dapples the surface. Transition dramatically. A massive, dense wave of plastic trash (bottles, bags, microplastics) suddenly appears and surges across the screen from one side, or rises menacingly from a low angle. The camera should feel like it's being overwhelmed or engulfed by the plastic, perhaps a point-of-view shot as the plastic debris washes over, obscuring the clear water and filling the frame with chaotic, tumbling plastic. Emphasize the suffocating density of the pollution. The immediate aftermath. Show a close-up of a marine animal, like a sea turtle, struggling amidst the plastic debris, or its eye showing distress.