You know the feeling. You're watching a dubbed thriller, the detective leans in and says something that should be terrifying, and their mouth is doing something completely different from the words coming out. Two beats later the lips finally close, but the audio finished half a second ago. The tension evaporates. You're not scared anymore. You're watching a dub.
This is the lip-sync problem, and it's the single biggest reason audiences reject dubbed content. Not the acting. Not the translation. The sync. When the visual and the auditory don't line up, the brain flags it as wrong almost instantly. Research from the Max Planck Institute suggests viewers notice audio-visual misalignment at thresholds as low as 45 milliseconds for speech. That's not a lot of room for error.
I've been directing dubbing sessions for eight years, mostly for streaming platforms adapting short-form drama series and feature-length content into European and Asian languages. And the thing nobody outside the booth understands is that lip-sync dubbing isn't really a translation problem. It's a timing problem that happens to involve translated words.
What you're actually matching
Lip-sync isn't about making the translated line take exactly the same duration as the original. That's the amateur assumption, and it produces the stiff, robotic dubs that everyone hates. What you're actually matching are specific mouth movements — the phonetic beats that the actor's face produces on screen.
English has roughly 44 phonemes. Spanish has about 24. Mandarin has around 32, depending on how you count. These sets don't map onto each other cleanly. The English "th" sound (as in "think") doesn't exist in most languages. The Mandarin "q" (as in "qīng") doesn't exist in English. When an on-screen actor says "I think," their tongue touches their teeth and their lips part in a specific way. If the dubbed line in Spanish is "yo creo," the mouth movements are fundamentally different: "creo" requires a trilled or tapped r that doesn't appear in "think" at all.
So the dubbing adapter's job — the person who rewrites the translation into a performable script — is to find words in the target language that create mouth shapes similar to the original, while preserving the meaning, while fitting the emotional register, while landing on the same dramatic beat. This is why good dubbing adapters are worth their weight in gold and are almost impossible to find.
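To make the idea of matching mouth shapes concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the mouth-shape classes (often called visemes) are heavily simplified, the phoneme spellings are rough, and the function names are made up. Real adapters do this by eye and ear, not with a lookup table.

```python
# Hypothetical, simplified mouth-shape classes: sounds that look alike on camera.
MOUTH_SHAPES = {
    "closed_lips": {"p", "b", "m"},
    "lip_teeth": {"f", "v"},
    "tongue_teeth": {"th", "dh"},
    "rounded": {"o", "u", "w"},
    "open": {"a"},
    "spread": {"e", "i"},
}


def shape_of(phoneme):
    """Return the mouth-shape class for a phoneme, or 'other' if it is not listed."""
    for name, members in MOUTH_SHAPES.items():
        if phoneme in members:
            return name
    # Sounds made further back in the mouth (k, n, r, s...) are lumped together here.
    return "other"


def shape_match(original, candidate):
    """Fraction of positions where both lines show the same mouth-shape class.

    The two lines are compared position by position, and the count is divided
    by the length of the longer line, so extra sounds lower the score.
    """
    orig_shapes = [shape_of(p) for p in original]
    cand_shapes = [shape_of(p) for p in candidate]
    longest = max(len(orig_shapes), len(cand_shapes))
    if longest == 0:
        return 1.0
    hits = sum(1 for a, b in zip(orig_shapes, cand_shapes) if a == b)
    return hits / longest


# Rough, hand-written phoneme sequences (an assumption, not a real transcription).
I_THINK = ["a", "i", "th", "i", "n", "k"]
YO_CREO = ["y", "o", "k", "r", "e", "o"]

print(shape_match(I_THINK, YO_CREO))  # 0.0
```

With these rough transcriptions, "yo creo" scores zero against "I think": not a single position shows the same mouth shape. A score like this says nothing about meaning, register, or dramatic timing, which is why it could only ever be a first filter for the adapter, never a replacement.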
The recording booth: a blow-by-blow
Here's what actually happens during a lip-sync dubbing session. Not the theory. The real thing.
Why short dramas are harder than features
This is counterintuitive, so let me explain. You'd think a 90-minute movie would be harder to dub than a 15-minute short drama episode. In some ways it is — more content, more lines, more opportunities for inconsistency. But short dramas, the kind that stream on mobile platforms in vertical format, present a different challenge: they're shot fast, edited fast, and the performances are often more physically expressive than in traditional film.
Short drama actors gesticulate more. They lean into the camera. Their mouth movements are bigger, more exaggerated for the small screen. That means the dubbing has more visible phonetic material to match, and the margin for error is actually smaller because the viewer is watching on a phone screen about 30 cm from their face. Every micro-mismatch is right there, in your hand, in sharp focus.
Also, short drama series release in rapid succession. A 20-episode season might drop weekly, and the dubbing team is working on episodes 5–8 while episodes 1–4 are already live. There's no time for the careful, iterative process that feature film dubbing allows. Speed becomes the enemy of precision, and precision is the whole point.
The AI question, and why it is not the answer yet
Every client asks about AI lip-sync tools now. The ones that automatically adjust the translated audio timing to match the original mouth movements, or that use deep learning to subtly reshape the actor's mouth to match the dubbed audio. I've tested most of them.
For background characters with limited screen time, they're fine. If someone walks through frame saying two words and their face is small in the shot, an AI timing tool can get it close enough that nobody notices.
For principal cast in close-up, it's not ready. The tools still produce what I call synthetic sync — technically aligned but emotionally flat. The consonants land on the right frame, but the performance doesn't breathe with the actor. The micro-timing of emotional speech — the slight hesitation before a painful confession, the rush of words in an argument, the way an actor's jaw tightens on a hard consonant when they're angry — these aren't just timing events. They're acting choices. And AI doesn't act. It approximates.
I'm not anti-AI. I use AI-assisted spotting tools to generate initial timecode alignments, which saves maybe 20% of prep time. But the final product — the performance that makes you forget you're watching a dub — still requires a human being in a booth, watching the actor's face, and making dozens of micro-decisions per line that no algorithm can replicate yet.
What good lip-sync actually costs
A single episode of a short drama (15–20 minutes) in one language: about 6–8 hours of studio time, split between the adapter (pre-session), the actor (recording), and the director plus engineer (recording and mix). At professional rates in a major market, that's roughly $2,500–$4,000 per episode per language. A feature film can run $15,000–$25,000 per language.
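To see how those per-episode figures add up across a season, here is a back-of-the-envelope sketch in Python. The per-episode range is the one quoted above. The 20-episode season, the five target languages, and the function name are assumptions for illustration only.

```python
# Per-episode, per-language cost range in USD, taken from the figures above.
EPISODE_COST_USD = (2500, 4000)


def season_cost(episodes, languages, per_episode=EPISODE_COST_USD):
    """Return the (low, high) cost in USD of dubbing a season into several languages."""
    low, high = per_episode
    units = episodes * languages  # one unit = one episode in one language
    return (low * units, high * units)


# Hypothetical example: a 20-episode season dubbed into 5 languages.
low, high = season_cost(episodes=20, languages=5)
print(f"${low:,} to ${high:,}")  # $250,000 to $400,000
```

Under those assumptions, one short drama season lands between a quarter and four tenths of a million dollars, which gives a sense of how quickly a multi-language catalog scales.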
These aren't exotic numbers. They're what it costs to do it right. The streaming platforms that have built their international growth on dubbed content — and you know who they are — spend in the hundreds of millions annually on dubbing. Not because they're generous. Because they've learned that bad lip-sync costs more in subscriber churn than good lip-sync costs in production.
At Artlangs Translation, lip-sync dubbing for live-action is something we approach as a studio discipline, not a translation task. The phonetic matching, the loop recording, the consonant placement, the breath alignment — these are the details that separate a dub your audience tolerates from one they forget is dubbed. We've built the workflow for it across 230+ languages, because the science of lip-sync doesn't change with the language. The mouth shapes do.
