How real-time translation actually works
Real-time translation used to be a party trick. You'd hold up your phone, wait three seconds, and get back something that was almost right but felt like it came from a 1998 chatbot. For twenty years, that's what "instant translation" meant — slow, clumsy, and uncanny.
Then, somewhere in the last eighteen months, it stopped being a party trick and became infrastructure. The same way GPS stopped being a novelty and became the thing every app quietly runs on top of.
Three things changed at once
Latency collapsed. A modern translation model can run inference on a short sentence in under 150 milliseconds, faster than it takes you to look down at your phone. Once the whole round trip stays below about 200 ms, conversation stops feeling translated and starts feeling live.
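To make the 200 ms figure concrete, here is a minimal sketch of how you might check a translation call against that budget. The names `timed_ms`, `feels_live`, and `fake_translate` are hypothetical, and the stand-in function only upper-cases its input; a real model call would go in its place.

```python
import time
from typing import Callable

LIVE_BUDGET_MS = 200.0  # the conversational threshold quoted in the text


def timed_ms(fn: Callable[[str], str], text: str) -> tuple[str, float]:
    """Run fn(text) and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def feels_live(elapsed_ms: float, budget_ms: float = LIVE_BUDGET_MS) -> bool:
    """True when the measured latency is under the conversational budget."""
    return elapsed_ms < budget_ms


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for a real model call; it only upper-cases."""
    return text.upper()


translated, ms = timed_ms(fake_translate, "bom dia")
print(translated, feels_live(ms))
```

In practice you would time the full path (audio capture, network, inference, playback), not just the model, since the budget applies to what the listener experiences.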
Quality crossed the uncanny threshold. Native speakers used to spot machine translation in one sentence. Now, for everyday conversation, they often can't. Idioms work. Register holds. Jokes mostly survive. The remaining errors are the kind a distracted human would make.
Voice joined text. Speech-to-text, translation, and text-to-speech used to be three separate hops with compounding error. They're now one fused pipeline. You speak, the other person hears you — in their language, in something that sounds like your voice, fast enough that eye contact still works.
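Why do three separate hops hurt so much? A rough way to see it, assuming each stage fails independently (a simplification), is that the chance a sentence survives the whole chain is the product of the per-stage success rates. The 95% figures below are illustrative, not measurements.

```python
from math import prod
from typing import Sequence


def cascade_accuracy(stage_accuracies: Sequence[float]) -> float:
    """Probability that every stage succeeds, assuming independent errors."""
    return prod(stage_accuracies)


# Illustrative numbers only: speech-to-text, translation, text-to-speech.
three_hops = cascade_accuracy([0.95, 0.95, 0.95])
print(round(three_hops, 3))  # 0.857
```

Three stages that each look fine on their own end up noticeably worse together, which is the compounding the older cascaded setup suffered from.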
Why this matters for social
Every social network that exists was built on an assumption nobody stated out loud: people who can talk to each other already share a language. That assumption used to be safe to build on. It isn't anymore.
Once translation is fast enough and good enough, the language of a post stops being a boundary and starts being metadata. You post in Portuguese. A reader in Jakarta sees it in Bahasa. Neither of you thinks about it. Neither of you should have to.
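Here is one way "language as metadata" could look in code: a sketch, not a description of any real system. The `Post` type, `render_for`, and `stub_translate` are hypothetical names; the stub only tags the text so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable

# (text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]


@dataclass(frozen=True)
class Post:
    author: str
    text: str
    lang: str  # ISO 639-1 code of the original, e.g. "pt" or "id"


def render_for(post: Post, reader_lang: str, translate: Translator) -> str:
    """Return the post in the reader's language, translating only if needed."""
    if post.lang == reader_lang:
        return post.text
    return translate(post.text, post.lang, reader_lang)


def stub_translate(text: str, source: str, target: str) -> str:
    """Hypothetical stand-in: marks the text instead of translating it."""
    return f"[{source}->{target}] {text}"


post = Post(author="ana", text="Bom dia", lang="pt")
print(render_for(post, "id", stub_translate))  # [pt->id] Bom dia
print(render_for(post, "pt", stub_translate))  # Bom dia
```

The original text is stored once, with its language as a field; what each reader sees is decided at render time.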
This is the quiet infrastructure shift that makes a single global social network possible for the first time. Not a better Twitter. Not a translated Instagram. A genuinely new thing: one feed, one conversation, roughly 8 billion people.
What's still hard
Emotional nuance in voice, whether sarcasm, anger, or tenderness, still loses something in translation. Cultural references with no analog in the target language remain tricky. And the very last 5% of quality (the difference between "I understood" and "I felt it") is the hardest 5% to get right.
But the gap is closing faster than most people realize. Each model generation gets roughly 30% better on the benchmarks that matter for conversation. At that rate, the last barriers fall inside this decade.
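To see what "at that rate" implies, here is a back-of-the-envelope sketch. It reads "30% better" as the remaining quality gap shrinking by 30% per generation, which is an assumption; the 5% starting gap echoes the figure above, and the 1% target is purely hypothetical.

```python
import math


def generations_needed(start_gap: float, target_gap: float,
                       reduction: float = 0.30) -> int:
    """Smallest n such that start_gap * (1 - reduction)**n <= target_gap."""
    if target_gap >= start_gap:
        return 0
    ratio = target_gap / start_gap
    return math.ceil(math.log(ratio) / math.log(1.0 - reduction))


# Hypothetical: shrink a 5% gap to under 1%, losing 30% of it each generation.
print(generations_needed(0.05, 0.01))  # 5
```

Under those assumptions the gap after five generations is 5% × 0.7^5, or about 0.84%. How long that takes depends on how often new generations ship, which the arithmetic alone can't tell you.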
What we're building
Babel is built on top of this shift. It's a social network where the translation layer isn't a feature — it's the foundation. Every post, every comment, every voice note: translated at the edge, before you ever see it, in whatever language you think in.
If you create content, you can finally reach the audience every creator dreams of: roughly 8 billion people. If you run a business, you can sell to every market without a localization team. If you travel, every destination feels like home.
The technology is ready. The network is what's missing. That's what we're building.