Google ships real-time speech translation across 70+ languages

Gemini 3.5 Live Translate streams translated audio a few seconds behind the speaker, preserving pitch and intonation across 2,000+ language pairs.

Alessandro Benigni

PUBLISHED JUN 11, 2026

On June 9, Google released Gemini 3.5 Live Translate, a purpose-built audio model that converts spoken language into translated speech in near real time across more than 70 languages. The release reached three surfaces at once: public preview via the Gemini Live API and Google AI Studio, private preview for enterprise customers in Google Meet, and a broad rollout inside the Google Translate app on Android and iOS.

The technical claim that distinguishes this from prior translation tools is continuous streaming. Traditional speech translation systems use a turn-by-turn architecture: they wait for the speaker to finish a sentence, process the audio, then produce output. The result is a pause of 1 to 3 seconds between speech and translation. Live Translate generates output while the speaker is still talking, trailing a few seconds behind rather than stopping to wait for the sentence to end. Google’s blog post, written by product manager Anuda Weerasinghe and senior staff engineer Tony Lu, describes the model as balancing “the trade-off between waiting for context to improve quality and translating immediately to stay in sync with the speaker.”
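The difference between the two scheduling styles can be illustrated with a toy model. The sketch below is not Google's architecture, which the announcement does not describe in detail. It assumes time is measured in word slots rather than seconds, treats "translation" as passing words through unchanged, and uses a fixed lag as a stand-in for whatever adaptive policy the real model uses (similar in spirit to the "wait-k" policies from the simultaneous translation literature). The function names are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple

Emission = Tuple[int, List[str]]  # (time slot of emission, words emitted)


def turn_by_turn(words: Iterable[str]) -> Iterator[Emission]:
    """Hold every word until the sentence ends (a word ending in '.')."""
    buffer: List[str] = []
    t = -1
    for t, word in enumerate(words):
        buffer.append(word)
        if word.endswith("."):
            yield t, buffer
            buffer = []
    if buffer:  # flush an unfinished sentence at end of input
        yield t, buffer


def fixed_lag(words: Iterable[str], lag: int) -> Iterator[Emission]:
    """Emit each word once `lag` later words have been heard."""
    pending: deque = deque()
    t = -1
    for t, word in enumerate(words):
        pending.append(word)
        if len(pending) > lag:
            yield t, [pending.popleft()]
    while pending:  # speaker stopped: flush what is left
        yield t, [pending.popleft()]


def max_delay(emissions: Iterable[Emission]) -> int:
    """Worst gap, in word slots, between hearing a word and emitting it."""
    worst = 0
    index = 0  # position of the next word in the original stream
    for t, chunk in emissions:
        for _ in chunk:
            worst = max(worst, t - index)
            index += 1
    return worst


sample = "the train to the airport leaves soon.".split()
print("turn-by-turn worst delay:", max_delay(turn_by_turn(sample)))
print("fixed-lag worst delay:", max_delay(fixed_lag(sample, lag=2)))
```

For the seven-word sample, the turn-by-turn worst case grows with sentence length (6 slots here), while the fixed-lag worst case stays at the chosen lag (2 slots). A larger lag gives a translator more context at the cost of drifting further behind the speaker, which is the trade-off the quoted passage describes.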

The intonation preservation claim is the other differentiator. The model is described as reproducing the speaker’s pitch, pacing, and emphasis in the translated output, not flattening everything into a neutral synthesized voice. That distinction matters for consumer settings where voice quality signals trust.

For developers, the release is concrete and accessible now. Platforms including Agora, LiveKit, Fishjam, Pipecat, and Vision Agents have already integrated with the Gemini Live API, handling real-time media streaming infrastructure so builders can focus on application logic. Grab, the Southeast Asian ride-hailing company, is testing the model to handle multilingual communication between drivers and travelers; Grab’s users make over 10 million voice calls per month through the platform.

The Google Meet upgrade shows the scale of the change. The previous speech translation feature in Meet supported five languages and only translated to and from English, covering a narrow slice of global enterprise communication. The new version supports more than 70 languages and over 2,000 language pairs in a single meeting. Google is not disclosing a general availability date; the private preview is for select Workspace business customers, with broader rollout described as “later this year.”
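The language and pair counts can be checked against each other with a little combinatorics. The sketch below assumes exactly 70 languages, each of which can be paired with every other; the announcement gives only "more than 70" and "over 2,000", so the exact totals here are illustrative rather than Google's figures. The helper names are hypothetical.

```python
from math import comb, perm


def unordered_pairs(n: int) -> int:
    """Language pairs where A<->B counts once."""
    return comb(n, 2)


def directed_pairs(n: int) -> int:
    """Translation directions, where A->B and B->A count separately."""
    return perm(n, 2)


def min_languages_for(target: int) -> int:
    """Smallest number of languages giving more than `target` unordered pairs."""
    n = 2
    while comb(n, 2) <= target:
        n += 1
    return n


print(unordered_pairs(70))      # 2415
print(directed_pairs(70))       # 4830
print(min_languages_for(2000))  # 64
```

Under these assumptions, "over 2,000 pairs" lines up with counting each pair once: 70 languages give 2,415 unordered pairs, and any set of 64 or more languages clears 2,000.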

There is a strategic layer here that the blog post does not address. Apple and Google formalized a deal in which Gemini serves as the AI backbone for Siri, with Apple reportedly paying roughly $1 billion per year for access to the model stack. The audio capabilities now shipping into Google Translate and Meet are drawn from the same model family. That makes the voice translation quality Google is demonstrating in its own products a close proxy for what Siri can draw on when Apple routes complex language tasks to Gemini. Translation is one of the few consumer AI use cases where users can directly perceive quality differences between providers, which gives this release more strategic visibility than a standard API update.

Google is also watermarking all audio output with SynthID, its imperceptible audio watermark designed to keep AI-generated speech detectable. The company published a model card with safety and responsibility details alongside the release.

The release announcement does not include independent benchmark comparisons against Microsoft Translator, Apple’s translation feature, or OpenAI’s voice mode. The intonation and latency claims come from Google and from early partner feedback cited in the post.

Developers building multilingual voice products should test the Gemini Live API against their current stack before Q3. Google has committed only to a broader Meet rollout “later this year,” but if general availability does land before year-end, it will set the competitive baseline that other translation vendors need to match.

Google published this announcement on the Google Blog (blog.google) on June 9, 2026.
