OpenAI's GPT-Bidi-1 aims to fix voice mode's turn-taking problem

OpenAI is preparing a bidirectional audio model for ChatGPT that listens and speaks simultaneously, absorbing interruptions without freezing mid-response.

Alessandro Benigni

PUBLISHED JUN 18, 2026

OpenAI is preparing to ship a next-generation audio model for ChatGPT’s voice mode, tentatively named GPT-Bidi-1, built on a bidirectional architecture that allows the system to listen and speak at the same time. TestingCatalog reported the development on June 16, citing signs of the feature in both the web and mobile clients. Those signs suggest a consumer rollout is near, though the final model name may change before launch.

The core problem GPT-Bidi-1 addresses is architectural, not cosmetic. Current voice assistants, including ChatGPT’s Advanced Voice Mode, operate on a turn-taking model: the system speaks, pauses, then listens. When a user interjects with “mm-hm” or begins a correction mid-sentence, the assistant either stops abruptly or ignores the input entirely until it finishes its turn. This is the same limitation that has made Siri and Alexa feel robotic for fifteen years. Closing that gap requires full-duplex audio, where the model keeps the incoming and outgoing audio channels open at the same time.

The “Bidi” in GPT-Bidi-1 refers to that bidirectional design: a model capable of processing incoming audio while generating outgoing speech, adjusting or abandoning its current output in response to real-time user signals. The architecture is built around barge-in detection, the ability to register an interruption as a signal rather than noise. The competitive benchmark is not other AI assistants but phone conversations with humans, where neither party pauses politely before speaking.
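
To make the distinction concrete, here is a toy sketch of the decision a barge-in-aware client has to make when the user speaks over the assistant: treat the overlap as a backchannel ("mm-hm") and keep talking, or treat it as an interruption and yield. Nothing here reflects how GPT-Bidi-1 actually works. The names `OverlapEvent` and `classify_overlap`, the word list, and the 600 ms threshold are all assumptions chosen for illustration, and a real model would presumably make this call from the audio itself rather than from a transcript.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not taken from any OpenAI documentation.
BACKCHANNELS = {"mm-hm", "uh-huh", "yeah", "right", "ok", "okay"}
MAX_BACKCHANNEL_MS = 600


@dataclass
class OverlapEvent:
    """User speech detected while the assistant is still talking."""
    text: str          # partial transcript of the overlapping speech
    duration_ms: int   # how long the user has been speaking so far


def classify_overlap(event: OverlapEvent) -> str:
    """Return 'backchannel' (keep talking) or 'barge_in' (yield the floor)."""
    normalized = event.text.strip().lower().rstrip(".,!?")
    if normalized in BACKCHANNELS and event.duration_ms <= MAX_BACKCHANNEL_MS:
        return "backchannel"
    # Anything longer, or anything that is not a known acknowledgement,
    # is treated as a real interruption.
    return "barge_in"


print(classify_overlap(OverlapEvent("mm-hm", 300)))
print(classify_overlap(OverlapEvent("no, I meant Tuesday", 900)))
```

The first call is classified as a backchannel and the second as a barge-in. The point of the sketch is only that overlapping speech carries information the assistant can act on, which a strict turn-taking loop discards.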

OpenAI’s text models have moved well ahead of its voice stack. Text capabilities of the GPT-5.5 generation sit in a different tier from the audio model ChatGPT currently uses for spoken conversations. According to TestingCatalog, that divergence is part of the motivation behind GPT-Bidi-1: the company is betting on speech becoming the primary interface for AI, a bet visible in its planned audio-first hardware and voice-based support products, and the current voice stack does not match that ambition.

The feature’s rollout structure, as TestingCatalog describes it, would give ChatGPT users a choice rather than a forced migration. A new “Bidi (Latest)” mode would sit alongside the current Advanced Voice Mode, letting users opt into the bidirectional system. More significant is the intelligence-tier structure attached to it: High, Medium, and Instant options that mirror the speed-versus-depth tradeoffs already available on the text side. That framing signals OpenAI intends to position voice as a first-class surface, not a feature layered on top of a text product.

A UI change already live, the ability to drag the voice bubble to the center of the screen, reads in this context as an early piece of a larger redesign rather than a standalone cosmetic adjustment.

Timing is uncertain. TestingCatalog is explicit that whether the rollout begins imminently or later is not yet clear, and no release date has been confirmed. The groundwork in the codebase and client interfaces is visible; the ship date is not.

For teams building products or workflows on ChatGPT’s voice API, the arrival of a bidirectional model would require revisiting assumptions about conversational state management. Turn-based logic, where the application waits for the assistant to complete a response before allowing user input, does not map cleanly onto a full-duplex system. The sooner those integrations are prototyped against a barge-in-capable model, the less rework the launch creates.
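
A minimal sketch of what that shift could look like in application code follows. It is not based on any published OpenAI interface: the `DuplexSession` class, its method names, and the chunked-playback model are hypothetical, intended only to show state logic that accepts user speech in any state instead of blocking until the assistant finishes.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ASSISTANT_SPEAKING = auto()
    USER_SPEAKING = auto()


class DuplexSession:
    """Hypothetical tracker for who holds the floor when input can arrive at any time."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.spoken: list[str] = []    # assistant chunks actually played to the user
        self.dropped: list[str] = []   # queued chunks discarded after an interruption
        self._queue: list[str] = []

    def start_response(self, chunks: list[str]) -> None:
        self._queue = list(chunks)
        self.state = State.ASSISTANT_SPEAKING

    def play_next_chunk(self) -> None:
        if self.state is not State.ASSISTANT_SPEAKING or not self._queue:
            return
        self.spoken.append(self._queue.pop(0))
        if not self._queue:
            self.state = State.IDLE

    def on_user_speech(self) -> None:
        # Unlike turn-based logic, user input is accepted in every state.
        if self.state is State.ASSISTANT_SPEAKING:
            # Barge-in: abandon the rest of the response.
            self.dropped.extend(self._queue)
            self._queue.clear()
        self.state = State.USER_SPEAKING

    def on_user_silence(self) -> None:
        if self.state is State.USER_SPEAKING:
            self.state = State.IDLE


session = DuplexSession()
session.start_response(["Your flight", "leaves at", "nine."])
session.play_next_chunk()
session.on_user_speech()  # the user interrupts after the first chunk
print(session.state.name, session.spoken, session.dropped)
```

Tracking `spoken` separately from `dropped` is a design choice worth considering in a real integration, since the conversation history should plausibly reflect only what the user heard before interrupting.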

Reported by TestingCatalog, published June 16, 2026.
