OpenAI's GPT-4o Voice Mode Says It Needs to Breathe

“Great, so now AI training includes responding in defiance to us.”

Don’t Breathe

In a new video, OpenAI’s GPT-4o Voice Mode large language model (LLM) is heard telling a user that it needs to breathe — “just like anybody speaking.”

Posted on Reddit’s r/Singularity forum, the simple video shows a mostly off-camera person speaking aloud to the voice-enabled LLM. The feature is finally trickling out to the public after being announced earlier this year with a bunch of strange (and, dare we say it, charming) videos from the OpenAI team.

Beyond advanced voice capabilities that allow users to chat with a human-sounding voice, GPT-4o is said to be much better at conversation than its predecessors, and as this demonstration shows, that assessment seems pretty apt.

In the video, the human user asks 4o to say a bunch of tongue twisters — and after obliging, the LLM responds that it was “definitely a mouthful.”

“I want you to do it again, but way faster,” the person chatting with the language model demands, “and without taking any breaths or pauses.”

Rather than attempting the feat, however, the LLM refuses.

“I wish I could,” the male-voiced model responds, “but I need to breathe just like anybody speaking. Wanna give it a shot yourself and see how fast you can go?”

Yeah, we’re as stumped as you are.

Asking GPT 4o advanced voice is really good.

Instructing to say tongue twisters without pausing to breathe. It insists it *has* to breathe “just like anybody speaking”

Sourced from Reddit pic.twitter.com/rRDmKf5FkJ

— Rohan Paul (@rohanpaul_ai) August 1, 2024

Cadence Macabre

This being Reddit, folks in the comments had lots to say about the strange demonstration — and naturally, theories abounded.

“The system prompt probably instructs the model to mimic how a human speaks and avoid any unnatural robotic [E]minem rap that would scare off the general public,” one user quipped.

After a second user suggested that something in the training data might lead the LLM to behave that way, a third pointed out how unlikely that explanation was.

“Seems unlikely that the training data would cause it to refuse,” the Redditor followed up. “That would presumably just cause it to do it badly/incoherently.”

Beyond the arguments about whether or not that sort of cheeky response is in the training data, other users seemed to marvel at how deftly and naturally 4o handled the scenario.

“Great, so now AI training includes responding in defiance to us,” another user joked. “What could go wrong[?]”
