Whistleblowers Say OpenAI Broke Promise to Rigorously Test AI for Danger Before Releasing

“We basically failed at the process.”

Deadline Danger

Tech leaders have warned about the potential dangers of the very AIs they’re developing, while harping on about the need for regulation.

The sincerity of this cautionary mien has always been suspect, however, and now there’s more evidence to suggest that OpenAI, a leader in the space, hasn’t been practicing what its CEO Sam Altman has been publicly preaching.

Now, The Washington Post reports that members of OpenAI’s safety team said they felt pressured to rush through testing “designed to prevent the technology from causing catastrophic harm” of its GPT-4 Omni large language model, which now powers ChatGPT — all so the company could push out its product by its May launch date. In sum, they say, OpenAI treated GPT-4o’s safeness as a foregone conclusion.

“They planned the launch after-party prior to knowing if it was safe to launch,” an anonymous individual familiar with the matter told WaPo. “We basically failed at the process.”

A venial sin, perhaps — but one that reflects a seemingly flippant attitude towards safety by the company’s leadership.

Weak Effort

These aren’t the first people close to the company to sound the alarm. In June, a group of OpenAI insiders — both current and former employees — warned in an open letter that the company was skirting safety in favor of “recklessly” racing for dominance in the industry. They also claimed there was a culture of retaliation that led to safety concerns being silenced.

This latest disclosure shows that OpenAI is failing to live up to the standards imposed by President Joe Biden’s executive AI order, which laid out somewhat vague rules for how the industry’s leaders, like Google and Microsoft — which backs OpenAI — should police themselves.

The current practice is that companies conduct their own safety tests on their AI models, and then submit the results to the federal government for review. When testing GPT-4o, however, OpenAI squeezed its testing down into a single week, according to WaPo‘s sources.

Employees protested, as they were well within their rights to — for surely that wouldn’t be enough time to rigorously test the model.

One and Done

OpenAI has downplayed these charges with specious language — and it still comes off sounding a little guilty. Spokesperson Lindsey Held insisted that the company “didn’t cut corners on our safety process,” and merely acknowledged that the launch was “stressful” for employees.

Meanwhile, an anonymous member of the company’s preparedness team told the WaPo that there was enough time to complete the tests, thanks to “dry runs” conducted ahead of time, but admitted that the testing had been “squeezed.”

“I definitely don’t think we skirted on [the tests],” the representative added. “After that, we said, ‘Let’s not do it again.'” A mark of trust in the process if there ever was one.

More on AI: OpenAI Researcher Says He Quit When He Realized the Upsetting Truth