Leak Reveals the New York Times Experimented With Using AI to Write Headlines

The New York Times is suing OpenAI and Microsoft for copyright infringement, arguing that the AI companies unlawfully scraped protected work to train the generative AI models that would eventually be used to build a competing product. New reporting from The Intercept, however, reveals that the NYT has itself experimented with using OpenAI’s AI tools in the newsroom.

According to the Intercept, which is also suing OpenAI and Microsoft over copyright infringement, the NYT’s OpenAI experimentation came to light last month after a massive chunk of the NYT’s GitHub data was anonymously leaked on 4chan. (As the Intercept notes, the NYT confirmed to BleepingComputer that the leak was legitimate.)

Within this breach (over three million files were stolen across 6,000 GitHub repositories), the Intercept uncovered more than one AI project, including one dubbed “OpenAI Styleguide.” In that effort, the newspaper explored ways to use OpenAI’s DaVinci to perform tasks like generating headlines and applying the paper’s style guide to a story. Elsewhere, a separate, self-started effort by a staffer during the paper’s “Maker Week” used OpenAI’s ChatGPT to draft headlines as well.

For now, these are tasks still generally performed by human journalists.

Another project, this one unfinished, was intended to automatically generate “counterpoints” to opinion articles — a section of the NYT that’s often been the source of drama at the newspaper in recent years.

As the report notes, nowhere on the NYT’s Research and Development page — where it discusses two dozen use cases for AI and machine learning within its reporting efforts — does it list editorial applications like headline generation and style editing. And in response to the Intercept’s reporting, a spokesperson for the NYT said the “OpenAI Styleguide” effort “was a very early experiment by our engineering team designed to understand generative AI and its potential use cases.”

“In this case, the experiment was not taken beyond testing, and was not used by the newsroom,” the spokesperson added. “We continue to experiment with potential applications of AI for the benefit of our journalists and audience.”

It’s true that the effort seems a bit slipshod: according to the Intercept’s reporting, the code instructs OpenAI’s chatbot that it is a “headline writter [sic] for The New York Times.”

It’s worth pointing out that the NYT has generally been up-front about its experimentation with AI, and even brought on its first-ever Director of AI Initiatives last year. And before it decided to sue OpenAI, the paper, like many other publishers, had been in discussions to license its vast library to the AI company. Those talks clearly fell apart, though, and the newspaper is now asking OpenAI and Microsoft to compensate it, to the tune of billions, for its web-scraped work.

Setting aside the potentially very consequential legal fight between these cultural behemoths, the Intercept’s report offers a fascinating peek at how AI continues to ripple through the world of journalism.

The media industry is collectively grappling with what AI means both for individual publishers and for journalism writ large. Tensions between the journalism world and Silicon Valley’s AI sector only continue to rise, and as more media partnerships and lawsuits take shape, the ways that players are using (or choosing not to use) the burgeoning technology matter. To that end, the fact that the paper of record experimented with AI applications that would seemingly replace or diminish certain human roles is no small revelation amid the media’s ongoing tumult.