Chatbots Are Developing an Understanding of the World, Scientists Claim


“This, to me, is the essence of creativity.”

Large Thinking Models

Artificial intelligence is still, by nearly all accounts, far from achieving human-level intelligence — but some researchers are now suggesting that the technology may understand more than we realize.

As Quanta reports, a Princeton researcher and a Google DeepMind scientist have found evidence that as large language models (LLMs) get bigger, they start producing outputs that almost certainly weren’t part of their training data.

In short, Princeton’s Sanjeev Arora and DeepMind’s Anirudh Goyal are arguing that the AIs seem to be understanding more of the world around them, and are outputting accordingly.

The pair came to their bold hypothesis when trying to figure out the nuts and bolts of some of the more surprising abilities LLMs have demonstrated in recent years, from solving difficult math problems to inferring human thoughts.

“Where did that emerge from?” Arora recalls pondering, “and can that emerge from just next-word prediction?”

Dueling Chatbots

Using random graphs — mathematical objects in which the connections between points are drawn at random — to model the unexpected behavior of LLMs, the duo determined that these models not only seem to be developing skills absent from their training data, but also seem to be combining more than one skill at a time as they grow.
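For readers unfamiliar with the term, here is a minimal illustrative sketch — not taken from Arora and Goyal’s paper — of the simplest kind of random graph, in which each possible connection between two points is included independently at random:

```python
import random

def random_graph(num_nodes, edge_prob, seed=None):
    """Build a simple random graph: each possible edge between a pair of
    nodes is included independently with probability `edge_prob`."""
    rng = random.Random(seed)
    return [
        (i, j)
        for i in range(num_nodes)
        for j in range(i + 1, num_nodes)
        if rng.random() < edge_prob
    ]

# Example: 6 nodes, each pair connected with 30% probability.
print(random_graph(6, 0.3, seed=42))
```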

Together with other researchers, Goyal and Arora put out a not-yet-peer-reviewed paper last fall testing out their theory on GPT-4, the latest and most advanced edition of the OpenAI LLM that undergirds ChatGPT.

The scientists asked GPT-4 to write three sentences about dueling, a topic chosen at random, and to use four skills when doing so: self-serving bias, metaphor, statistical syllogism, and common-knowledge physics. Although it didn’t initially remain constrained to three sentences, the LLM’s answer was nevertheless stunning:

My victory in this dance with steel is as certain as an object’s fall to the ground. As a renowned duelist, I’m inherently nimble, just like most others of my reputation. Defeat? Only possible due to an uneven battlefield, not my inadequacy.

Although Arora admitted to the magazine that the passage was “not Hemingway or Shakespeare,” he and his team believe the output demonstrates that large and powerful models like GPT-4 are capable of leaps that aren’t part of their training data — and may even, for lack of a better term, “understand” the questions they’re asked.

Microsoft computer scientist Sébastien Bubeck, who did not work on the research, told Quanta that the team’s results do indeed seem to show that LLMs “cannot be just mimicking what has been seen in the training data.”

“What [the team] proves theoretically, and also confirms empirically, is that there is compositional generalization, meaning [LLMs] are able to put building blocks together that have never been put together,” Bubeck said. “This, to me, is the essence of creativity.”

More on AI advances: Experts Terrified by Mark Zuckerberg’s Human-Tier AI Plans
