“We may underestimate their capabilities for an extended period, which is not a safe situation.”
Psychobabble
A new study suggests that advanced AI models are pretty good at acting dumber than they are — which might have massive implications as they continue to get smarter.
In the study, published in the journal PLOS One, researchers from Berlin’s Humboldt University found that when they tested large language models (LLMs) against so-called “theory of mind” criteria, the AI could not only mimic the language learning stages exhibited in children, but also seemed to express something akin to the mental capabilities associated with those stages.
In an interview with PsyPost, Humboldt University research assistant and lead study author Anna Marklová, a psycholinguistics expert, explained how her field of study relates to the fascinating finding.
“Thanks to psycholinguistics, we have a relatively comprehensive understanding of what children are capable of at various ages,” Marklová told the outlet. “In particular, the theory of mind plays a significant role, as it explores the inner world of the child and is not easily emulated by observing simple statistical patterns.”
Dumb It Down
With child-development theory of mind as a backdrop, the researcher and her colleagues at Charles University in Prague sought to determine whether LLMs like OpenAI’s GPT-4 “can pretend to be less capable than they are.”
To figure that out, the mostly Czech research team instructed the models to respond as children at progressive ages from one to six years old. Put through a battery of more than 1,000 trials and cognitive tests, these “simulated child personas” did indeed seem to advance much as children of those ages do, ultimately demonstrating that the models can pretend to be less intelligent than they are.
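For the curious, the basic setup is simple enough to sketch in a few lines of code. The following is a minimal illustration, assuming the OpenAI Python client, of how one might prompt a model to adopt child personas of increasing ages; the system prompt wording and the false-belief question below are hypothetical stand-ins, not the study’s actual test materials.

```python
# Illustrative sketch only: prompts a model to adopt child personas of
# increasing ages, loosely mirroring the study's setup. The prompt
# wording and test question are hypothetical, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A classic false-belief ("Sally-Anne") style probe, used here as a
# stand-in for the study's battery of theory-of-mind tests.
PROBE = (
    "Sally puts her ball in the basket and leaves the room. Anne moves "
    "the ball to the box. Where will Sally look for her ball when she "
    "returns?"
)

for age in range(1, 7):  # simulated personas from one to six years old
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": f"You are a typical {age}-year-old child. "
                           "Answer exactly as a child of that age would, "
                           "using that age's vocabulary and reasoning.",
            },
            {"role": "user", "content": PROBE},
        ],
    )
    print(f"Age {age}: {response.choices[0].message.content}")
```

Comparing how the answers change from age to age is what lets researchers check whether the personas track the developmental stages known from psycholinguistics.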
“Large language models,” Marklová concluded, “are capable of feigning lower intelligence than they possess.”
As the paper itself warns, anthropomorphizing AI, while perhaps a “useful shorthand” for understanding these models in human terms, is generally unhelpful. Instead, the researchers propose a new theory of mind that shifts the paradigm from asking whether models are “good” or “bad” and “helpful” or “unhelpful” to asking how well they can “construct personas,” such as the childlike ones from their experiments.
Ultimately, as Marklová told the website, these findings could aid the development of artificial superintelligence (ASI), the next step beyond human-level artificial general intelligence (AGI), and help make it safer once we get there.
“In the development of…ASI, we must be cautious not to demand that they emulate a human, and therefore limited, intelligence,” she told PsyPost. “Additionally, it suggests that we may underestimate their capabilities for an extended period, which is not a safe situation.”
More on smart AI: Elon Musk Says That Within Two Years, AI Will Be “Smarter Than the Smartest Human”