Apropos of almost nothing, the CEO of an AI startup suggested that a bad actor could use a mysterious image to “attack” human brains the way researchers have done with neural networks.
This kerfuffle began when Florent Crivello, the CEO of the Lindy AI staffing startup, agreed with researcher Tomáš Daniš’ take that “there is no evidence humans can’t be adversarially attacked like neural networks can.”
“There could be,” the Germany-based AI researcher wrote on X-formerly-Twitter, “an artificially constructed sensory input that makes you go insane forever.”
In his own post, the Lindy CEO referenced a 2015 Google study which found that overlaying carefully crafted noise on a photo of a panda caused a neural network to confidently misidentify it as a gibbon. To his mind, it seems “completely obvious” that such an attack, known in the AI world as an “adversarial example,” could be used on humans as well.
I’m always surprised to find that this isn’t completely obvious to everyone.
There’s precedent suggesting that that’s the case — like that Pokémon episode aired in the 90s in Japan where a pikachu attack caused the screen to flash red and blue at 12hz for 6s, causing at least… pic.twitter.com/RKBooH8qXO
— Flo Crivello (@Altimor) June 5, 2024
Crivello cited the “Pokémon” episode that induced seizures in audiences across Japan in the 1990s, when a strong electric attack from Pikachu resulted in strobing blue and red lights. Though it was parodied on “The Simpsons” stateside, the episode never aired outside of Japan over concerns that it could harm audiences abroad as well. As Crivello notes, it stands as precedent that “very simple” imagery like that in the “Pokémon” episode can hurt the unsuspecting.
“No reason in principle why humans couldn’t be vulnerable to the same kind of adversarial examples,” he wrote, “making a model think that a panda is a gibbon with 99.7% confidence because some seemingly random noise was added to it.”
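For readers wondering what that panda-to-gibbon trick actually involves, the 2015 paper’s technique is the “fast gradient sign method”: nudge every pixel slightly in the direction that most increases the classifier’s loss, producing an image that looks unchanged to humans but fools the model. The sketch below illustrates the general idea in PyTorch; the pretrained ResNet-50, the “panda.jpg” path, and the epsilon value are illustrative assumptions on our part, not details from the paper or from Crivello’s post.

```python
# A minimal sketch of the fast gradient sign method (FGSM) behind the
# panda-to-gibbon example. The model, image path, and epsilon below are
# illustrative assumptions, not details from the 2015 paper or the post.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

# Keep the image tensor in [0, 1] pixel space so epsilon is easy to interpret;
# apply the ImageNet normalization the model expects inside predict().
to_tensor = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def predict(x: torch.Tensor) -> torch.Tensor:
    return model(normalize(x))

def fgsm_attack(image_path: str, epsilon: float = 0.007):
    x = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0)
    x.requires_grad_(True)

    # Use the model's own top prediction as the label to push away from.
    logits = predict(x)
    label = logits.argmax(dim=1)

    # One gradient step: nudge every pixel by +/- epsilon in whichever
    # direction increases the classification loss.
    F.cross_entropy(logits, label).backward()
    x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()

    adv_probs = predict(x_adv).softmax(dim=1)
    return label.item(), adv_probs.argmax(dim=1).item(), adv_probs.max().item()

# Example: original class, adversarial class, and the model's confidence in it,
# for a hypothetical panda photo.
# orig_cls, adv_cls, adv_conf = fgsm_attack("panda.jpg")
```

The unsettling part, and the reason the example gets cited so often, is that the perturbation budget is tiny relative to the image’s pixel range: a change imperceptible to people can flip the model’s answer with very high confidence.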
To be clear, the idea of a deadly sensory input has mostly been confined to the realm of science fiction.
Back in 1988, science fiction writer David Langford wrote a chilling short story titled “BLIT” about an image that could drive people mad and even kill them. The in-universe theory behind the story is known as the “Berryman Logical Image Technique,” or “BLIT” for short, in which near-future supercomputers accidentally generate so-called “basilisk” images containing patterns, dubbed “Gödelian spoilers,” that the human brain cannot safely process.
Told from the perspective of Robbo, a right-wing terrorist who uses the imagery for his group’s racist ends, the story follows the young man as he posts one such basilisk, called “The Parrot,” around an unnamed British town, wearing goggles that distort his vision so that it does not harm him the way it does those who view it with their naked eyes.
Robbo is eventually caught with the image and arrested. Although there’s little beyond petty vandalism for which he can be prosecuted for stenciling the dangerous graffiti, he comes to realize that he has seen “The Parrot” enough times that his brain filters out the goggles’ distortion, and he dies in his jail cell before the night is through.
Indeed, in the comments of Crivello’s post, at least one person made the leap to the “BLIT” story — which, if nothing else, shows how powerful that 30-something-year-old meme truly is.
Beyond serving as fodder for a nearly decade-old story on the popular r/NoSleep horror subreddit, the idea doesn’t seem to have gone anywhere: nobody in the AI community or elsewhere appears to have developed a “BLIT”-style adversarial attack of their own. We’ve reached out to Crivello to ask whether he knows of any such research or has any additional theories; if he doesn’t, we’ll chalk it up as yet another instance of AI enthusiast yarn-spinning.
More on AI forecasts: OpenAI Insider Estimates 70 Percent Chance That AI Will Destroy or Catastrophically Harm Humanity