Stack Overflow Bans Users for Protesting Against It Selling Their Answers to OpenAI as Training Data

“Why does OpenAI get to profit from our work?”

Scraping By

Stack Overflow, the go-to forum for programmers and software developers, announced a partnership with OpenAI to sell the site’s data — including its users’ forum posts — to train ChatGPT.

But the move has proved highly controversial. As Tom’s Hardware reports, many Stack Overflow users are now attempting to edit or remove their posts to protest their answers being used to bolster OpenAI’s large language model. That mild disobedience has incurred bans by Stack Overflow’s moderators, who say the deletions go against the site’s policy.

One Stack Overflow user by the name of Ben, who claims to be a UI programmer at Epic Games, said they were suspended for seven days for editing their highest-rated answers in protest, writing in one post: “Why does OpenAI get to profit from our work?”

“It’s just a reminder that anything you post on any of these platforms can and will be used for profit,” the user wrote in a thread on X, formerly Twitter. “It’s just a matter of time until all your messages on Discord, Twitter etc. are scraped, fed into a model and sold back to you.”

“So instead I changed my highest-rated answers to a protest message. Within an hour mods had changed the questions back and suspended my account for 7 days,” Ben (@_benui) wrote on X on May 6, 2024.

Stacked Against ‘Em

Ben and others cite the European Union’s General Data Protection Regulation as giving them the right to delete all data pertaining to them from the website. But Stack Overflow’s Terms and Conditions contain a specific clause under which users “irrevocably license” any content they post to the site, Tom’s Hardware notes.

On its face, there’s a seemingly sound rationale behind Stack Overflow’s policy against ‘defacing’ community posts, which is the preservation of organic, community-driven knowledge (not to be undervalued in an era of useless Google Search results filled with corporate drivel).

“Even if the post is no longer useful to the original author, that information is still beneficial to others who may run into similar problems in the future — this is the underlying philosophy of Stack Exchange,” the forum’s moderation team wrote in an email to the banned user.

But goodwill requires good faith, and it is alarming to see this principle being wielded as a cudgel against the site’s indispensable contributors. They are understandably upset that their posts, which were written for free, are being sold off to train a generative AI that could replace them and, in so doing, provide worse, error-ridden versions of human programmers’ advice.

Hypocrisy in this regard is not new for Stack Overflow. As Tom’s Hardware notes, the site once took a hard line against generative AI being used to write responses on the forums. But as the rise of programming-capable AI models saw traffic to the site dip, it changed its tune, announcing the development of its own AI model called OverflowAI. As AI continues to be a massive cash cow, it seems inevitable that more websites and services will follow a similar path, pawning off your privacy to survive in an ever more ruthless digital age.
