No One Wants Apple To Scrape Their Websites for AI Training

Ouch.

Stop Sign

Wired reports that a slew of major websites, including influential news publishers and top social media platforms, are blocking Apple’s web crawler from scraping their pages for AI training content.

Per the report, media companies that have altered their robots.txt files to lock Applebot out include The New York Times, The Atlantic, The Financial Times, Gannett, Vox Media, and Condé Nast. On the social media side, Facebook, Instagram, and Tumblr all confirmed that they’ve blocked Apple from scraping their sites, as did the enduring internet elder Craigslist.

Robots.txt files are becoming an increasingly fascinating place to study the digital politics of AI. Some of these companies — including Vox, Condé Nast, and The Atlantic — have inked content licensing deals with OpenAI; The New York Times, meanwhile, has drawn a clear line in the sand on AI, and is actively suing OpenAI for copyright infringement. Facebook and Instagram are both owned by Meta, one of Apple’s competitors in the AI field, while platforms built on user content like Tumblr and Craigslist are sitting on some very lucrative troves of quality data. Meanwhile, in the background, Apple has already entered a deal with OpenAI to integrate the chatbot ChatGPT into “Apple experiences.”

In short, the AI industry is intensely competitive, particularly regarding access to high-quality, human-made training material. And as the tentative bonds between AI companies and data wells like journalistic bodies or social media sites continue to take shape, where and how bots like Apple’s are allowed to roam offers an interesting glimpse into AI-focused decision-making — on the publisher side, and on behalf of AI companies as well.

Coal Mines

According to Wired, these websites have specifically blocked “Applebot-Extended,” a web crawler that, per an Apple blog post, explicitly provides web publishers with the choice to “opt out of their website content being used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.” An Apple spokesperson confirmed to Wired that blocking Applebot-Extended doesn’t ward off the OG Applebot from trawling a website, and instead prevents any scraped data from being used to train Apple’s AI models.

Applebot, in contrast, scrapes data for Apple’s Siri and Spotlight — a distinction that seems to speak to some caution on Apple’s behalf regarding copyright and IP protection in the AI era.
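For readers curious what such a block looks like in practice, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt contents and the example.com URL are hypothetical and are not taken from any of the publishers named above; the sketch only shows how a standard parser reads the rules, not how Apple's systems actually honor them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for illustration, not a real site's file):
# it opts out under the Applebot-Extended name and leaves every other agent alone.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL used only for the check.
URL = "https://example.com/story.html"

extended_allowed = is_allowed(ROBOTS_TXT, "Applebot-Extended", URL)
applebot_allowed = is_allowed(ROBOTS_TXT, "Applebot", URL)

print("Applebot-Extended allowed:", extended_allowed)
print("Applebot allowed:", applebot_allowed)
```

Under these rules the parser reports Applebot-Extended as blocked while plain Applebot is still permitted, which mirrors the distinction described above: a site can stay visible to Siri and Spotlight while declining AI training use.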

The NYT isn’t the only company or group suing AI makers, and it may well be in Apple’s best interest to avoid scraping any controversial or currently-in-litigation data, especially if it’s already tapped OpenAI to fill in some of its product gaps. Call it the billion-dollar canary in the coal mine.

More on AI and copyright: Amid New York Times Lawsuit, ChatGPT Is Citing Plagiarized Versions of NYT Articles on an Armenian Content Mill
