Don’t type anything into Gemini, Google’s family of GenAI apps, that’s incriminating — or that you wouldn’t want someone else to see.
That’s the PSA (of sorts) today from Google, which in a new support document outlines the ways in which it collects data from users of its Gemini chatbot apps for the web, Android and iOS.
Google notes that human annotators routinely read, label and process conversations with Gemini — albeit conversations “disconnected” from Google Accounts — to improve the service. (It’s not clear whether these annotators are in-house or outsourced, which might matter when it comes to data security; Google doesn’t say.) These conversations are retained for up to three years, along with “related data” like the languages and devices the user used and their location.
Now, Google affords users some control over which Gemini-relevant data is retained — and how.
Switching off Gemini Apps Activity in Google’s My Activity dashboard (it’s enabled by default) prevents future conversations with Gemini from being saved to a Google Account for review (meaning the three-year window won’t apply). Individual prompts and conversations with Gemini, meanwhile, can be deleted from the Gemini Apps Activity screen.
But Google says that even when Gemini Apps Activity is off, Gemini conversations will be saved to a Google Account for up to 72 hours to “maintain the safety and security of Gemini apps and improve Gemini apps.”
“Please don’t enter confidential information in your conversations or any data you wouldn’t want a reviewer to see or Google to use to improve our products, services, and machine learning technologies,” Google writes.
To be fair, Google’s GenAI data collection and retention policies don’t differ all that much from those of its rivals. OpenAI, for example, saves all chats with ChatGPT for 30 days regardless of whether ChatGPT’s conversation history feature is switched off, except when a user’s subscribed to an enterprise-level plan with a custom data retention policy.
But Google’s policy illustrates the challenges inherent in balancing privacy with developing GenAI models that feed on user data to self-improve.
Liberal GenAI data retention policies have landed vendors in hot water with regulators in the recent past.
Last summer, the FTC requested detailed information from OpenAI on how the company vets data used to train its models, including consumer data — and how that data’s protected when accessed by third parties. Overseas, the Italian Data Protection Authority, Italy’s data privacy regulator, said that OpenAI lacked a “legal basis” for the mass collection and storage of personal data to train its GenAI models.
As GenAI tools proliferate, organizations are growing increasingly wary of the privacy risks.
A recent survey from Cisco found that 63% of companies have set limits on what data can be entered into GenAI tools, while 27% have banned GenAI altogether. The same survey found that 45% of employees have entered “problematic” data into GenAI tools, including employee information and non-public files about their employer.
OpenAI, Microsoft, Amazon, Google and others offer enterprise-focused GenAI products that explicitly don’t retain data for any length of time, whether for model training or any other purpose. Consumers though — as is often the case — get the short end of the stick.