
Build User Authentication into Your GenAI App That Accesses a Database


Generative AI agents introduce immense potential to transform enterprise workspaces. Enterprises from almost every industry are exploring the possibilities of generative AI, adopting AI agents for purposes ranging from internal productivity to customer-facing support. However, while these AI agents can efficiently interact with data already in your databases to provide summaries, answer complex questions, and generate insightful content, concerns persist around safeguarding sensitive user data when integrating this technology.

The data privacy dilemma

Before discussing data privacy, we should define an important concept: RAG (Retrieval-Augmented Generation) is a machine-learning technique to optimize the accuracy and reliability of generative AI models. With RAG, an application retrieves information from a source (such as a database), augments it into the prompt of the LLM, and uses it to generate more grounded and accurate responses. 
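The retrieve-augment-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the `similarity_search` and `generate` methods are hypothetical placeholders for whatever retriever and LLM client your stack provides.

```python
def retrieve(question: str, db) -> list[str]:
    # Retrieve the documents most relevant to the question,
    # e.g. via a vector similarity search against the database.
    # `similarity_search` is a hypothetical retriever method.
    return db.similarity_search(question, top_k=3)

def rag_answer(question: str, db, llm) -> str:
    # Augment the prompt with the retrieved context...
    context = "\n".join(retrieve(question, db))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # ...and let the LLM generate a grounded response.
    # `generate` is a hypothetical LLM client method.
    return llm.generate(prompt)
```

The point is the data flow: the retrieved rows travel into the model's prompt, which is exactly why uncontrolled retrieval can leak data the user should never see.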

Many developers have adopted RAG to enable their applications to use the often-proprietary data in their databases. For example, an application may use a RAG agent to access databases containing customer information, proprietary research, or confidential legal documents to correctly answer natural language questions with company context. 

A RAG use case: Cymbal Air

Like any data access paradigm, without a careful approach there is risk. A hypothetical airline we’ll call Cymbal Air is developing an AI assistant that uses RAG to handle these tasks:

  • Book a flight ticket for the user (write to the database)

  • List the user’s booked tickets (read from a database containing private user data)

  • Get flight information for the user, including schedule, location, and seat information (read from a proprietary database)

This assistant needs access to a wide range of operational and user data, and even potentially to write information to the database. However, giving the AI unrestricted access to the database could lead to accidental leaks of sensitive information about a different user. How do we ensure data safety while letting the AI assistant retrieve information from the database? 

A user-centric security design pattern

One way of tackling this problem is by putting limits on what the agent can access. Rather than give the foundation model unbounded access, we can define specific tool functions that the agent uses to access database information securely and predictably. The key steps are:

  • Create a tool function that runs a pre-written SQL query

  • Attach a user authentication header to the tool function

  • When a user asks for database information, the agent parses parameters from the user’s input and feeds them to the tool function
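The steps above can be sketched as a single user-authenticated tool function. This is a hedged sketch under simple assumptions: the `tickets` table, the `validate_token` helper, and the bearer-style auth header are all hypothetical stand-ins for your own schema and identity provider.

```python
import sqlite3

def validate_token(auth_header: str) -> str:
    """Resolve the authenticated user ID from the auth header.
    Hypothetical placeholder: a real app would verify a signed token
    (e.g. an OIDC ID token) against an identity provider."""
    if not auth_header.startswith("Bearer "):
        raise PermissionError("missing or malformed auth header")
    return auth_header.removeprefix("Bearer ")

def list_tickets(conn: sqlite3.Connection, auth_header: str) -> list[tuple]:
    """Tool function: runs a pre-written SQL query scoped to the
    authenticated user, so the model never sees other users' rows."""
    user_id = validate_token(auth_header)
    # Pre-written, parameterized query: the model writes no SQL and
    # supplies, at most, parameters parsed from the user's request.
    return conn.execute(
        "SELECT flight, seat FROM tickets WHERE user_id = ?",
        (user_id,),
    ).fetchall()
```

Because the `WHERE user_id = ?` filter is baked into the query, a prompt-injected request for another user's tickets still returns only the caller's own rows.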

In essence, the authentication design rests on user-authenticated tool functions whose internals remain opaque to the foundation model agent: the model chooses which tool to call and with which parameters, but never sees the SQL or the credentials.
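To make that opacity concrete, here is a hedged sketch of the dispatch layer between a generic function-calling model and the tool functions. The `tool_call` shape and the tool names are assumptions; the key property is that the agent, not the model, injects the session's auth header.

```python
def dispatch(tool_call: dict, session_auth_header: str, tools: dict):
    """Route a model-emitted tool call to the matching Python callable.
    The auth header comes from the user's session, never from the model,
    so the model cannot impersonate a different user."""
    fn = tools[tool_call["name"]]            # e.g. "list_tickets"
    params = tool_call.get("arguments", {})  # parameters the model parsed
    # The agent attaches the user's credentials itself.
    return fn(auth_header=session_auth_header, **params)
```

Even if the model is tricked into emitting a malicious tool call, the only identity it can act under is the one bound to the current session.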

