Today we are announcing the open-source GenAI Databases Retrieval App, now available on GitHub. This project is a sample application that demonstrates production-quality practices for using techniques like Retrieval Augmented Generation (RAG) and ReACT to extend your gen AI application with information from Google Cloud databases.
Large language models (LLMs) have become increasingly powerful and accessible in recent years, opening up new possibilities for generative AI applications. LLMs can now generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
However, LLM-based applications also have some limitations – such as being unable to answer questions about data the model wasn’t trained on (such as live or real-time data), or hallucinating fake or misleading information. These limitations often compound on each other, leading an LLM to provide incorrect answers when asked about information it wasn’t trained on.
The GenAI Databases Retrieval App shows how to work around these limitations by extending LLM-based applications with information from a Google Cloud database, such as AlloyDB for PostgreSQL. Cloud databases provide a managed solution for storing and accessing data in a scalable and reliable way. By connecting an LLM to a cloud database, developers can give their applications access to a wider range of information and reduce the risk of hallucinations.
Reducing hallucinations with RAG
One of the best tools for reducing hallucinations is Retrieval Augmented Generation (RAG). RAG retrieves relevant data or information, augments your prompt to the LLM with it, and allows the model to generate more accurate responses based on the data included in the prompt. This grounds the model’s response, making it less likely to hallucinate. This technique also gives the LLM access to data it didn’t have when it was trained. And unlike fine-tuning, the information retrieved for RAG does not alter the model or otherwise leave the context of the request, making it suitable for use cases where information privacy and security are important.
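To illustrate the flow, here is a minimal RAG sketch in Python. The retrieval step, the example documents, and the prompt template are all hypothetical stand-ins; in a real application, retrieval would be backed by a database query and the augmented prompt would be sent to an LLM API.

```python
# Minimal RAG sketch: retrieve relevant snippets, then augment the prompt.
# The documents and the keyword-overlap retriever are hypothetical placeholders
# standing in for a real database-backed (e.g. vector) search.

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Naive keyword-overlap ranking standing in for a real retrieval query."""
    q_terms = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Augment the user question with retrieved context to ground the answer."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the information below.\n"
        f"Information:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

documents = [
    "Flight CY 922 departs San Francisco (SFO) at 10:05 AM.",
    "The airport lounge on Concourse B opens at 6:00 AM.",
]
question = "When does flight CY 922 leave?"
prompt = build_prompt(question, retrieve(question, documents))
# The augmented prompt is then sent to the LLM, which answers from the context.
print(prompt)
```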
One important limitation to RAG is prompt size. LLMs can only process prompts up to a certain length (the context window), and longer prompts increase cost and latency. Because of this limitation, it’s important to retrieve only the most relevant information needed by the LLM to generate the correct response. Fortunately, databases are designed for precise queries that retrieve only the most relevant data, especially when they combine support for structured queries (using SQL) with semantic similarity using vector embeddings.
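To make that concrete, the sketch below shows how a structured SQL filter and a vector-similarity ordering can be combined in a single query so only a handful of relevant rows ever reach the prompt. The `products` table, `embedding` column, connection settings, and the `embed()` helper in the usage comment are hypothetical, and the example assumes the pgvector extension (which AlloyDB for PostgreSQL supports) is enabled.

```python
# Sketch: retrieve only the most relevant rows by combining a structured
# WHERE filter with a pgvector similarity ordering. Schema and connection
# details are hypothetical.
import asyncio
import asyncpg

FIND_SIMILAR = """
    SELECT id, name, description
    FROM products
    WHERE category = $1                  -- structured SQL filter
    ORDER BY embedding <=> $2::vector    -- pgvector cosine-distance ordering
    LIMIT 5                              -- keep the prompt small
"""

async def find_relevant_products(category: str, question_embedding: list[float]):
    conn = await asyncpg.connect("postgresql://user:password@127.0.0.1:5432/assistantdb")
    try:
        # pgvector accepts the query vector in its text form, e.g. '[0.1,0.2,...]'.
        vector_text = "[" + ",".join(str(x) for x in question_embedding) + "]"
        return await conn.fetch(FIND_SIMILAR, category, vector_text)
    finally:
        await conn.close()

# Usage (embed() is a hypothetical function returning the question's embedding):
# rows = asyncio.run(find_relevant_products("luggage", embed("carry-on size limits")))
```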
Using ReACT to trigger RAG
Another increasingly popular technique for use with LLMs is ReACT Prompting. ReACT (a combination of “Reason” and “Act”) is a technique for asking your LLM to think through verbal reasoning. This technique establishes a framework for the model (acting as an ‘agent’) to “think aloud” using a specific template — things like “thoughts,” “actions,” and “observations.” Thoughts encourage the agent to reason about what it needs to do, actions are interpreted by the application as concrete steps to take, and observations are the results of those actions.
The prompt/response to an LLM using ReACT might start like this: