Trained on an enormous corpus of publicly available data from a broad range of topics, large language models (LLMs) are powerful in many ways but can be improved in other areas.
Due to the size of the training data, it can be resource-intensive to train them frequently. As a result, they may not have the most up-to-date information. Moreover, because they are trained on available data, anything behind a corporate firewall is unknown to them. Ask an LLM who won the latest sports game or what the premium is for your health insurance, and it will likely not know the answer. These limitations may be fine for general knowledge questions, but enterprises are looking to leverage LLMs to create generative AI apps that offer high accuracy, can access real-time information, and support complex conversational experiences.
An increasingly popular approach to this problem is to “ground” LLMs by utilizing a technique called Retrieval Augmented Generation (RAG). This opens up new opportunities for enterprises to build gen AI apps that can leverage fresh or proprietary data by enriching LLM prompts to deliver relevant and accurate information. This is especially crucial for companies and industries that are bound by regulations on sensitive information.
The RAG approach
Let’s take a look at how RAG works, using a customer service chatbot example that can answer a wide range of questions including availability, pricing, and return policies. If you asked a typical LLM a generic question such as “what are some popular toys for kids under 5 years old?” it would likely be able to respond with an answer — but since the LLM has no idea about current inventory in stores, the answer is not going to be relevant for shoppers. To make the customer support chatbot use the latest data and policies for the answers, the RAG approach may prove to be effective.
Composed of a pre-step and four steps, this simplified RAG example flows through the process of how an app can provide grounded answers by utilizing the similarity search feature of a database that supports vector indexing.