RAG with BigQuery and LangChain in Google Cloud


Unfortunately, this response doesn’t answer our question. This is no surprise, because the 2024 Cymbal Starlight is a fictional vehicle and its owner’s manual wasn’t included in the LLM’s training data. To overcome this constraint, we can use retrieval-augmented generation, which augments the LLM with proprietary or first-party data, like the 2024 Cymbal Starlight owner’s manual!

Enter retrieval-augmented generation (RAG)

LLMs are powerful tools, but can be limited by their internal knowledge. RAG addresses this by incorporating data from external sources, allowing LLMs to access relevant information in real-time and without having to fine-tune or retrain a model. A simple RAG pipeline has two main components:

  • Data preprocessing: 
    • Input data like documents are split into smaller chunks, converted into vector embeddings, and sent to a vector store for later retrieval
  • Query and retrieval:
    • A user asks a question in natural language. The question is turned into an embedding, and relevant context is retrieved via vector search
    • The context is provided to an LLM to augment its knowledge
    • The LLM generates a response that weaves together the retrieved chunks with its pretrained knowledge and summarization capabilities (see the sketch after this list)
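To make the two phases above concrete, here is a minimal, illustrative sketch of the flow. The splitter, vector store, and LLM objects are stand-ins for whatever concrete LangChain components you use; this is not the notebook's actual code.

```python
# Illustrative sketch of the two RAG phases; splitter, vector_store, and llm
# are placeholders for concrete LangChain components.

def preprocess(documents, splitter, vector_store):
    """Data preprocessing: split documents into chunks and store their embeddings."""
    chunks = splitter.split_documents(documents)
    vector_store.add_documents(chunks)  # the store embeds and persists the chunks

def answer(question, vector_store, llm, k=4):
    """Query and retrieval: embed the question, fetch relevant chunks, generate."""
    docs = vector_store.similarity_search(question, k=k)    # vector search
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt)                                # augmented generation
```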

LangChain

LangChain is an open-source orchestration framework for working with LLMs, enabling developers to quickly build generative AI applications on their data. Google Cloud contributed a new LangChain integration with BigQuery that makes it simple to preprocess your data, generate and store embeddings, and run vector search, all using BigQuery.
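As a rough sketch of what that integration can look like in code, a BigQuery-backed vector store can be created as below. This assumes the langchain-google-community and langchain-google-vertexai packages; the project, dataset, table, and embedding model names are placeholders, not the notebook's exact settings.

```python
# Sketch only: identifiers and model choices below are placeholders/assumptions.
from langchain_google_community import BigQueryVectorStore
from langchain_google_vertexai import VertexAIEmbeddings

embedding = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest",  # placeholder embedding model
    project="your-project-id",
)

vector_store = BigQueryVectorStore(
    project_id="your-project-id",
    dataset_name="rag_demo",        # hypothetical BigQuery dataset
    table_name="owners_manual",     # hypothetical table for chunk embeddings
    location="US",
    embedding=embedding,
)
```

Once created, the store follows the standard LangChain vector store interface: `vector_store.add_documents(chunks)` writes chunk embeddings to the BigQuery table, and `vector_store.similarity_search(query)` runs vector search over them.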

In this demo, we’ll handle both the preprocessing and runtime steps with LangChain. Let’s take a look!

Building a RAG pipeline with BigQuery and LangChain

This blog post highlights a few of the major steps in building a simple RAG pipeline using BigQuery and LangChain. To follow along, dig deeper, and view additional steps, you can make a copy of the notebook, Augment Q&A Generation using LangChain and BigQuery Vector Search, which allows you to run the following example in Colab using your own Google Cloud environment.

Data preprocessing

We begin by reading our document, the 2024 Cymbal Starlight Owner’s Manual, into memory using a LangChain document loader: the file is loaded from Google Cloud Storage and parsed with PyPDFLoader.
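A sketch of that loading step is below; the project, bucket, and object names are placeholders. GCSFileLoader fetches the object from Cloud Storage and hands it to PyPDFLoader for parsing.

```python
# Sketch: load the owner's manual PDF from Cloud Storage into LangChain Documents.
# The project, bucket, and blob names are placeholders.
from langchain_community.document_loaders import GCSFileLoader, PyPDFLoader

loader = GCSFileLoader(
    project_name="your-project-id",
    bucket="your-bucket",
    blob="cymbal-starlight-2024.pdf",
    loader_func=PyPDFLoader,   # parse the downloaded file as a PDF
)
documents = loader.load()      # one Document per page, with text and metadata
```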

Once loaded, we split the document into smaller chunks. Chunking makes RAG more efficient: smaller chunks allow for more targeted retrieval of relevant information and reduce computational load, improving both the accuracy and contextual relevance of generated responses and the response time. We use LangChain’s RecursiveCharacterTextSplitter, which splits text based on rules we define.
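For illustration, the chunking step with RecursiveCharacterTextSplitter can look like the following; the chunk size, overlap, and separators are example values, not necessarily the notebook's settings.

```python
# Sketch: split the loaded documents into overlapping chunks for retrieval.
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,                       # max characters per chunk (example value)
    chunk_overlap=100,                     # overlap preserves context across chunks
    separators=["\n\n", "\n", ". ", " "],  # preferred split points, in order
)
chunks = text_splitter.split_documents(documents)
```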

