RAG and grounding on Vertex AI

Introduced in April, Vertex AI Agent Builder gathers all the surfaces and tools developers need to build enterprise-ready generative AI experiences, apps, and agents.

Among the most powerful of these tools are the components for retrieval augmented generation (RAG) and the unique ability to ground Gemini outputs with Google Search.

Today, I’m pleased to share that we are expanding these grounding capabilities to help our customers build more capable agents and apps:

Grounding with Google Search, now generally available, will soon offer dynamic retrieval, a new capability to help customers balance quality with cost efficiency by intelligently selecting when to use Google Search results and when to use the model’s training data.
Grounding with high-fidelity mode, announced in experimental preview today, is a new feature of our grounded generation API that will further reduce hallucinations.
Grounding with third-party datasets, coming in Q3 this year, will help customers build AI agents and applications that offer more accurate and helpful responses. We are working with specialized providers like Moody’s, MSCI, Thomson Reuters and ZoomInfo to enable access to their datasets.
We’re also expanding Vector Search, the engine powering embeddings-based RAG, to offer hybrid search, now in Public Preview.

Grounding models in world knowledge with Google Search

When customers select Grounding with Google Search for their Gemini model, Gemini uses Google Search and generates an output that is grounded in the relevant search results. It is simple to use, and it makes the world’s knowledge available to Gemini.

These capabilities address some of the most significant hurdles limiting the adoption of generative AI in the enterprise: the fact that models do not know information outside their training data, and the tendency of foundation models to “hallucinate,” or generate convincing yet factually inaccurate information. Retrieval Augmented Generation (RAG), a technique developed to mitigate these challenges, first “retrieves” facts about a question, then provides those facts to the model before it “generates” an answer – this is what we mean by grounding. Getting relevant facts quickly to augment a model’s knowledge is ultimately a search problem.
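The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not the Vertex AI implementation: the function names (`retrieve`, `build_grounded_prompt`) are hypothetical, the retriever ranks documents by simple word overlap rather than real search, and the final call to a model is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents sharing the most words with the question.

    Assumption: word overlap stands in for a real search engine.
    """
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return [d for d in ranked[:k] if q & tokenize(d)]


def build_grounded_prompt(question: str, facts: list[str]) -> str:
    """Place the retrieved facts in front of the question before generation."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"


documents = [
    "The return window for laptops is 30 days.",
    "Support is available on weekdays from 9am to 5pm.",
    "Laptops ship with a one-year warranty.",
]
question = "What is the return window for laptops?"
facts = retrieve(question, documents, k=1)
prompt = build_grounded_prompt(question, facts)
print(prompt)
```

The prompt that reaches the model now carries the retrieved fact alongside the question, which is the "augmentation" step in RAG.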

Leading companies like Quora and Palo Alto Networks are using Google Cloud’s grounding capabilities to power generative AI experiences.

“Grounding with Google Search translates into more accurate, up-to-date, and trustworthy answers,” said Spencer Chan, Product Lead at Quora, which offers Grounding with Google Search on its Poe platform. “We’ve been delighted with the positive feedback so far, as users are now able to interact with Gemini bots with even greater confidence.”

“We sought to optimize both the customer experience and maximize the efficiency of our support agents. In partnership with Google Cloud, this was achieved by integrating generative AI into Palo Alto Networks solutions which enhanced the ability to understand and respond to complex security inquiries,” said Alok Tongaonkar, Senior Director of Data Science at Palo Alto Networks. “This not only empowers customers with self-service troubleshooting, but also alleviates pressure on our support teams. By harnessing the grounding capabilities of Vertex AI Agent Builder alongside the power of Gemini models, we constructed our agents to deliver accurate and timely answers, all grounded in trustworthy data sources. The continuous advancements in Agent Builder’s grounding functionalities promise further refinements in information retrieval and overall efficacy.”

Grounding with Google Search entails additional processing costs, but because Gemini’s training knowledge is extensive, grounding may not be needed for every query. To help customers balance response quality with cost efficiency, Grounding with Google Search will soon offer dynamic retrieval, a novel capability that lets Gemini dynamically choose whether to ground user inquiries in Google Search or to rely on the model’s intrinsic knowledge, which is more cost-efficient.

The model does this based on its ability to understand which prompts are likely to be related to never-changing, slowly-changing, or fast-changing facts. Consider scenarios like inquiring about the latest movies, where Grounding with Google Search can provide the most up-to-date information. Conversely, for general questions, like “Tell me the capital of France,” Gemini can instantly draw from its extensive knowledge, providing responses without the need for external grounding.
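To make the routing decision concrete, here is a minimal sketch under heavy assumptions. In the real capability, Gemini itself judges how time-sensitive a prompt is; the keyword list, the `needs_search_score` function, and the `threshold` parameter below are hypothetical stand-ins for illustration only.

```python
import re

# Assumption: a hand-picked keyword set stands in for the model's own
# judgment about whether a prompt concerns fast-changing facts.
FAST_CHANGING_HINTS = {"latest", "today", "current", "now", "news", "price", "score"}


def needs_search_score(prompt: str) -> float:
    """Return 1.0 if the prompt looks time-sensitive, else 0.0 (toy heuristic)."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return 1.0 if words & FAST_CHANGING_HINTS else 0.0


def choose_source(prompt: str, threshold: float = 0.5) -> str:
    """Ground in search only when the score reaches the threshold."""
    if needs_search_score(prompt) >= threshold:
        return "google_search"
    return "model_knowledge"


for p in ["What are the latest movies?", "Tell me the capital of France"]:
    print(p, "->", choose_source(p))
```

A higher threshold would send fewer prompts to search and lower cost, while a lower threshold would favor freshness.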

Grounding models in enterprise context

At Google Cloud, we firmly believe that the key to unlocking the full potential of generative AI lies in grounding it in “enterprise truth.” This involves connecting AI models to a wealth of reliable information sources, including web data, company documents, operational and analytical databases, enterprise applications, and other relevant sources.

Private data is not on the internet and Google Search wouldn’t be able to find it, so in addition to Grounding with Google Search, we offer multiple ways to apply Google-quality search to your enterprise data. Vertex AI Search works out-of-the-box for most enterprise use cases. And for customers looking to build custom RAG workflows, create semantic search engines, or simply upgrade existing search capabilities, we offer our search component APIs for RAG. This suite of APIs, now generally available, provides high-quality implementations for document parsing, embedding generation, semantic ranking, and grounded answer generation, as well as a fact checking service called check-grounding.
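As a rough illustration of what a fact checking step like check-grounding does conceptually, the sketch below scores each sentence of a candidate answer against a set of facts. It does not call the actual service: the `check_grounding` function here, its return fields, and the word-overlap scoring are all assumptions made for the example.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_grounding(answer: str, facts: list[str]) -> list[dict]:
    """Score each sentence of the answer by its best word overlap with a fact.

    Assumption: overlap ratio is a toy proxy for real support scoring.
    """
    results = []
    for claim in re.split(r"(?<=[.!?])\s+", answer.strip()):
        claim_tokens = tokens(claim)
        if not claim_tokens:
            continue  # skip empty fragments
        best_score, best_index = 0.0, None
        for i, fact in enumerate(facts):
            score = len(claim_tokens & tokens(fact)) / len(claim_tokens)
            if score > best_score:
                best_score, best_index = score, i
        results.append(
            {"claim": claim, "support": round(best_score, 2), "fact_index": best_index}
        )
    return results


facts = ["The return window for laptops is 30 days."]
answer = "The return window for laptops is 30 days. Refunds are paid in gold coins."
results = check_grounding(answer, facts)
for r in results:
    print(r)
```

The second sentence has no supporting fact, so an application could drop it or flag it before showing the answer to a user.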

“Deloitte’s mission is to help our clients identify and realize tangible outcomes that can create differentiated business value. Using Vertex AI Agent Builder’s grounding capabilities, we have built both internal applications to accelerate our own knowledge base as well as external applications for industry clients, such as assisting the application process for an insurance provider-to-care provider search for a healthcare client,” said Gopal Srinivasan, Global Generative AI Leader for Alphabet Google alliance, Deloitte Consulting LLP. “Agent Builder offered us an out-of-the-box RAG system to build trustworthy and relevant generative applications at speed. The new search component APIs in Agent Builder can provide us with even more flexibility and control when creating applications, thereby streamlining the specialized needs of our internal and industry client teams.”

Grounding with high-fidelity mode

The answers generated with RAG-based agents and apps typically merge the provided context from enterprise data with the model’s internal training. While this may be helpful for many use cases, like a travel assistant, industries like financial services, healthcare, and insurance often require the generated response to be sourced from only the provided context. Grounding with high-fidelity mode, announced in experimental preview today, is a new feature of the Grounded Generation API that is purpose-built to support such grounding use cases.

The feature uses a Gemini 1.5 Flash model that has been fine-tuned to generate answers from customer-provided context, which results in higher levels of factuality and fewer hallucinations. The service supports key enterprise use cases such as summarization across multiple documents or data extraction against a corpus of financial data. When high-fidelity mode is enabled, sentences in the answer have sources attached to them, providing support for the stated claims. Grounding confidence scores are also provided.
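One way an application might consume sentence-level sources and confidence scores is sketched below. The `GroundedSentence` structure, its field names, and the `min_confidence` cutoff are hypothetical; they are not the response schema of the Grounded Generation API.

```python
from dataclasses import dataclass


@dataclass
class GroundedSentence:
    # Hypothetical shape: one answer sentence with its supporting sources.
    text: str
    source_ids: list[int]
    confidence: float


def render(sentences: list[GroundedSentence], min_confidence: float = 0.6) -> str:
    """Join sentences, appending citation markers and flagging weak support."""
    rendered = []
    for s in sentences:
        pieces = [s.text]
        if s.source_ids:
            pieces.append("".join(f"[{i}]" for i in s.source_ids))
        if s.confidence < min_confidence:
            pieces.append("(low confidence)")
        rendered.append(" ".join(pieces))
    return " ".join(rendered)


sentences = [
    GroundedSentence("Revenue grew 12% in Q2.", [1], 0.93),
    GroundedSentence("Margins were flat.", [1, 2], 0.41),
]
output = render(sentences)
print(output)
```

Showing citations inline and flagging low-confidence sentences lets reviewers in regulated industries trace each claim back to the provided context.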