
RAGs powered by Google Search technology, Part 1


As you can see in the results above, the semantics of queries and answers are vastly different in many cases. Queries often express the user’s intent (e.g., looking for warm clothing) rather than the answers themselves (e.g., a puffer jacket or vest). A production-grade semantic search is therefore not just a similarity search; it must also provide smart recommendations to users.

Production-grade semantic search + LLM reasoning

In advanced RAG systems, LLM reasoning is commonly employed to overcome the limitations of simple similarity search. Combining LLM reasoning with production-grade semantic search can greatly enhance the effectiveness of the overall RAG system.

As a basic example of LLM reasoning, you may dynamically build the following personalized prompt:

Given that it’s the beginning of winter, a customer is browsing for clothing on an e-commerce site. Winters are cold in their city. They entered “warm clothing for winter” as a search term on the site. What other search terms might they use to find related and cross-sell items?
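In practice, such a prompt can be assembled at runtime from session context and sent to an LLM. The sketch below is a minimal illustration in Python; the context variables, project ID, and the use of the Vertex AI SDK’s GenerativeModel interface with a Gemini model are assumptions for the example, not details from this article.

  import vertexai
  from vertexai.generative_models import GenerativeModel

  # Hypothetical session context gathered by the e-commerce site.
  season = "the beginning of winter"
  climate_note = "Winters are cold in their city."
  search_term = "warm clothing for winter"

  # Dynamically assemble the personalized prompt from that context.
  prompt = (
      f"Given that it's {season}, a customer is browsing for clothing on an "
      f"e-commerce site. {climate_note} They entered \"{search_term}\" as a "
      f"search term on the site. What other search terms might they use to "
      f"find related and cross-sell items?"
  )

  # Ask the model for related and cross-sell query ideas (model name is illustrative).
  vertexai.init(project="your-project", location="us-central1")
  model = GenerativeModel("gemini-1.5-flash")
  print(model.generate_content(prompt).text)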

Responses from an LLM may include the following queries:

  • Type-specific: Warm winter jackets, Cozy knitwear, Thermal leggings, Waterproof snow boots
  • Activity-specific: Ski clothing, Winter running gear, Work-appropriate winter outfits, Cozy homewear
  • Style-specific: Cashmere sweaters, Puffer vests, Statement scarves, Athleisure-inspired winter looks

By building a RAG system that can run these queries against Vertex AI Search, you benefit from both the power of LLM reasoning and production-grade semantic search. The result is a system that can discover a broad array of relevant products matching different requirements and attributes, including type, activity, and style.
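That fan-out step can be sketched with the Vertex AI Search (Discovery Engine) Python client, as below. The project, data store, and serving config names are placeholders, and the expanded query list simply reuses a few of the LLM suggestions above; treat it as a sketch rather than a complete implementation.

  from google.cloud import discoveryengine_v1 as discoveryengine

  # Placeholder resource name for a Vertex AI Search serving config.
  serving_config = (
      "projects/your-project/locations/global/collections/default_collection/"
      "dataStores/your-datastore/servingConfigs/default_search"
  )

  client = discoveryengine.SearchServiceClient()

  # The original query plus expansions produced by the LLM reasoning step.
  expanded_queries = [
      "warm clothing for winter",
      "Warm winter jackets",
      "Ski clothing",
      "Puffer vests",
  ]

  results = {}
  for query in expanded_queries:
      request = discoveryengine.SearchRequest(
          serving_config=serving_config,
          query=query,
          page_size=5,
      )
      # Each response is ranked by Vertex AI Search's semantic relevance.
      response = client.search(request)
      results[query] = [r.document.id for r in response.results]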

Vector search and AI processors serving billions

Another common misconception is that semantic search is a relatively new innovation that has gained popularity with the rise of LLMs. While semantic search may be among the current hot topics, it is actually the result of years of research and development. Google has been at the forefront of semantic search for nearly a decade, starting with a strategic decision back in 2013 to invest in a family of custom, in-house AI processors: the Tensor Processing Unit (TPU).

TPUs are specifically tailored to provide the underlying power needed for machine learning and AI workloads, but their genesis is rooted in the goal of supporting the deep learning needed to deliver a production-grade semantic search experience. The first TPU was deployed to Google Search’s production serving infrastructure in 2015. This substantial investment has helped reduce costs and latency, enabling us to bring a production-grade semantic search experience to billions of users.

Google has spent years investing in and developing a powerful set of search technologies. For instance, Google Search performs semantic search over query and document embeddings with ScaNN, one of the world’s largest and fastest vector search infrastructures. ScaNN powers Google Search and many other Google services, quickly finding highly relevant documents and content to help users get the information they need in seconds. According to public ANN benchmarks, ScaNN delivers one of the industry’s best trade-offs between recall and queries per second, placing it among the state-of-the-art approximate nearest-neighbor algorithms.
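The open-source release of ScaNN gives a feel for this kind of embedding-based retrieval at small scale. The snippet below is a minimal sketch using the scann Python package on randomly generated, unit-normalized embeddings; the toy corpus and the index-tuning parameters are illustrative and say nothing about how Google Search is actually configured.

  import numpy as np
  import scann

  # Toy corpus of document embeddings, unit-normalized so that
  # dot product equals cosine similarity.
  rng = np.random.default_rng(0)
  docs = rng.normal(size=(10_000, 128)).astype(np.float32)
  docs /= np.linalg.norm(docs, axis=1, keepdims=True)

  # Build an approximate nearest-neighbor index (parameters are illustrative).
  searcher = (
      scann.scann_ops_pybind.builder(docs, 10, "dot_product")
      .tree(num_leaves=200, num_leaves_to_search=20, training_sample_size=10_000)
      .score_ah(2, anisotropic_quantization_threshold=0.2)
      .reorder(50)
      .build()
  )

  # In a real system the query would be embedded with the same model as the documents.
  query = docs[0]
  neighbors, distances = searcher.search(query, final_num_neighbors=10)
  print(neighbors, distances)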

Overall, Google’s suite of breakthrough search technologies, including RankBrain, neural matching, ScaNN, and its family of TPUs, represents some of the most valuable technology assets built over the last decade. Vertex AI Search inherits these same technologies, enabling it to deliver Google-quality semantic search with millisecond-level latency at a reasonable cost, all while making the power of AI and large language models, backed by Google’s TPU resources, available to developers as a commercial service.

