- The Visual Scout API creates a vector match request for a given item
- The request first calls Vertex AI Feature Store to retrieve the item's latest image embedding vector
- Using the item embedding, Visual Scout then searches a Vertex AI Vector Search index for the most similar embedding vectors and returns the associated item IDs
- For each visually similar item, product-related metadata (e.g., inventory availability) is used to filter for only items available at the user's selected store location
- The available items and their metadata are sent back to the Visual Scout API for serving on lowes.com
- A daily trigger launches an update job to compute image embeddings for any new items
- Once triggered, Dataproc processes any new item images, converting them to embeddings with the registered computer vision model
- Streaming updates add new image embeddings to the Vertex AI Vector Search serving index (see the upsert sketch after this list)
- New image embedding vectors are ingested to Vertex AI Feature Store online serving nodes, where vectors are indexed by item ID and ingestion timestamp
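For illustration, a minimal sketch of the streaming update step using the Vertex AI Python SDK is shown below, assuming an existing Vector Search index; the project, index resource name, and new_item_embeddings mapping are placeholders rather than Lowe's production code.

```python
from google.cloud import aiplatform
from google.cloud.aiplatform_v1.types import IndexDatapoint

# Placeholder project, region, and index resource name.
aiplatform.init(project="my-project", location="us-central1")
index = aiplatform.MatchingEngineIndex(
    index_name="projects/my-project/locations/us-central1/indexes/1234567890"
)

# Hypothetical output of the Dataproc embedding job: {item_id: embedding}.
new_item_embeddings = {"item-123": [0.12] * 128}

datapoints = [
    IndexDatapoint(datapoint_id=item_id, feature_vector=vector)
    for item_id, vector in new_item_embeddings.items()
]

# Streaming update: upserted datapoints become searchable in the serving
# index shortly after the call, without rebuilding the index.
index.upsert_datapoints(datapoints=datapoints)
```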
Low latency serving with Vertex AI
Each time items are replaced in the recommendation panel, Visual Scout relies on two Vertex AI services to respond in real time: Vector Search and Feature Store.
Vertex AI Feature Store is used to store the latest embedding representation of an item. This includes net new additions to the product catalog, as well as any new images that become available for an item. In the latter case, the previous embedding representation for an item is moved to offline storage, and the latest embedding is kept in online storage. At serving time, the Feature Store look-up retrieves the query item’s most up-to-date embedding representation from the online serving nodes, and passes this to the downstream retrieval task.
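As an illustration of this lookup, the following sketch reads a query item's latest embedding from Feature Store online serving via the Vertex AI Python SDK; the featurestore, entity type, feature, and item IDs are assumed names for the example.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Assumed featurestore and entity type names for the example.
items = aiplatform.featurestore.EntityType(
    entity_type_name="items",
    featurestore_id="visual_scout",
)

# An online read returns the latest ingested value for each feature,
# i.e., the item's most up-to-date embedding representation.
df = items.read(entity_ids=["item-123"], feature_ids=["image_embedding"])
query_embedding = df["image_embedding"].iloc[0]
```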
Next, Visual Scout must find, within a database of diverse items, the products most similar to the query item based on their embedding vectors. This kind of nearest neighbor search requires computing the similarity between the query vector and every candidate item vector, and at this scale the computation can quickly become a bottleneck for retrieval, especially when performing an exhaustive (i.e., brute-force) search. To address this bottleneck, Vertex AI Vector Search implements an approximate search, allowing the vector retrieval to meet our low-latency serving requirements.
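The retrieval call itself might look like the sketch below, which queries an index deployed to an endpoint; the endpoint resource name, deployed index ID, and neighbor count are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder endpoint resource name and deployed index ID.
endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name=(
        "projects/my-project/locations/us-central1/indexEndpoints/9876543210"
    )
)

query_embedding = [0.12] * 128  # e.g., the vector read from Feature Store

# Approximate nearest neighbor query against the serving index.
response = endpoint.find_neighbors(
    deployed_index_id="visual_scout_deployed",
    queries=[query_embedding],
    num_neighbors=20,
)

# One result list per query; each neighbor carries an ID and a distance.
similar_item_ids = [neighbor.id for neighbor in response[0]]
```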
Both of these services help Visual Scout process a high volume of requests while maintaining low-latency responses. The 99th percentile response times of approximately 180 milliseconds align with our performance expectations and ensure a smooth, responsive user experience.
Why is Vertex AI Vector Search so fast?
Vertex AI Vector Search is a managed service providing efficient vector similarity search and retrieval from a billion-scale vector database. As these capabilities are critical to many projects across Google, this service builds upon years of internal research and development. It’s worth mentioning that several foundational algorithms and techniques are also publicly available with ScaNN, an open-source vector search library from Google Research. The purpose of ScaNN is to establish reproducible and credible benchmarking that ultimately advances research in the field. The purpose of Vertex AI Vector Search is to provide a scalable vector search solution for production-ready applications.
ScaNN primer
ScaNN provides an implementation of Google Research’s 2020 ICML paper, “Accelerating Large-Scale Inference with Anisotropic Vector Quantization”, which applies a novel compression algorithm to achieve state-of-the-art performance on nearest neighbor search benchmarks. ScaNN’s high-level workflow for vector similarity search can be described in four phases:
- Partitioning: to reduce the search space, ScaNN performs hierarchical clustering to partition the index and represent its contents as a search tree, where each partition is represented by its centroid. This is usually (but not always) a k-means tree
- Vector quantization: using the asymmetric hashing (AH) algorithm, this step compresses each database vector into a sequence of 4-bit codes, learning a codebook in the process. It's "asymmetric" because only database vectors are compressed, not the query vectors
- Approximate scoring: at query time, AH builds lookup tables of partial dot products, then uses these tables to estimate the dot product between the query and each candidate vector
- Rescoring: given the top-k items from approximate scoring, distances are re-computed with greater precision (e.g., with lower distortion or even the raw datapoint); a builder sketch mapping these phases to code follows this list
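These four phases map directly onto the open-source library's builder API. The following sketch follows ScaNN's documented usage to build a tree-AH searcher over random data; all parameter values are illustrative, not tuned.

```python
import numpy as np
import scann

# Illustrative data: 100k 128-d database vectors, L2-normalized so that
# dot product equals cosine similarity.
dataset = np.random.rand(100_000, 128).astype(np.float32)
dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)
query = dataset[0]

searcher = (
    scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
    # Partitioning: k-means tree over the index.
    .tree(num_leaves=1000, num_leaves_to_search=100, training_sample_size=25_000)
    # Vector quantization + approximate scoring via asymmetric hashing.
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    # Rescoring: re-compute distances for the top 100 candidates.
    .reorder(100)
    .build()
)

neighbors, distances = searcher.search(query)
```

Raising num_leaves_to_search or the reorder budget trades latency for recall, which is the central tuning knob in approximate nearest neighbor search.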
Building an index optimized for serving
To build an index optimized for low-latency serving, Vertex AI Vector Search uses ScaNN’s tree-AH algorithm. The term “tree-AH” refers to a tree-X hybrid model consisting of (1) a partitioning “tree” and (2) a leaf searcher (in this case “AH” or asymmetric hashing). Essentially, it combines two complementary algorithms:
- Tree-X, a k-means tree: a hierarchical clustering algorithm that reduces the search space by partitioning the index into a search tree, where each partition is represented by the centroid of the data points belonging to it
- Asymmetric Hashing (AH), a highly optimized approximate distance computation routine used to score the similarity between a query vector and the partition centroids at each level of the search tree (see the index configuration sketch below)
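In Vertex AI, the same tree-AH configuration is exposed when creating an index. A minimal sketch, assuming embeddings are already staged in Cloud Storage (the bucket path, dimensionality, and tuning values below are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# contents_delta_uri points at embedding files staged in Cloud Storage
# (placeholder path); dimensions and tuning values are illustrative.
index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="visual-scout-items",
    contents_delta_uri="gs://my-bucket/embeddings/",
    dimensions=128,
    approximate_neighbors_count=100,    # candidates passed to rescoring
    leaf_node_embedding_count=1000,     # target datapoints per partition (leaf)
    leaf_nodes_to_search_percent=10,    # fraction of partitions probed per query
    distance_measure_type="DOT_PRODUCT_DISTANCE",
)
```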