How Wayfair improves its feature engineering with Vertex AI

Here at Wayfair, our data scientists rely on multiple sources of data to obtain features for model training. An ad hoc approach to feature engineering led to multiple versions of the same feature definitions, which made it hard to share features between models. Most features were stored and used with minimal oversight of freshness, schema, and data guarantees. As a result, our data scientists frequently saw models perform differently in development and in production, and the feedback loop for retraining was cumbersome. Curating new stable features and developing new model versions often took several months.

To address these issues, the Service Intelligence team at Wayfair decided to create a centralized feature engineering system. Our goal was to standardize feature definitions, automate ingestion processes, and simplify maintenance. We worked with Google to adopt several Vertex AI offerings, chiefly Vertex AI Feature Store and Vertex AI Pipelines. The former provides a centralized repository for organizing, storing, and serving ML features, and the latter helps to automate, monitor, and manage ML workflows. These offerings became the two main components of our feature engineering architecture.

On the data side, we developed workflows to streamline the flow of raw feature data into BigQuery tables. We created a centralized repository of feature definitions that specify how each feature should be pulled, processed, and stored in the feature store. Using the Vertex AI Feature Store’s API, we automatically create features from those definitions. We use GitHub’s PR approval process to enforce governance and track changes.
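
To make the idea of a definition-driven feature repository concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the actual Wayfair implementation: the `FeatureDefinition` fields, the ID naming rule, the example table names, and the `build_feature_configs` helper are all hypothetical. The sketch validates each definition and groups the results by entity type into plain dictionaries that could then be handed to a feature-creation call.

```python
import re
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass

# Value types supported for features in Vertex AI Feature Store.
ALLOWED_VALUE_TYPES = frozenset({
    "BOOL", "BOOL_ARRAY", "DOUBLE", "DOUBLE_ARRAY",
    "INT64", "INT64_ARRAY", "STRING", "STRING_ARRAY", "BYTES",
})

# Illustrative naming rule (an assumption, not the service's exact rule):
# lowercase letters, digits, and underscores, not starting with a digit.
_ID_PATTERN = re.compile(r"^[a-z_][a-z0-9_]*$")


@dataclass(frozen=True)
class FeatureDefinition:
    """One reviewed entry in the (hypothetical) definitions repository."""
    feature_id: str
    entity_type: str      # e.g. "customer"
    value_type: str       # one of ALLOWED_VALUE_TYPES
    source_table: str     # BigQuery table holding the raw feature data
    source_column: str    # column to ingest from that table
    description: str = ""


def validate(definition: FeatureDefinition) -> None:
    """Raise ValueError if a definition breaks the repository's rules."""
    if not _ID_PATTERN.match(definition.feature_id):
        raise ValueError(f"invalid feature id: {definition.feature_id!r}")
    if definition.value_type not in ALLOWED_VALUE_TYPES:
        raise ValueError(
            f"{definition.feature_id}: unsupported value type "
            f"{definition.value_type!r}"
        )


def build_feature_configs(
    definitions: Iterable[FeatureDefinition],
) -> dict[str, dict[str, dict[str, str]]]:
    """Group validated definitions as {entity_type: {feature_id: config}}."""
    configs: dict[str, dict[str, dict[str, str]]] = defaultdict(dict)
    for definition in definitions:
        validate(definition)
        per_entity = configs[definition.entity_type]
        if definition.feature_id in per_entity:
            raise ValueError(f"duplicate feature id: {definition.feature_id}")
        per_entity[definition.feature_id] = {
            "value_type": definition.value_type,
            "description": definition.description,
        }
    return dict(configs)


# Hypothetical definitions, as they might appear after PR review.
definitions = [
    FeatureDefinition(
        feature_id="order_count_30d",
        entity_type="customer",
        value_type="INT64",
        source_table="my-project.features.customer_orders",
        source_column="order_count_30d",
        description="Orders placed in the last 30 days",
    ),
    FeatureDefinition(
        feature_id="avg_ticket_resolution_hours",
        entity_type="customer",
        value_type="DOUBLE",
        source_table="my-project.features.customer_service",
        source_column="avg_resolution_hours",
    ),
]

feature_configs = build_feature_configs(definitions)
```

Each inner dictionary is intended to match the shape of the `feature_configs` argument accepted by `EntityType.batch_create_features` in the `google-cloud-aiplatform` SDK, so a creation job could loop over entity types and submit one batch per type; check this against the SDK version in use. Keeping validation separate from the API call means malformed definitions can be rejected in CI during PR review, before anything reaches the feature store.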