
How Vertex AI helps Wayfair achieve real-time model serving


The Service Intelligence team here at Wayfair is responsible for maintaining multiple machine learning models that continuously improve Wayfair’s customer service experience. For example, we provide our support staff with new capabilities such as predicting a customer’s intent or calculating the optimal discount to offer for a damaged item. Historically, most of our models made batch predictions that were then cached for online serving. In 2023 we moved our models from batch inference to real-time inference, and in doing so we wanted an architecture that would let us deploy and serve real-time models safely, effectively, and efficiently.

Wayfair already has an in-house solution for real-time model serving, but our team appreciates the simplicity and ease of use offered by Vertex AI prediction endpoints. One feature that is important to us is that the creation and deletion of Vertex AI endpoints can be automated in code, something that is more challenging with our in-house solution.
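To illustrate what automating the endpoint lifecycle looks like, here is a minimal sketch using the `google-cloud-aiplatform` Python SDK. The function names, project, region, and machine sizing are placeholder assumptions, not Wayfair's actual code:

```python
# Hypothetical sketch of automating Vertex AI endpoint creation and deletion
# with the google-cloud-aiplatform SDK. Project, region, and machine settings
# are illustrative placeholders.

def create_and_deploy(project: str, region: str, model_resource_name: str):
    """Create a Vertex AI endpoint and deploy an already-uploaded model to it."""
    # Import is deferred so the sketch can be defined without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    endpoint = aiplatform.Endpoint.create(display_name="svc-intel-endpoint")
    model = aiplatform.Model(model_name=model_resource_name)
    model.deploy(
        endpoint=endpoint,
        machine_type="n1-standard-4",
        min_replica_count=1,
        max_replica_count=2,
        traffic_percentage=100,  # route all traffic to this deployment
    )
    return endpoint


def tear_down(endpoint) -> None:
    """Undeploy all models from the endpoint, then delete it."""
    endpoint.undeploy_all()
    endpoint.delete()
```

Because both creation and teardown are plain function calls, they can be wired into a CI/CD pipeline so that endpoints track a declared configuration rather than being managed by hand.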

To facilitate the rapid deployment of new models, we require that all models be registered in MLflow, as this allows us to take advantage of the uniform interface MLflow provides. We follow an infrastructure-as-code paradigm by storing the desired state of all deployed models in a GitHub repository.
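As a sketch of what such a desired-state file might contain, here is a hypothetical YAML entry; the schema, field names, and model name are illustrative assumptions, not the repository's actual format:

```yaml
# Illustrative desired-state entry stored in the GitHub repo
# (one entry per deployed model; all names are hypothetical).
models:
  - name: discount-optimizer
    mlflow_registry_name: discount_optimizer   # model as registered in MLflow
    mlflow_stage: Production                   # which registry stage to deploy
    endpoint:
      region: us-central1
      machine_type: n1-standard-4
      min_replicas: 1
      max_replicas: 2
```

A deployment pipeline can then diff this declared state against the endpoints that actually exist and create, update, or delete them to reconcile the two.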
