
Choosing a self-hosted or managed solution for AI app development


In today’s technology landscape, building or modernizing applications demands a clear understanding of your business goals and use cases. This insight is crucial for leveraging emerging tools effectively, especially generative AI foundation models such as large language models (LLMs).

LLMs offer significant competitive advantages, but implementing them successfully hinges on a thorough grasp of your project requirements. A key decision in this process is choosing between a managed LLM solution like Vertex AI and a self-hosted option on a platform such as Google Kubernetes Engine (GKE).

In this blog post, we equip developers, operations specialists, and IT decision-makers to answer the critical questions of “why” and “how” to deploy modern apps for LLM inference. We’ll address the balance between ease of use and customization, helping you optimize your LLM deployment strategy. By the end, you’ll understand how to:

  • deploy a Java app on Cloud Run for efficient LLM inference, showcasing the simplicity and scalability of a serverless architecture.

  • use GKE as a robust AI infrastructure platform that complements Cloud Run for more complex LLM deployments.

Let’s get started!

Why Google Cloud for AI development

But first, what factors do you need to consider when building, deploying, and scaling LLM-powered applications? Developing an AI application on Google Cloud can deliver the following benefits:

  • Choice: Use managed LLMs or bring your own open-source models to Vertex AI.

  • Flexibility: Deploy on Vertex AI or leverage GKE for a custom infrastructure tailored to your LLM needs.

  • Scalability: Scale your LLM infrastructure as needed to handle increased demand.

  • End-to-end support: Benefit from a comprehensive suite of tools and services that cover the entire LLM lifecycle.

Managed vs. self-hosted models

When weighing the choices for AI development on Google Cloud against your long-term strategic goals, consider factors such as team expertise, budget constraints, and customization requirements. Let’s compare the two options in brief.

Managed solution 

Pros:

  • Ease of use with simplified deployment and management

  • Automatic scaling and resource optimization

  • Managed updates and security patches by the service provider

  • Tight integration with other Google Cloud services

  • Built-in compliance and security features

Cons:

  • Limited ability to customize the infrastructure and deployment environment

  • Potential vendor lock-in

  • Higher costs than self-hosted options, especially at scale

  • Less control over the underlying infrastructure

  • Possible limitations on model selection

Self-hosted on GKE

Pros:

  • Full control over deployment environment

  • Potential for lower costs at scale

  • Freedom to choose and customize any open-source model

  • Greater portability across cloud providers

  • Fine-grained performance and resource optimization

Cons:

  • Significant DevOps expertise required for setup, maintenance, and scaling

  • Responsibility for updates and security

  • Manual configuration for scaling and load balancing

  • Additional effort for compliance and security

  • Higher initial setup time and complexity

In short, managed solutions like Vertex AI are ideal for teams that want quick deployment with minimal operational overhead, while self-hosted solutions on GKE offer full control and potential cost savings for strong technical teams with specific customization needs. Let’s look at a couple of examples.

Build a gen AI app in Java, deploy on Cloud Run

For this blog post, we wrote an application that allows users to retrieve quotes from famous books. The initial functionality was retrieving quotes from a database; however, gen AI capabilities offer an expanded feature set, allowing users to retrieve quotes from a managed or self-hosted large language model.

The app, including its frontend, is deployed to Cloud Run, while the models are either self-hosted on GKE (leveraging vLLM for model serving) or managed in Vertex AI. The app can also retrieve pre-configured book quotes from a Cloud SQL database.
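
To make this concrete, here’s a minimal Java sketch of how the app’s quote service could call each backend. The Vertex AI path uses the google-cloud-vertexai Java SDK; the self-hosted path posts to the OpenAI-compatible completions endpoint that vLLM serves. The in-cluster service URL, model IDs, and class and method names below are illustrative assumptions rather than the app’s actual code, and JSON parsing is omitted for brevity.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.ResponseHandler;

public class QuoteService {

  // Hypothetical in-cluster DNS name of the vLLM Kubernetes Service on GKE.
  private static final String VLLM_ENDPOINT =
      "http://vllm-service.llm.svc.cluster.local:8000/v1/completions";

  // Managed path: ask a Vertex AI model for a quote.
  static String quoteFromVertexAi(String projectId, String location, String book)
      throws Exception {
    try (VertexAI vertexAi = new VertexAI(projectId, location)) {
      GenerativeModel model = new GenerativeModel("gemini-1.5-flash", vertexAi);
      GenerateContentResponse response =
          model.generateContent("Quote a famous passage from " + book + ".");
      return ResponseHandler.getText(response);
    }
  }

  // Self-hosted path: call the OpenAI-compatible completions API served by vLLM.
  static String quoteFromVllm(String book) throws Exception {
    String body = """
        {"model": "meta-llama/Meta-Llama-3-8B-Instruct",
         "prompt": "Quote a famous passage from %s.",
         "max_tokens": 128}""".formatted(book);
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(VLLM_ENDPOINT))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    return response.body(); // Raw JSON; a real app would parse choices[0].text.
  }
}

Because both paths return plain text from the same service class, the Cloud Run app can switch between managed and self-hosted serving with a configuration change rather than a code change.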

