
Choosing a self-hosted or managed solution for AI app development


In today’s technology landscape, building or modernizing applications demands a clear understanding of your business goals and use cases. This insight is crucial for leveraging emerging tools effectively, especially generative AI foundation models such as large language models (LLMs).

LLMs offer significant competitive advantages, but implementing them successfully hinges on a thorough grasp of your project requirements. A key decision in this process is choosing between a managed LLM solution like Vertex AI and a self-hosted option on a platform such as Google Kubernetes Engine (GKE).

In this blog post, we equip developers, operations specialists, and IT decision-makers to answer the critical questions of “why” and “how” to deploy modern apps for LLM inference. We’ll address the balance between ease of use and customization, helping you optimize your LLM deployment strategy. By the end, you’ll understand how to:

  • deploy a Java app on Cloud Run for efficient LLM inference, showcasing the simplicity and scalability of a serverless architecture.

  • use GKE as a robust AI infrastructure platform that complements Cloud Run for more complex LLM deployments.

Let’s get started!

Why Google Cloud for AI development

But first, what factors do you need to consider when building, deploying, and scaling LLM-powered applications? Developing an AI application on Google Cloud can deliver the following benefits:

  • Choice: Use managed LLMs or bring your own open-source models to Vertex AI.

  • Flexibility: Deploy on Vertex AI or leverage GKE for a custom infrastructure tailored to your LLM needs.

  • Scalability: Scale your LLM infrastructure as needed to handle increased demand.

  • End-to-end support: Benefit from a comprehensive suite of tools and services that cover the entire LLM lifecycle.

Managed vs. self-hosted models

When weighing the choices for AI development on Google Cloud against your long-term strategic goals, consider factors such as team expertise, budget constraints, and customization requirements. Let’s compare the two options in brief.

Managed solution 

Pros:

  • Ease of use with simplified deployment and management

  • Automatic scaling and resource optimization

  • Managed updates and security patches by the service provider

  • Tight integration with other Google Cloud services

  • Built-in compliance and security features

Cons:

  • Limited ability to customize the infrastructure and deployment environment

  • Potential vendor lock-in

  • Higher costs than self-hosted options, especially at scale

  • Less control over the underlying infrastructure

  • Possible limitations on model selection

Self-hosted on GKE

Pros:

  • Full control over deployment environment

  • Potential for lower costs at scale

  • Freedom to choose and customize any open-source model

  • Greater portability across cloud providers

  • Fine-grained performance and resource optimization

Cons:

  • Significant DevOps expertise required for setup, maintenance, and scaling

  • Responsibility for updates and security

  • Manual configuration for scaling and load balancing

  • Additional effort for compliance and security

  • Higher initial setup time and complexity

In short, managed solutions like Vertex AI are ideal for teams that want quick deployment with minimal operational overhead, while self-hosted solutions on GKE offer full control and potential cost savings for strong technical teams with specific customization needs. Let’s look at a couple of examples.

Build a gen AI app in Java, deploy on Cloud Run

For this blog post, we wrote an application that allows users to retrieve quotes from famous books. The initial functionality was retrieving quotes from a database; however, gen AI capabilities offer an expanded feature set, allowing users to retrieve quotes from a managed or self-hosted large language model.

The app, including its frontend, is deployed to Cloud Run, while the models are either self-hosted on GKE (leveraging vLLM for model serving) or managed in Vertex AI. The app can also retrieve pre-configured book quotes from a Cloud SQL database.
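
To make this concrete, here’s a minimal Java sketch of how the app’s quote service could call each backend. The Vertex AI path uses the google-cloud-vertexai Java SDK; the self-hosted path posts to the OpenAI-compatible completions endpoint that vLLM serves. The in-cluster service URL, model IDs, and class and method names below are illustrative assumptions rather than the app’s actual code, and JSON parsing is omitted for brevity.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.ResponseHandler;

public class QuoteService {

  // Hypothetical in-cluster DNS name of the vLLM Kubernetes Service on GKE.
  private static final String VLLM_ENDPOINT =
      "http://vllm-service.llm.svc.cluster.local:8000/v1/completions";

  // Managed path: ask a Vertex AI model for a quote.
  static String quoteFromVertexAi(String projectId, String location, String book)
      throws Exception {
    try (VertexAI vertexAi = new VertexAI(projectId, location)) {
      GenerativeModel model = new GenerativeModel("gemini-1.5-flash", vertexAi);
      GenerateContentResponse response =
          model.generateContent("Quote a famous passage from " + book + ".");
      return ResponseHandler.getText(response);
    }
  }

  // Self-hosted path: call the OpenAI-compatible completions API served by vLLM.
  static String quoteFromVllm(String book) throws Exception {
    String body = """
        {"model": "meta-llama/Meta-Llama-3-8B-Instruct",
         "prompt": "Quote a famous passage from %s.",
         "max_tokens": 128}""".formatted(book);
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(VLLM_ENDPOINT))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    return response.body(); // Raw JSON; a real app would parse choices[0].text.
  }
}

Because both paths return plain text from the same service class, the Cloud Run app can switch between managed and self-hosted serving with a configuration change rather than a code change.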

