Vertex AI PaLM and Gemini APIs using Workflows


Introduction

Everyone is excited about generative AI (gen AI) nowadays, and rightfully so. You might be generating text with PaLM 2 or Gemini Pro, generating images with Imagen 2, translating code from one language to another with Codey, or describing images and videos with Gemini Pro Vision.

No matter how you’re using gen AI, at the end of the day, you’re calling an endpoint, whether through an SDK, a library, or a REST API. Workflows, my go-to service to orchestrate and automate other services, is more relevant than ever when it comes to gen AI.

In this post, I show you how to call some of the gen AI models from Workflows and also explain some of the benefits of using Workflows in a gen AI context.

Generating histories of a list of countries

Let’s start with a simple use case. Imagine you want the large language model (LLM) to generate a paragraph or two on the history of each country in a list and then combine the results into a single text.

One way of doing this is to send the full list of countries to the LLM and ask for the history of each country in a single request. This might work, but LLM responses have a size limit and you might run into that limit with many countries.

Another way is to ask the LLM to generate the history of each country one by one, get the result for each country, and combine the histories afterwards. This gets around the response size limit, but now you have another problem: it’ll take much longer because the LLM generates each country’s history sequentially.

Workflows offers a third and better alternative. Using parallel steps in Workflows, you can ask the LLM to generate the history of each country in parallel. Each response stays small, so you avoid the response size limit, and because all the calls to the LLM happen at the same time, you also avoid the delay of sequential calls.
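To make the idea concrete, here is a minimal sketch of the parallel pattern on its own, with no LLM call yet. It assumes the workflow receives a hypothetical `items` list as a runtime argument; the step names and the `results` map are illustrative, not taken from the final workflow.

```yaml
main:
  params: [args]
  steps:
    - init:
        assign:
          # Map that the parallel iterations write into
          - results: {}
    - process_in_parallel:
        parallel:
          # Variables written inside parallel branches must be declared as shared
          shared: [results]
          for:
            value: item
            in: ${args.items}
            steps:
              - store_result:
                  assign:
                    # Placeholder work; a real workflow would call a service here
                    - results[item]: '${"processed " + item}'
    - done:
        return: ${results}
```

Each iteration of the `for` loop can run concurrently, and every iteration writes its own key into the shared map, so the results are already combined when the loop finishes.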

Call Vertex AI PaLM 2 for Text from Workflows in parallel

Let’s now see how to implement this use case with Workflows. For the model, let’s use Vertex AI’s PaLM 2 for Text (text-bison) for now.

You should familiarize yourself with the Vertex AI REST API that Workflows will use, the PaLM 2 for Text documentation, and the predict method that you’ll be using to generate text with the text-bison model.

I’ll save you some time and show you the full workflow (country-histories.yaml) here:
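A minimal sketch of such a workflow follows. It assumes the countries arrive as a `countries` runtime argument, that the model is served from `us-central1`, and that the text-bison `predict` response carries the generated text in `predictions[0].content`. The prompt wording, the parameter values, and the step names are illustrative assumptions rather than a definitive listing.

```yaml
main:
  params: [args]
  steps:
    - init:
        assign:
          - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          - location: "us-central1"   # assumed region
          - model: "text-bison"
          - method: "predict"
          # Vertex AI REST endpoint for the publisher model's predict method
          - llm_api_endpoint: '${"https://" + location + "-aiplatform.googleapis.com/v1/projects/" + project + "/locations/" + location + "/publishers/google/models/" + model + ":" + method}'
          - histories: {}
    - loop_over_countries:
        parallel:
          # histories is written by every branch, so it must be shared
          shared: [histories]
          for:
            value: country
            in: ${args.countries}
            steps:
              - ask_llm:
                  call: http.post
                  args:
                    url: ${llm_api_endpoint}
                    auth:
                      type: OAuth2
                    body:
                      instances:
                        - prompt: '${"Can you tell me about the history of " + country}'
                      parameters:   # assumed sampling settings
                        temperature: 0.5
                        maxOutputTokens: 2048
                        topP: 0.8
                        topK: 40
                  result: llm_response
              - add_to_histories:
                  assign:
                    - histories[country]: ${llm_response.body.predictions[0].content}
    - return_result:
        return: ${histories}
```

With a sketch like this deployed under a hypothetical name such as `country-histories`, you could pass the input at execution time, for example `{"countries": ["Argentina", "Bhutan", "Cyprus"]}`, and get back a map from each country to its generated history.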
