Google Cloud Gemini, Imagen 2, and MLOps updates

With access to the widest variety of foundation models from any hyperscale provider, robust infrastructure options, and a deep set of tools for model development and MLOps, Vertex AI is a one-stop platform for not only building generative AI apps and agents, but also deploying and maintaining them. Today, at Google Cloud Next, we’re introducing exciting model updates and platform capabilities that continue to enhance Vertex AI:

Gemini 1.5 Pro is now available in public preview in Vertex AI, bringing the world’s largest context window to developers everywhere. Imagen 2.0, our family of image generation models, can now be used to create short, 4-second live images from text prompts. We’re also making image editing generally available in Imagen 2.0, including inpainting/outpainting and digital watermarking. Additionally, we’re adding CodeGemma to Vertex AI, a new model from our Gemma family of lightweight models.

Because response accuracy is critical for gen AI services, we are expanding our grounding capabilities in Vertex AI, including the ability to directly ground responses with Google Search, now in public preview. Vertex AI users now have access to fresh, high-quality information that significantly improves the accuracy of model responses.

To help our customers manage and deploy models in production, we’re expanding our MLOps capabilities for gen AI, including new prompt management and evaluation services for large models. These features make it easier for organizations to get the best performance from gen AI models at scale, and to iterate more quickly from experimentation to production.

Let’s dive deeper into these announcements.

Giving customers the best selection of enterprise-ready models

We’re doubling down on our mission to give customers the best selection of enterprise-ready models. In just the last two months, we’ve added access in Vertex AI to a variety of cutting-edge first-party, third-party, and open models. These include Google’s Gemini 1.0 Pro; Gemma, the lightweight family of open models based on the research and technology we used to create Gemini; and Anthropic’s Claude 3 family of models.

Announced in February, Gemini 1.5 Pro is now in public preview, bringing the power of the world’s first 1 million-token context window to customers. This breakthrough allows natively multimodal reasoning over enormous amounts of data specific to a request.
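
As a rough illustration of what a 1 million-token window means in practice, the sketch below estimates whether a set of documents would fit in a single request. The 4-characters-per-token ratio is an assumption (a common rule of thumb for English text, not a figure from the Gemini tokenizer), and the helper names are hypothetical; an actual token count from the model's own tokenizer should be preferred where available.

```python
# Context window size quoted in the announcement.
CONTEXT_WINDOW_TOKENS = 1_000_000

# ASSUMPTION: rough heuristic for English text, not a Gemini tokenizer value.
CHARS_PER_TOKEN = 4


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Estimate the token count of a string using ceiling division."""
    return -(-len(text) // chars_per_token)


def fits_in_context(documents, window=CONTEXT_WINDOW_TOKENS, reserve=0):
    """Return (estimated_total_tokens, fits) for a list of document strings.

    `reserve` is a hypothetical budget held back for the prompt and the reply.
    """
    total = sum(estimate_tokens(doc) for doc in documents)
    return total, total <= window - reserve


if __name__ == "__main__":
    docs = ["a" * 4000, "b" * 10]
    total, fits = fits_in_context(docs, reserve=8_000)
    print(f"estimated tokens: {total}, fits: {fits}")
```

Because the estimate is only a heuristic, it is best used as an early sanity check before sending a large collection of documents or a codebase to the model.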

We’re seeing customers create entirely new use cases, including building AI-powered customer service agents and academic tutors, analyzing large collections of complex financial documents, detecting gaps in documentation, and exploring entire codebases or data collections via natural language.

For example, United Wholesale Mortgage is using Gemini 1.5 Pro to enrich the underwriting process and to automate the mortgage application process.

SAP is exploring opportunities to include the model in the SAP generative AI hub, which facilitates relevant, reliable, and responsible business AI and provides instant access to a broad range of large language models.

TBS, one of the main commercial broadcasters in Japan, is using Gemini 1.5 Pro to automate metadata tagging on their large media archives, improving efficiency and creating space for more creative work.

And Replit is testing Gemini 1.5 Pro to generate, explain, and transform code with higher speed, accuracy, and performance.

In addition, we are announcing that Gemini 1.5 Pro on Vertex AI can now process audio streams, including speech and even the audio portion of videos. This enables seamless cross-modal analysis that provides insights across text, images, videos, and audio, such as using the model to transcribe, search, analyze, and answer questions across earnings calls or investor meetings.

Imagen delivers advanced generative media capabilities

While the Gemini models are great for advanced reasoning and general-purpose use cases, task-specific gen AI models can help enterprises deliver specialized capabilities. We’re seeing organizations like Shutterstock and Rakuten leverage Imagen 2.0 to generate high-quality, highly accurate images at enterprise scale.

Today’s preview of text-to-live image capabilities makes Imagen even more powerful for enterprise workloads. It allows marketing and creative teams to generate animated images, such as GIFs, from a text prompt. Initially, live images will be delivered at 24 frames per second (fps) with a resolution of 360×640 pixels and a duration of 4 seconds, with plans for continuous enhancements.
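
To put those preview figures in perspective, here is a small back-of-the-envelope calculation of how many frames a live image contains and how much raw pixel data that represents. The 3-bytes-per-pixel value (uncompressed 8-bit RGB) is an assumption for illustration only; the announcement does not describe the output encoding, and delivered files would normally be far smaller after compression.

```python
# Figures quoted in the announcement for the initial preview.
FPS = 24
DURATION_S = 4
WIDTH, HEIGHT = 360, 640

# ASSUMPTION: uncompressed 8-bit RGB, used only to size the raw pixel data.
BYTES_PER_PIXEL = 3


def live_image_stats(fps=FPS, duration_s=DURATION_S,
                     width=WIDTH, height=HEIGHT,
                     bytes_per_pixel=BYTES_PER_PIXEL):
    """Return frame count, pixels per frame, and raw (unencoded) byte size."""
    frames = fps * duration_s
    pixels_per_frame = width * height
    raw_bytes = frames * pixels_per_frame * bytes_per_pixel
    return {
        "frames": frames,
        "pixels_per_frame": pixels_per_frame,
        "raw_bytes": raw_bytes,
    }


if __name__ == "__main__":
    stats = live_image_stats()
    print(stats)
```

Under these assumptions, a 4-second live image has 96 frames of 230,400 pixels each, or roughly 66 MB of raw pixel data before any encoding.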

Given the focused design of this model for enterprise applications, it’s adept at themes such as nature, food imagery, and animals. It can generate a range of camera angles and motions while supporting consistency over the entire sequence. Upholding our commitment to trust between creators and users, Imagen for live image generation is equipped with safety filters and digital watermarks.