Many enterprises are exploring ways to incorporate the benefits of generative AI (gen AI) into their business. The 2023 Gartner® report We Shape AI, AI Shapes Us: 2023 IT Symposium/Xpo Keynote Insights, 16 October 2023, states that “most organizations are using, or plan to use, everyday AI to boost productivity. In the 2024 Gartner CIO and Technology Executive Survey, 80% of respondents said they are planning adoption of generative AI within three years.”1
Enterprises looking to deploy large language models (LLMs) face a unique set of networking challenges compared with serving traditional web applications. That’s because gen AI applications exhibit significantly different traffic behavior from most other web applications.
For example, web applications usually exhibit predictable traffic patterns, with requests and responses processed in relatively small amounts of time, typically measured in milliseconds. In contrast, gen AI inference requests vary widely in size and duration, in part because of their multimodal inputs and outputs, which presents some unique challenges. At the same time, a single LLM query can consume 100% of a GPU’s or TPU’s compute, whereas a typical web server processes many requests in parallel. Because of this computational cost, inference latencies range from seconds to minutes.
As a result, traditional round-robin or utilization-based traffic management techniques are generally not well suited to gen AI applications. To deliver the best end-user experience for gen AI applications, and to make efficient use of limited and costly GPU and TPU resources, we recently announced several new networking capabilities that optimize traffic for AI applications.
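To make the contrast concrete, here is a minimal Python sketch of the idea behind inference-aware routing. It is illustrative only, not any Google Cloud API: the replica list and its queue_depth metric are assumptions standing in for telemetry that a real model server would report.

```python
# Minimal sketch: why round-robin falls short for LLM serving.
# Assumed telemetry: each replica reports how many requests are queued
# on its accelerator (a stand-in for real model-server metrics).
replicas = [
    {"name": "gpu-0", "queue_depth": 7},
    {"name": "gpu-1", "queue_depth": 1},
    {"name": "gpu-2", "queue_depth": 4},
]

def round_robin(replicas, counter):
    # Classic round-robin: picks replicas in order, ignoring load.
    # Fine when every request is cheap and short-lived.
    return replicas[counter % len(replicas)]

def least_queued(replicas):
    # Inference-aware choice: a single LLM request can monopolize a GPU
    # for seconds to minutes, so route to the shortest queue instead.
    return min(replicas, key=lambda r: r["queue_depth"])

print(round_robin(replicas, counter=5)["name"])  # "gpu-2", regardless of load
print(least_queued(replicas)["name"])            # "gpu-1", the least busy
```

Utilization-based balancing has a similar blind spot: a GPU running one long generation can report 100% utilization even with an empty queue, which is why a model-level signal like queue depth is the more useful input for routing decisions.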
Many of these innovations are built into Vertex AI. Now they are also available in Cloud Networking, so you can use them regardless of which LLM platform you choose.
Let’s take a deeper look.
1. Accelerated AI training and inference with Cross-Cloud Network
According to an IDC report, 66% of enterprises list generative AI and AI/ML workloads among their top use cases for multicloud networking.2 This is because the data required for model training, fine-tuning, retrieval-augmented generation (RAG), or grounding resides in many disparate environments, and it needs to be remotely accessed or copied so it is available to LLMs.
Last year, we introduced Cross-Cloud Network, which provides service-centric, any-to-any connectivity built on Google’s global network, making it easier to build and assemble distributed applications across clouds.
Cross-Cloud Network includes products that provide reliable, secure, and SLA-backed cross-cloud connectivity for high-speed data transfer between clouds, helping move the vast volumes of data required for gen AI model training. Products in the solution include Cross-Cloud Interconnect, which offers a managed interconnect with 10 Gbps or 100 Gbps of bandwidth, backed by a 99.99% SLA and end-to-end encryption.
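As a rough, back-of-the-envelope illustration of what those bandwidth tiers mean for moving training data, here is a short Python sketch. The 80% link-efficiency factor and the 100 TB corpus size are assumptions for illustration, not product figures.

```python
def transfer_time_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate wall-clock time to move a dataset over a dedicated link.

    `efficiency` discounts protocol and retransmission overhead; the 0.8
    default is an illustrative assumption, not a measured figure.
    """
    bits = dataset_tb * 8 * 10**12                     # TB -> bits (decimal units)
    seconds = bits / (link_gbps * 10**9 * efficiency)  # effective bits per second
    return seconds / 3600

# A hypothetical 100 TB training corpus:
print(f"10 Gbps : {transfer_time_hours(100, 10):.1f} hours")   # ~27.8 hours
print(f"100 Gbps: {transfer_time_hours(100, 100):.1f} hours")  # ~2.8 hours
```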
Beyond secure and reliable data transfer for AI training, Cross-Cloud Network also lets customers run AI model inference applications across hybrid environments. For example, you can access models running in Google Cloud from application services running in another cloud environment.
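Here is a minimal sketch of that hybrid pattern, assuming an application in another cloud has private reachability (for example, via Cross-Cloud Network) to a model server hosted in Google Cloud. The internal IP, path, and response shape below are hypothetical placeholders, not a real endpoint or API.

```python
import json
import urllib.request

# Hypothetical private address of a model server in Google Cloud, reachable
# from another cloud over Cross-Cloud Network connectivity (placeholder only).
ENDPOINT = "http://10.128.0.42:8080/v1/generate"

payload = json.dumps({"prompt": "Summarize our Q3 results.", "max_tokens": 256}).encode()
request = urllib.request.Request(
    ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
)

# Generous timeout: as noted above, LLM inference can take seconds to minutes.
with urllib.request.urlopen(request, timeout=120) as response:
    print(json.load(response)["text"])  # "text" is an assumed response field
```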