
Learn Live: Load Balancing Azure OpenAI instances using APIM and Container Apps



Full series information: https://aka.ms/learnlive-fta3

More info here: https://aka.ms/learnlive-fta3-Ep10

In this session, we will show how to effectively load balance Azure OpenAI instances to mitigate throttling caused by tokens-per-minute (TPM) and requests-per-minute (RPM) limits, using API Management custom policies.

We will also cover load balancing Azure OpenAI instances using a container deployed via Azure Container Apps.
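
To illustrate the idea behind both approaches, here is a minimal Python sketch (not from the session) of round-robin load balancing with retry on 429 throttling responses. In the session, this same logic runs at the gateway layer (an APIM custom policy or a container app) instead of in the client; the endpoint URLs, keys, and deployment name below are placeholders.

```python
import itertools
import time
import requests

# Hypothetical Azure OpenAI instances; endpoints, keys, deployment name,
# and API version are placeholders for illustration only.
BACKENDS = [
    {"endpoint": "https://aoai-eastus.openai.azure.com", "api_key": "<key-1>"},
    {"endpoint": "https://aoai-westus.openai.azure.com", "api_key": "<key-2>"},
]
DEPLOYMENT = "gpt-4o"
API_VERSION = "2024-02-01"

_rotation = itertools.cycle(BACKENDS)

def chat(messages, max_attempts=4):
    """Round-robin across instances, skipping any that return 429 (TPM/RPM throttling)."""
    for _ in range(max_attempts):
        backend = next(_rotation)
        url = (f"{backend['endpoint']}/openai/deployments/{DEPLOYMENT}"
               f"/chat/completions?api-version={API_VERSION}")
        resp = requests.post(
            url,
            headers={"api-key": backend["api_key"]},
            json={"messages": messages},
            timeout=60,
        )
        if resp.status_code == 429:
            # Throttled: honor Retry-After if present, then try the next instance.
            time.sleep(float(resp.headers.get("Retry-After", 1)))
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("All Azure OpenAI instances are throttled; retry later.")

# Example call:
# print(chat([{"role": "user", "content": "Hello"}]))
```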

———————

Learning objectives

– Discover strategies to enhance the performance and reliability of Azure OpenAI while minimizing throttling due to quota limitations.

———————

Chapters
——–
00:00 – Welcome and introductions
01:29 – Learning objectives
02:50 – Tokens
05:36 – Azure OpenAI Service quotas and limits
11:16 – Tokens Per Minute (TPM)
17:58 – Requests Per Minute (RPM)
20:43 – Dynamic Quota
24:35 – Best practices
27:30 – Challenges
30:24 – Load balancing multiple AOAI instances
33:03 – Review challenges
36:38 – Load balancing strategies
40:10 – Load balancing AOAI with Azure API Management
42:05 – Demo
1:22:47 – Summary and conclusion

———————

Presenters

Andre Dewes
Senior Customer Engineer
Microsoft
– LinkedIn: https://www.linkedin.com/in/andre-dewes-480b5b62/

Srini Padala
Senior Data Engineer
Microsoft
– LinkedIn: https://www.linkedin.com/in/srinivasa-padala/

Moderators

Chris Ayers
Senior Customer Engineer
Microsoft
– LinkedIn: https://www.linkedin.com/in/chris-l-ayers/
– Twitter: https://twitter.com/Chris_L_Ayers
