Full series information: https://aka.ms/learnlive-fta3
More info here: https://aka.ms/learnlive-fta3-Ep10
In this session, we show how to effectively load balance Azure OpenAI instances using Azure API Management custom policies to mitigate throttling caused by tokens-per-minute (TPM) and requests-per-minute (RPM) limits.
We also cover load balancing Azure OpenAI instances with a container deployed via Azure Container Apps.
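To illustrate the idea before the demo, here is a minimal client-side sketch of round-robin load balancing across multiple Azure OpenAI endpoints with retry on HTTP 429 throttling. This is not the API Management policy shown in the session; the endpoint URLs, keys, deployment name, and API version below are placeholders you would replace with your own.

# Minimal sketch: rotate requests across several Azure OpenAI backends and
# back off / fail over when a backend returns 429 (TPM/RPM throttling).
# Endpoints, keys, deployment name, and API version are placeholders.
import itertools
import time
import requests

BACKENDS = [
    {"endpoint": "https://aoai-eastus.openai.azure.com", "key": "<key-1>"},
    {"endpoint": "https://aoai-westus.openai.azure.com", "key": "<key-2>"},
]
DEPLOYMENT = "gpt-4o"          # placeholder deployment name
API_VERSION = "2024-02-01"     # placeholder Azure OpenAI API version

_cycle = itertools.cycle(BACKENDS)

def chat(messages, max_attempts=4):
    """Send a chat completion, rotating backends and honoring Retry-After on 429."""
    for _ in range(max_attempts):
        backend = next(_cycle)
        url = (f"{backend['endpoint']}/openai/deployments/{DEPLOYMENT}"
               f"/chat/completions?api-version={API_VERSION}")
        resp = requests.post(url, headers={"api-key": backend["key"]},
                             json={"messages": messages}, timeout=30)
        if resp.status_code == 429:
            # Throttled: wait as instructed, then try the next backend in the rotation.
            time.sleep(int(resp.headers.get("Retry-After", 1)))
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("All backends throttled; add instances or raise quota.")

# Example: chat([{"role": "user", "content": "Hello"}])

In the session itself, the same spreading and retry logic is implemented centrally in API Management custom policies, so individual clients do not need to know about the backend pool.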
———————
Learning objectives
– Discover strategies to enhance the performance and reliability of Azure OpenAI while minimizing throttling due to quota limitations.
———————
Chapters
——–
00:00 – Welcome and introductions
01:29 – Learning objectives
02:50 – Tokens
05:36 – Azure OpenAI Service quotas and limits
11:16 – Tokens Per Minute (TPM)
17:58 – Requests Per Minute (RPM)
20:43 – Dynamic Quota
24:35 – Best practices
27:30 – Challenges
30:24 – Load balancing multiple Azure OpenAI (AOAI) instances
33:03 – Review challenges
36:38 – Load balancing strategies
40:10 – Load balancing AOAI with Azure API Management
42:05 – Demo
1:22:47 – Summary and conclusion
———————
Presenters
Andre Dewes
Senior Customer Engineer
Microsoft
– LinkedIn: https://www.linkedin.com/in/andre-dewes-480b5b62/
Srini Padala
Senior Data Engineer
Microsoft
– LinkedIn: https://www.linkedin.com/in/srinivasa-padala/
Moderator
Chris Ayers
Senior Customer Engineer
Microsoft
– LinkedIn: https://www.linkedin.com/in/chris-l-ayers/
– Twitter: https://twitter.com/Chris_L_Ayers