# Azure API Management Circuit Breaker and Load Balancing

In this blog post, we will show how Azure API Management improves the resiliency and capacity of Azure OpenAI services. Azure API Management is a tool that helps you create, publish, manage, and secure APIs, providing features such as routing, caching, throttling, authentication, and transformation. With Azure API Management, you can:

- **Distribute requests across multiple instances of the Azure OpenAI service** using priority-based load balancing, with weights distributed within each priority group. This spreads the load across different resources and regions, improving the availability and performance of the service.
- **Implement the circuit breaker pattern** to protect backend services from being overwhelmed by excessive requests. This prevents cascading failures and improves the stability and resilience of your services. You configure circuit breaker properties on the backend resource and define rules for tripping the circuit breaker, such as the number or percentage of failure conditions within a defined time window and a range of status codes that indicate failures.

Diagram 1: API Management with the circuit breaker implementation.

> Reference: Backends in lower-priority groups are only used when the circuit breaker rule trips and all backends in higher-priority groups are unavailable.

Diagram 2: API Management load balancer with the circuit breaker in action.

In the following sections, we will walk through deploying a circuit breaker using API Management and Azure OpenAI services. The same solution can be used with the native OpenAI service. The GitHub repository for this article can be found here: github.com/eladtpro/api-management-ai-policies

## Prerequisites

This walkthrough assumes an existing API Management instance and multiple Azure OpenAI service instances (the sample backend pool below uses three).

## Step 1: Provision the Azure API Management backend pool

### Bicep CLI

Install or upgrade the Bicep CLI:

```
# az bicep install
az bicep upgrade
```

### Deploy the backend pool using Bicep

Sign in to Azure:

```
az login
```

> Important: Update the names of the backend services in the deploy.bicep file before running the following commands.

Create a deployment in your resource group from the template file, after updating the parameters in the file:

```
az deployment group create --resource-group <resource-group> --template-file deploy.bicep --name apim-deployment
```

> Reference: You can learn more about the Bicep backend resource, Microsoft.ApiManagement service/backends, and about its circuit breaker rules.

> Note: When you run the above command, you may see the following warning. If this is a documentation inaccuracy, please report it to the Bicep team (https://aka.ms/bicep-type-issues):

```
/path/to/deploy.bicep(102,3) : warning BCP035: The specified "object" declaration is missing the following required properties: "protocol", "url".
```

Output:

```json
{
  "id": "",
  "location": null,
  "name": "apim-deployment",
  "properties": {
    "correlationId": "754b1f5b-323f-4d4d-99e0-7303d8f64695",
    . . .
    "provisioningState": "Succeeded",
    "templateHash": "8062591490292975426",
    "timestamp": "2024-09-07T06:54:37.490815+00:00"
  },
  "resourceGroup": "azure-apim",
  "type": "Microsoft.Resources/deployments"
}
```

> Reference: To view failed operations, filter for operations with the 'Failed' state:

```
az deployment operation group list --resource-group <resource-group> --name apim-deployment --query "[?properties.provisioningState=='Failed']"
```
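The template expects your API Management instance name and the names of the Azure OpenAI resources to pool. As a hedged sketch (the parameter names match the template excerpt in the next section; the values here are hypothetical placeholders), the parameters at the top of deploy.bicep might look like this:

```bicep
// Hypothetical values - replace with your own resource names.
// Each backend name becomes a URL of the form https://<name>.openai.azure.com/openai.
param apimName string = 'apim-ai-features'
param backendNames array = [
  'my-aoai-eastus'
  'my-aoai-westus'
  'my-aoai-swedencentral'
]
```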
Following is the deploy.bicep section that configures the backends' circuit breakers and the load balancer:

```bicep
resource apiManagementService 'Microsoft.ApiManagement/service@2023-09-01-preview' existing = {
  name: apimName
}

resource backends 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = [for (name, i) in backendNames: {
  name: name
  parent: apiManagementService
  properties: {
    url: 'https://${name}.openai.azure.com/openai'
    protocol: 'http'
    description: 'Backend for ${name}'
    type: 'Single'
    circuitBreaker: {
      rules: [
        {
          acceptRetryAfter: true
          failureCondition: {
            count: 1
            interval: 'PT10S'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 503
              }
            ]
          }
          name: '${name}BreakerRule'
          tripDuration: 'PT10S'
        }
      ]
    }
  }
}]
```

Here is the backend pool part:

```bicep
resource aoailbpool 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  name: 'openaiopool'
  parent: apiManagementService
  properties: {
    description: 'Load balance openai instances'
    type: 'Pool'
    pool: {
      services: [
        {
          id: '/backends/${backendNames[0]}'
          priority: 1
          weight: 1
        }
        {
          id: '/backends/${backendNames[1]}'
          priority: 2
          weight: 1
        }
        {
          id: '/backends/${backendNames[2]}'
          priority: 2
          weight: 1
        }
      ]
    }
  }
}
```

## Step 2: Create the API Management API

> Reference: The following policy can be used in existing or new APIs. The important part is setting the backend service to the backend pool created in the previous step.

### Option I: Add to an existing API

Add the set-backend-service and retry policies to enable load balancing with the circuit breaker (see the policy sketch after this section).

### Option II: Create a new API

**Add the new API**

1. Go to your API Management instance.
2. Click 'APIs'.
3. Click 'Add API'.
4. Select the 'HTTP' API.
5. Name it and set the URL suffix to 'openai'.

> Reference: The URL suffix is the path appended to the API Management URL. For example, if the API Management URL is https://apim-ai-features.azure-api.net and the URL suffix is 'openai', the full URL is https://apim-ai-features.azure-api.net/openai.

**Add a "catch-all" operation**

1. Click on the API you just created.
2. Click the 'Design' tab.
3. Click 'Add operation'.
4. Set the method to 'POST'.
5. Set the URL template to '/{*path}'.
6. Set a name.
7. Click 'Save'.

> Reference: The catch-all operation is designed to match all OpenAI requests, which is achieved by setting the URL template to '/{*path}'. For example: the base URL is https://my-apim.azure-api.net/openai, the postfix is /deployments/gpt-4o/chat/completions?api-version=2024-06-01, and the full URL is https://my-apim.azure-api.net/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01.

**Add the load balancer policy**

1. Select the operation you just created.
2. Click the 'Design' tab.
3. Click the inbound processing policy button '</>'.
4. Replace the existing policy with the load balancing policy from the repository (its expressions include @((string)context.Variables["traceparentHeader"]), @(context.Request.Url.Host), and @(context.LastError.Reason)).
5. Click 'Save'.

This policy distributes requests across the backend pool and retries requests when a backend service is unavailable.

> Important: The main policies involved in load balancing across the backend pool created in the previous step are:
>
> - set-backend-service: sets the backend service to the backend pool created in the previous step.
> - retry: retries the request when the backend service is unavailable. When a circuit breaker trips, the request is immediately retried against the next available backend service.

> Important: The value of count in the retry policy must equal the number of backend services in the backend pool.
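The full policy XML does not survive in this copy of the post; refer to the repository for the original. As a hedged, minimal sketch of such a policy (assuming the pool name 'openaiopool' from Step 1, three pooled backends, hence count="3", and retry conditions mirroring the circuit breaker's status-code ranges), it might look like this:

```xml
<policies>
  <inbound>
    <base />
    <!-- Route every request to the backend pool created in Step 1 -->
    <set-backend-service backend-id="openaiopool" />
  </inbound>
  <backend>
    <!-- Retry throttled (429) and server-error responses. count matches the
         number of backends in the pool; first-fast-retry makes the first
         retry immediate, e.g. right after a circuit breaker trips. -->
    <retry condition="@(context.Response.StatusCode == 429 || context.Response.StatusCode >= 500)"
           count="3"
           interval="1"
           first-fast-retry="true">
      <forward-request buffer-request-body="true" />
    </retry>
  </backend>
  <outbound>
    <base />
  </outbound>
  <on-error>
    <base />
  </on-error>
</policies>
```

If you adapt an existing API (Option I), keep your current inbound and outbound policies and add only the set-backend-service and retry elements.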
## Step 3: Configure monitoring

1. Go to your API Management instance.
2. Click 'APIs'.
3. Click on the API you just created.
4. Click 'Settings'.
5. Scroll down to 'Diagnostic Logs'.
6. Check the 'Override global' checkbox.
7. Add the 'backend-host' and 'Retry-After' headers to the logged headers.
8. Click 'Save'.

> Reference: The 'backend-host' header is the host of the backend service the request was actually sent to. The 'Retry-After' header, returned by the OpenAI service, overrides tripDuration; it is the number of seconds the client should wait before retrying while the backend circuit breaker is open.

> Reference: In the 'Advanced Options' section you can also add the request and response bodies of the HTTP requests to the log.

## Step 4: Prepare the OpenAI services

### Model deployment

> Important: For the load balancer configuration to work seamlessly, the same model must be deployed across all OpenAI services, with the same name and version in every service.

1. Go to the OpenAI service.
2. Select the 'Model deployments' blade.
3. Click the 'Manage Deployments' button.
4. Create the model deployment.
5. Click 'Create'.
6. Repeat the above steps for all OpenAI services, making sure the model is deployed with the same name and version in each.

### Set up managed identities

> Reference: The API Management instance must have a system- or user-assigned managed identity that is granted access to the OpenAI services.

1. Go to the OpenAI service.
2. Select the 'Access control (IAM)' blade.
3. Click 'Add role assignment'.
4. Select the 'Cognitive Services OpenAI User' role.
5. Select your API Management managed identity.
6. Click 'Review + assign'.
7. Repeat the above steps for all OpenAI services.

## Step 5: Test the load balancer

> Reference: To call the API Management API, you must set the 'api-key' header to a subscription key of your API Management instance.

We will call the Chat Completions API of the OpenAI service through the API Management API, which distributes the requests across the backend pool created in the previous step.

### Run the Python load-test script

Run the test Python script main.py to exercise your load balancer and circuit breaker configuration:

```
python main.py --apim-name apim-ai-features --subscription-key APIM_SUBSCRIPTION_KEY --request-max-tokens 200 --workers 5 --total-requests 1000 --request-limit 30
```

Explanation:

- python main.py: runs the main.py script.
- --apim-name apim-ai-features: the name of the API Management instance.
- --subscription-key APIM_SUBSCRIPTION_KEY: the API Management subscription key.
- --request-max-tokens 200: maximum number of tokens to generate per completion request (optional, default 200).
- --workers 5: number of parallel requests to send (optional, default 20).
- --total-requests 1000: total number of requests to send (optional, default 1000).
- --request-limit 30: number of requests to send per second (optional, default 20).

> Reference: Adjust these values as required. If omitted, the script uses the default values specified in its argparse configuration.
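If you want to sanity-check the gateway before running the full load test, the sketch below sends a single chat completion request through API Management. It is an assumption-laden example: the APIM name, the 'gpt-4o' deployment name, and the api-version are taken from the examples earlier in this post; substitute your own values.

```python
# Minimal sketch: one chat completion request through the APIM gateway.
# Assumes the catch-all API from Step 2 (URL suffix 'openai') and a model
# deployment named 'gpt-4o' (hypothetical - use your deployment name).
import requests

APIM_NAME = "apim-ai-features"                # your APIM instance name
SUBSCRIPTION_KEY = "<apim-subscription-key>"  # your APIM subscription key

url = (
    f"https://{APIM_NAME}.azure-api.net/openai"
    "/deployments/gpt-4o/chat/completions?api-version=2024-06-01"
)

response = requests.post(
    url,
    headers={"api-key": SUBSCRIPTION_KEY},  # per Step 5, APIM expects 'api-key'
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 50,
    },
)

# A 429/5xx here would normally be retried by the gateway against the next
# pooled backend; the 'backend-host' diagnostic header (Step 3) records
# which backend actually served the call.
print(response.status_code)
print(response.json())
```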
### Test results

The following KQL query over the API Management gateway logs shows how the calls were distributed across the backends:

```
ApiManagementGatewayLogs
| where OperationId == "chat-completion"
| summarize CallCount = count() by BackendId, BackendUrl
| project BackendId, BackendUrl, CallCount
| order by CallCount desc
| render barchart
```

## Conclusion

In conclusion, leveraging Azure API Management significantly improves the resiliency and capacity of your Azure OpenAI service by distributing requests across multiple instances and implementing load balancing with the retry/circuit breaker pattern. These strategies improve the availability, performance, and reliability of the service.

For more information, see:

- Azure API Management
- Azure API Management Terminology
- API Management Policies
- API Management Policy Expressions
- Backends in API Management
- Error handling in API Management policies
- Azure Tech Community: AI Hub Gateway Landing Zone Accelerator