# Azure API Management Circuit Breaker and Load Balancing

In this blog post, we will show how Azure API Management improves the resiliency and capacity of Azure OpenAI services. Azure API Management is a tool that helps you create, publish, manage, and secure APIs, providing features such as routing, caching, throttling, authentication, and transformation. With Azure API Management, you can:

- **Distribute requests across multiple instances of the Azure OpenAI service** using priority-based load balancing, with weights distributed within each priority group. This spreads the load across different resources and regions, improving the availability and performance of the service.
- **Implement the circuit breaker pattern** to protect backend services from being overwhelmed by excessive requests. This prevents cascading failures and improves the stability and resilience of your services. You configure circuit breaker properties on the backend resource and define rules for tripping the circuit breaker, such as the number or percentage of failure conditions within a defined time window and a range of status codes that indicate failures.

Diagram 1: API Management with the circuit breaker implementation.

> Reference: Backends in lower-priority groups are only used when the circuit breaker rule trips and all backends in higher-priority groups are unavailable.

Diagram 2: API Management load balancer with the circuit breaker in action.

In the following sections, we will walk through deploying a circuit breaker using API Management and Azure OpenAI services. The same solution can be used with the native OpenAI service. The GitHub repository for this article can be found here: github.com/eladtpro/api-management-ai-policies

## Prerequisites

This walkthrough assumes an existing API Management instance and multiple Azure OpenAI service instances (the sample backend pool below uses three).

## Step 1: Provision the Azure API Management backend pool

### Bicep CLI

Install or upgrade the Bicep CLI:

```
# az bicep install
az bicep upgrade
```

### Deploy the backend pool using Bicep

Sign in to Azure:

```
az login
```

> Important: Update the names of the backend services in the deploy.bicep file before running the following commands.

Create a deployment in your resource group from the template file, after updating the parameters in the file:

```
az deployment group create --resource-group <resource-group> --template-file deploy.bicep --name apim-deployment
```

> Reference: You can learn more about the Bicep backend resource, Microsoft.ApiManagement service/backends, and about its circuit breaker rules.

> Note: When you run the above command, you may see the following warning. If this is a documentation inaccuracy, please report it to the Bicep team (https://aka.ms/bicep-type-issues):

```
/path/to/deploy.bicep(102,3) : warning BCP035: The specified "object" declaration is missing the following required properties: "protocol", "url".
```

Output:

```json
{
  "id": "",
  "location": null,
  "name": "apim-deployment",
  "properties": {
    "correlationId": "754b1f5b-323f-4d4d-99e0-7303d8f64695",
    . . .
    "provisioningState": "Succeeded",
    "templateHash": "8062591490292975426",
    "timestamp": "2024-09-07T06:54:37.490815+00:00"
  },
  "resourceGroup": "azure-apim",
  "type": "Microsoft.Resources/deployments"
}
```

> Reference: To view failed operations, filter for operations with the 'Failed' state:

```
az deployment operation group list --resource-group <resource-group> --name apim-deployment --query "[?properties.provisioningState=='Failed']"
```
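The template expects your API Management instance name and the names of the Azure OpenAI resources to pool. As a hedged sketch (the parameter names match the template excerpt in the next section; the values here are hypothetical placeholders), the parameters at the top of deploy.bicep might look like this:

```bicep
// Hypothetical values - replace with your own resource names.
// Each backend name becomes a URL of the form https://<name>.openai.azure.com/openai.
param apimName string = 'apim-ai-features'
param backendNames array = [
  'my-aoai-eastus'
  'my-aoai-westus'
  'my-aoai-swedencentral'
]
```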
Following is the deploy.bicep section that configures the backends' circuit breakers and the load balancer:

```bicep
resource apiManagementService 'Microsoft.ApiManagement/service@2023-09-01-preview' existing = {
  name: apimName
}

resource backends 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = [for (name, i) in backendNames: {
  name: name
  parent: apiManagementService
  properties: {
    url: 'https://${name}.openai.azure.com/openai'
    protocol: 'http'
    description: 'Backend for ${name}'
    type: 'Single'
    circuitBreaker: {
      rules: [
        {
          acceptRetryAfter: true
          failureCondition: {
            count: 1
            interval: 'PT10S'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 503
              }
            ]
          }
          name: '${name}BreakerRule'
          tripDuration: 'PT10S'
        }
      ]
    }
  }
}]
```

Here is the backend pool part:

```bicep
resource aoailbpool 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  name: 'openaiopool'
  parent: apiManagementService
  properties: {
    description: 'Load balance openai instances'
    type: 'Pool'
    pool: {
      services: [
        {
          id: '/backends/${backendNames[0]}'
          priority: 1
          weight: 1
        }
        {
          id: '/backends/${backendNames[1]}'
          priority: 2
          weight: 1
        }
        {
          id: '/backends/${backendNames[2]}'
          priority: 2
          weight: 1
        }
      ]
    }
  }
}
```

## Step 2: Create the API Management API

> Reference: The following policy can be used in existing or new APIs. The important part is setting the backend service to the backend pool created in the previous step.

### Option I: Add to an existing API

Add the set-backend-service and retry policies to enable load balancing with the circuit breaker (see the policy sketch after this section).

### Option II: Create a new API

**Add the new API**

1. Go to your API Management instance.
2. Click 'APIs'.
3. Click 'Add API'.
4. Select the 'HTTP' API.
5. Name it and set the URL suffix to 'openai'.

> Reference: The URL suffix is the path appended to the API Management URL. For example, if the API Management URL is https://apim-ai-features.azure-api.net and the URL suffix is 'openai', the full URL is https://apim-ai-features.azure-api.net/openai.

**Add a "catch-all" operation**

1. Click on the API you just created.
2. Click the 'Design' tab.
3. Click 'Add operation'.
4. Set the method to 'POST'.
5. Set the URL template to '/{*path}'.
6. Set a name.
7. Click 'Save'.

> Reference: The catch-all operation is designed to match all OpenAI requests, which is achieved by setting the URL template to '/{*path}'. For example: the base URL is https://my-apim.azure-api.net/openai, the postfix is /deployments/gpt-4o/chat/completions?api-version=2024-06-01, and the full URL is https://my-apim.azure-api.net/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01.

**Add the load balancer policy**

1. Select the operation you just created.
2. Click the 'Design' tab.
3. Click the inbound processing policy button '</>'.
4. Replace the existing policy with the load balancing policy from the repository (its expressions include @((string)context.Variables["traceparentHeader"]), @(context.Request.Url.Host), and @(context.LastError.Reason)).
5. Click 'Save'.

This policy distributes requests across the backend pool and retries requests when a backend service is unavailable.

> Important: The main policies involved in load balancing across the backend pool created in the previous step are:
>
> - set-backend-service: sets the backend service to the backend pool created in the previous step.
> - retry: retries the request when the backend service is unavailable. When a circuit breaker trips, the request is immediately retried against the next available backend service.

> Important: The value of count in the retry policy must equal the number of backend services in the backend pool.
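The full policy XML does not survive in this copy of the post; refer to the repository for the original. As a hedged, minimal sketch of such a policy (assuming the pool name 'openaiopool' from Step 1, three pooled backends, hence count="3", and retry conditions mirroring the circuit breaker's status-code ranges), it might look like this:

```xml
<policies>
  <inbound>
    <base />
    <!-- Route every request to the backend pool created in Step 1 -->
    <set-backend-service backend-id="openaiopool" />
  </inbound>
  <backend>
    <!-- Retry throttled (429) and server-error responses. count matches the
         number of backends in the pool; first-fast-retry makes the first
         retry immediate, e.g. right after a circuit breaker trips. -->
    <retry condition="@(context.Response.StatusCode == 429 || context.Response.StatusCode >= 500)"
           count="3"
           interval="1"
           first-fast-retry="true">
      <forward-request buffer-request-body="true" />
    </retry>
  </backend>
  <outbound>
    <base />
  </outbound>
  <on-error>
    <base />
  </on-error>
</policies>
```

If you adapt an existing API (Option I), keep your current inbound and outbound policies and add only the set-backend-service and retry elements.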
## Step 3: Configure monitoring

1. Go to your API Management instance.
2. Click 'APIs'.
3. Click on the API you just created.
4. Click 'Settings'.
5. Scroll down to 'Diagnostic Logs'.
6. Check the 'Override global' checkbox.
7. Add the 'backend-host' and 'Retry-After' headers to the logged headers.
8. Click 'Save'.

> Reference: The 'backend-host' header is the host of the backend service the request was actually sent to. The 'Retry-After' header, returned by the OpenAI service, overrides tripDuration; it is the number of seconds the client should wait before retrying while the backend circuit breaker is open.

> Reference: In the 'Advanced Options' section you can also add the request and response bodies of the HTTP requests to the log.

## Step 4: Prepare the OpenAI services

### Model deployment

> Important: For the load balancer configuration to work seamlessly, the same model must be deployed across all OpenAI services, with the same name and version in every service.

1. Go to the OpenAI service.
2. Select the 'Model deployments' blade.
3. Click the 'Manage Deployments' button.
4. Create the model deployment.
5. Click 'Create'.
6. Repeat the above steps for all OpenAI services, making sure the model is deployed with the same name and version in each.

### Set up managed identities

> Reference: The API Management instance must have a system- or user-assigned managed identity that is granted access to the OpenAI services.

1. Go to the OpenAI service.
2. Select the 'Access control (IAM)' blade.
3. Click 'Add role assignment'.
4. Select the 'Cognitive Services OpenAI User' role.
5. Select your API Management managed identity.
6. Click 'Review + assign'.
7. Repeat the above steps for all OpenAI services.

## Step 5: Test the load balancer

> Reference: To call the API Management API, you must set the 'api-key' header to a subscription key of your API Management instance.

We will call the Chat Completions API of the OpenAI service through the API Management API, which distributes the requests across the backend pool created in the previous step.

### Run the Python load-test script

Run the test Python script main.py to exercise your load balancer and circuit breaker configuration:

```
python main.py --apim-name apim-ai-features --subscription-key APIM_SUBSCRIPTION_KEY --request-max-tokens 200 --workers 5 --total-requests 1000 --request-limit 30
```

Explanation:

- python main.py: runs the main.py script.
- --apim-name apim-ai-features: the name of the API Management instance.
- --subscription-key APIM_SUBSCRIPTION_KEY: the API Management subscription key.
- --request-max-tokens 200: maximum number of tokens to generate per completion request (optional, default 200).
- --workers 5: number of parallel requests to send (optional, default 20).
- --total-requests 1000: total number of requests to send (optional, default 1000).
- --request-limit 30: number of requests to send per second (optional, default 20).

> Reference: Adjust these values as required. If omitted, the script uses the default values specified in its argparse configuration.
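If you want to sanity-check the gateway before running the full load test, the sketch below sends a single chat completion request through API Management. It is an assumption-laden example: the APIM name, the 'gpt-4o' deployment name, and the api-version are taken from the examples earlier in this post; substitute your own values.

```python
# Minimal sketch: one chat completion request through the APIM gateway.
# Assumes the catch-all API from Step 2 (URL suffix 'openai') and a model
# deployment named 'gpt-4o' (hypothetical - use your deployment name).
import requests

APIM_NAME = "apim-ai-features"                # your APIM instance name
SUBSCRIPTION_KEY = "<apim-subscription-key>"  # your APIM subscription key

url = (
    f"https://{APIM_NAME}.azure-api.net/openai"
    "/deployments/gpt-4o/chat/completions?api-version=2024-06-01"
)

response = requests.post(
    url,
    headers={"api-key": SUBSCRIPTION_KEY},  # per Step 5, APIM expects 'api-key'
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 50,
    },
)

# A 429/5xx here would normally be retried by the gateway against the next
# pooled backend; the 'backend-host' diagnostic header (Step 3) records
# which backend actually served the call.
print(response.status_code)
print(response.json())
```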
### Test results

The following KQL query over the API Management gateway logs shows how the calls were distributed across the backends:

```
ApiManagementGatewayLogs
| where OperationId == "chat-completion"
| summarize CallCount = count() by BackendId, BackendUrl
| project BackendId, BackendUrl, CallCount
| order by CallCount desc
| render barchart
```

## Conclusion

In conclusion, leveraging Azure API Management significantly improves the resiliency and capacity of your Azure OpenAI service by distributing requests across multiple instances and implementing load balancing with the retry/circuit breaker pattern. These strategies improve the availability, performance, and reliability of the service.

For more information, see:

- Azure API Management
- Azure API Management Terminology
- API Management Policies
- API Management Policy Expressions
- Backends in API Management
- Error handling in API Management policies
- Azure Tech Community: AI Hub Gateway Landing Zone Accelerator