Automating Azure Monitor VM Insights Deployment with Python and Azure Functions

by info.odysseyx@gmail.com


Real-time deployment of VM Insights

In today's rapidly changing digital environment, the need for automated, scalable solutions for real-time deployment has never been greater. Because many organizations use virtual machines as their primary compute layer, an effective observability strategy is essential to ensure optimal performance and reliability. A robust observability framework gives businesses critical insight into their infrastructure, allowing them to identify and resolve issues before they affect service delivery. In other words, customers need to take a proactive rather than a reactive approach.

Azure Monitor VM Insights provides powerful features to achieve this level of visibility. Enabling VM Insights lets your organization monitor VMs in real time, providing actionable data and analytics for informed decisions. By leveraging these tools, companies can strengthen their observability strategies, maintain the resilience and efficiency of their infrastructure, and respond to evolving demands.

[Figure: VM Insights performance charts. Source: Chart performance with VM insights – Azure Monitor | Microsoft Learn]

Index:

  • Introduction to Azure Monitor
  • Practical implementation of the solution
  • How to get started
  • Security considerations
  • Limitations and future improvements
  • Resources

1. Introduction to Azure Monitor

Azure Monitor is a comprehensive solution that collects and analyzes monitoring data from cloud and on-premises environments, enabling full-stack observability and improving availability and performance. It aggregates data from different layers across multiple Azure and non-Azure environments and stores it in Azure for central analysis and visualization. Two tools are used for analysis: Azure Monitor Metrics (for real-time metrics) and Log Analytics workspaces (for log querying and analysis).

1.1 VM Insights

The rest of this article focuses on monitoring virtual machines with VM Insights, a feature of Azure Monitor that provides deep visibility into the performance and health of virtual machines. It offers an easy and efficient way to start monitoring client workloads on virtual machines and virtual machine scale sets. VM Insights supports Windows and Linux operating systems on:

  • Azure virtual machines
  • Azure virtual machine scale sets
  • Hybrid virtual machines connected to Azure Arc
  • On-premises virtual machines
  • Virtual machines hosted in other cloud environments
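In Azure Resource Manager (ARM), each of these environments maps to a resource type: Azure VMs and scale sets live under `Microsoft.Compute`, while on-premises and other-cloud machines become visible to Azure Monitor once onboarded as Azure Arc-enabled machines (`Microsoft.HybridCompute/machines`). The helpers below (`resource_type_of`, `classify_machine`) are hypothetical illustrations, not part of the solution's repository: a minimal sketch that classifies an ARM resource ID into one of these machine kinds.

```python
from typing import Optional

# Standard ARM resource types (lowercased) for the machine kinds listed above.
# On-premises and other-cloud machines show up as Azure Arc-enabled machines.
MACHINE_KINDS = {
    "microsoft.compute/virtualmachines": "Azure virtual machine",
    "microsoft.compute/virtualmachinescalesets": "Azure virtual machine scale set",
    "microsoft.hybridcompute/machines": "Azure Arc-enabled machine",
}


def resource_type_of(resource_id: str) -> str:
    """Return the lowercased '<namespace>/<type>' of a top-level ARM resource ID."""
    parts = [p.lower() for p in resource_id.strip("/").split("/")]
    # Expected: subscriptions/{sub}/resourceGroups/{rg}/providers/{namespace}/{type}/{name}
    if len(parts) < 8 or parts[4] != "providers":
        raise ValueError(f"Unexpected resource ID: {resource_id}")
    return f"{parts[5]}/{parts[6]}"


def classify_machine(resource_id: str) -> Optional[str]:
    """Return the machine kind, or None if the resource is not a VM Insights target."""
    return MACHINE_KINDS.get(resource_type_of(resource_id))
```

Matching on fixed positions (rather than searching for the word "providers") avoids misreading a resource group that happens to be named "providers".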

VM Insights provides a set of predefined workbooks and curated visualizations that let you monitor performance and, with the Map feature, analyze dependencies to gain a better understanding of the application components on your VMs.

To enable VM Insights, refer to the following documentation: Enable VM insights overview – Azure Monitor | Microsoft Learn.

1.2 Azure Monitor Agent

As explained in the documentation, VM Insights relies on two agents running in the background: the Azure Monitor agent and the Dependency agent. The former collects data from your machines and sends it to a Log Analytics workspace in Azure. The Dependency agent, on the other hand, relies on the Azure Monitor agent and captures data about the processes running on the virtual machines and their external process dependencies.
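Both agents are deployed as VM extensions. The sketch below gathers the extension publishers and types that this solution's script uses into one lookup; `ExtensionSpec` and `extensions_for` are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class ExtensionSpec(NamedTuple):
    publisher: str
    extension_type: str


AMA_PUBLISHER = "Microsoft.Azure.Monitor"
DEPENDENCY_PUBLISHER = "Microsoft.Azure.Monitoring.DependencyAgent"

# (Azure Monitor agent type, Dependency agent type) per OS, as used in this solution's script
AGENT_TYPES = {
    "windows": ("AzureMonitorWindowsAgent", "DependencyAgentWindows"),
    "linux": ("AzureMonitorLinuxAgent", "DependencyAgentLinux"),
}


def extensions_for(os_type: str, include_dependency_agent: bool) -> List[ExtensionSpec]:
    """Return the extensions to install, Azure Monitor agent first."""
    try:
        ama_type, dependency_type = AGENT_TYPES[os_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported OS type: {os_type}") from None
    specs = [ExtensionSpec(AMA_PUBLISHER, ama_type)]
    if include_dependency_agent:
        # The Dependency agent relies on the Azure Monitor agent, so it comes second
        specs.append(ExtensionSpec(DEPENDENCY_PUBLISHER, dependency_type))
    return specs
```

Keeping the ordering in one place makes it explicit that the Dependency agent is only ever installed alongside the Azure Monitor agent.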

Data captured by the Azure Monitor agent feeds the performance dashboards, for example CPU utilization and memory usage. Data captured by the Dependency agent is used by the Map feature in VM Insights.
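As an example of how this performance data can be queried in the Log Analytics workspace: VM Insights stores its performance counters in the `InsightsMetrics` table, where CPU utilization appears with `Namespace == "Processor"` and `Name == "UtilizationPercentage"`. The hypothetical helper below builds such a KQL query for one computer; the look-back window and bin size are illustrative defaults, not values from the solution.

```python
def cpu_utilization_query(computer: str, lookback: str = "1h", bin_size: str = "5m") -> str:
    """Build a KQL query for average CPU utilization of one VM from VM Insights data.

    lookback and bin_size are KQL timespan literals (e.g. "1h", "5m") and are not validated.
    """
    # Escape backslashes and double quotes so the name is a safe KQL string literal
    safe = computer.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "InsightsMetrics\n"
        f"| where TimeGenerated > ago({lookback})\n"
        f'| where Computer == "{safe}"\n'
        '| where Namespace == "Processor" and Name == "UtilizationPercentage"\n'
        f"| summarize AvgCpu = avg(Val) by bin(TimeGenerated, {bin_size})\n"
        "| order by TimeGenerated asc"
    )
```

The resulting string can be pasted into Log Analytics or passed to a query client such as `LogsQueryClient.query_workspace` from the `azure-monitor-query` package.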

1.3 Data Collection Rules

Once the agent is installed on a machine, the next step is to set up data collection rules (DCRs). The Azure Monitor agent uses data collection rules to determine what data to collect and how to process it. To establish this connection, the machine running the Azure Monitor agent must be associated with a data collection rule, as shown below.

[Figure: Machines running the Azure Monitor agent associated with a data collection rule. Source: Data collection rules in Azure Monitor – Azure Monitor | Microsoft Learn]
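Under the hood, an association is an ARM extension resource created with a PUT on the machine's resource ID. The sketch below builds the relative request URL and JSON body for that call; `dcr_association_request` is a hypothetical helper, and the api-version `2022-06-01` is an assumption to check against the Azure Monitor REST reference for your environment.

```python
import json
from typing import Tuple
from urllib.parse import quote

# Assumption: verify this api-version against the Azure Monitor REST API reference
DCRA_API_VERSION = "2022-06-01"


def dcr_association_request(machine_id: str, dcr_id: str, association_name: str) -> Tuple[str, str]:
    """Return (relative URL, JSON body) for a PUT that associates a machine with a DCR."""
    url = (
        f"{machine_id.rstrip('/')}/providers/Microsoft.Insights/"
        f"dataCollectionRuleAssociations/{quote(association_name, safe='')}"
        f"?api-version={DCRA_API_VERSION}"
    )
    body = {
        "properties": {
            "dataCollectionRuleId": dcr_id,
            "description": "Data Collection Rule Association",
        }
    }
    return url, json.dumps(body)
```

The relative URL is resolved against `https://management.azure.com`; in Python, the `data_collection_rule_associations.create` operation of `azure-mgmt-monitor` issues an equivalent request.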

2. Practical implementation of the solution

The goal of the solution is to automate the enablement of VM Insights on virtual machines. It relies on the following tools:

  • Azure Functions with an Event Grid trigger
  • Event Grid system topics
  • Python scripting
  • Visual Studio Code for development and local testing

The architecture of the solution is as follows:

[Figure: Architecture of the solution]

The main component of this architecture is a Python script that executes the following high-level steps:

  • Authenticate to Azure using a service principal
  • Scan the environment and discover machines
  • Enable a system-assigned managed identity on each discovered machine
  • Deploy the Azure Monitor agent and, optionally, the Dependency agent
  • Associate machines with VM Insights data collection rules
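The managed-identity step needs some care: if a machine already has a user-assigned identity, setting its identity type to plain `SystemAssigned` would drop that user-assigned identity. A minimal sketch of the decision, using the ARM identity type strings; `target_identity_type` is a hypothetical helper, not part of the repository.

```python
from typing import Optional

SYSTEM = "SystemAssigned"
BOTH = "SystemAssigned, UserAssigned"


def target_identity_type(current: Optional[str]) -> Optional[str]:
    """Return the identity type to set so the VM gains a system-assigned identity.

    Returns None when no change is needed. An existing user-assigned identity is
    preserved by switching to the combined type instead of overwriting it.
    """
    normalized = (current or "None").replace(" ", "").lower()
    if normalized in ("systemassigned", "systemassigned,userassigned"):
        return None  # a system-assigned identity is already enabled
    if normalized == "userassigned":
        return BOTH
    return SYSTEM  # no identity yet ("None" or missing)
```

Returning `None` for "no change" lets the caller skip the VM update entirely, which keeps reruns cheap and idempotent.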

To fully automate the process, these steps run automatically whenever a new machine is deployed or onboarded through Azure Arc, eliminating the need for manual intervention. To achieve this, the Python script is deployed to an Azure Function app with an Event Grid trigger.

The function app is connected to an Event Grid system topic that subscribes to virtual machine creation events at the subscription level. Whenever a new virtual machine is deployed or onboarded through Arc, an event is generated and captured by the system topic, which then triggers the function to run the Python script.
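Subscription-level system topics emit `Microsoft.Resources.ResourceWriteSuccess` events whose `data.operationName` names the ARM operation. The hypothetical helper below filters such an event (as a dict in the Event Grid schema) down to a machine resource ID. Treating `Microsoft.HybridCompute/machines/write` as the Arc onboarding signal is an assumption worth verifying in your environment.

```python
from typing import Any, Dict, Optional

# Operations that signal a new or updated machine (lowercased for comparison).
# Assumption: Arc onboarding surfaces as a write on Microsoft.HybridCompute/machines.
MACHINE_WRITE_OPERATIONS = {
    "microsoft.compute/virtualmachines/write",
    "microsoft.hybridcompute/machines/write",
}


def machine_id_from_event(event: Dict[str, Any]) -> Optional[str]:
    """Return the machine resource ID if the event is a successful VM/Arc write, else None."""
    if event.get("eventType") != "Microsoft.Resources.ResourceWriteSuccess":
        return None
    data = event.get("data") or {}
    operation = str(data.get("operationName", "")).lower()
    if operation not in MACHINE_WRITE_OPERATIONS:
        return None
    # resourceUri is the ARM ID of the written resource; fall back to the event subject
    return data.get("resourceUri") or event.get("subject")
```

In an Event Grid-triggered Python function, the same fields are exposed on `azure.functions.EventGridEvent` as `event_type`, `subject`, and `get_json()`. Because the script's own identity update also produces a VM write event, the handler must be idempotent; the script already checks for existing identities and extensions before changing anything.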

More information about the tool can be found here.

3. How to get started

If you'd like to test the solution in your own environment, see the step-by-step tutorial in this GitHub repository: claestom/AMA-Distribution---DCR-Association--Linux-Windows- (github.com).

The repository provides detailed instructions for:

  • Prerequisites
  • Configuring your local environment
  • Creating data collection rules
  • Configuring the required permissions
  • Testing the script locally before deploying to Azure
  • Configuring your environment in Azure:
      • Azure Functions
      • Event Grid system topics
  • Deploying the script to Azure
The core Python script is shown below.

```python
import os

from dotenv import load_dotenv
from azure.core.exceptions import HttpResponseError
from azure.identity import ClientSecretCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.compute.models import (
    ResourceIdentityType,
    VirtualMachineExtension,
    VirtualMachineIdentity,
)
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import DataCollectionRuleAssociationProxyOnlyResource
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.subscription import SubscriptionClient

load_dotenv()

# Retrieve the credentials from environment variables (or a local .env file)
TENANT_ID = os.getenv("TENANT_ID")
CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
DATA_COLLECTION_RULE_ID = os.getenv("DATA_COLLECTION_RULE_ID")

# Tag key/value pairs; leave both elements empty to disable tag filtering
VM_TAG = ["", ""]
SUBSCRIPTION_TAG = ["", ""]

# Dependency agent installation (required for the VM Insights Map feature)
DEP_AGENT = True

# Extension types per OS disk type: (Azure Monitor agent, Dependency agent)
EXTENSIONS = {
    "windows": ("AzureMonitorWindowsAgent", "DependencyAgentWindows"),
    "linux": ("AzureMonitorLinuxAgent", "DependencyAgentLinux"),
}


def tag_filter_disabled(tag_pair):
    return all(element == "" for element in tag_pair)


def enable_system_assigned_identity(compute_client, resource_group_name, vm_name):
    vm = compute_client.virtual_machines.get(resource_group_name, vm_name)
    current = vm.identity.type if vm.identity else None

    if current in (ResourceIdentityType.SYSTEM_ASSIGNED, ResourceIdentityType.SYSTEM_ASSIGNED_USER_ASSIGNED):
        print(f"VM: {vm_name} already has a system-assigned managed identity enabled. Proceeding with the script.")
        return

    print(f"Enabling system-assigned managed identity for VM: {vm_name}")
    if current == ResourceIdentityType.USER_ASSIGNED:
        # Keep the existing user-assigned identities instead of replacing them
        vm.identity.type = ResourceIdentityType.SYSTEM_ASSIGNED_USER_ASSIGNED
    else:
        vm.identity = VirtualMachineIdentity(type=ResourceIdentityType.SYSTEM_ASSIGNED)
    compute_client.virtual_machines.begin_create_or_update(resource_group_name, vm_name, vm).result()
    print(f"System-assigned managed identity enabled for VM: {vm_name}")


def check_tag_subscription(subscription_id, credential):
    if tag_filter_disabled(SUBSCRIPTION_TAG):
        return True
    resource_client = ResourceManagementClient(credential, subscription_id)
    subscription_tags = resource_client.tags.get_at_scope(f"/subscriptions/{subscription_id}")
    tags = subscription_tags.properties.tags or {}  # None when the subscription has no tags
    if tags.get(SUBSCRIPTION_TAG[0]) == SUBSCRIPTION_TAG[1]:
        print(f"Subscription {subscription_id} has the required tags. Proceeding.")
        return True
    print(f"Subscription {subscription_id} does not have the required tags. Skipping.")
    return False


def install_extension(compute_client, vm, resource_group, publisher, extension_type, version, settings=None):
    vm_name = vm.name
    installed = compute_client.virtual_machine_extensions.list(resource_group, vm_name).value or []
    if any(ext.name == extension_type or ext.type_properties_type == extension_type for ext in installed):
        print(f"{extension_type} already installed on VM {vm_name}.")
        return

    print(f"No {extension_type} extension found on VM {vm_name}. Proceeding with installation.")
    extension = VirtualMachineExtension(
        location=vm.location,
        publisher=publisher,
        type_properties_type=extension_type,  # the extension type, e.g. AzureMonitorLinuxAgent
        type_handler_version=version,
        auto_upgrade_minor_version=True,
        settings=settings or {},
    )
    try:
        compute_client.virtual_machine_extensions.begin_create_or_update(
            resource_group_name=resource_group,
            vm_name=vm_name,
            vm_extension_name=extension_type,
            extension_parameters=extension,
        ).result()
        print(f"{extension_type} installed on VM {vm_name}.")
    except HttpResponseError as e:
        print(f"Failed to install {extension_type} on VM {vm_name}. Error: {e}")


def associate_data_collection_rule(monitor_client, vm, vm_name):
    association_parameters = DataCollectionRuleAssociationProxyOnlyResource(
        data_collection_rule_id=DATA_COLLECTION_RULE_ID,
        description="Data Collection Rule Association",
    )
    try:
        monitor_client.data_collection_rule_associations.create(
            resource_uri=vm.id,
            association_name=vm_name,
            body=association_parameters,
        )
        print(f"VM {vm_name} associated with Data Collection Rule.")
    except HttpResponseError as e:
        print(f"Failed to associate VM {vm_name} with Data Collection Rule. Error: {e}")


def process_vm(vm, compute_client, monitor_client):
    vm_name = vm.name
    # Resource IDs have the form /subscriptions/{id}/resourceGroups/{rg}/providers/...
    resource_group = vm.id.split("/")[4]
    instance_view = compute_client.virtual_machines.instance_view(resource_group, vm_name)

    is_running = any(status.code == "PowerState/running" for status in instance_view.statuses or [])
    if not is_running:
        print(f"VM {vm_name} is not running. Skipping.")
        return

    tags = vm.tags or {}
    if not (tag_filter_disabled(VM_TAG) or tags.get(VM_TAG[0]) == VM_TAG[1]):
        print(f"VM {vm_name} does not have the required tags. Skipping.")
        return

    # os_profile can be None (e.g. VMs created from a specialized disk), so read the OS disk type
    os_type = vm.storage_profile.os_disk.os_type
    agents = EXTENSIONS.get(os_type.lower()) if os_type else None
    if agents is None:
        print(f"VM {vm_name} has an unsupported OS. Skipping.")
        return

    print(f"VM {vm_name} is running. Proceeding with installation of Azure Monitor agent.")
    ama_type, dependency_type = agents
    enable_system_assigned_identity(compute_client, resource_group, vm_name)
    install_extension(compute_client, vm, resource_group, "Microsoft.Azure.Monitor", ama_type, "1.10")
    if DEP_AGENT:
        install_extension(
            compute_client, vm, resource_group,
            "Microsoft.Azure.Monitoring.DependencyAgent", dependency_type, "9.10",
            settings={"enableAMA": "true"},
        )
    associate_data_collection_rule(monitor_client, vm, vm_name)


def process_subscription(subscription, credential):
    subscription_id = subscription.subscription_id
    print(f"Processing subscription: {subscription_id}")

    compute_client = ComputeManagementClient(credential, subscription_id)
    monitor_client = MonitorManagementClient(credential, subscription_id)

    for vm in compute_client.virtual_machines.list_all():
        process_vm(vm, compute_client, monitor_client)


def main():
    required = ("TENANT_ID", "CLIENT_ID", "CLIENT_SECRET", "DATA_COLLECTION_RULE_ID")
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")

    # Authenticate using the service principal
    credential = ClientSecretCredential(tenant_id=TENANT_ID, client_id=CLIENT_ID, client_secret=CLIENT_SECRET)

    # Iterate over every subscription the service principal can see
    subscription_client = SubscriptionClient(credential)
    for subscription in subscription_client.subscriptions.list():
        if check_tag_subscription(subscription.subscription_id, credential):
            process_subscription(subscription, credential)


if __name__ == "__main__":
    main()
```

4. Security considerations

This solution authenticates with a client secret through a service principal and app registration. To maintain strong security, enforce expiration and periodic renewal of these sensitive credentials to mitigate potential vulnerabilities. It is equally important to monitor them vigilantly, so that issues caused by expired secrets are detected and remediated before they disrupt the solution.

Several Azure services can be used to check periodically for credential expiration.
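Whichever service runs the check, the core logic is a date comparison. A minimal sketch, assuming the secret names and end dates have already been fetched (for example from the `passwordCredentials` list of the app registration, whose entries carry an `endDateTime`, via Microsoft Graph); `expiring_secrets` and the 30-day threshold are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def expiring_secrets(
    secrets: Iterable[Tuple[str, datetime]],
    now: datetime,
    warn_within: timedelta = timedelta(days=30),
) -> List[Tuple[str, int]]:
    """Return (secret name, whole days left) for secrets expiring within warn_within.

    Already-expired secrets are included with a negative day count.
    Both `now` and the end dates should be timezone-aware (e.g. UTC).
    """
    flagged = []
    for name, end in secrets:
        remaining = end - now
        if remaining <= warn_within:
            flagged.append((name, remaining.days))
    # Most urgent (or already expired) first
    return sorted(flagged, key=lambda item: item[1])
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside, keeps the function deterministic and easy to test.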

The solution also enables system-assigned managed identities on the machines so that data flows into the VM Insights dashboards and dependency maps. This is fine for test and development environments, but when moving to production we recommend switching to a user-assigned managed identity, as described in the following article: Best practice recommendations for managed system identities – Managed identities for Azure resources…
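Switching to a user-assigned identity touches two payloads: the VM's `identity` block and the Azure Monitor agent extension settings. The sketch below builds both as plain dictionaries. The `authentication.managedIdentity` settings shape with `identifier-name` set to `mi_res_id` reflects our reading of the Azure Monitor agent documentation and should be verified; both helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def user_assigned_identity_patch(uami_id: str, current_type: Optional[str] = None) -> Dict[str, Any]:
    """Body for a VM PATCH that attaches a user-assigned managed identity.

    If the VM already has a system-assigned identity, keep it by using the combined type.
    """
    keep_system = (current_type or "").replace(" ", "").lower().startswith("systemassigned")
    identity_type = "SystemAssigned, UserAssigned" if keep_system else "UserAssigned"
    return {"identity": {"type": identity_type, "userAssignedIdentities": {uami_id: {}}}}


def ama_settings_for_uami(uami_id: str) -> Dict[str, Any]:
    """Azure Monitor agent extension settings that select a user-assigned identity (assumed shape)."""
    return {
        "authentication": {
            "managedIdentity": {"identifier-name": "mi_res_id", "identifier-value": uami_id}
        }
    }
```

The identity is referenced by its full ARM resource ID, so one user-assigned identity can be shared across many machines instead of creating one identity per VM.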

5. Limitations and future improvements

This section describes the current limitations of the solution and the features under development.

  • First, compliance checks of your environment: the goal is to receive compliance reports automatically, including the percentage of non-compliant virtual machines and a list of those machines.
  • Asynchronous processing of discovered virtual machines will be added to improve the performance and speed of the script. Currently, each subscription and virtual machine is processed sequentially.
  • Another limitation lies in the filtering capabilities. Currently, users can only filter at the subscription and resource level using tags. In the future, it will be important to introduce filtering based on workload characteristics, such as identifying idle or unused machines.
  • Lastly, the solution is limited to Azure virtual machines and Azure Arc-enabled machines, that is, only machines visible in the Azure portal. We are working to extend it to machines that are not onboarded through Azure Arc. The approach is similar to the current flow, but instead of an Event Grid trigger within Azure, an HTTP trigger activates the function, which allows it to be called from outside Azure via HTTP requests. Some code changes are still required.
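The first two items fit together naturally: check machines in parallel, then summarize compliance. Because the script uses synchronous SDK clients, a thread pool is the least invasive route (the Azure SDK also ships async clients under `.aio` subpackages, such as `azure.mgmt.compute.aio`). A minimal sketch; `check_all` is hypothetical, and `is_compliant` is a placeholder for whatever per-VM check you define, such as "agent installed and DCR associated".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def check_all(
    vm_ids: Iterable[str],
    is_compliant: Callable[[str], bool],
    max_workers: int = 8,
) -> Dict[str, object]:
    """Run is_compliant over all VMs in parallel and summarize the results."""
    ids = list(vm_ids)
    # Keep max_workers modest: Azure Resource Manager throttles bursts of requests
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with ids
        results = list(pool.map(is_compliant, ids))
    non_compliant = [vm for vm, ok in zip(ids, results) if not ok]
    total = len(ids)
    percent = round(100 * len(non_compliant) / total, 1) if total else 0.0
    return {"total": total, "non_compliant": non_compliant, "non_compliant_percent": percent}
```

Returning the list of non-compliant machines alongside the percentage matches the two pieces of information the planned compliance report is meant to contain.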

If you have any new feature ideas or feedback, please feel free to raise an issue or submit a PR (pull request) on our GitHub repository!

6. Resources




