Boosting Power BI Performance with Azure Databricks through Automatic Aggregations

This post was written in collaboration with Yatish Anand, Chief Solutions Architect at Databricks, and Andrey Mirsky, Senior Specialist Solutions Architect at Databricks.

Figure 1. Power BI automatic aggregation overview (source).

Introduction

In today’s rapidly changing data environment, timely insights can make a big difference. Power BI’s automatic aggregation feature is designed to push performance boundaries by delivering low-latency query results over large datasets. Automatic aggregation streamlines DirectQuery models with AI-powered caching and, combined with the best-in-class performance of Power BI on Azure Databricks, enables low-latency BI for modern reporting needs. When used with DirectQuery mode, automatic aggregation does not face data volume limitations and can scale with data size without compromising BI performance.

These innovations allow Power BI users of all skill levels to take advantage of advanced performance optimizations without worrying about backend load or complex data modeling. Imagine your reports updating in real time even as they query billions of records. This approach delivers actionable insights faster, freeing you to focus on your business rather than your data infrastructure. This blog demonstrates how to integrate Azure Databricks with Power BI automatic aggregation and how this integration can improve the performance of Power BI reports.

What is automatic aggregation?

Auto-aggregation simplifies the process of improving BI query performance by maintaining an in-memory cache of aggregated data. This means that a significant portion of report queries can be served directly from this in-memory cache instead of relying on backend data sources. Power BI uses AI to automatically build these aggregates based on query patterns, then intelligently decides which queries can be served from the in-memory cache and which should be routed to the data source via DirectQuery, accelerating visuals and reducing the load on backend systems.
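To make the routing decision concrete, here is a deliberately simplified sketch in Python. It assumes a query can be described by its grouping columns and measures (the hypothetical `QueryShape` class) and uses exact-match lookup only; it is not how Power BI matches queries internally, just an illustration of "serve from memory if an aggregate exists, otherwise go to the source".

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class QueryShape:
    """Hypothetical description of a report query: grouping columns plus measures."""
    group_by: FrozenSet[str]
    measures: FrozenSet[str]


class AggregationCache:
    """Toy exact-match aggregation cache (illustrative only, not Power BI's logic)."""

    def __init__(self) -> None:
        self._aggregates: Dict[QueryShape, dict] = {}
        self.cache_hits = 0
        self.source_queries = 0

    def store(self, shape: QueryShape, result: dict) -> None:
        # Keep a pre-aggregated result in memory for this query shape.
        self._aggregates[shape] = result

    def run(self, shape: QueryShape, query_source: Callable[[QueryShape], dict]) -> dict:
        # Serve from memory when an aggregate for exactly this shape exists.
        if shape in self._aggregates:
            self.cache_hits += 1
            return self._aggregates[shape]
        # Otherwise route the query to the data source (DirectQuery in Power BI).
        self.source_queries += 1
        return query_source(shape)
```

The ratio `cache_hits / (cache_hits + source_queries)` is the share of queries the backend never sees, which is the load reduction described above.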

Key benefits of automatic aggregation

Faster report visualization: Automatic aggregation optimizes most report queries by pre-caching aggregated query results, including results generated as users interact with the report. Only outlier queries that cannot be resolved through the cache are passed to the data source.
Balanced architecture: Compared with pure DirectQuery mode, automatic aggregation allows a more balanced approach. The most frequently used queries are served from Power BI's in-memory aggregation cache, which reduces the processing load on data sources during peak reporting times, improves scalability, and reduces costs.
Simplified setup: Model owners can easily enable automatic aggregation and schedule regular refreshes. Once the initial training and refresh are complete, the system automatically builds aggregations tailored to your specific query and data patterns.

Configure automatic aggregation

Setting up automatic aggregation is simple. Users can enable the feature in the semantic model settings and schedule one or more refreshes. It is important to review the comprehensive guidance for the automatic aggregation feature to determine whether it is appropriate for your particular environment.

Figure 2. Enabling automatic aggregation (source).

Once configured, Power BI leverages query logs to track user interactions and optimize the aggregate cache over time. The training job to evaluate query patterns occurs during the first scheduled refresh, allowing Power BI to adapt to changing usage patterns.
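As a rough illustration of what "tracking query patterns" can mean, the sketch below counts how often each (grouping columns, measures) combination appears in a query log. The log format is a hypothetical assumption made for this example; Power BI's actual query log schema and training algorithm are not shown here.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# A pattern is (sorted grouping columns, sorted measures).
Pattern = Tuple[Tuple[str, ...], Tuple[str, ...]]


def frequent_patterns(query_log: Iterable[Mapping], top_n: int) -> List[Tuple[Pattern, int]]:
    """Count (group-by columns, measures) patterns in a query log, most frequent first."""
    counts = Counter(
        # Sort columns so the same pattern always produces the same key.
        (tuple(sorted(q["group_by"])), tuple(sorted(q["measures"])))
        for q in query_log
    )
    return counts.most_common(top_n)
```

Patterns near the top of this list are natural candidates for pre-aggregation, since caching them answers the largest share of interactive queries.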

Automatic aggregation requirements

Auto-aggregation is compatible with several Power BI plans, including:

Power BI Premium per capacity
Fabric F SKU capacities
Power BI Premium Per User
Power BI Embedded models

Automatic aggregation is designed specifically for DirectQuery models, including composite models that leverage both import tables and DirectQuery connections.

Auto-aggregation walkthrough with Azure Databricks integration

This example shows how to enable and train auto-aggregation in a Power BI semantic model to improve the performance of reports using Azure Databricks as a data source.

Prerequisites

Before you start, make sure you have the following:

Access to an Azure Databricks account, an Azure Databricks workspace, and a Databricks SQL warehouse.
Power BI Desktop installed on your computer (the latest version is highly recommended).
A Power BI workspace.
DAX Studio or another tool for running DAX queries.

Step-by-step instructions

1. Create an initial Power BI semantic model based on the samples catalog and its tpch schema. Add tables and relationships as shown in the screenshot below. The dimension tables customer and nation must be set to Dual storage mode, and the fact tables orders and lineitem must be set to DirectQuery storage mode. Below is the data model for the sample report.

For best practices on Power BI storage modes, see this Git repo.
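The storage-mode layout from step 1 can be written down as data and checked before publishing. Dual tables can be answered from memory or joined with DirectQuery tables at the source, depending on the query, which is why the small dimension tables use Dual while the large fact tables stay in DirectQuery. This is a minimal sketch: the table names assume the tpch sample, and the dictionary is a hand-written stand-in, not an export format of Power BI Desktop.

```python
# Hypothetical hand-written description of the model; "Dual" and "DirectQuery"
# are the storage mode labels shown in Power BI Desktop.
DIMENSION_TABLES = {"customer", "nation"}
FACT_TABLES = {"orders", "lineitem"}


def layout_problems(storage_modes: dict) -> list:
    """List tables whose storage mode differs from the layout recommended in step 1."""
    problems = []
    for table in sorted(DIMENSION_TABLES):
        if storage_modes.get(table) != "Dual":
            problems.append(f"dimension table '{table}' should use Dual")
    for table in sorted(FACT_TABLES):
        if storage_modes.get(table) != "DirectQuery":
            problems.append(f"fact table '{table}' should use DirectQuery")
    return problems


if __name__ == "__main__":
    model = {"customer": "Dual", "nation": "Dual", "orders": "DirectQuery", "lineitem": "Import"}
    print(layout_problems(model))
```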

2. Create a simple tabular report that displays the number of orders, minimum delivery date, discount total, and quantity total. Also add a slicer with country names as shown below.
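For reference, the sketch below computes in plain Python what this table visual asks the data source for: per-nation order count, minimum date, total discount and total quantity, with an optional set of nation names standing in for the slicer. It assumes the standard tpch column names and assumes the "delivery date" in the report maps to `l_shipdate`; the in-memory dictionaries are toy stand-ins for the Databricks tables.

```python
from collections import defaultdict
from datetime import date


def summarize_by_nation(nations, customers, orders, lineitems, selected=None):
    """Toy version of the table visual: per-nation order count and lineitem totals.

    nations:   {n_nationkey: n_name}
    customers: {c_custkey: c_nationkey}
    orders:    {o_orderkey: o_custkey}
    lineitems: list of dicts with l_orderkey, l_quantity, l_discount, l_shipdate
    selected:  optional set of nation names, playing the role of the slicer
    """
    # Resolve each order to its nation by following orders -> customer -> nation.
    order_nation = {okey: nations[customers[ckey]] for okey, ckey in orders.items()}
    rows = defaultdict(lambda: {"order_count": 0, "min_shipdate": None,
                                "sum_discount": 0.0, "sum_quantity": 0})
    for name in order_nation.values():
        if selected is None or name in selected:
            rows[name]["order_count"] += 1
    for li in lineitems:
        name = order_nation[li["l_orderkey"]]
        if selected is not None and name not in selected:
            continue  # filtered out by the slicer
        row = rows[name]
        row["sum_discount"] += li["l_discount"]
        row["sum_quantity"] += li["l_quantity"]
        if row["min_shipdate"] is None or li["l_shipdate"] < row["min_shipdate"]:
            row["min_shipdate"] = li["l_shipdate"]
    return dict(rows)


if __name__ == "__main__":
    result = summarize_by_nation(
        {1: "BRAZIL"}, {10: 1}, {100: 10},
        [{"l_orderkey": 100, "l_quantity": 5, "l_discount": 0.05, "l_shipdate": date(1995, 3, 1)}],
    )
    print(result)
```

Every slicer change produces a new query of this shape, which is why the unaccelerated report has to go back to the warehouse each time.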

3. Now publish this report to your Power BI workspace.

4. When you run the report as shown below, Power BI takes approximately 20 seconds to run the query. Here is a snapshot of the network trace:

The screenshot below also shows that the query hit the Databricks SQL warehouse and read 38M records.

5. Enable automatic aggregation in the semantic model settings. You can adjust the query coverage as needed. Increasing this setting increases the number of user queries that are analyzed and considered for performance improvement. A higher query coverage percentage increases the potential benefit because more queries are analyzed, but aggregation training takes longer.
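The trade-off behind the query coverage setting can be illustrated with a greedy selection: take the most frequent query patterns until they account for the target fraction of the log. This is a conceptual sketch under that assumption, not Power BI's training algorithm; it only shows why a higher coverage target means more patterns to aggregate, and therefore longer training.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def patterns_for_coverage(query_log: Iterable[Hashable], coverage: float) -> Tuple[List[Hashable], float]:
    """Greedily pick the most frequent query patterns until `coverage` of the log is reached.

    Returns the chosen patterns and the fraction of logged queries they cover.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    counts = Counter(query_log)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    chosen: List[Hashable] = []
    covered = 0
    for pattern, n in counts.most_common():
        if covered / total >= coverage:
            break  # target reached; the rarer remaining patterns stay on DirectQuery
        chosen.append(pattern)
        covered += n
    return chosen, covered / total
```

With a log where pattern A appears 6 times, B 3 times and C once, a 50% target needs only A, while a 100% target needs all three.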

6. For Power BI to create aggregates, the Power BI query log must be populated; it stores the internal queries Power BI generates when users interact with reports. To populate it, you can open the published Power BI report and select different country names in the slicer, or you can open DAX Studio and run the sample DAX query below.
For better training results, vary the slicer or filter values in your DAX query and run it multiple times.

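// Sample query (assumes an 'orders' table in the model); change "BRAZIL" between runs
EVALUATE SUMMARIZECOLUMNS('nation'[n_name], TREATAS({"BRAZIL"}, 'nation'[n_name]), "Order Count", COUNTROWS('orders'))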

One guideline for populating the query log is that report publishers should open the report and try out various slicer filters before making the report available to users. In the scenario above, the query log was populated by selecting different names in the report slicer. This step helps reports render faster for end users.
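Running the sample query with many different filter values by hand is tedious, so a small script can generate one DAX query per nation to paste into DAX Studio. This is a sketch: the table and column names ('nation'[n_name], 'orders', 'lineitem'[l_discount], 'lineitem'[l_quantity]) are assumptions based on the tpch sample model and should be adjusted to match your semantic model.

```python
# Hypothetical names: the 'nation', 'orders' and 'lineitem' tables and their tpch
# columns are assumed to exist in the semantic model under these names.
DAX_TEMPLATE = """EVALUATE
SUMMARIZECOLUMNS(
    'nation'[n_name],
    TREATAS({{"{nation}"}}, 'nation'[n_name]),
    "Order Count", COUNTROWS('orders'),
    "Total Discount", SUM('lineitem'[l_discount]),
    "Total Quantity", SUM('lineitem'[l_quantity])
)"""


def build_queries(nation_names):
    """Return one DAX query per nation name, each filtering the model to that nation."""
    # DAX escapes a double quote inside a string literal by doubling it.
    return [DAX_TEMPLATE.format(nation=name.replace('"', '""')) for name in nation_names]


if __name__ == "__main__":
    # A few of the nations in the TPC-H nation table.
    for query in build_queries(["BRAZIL", "CANADA", "FRANCE", "JAPAN"]):
        print(query, end="\n\n")
```

The doubled braces in the template are Python `str.format` escapes; each generated query contains a literal DAX table constructor such as `{"BRAZIL"}`.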

7. You can now start aggregation training manually or schedule it.

8. Once training is complete, Power BI holds the aggregated values in its in-memory cache. The next time you interact with the report using a similar pattern (dimension, measure, filter), Power BI serves the query from the cached aggregates and does not send it to Databricks SQL Warehouse. As a result, you can expect queries served from the cache to return in under a second.

As the screenshot below shows, with automatic aggregation enabled the report visual now renders in ~1.6 seconds, compared with 20 seconds previously. This is because the data is now read from the in-memory aggregation cache instead of Databricks SQL Warehouse.

Additionally, as shown below, no SQL queries are executed on the Databricks SQL warehouse.
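One way to picture how a cached aggregate can serve a "similar" query is the generic rollup technique: if an aggregate is stored at a finer grain than the query needs (here a hypothetical nation by order status grain), additive measures can be re-aggregated in memory without touching the warehouse. This sketch illustrates that technique only; it is not a description of Power BI's internal matching rules. Summing `order_count` is valid here only because each order falls into exactly one group; distinct counts do not roll up this way in general.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple


def can_answer(agg_grain: Iterable[str], group_by: Iterable[str], filter_cols: Iterable[str]) -> bool:
    """An aggregate can serve a query if it keeps every column the query groups or filters on."""
    return (set(group_by) | set(filter_cols)) <= set(agg_grain)


def roll_up(agg_rows: Iterable[Mapping], group_by: List[str],
            filters: Mapping[str, Set], measures: List[str]) -> Dict[Tuple, Dict[str, float]]:
    """Re-aggregate additive (SUM/COUNT) measures from a finer-grained aggregate."""
    out: Dict[Tuple, Dict[str, float]] = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in agg_rows:
        # Apply the query's filters (for example the slicer selection) to the cached rows.
        if any(row[col] not in allowed for col, allowed in filters.items()):
            continue
        key = tuple(row[col] for col in group_by)
        for m in measures:
            out[key][m] += row[m]
    return dict(out)
```

A query grouped by nation and filtered to BRAZIL passes `can_answer` against a (nation, order status) aggregate, so `roll_up` can produce its result entirely from memory.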

Automatic aggregate monitoring and management

Power BI continuously improves the in-memory aggregate cache through scheduled refresh. Semantic model owners can choose to trigger training operations on demand, if necessary. It is also important to monitor refresh history to ensure that tasks are completed successfully and to identify potential problems.

Power BI provides detailed refresh history logs that show the performance of each operation, allowing users to track memory usage and other important metrics.
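Refresh history can also be inspected programmatically. The Power BI REST API's "Get Refresh History In Group" operation returns JSON whose `value` array holds entries with fields such as `refreshType`, `startTime`, `endTime` and `status`. The sketch below only post-processes such a payload (no authentication or HTTP calls); the field names follow that API as we understand it, and any sample payload you test it with should be treated as illustrative.

```python
from collections import Counter
from datetime import datetime


def _parse(ts: str) -> datetime:
    # Timestamps look like "2024-05-01T02:00:00.000Z"; fromisoformat() before
    # Python 3.11 does not accept the trailing "Z", so swap it for an offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize_refreshes(payload: dict) -> dict:
    """Summarize a refresh-history payload: status counts, slowest success, failures."""
    entries = payload.get("value", [])
    statuses = Counter(e.get("status", "Unknown") for e in entries)
    durations = [
        (_parse(e["endTime"]) - _parse(e["startTime"])).total_seconds()
        for e in entries
        if e.get("status") == "Completed" and e.get("endTime")
    ]
    return {
        "status_counts": dict(statuses),
        "max_completed_seconds": max(durations, default=None),
        "failed_start_times": [e.get("startTime") for e in entries if e.get("status") == "Failed"],
    }
```

Alerting on a non-empty `failed_start_times` list, or on a steadily growing `max_completed_seconds`, is one simple way to catch training or refresh problems early.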

Conclusion

In today’s data-driven world, Azure Databricks combined with Power BI automatic aggregation delivers exceptional performance in even the most demanding data environments. Azure Databricks excels at processing multi-terabyte datasets, while automatic aggregation applies AI to query patterns to intelligently cache aggregates, dramatically accelerating performance and reducing costs. This combination improves operational efficiency while addressing the limitations of Import and Direct Lake modes when working with large data volumes in DirectQuery models. As shown in this blog, applying automatic aggregation to DirectQuery models lets you achieve sub-second report performance without constantly querying the underlying data source. This approach lets you focus on delivering lightning-fast BI reports at any scale rather than manually tuning semantic models.
