Home NewsX Visualizing Data as Graphs with Fabric and KQL

Visualizing Data as Graphs with Fabric and KQL

by info.odysseyx@gmail.com
0 comment 11 views


I have been very interested in data visualization for quite some time. In the past few years, I have been focusing on how to visualize graph databases (regardless of where the data came from). Using forced directed graphs to highlight similarities or “connected communities” in the data is very powerful. The purpose of this post is to highlight some recent work. Cousteau Explorer The team worked on visualizing graphs in an Azure Data Explorer database using data pulled from a Fabric KQL database.

Note: Cousteau Explorer The application used to visualize the graph is currently only supported on Windows.

Azure Data Explorer (ADX) is a fully managed, high-performance analytics engine from Microsoft that specializes in near-real-time queries on large amounts of data. It is very useful for log analytics, time series, and Internet of Things type scenarios. ADX is similar to the traditional relational database model in that it organizes data into tables with a strongly typed schema.

In September 2023, the ADX team Extensions to the query language (KQL) enabled graph semantics on top of tabular data. This extension allowed users to contextualize data and its relationships as a graph structure of nodes and edges. Graphs are often an easier way to represent and query complex or networked relationships. These graphs are typically difficult to query because they require recursive joins in standard tables. Examples of common graphs include social networks (friends of friends), product recommendations (similar users also bought product x), connected assets (assembly lines), or knowledge graphs.

Fast forward to February 2024 and Microsoft Fabric is here. Event House As a workload in the Fabric workspace, this brings the power of KQL and real-time analytics to the Fabric ecosystem.

Now I have a lot of data in my Fabric Eventhouse and I want to visualize it as a force direction graph…

Let’s get started!

You will need a Microsoft Fabric account to follow along.Get started with Fabric for free).

Next, in this post, we used the Bureau of Transportation Statistics’ open data set. The following files were used:

  • Air Support Table – Master Coordinate Data
    • When you download this file, you can choose which fields to include. In this example, we only used AirportID, Airport, AirportName, AirportCityName, and AirportStateCode.
    • This airport data is loaded directly into a table in KQL.
    • This file does not necessarily need to be unzipped.
  • Airline Service Quality Performance 234 (On-time Performance Data)
    • In this blog, I only used the “April 2024” file from this link.
    • This data is accessed using the Lakehouse shortcut.
    • Unzip this file to a local folder and change the extension from “.asc” to “.psv”, as this is a pipe-delimited file.

To use the downloaded file, we uploaded it to the Lakehouse “Files” section of the Fabric Workspace. If you don’t have a Lakehouse in your workspace, first go to your workspace, select “New” -> “Additional Options” and select “Lakehouse” from the Data Engineering workload. Name the new Lakehouse and click “Create”.

If you have a Lakehouse, you can click on it to bring up the Lakehouse Explorer and upload your files. First, create a “Flights” folder by clicking on the three dots next to “Files” in the Lakehouse Explorer and selecting “New subfolder.” Next, click on the three dots next to the “Flights” subfolder and select “Upload” from the drop-down menu and select your on-time performance file. Refresh the page to see if your files have been uploaded to the file.

Brian Sherwin_0-1724168328265.png

Next, we will use Eventhouse to host our KQL cluster and collect data for analysis. If you do not have an Eventhouse in your workspace, select “New” -> “More Options” and select “Eventhouse” from the “Real-time Intelligence” workload. Name the new Eventhouse and click “Create”.

Finally, we will use: Cousteau Explorer An application for visualizing graphs (available only on Windows). This is a one-click deployment application, so it can run application updates on startup.

When Eventhouse was created, a default KQL database with the same name was created. To import data into the database, click the three dots next to the database name and select “Import Data” -> “Local File”. In the dialog that pops up, under “Select or create a target table”, click “New Table” and name the table, in this case “airports”. If there is a valid table name, the dialog will update to allow you to drag or browse for the file you want to load.

Note: If your file size is less than 1GB, you can upload it in compressed file format.

Click “Next” to examine the data to be imported. For the airport data, you need to change the “Format” to CSV and enable the “First row is column headers” option.

Brian Sherwin_1-1724168328270.png

Click “Finish” to load the file into the KQL table.

Now that the airport data is loaded into the table, you can query the table to see the results.

Brian Sherwin_2-1724168328272.png

Here is a sample query to check if the data has been loaded:

airports
| take 100;

For On-Time performance data, we do not collect it in KQL. Instead, we create shortcuts to files in the Lakehouse repository.

Go back to the KQL Database Explorer and click the “+ New -> OneLake Shortcut” menu item at the top.

Brian Sherwin_3-1724168328275.png

In the dialog box that appears, select “Microsoft OneLake” and in “Select Data Source Type”, select Lakehouse where the data was previously uploaded and click “Next”.

Brian Sherwin_4-1724168328279.png

Once OneLake’s tree view is populated with tables and files, open the file, select the subfolder that was created when you uploaded the On-Time data, and then click “Create” to complete the shortcut creation.

Brian Sherwin_5-1724168328280.png

Once the shortcut is created, you can view the data by clicking “Explore Data” and running the following query to validate the data.

external_table(‘flights’)
| count;

NOTE: When accessing shortcut data, use “external_table” and the generated shortcut name. The shortcut name cannot be changed.

Now that we have our data connected to the Eventhouse database, we want to start analyzing this data. Fabric has a way to run KQL queries directly, but the results of the query are expected to be tables. The only way to display a graph visualization is to use: Cousteau Explorer.

To connect to a KQL database, you need to get the URI of the cluster from Fabric. When you navigate to the KQL database in Fabric, there is a panel with “Database Details”.

Brian Sherwin_6-1724168328288.png

Using “Copy URI” to the right of the query URI will copy the cluster URI to your clipboard.

In the Kusto.Explorer application, right-click on “Connections” and select “Add Connection”.

Brian Sherwin_7-1724168328289.png

In the popup, paste the query URI into the “Cluster Connection” text box, replacing the text there. You can also specify an alias for the connection instead of using a URI. Finally, we decided to use AAD for security. You can choose whichever is appropriate for your client access.

At this point you can open a “New Tab” (Home Menu) and type the same query we used above.

let nodes = airports;
let edges = external_table('flights')
| project origin = Column7, dest = Column8, flight = strcat(Column1, Column2), carrier = Column1;
edges
| make-graph origin --> dest with nodes on AIRPORT

Note: You may need to modify the table name (Airport, Flight) depending on the shortcut or table name you used when loading the data. These values ​​are case sensitive.

In the graph, the points of interest are airports (nodes) and the connections (edges) are individual delayed flights. I use the “make-graph” extension in KQL to create an edge graph from the origin to the destination, using the 3-letter airport codes as links.

Visualize with make-graph

When you run this query, if the last line of the query is “make-graph”, Cousteau Explorer A new window titled “Chart” will automatically pop up to show you the data. In the image below, I’ve changed the visualization to a dark theme and then colored the corners based on the “Carrier” column in the flight data.

Brian Sherwin_8-1724168328321.png

Note: I have zoomed in on the cluster of interest.

If you drag a few nodes around, you’ll notice that there are a few nodes (airports) with a lot of orange connections. If you click on the orange links, you’ll quickly see that the orange lines are Delta Flights, and the three nodes pulled out in the image below are Atlanta, Minneapolis, and Detroit.

Brian Sherwin_9-1724168328339.png

I started with a text-based data table and ended up with a beautiful “network” visualization of my flight data. The power of graph visualizations to see relationships between data rather than just reading a table is invaluable.

Next, we’re excited to start exploring data visualizations for supply chain and product recommendations.





Source link

You may also like

Leave a Comment

Our Company

Welcome to OdysseyX, your one-stop destination for the latest news and opportunities across various domains.

Newsletter

Subscribe my Newsletter for new blog posts, tips & new photos. Let's stay updated!

Laest News

@2024 – All Right Reserved. Designed and Developed by OdysseyX