Introducing Lineage Tracking for Azure Databricks Unity Catalog in Microsoft Purview

by info.odysseyx@gmail.com

We are excited to announce the release of a highly anticipated feature in Microsoft Purview: lineage tracking for Azure Databricks Unity Catalog. This is an important milestone in our ongoing efforts to improve data governance and visibility across cloud environments.

By leveraging this new feature, users can now track data flow across Azure Databricks notebooks, improving their ability to audit, monitor, and manage data movement. As more and more data flows through complex cloud-based platforms like Azure Databricks, having clear end-to-end visibility is critical for compliance, troubleshooting, and operational excellence.

What is data lineage?

Data lineage refers to the ability to trace the origin, movement, and transformation of data as it flows across various systems and processes. This helps organizations answer key questions such as:

  • Where does this data come from?
  • How is the data transformed and used?
  • What process or user modified the data?

In the context of Azure Databricks Unity Catalog, lineage shows how data flows through a notebook, allowing users to see which sources feed their analysis and where the processed data is stored. By providing this visibility, data lineage improves transparency, making it easier to understand the data lifecycle, diagnose errors, and ensure compliance with data governance policies.
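The first question above, "Where does this data come from?", amounts to walking a lineage graph backwards from a table. The following is a minimal sketch of that idea only. The table names and the edge list are hypothetical sample data, and the code does not describe how Purview or Databricks store or traverse lineage internally.

```python
from collections import defaultdict, deque

# Hypothetical table-level lineage edges as (source, target) pairs.
EDGES = [
    ("raw.sales.orders", "silver.sales.orders_clean"),
    ("raw.sales.customers", "silver.sales.customers_clean"),
    ("silver.sales.orders_clean", "gold.sales.revenue_by_customer"),
    ("silver.sales.customers_clean", "gold.sales.revenue_by_customer"),
]


def upstream_sources(edges, target):
    """Return every table that feeds `target`, directly or indirectly."""
    # Index the edges by target so each table's direct parents are easy to find.
    parents = defaultdict(set)
    for source, dest in edges:
        parents[dest].add(source)

    # Breadth-first walk from the target back towards the original sources.
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    for table in sorted(upstream_sources(EDGES, "gold.sales.revenue_by_customer")):
        print(table)
```

Reversing the index (keying by source instead of target) would answer the complementary question of where processed data ends up.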

Microsoft Purview can capture lineage from Unity Catalog at both the table/view level and the column level.

What are the prerequisites to enable lineage?

In addition to the standard prerequisites for scanning Azure Databricks Unity Catalog in Microsoft Purview (i.e. an active Azure subscription, a Purview account, and an integration runtime), there are key requirements that apply specifically to retrieving lineage:

  1. Enable the system schema: The system.access schema must be enabled in Unity Catalog, because lineage data is stored in system tables.
  2. User permissions: The scanning account requires SELECT permission on the following system tables:
  • system.access.table_lineage
  • system.access.column_lineage

These permissions are essential for Purview to retrieve lineage from Azure Databricks.
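As a rough illustration of the permissions step, the sketch below builds the SQL statements an administrator might run in Databricks for the scanning account. The principal name `purview-scanner` is hypothetical. The two `USE CATALOG` / `USE SCHEMA` grants are an assumption that goes beyond this post (Unity Catalog generally requires them before a table can be read), so check them against your own workspace before relying on them.

```python
# System tables named in the prerequisites above.
LINEAGE_TABLES = (
    "system.access.table_lineage",
    "system.access.column_lineage",
)


def lineage_grant_statements(principal):
    """Build GRANT statements that give `principal` read access to lineage tables."""
    if "`" in principal:
        raise ValueError("principal must not contain backticks")
    quoted = f"`{principal}`"

    # Assumption: the principal also needs access to the parent catalog and schema.
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {quoted};",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {quoted};",
    ]
    # SELECT on each lineage table, as required by the post.
    statements += [
        f"GRANT SELECT ON TABLE {table} TO {quoted};" for table in LINEAGE_TABLES
    ]
    return statements


if __name__ == "__main__":
    # "purview-scanner" is a hypothetical scanning account name.
    for statement in lineage_grant_statements("purview-scanner"):
        print(statement)
```

The function only generates text; it does not connect to Databricks, so the output can be reviewed before anything is executed.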

How do I get lineage during a scan?

To enable lineage during scan setup, follow the standard steps for configuring a scan in Microsoft Purview. Apart from the actions already required for an Azure Databricks scan (source registration, runtime configuration, etc.), the one important action for lineage is:

  • Toggle Lineage Extraction: When configuring the scan, make sure that Lineage Extraction is set to On. This allows Microsoft Purview to obtain lineage for the scanned Azure Databricks assets, including data flow through notebooks.

Then run your scan and enjoy a cup of coffee while Microsoft Purview does its magic!

Example: Comparing lineage views in Azure Databricks and Microsoft Purview

After enabling lineage and running the scan, lineage from Azure Databricks Unity Catalog will start to appear in the Microsoft Purview Data Map. This means you get a unified view of your data lineage in both systems, making it easy to track data flows and transformations.

Azure Databricks lineage: Displays lineage and highlights dependencies for datasets and transformations within a notebook.

Microsoft Purview lineage: Displays lineage in the catalog as a visual end-to-end data flow.

This visual comparison provides a clear understanding of how each platform captures and displays data lineage, making it easier to manage and track data flows.
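The same comparison can be made programmatically once lineage edges have been exported from each system. The sketch below assumes the edges are already available as (source, target) pairs. The sample data is hypothetical, and how you export edges from Databricks or Purview is deliberately left out because this post does not cover it.

```python
def compare_lineage(databricks_edges, purview_edges):
    """Group lineage edges by whether they appear in one system or both."""
    databricks = set(databricks_edges)
    purview = set(purview_edges)
    return {
        "in_both": sorted(databricks & purview),
        "only_in_databricks": sorted(databricks - purview),
        "only_in_purview": sorted(purview - databricks),
    }


if __name__ == "__main__":
    # Hypothetical edges exported from each system as (source, target) pairs.
    from_databricks = [
        ("main.sales.orders", "main.sales.orders_clean"),
        ("main.sales.orders_clean", "main.sales.daily_revenue"),
    ]
    from_purview = [
        ("main.sales.orders", "main.sales.orders_clean"),
    ]
    for group, edges in compare_lineage(from_databricks, from_purview).items():
        print(group, edges)
```

An edge that shows up only on the Databricks side may simply come from a job or pipeline rather than a notebook, or from a run that happened after the last scan.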

What’s next for Azure Databricks lineage?

Currently only Azure Databricks notebook lineage is available, but we are not stopping there!

Microsoft is actively working with Azure Databricks to extend lineage to jobs and pipelines, ensuring comprehensive data tracking across your Azure Databricks environment. We continue to push the boundaries of data governance to make it easier for organizations to gain full visibility into their data processes.

We’ll be expanding this feature to provide even more insight and control, so stay tuned for future updates!
