Updated Fabric GitHub Repo for 250M rows of CMS Healthcare data

by info.odysseyx@gmail.com · October 14, 2024

Last year, I worked with my colleague Inder Rana to build and launch a GitHub repository for using CMS Medicare Part D data within Microsoft Fabric. This repository is intended to provide examples of Fabric’s end-to-end analytics solutions that can be easily deployed by anyone using a Fabric environment. We’ve updated our analytics solution with several important improvements.

The Extract, Load, and Transform (ELT) process, from the CMS servers to the Gold layer in the Lakehouse, now runs in less than 20 minutes with enhanced automation.
The repository now contains logic to fetch the new 2022 data, so the solution covers 10 years of data (2013-2022) and almost 250 million rows.
There are two simple options to move data from the CMS servers to the Gold layer in less than 20 minutes:
1. Option 1 uses Spark notebooks orchestrated by a pipeline.
2. Option 2 uses Spark notebooks and SQL stored procedures, and deploys the Gold layer in a Fabric Warehouse for users who work with SQL and Python.
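To make the shape of this flow concrete, here is a minimal plain-Python sketch of a staged ELT run that ends in a star-schema Gold layer. It is not the repository's code: the function names, column names, sample rows, and the intermediate "bronze" and "silver" stages are all assumptions made for illustration, and the in-memory lists stand in for what would be Spark DataFrames and Delta tables in Fabric.

```python
# Hypothetical sketch: names, columns, and sample rows are illustrative only
# and are not taken from the repository.
from typing import Dict, List, Tuple

YEARS = range(2013, 2023)  # 2013 through 2022 inclusive, i.e. ten years


def extract(year: int) -> List[Dict]:
    """Stand-in for fetching one year of CMS data (no network call is made)."""
    return [
        {"year": year, "drug": "DrugA", "claims": "10"},
        {"year": year, "drug": "DrugB", "claims": "5"},
    ]


def to_silver(raw_rows: List[Dict]) -> List[Dict]:
    """Cast text fields to proper types, as a cleaning stage might."""
    return [{**row, "claims": int(row["claims"])} for row in raw_rows]


def to_gold(silver_rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split cleaned rows into a drug dimension and a fact table keyed by drug_id."""
    drug_ids: Dict[str, int] = {}
    for row in silver_rows:
        # Assign the next surrogate key the first time a drug name is seen.
        drug_ids.setdefault(row["drug"], len(drug_ids) + 1)
    dim_drug = [{"drug_id": i, "drug": name} for name, i in drug_ids.items()]
    fact = [
        {"year": r["year"], "drug_id": drug_ids[r["drug"]], "claims": r["claims"]}
        for r in silver_rows
    ]
    return dim_drug, fact


def run_pipeline() -> Tuple[List[Dict], List[Dict]]:
    """Run the stages in order, the way an orchestrating pipeline would."""
    bronze = [row for year in YEARS for row in extract(year)]
    silver = to_silver(bronze)
    return to_gold(silver)


dim_drug, fact = run_pipeline()
```

In the actual solution each stage would be a Spark notebook (or, in Option 2, a SQL stored procedure for the Gold step) triggered by a Fabric pipeline. The sketch only shows the ordering of the stages and the split into dimension and fact tables.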

The updated GitHub repository can be found at this link: main fabric-samples-healthcare/analytics-bi-directlake-starschema · isinghrana/fabric-samples-hea… If you find it useful, please leave a “star”!

The first option, which uses three Spark notebooks with a single pipeline, is reviewed in the video below. A video reviewing the SQL stored procedure version will be released soon.

Below is a diagram reviewing the new and updated processes.

