Microsoft Fabric: A New Unified Data Platform

Popular data services and tools often specialize in specific aspects of the data analytics pipeline, each serving particular teams in the data lifecycle. For instance, Snowflake addresses large-scale data warehousing challenges, while Databricks focuses on data engineering and data science. Power BI and Tableau have become standard tools for business intelligence tasks. So, where does Microsoft Fabric create value?

Microsoft Fabric aims to integrate Azure services—Data Factory, Power BI, and Synapse—for data management. This unified platform within Power BI serves to simplify data project workflows. This blog post explores Microsoft Fabric’s features, its suitability for team requirements, pricing, and a comparison with similar SaaS (software as a service) tools like Databricks.

 

What is Microsoft Fabric?

Fabric is a new SaaS offering from Microsoft, designed to centralize the tools and components required for data analytics and pipelines. It is built as an extension of Power BI, and many of the objects you can create in Fabric will seem familiar to anyone who has worked with Azure Synapse, Azure Data Factory, or past iterations of Power Query.

New applications like OneLake File Explorer, as well as existing software like SQL Server Management Studio, Azure Storage Explorer, and the VS Code Synapse extension can be used to view, manage, and edit Fabric objects.

Like Databricks, Fabric emphasizes Delta tables, where data is stored in the Parquet file format and changes are recorded in a Delta Lake transaction log. This has several benefits, especially cross-engine interoperability: Spark, Power BI, and other Fabric components can connect directly to the same data, in addition to SQL.
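To make the relationship between Parquet files and the Delta log more concrete, here is a minimal Python sketch. It assumes a heavily simplified log in which each commit is a list of JSON lines containing only "add" and "remove" actions with a file path; real Delta logs carry many more fields, and the file names here are hypothetical. Replaying the commits in order yields the set of Parquet files that make up the current table version, which is the shared view any engine reading the table relies on.

```python
import json

# Simplified commit log: each commit is a list of JSON action lines.
# File names are hypothetical; real Delta logs include more metadata.
commits = [
    [
        '{"add": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0001.parquet"}}',
    ],
    [
        '{"remove": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0002.parquet"}}',
    ],
]


def active_files(commits):
    """Replay commits in order and return the Parquet files in the current version."""
    files = set()
    for commit in commits:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


current = active_files(commits)
print(current)
```

Passing only the first commit (commits[:1]) returns the table as it looked at the earlier version, which is the idea behind reading older versions of a Delta table.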

Fabric uses ‘OneLake’ as data storage – a single data lake built on top of Azure Data Lake Storage (ADLS) Gen2 that will house content for an entire organization.

Content in OneLake is broken out into Workspaces, with their own permissions and content.

Warehouses and Lakehouses can be created in Workspaces. The decision on which to use is based on your requirements.

  • Warehouse – Supports transactions, DDL, and DML queries. Stores structured data.
  • Lakehouse – Its SQL analytics endpoint supports read-only queries and the creation of views, while data is written through tools such as Spark. Can store semi-structured or unstructured data as well as structured data.

Direct Lake

  • Business users can build Power BI reports directly on top of OneLake using the new Direct Lake mode in the Analysis Services engine.
  • Microsoft states that Direct Lake mode gives users the speed of an ‘Import’ data connection without needing to copy data, combining the best of the ‘Import’ and ‘DirectQuery’ connection types.

Microsoft Fabric’s Current Fit for Team Needs

A good way to show how Fabric can help fit team needs is to walk through steps that data from a specific source would go through in the Fabric environment, using medallion architecture.

[Figure: Data Sources, Prepare and Transform, and Analyze stages on OneLake]

Data Engineering

  • Data engineers use a combination of Pipelines, Dataflows, and Notebooks (for example, PySpark) to ingest raw data into the bronze zone, which typically corresponds to a lakehouse in Fabric. The same tools are then used to clean and standardize the data, moving it to the silver zone for enrichment, and finally to the gold zone based on specific business or team requirements.
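The bronze, silver, and gold flow can be sketched in a few lines of plain Python. This is an illustration of the layering only: in Fabric these steps would typically run in Pipelines, Dataflows, or PySpark Notebooks against lakehouse tables, and the sample records, field names, and the to_silver and to_gold helpers below are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": " east ", "amount": "100.50"},
    {"order_id": "2", "region": "West", "amount": "not available"},
    {"order_id": "3", "region": "EAST", "amount": "49.50"},
]


def to_silver(rows):
    """Clean and standardize: normalize regions, parse amounts, drop bad rows."""
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate to a business-level view: total sales per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping each layer as its own step means the raw bronze data is never modified, so the silver and gold outputs can be rebuilt if the cleaning or aggregation rules change.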

Data Science & Modeling

  • Data science and modeling efforts leverage Notebooks supporting PySpark/Python and sparklyr/R for machine learning model training. These Notebooks can connect directly to lakehouse data. Fabric tools like MLflow aid in tracking model training by logging experiments and models.

Data Analysts & BI Developers

  • Data analysts and BI developers utilize Power BI along with the Direct Lake connection capability to access gold-layer data. Additionally, SQL analytics endpoints are available for all Fabric lakehouses.

In addition to those team-specific uses, Fabric offers Git integration through Azure DevOps. This integration is at a workspace level. 

Fabric Limitations & Development Roadmap

Microsoft maintains a public release plan for upcoming Fabric content. As of this blog post, Microsoft is still routinely releasing new content and updates for the platform.

Jan 2024 Updates

 

Known Issues with Microsoft Fabric

Migrating Content from Azure Synapse / ADF

  • There is currently no seamless method for migrating existing content, such as Dataflows or Azure Synapse objects, into Fabric. Microsoft offers guides and runbooks for various migration scenarios.

Object Ownership

  • Changing ownership of workspace objects (notebooks, dataflows, etc.) is not possible as of this blog post, which can be an issue when trying to change and standardize object ownership (e.g., moving objects to service accounts).

Git

  • Only certain objects support Git integration: Lakehouses, Notebooks, and Paginated Reports.
  • Excluded objects include Dataflows and Pipelines.
  • Currently, only Git in Azure Repos is supported. On-premises Azure DevOps isn’t supported.

Pricing & Capacity Structure

Pricing is based on a combination of the Fabric capacity chosen and the volume of OneLake storage used.

Capacity reservation discounts are available for a one-year commitment, but are solely for compute costs and exclude coverage for Fabric storage and networking expenses. Reservations do not automatically renew; instead, billing returns to pay-as-you-go rates upon expiration.

All capacity units (CUs) are pooled and remain available for use across various workloads to minimize idle resource costs. Pricing varies by Azure region. The table below reflects US East compute and storage rates as of February 2024.

[Tables: Microsoft Fabric SKU table; Microsoft Fabric storage prices]
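As a rough illustration of how the two pricing components combine, the Python sketch below estimates a monthly bill as compute plus storage. It assumes compute is billed at an hourly rate per capacity unit and storage at a monthly rate per GB. Every rate and quantity in the example is a hypothetical placeholder rather than a published Microsoft price, and the estimate_monthly_cost helper is an illustration, not an official calculator. Following the reservation rules described above, the discount is applied to compute only.

```python
def estimate_monthly_cost(capacity_units, rate_per_cu_hour, hours,
                          storage_gb, storage_rate_per_gb,
                          reservation_discount=0.0):
    """Estimate a monthly bill as compute plus storage.

    All rates are caller-supplied placeholders, not published prices.
    The reservation discount (a fraction between 0 and 1) applies to
    compute only; storage is always charged at the full rate.
    """
    compute = capacity_units * rate_per_cu_hour * hours * (1 - reservation_discount)
    storage = storage_gb * storage_rate_per_gb
    return round(compute + storage, 2)


# Hypothetical inputs: 2 CUs at 0.25 per CU-hour for 720 hours,
# plus 1,000 GB of storage at 0.02 per GB per month.
pay_as_you_go = estimate_monthly_cost(2, 0.25, 720, 1000, 0.02)
reserved = estimate_monthly_cost(2, 0.25, 720, 1000, 0.02,
                                 reservation_discount=0.5)
print(pay_as_you_go, reserved)
```

Substitute the rates for your own Azure region and capacity to compare pay-as-you-go and reserved scenarios side by side.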

Microsoft Fabric vs Databricks

In comparison to similar SaaS tools like Databricks, Microsoft Fabric stands out for its seamless integration with existing Azure tools and services. While Databricks offers powerful big data analytics capabilities, Fabric provides a more intuitive and user-friendly interface, making it easier for teams to onboard and leverage its features effectively.  

Setup 

  • Databricks setup varies based on the cloud provider chosen to host Databricks. Some cloud providers offer templates for setup.
  • Fabric is relatively easy to set up, being an extension of the Power BI environment.

Audience

  • Databricks is geared towards technical users like data engineers and scientists.
  • Fabric is geared towards less technical users compared to Databricks, although it offers tools for more technical needs.

Integration

  • Databricks is cloud-agnostic and can be hosted on any major cloud provider (AWS, Azure, Google Cloud Platform).
  • Fabric integrates well with Azure services (being a Microsoft product), and offers some integration with other cloud providers (e.g., Amazon S3 and Google Cloud Storage).

Security

  • Fabric and Databricks both maintain SOC 2 Type 2 and ISO 27001 certifications and support HIPAA compliance.

Documentation & Support 

  • Databricks has extensive documentation and reference material, as well as a large and active community.
  • Fabric has less documentation and community support. This may change with time as the user community grows and documentation continues to be written.

Differentiators 

  • Databricks specializes in big data processing and machine learning, and is built around Apache Spark. For high-volume data pipelines, Databricks may process data more efficiently.
  • Fabric offers robust visualization capability through Power BI. Databricks offers SQL dashboards, but matching Power BI's capability would require pairing it with a separate, non-Databricks tool.

Unlocking the Potential of Microsoft Fabric

Microsoft Fabric emerges as a versatile solution for data analytics, aiming to streamline workflows by integrating Azure services. This centralized platform, an extension of Power BI, offers familiar tools for data management and analytics tasks. While Fabric shares similarities with Databricks, it distinguishes itself with easier setup and integration with Azure services.

Fabric caters to a broader audience, including less technical users, while still providing tools for advanced analytics needs. Both Fabric and Databricks prioritize security and offer documentation and support, albeit with differences in depth and community size. Databricks excels in big data processing and machine learning, while Fabric shines in visualization capabilities through Power BI. 

If your organization wants to modernize its data infrastructure and data management, UDig can help! Our team has both the knowledge and experience to assist you in creating one unified platform. We can compare Fabric with tools such as Snowflake or Databricks and make recommendations tailored to your specific needs.

Let’s connect and discuss how to best modernize your data infrastructure.  

 

About The Author

Ben is a senior consultant on the data team.