Your Privacy

This site uses cookies to enhance your browsing experience and deliver personalized content. By continuing to use this site, you consent to our use of cookies.
COOKIE POLICY

Data Vault | Next Level Data

Data Vault | Next Level Data
Back to insights

History 

Traditional data warehousing has been around since the mid-80s and has been adopted as a foundation of modern business intelligence platforms. By incorporating different data architecture fundamentals compared to an operational system, the data warehouse allows businesses to use their data as a basis for reporting.  By pre-aggregating, transforming, and contextualizing transactional data, analysis of historical data becomes easier to accomplish, allowing businesses to garner significant value and make databased decisions promptly.  This system of data management comes not without disadvantages. Once a traditional warehouse grows to a certain size and complexity, changes to the architecture require more time to develop and regression test.     

Data Warehouse 2.0

An official definition states that a data vault is a detailoriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3nf) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of todays enterprise data warehouses. 

Quite verbose, but what does all this mean? To simplify, the data vault provides a flexible framework for adding all of your business data reliably and efficiently.  Some of the key advantages that the data vault methodology provides include a simpler data ingestion pipeline, an easy and reliable auditability mechanism, flexibility and ease, and new sources and relationships without impact to the existing design 

What does it look like? 

A typical implementation of data vault architecture uses three primary table types: 

  1. Hubs contain a list of unique business keys as well as some related metadata. The business keys provide the backbone of the warehouse as all data revolves around them.  
  2. Links bring business keys together and represent associations or transactions between said business keys (HUBs). LINKs often provide the basis for creating facts in a dimensional model, as they effectively represent a many-to-many relationship between attached HUBs. 
  3. Satellites, which are joined to one hub or one link, contain all contextual data relevant to its parent table. They also contain any historical data that can be used to facilitate recordlevel auditing. This is accomplished by a recordlevel Valid Date/Invalid Date that enables data lineage across all business keys for the related hub. 

Consider a data vault application that stores a persons demographic and employment information.  Unique entities would be the Person and the Employer, and data that uniquely identifies individual entities would be stored in the related HUB table (eg. customer_id for the PERSON entity).  Demographic information related to an individual entry in the PERSON HUB table (fname, lname, etc.) would be stored in the PERSON SATELLITE table.  Establishing a relationship between a person and an employer would involve creating a LINK table that incorporates only the identifying information for related HUBs (the ids from PERSON_HUB and EMPLOYER_HUB). 

The real power of the data vaults flexibility lies in its ability for fast, on the fly, addition of new relations (LINKs). This principle allows the incorporation of new data sources and relations with much less overhead when compared to a traditional data warehouse implementation. Placing the associations between the new sections and the old sections in LINK tables enables the new relations to be created from new domains without requiring re-engineering the existing data structures. This architectural difference is where the secret sauce of data vaulting applies. 

Depending on the use case, data vault could be right for you.  While traditional data warehousing will always be a tried and true method, data vaults advantages outweigh the old.  Today’s software development teams live and breathe in Agile, and the notion of incremental delivery and cyclical development align strongly to data vaults flexibility. The idea of auditability is baked into the data vault – this includes change tracking and source system tracking through metadata provided within each table.  As changes are always incremental and additive, old data is never lost, and an automatic audit trail is baked into any data vault implementation. Take your data to the next level with a data vault!

 

Further Reading:
https://danlinstedt.com/solutions-2/data-vault-basics/ 
https://www.talend.com/blog/2015/03/27/what-is-the-data-vault-and-why-do-we-need-it/ 
https://en.wikipedia.org/wiki/Data_vault_modeling

Digging In

  • Data & Analytics

    Unlocking the Full Potential of a Customer 360: A Comprehensive Guide

    In today’s fast-paced digital economy, understanding your customer has never been more critical. The concept of a customer 360 view has emerged as a revolutionary approach to gaining a comprehensive understanding of consumers by integrating data from different touchpoints to offer a holistic view.  A customer 360 view is about taking an overarching approach to […]

  • Data & Analytics

    Microsoft Fabric: A New Unified Data Platform

    MicroPopular data services and tools often specialize in specific aspects of the data analytics pipeline, serving teams in the data lifecycle. For instance, Snowflake addresses large-scale data warehousing challenges, while Databricks focuses on data engineering and science. Power BI and Tableau have become standard tools for business intelligence tasks. So, where does Microsoft Fabric create […]

  • Data & Analytics

    Improve Member Experience: Maximize Engagement & Value for Associations

    As you know, member engagement is key to providing value and retaining members over time. However, you must also recognize that member needs and preferences are evolving rapidly, especially as they desire more seamless digital experiences. Additionally, member expectations for personalized, omnichannel interactions have risen in recent years, and this means that associations must strategically […]

  • Data & Analytics

    A Guide to Data Strategy Success in Your Association

    While countless organizations aim to harness the potential of data, few possess a clear strategy to transform raw information into actionable insights that fuel their operations and marketing efforts. Don’t fall into the trap of investing in limited, tactical solutions.

  • Data & Analytics

    ChatGPT & Your Data Strategy – Revolution or Evolution?

    You would be hard-pressed to find a single person who was not some degree of impressed when they first tried out ChatGPT. After its public release, the conversation in the tech space seemingly changed overnight about how AI would change everything. But much like past hot topics in the tech world – such as the […]

  • Data & Analytics

    Revamping Data Pipeline Infrastructure to Increase Owner Satisfaction at Twiddy

    In an ever-evolving technological landscape, embracing new methodologies is vital for enhancing efficiency. Our data and analytics interns recently undertook a significant overhaul of one of Twiddy’s data pipeline infrastructures, implementing Airbyte pipelines with Kestra orchestration to replace an existing Java application. Motivated by several challenges with the previous system, most importantly a complete loss […]