Rapid Iteration with Data Vault 2.0

As discussed in a previous blog post, Data Vault 2.0 offers numerous advantages over traditional data warehousing approaches. Among those benefits, the flexibility of the Data Vault was highlighted as making the approach especially attractive to Agile development teams. Agile delivery is so baked into the Data Vault Methodology that its creator, Dan Linstedt, considers an Agile workflow the best practice when deploying a Data Vault. Another advantage Data Vault offers over alternatives is its simplicity: at its core, Data Vault employs a small set of rules for deploying objects into the warehouse, and this simplicity paves the way for increased speed, repeatability, and automation.

Business Focused

The Data Vault Methodology puts the business front and center, since business users are best positioned to know both the data and their ultimate needs. Delivering early and often using Agile practices keeps the project aligned with end users’ needs and expectations throughout the process. Rather than developing and deploying each data warehouse layer in its entirety, the Data Vault Methodology encourages breaking the deployment into sprints (iteration periods of one to four weeks). Working in sprints allows the customer to take delivery at regular intervals and begin reviewing artifacts for acceptance earlier in the project.

Simplicity

The goal for each sprint is to deliver a testable, discrete feature to the customer for review. This means the minimum footprint across all data warehouse layers: the complete source tables in the staging layer, raw vault tables to house the data in the warehouse layer, and views or tables to present the unified data in the information mart layer. The creation of these objects is, in turn, quite simple. In fact, the patterns at the heart of Data Vault modeling can be expressed in code, enabling scripts to manage the creation of most objects.
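
To make that concrete, here is a minimal sketch of how such a pattern might be expressed in Python. The table layout, the raw_vault schema, and names such as customer_id and record_source are illustrative assumptions for this example, not columns prescribed by the post or the standard.

```python
# Minimal sketch: generating hub and satellite DDL from a reusable pattern.
# Schema, table, and column names below are illustrative assumptions.

HUB_TEMPLATE = """
CREATE TABLE raw_vault.hub_{entity} (
    hub_{entity}_hk    CHAR(32)      NOT NULL,  -- hash of the business key
    {business_key}     VARCHAR(100)  NOT NULL,
    load_date          TIMESTAMP     NOT NULL,
    record_source      VARCHAR(50)   NOT NULL,
    PRIMARY KEY (hub_{entity}_hk)
);
"""

SAT_TEMPLATE = """
CREATE TABLE raw_vault.sat_{entity} (
    hub_{entity}_hk    CHAR(32)      NOT NULL,
    load_date          TIMESTAMP     NOT NULL,
    hash_diff          CHAR(32)      NOT NULL,  -- detects attribute changes
    record_source      VARCHAR(50)   NOT NULL,
    {attribute_columns},
    PRIMARY KEY (hub_{entity}_hk, load_date)
);
"""

def hub_ddl(entity: str, business_key: str) -> str:
    """Render the hub DDL for one business entity."""
    return HUB_TEMPLATE.format(entity=entity, business_key=business_key)

def sat_ddl(entity: str, attributes: dict) -> str:
    """Render the satellite DDL; attributes maps column name to SQL type."""
    cols = ",\n    ".join(f"{name:<18} {sql_type}"
                          for name, sql_type in attributes.items())
    return SAT_TEMPLATE.format(entity=entity, attribute_columns=cols)

if __name__ == "__main__":
    print(hub_ddl("customer", "customer_id"))
    print(sat_ddl("customer", {"first_name": "VARCHAR(100)",
                               "last_name": "VARCHAR(100)",
                               "email": "VARCHAR(255)"}))
```

Because every hub and satellite follows the same shape, adding a new entity in a sprint is largely a matter of supplying its business key and attributes to the template.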

Using automation, file delivery can trigger the creation and loading of a staging table to house the data. Scripts can then create the HUB, SAT, and LINK tables in the raw vault to hold the staged data, and a further script can generate views in an information mart that present the most recent point-in-time version of the data. With this automation in place, delivery times shrink and delivery consistency improves, freeing time and energy to deliver quality data products to the business.
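
For illustration, the sketch below strings those steps together in Python. The run_sql helper, the customer tables, and the QUALIFY clause (supported by warehouses such as Snowflake) are assumptions made for the example; a production setup would use an orchestration tool and the target platform's own loader rather than these stand-ins.

```python
# Sketch of the automated flow described above: a landed file triggers staging,
# a raw vault load, and regeneration of a "latest record" view in the mart.
# run_sql only prints the statements so the example runs without a database.

from pathlib import Path

def run_sql(statement: str) -> None:
    # Placeholder for executing SQL against the warehouse via a real driver.
    print(statement.strip(), end="\n\n")

def load_staging(file_path: Path, table: str) -> None:
    # Assumes a warehouse that can copy a delimited file straight into a table.
    run_sql(f"COPY staging.{table} FROM '{file_path}' (FORMAT CSV, HEADER TRUE);")

def load_hub(entity: str, business_key: str, staging_table: str) -> None:
    # Insert only business keys that are not already present in the hub.
    run_sql(f"""
        INSERT INTO raw_vault.hub_{entity}
            (hub_{entity}_hk, {business_key}, load_date, record_source)
        SELECT MD5(s.{business_key}), s.{business_key}, CURRENT_TIMESTAMP, '{staging_table}'
        FROM staging.{staging_table} s
        LEFT JOIN raw_vault.hub_{entity} h ON h.{business_key} = s.{business_key}
        WHERE h.{business_key} IS NULL;
    """)

def build_current_view(entity: str) -> None:
    # Present the most recent satellite row per hub key in the information mart.
    run_sql(f"""
        CREATE OR REPLACE VIEW info_mart.{entity}_current AS
        SELECT *
        FROM raw_vault.sat_{entity} s
        QUALIFY ROW_NUMBER() OVER (
            PARTITION BY s.hub_{entity}_hk ORDER BY s.load_date DESC) = 1;
    """)

def on_file_arrival(file_path: Path) -> None:
    # End-to-end: stage the file, load the hub, rebuild the presentation view.
    load_staging(file_path, "customer_delta")
    load_hub("customer", "customer_id", "customer_delta")
    build_current_view("customer")

if __name__ == "__main__":
    on_file_arrival(Path("/landing/customer_delta.csv"))
```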

 

About The Author

Brock is a Senior Consultant on the Data team.