The Data Vault Must Flow
Pragmatic guide to Building a Data Vault
Blog Catalogue
Advantage Data Vault 2.0
Highlighting what DV2.0 offers beyond any other interpretation of DV
Learning Data Vault is Like Learning How to Make Beer!
All it takes is three or four “things” (your cognitive load) to start any learning journey. At the time of writing the article I was learning how to make beer!
Data Vault or: how I learnt to stop worrying and love Data Governance
Following an Atomic Space Age theme, a glimpse into Data Vault with DataOps.
Time to upgrade your thinking on Data Vault
Data Vault is more than just a data modelling methodology; it is designed to change and flex as the business evolves and matures around core business capabilities.
Data Vault Recipes
A holistic look at what it means to adopt the Data Vault 2.0 methodology, inspired by baking of course!
A Data Vault Analogy
In the industry, Data Vault has two standards: one follows the Hans Hultgren method (Ensemble Modelling) and the other follows Dan Linstedt (Data Vault 2.0). The terms of the two are sometimes confused, and to the untrained eye it is difficult to tell which method is being followed, which of course adds to the confusion of learning about Data Vault. Ensemble tends to lean towards replacing Kimball; Data Vault 2.0 does not, and instead keeps the patterns simple and repeatable.
The art was inspired by the proverb “in the land of the blind the one-eyed man is king”.
Data Vault Elevator Pitch
One’s point of view is usually biased toward one’s own interests, and the same is true when you pitch a Data Vault to different professions within a business.
Data Vault Dream Team
Ideas on how to get a Data Vault project started and build momentum
Building Data Vault modelling capability through the Mob
How to go about modelling your Data Vault through collaboration, with the right people in the right place. Inspired by work done at a major customer and by extreme programming principles.
Bring out your Dead… Data
The first DV article considering what to do with defunct data! Inspired by Pet Sematary, Poltergeist and The Sixth Sense!
Data Vault Industry Verticals
An outcome of a Data Vault model review, this article explains some of the pitfalls of attempting to conform a Data Vault to an industry model. Art inspired by SimCity.
Data Vault Loader Traps
Articulating some of the pitfalls of not doing a Data Vault properly!
Decided to build your own Data Vault automation tool?
Based on experience building a home-grown Data Vault automation tool, this post covers most of the patterns you will encounter in a Data Vault 2.0 model, with examples!
Data Vault 2.0 on Snowflake… To hash or not to hash… that is the question
To hash or not to hash on Snowflake? An article justifying why you should, and how Snowflake’s MPP interpretation can still be used to deliver a Data Vault. Any guesses as to who that is on the title page?
Why EQUIJOINS Matter!
Evidence of how PIT tables (when designed right) take advantage of inherent OLAP capabilities for querying facts and dimensions. Inspired by 12 Angry Men and Juror #8.
Data Vault Test Automation
Reconciliation between staged and target tables, and between target tables, is a must. This test framework is designed to keep the Data Vault implementation honest, and it is insert-only as well.
Data Vault Dashboard Monitoring
How to set up and track Data Vault dashboard reporting in Snowsight, using the same INSERT-ONLY paradigm as DV2.0
Data Vault PIT Flow Manifold
A little bit of Snowflake engineering in Conditional Multi-Table INSERTS and Point-in-Time (PIT) tables. Images inspired by the dozens of online manuals I read when trying to fix my lawnmower or motorcycle!
The Lost Art of Building Bridges
Where to use Bridge Tables and what problems do they solve?
Data Vault’s XTS pattern on Snowflake
Solving Time Crime in Data Vault, using Snowflake. How does the timeline correction pattern perform on Snowflake?
Data Vault Agility on Snowflake
Partly inspired by Tron! Some practical considerations for deploying a Data Vault on Snowflake and taking advantage of some little-known nuances of the platform.
Kappa Vault
The ease of using Snowflake for Data Vault streaming pipelines, and how the loading patterns have changed.
You might be doing #datavault Wrong!
A long list of considerations when building your Data Vault: what to do, and what not to do! Inspired by… people doing it wrong!
Seven Deadly Sins of Fake Vault
Born out of observing Data Vault implementations in the wild that do not follow the standards. DV2 practitioners have seen various unguided interpretations; these are the main sins we see in the industry.
Theme and images inspired by Seven and Milton
Data Vault Mysteries… Business Vault
Just what is a Business Vault, and why is its creation a mystery? It really shouldn’t be if you follow the standards!
Theme based on 1950s culture and storytelling
Is it Business Vault or is it not?
An often-foggy area of Data Vault is how to define a Business Vault; here is some guidance.
Apache Spark GraphX and the Seven Bridges of Königsberg
An example of building a Business Vault Link, but using Big Data (Spark + Parquet) to get there. Theme inspired by the story of Euler and the origins of graph theory.
Data Vault Mysteries… Effectivity Satellite and Driving Key
Just what does the Effectivity Satellite solve? And why do you need to define a driving key for it?
Effectivity Satellites are designed to fill a gap in Data Vault modelling that cannot be solved any other way.
Data Vault Mysteries… Zero Keys & Ghost Records
DV2.0 has a few esoteric concepts; this article describes the difference between default keys, ghost records and zero keys.
Say NO to Refactoring Data Models!
A problem every data platform faces is the challenge of making changes without regression testing and escalating costs. Sticking to the Data Vault 2.0 patterns meets that challenge by promoting data agility.
Data Vault Naming Standards
Theory behind what naming standards should look like
A Rose by any other name… Wait… is it still the same Rose?
Originally released on Valentine’s Day, this article delves into Passive Integration and Business Key Collision Codes by way of an example.
The Data Vault Guru: a pragmatic guide on building a data vault
A summary of what is in the book.
Data Vault has a new Hero
Originally titled “Solving Time Crime in Data Vault 2.0”, this article delves into how to deal with batch data that arrives out of sequence, using an authorised extension of the DV2.0 standards called the eXtended Record Tracking Satellite (XTS): a data-driven approach that dynamically enables the DV model to self-heal.
How I can get away without paying the Pied Piper… in Data Vault 2.0
What you learn in DV2.0 training is that a Data Vault model is not easy to query. To make it easier, and to support your Information Models, you build Point-in-Time and/or Bridge tables, but the expense of querying the Data Vault is then pushed to the creation of the PIT tables themselves. But what if you didn’t have to?
Business Key Treatments
What do you do when a source provides business keys that don’t quite follow the standard business key assignment best practices? An approach to ensure passive integration without sacrificing automation.
What does dbt give you?
A quick look over dbt and its power of transformation
Passive integration explained…
Another take on explaining passive integration
Ep1: Immutable Store, Virtual End-Dates
Why Snowflake is well suited for Data Vault
Ep2: Snowsight dashboards for Data Vault
Using Snowsight for Data Observability over a Data Vault
Ep3: Point-in-Time constructs & Join Trees
How to build PIT tables to solve the problem of getting data out of a Data Vault
Ep4: Querying really BIG satellite tables
A look at how to use Dynamic Pruning when querying really big satellite tables
Ep5: Streams & Tasks on Views
Animated version of Kappa Vault
Ep6: Conditional Multi-Table INSERT, and where to use it
Another look at building a PIT Flow Manifold
Ep7: Row Access Policies + Multi-Tenancy
How to combine multi-tenancy in Data Vault with Row Access Policies
Ep8: Hub locking on Snowflake
An interactive look at hub table locking and transaction isolation levels in Snowflake
Ep9: Out-of-sequence data
How do you dynamically handle data that arrives out of sequence, without needing to replay your loads?
Ep10: Virtual Warehouses & Charge Back
An approach to deploying your data architecture to suit Data Vault and a charge-back model
[BONUS]: Handling Semi-Structured Data
An easy framework for handling semi-structured data in a Data Vault on Snowflake
Snowflake, the Data Cloud
What’s in the box? Simplifying the concepts to fit your cognitive load makes learning Snowflake so much easier, and that’s the aim of this article.
Data Vault and Domain Driven Design
Delve into DDD and DV, a cornerstone of Data Mesh.
Data Vault as a Product
Expanding on DDD with Data Products through a DV, another Data Mesh concept.
Data Vault and Domain Oriented Architecture
Architecture patterns for Data Mesh and Data Vault.
Data Vault semantics & ontologies
Final blog in this series, linking Data Vault to the Semantic Layer and Domain Ontologies.
Data Vault and Analytics Maturity
Bonus blog discussing Data Vault and other methods for framing and modelling data.
Book 1: the data vault guru
a pragmatic guide on building a data vault
The data vault methodology presents a unique opportunity to model the enterprise data warehouse using the same automation principles applicable in today’s software delivery (continuous integration, continuous delivery and continuous deployment) while still maintaining the standards expected for governing a corporation’s most valuable asset: data. This book first provides the landscape of a modern architecture and then serves as a thorough guide on how to deliver a data model that flexes as the enterprise flexes: the data vault. Whether the data is structured, semi-structured or even unstructured, one thing is clear: there is always a model, applied either early (schema-on-write) or late (schema-on-read). Today’s focus on data governance requires that we know what we retain about our customers; the data vault provides that focus by delivering a methodology centred on all aspects of the customer, along with some of the best practices for modern-day data compliance.
The book delves into every data vault modelling artefact and its automation with sample code, the raw vault, the business vault, a testing framework, a build framework, sample data vault models and how to build automation patterns on top of a data vault. It even offers an extension of data vault that provides automated timeline correction, not to mention a variation of data vault designed to provide audit trails, metadata control and integration with agile delivery tools.
- US: https://amzn.to/3d7LsJV
- UK: https://amzn.to/3nsqTfR
- AU: https://amzn.to/30IxOYF
- DE: https://amzn.to/2TiAsAb
- FR: https://amzn.to/37yfnKl
- ES: https://amzn.to/3jl5tOr
- IT: https://amzn.to/37Awag6
- NL: https://amzn.to/35sCpjc
- JP: https://amzn.to/3dNJgYq
- BR: https://amzn.to/3dRvIek
- CA: https://amzn.to/3jl5LVx
- MX: https://amzn.to/35pkslI
- IN: https://amzn.to/3jl65DJ
Other
- Snowflake Data Clean Room, bit.ly/3IeDzSu and bit.ly/3PhM8P7
- Snowflake, the Data Cloud, bit.ly/3NJqfGO and bit.ly/36Ho2we
- GitHub, https://github.com/PatrickCuba/the_data_must_flow
- Data Vault UK Interview, bit.ly/3baadp9
- Data Vault UK Presentation, youtube.com/watch?v=7lUn3eBiuyU
- Data Vault Munich Presentation, youtube.com/watch?v=tRPgijauH2w
- Meet the Expert: Data Vault, bit.ly/3t1hBe1
- Snowflake Data Vault User Group: DataOps, bit.ly/3qbnm7P
- DataVault Interview, data-vault.co.uk/patrick-cuba-interview/
- Integrating SAS and Data Vault, bit.ly/2YUw1xT
- Data Mapping, bit.ly/3s0kcEj, bit.ly/32FnFQI
- 3 Ways to load data into SQL Server MDS, bit.ly/3mirsbs
- My Hash of Hashes, bit.ly/2MGKE5L
- SAS indexing tricks, bit.ly/2L9gsiW
- SAS Parallelism, bit.ly/3oiQubn
- SAS SQL vs Data Step part 3, bit.ly/3s0Hbie
- SAS SQL Join vs Data Step Merge part 2, bit.ly/3nnfNaF
- SAS Hash Tables, bit.ly/3hPSwxg
- SAS Data Step Merge vs SQL Joins, bit.ly/3be5jIf