Data Vault Test Automation

Patrick Cuba
10 min read · Jul 7, 2021

Modern data analytics platforms often do not enforce referential integrity (foreign key constraints). The idea is that such restrictions impose strict rules on the underlying data, and the latency from business event to analytic value suffers as a result. BUT we still need to have faith in the data that is loaded onto the platform, no less!

Data Vault 2.0 does not impose restrictions either! It is as scalable and flexible as the platforms hosting it. A data vault has repeatable patterns for data modelling, for data architecture, for data loading and, of course, for test automation! Each component is an independent unit of work. Certain loading patterns can also be anti-patterns in certain situations; let’s explore that, along with a framework for efficient test automation!
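To make the idea concrete, here is a minimal sketch of the kind of check that can stand in for a foreign key the platform does not enforce: after a load, look for child keys that have no matching parent. It is only an illustration, not the framework described in this post. It uses Python's built-in `sqlite3` in place of a real analytics platform, and the table names `hub_customer` and `sat_customer`, the column `customer_hk`, and the helpers `find_orphans` and `build_demo` are all hypothetical.

```python
import sqlite3


def find_orphans(conn, child_table, parent_table, key):
    """Return key values found in child_table that are missing from parent_table.

    Table and column names are interpolated directly into the SQL, so they are
    assumed to be trusted constants from test configuration, never user input.
    """
    query = f"""
        SELECT DISTINCT c.{key}
        FROM {child_table} AS c
        LEFT JOIN {parent_table} AS p
          ON p.{key} = c.{key}
        WHERE p.{key} IS NULL
        ORDER BY c.{key}
    """
    return [row[0] for row in conn.execute(query)]


def build_demo():
    """Create an in-memory database with hypothetical tables and one orphan row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE hub_customer (customer_hk TEXT);
        CREATE TABLE sat_customer (customer_hk TEXT, name TEXT);
        INSERT INTO hub_customer VALUES ('a1'), ('b2');
        -- 'c3' has no parent row in hub_customer, so it is an orphan
        INSERT INTO sat_customer VALUES ('a1', 'Ann'), ('b2', 'Bob'), ('c3', 'Cy');
        """
    )
    return conn


conn = build_demo()
orphans = find_orphans(conn, "sat_customer", "hub_customer", "customer_hk")
print(orphans)  # ['c3']
```

An empty result means the relationship holds for that load. Because the check runs after loading rather than as a constraint during it, it does not add to load latency.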

Pre-reading for this post:

· Snowflake Streams: bit.ly/3hdW79o

· Snowflake Multi-table insert: bit.ly/3h97Ypz

· Read isolation: bit.ly/2UlGFz8

· Directed Acyclic Graphs (DAG): bit.ly/369Gb1F

· Apache Airflow pools: bit.ly/3dGBhxj

Patterns, Patterns, Patterns!

Data Vault 2.0 is delivered in patterns.

· Patterns for modelling,
