Data Vault Test Automation
--
Modern data analytics platforms often do not enforce referential integrity (foreign key constraints); the idea is that such restrictions impose rules on the underlying data so strict that the latency from business event to analytic value suffers. BUT we still need to have faith in the data that is loaded onto the platform, no less!
Data Vault 2.0 does not impose such restrictions either! It is as scalable and flexible as the platforms hosting it. A data vault has repeatable patterns for data modelling, for data architecture, for data loading and, of course, for test automation! Each component is an independent unit of work, and certain loading patterns can become anti-patterns in certain situations. Let's explore that, along with a framework for efficient test automation!
Pre-reading for this post:
· Snowflake Streams: bit.ly/3hdW79o
· Snowflake Multi-table insert: bit.ly/3h97Ypz
· Read isolation: bit.ly/2UlGFz8
· Directed Acyclic Graphs (DAG): bit.ly/369Gb1F
· Apache Airflow pools: bit.ly/3dGBhxj
Patterns, Patterns, Patterns!
Data Vault 2.0 is delivered in patterns.
· Patterns for modelling,
o Hub tables, Business Objects based on Business Architecture (the first block of Enterprise Architecture)
o Link tables, Unit of Work of Business Processes that influence Business Objects and may take a Business Object from one state to another
o Satellite tables, the changing descriptive details of either the Business Objects or the Unit of Work.
· Patterns for table creation (sketched in Snowflake SQL after this list),
o Hub — define hub table with standard data vault metadata tags and the business key.
§ Define a hub reconciliation stream on the hub table
o Link — define link table with standard data vault metadata tags
§ Define a link reconciliation stream on the link table
§ Define an orphan stream on the link table for every parent business key the link relates to
o Satellite — define satellite table with standard data vault metadata tags and column attributes
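To make the modelling and table-creation patterns concrete, here is a minimal sketch in Snowflake SQL. The table and column names (hub_account, lnk_account_customer, sat_account_balance, dv_hashkey_*, and so on) are illustrative assumptions, not names prescribed by Data Vault 2.0 or this article.

-- Hub: one row per business object, keyed on the hash of the business key
create table hub_account (
    dv_hashkey_hub_account  binary(20)    not null   -- hash of the business key
,   dv_loaddate             timestamp_tz  not null   -- when the key first landed
,   dv_recsource            varchar       not null   -- record source
,   account_id              varchar       not null   -- the business key itself
,   constraint pk_hub_account primary key (dv_hashkey_hub_account)  -- informational only; Snowflake does not enforce it
);
-- a hub_customer table (not shown) would follow the same shape

-- Link: the unit of work relating the parent hubs
create table lnk_account_customer (
    dv_hashkey_lnk_account_customer  binary(20)    not null   -- hash of the combined business keys
,   dv_hashkey_hub_account           binary(20)    not null   -- parent hub hash key
,   dv_hashkey_hub_customer          binary(20)    not null   -- parent hub hash key
,   dv_loaddate                      timestamp_tz  not null
,   dv_recsource                     varchar       not null
,   constraint pk_lnk_account_customer primary key (dv_hashkey_lnk_account_customer)
);

-- Satellite: change history of descriptive attributes, hung off its parent hub (or link)
create table sat_account_balance (
    dv_hashkey_hub_account  binary(20)    not null   -- parent hub hash key
,   dv_loaddate             timestamp_tz  not null   -- when this version landed
,   dv_hashdiff             binary(20)    not null   -- hash of the descriptive columns, used to detect change
,   dv_recsource            varchar       not null
,   account_balance         number(18,2)             -- descriptive attributes
,   account_status          varchar
,   constraint pk_sat_account_balance primary key (dv_hashkey_hub_account, dv_loaddate)
);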
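The accompanying streams might be defined as below; again the names are illustrative, and append-only streams are assumed because data vault tables are insert-only.

-- Reconciliation stream on the hub: exposes hub records landed since the stream was last consumed
create stream str_hub_account_recon on table hub_account append_only = true;

-- Reconciliation stream on the link
create stream str_lnk_account_customer_recon on table lnk_account_customer append_only = true;

-- One orphan stream on the link per parent hub it relates to
create stream str_lnk_account_customer_orphan_account  on table lnk_account_customer append_only = true;
create stream str_lnk_account_customer_orphan_customer on table lnk_account_customer append_only = true;

-- Orphan check: new link records whose account hash key has no matching hub record.
-- A plain select does not advance the stream offset; inserting the result into an
-- error table within a transaction is what consumes the stream.
select l.dv_hashkey_lnk_account_customer
from   str_lnk_account_customer_orphan_account l
left   join hub_account h
  on   l.dv_hashkey_hub_account = h.dv_hashkey_hub_account
where  h.dv_hashkey_hub_account is null;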