Data Vault Agility on Snowflake

Patrick Cuba
15 min readFeb 8, 2022

Following on from my previous articles about Data Vault on Snowflake in the bullet points below. If you’re already caught up, let’s proceed directly with the purpose of this content!

  • Data Vault on Snowflake (bit.ly/3dn83n8) ← to hash or not to hash in Snowflake
  • Data Vault Dashboard monitoring (bit.ly/3CSP3aV) ← using Snowsight to monitor the automated Data Vault test framework
  • Data Vault PIT Flow Manifold (bit.ly/3iEkBJC) ← data driven PIT builds
  • Why Equijoins Matter! (bit.ly/3wohniH) ← right-deep join trees and star optimized queries with PITs
  • Data Vault’s XTS pattern on Snowflake (bit.ly/2Yt2yz6) ← timeline correction and applied date
  • Say NO to refactoring Data Models! (bit.ly/3tPI66B) ← the data model built to change

Snowflake is the cloud data platform serving multiple types of workloads and a popular workload is loading data into a data vault on Snowflake. Yes, Data Vault implies repeatable patterns, but it is also up to you the architect to understand the Snowflake architecture and how we can leverage those for a Data Vault. In the blogs I have shown some techniques that help with optimizing your data vault, now let’s get into how we can leverage some more technology this time to be able to load hub tables from all directions at the same time and still ensure hub table business key uniqueness. For an in-depth look into this concept visit:

  • A Rose by any other name… wait… is it still the same Rose? (bit.ly/3xlFK0s) ← business key collision codes (BKCC) to ensure passive integration from multiple sources. The same key from a single source system but multiple files should have the same business keys, if we’re lucky, the same business key representing the same business object exists across source systems. If a clash could occur, then we use BKCCs to ensure hub table uniqueness.

So, you see, the hub table is the integration point, the focal point if you will be ingesting multiple sources at any cadence and recognize all link and satellites tables are simply describing details about the hub, the business object, the thing the business capability is based on that make up a business.

--

--

Patrick Cuba

A Data Vault 2.0 Expert, Snowflake Solution Architect