Data Vault 2.0 has a new hero…

Patrick Cuba
May 18, 2021

Analytics assumes that data will move along a linear path into a historical repository, so that the repository reflects the correct sequence of events. That is, if today is Thursday and we have already loaded Wednesday's data, we do not expect Tuesday's data to arrive today, because everything up to today should already have been loaded. Yet there are myriad reasons why late data does arrive. A whole batch file may be missing, or a Tuesday file may have arrived and been loaded without its complete set of records. For example, a record locked by an update in the source database may be left out of the push file that the source system produces. These are real scenarios in loading a data warehouse. They can skew what we know about a business entity (e.g. a customer) and lead to erroneous analytics for that entity or, worse, to the wrong facts being reported to a regulatory body or to the customers themselves.
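To make the Thursday/Wednesday/Tuesday scenario concrete, here is a minimal sketch of how a loader might flag a late-arriving record before it lands. It assumes a hypothetical per-key "high-water mark": the latest business date already loaded for each business key. Any incoming record dated earlier than that mark is arriving out of order. The names `Record`, `find_late_records` and `latest_loaded` are illustrative assumptions, not part of Data Vault 2.0 or any specific tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    business_key: str   # hypothetical business key, e.g. a customer number
    applied_date: date  # the business date the record describes


def find_late_records(records, latest_loaded):
    """Return records whose applied_date precedes the latest date already
    loaded for the same business key (a hypothetical high-water mark)."""
    marks = dict(latest_loaded)  # copy so the caller's mapping is not mutated
    late = []
    # Process in business-date order so in-batch records advance the mark.
    for rec in sorted(records, key=lambda r: r.applied_date):
        high_water = marks.get(rec.business_key)
        if high_water is not None and rec.applied_date < high_water:
            late.append(rec)  # earlier than what is already loaded
        else:
            marks[rec.business_key] = rec.applied_date
    return late


if __name__ == "__main__":
    # Wednesday 12 May 2021 is already loaded for customer C001.
    loaded = {"C001": date(2021, 5, 12)}
    batch = [
        Record("C001", date(2021, 5, 11)),  # Tuesday's record arriving on Thursday
        Record("C002", date(2021, 5, 13)),  # a new customer, nothing loaded yet
    ]
    print(find_late_records(batch, loaded))
```

This only detects the problem. It says nothing about how the historical timeline should then be corrected, and a record that goes missing entirely (rather than arriving late) would not be caught by a date comparison like this.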

In data parlance, a late-arriving record is known as “out-of-sequence” data, and to us data professionals it is a time crime. The repercussions can include legal ramifications, reputational risk and loss of market share (etc.) if the analytics are influenced by something that might be…
