Some fifteen years ago I created my first data warehouse. This was after a decade in several financial functions like auditing, controlling and reporting.
The experienced consultant that reviewed my data warehouse had a lot of comments. Why didn’t I incorporate more attributes he asked? There must be more information in the source systems … what about age, gender, occupation, marital status of the people in this dimension?
He was right of course but as the business expert I didn’t feel the need. I hadn’t needed this information in my reports and I had my doubts as to the data quality of these attributes in the source systems.
Learning from criticism
But of course I learned from his advice and in later projects I found myself convincing customers that they needed these attributes or would probably need them in the future. And if we didn’t add them now it would cost much more in a future update.
Revisiting my learned behavior
So me and my team spend a lot of time extracting attributes, cleansing them and loading them in dimensions. Spending a lot of time and customer’s money on adding attributes they didn’t need in the first place. Nowadays I will suggest a more agile approach: loading the data as is in an historical staging area or a data vault and only cleanse/add them to the dimensional model when needed.