What is a Data Warehouse?
This blog entry revisits the tired old enterprise data warehouse (EDW) to see whether a fresh look at it and its traditional implementations can yield any gains. I’m calling this exercise data warehouse acceleration.
Before we start, it is worth revisiting the original purpose of a data warehouse. An enterprise data warehouse is defined as an integrated, line-of-business and/or subject-oriented collection of time-variant and nonvolatile enterprise-wide data. We can further define each EDW attribute as follows:
Integrated
Data integration implies a well-organized effort to define and standardize disparate data elements from throughout the enterprise into one central collection. Integration can be a time-consuming process; however, well-integrated data results in a higher-functioning data warehouse. Further benefits include increased data compliance and accessibility for users.
Subject-oriented
The integrated data is arranged and optimized to provide answers to questions coming from diverse functional areas within an organization. Therefore, the data warehouse contains data organized and summarized by topic, such as student demographics and human resources.
Time-variant
The integrated data includes both present and historical data, allowing for in-depth historical analysis and, therefore, better forecasting capabilities (sometimes referred to as predictive analytics).
Nonvolatile
The integrated data that enters the data warehouse is never, or almost never, removed. Depending upon the nature of the decision making, data moves from operational entry into the data warehouse at a certain frequency or on a predetermined schedule. The source systems for the data warehouse are usually transactional and therefore volatile, but their data is extracted into the nonvolatile data warehouse and preserved for future retrieval. This creates “snapshots in time” of the volatile source system data and makes historical data analysis possible.
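To make the time-variant and nonvolatile attributes concrete, here is a minimal sketch of append-only snapshot loading. It is only an illustration under assumptions of my own: the function names `load_snapshot` and `as_of`, the `id` and `snapshot_date` fields, and the in-memory list standing in for a warehouse table are all hypothetical, and a real EDW would do this with database tables and an ETL tool.

```python
from datetime import date


def load_snapshot(warehouse, source_rows, snapshot_date):
    """Append a dated copy of every source row.

    Existing warehouse rows are never updated or deleted (nonvolatile).
    """
    for row in source_rows:
        # Copy the row so later changes in the source do not alter history.
        warehouse.append({**row, "snapshot_date": snapshot_date})
    return warehouse


def as_of(warehouse, key, snapshot_date):
    """Return the latest snapshot of `key` taken on or before `snapshot_date`."""
    candidates = [
        r for r in warehouse
        if r["id"] == key and r["snapshot_date"] <= snapshot_date
    ]
    return max(candidates, key=lambda r: r["snapshot_date"], default=None)


# Hypothetical transactional source: the row is overwritten in place.
source = [{"id": 1, "status": "enrolled"}]
warehouse = []

load_snapshot(warehouse, source, date(2024, 1, 1))
source[0]["status"] = "graduated"          # the volatile source changes
load_snapshot(warehouse, source, date(2024, 2, 1))

# The warehouse keeps both versions, so historical questions can be answered.
print(as_of(warehouse, 1, date(2024, 1, 15))["status"])
print(as_of(warehouse, 1, date(2024, 2, 1))["status"])
```

The source system only ever knows the current status, while the warehouse can answer what the status was on any past load date. The trade-off of this naive approach is storage growth, since every load copies every row.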
Data warehouse implementations, done properly, are massive and expensive undertakings that involve not just the information technology (IT) department but all areas of the organization. These projects typically run for months or even years and require teams of highly paid IT professionals and business analysts, as well as subject matter experts throughout the enterprise who must be identified and then kept highly engaged in the effort. An enormous level of “buy in”, meaning investment of thought, manpower and money from the organization, is necessary to make an EDW project truly successful.
Along the road to a successful EDW project, many common challenges await: source system data quality issues, data quality assurance, a DW design and structure that is both efficient and flexible, data movement and ad hoc reporting performance concerns, and cost overruns due to inadequate requirements gathering and/or poor estimates of effort. The most important challenge of all is user acceptance. All the time, effort and associated costs are for naught if at the end of the day (or project, in this case), perhaps due to a lack of training or some other contributing factor, the users that the EDW is intended to benefit lack confidence in it. Gartner estimates that as of 2017, more than 70% of enterprise-level data warehouse projects were deemed failures because one or more of these challenges was not adequately overcome.
What is Data Warehouse Acceleration?
I know, get to the point already! OK, now that we’ve refreshed our memories on what an enterprise data warehouse is, what it can do for an organization, and where the potential pitfalls lie, let’s turn our attention to defining data warehouse acceleration. Any lift or advantage that we can give ourselves in the areas of challenge just mentioned can not only make the difference between success and failure in an EDW project, it can also greatly reduce the time needed to see the positive return on investment (ROI) that is so critical to any organization implementing such a large-scale project. Data warehouse acceleration aims to accomplish exactly that. It is a reexamination of the traditional approach to each aspect of a DW implementation project to see whether anything can be gained from non-traditional thinking about the challenges at hand. Not every area of a DW project will readily benefit from these non-traditional approaches. Sometimes the time-honored, best-practice approach is still the way to go. Even so, small advantageous changes can lead to big gains in the overall EDW implementation project, so we should continue the quest for “the better mousetrap” in every area of challenge. It is almost always healthy to question accepted paradigms and reexamine traditional approaches. Unless we do so from time to time, the “better mousetrap” will never come into existence.
Sources: