Data Warehouse is designed to meet
some business needs, which is transaction system cannot do (and vice versa).
Data Warehouse is designed to meet some business needs, which is transaction
system cannot do (and vice versa). he big
question is on why we have to create a Data-Warehouse?. The reasons are as
follows.
As a starter, let’s say you want to
know the monthly variations in 3 months running average on your customer
balances over last twelve months grouped by products+ channels+ customer
segments. Let’s see why do you need a data-warehouse for this purpose.
1.2.1 Keeping
Analysis/Reporting and Production Separate.
If you run the above-said query on your
production systems, you will find that it will lock all your tables and will
eat-up most of your resources, as it will be accessing a lot of data doing a
lot of calculations. This results in the production work to come to a virtual
halt. Imagine hundreds of such above-said queries running at the same time on
your production systems.
Reporting and analysis work typically
access data across the database tables, whereas production work typically
accesses specific customer OR product OR channel record at a point of time.
That’s why it is important to have the Information generation work to be done
from an offline platform (aka. Data Warehouse). Purpose of Data Warehouse is to
keep analysis/reporting (non-production use data) separate from production
data.
1.2.2 Information
Integration from multiple systems- Single point source for information
As an example- Let’s say you have
different systems for say a loan product vs. credit card product. The
above-said query, if run on production will need to pick the data on real time
basis from these systems.
This will make the query extremely
slow, and will need to do connects in the intermediate tables OR in run-time
memory. Moreover it will not be a reliable result as at a particular point of
time, the databases may not be in synch as many of such synching happen in the
end of day batch runs.
1.2.3 DW purpose
for Data Consistency and Quality
Organizations are riddled with tens of
important systems from which their information comes. Each of these systems may
carry the information in different formats and also may be having out of synch
information. (Different customer ID formats, mismatch in the supplier
statuses). By bringing the data from these disparate sources at a common place,
one can effectively undertake to bring the uniformity and consistency in data
(Refer to cleansing and Data Transformation).
1.2.4 High Response
Time- Production Databases are tuned to expected transaction load
Even if you run the above-said query on
an offline database, it will take a lot of time on the database design, which
is same as that of production. This is because the production databases are
created to cater to production work. In production systems, there is some level
of expected intensity for different kind of actions. Therefore, the indexing and
normalization and other design considerations are for given transaction loads.
However, the Data-warehouse has to be ready for fairly unexpected loads and
type of queries, which demands a high degree of flexibility and quick response
time.
1.2.5 Data Warehouse
objective of providing an adaptive and flexible source of information.
Its easier for users to define the
production work and functionalities they want, but difficult to define the
analysis they need. The analysis needs keep on changing and Data-Warehouse has
the capabilities to adapt quickly to the changing requirements. Please refer to
'Dimension Modeling'
1.2.6 Establish the
foundation for Decision Support
Decision process of an organization will involve
analysis, data mining, forecasting, decision modeling etc. By having a common
point, which can provide consistent, quality data with high response time
provides the core enabler for making fast and informed decisions.