The first image most people have of the
data warehouse is a large collection of historical,
integrated data. While that image is correct in many regards, there is another
vital element of the data warehouse: metadata.
Metadata is data about data. Metadata has been
around as long as there have been programs
and data that the programs operate on. Figure 1 shows metadata in a simple
form.
While metadata is not new, the role and
importance of metadata in the data warehouse certainly are new.
For years the information technology professional has worked in the same
environment as metadata, but in many ways has paid little attention to
metadata. The information professional has spent a life dedicated to process
and functional analysis, user requirements, maintenance, architectures, and the
like. The role of metadata has been passive at best in this milieu.
But metadata plays a very different role in
the data warehouse. Relegating metadata to a backwater, passive role in the data
warehouse environment defeats the purpose of the data warehouse. Metadata
plays a very active and important part in the data warehouse environment.
The reason why metadata plays such an
important and active role in the data warehouse environment is apparent when
contrasting the operational environment to the data warehouse environment
insofar as the user community is concerned.
Metadata serves to identify the contents and
location of data in the warehouse. It is a bridge between the data warehouse
(DWH) and decision support system applications. Metadata is needed to provide an
unambiguous interpretation of the data. It provides a catalogue of the data in
the DWH and pointers to this data, and it is used in building, maintaining,
managing, and using the DWH.
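As a rough illustration of the catalogue-and-pointer idea, the sketch below maps a business term to a description and a location in the warehouse. The entry names, table names, and the `locate` helper are all hypothetical and chosen only for the example; a real warehouse would hold this in its own metadata tables or a dedicated tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    description: str  # unambiguous meaning of the data element
    location: str     # pointer to the data (hypothetical "table.column" name)


# Hypothetical catalogue: business term -> metadata about that term.
catalog = {
    "net_sales": CatalogEntry(
        description="Sales revenue after returns and discounts",
        location="sales_fact.net_amount",
    ),
    "customer_region": CatalogEntry(
        description="Sales region the customer is assigned to",
        location="customer_dim.region",
    ),
}


def locate(term):
    """Return the warehouse location recorded for a business term."""
    entry = catalog.get(term)
    if entry is None:
        raise KeyError(f"no metadata recorded for {term!r}")
    return entry.location
```

A decision support application could call `locate("net_sales")` to find where the data lives instead of hard-coding table and column names.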
A metadata repository should contain:
1. A description of the structure of the DWH. This includes warehouse schemas, views, dimensions, hierarchies, derived data definitions, data marts, etc.
2. Operational metadata, such as data linkage, currency of data, and monitoring information.
3. Summarization processes, which include dimension definitions and data on granularity, partitions, summary measures, etc.
4. Details of data sources, which include source databases and their contents, gateway descriptions, data partitions, data extractions, etc.
5. Data related to system performance.
6. Business metadata, which includes business terms and definitions, data ownership information, and changing policies.
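The six categories above could be modelled, very loosely, as one container with a section per category. This is a minimal sketch, not a standard layout: the class name, field names, and the `missing_sections` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRepository:
    # One section per category in the list above (field names are hypothetical).
    structure: dict = field(default_factory=dict)      # schemas, views, dimensions, data marts
    operational: dict = field(default_factory=dict)    # data linkage, currency, monitoring
    summarization: dict = field(default_factory=dict)  # granularity, partitions, summary measures
    sources: dict = field(default_factory=dict)        # source databases, gateways, extractions
    performance: dict = field(default_factory=dict)    # data related to system performance
    business: dict = field(default_factory=dict)       # business terms, ownership, policies

    def missing_sections(self):
        """List the sections that have no metadata recorded yet."""
        return [name for name, value in vars(self).items() if not value]


repo = MetadataRepository(
    structure={"schemas": ["sales_star"]},
    business={"terms": {"net_sales": "Sales revenue after returns and discounts"}},
)
```

Calling `repo.missing_sections()` here reports the four sections left empty, which is one simple way to check that a repository covers every category.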