Business Intelligence/documentation frame data sources

From Wikibooks, open books for an open world
Jump to: navigation, search
  • Purpose: To identify data location for frame and associated data sources
  • Input: List of KPI's, dimensions and budgets
  • Activities: Identifying the data sources necessary to populate the dashboard and reports with data
  • Outputs: List of dimensions, facts and budgets
  • Documentation: The output is the documentation of data sources for the frame

Introduction[edit]

This step requires moving from concepts (KPI's, dimensions) to measures (data). It is difficult, if not impossible, to directly measure concepts. How does one measure income? Gross or net revenue? The goal is not to find the correct measure but the most appropriate based on the theory driving the strategy. It is therefore necessary to search databases and data warehouses for measures that approximate concepts. The final step is to determine the lowest level of analysis in the data sources. This completes the link between KPI's (concepts) and facts (measures).

Metadata directory[edit]

Note that the ease of accomplishing this goal depends on the extent of the completeness of the metadata directory. A metadata directory "is the database that carriers all the data describing the data warehouse (Poe et al. 1997)". It is difficult to implement because, regarding a data warehouse project, "the process is the deliverable (Poe et al 1997)". Ideally, the data architecture and metadata directory will be easily accessible. If this is not the case finding the measures will prove much more difficult.

This stage will also require the input of the data warehouse administrator(s). Interviewing the administer should provide knowledge of the architecture that will prove invaluable in understanding how to match concepts to measures.

Documenting the data sources for frames requires:

  • Understanding enterprise data warehouse architecture
  • Obtaining data source metadata
  • Matching the concepts to the appropriate database or data warehouse columns

Understanding enterprise data warehouse architecture[edit]

Identifying company data sources for measures first requires understanding the enterprise data architecture. First, we need to identify the components of a data warehouse architecture. These include (Poe et al. 1997):

  • Data extraction from source systems, databases, files, etc.
  • Data form the source systems is integrated before being loaded into the data warehouse
  • A data warehouse is a read-only database designed specifically for decision support
  • Front-end tool or application used to access the data warehouse

So, an enterprise data architecture includes all the data sources used to populate the data warehouse, the structure of the data warehouse, the extraction of data from the data warehouse into data marts and how this data is then extracted and used by users or business units.

Data warehouse architectures[edit]

Poe et al. (1997) cover data warehouse architectures and provide example diagrams. Drawing from their work, the first enterprise data architecture is presented in the figure showing a Data Warehouse Feeding Data Marts. Note that the database could be any type of data source. The goal is to identify the all data sources in the company. This is because each data source could contain columns, in the case of a database style data store, that measure the KPI's. These could exist in the data mart, data warehouse or database. Note that it is much easier to extract the necessary data from a data warehouse or mart structure than from a database.

Figure: Data Warehouse Feeding Data Marts


Another type of data warehouse architecture is presented in the figure depicting no Enterprise Data Warehouse. This structure presents some difficulties because it is less likely that the measures and data exist outside the databases in an easily extractable structure. This case is generally rare in large organizations.

Figure: No Enterprise Data Warehouse

Finally, there is the case where an integrated Database feeds a Data Warehouse. In this case the business user or business unit both extracts the data for reporting purposes and also updates the data. Regardless of the type of source it is important to obtain the enterprise data warehouse architecture, especially the metadata on each data source.

Figure: Integrated Database Feeding a Data Warehouse

Organizing data source metadata=[edit]

Metadata is data about the data. It provides information about:

  • Data Structures
  • Relationships between the data structures within or between databases

For data warehouses there are two types of metadata:

  1. Integration metadata shows associations between source systems and warehouse
  2. Transformation metadata maps data from warehouse to end users' front-end tools

Transformation metadata is important as it includes business names and hierarchies.

Matching concepts to database or data warehouse facts[edit]

Finally, it is necessary to identify the database columns that closely match the concepts. For instance, if on-time departure is the KPI that we wanted to measure, the goal is to find a sufficient measure of this concept. An operational database or the data warehouse should have a column identifying the date and time the train was scheduled to depart versus the actual departure data and time.

Union Pacific Facts

Matching concepts to database or data warehouse facts[edit]

The other activity is to identify the dimensions that will be used to analyze the facts. For instance, commodity mix is an important dimension. Does the rate per pound of cargo differ based on the commodity mix? What is the average rate? Is it worth targeting a specific customer in order to increase cargo from inthat commodity mix?

Union Pacific Dimensions