Business Analysis Guidebook/Data Modeling and Data Documentation
Business data is a key asset for any organization. The information contained within data plays a large role in the operations, reporting and measurement of how the organization is performing. An information system provides the means to capture and share data, and to monitor business performance. A sound data structure is universally required to ensure data integrity and to minimize future application maintenance costs. Creating the ‘right’ data store for an information application involves data governance, understanding data flows, and developing a reliable foundation to ensure ongoing reliability, stability, and maximal effectiveness for the investment.
Data Governance and Security
The investments in information systems necessary to build new applications, modify existing applications, or to buy and migrate to COTS (Commercial Off-the-Shelf) solutions can be very high for any organization. In order to ensure that those investments result in maximum returns, controls may be implemented in the form of data governance and security. These controls may exist within an organizational subdivision, across subdivisions, or across an entire organization. Some controls are even global in nature, determined by international agreements and guided by generally accepted standards.
Data governance is concerned with who is responsible for the organizational data and how governance decisions and processes are executed. One or more different approaches may be used in the governance of business data. These include:
- Top-down – decisions are authoritative and affect the entire organization
- Bottom-up – decisions relating to data are established at low level (e.g., naming conventions)
- Center-out – typically involves a consultant hired to consider governance needs from all stakeholders and make organization-wide recommendation (results in Top-down for approved decisions)
- Silo-in - multiple groups set governance standards based on collective agreement .
Goals of data governance include ensuring the quality and security of business data, conformance of the data to organizational standards, overarching data rules and definitions, and the assurance that the data will be trustable.
When creating a new information system, the application is generally sponsored by one or more business program areas or departments. Organizational data governance requires the identification of those who will ‘own’ the data. These owners will be responsible for oversight, coordination with the organization’s Governance Board, conformance with adopted standards and law, and securing the data access rights. An ‘Owner’ may be responsible for data within the domain of the application, or the responsibility may be spread out between multiple owners across a group of integrated applications. Regardless of how the owner (or owners) of the data is determined within an organization, the approvals for project data requirements will come from within this ownership role.
Data redundancy is caused when the same information is captured multiple times within the same or multiple data store(s). Most information systems today are supported by data stores built on the relational schema originally designed by Edgar Codd. This type of schema minimizes redundancy and ensures data integrity. Data redundancy means that there are multiple instances of the same information captured within the same or multiple integrated data store(s). Issues relating to data redundancy revolve around conflicts between these multiple instances of the data and lead to higher operational costs associated with the data. These costs include those associated with data entry, retrieval and maintenance. Proper data definition ensures that any redundancy is identified and can be corrected or mitigated before the system is actually designed, lowering the associated development and ongoing maintenance costs .
Documentation of application data is critical to ensure a common understanding of the information contained within the data and to assure the reliability of that information. Business Analysts are often responsible for creating data documentation for information system projects and the alignment of this documentation with governance processes and rules. Documentation artifacts will span the entire SDLC from Initiation through Implementation.
Data Flow Diagrams
Tracing the flow of data through an application includes identifying the sources, transformations and destinations of the data that will be supported. The destinations include both the data stores and the downstream processes that use the data. Data sources may be shared within and across the organization or may involve external entities. They include data entry, existing databases, computer files, and data streams from external applications. Once the sources are determined, the processes that must occur to transform or manipulate the data are defined. These processes include the conversion of data from the source format to the destination format. The destinations include databases, files, reports and external applications that receive the output from the transformation processes. A Data Flow Diagram captures this information for verification and design purposes.
There are many tutorials available on the Internet to refer to for the creation of a data flow diagram. Generally, consistency in rendering the data input(s), processing and data output(s) will accomplish the goals of defining data flows in a manner that is easily understood by those who will review and verify the diagram contents. An example of a Data Flow diagram using the Yourdon/DeMarco notation is included in the image at right.
The most common format used to capture the definition of data elements is the Data Dictionary. This artifact is critical for purposes of data governance and for ensuring that all stakeholders share a common understanding of what the data is. Typical information that is contained in the data dictionary includes identification of where the data resides (information system, relational table), the exact database field or data element, what the data element represents and the format of the element. The image at right provides an example of what a data dictionary may look like.
Common Data Matrix
A Common Data Matrix is used during the Requirements Gathering phase of a project when there is at least some level of data integration required involving multiple applications, and /or some degree of MDM, and /or some form of organizational data governance. It is a form of a data dictionary that encompasses the scope of the governance domain. To construct this type of matrix, data elements are listed for the scope domain, with information about that data in relation to the organizational units, or applications, that interact with that data. This type of diagram should be constructed to include information applicable to the problem under consideration (see image at right).
Consider an information system that uses external data elements from other organizational (internal) applications. The definitions, meanings and formats of these elements become critical for integration. Within a high governance maturity level organization, a customer first name (for example) would have the same data element definition and format across the organization. In reality, most organizations have not fully conformed all data definitions across the entire organization; data elements for integration projects should be summarized using a Common Data Matrix when determining project requirements for the integration effort. In such an instance, the information included within the matrix (the matrix cells) might contain data element formatting information to identify any transformation(s) that may be needed for integrating the data with the project information.
Modeling Data Requirements
For purposes of establishing a new application and for application enhancements, modeling the data at a high level provides guidance in the design efforts. Typically, Business Analysts are not involved in the actual development of an application, but the requirements associated with a project will determine what is needed to support the project deliverables. This information is captured in data models, expressed at varying levels of detail.
A conceptual data model provides an illustration of business rules in relation to the data entities, including organizations, things, and/or applications, that must be captured within an application and the relationships needed between those elements to support operations. It provides an outline of the data interactions between those entities included within the scope of a project, illustrating the context for the data flows that need to occur. This type of model is very high-level, providing justifications for the design of the data store for an application, and does not include any specific entity attributes. The model is independent of whatever technology is used to implement the solution.
Logical Entity Relationship Model
- Thomas, Gwen. "Choosing Governance Models". Data Governance Institute. http://www.datagovernance.com/gbg_governance_models.html. Retrieved 02/05/2014.
- Thomas, Gwen. "Assigning Data Ownership". Data Governance Institute. http://www.datagovernance.com/gbg_assigning_data_ownership.html. Retrieved 02/05/2014.
- "Edgar F. Codd". IBM Research News. 2003. http://www-03.ibm.com/ibm/history/exhibits/builders/builders_codd.html. Retrieved 02/12/2014.
- Streiff, John (2001). "Estimating the Costs of Data Redundancy?". http://eai.ittoolbox.com/groups/vendor-selection/eai-select/estimating-the-cost-of-data-redundancy-10000. Retrieved 02/12/2014.
- Yourdon, E. (2006). "Just Enough Structured Analysis, Chapter 9". http://www.yourdon.com/PDF/oldJESA/SAch9.pdf. Retrieved 02/05/2014.
- Seiner, Robert S. (2006). "The Data Stewardship Approach to Data Management". The Data Administration Newsletter - TDan.com. http://www.tdan.com/view-articles/4042. Retrieved 02/05/2014.
- Gottesdiener, Ellen (2005). The Software Requirements Memory Jogger, 1st edition. GOAL/APC. p. 183.