Relational Database Design
From Wikibooks, the open-content textbooks collection
Contents |
[edit] Intended Audience
This page is intended for students that have at least an undergraduate introduction to elementary computer science. This page attempts to present relational databases in an introductory manner that
[edit] Introduction
[edit] What is a database?
(From Wikipedia: http://en.wikipedia.org/wiki/Database) A database is an integrated collection of logically related records or files which consolidates records into a common pool of data records that provides data for many applications. A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.
[edit] What is a relational database?
A relational database is a database that organizes its data into collections of Tables, Rows, Attributes, and Domains. Predicate logic is used to both describe the information contained in the database and to query information from it. (See Wikipedia: http://en.wikipedia.org/wiki/Relational_model#Overview)
[edit] Terms
[edit] Domains
Domains are the set of allowable data values for a Column. For example, the FiveDigitZipCode Column on the customer entity can be in the integer domain. As such, the database would not allow you to place values like 123.45 (floating point) or ABC (character) into that Column. This is often referred to as a column's type.
[edit] Columns
Columns are the attributes that describe an entity in the database model. For example, the customer entity may have attributes for First Name, Last Name, Address, City, State, and FiveDigitZipCode.
[edit] Rows
A Row is a complete set of Columns that describe the entity that you are trying to model.
[edit] Tables
Tables are collections of Rows that act as a logical groupings of entities.
[edit] Databases
A collection of related Tables and any supporting objects (e.g. stored procedures) is often referred to as a Database (or schema). Multiple Databases are usually logically separate from one another.
[edit] Relations
[edit] Arity
[edit] Cardinality
[edit] Constraints
Constraints are ways to further constrain data above and beyond what Domains allow. Constraints are also used to enforce Referential Integrity, which is what prevents logically incomplete data from residing in the database.
[edit] Primary keys
A Primary Key is a Column that uniquely identifies a particular Row in a Table. For example, a person entity may have a Column for SSN. If in your data model each person has a unique SSN, then it may be a candidate for a Primary Key. (Primary Keys can consist of two or more Columns, but this is not covered here.)
Primary Keys are also the means by which Foreign Keys work. Because of this, SSN may actually not be a good choice as a Primary Key. In practice, Rows often have a unique numeric identifier (often called an identity or sequence value) that uniquely identifies a particular Row. These kinds of values are often used as Primary Keys.
It should be noted that RDBMSes often use a Table's Primary Key's column(s) to automatically create a Structured Index on that Table. A Structured Index is an index that physically re-orders the data to match the index. This is done to improve query performance, but can actually hurt performance if the wrong column(s) are used as the Primary Key.
[edit] Foreign keys
A Foreign Key is a way to further constrain the allowable values of a Column to data that exists in another Table. For example, if you have to process orders in your system, you may create a Table called OrderInfo to store order information. An order has to be associated with a customer, so you may have a Column in the OrderInfo Table called CustomerID that somehow connects to an associated Row in the Customer Table.
Most likely you do not want to be able to create orders for customers that do not exist, and you would not want to delete a customer that is associated with any orders. Doing so would break the Referential Integrity of the data. A Foreign Key relationship ensures that these two rules are enforced.
By creating a Foreign Key relationship between OrderInfo's CustomerID column and the Primary Key of the Customer Table, the RDBMS will ensure that CustomerID always refers to a single existing Row in the Customer Table, and will also prevent you from deleting that associated Row because one or more Rows in OrderInfo depend on it.
[edit] Other Constraints
[edit] Relationships
[edit] Relating two entities
[edit] Entity-Relationship diagrams
[edit] Moving to a physical design
[edit] Normalization
If you've been working with databases for a while, chances are you've heard the term normalization. Perhaps someone has asked you "Is that database normalized?" or "Is that in BCNF?" All too often, the reply is "Uh, yeah." Normalization is often brushed aside as a luxury that only academics have time for. However, knowing the principles of normalization and applying them to your daily database design tasks really isn't all that complicated and it could drastically improve the performance of your DBMS.
In this article, we'll introduce the concept of normalization and take a brief look at the most common normal forms. Future articles will provide in-depth explorations of the normalization process.
So, what is normalization? Basically, it's the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminate redundant data (for example, storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.
First normal form (1NF) sets the very basic rules for an organized database: Eliminate duplicate columns from the same table. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
[edit] The normal forms
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.
[edit] 1NF
First normal form (1NF) sets the very basic rules for an organized database: Eliminate duplicate columns from the same table. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
[edit] 2NF
Second normal form (2NF) further addresses the concept of removing duplicate data: Meet all the requirements of the first normal form. Remove subsets of data that apply to multiple rows of a table and place them in separate tables. Create relationships between these new tables and their predecessors through the use of foreign keys.
[edit] 3NF
Third normal form (3NF) goes one large step further: Meet all the requirements of the second normal form. Remove columns that are not dependent upon the primary key.
[edit] Boyce-Codd NF
[edit] 4NF
Finally, fourth normal form (4NF) has one additional requirement: Meet all the requirements of the third normal form. A relation is in 4NF if it has no multi-valued dependencies.
[edit] Domain-Key NF
[edit] Denormalization
After reading about Normalization above and how things should be broken up, you may be thinking:
Well, now I have all these tables all over the place, and I have to join a buncha them just to get simple sets of data out of the system! How is that gonna affect the performance?
Well, you may be right. In some cases where performance is an issue, it may be necessary to denormalize parts of your database/schema and have wider tables that may contain duplicated/redundant data.
However, such decisions are best left in the hands of an experienced DBA, because there may be other ways to improve performance such as physical design, indexes, query optimization, etc. that DBA's specialize in. Database Denormalization, just like Normalization, is best left in the hands of the experienced.