Structured Query Language/Foreign Key

From Wikibooks, open books for an open world
Jump to navigation Jump to search



Foreign Keys (FK) define a directed reference from one table (the child) to another table (the parent). This reference acts as long as the involved columns of the two tables contain identical values. It couples one row of the child table to a single row of the parent table - a row of the parent table may be coupled by many rows of the child table.

E.g.: You may have the table department with column id and the table employee with column dept_id. If you want to assign an employee to a distinct department, you store the department-id in its column dept_id. This can be done in every case-independent from any Foreign Key definition. But in such cases people often have two additional requirements: First, employees shall only be assigned to departments which really exist. Second, as long as employees are assigned to a distinct department, it shall be impossible to delete this department. The main purpose of Foreign Keys is to guarantee these two requirements.

In other words: Foreign Keys guarantee that no orphans will arise.

Foreign Key vs. Join[edit | edit source]

Within RDBMs, identical values are used to link rows of different - and sometimes of the same - table together. Because this linking works on the basis of values and not of any link or special reference, it has no direction. In general, we call this technique a JOIN. Foreign Keys have a very similar concept because they also link rows with identical values together. But there are important differences:

  • Foreign Keys have a direction. It is important to know which one of the two affected tables is the child table and which one is the parent table.
  • Joins must be expressed within every DML statement, which is interested in this join (with the exception of views). In contrast, Foreign Keys are part of table definitions. All DML commands bear them in mind without expressing them within a DML statement.

Syntax[edit | edit source]

-- As part of CREATE TABLE command
CREATE TABLE <table_name> (
  ...
  CONSTRAINT <constraint_name> FOREIGN KEY (<column_name>) REFERENCES <parent_table_name> (<other_column_name>)
);

-- As part of ALTER TABLE command
ALTER TABLE <table_name> ADD  CONSTRAINT <constraint_name> ... ; -- same as above
ALTER TABLE <table_name> DROP CONSTRAINT <constraint_name>;      -- throw the definition away


Rules:

  • FK-constraints can be defined during table definition (CREATE TABLE) or afterward (ALTER TABLE). On this page, we focus on the CREATE TABLE statement. The syntax of the ALTER TABLE statement is very similar.
  • FK-constraints belong to the child table definition.
  • Despite an existing FK-constraint, it is possible that rows of the child table don't belong to any parent row. This occurs if the column value of the child row is NULL. If you want to avoid such situations, define the column as 'NOT NULL'.
  • Although the FK-constraints belong to the child table, they also have consequences for the parent table such that rows of the parent table, which have existing rows in the child table, can not be deleted.
  • The denoted parent table must exist.
  • The denoted column of the parent table must be its Primary Key or a column, which is UNIQUE.
  • It is perfectly all right to use the same table as parent and child table within one FK-constraint, see: Exercises.
  • One table may be subject to a lot of FK-constraints.

Example[edit | edit source]

The example defines the tables department and employee. The Foreign Key definition of employee declares department as the parent table of employee.

--
-- The parent table: DEPARTMENT
CREATE TABLE department (
  id         DECIMAL,
  dept_no    CHAR(10),
  dept_name  VARCHAR(100),
  CONSTRAINT dept_pk PRIMARY KEY (id)
);

-- The child table: EMPLOYEE
CREATE TABLE employee (
  id         DECIMAL,
  emp_name   VARCHAR(100),
  dept_id    DECIMAL,
  CONSTRAINT emp_pk PRIMARY KEY (id),
  CONSTRAINT emp_dept_fk FOREIGN KEY (dept_id) REFERENCES department(id)
);

-- This INSERT will fail because currently there is no department with id 10.
INSERT INTO employee   (id, emp_name, dept_id)  VALUES (1, 'Mike Baker', 10);
COMMIT;

-- It's necessary to store the department first.
INSERT INTO department (id, dept_no, dept_name) VALUES (10,'D10', 'E-Bike Development');
INSERT INTO employee   (id, emp_name, dept_id)  VALUES (1, 'Mike Baker', 10);
COMMIT;

-- The department may have a lot of employees
INSERT INTO employee   (id, emp_name, dept_id)  VALUES (2, 'Elenore McNeal', 10);
INSERT INTO employee   (id, emp_name, dept_id)  VALUES (3, 'Ted Walker', 10);
COMMIT;

-- This DELETE will fail because currently there are employees within the department.
DELETE FROM department WHERE dept_name = 'E-Bike Development';
COMMIT;

This kind of modeling allows the representation of hierarchical tree structures. One or many child nodes (rows) belong to a single parent node (row). In the context of DBMS, this kind of association is called a 1:m relationship.

n:m Relationship[edit | edit source]

In the real world, there are more association types than 1:m relationships. Often there are so-called n:m relationships where objects (rows) belong to more than 1 other object (row). Thereby the meaning of parent/child tables gets lost. In our example database there is a table hobby and another table person. One person may pursue multiple hobbies. At the same time, multiple persons may pursue the same hobby. This can be designed by creating a third table between the two original tables. The third table holds the IDs of the first and second table. So one can decide which person pursues which hobby.

The technique to realize this n:m situation is the same as shown in the previous chapter with its 1:m association - it is only used twice. We define two Foreign Keys, which start from the 'table-in-the-middle' and refers to the two other tables. In a technical sense, we can say, that the 'table-in-the-middle' is the child table for the two parent tables person and hobby. person and hobby are at the same logical level.

--
CREATE TABLE t1 (
  id               DECIMAL,
  name             VARCHAR(50),
  -- ...
  CONSTRAINT t1_pk               PRIMARY KEY (id)
);
CREATE TABLE t2 (
  id               DECIMAL,
  name             VARCHAR(50),
  -- ...
  CONSTRAINT t2_pk               PRIMARY KEY (id)
);
CREATE TABLE t1_t2 (
  id               DECIMAL,
  t1_id            DECIMAL,
  t2_id            DECIMAL,
  CONSTRAINT t1_t2_pk            PRIMARY KEY (id),       -- also this table should have its own Primary Key
  CONSTRAINT t1_t2_unique        UNIQUE (t1_id, t2_id),  -- every link should occur only once
  CONSTRAINT t1_t2_fk_1          FOREIGN KEY (t1_id) REFERENCES t1(id),
  CONSTRAINT t1_t2_fk_2          FOREIGN KEY (t2_id) REFERENCES t2(id)
);

ON DELETE / ON UPDATE[edit | edit source]

So far, we have assumed that rows of the parent table cannot be deleted if a row in the child table exists, which refers to this parent row. This is the default, but all in all, the SQL standard defines five options to handle this parent/child situation in various ways. The options extend the constraint definition. They are:

  • ON DELETE CASCADE: If a row of the parent table is deleted, then all matching rows in the referencing table are deleted.
  • ON DELETE SET NULL: If a row of the parent table is deleted, then all referencing columns in all matching rows of the child table are set to NULL.
  • ON DELETE SET DEFAULT: If a row of the parent table is deleted, then all referencing columns in all matching rows of the child table are set to the column’s default value.
  • ON DELETE RESTRICT: It is prohibited to delete a row of the parent table if that row has any matching rows in the child table. The point in time when checking occurs can be deferred until COMMIT.
  • ON DELETE NO ACTION (the default): It is prohibited to delete a row of the parent table if that row has any matching rows in the child table. This holds true in ALL cases, even if checking is deferred (see next chapter).

Analog to the ON DELETE option, there is an ON UPDATE option. It defines the same five options for the case of changing a column in the parent table, which is referred by the column of a child table.

  • ON UPDATE CASCADE: Any change to a referenced column in the parent table causes the same change to the corresponding referencing column in matching rows of the child table.
  • ON UPDATE SET NULL: Any change to a referenced column in the parent table causes the corresponding referencing column in matching rows of the child table to be set to null.
  • ON UPDATE SET DEFAULT: Any change to a referenced column in the referenced table causes the corresponding referencing column in matching rows of the referencing table to be set to its default value.
  • ON UPDATE RESTRICT: It is prohibited to change a row of the parent table if that row has any matching rows in the child table. The point in time when checking occurs can be deferred until COMMIT.
  • ON UPDATE NO ACTION (the default): It is prohibited to change a row of the parent table if that row has any matching rows in the child table. This holds true in ALL cases, even if checking is deferred (see next chapter).

If ON DELETE or ON UPDATE are not specified, the default action NO ACTION will occur. In some systems, the NO ACTION is implemented in the sense of the RESTRICT option.

An Example:

--
CREATE TABLE t1_t2 (
  ...
  CONSTRAINT t1_t2_fk_1 FOREIGN KEY (t1_id) REFERENCES t1(id)
                        ON UPDATE CASCADE ON DELETE RESTRICT,
  ...
);

Hint 1: The concept of updating Primary Keys is controversial.

Hint 2: Not all DBMS support all options.

IMMEDIATE / DEFERRED[edit | edit source]

There is an additional option to decide at what point in time the evaluation of the Foreign Key definition shall occur. The default behavior is to check it with each UPDATE and DELETE command. The second possibility is deferring the check until the end of the transaction, which is the COMMIT command. The purpose of this deferring is to put applications in the position to modify parent tables before child tables (which may be helpful if they utilize Hibernate).

To define this option, the constraint definition must be extended by the keywords [NOT] DEFERRABLE, which are pre- or postfixed by INITIALLY IMMEDIATE (the default) or INITIALLY DEFERRED to specify the initial state after the CREATE TABLE point in time.

--
CREATE TABLE t1_t2 (
  ...
  CONSTRAINT t1_t2_fk_1 FOREIGN KEY (t1_id) REFERENCES t1(id)
                        ON UPDATE CASCADE  DEFERRABLE INITIALLY IMMEDIATE
                        ON DELETE RESTRICT DEFERRABLE INITIALLY DEFERRED,
  ...
);

Hint: MySQL does not support the DEFERRABLE option, but the Foreign Key checking can be activated and deactivated dynamically by 'SET foreign_key_checks = 0/1;'

The Chicken-Egg Problem[edit | edit source]

Sometimes applications run into cyclic dependencies: Table A contains a reference to table B and vice versa, e.g.: A table team contains the columns id, team_name and team_leader (which is an id to a player) and the table player contains the columns id, player_name and team_id.

--
CREATE TABLE team (
  id               DECIMAL,
  team_name        VARCHAR(50),
  team_leader      DECIMAL,      -- ID of a player
  CONSTRAINT team_pk             PRIMARY KEY (id)
);

CREATE TABLE player (
  id               DECIMAL,
  player_name      VARCHAR(50),
  team_id          DECIMAL,
  CONSTRAINT player_pk           PRIMARY KEY (id)
);

ALTER TABLE team   ADD CONSTRAINT team_fk   FOREIGN KEY (team_leader) REFERENCES player(id);
ALTER TABLE player ADD CONSTRAINT player_fk FOREIGN KEY (team_id)     REFERENCES team(id);

So far, so bad. When the first team-row shall be inserted, the player-row is missed. When the player-row is inserted first, the team-row is missed.

As we have seen above, there is a DEFER option. Using this option, the FK-constraints must be defined such that they are not evaluated immediately with the INSERT commands. They shall be evaluated after all INSERTs at the COMMIT point in time.

-- Throw the above definitions away ...
ALTER TABLE team   DROP CONSTRAINT team_fk;
ALTER TABLE player DROP CONSTRAINT player_fk;
-- ... and use DEFERRABLE
ALTER TABLE team   ADD  CONSTRAINT team_fk   
                        FOREIGN KEY (team_leader) REFERENCES player(id) DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE player ADD  CONSTRAINT player_fk
                        FOREIGN KEY (team_id)     REFERENCES team(id)   DEFERRABLE INITIALLY DEFERRED;

Now we can insert data in any sequence (don't miss to deactivate AUTOCOMMIT).

--
INSERT INTO team   (id, team_name,   team_leader) VALUES (1, 'Wild Tigers',  1);
INSERT INTO player (id, player_name, team_id)     VALUES (1, 'Johnny Crash', 1);
-- No checking of Foreign Keys up to here
COMMIT;   -- Commit includes the check of Foreign Keys

DROP TABLE / TRUNCATE TABLE[edit | edit source]

Foreign Keys have implications to DROP TABLE and TRUNCATE TABLE commands. As long as a Foreign Key refers to a parent table, this table cannot be dropped (remove structure and data) or truncated (remove data only). This holds true even if there is no actual row referring any row in the parent table - the existence of the Foreign Key is sufficient to refuse DROP and TRUNCATE.

To use DROP or TRUNCATE, it is necessary to drop the constraint first.

Hint: Some implementations offer a DISABLE/ENABLE command to deactivate constraints temporarily.


Exercises[edit | edit source]

Is it possible that the parent table of a FK-constraint contains 1 row and the child table is empty?

Click to see solution
Yes. Parents without children are absolutely normal.

Is it possible that the child table of a FK-constraint contains 1 row and the parent table is empty?

Click to see solution
Yes. Although the main purpose of FK-constraints is the prevention of children without parents (orphans), this situation may occur.
If the column of the child row contains the NULL value, this row relates to no parent row
because 'null = <any value>' evaluates always to UNKNOWN and never to TRUE, even if that <any value> is the NULL value.

Create a table genealogy which stores information about people and their ancestors. The columns are: id, first_name, last_name, birth_name, father_id, mother_id.

Click to see solution
CREATE TABLE genealogy (
  id             DECIMAL                           PRIMARY KEY,
  first_name     VARCHAR(100),
  last_name      VARCHAR(100),
  birth_name     VARCHAR(100),
  father_id      DECIMAL,
  mother_id      DECIMAL
);

Extend the table genealogy by two FK-contraints such that the columns 'father_id' and 'mother_id' refer to other rows of this table.

Click to see solution
ALTER TABLE genealogy ADD CONSTRAINT gen_fk_1 FOREIGN KEY (father_id) REFERENCES genealogy(id);
ALTER TABLE genealogy ADD CONSTRAINT gen_fk_2 FOREIGN KEY (mother_id) REFERENCES genealogy(id);

Insert some data into 'genealogy', e.g.: data from your personal family.

Click to see solution
-- For the first rows store NULL in 'father_id' and 'mother_id'!
INSERT INTO genealogy (id, first_name, last_name, birth_name, father_id, mother_id)
               VALUES (1,  'Mike',     'Miller',  'Miller',   null,      null);
INSERT INTO genealogy (id, first_name, last_name, birth_name, father_id, mother_id)
               VALUES (2,  'Eve',      'Miller',  'Summer',   null,      null);
INSERT INTO genealogy (id, first_name, last_name, birth_name, father_id, mother_id)
               VALUES (3,  'Marry',    'Dylan',   'Miller',   1,      2);
INSERT INTO genealogy (id, first_name, last_name, birth_name, father_id, mother_id)
               VALUES (4,  'Henry',    'Dylan',   'Dylan',    null,      3);
COMMIT;