Structured Query Language/SELECT: Set Operations

From Wikibooks, open books for an open world
Jump to: navigation, search

← SELECT: Predefined Functions | SELECT: Case Expression →

Tables, views and results of SELECT commands are in somewhat similar to sets of set theory. In this comparision the elements of sets correspond to rows of tables, views and SELECT results. The differences between set theory and the itemized SQL constructs are:

  • Sets of set theory do not allow duplicates whereas SQL allows duplicates. (Even different rows of one table may be identical as there is no duty to use the concept of primary keys.) In the following we use the term multiset when we speak about sets in SQL where duplicates are possible.
  • Sets of set theory and multisets are not ordered. But for the result of a SELECT command we can enforce an ordering by means of the optional ORDER BY clause.

The comparision between set theory and SQL goes even further. In SQL we have operations which acts on multisets in the sense of set theory: The SQL operations UNION, INTERSECT and EXCEPT (some name it MINUS) process intermediate multisets generated by differents SELECT commands. The operations expect the multisets are of the same type. This means mainly that they must have the same number of columns. Also their data type should correlate, but this is not mandatory. If they do not, the DBMS will cast them to a common data type - if possible.

UNION[edit]

The UNION operation pushs the result of several SELECT commands together. The result of the UNION contains those values, which are in the first or in the second intermediate result.

-- Please consider that this is only one command (only ONE semicolon at the very end)
SELECT firstname  -- first SELECT command
FROM   person
  UNION           -- push both intermediate results together to one result
SELECT lastname   -- second SELECT command
FROM   person;
The UNION of two intermediate results

This is a single SQL command. It consists of two SELECTs and one UNION operation. The SELECTs are evaluated first. Afterwards their results are pushed together to one single result. In our example the result contains all lastnames and firstnames in a single column (our example may be of limited help in praxis, it's only a demonstration for the UNION).

DISTINCT / ALL
If we examine the result closely, we will notice that it consists only of 17 values. The table person contains ten rows so that we probably expect twenty values in the result. If we perform the 'SELECT firstname ...' and 'SELECT lastname ...' as separate commands without the UNION, we receive for both commands 10 values. The explanation for the 3 missing values is the UNION command. It behaves by default that it removes duplicates. Therefore some of the intermediate values are skipped. If we want to obtain this duplicate values we have to extent the UNION. It can be widen with one of the two key words DISTINCT or ALL. DISTINCT is the default and its behaviour is the removal of duplicate values which we have seen before. ALL leads to the retention of all values, independent whether they appeared before or not.

-- remove (that's the default) or keep duplicates
SELECT ...
  UNION [DISTINCT | ALL]
SELECT ...
[ -- it is possible to 'UNION' more than 2 intermediate results
  UNION [DISTINCT | ALL]
SELECT ...
];

A hint for Oracle users: The use of the key word DISTINCT, which is the default, is not accepted by Oracle. Omit it.

General hint
In most cases the UNION combines SELECT commands on different tables or on different columns of the same table. SELECT commands on the same column of a single table usually use the WHERE clause in combination with boolean logic.

-- A very unusual example. People apply such querys on the same table only in combination with very complex WHERE conditions.
-- This example would normally expressed with a phrasing similar to: WHERE lastname IN ('de Winter', 'Goldstein');
SELECT *
FROM   person
WHERE  lastname = 'de Winter'
  UNION ALL
SELECT *
FROM   person
WHERE  lastname = 'Goldstein';

INTERSECT[edit]

INTERSECT

The INTERSECT operation evaluates to those values, which are in both intermediate results, in the first as well as in the second.





-- As in our example database there is no example for the INTERSECT we insert a new person.
-- This person has the same lastname 'Victor' as the first name of another person.
INSERT INTO person VALUES (21, 'Paul', 'Victor', DATE'1966-04-02', 'Washington', '078-05-1121', 66);
COMMIT;
-- All firstnames which are used as lastname.
SELECT firstname  -- first SELECT command
FROM   person
  INTERSECT       -- looking for common values
SELECT lastname   -- second SELECT command
FROM   person;

A hint to MySQL users: MySQL (5.5) does not support INTERSECT operation. But as it is not an elementary operation, there are workarrounds.

EXCEPT[edit]

EXCEPT

The EXCEPT operation evaluates to those values, which are in the first intermediate result but not in the second.





-- All firstname with the exception of 'Victor' because there is a lastname with this value.
SELECT firstname  -- first SELECT command
FROM   person
  EXCEPT          -- are there values in the result of first SELECT but not of second?
SELECT lastname   -- second SELECT command
FROM   person;

A hint to MySQL users: MySQL (5.5) does not support the EXCEPT operation. But as it is not an elementary operation, there are workarrounds.
A hint to Oracle users: Oracle use the key word MINUS instead of EXCEPT.

-- Clean up the example database
DELETE FROM person WHERE id > 10;
COMMIT;

Order By[edit]

We can combine set operations with all other elements of SELECT command, in particular with ORDER BY and GROUP BY. But this may lead to some uncertainties. Therefore, we would like to explain some of the details below.

SELECT firstname  -- first SELECT command
FROM   person
  UNION           -- push both intermediate results together to one result
SELECT lastname   -- second SELECT command
FROM   person
ORDER BY firstname;

To which part of the command belongs the ORDER BY? To the first SELECT, to the second SELECT or to the result of the UNION? The SQL rules determine, that set operations are evaluated before ORDER BY clauses (as always parenthesis can change the order of evaluation). Therefore the ORDER BY sorts the final result and not any of the intermediate results.

We rearrange the example in the hope that things get clear.

-- Equivalent semantic
SELECT * FROM
  (SELECT firstname  -- first SELECT command
   FROM   person
     UNION           -- push both intermediate (unnamed) results together to the next intermediate result 't'
   SELECT lastname   -- second SELECT command
   FROM   person
  ) t                -- 't' is the name for the intermediate result generated by UNION
ORDER BY t.firstname;

First the two SELECTS are evaluated, afterwareds the UNION. This intermediate result gets the name 't'. 't' is ordered.

Often one would like to achieve that the rows from the first SELECT are ordered independent from those of the second SELECT. We can do this by adding a virtuel column to the result of both SELECTs.

SELECT '1' AS dummy, firstname
FROM   person
  UNION
SELECT '2', lastname
FROM   person
ORDER BY dummy, firstname;

Group By[edit]

With the GROUP BY clause things are little more complicated than with ORDER BY. The GROUP BY refers to the last SELECT or - to say it the other way round - to the SELECT of its direct level.

-- Will not work because the GROUP BY belongs to the second SELECT and not to the UNION!
SELECT firstname
FROM   person
  UNION
SELECT lastname
FROM   person
GROUP BY firstname;
-- 
-- Works, but possibly not what you want to do.
-- The alias name for the (only) column of the UNION is 'firstname'.
SELECT firstname
FROM   person
  UNION
-- We group over the (only) column of the second SELECT, which is 'lastname' and results in 7 values
SELECT lastname
FROM   person
GROUP BY lastname;
--
-- Make things clear: rearrange the query to group over the final result 
SELECT * FROM
  (SELECT firstname  -- columnnames of the first SELECT determins the columnnames of the UNION
   FROM   person
     UNION
   SELECT lastname
   FROM   person
  ) t
GROUP BY t.firstname; -- now we can group over the complete result

Exercises[edit]

Show the lowest, highest and mean weight as a) 3 values of 1 row and b) 1 value in 3 rows.

Click to see solution
-- 1 row
SELECT MIN(weight), MAX(weight), avg(weight)
FROM   person;
 
-- 3 rows
SELECT MIN(weight)
FROM   person
  UNION
SELECT MAX(weight)
FROM   person
  UNION
SELECT avg(weight)
FROM   person;

Extend the previous 3-rows-solution to meet two additional criteria: a) consider only persons born in San Francisco and
b) add a virtual column to show 'Min', 'Max' and 'Avg' according to the correlating numeric values.

Click to see solution
SELECT 'Min', MIN(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco'
  UNION
SELECT 'Max', MAX(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco'
  UNION
SELECT 'Avg', avg(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco';

Extend the previous solution to order the result: the minimum value first, followed by the average and then the highest value.

Click to see solution
-- 'ugly' solution
SELECT '1 Min' AS note, MIN(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco'
  UNION
SELECT '3 Max' AS note, MAX(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco'
  UNION
SELECT '2 Avg' AS note, avg(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco'
ORDER BY note;
 
-- 'clean' solution
SELECT 1 AS note, 'Min', MIN(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco'
  UNION
SELECT 3 AS note, 'Max', MAX(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco'
  UNION
SELECT 2 AS note, 'Avg', avg(weight)
FROM   person
WHERE  place_of_birth = 'San Francisco'
ORDER BY note;

Create a list of lastnames for persons with a weight greater than 70 kg together with
all e-mail values (one value per row). There is no concordance between lastnames and e-mails.
(This example is not very helpfull for praxis, but instructive.)

Click to see solution
SELECT lastname
FROM   person
WHERE  weight > 70
  UNION
SELECT contact_value
FROM   contact
WHERE  contact_type = 'email';

In the previous example the lastname 'de Winter' is shown only once. But there are more than one persons of the family with a weight greater than 70 kg.
Why?
Extend the previous solution to show as much resulting rows as hits to the criteria.

Click to see solution
-- Extend 'UNION' to 'UNION ALL'. The default is 'UNION DISTINCT'
SELECT lastname
FROM   person
WHERE  weight > 70
  UNION ALL
SELECT contact_value
FROM   contact
WHERE  contact_type = 'email';