Structured Query Language/SELECT: IS NULL Predicate
When creating new rows it may occur that we don't know the value of one or more columns.
Let's assume that we want to store informations about banking accounts and for one of those accounts we don't know the balance. What can we do? There are several possibilities:
- Reject the whole row with all other informations like account number, dispositional credit, interest rate, ... . Not very attractive.
- Store a default value instead of the value we acually don't know. But there are cases where it is impossible to define a default value because every value is possible, e.g. a bank account of '0' or '-1' is not unusual.
- Store a flag that signals that no value is stored.
Relational DBMS uses the last mentioned technique and the sense of the flag is 'there is no value stored'. Sometimes people say 'The NULL value is stored'.
Extention of Boolean Logic
Assume there is a table for banking accounts and some of its rows hold the NULL value in the column balance. Does those rows fullfill at least one of the two WHERE conditions 'balance >= 0' or 'balance <= 0'? No. It is not possible to decide whether these conditions are true or false! Honestly we must admit that we don't know an answer in our usual true/false logic because we don't know a value for balance. We are forced to extend the range of boolean values with a third one, which we call unkown. The two conditions above evaluate neither true nor false, both evaluate to 'unknown' for rows where balance holds the NULL value.
In a later stage we need definitions for the boolean operators NOT, AND, OR and EQUAL when true/false interact with unknown. You find the definitions here.
Retrieve the NULL Value
Within every SELECT command such rows become part of the resulting rows, in which the WHERE condition evaluates to true. If it evaluates to false or unknown, the row will be rejected. As all WHERE conditions like the above 'balance >= 0' - and also their negation - evaluates to unknown for missing balance values, there is preliminary no way to retrieve them.
To overcome this lack, SQL contains the special phrase 'IS NULL'. The wording 'balance IS NULL' evaluates to true for exactly the rows with a missing value in balance.
SELECT ... FROM ... WHERE <columnname> IS NULL ... ;
We must use exactly this wording. The use of any arithmetic operator like >, <=, !=, ... will not retrieve rows with the NULL value. The same holds true even for the condition '(balance = 0) OR NOT (balance = 0)', which is a tautology in conventional true/false logic. Beside this IS NULL predicate there is no other way to retrieve the NULL values - without one simple but not helpfull exception: if you omit the WHERE condition, all rows of the table are retrieved, with and without NULL values in any column.
That's all! Dealing with NULL values and the 3-value-logic might sound strange if you first met this topic. But as the IS NULL predicate evaluates always to true or false everything works as usual afterwards. We can use all other elements of the SELECT command (boolean logic, join, having, order by, ...) in the same way we have done so far.
Our test database does not contain NULL values. Nevertheless we have met the situation during the explanation of OUTER joins. OUTER joins create resulting rows where some columns contain the NULL value. We must consider this possibility, if we deal with the results of such subselects.
There are two other ways to generate NULL values.
- INSERT or UPDATE command with the explicit notion of the NULL value. In this case the SQL key word null is used as a representative for the NULL value.
- INSERT command without using all columns. The omitted columns will get the NULL value - or a default, if one is defined.
To demonstrate this and to create some examples for the following excercises, we put one row into the person table with some columns left empty.
-- Insert a new row for testing purpose INSERT INTO person (id, firstname, lastname) VALUES (51, 'Half man', 'Uncomplete'); COMMIT; -- Retrieve the row. As defined in CREATE TABLE statement the weight has a default value of integer 0. -- Date_of_birth and place_of_birth contain the NULL value. SELECT * FROM person WHERE id = 51; -- use the IS NULL predicate within WHERE clause. The result contains 1 row. SELECT * FROM person WHERE ssn IS NULL; -- weight has a value!! We expect to retrieve no rows when we use the IS NULL predicate. SELECT * FROM person WHERE weight IS NULL; -- or, to say it the other way round, the number of rows is 0 SELECT COUNT(*) FROM person WHERE weight IS NULL; -- but in the next statement the number of rows is 1 SELECT COUNT(*) FROM person WHERE weight = 0; -- Negate the IS NULL predicate SELECT COUNT(*) FROM person WHERE ssn IS NULL; -- IS NULL SELECT COUNT(*) FROM person WHERE ssn IS NOT NULL; -- Negation of IS NULL SELECT COUNT(*) FROM person WHERE ssn IS NULL OR ssn IS NOT NULL; -- A tautology, which always retrieves ALL rows of a table -- Same as above SELECT COUNT(*) FROM person WHERE ssn IS NULL OR NOT ssn IS NULL; -- A tautology, which always retrieves ALL rows of a table
Next we show the use of the UPDATE command in combination with the NULL value
-- -- Insert a new row for testing purpose with all columns filled with a usefull value INSERT INTO person (id, firstname, lastname, date_of_birth, place_of_birth, ssn, weight) VALUES (52, 'Lyn', 'Mutable', DATE'1951-05-13', 'Anchorage', '078-05-1152', 69); COMMIT; SELECT * FROM person WHERE id = 52; -- Delete a single column value (not the complete row) UPDATE person SET ssn = NULL WHERE id = 52; COMMIT; SELECT * FROM person WHERE id = 52; -- one row SELECT * FROM person WHERE ssn IS NULL; -- two rows: 51 + 52
Restore the original state of the example database.
DELETE FROM person WHERE id > 50; COMMIT;
Coalesce() and similar Functions
In the context of NULL values it is often the case that we have to retrieve rows with no value (the NULL value) or a default value such as 0 or blank. In such cases, the WHERE condition looks something like this: "... WHERE (col IS NULL OR col = 0) ...". To keep source code simpler, the SQL standard defines a function coalesce(<expression_1>, <expression_2>). If the first argument, which normaly is the name of a column, is not NULL, the function evaluates to this argument - else to the second argument.
-- Retrieve rows without ssn or with ssn equal to blank. SELECT * FROM person WHERE COALESCE(ssn, ' ') = ' '; -- equivalent: -- WHERE (ssn IS NULL -- OR ssn = ' ');
The function name coalesce results from the fact that the function accepts an arbitrary number of parameters and evaluates them in a recursive manner. If parameter n results in a not NULL value, it evaluates to this parameter, else the function calls itself without the n-th parameter. coalesce(expression_1, expression_2, expression_3) evaluates to expression_1, if expression_1 is not NULL, else to expression_2, if expression_2 is not NULL, else to expression_3.
The SQL standard defines another function nullif(<expression_1>, <expression_2>). It evaluates to NULL, if the two expressions are equal - and it evaluates to the first expression, if they differ from each other.
Different vendors offers some more functions like isnull(), ifnull() or nvl() to support handling of NULL values. The meaning of this functions is vendor specific.
Insert a new hobby 'Snowshoeing' without a remark.
INSERT INTO hobby (id, hobbyname, remark) VALUES (10, 'Snowshoeing', NULL); COMMIT;
Find a second solution for the above question without using the key word 'null'. (First delete row 10.)
DELETE FROM hobby WHERE id = 10; INSERT INTO hobby (id, hobbyname) VALUES (10, 'Snowshoeing'); COMMIT;
Retrieve all hobbies without a remark.
-- 1 row SELECT * FROM hobby WHERE remark IS NULL;
How many hobbies are exemplified with a remark?
-- 9 rows SELECT COUNT(*) FROM hobby WHERE remark IS NOT NULL;
Change row 10 of hobby in the way that the hobbname contains the string 'NULL' and the remark 'Name of hobby not known'.
-- Consider the two apostrophes surrounding the string 'NULL', which consists of the 4 characters N, U, L and L !! UPDATE hobby SET hobbyname = 'NULL', remark = 'Name of hobby not known' WHERE id = 10; COMMIT;
a) Retrieve the row where hobbyname is 'NULL'.
b) Retrieve the row where remark is 'Name of hobby not known'.
-- This may be a pitfall question. There is no relation to the IS NULL predicate SELECT * FROM hobby WHERE hobbyname = 'NULL'; SELECT * FROM hobby WHERE remark = 'Name of hobby not known';
How many hobbies have a hobby name?
-- All 10 rows contains a hobby name, even the row with the hobbyname 'NULL' SELECT COUNT(*) FROM hobby WHERE hobbyname IS NOT NULL;