Structured Query Language/SELECT: Predefined Functions
There are two groups of predefined functions:
- aggregate functions. They work on a set of rows, which means they receive one value for each row of a set of rows and returns one value for the whole set. If they are called in the context of a GROUP BY clause, they are called once per group, else once for all rows.
- scalar functions. They work on single rows, which means they receive one value of a single row and returns one value for each of them.
They work on a set of rows and return one single value like the number of rows, the highest or lowest value, the standard deviation, etc. The most important aggregate functions are:
||The number of rows|
||The number of rows where <column name> contains a value (is not null). The elimination of NULL values applies also to the other aggregate functions.|
||Lowest value. In the case of strings according to the sequence of characters.|
||Highest value. In the case of strings according to the sequence of characters.|
||Sum of all values|
As an example we retrieve the maximum weight of all persons:
SELECT MAX(weight) FROM person;
A Word of Caution
Aggregate functions result in one value for a set of rows. Therefore it is not possible to use them together with 'normal' columns in the projection (the part behind SELECT key word). If we specify, for example,
SELECT lastname, SUM(weight) FROM person;
we try to instruct the DBMS to show a lot of rows containing the lastname simultaneously with one value. This is a contradiction and the system will throw an exception. We can use a lot of aggregate functions within one projection but we are not allowed to use them together with 'normal' columns.
-- Multiple aggregate functions. No 'normal' columns. SELECT SUM(weight)/COUNT(weight) AS average_1, avg(weight) AS average_2 FROM person;
If we use aggregate functions in the context of commands containing a GROUP BY, the aggregate functions are called once per group.
-- Not only one resulting row, but one resulting row per lastname together with the average weight of all rows with this lastname. SELECT avg(weight) FROM person GROUP BY lastname;
In such cases the GROUP BY column(s) may be displayed as it is impossible that they change within the group.
-- The lastname may be shown as it is the GROUP BY criteria SELECT lastname, avg(weight) FROM person GROUP BY lastname;
If a row contains no value (which is the NULL value) in the named column, the row is not part of the computation.
-- If ssn is NULL, this row will not count. SELECT COUNT(ssn) FROM person;
ALL vs. DISTINCT
The complete signatures of the functions are a little more detailed. We can prepend the column name with one of the two key words ALL or DISTINCT. If we specify ALL, which is the default, every value is part of the computation, else only those, which are distinct from each other.
function_name ([ALL|DISTINCT]<column name>)
count (DISTINCT weight) -- as an example
The standard defines some more aggregate functions to compute statistical messures. Also the keywords ANY, EVERY and SOME formally are defined as aggregate functions. We will discuss them on a separate page.
Scalar functions act on a 'per row basis'. They are called once per row and they return one value per call. Often they are grouped according to the data types they act on:
- String functions
- substring(<column name>, <pos>, <len>) returns a string starting at position <pos> (first character counts with '1') in the length of <len>.
- upper(<column name>) returns the uppercase equivalent of the column value.
- character_length(<column name>) returns the length of the column value.
- Numeric functions
- sqrt(<column name>) returns the square root of the column value.
- abs(<column name>) returns the absolute value of the column value.
- Date, Time & Interval functions
- extract(month FROM date_of_birth) returns the month of column date_of_birth.
- build-in functions. They do not have any input parameter.
- current_date() returns the currente date.
- current_time() returns the currente time.
There is another wikibook where those functions are shown in detail. The data type of the return value is not always identical to the type of the input, e.g. 'character_length()' receives a string and returns a number.
Here is an example with some scalar functions:
SELECT LOWER(firstname), UPPER(lastname), CONCAT('today is: ', CURRENT_DATE) FROM person;
What is the hightest id used so far in the hobby table?
SELECT MAX(id) FROM hobby;
Which lastname will occur first in an ordered list?
SELECT MIN(lastname) FROM person;
Are there aggregate functions where it makes no difference to use the ALL or the DISTINCT key word?
Yes. min(ALL <column name>) leads to the same result as min(DISTINCT <column name>) as
it makes no difference whether the smallest value occurs one or more times. The same is true for max().
Show persons with a short firstname (up to 4 characters).
-- We can use functions as part of the WHERE clause. SELECT * FROM person WHERE CHARACTER_LENGTH(firstname) <= 4; -- Hint: Some implementations use a different function name: length() or len().
Show firstname, lastname and the number of characters for the concatenated string. Find two different solutions. You may use the character_length() function to compute the length of strings and the concat() function to concatenate strings.
-- Addition of the computed length. Hint: Some implementations use a different function name: length() or len(). SELECT firstname, lastname, CHARACTER_LENGTH(firstname) + CHARACTER_LENGTH(lastname) FROM person; -- length of the concatenated string SELECT firstname, lastname, CHARACTER_LENGTH(concat (firstname, lastname)) FROM person; -- show both solutions together SELECT firstname, lastname, CHARACTER_LENGTH(firstname) + CHARACTER_LENGTH(lastname) AS L1, CHARACTER_LENGTH(concat (firstname, lastname)) AS L2 FROM person;