Structured Query Language/SQL: A Language for Working with rDBMS

From Wikibooks, open books for an open world
Jump to navigation Jump to search


History[edit | edit source]

As outlined above, rDBMS acts on the data with operations of relational algebra like projections, selections, joins, set operations (union, except and intersect) and more. The operations of relational algebra are denoted in a mathematical language that is highly formal and hard to understand for end-users and - possibly also - for many software engineers. Therefore, rDBMS offers a layer above relational algebra that is easy to understand but can be mapped to the underlying relational operations. Since the 1970s, we have seen some languages doing this job; one of them was SQL - another example was QUEL. In the early 1980s (after a rename from its original name SEQUEL due to trademark problems), SQL achieved market dominance. And in 1986, SQL was standardized for the first time. The current version is SQL 2019.

Characteristics[edit | edit source]

The tokens and syntax of SQL are modeled on English common speech to keep the access barrier as small as possible. An SQL command like UPDATE employee SET salary = 2000 WHERE id = 511; is not far away from the sentence "Change employee's salary to 2000 for the employee with id 511."

The keywords of SQL can be expressed in any combination of upper and lower case characters, i.e. the keywords are case insensitive. It makes no difference whether UPDATE, update, Update, UpDate, or any other combination of upper and lower case characters is written in SQL code.

Next, SQL is a descriptive language, not a procedural one. It does not proscribe all aspects of the relational operations (which operation, their order, ...), which are generated from the given SQL statement. The rDBMS has the freedom to generate more than one execution plan from a statement. It may compare several generated execution plans with each other and run the one it thinks is best in a given situation. Additionally, the programmer is freed from considering all the details of data access, e.g.: Which one of a set of WHERE criteria should be evaluated first if they are combined with AND?

Despite the above simplifications, SQL is very powerful. It allows the manipulation of a set of data records with a single statement. UPDATE employee SET salary = salary * 1.1 WHERE salary < 2000; will affect all employee records with an actual salary smaller than 2000. Potentially, there may be thousands of those records, only a few or even zero. The operation may also depend on data already present in the database; the statement SET salary = salary * 1.1 leads to an increase of the salaries by 10%, which may be 120 for one employee and 500 for another one.

The designer of SQL tried to define the language elements orthogonally to each other. Among other things, this refers to the fact that any language element may be used in all positions of a statement where the result of that element may be used directly. E.g.: If you have a function power(), which takes two numbers and returns another number, you can use this function in all positions where numbers are allowed. The following statements are syntactically correct (if you have defined the function power() ) - and lead to the same resulting rows.

SELECT salary FROM employee WHERE salary < 2048;
SELECT salary FROM employee WHERE salary < power(2, 11);
SELECT power(salary, 1) FROM employee WHERE salary < 2048;

Another example of orthogonality is the use of subqueries within UPDATE, INSERT, DELETE, or inside another SELECT statement.

However, SQL is not free of redundancy. Often there are several possible formulations to express the same situation.

SELECT salary FROM employee WHERE salary < 2048;
SELECT salary FROM employee WHERE NOT salary >= 2048;
SELECT salary FROM employee WHERE salary between 0 AND 2048; -- 'BETWEEN' includes edges

This is a very simple example. In complex statements, there may be the choice between joins, subqueries, and the exists predicate.

Fundamentals[edit | edit source]

Core SQL consists of statements. Statements consist of keywords, operators, values, names of system- and user-objects or functions. Statements are concluded by a semicolon. In the statement SELECT salary FROM employee WHERE id < 100; the tokens SELECT, FROM and WHERE are keywords. salary, employee, and id are object names, the "<" sign is an operator, and "100" is a value.

The SQL standard arranges statements into nine groups:

"The main classes of SQL-statements are:
SQL-schema statements; these may have a persistent effect on the set of schemas.
SQL-data statements; some of these, the SQL-data change statements, may have a persistent effect on SQL data.
SQL-transaction statements; except for the <commit statement>, these, and the following classes, have no effects that persist when an SQL-session is terminated.
SQL-control statements.
SQL-connection statements.
SQL-session statements.
SQL-diagnostics statements.
SQL-dynamic statements.
SQL embedded exception declaration."

This detailed grouping is unusual in everyday speech. A typical alternative is to organize SQL statements into the following groups:

Data Definition Language (DDL): Managing the structure of database objects (CREATE/ALTER/DROP tables, views, columns, ...).
Data Query Language (DQL): Retrieval of data with the statement SELECT. This group has only one statement.
Data Manipulation Language (DML): Changing of data with the statements INSERT, UPDATE, MERGE, DELETE, COMMIT, ROLLBACK, and SAVEPOINT.
Data Control Language (DCL): Managing access rights (GRANT, REVOKE).

Turing completeness[edit | edit source]

Core SQL, as described above, is not Turing complete. It misses conditional branches, variables, subroutines. But the standard, as well as most implementations, offers an extension to fulfill the demand for Turing completeness. In 'Part 4: Persistent Stored Modules (SQL/PSM)' of the standard, there are definitions for IF-, CASE-, LOOP-, assignment- and other statements. The existing implementations of this part have different names, different syntax, and also a different scope of operation: PL/SQL in Oracle, SQL/PL in DB2, Transact-SQL, or T-SQL in SQL Server and Sybase, PL/pgSQL in Postgres and simply 'stored procedures' in MySQL.