Structured Query Language/SQL: A Language for Working with rDBMS

From Wikibooks, open books for an open world
Jump to navigation Jump to search
Relational DBMS (rDBMS) Structured Query Language
SQL: A Language for Working with rDBMS
SQL: The Standard ISO IEC 9075 and various Implementations


History[edit]

As outlined above, rDBMS acts on the data with operations of relational algebra like projections, selections, joins, set operations (union, except and intersect) and more. The operations of relational algebra are denoted in a mathematical language which is highly formal and hard to understand for end users and - possibly also - for many software engineers. Therefore rDBMS offers a layer above relational algebra, which is easy to understand but nevertheless can be mapped to the underlying relational operations. Since the 1970s we have seen some languages doing this job, one of them was SQL - another example was QUEL. In the early 1980s (after a rename from its original name SEQUEL due to trademark problems) SQL achieved market dominance. And in 1986 SQL was standardized for the first time. The current version is SQL 2019.

Characteristics[edit]

The tokens and syntax of SQL are modeled on English common speech to keep the access barrier as small as possible. An SQL command like UPDATE employee SET salary = 2000 WHERE id = 511; is not far away from the sentence "Change employee's salary to 2000 for the employee with id 511."

The key words of SQL can be expressed in any combination of upper and lower case characters, i.e. the keywords are case insensitive. It makes no difference whether UPDATE, update, Update, UpDate or any other combination of upper and lower case characters is written in SQL code.

Next SQL is a descriptive language, not a procedural one. It does not proscribe all aspects of the relational operations (which operation, their order, ...) which are generated from the given SQL statement. The rDBMS has the freedom to generate more than one execution plan from a statement. It may compare several generated execution plans with each other and run the one it thinks is best in a given situation. Additionally, the programmer is freed from considering all the details of data access, e.g.: Which one of a set of WHERE criteria should be evaluated first if they are combined with AND?

Despite the above simplifications, SQL is very powerful. It allows the manipulation of a set of data records with a single statement. UPDATE employee SET salary = salary * 1.1 WHERE salary < 2000; will affect all employee records with an actual salary smaller than 2000. Potentially, there may be thousands of those records, only a few or even zero. The operation may also depend on data already present in the database; the statement SET salary = salary * 1.1 leads to an increase of the salaries by 10%, which may be 120 for one employee and 500 for another one.

The designer of SQL tried to define the language elements orthogonally to each other. Among other things this refers to the fact that any language element may be used in all positions of a statement where the result of that element may be used directly. E.g.: If you have a function power() which takes two numbers and returns another number, you can use this function in all positions where numbers are allowed. The following statements are syntactically correct (if you have defined the function power() ) - and lead to the same resulting rows.

SELECT salary FROM employee WHERE salary < 2048;
SELECT salary FROM employee WHERE salary < power(2, 11);
SELECT power(salary, 1) FROM employee WHERE salary < 2048;

Another example of orthogonality is the use of subqueries within UPDATE, INSERT, DELETE or inside another SELECT statement.

However, SQL is not free of redundancy. Often there are several possible formulations to express the same situation.

SELECT salary FROM employee WHERE salary < 2048;
SELECT salary FROM employee WHERE NOT salary >= 2048;
SELECT salary FROM employee WHERE salary between 0 AND 2048; -- 'BETWEEN' includes edges

This is a very simple example. In complex statements there may be the choice between joins, subqueries and the exists predicate.

Fundamentals[edit]

Core SQL consists of statements. Statements consist of key words, operators, values, names of system- and user-objects or functions. Statements are concluded by a semicolon. In the statement SELECT salary FROM employee WHERE id < 100; the tokens SELECT, FROM and WHERE are key words. salary, employee and id are object names, the "<" sign is an operator and "100" is a value.

The SQL standard arranges statements into 9 groups:

"The main classes of SQL-statements are:
SQL-schema statements; these may have a persistent effect on the set of schemas.
SQL-data statements; some of these, the SQL-data change statements, may have a persistent effect on SQL data.
SQL-transaction statements; except for the <commit statement>, these, and the following classes, have no effects that persist when an SQL-session is terminated.
SQL-control statements.
SQL-connection statements.
SQL-session statements.
SQL-diagnostics statements.
SQL-dynamic statements.
SQL embedded exception declaration."

This detailed grouping is unusual in common speech. A typical alternative is to organize SQL statements into the following three groups:

Data Definition Language (DDL): Managing the structure of database objects (CREATE/ALTER/DROP tables, views, columns, ...)
Data Manipulation Language (DML): Managing and retrieval of data with the statements INSERT, UPDATE, MERGE, DELETE, SELECT, COMMIT, ROLLBACK and SAVEPOINT.
Data Control Language (DCL): Managing access rights (GRANT, REVOKE).

Hint: In some publications the SELECT statement is said to build its own group Data Query Language. This group has no other statements than SELECT.

Turing completeness[edit]

Core SQL as described above is not Turing complete. It misses conditional branches, variables, subroutines. But the standard as well as most implementations offers an extension to fulfill the demand for Turing completeness. In 'Part 4: Persistent Stored Modules (SQL/PSM)' of the standard there are definitions for IF-, CASE-, LOOP-, assignment- and other statements. The existing implementations of this part have different names, different syntax and also a different scope of operation: PL/SQL in Oracle, SQL/PL in DB2, Transact-SQL or T-SQL in SQL Server and Sybase, PL/pgSQL in Postgres and simply 'stored procedures' in MySQL.


Relational DBMS (rDBMS) Structured Query Language
SQL: A Language for Working with rDBMS
SQL: The Standard ISO IEC 9075 and various Implementations