Structured Query Language/Rollup Cube

From Wikibooks, open books for an open world
Jump to navigation Jump to search

In the chapter Grouping we have seen that the key word GROUP BY creates groups of rows within a result set. Additionally aggregat functions like SUM() computes condensed values for each of those groups.

Because GROUP BY can summarize by multiple columns, there is often the requirement to compute summary values for 'super-groups', which arise by successively omitting one column after the next from the GROUP BY specification.

Example Table[edit | edit source]

To illustrate the situation, we offer an example table and typical questions to such kind of tables.

CREATE TABLE car_pool (
  -- define columns (name / type / default value / nullable)
  id           DECIMAL      NOT NULL,
  producer     VARCHAR(50)  NOT NULL,
  model        VARCHAR(50)  NOT NULL,
  yyyy         DECIMAL      NOT NULL CHECK (yyyy BETWEEN 1970 AND 2020),
  counter      DECIMAL      NOT NULL CHECK (counter >= 0),
  CONSTRAINT   car_pool_pk PRIMARY KEY (id)
);
--
INSERT INTO car_pool VALUES ( 1, 'VW',     'Golf',    2005, 5);
INSERT INTO car_pool VALUES ( 2, 'VW',     'Golf',    2006, 2);
INSERT INTO car_pool VALUES ( 3, 'VW',     'Golf',    2007, 3);
INSERT INTO car_pool VALUES ( 4, 'VW',     'Golf',    2008, 3);
INSERT INTO car_pool VALUES ( 5, 'VW',     'Passat',  2005, 5);
INSERT INTO car_pool VALUES ( 6, 'VW',     'Passat',  2006, 1);
INSERT INTO car_pool VALUES ( 7, 'VW',     'Beetle',  2005, 1);
INSERT INTO car_pool VALUES ( 8, 'VW',     'Beetle',  2006, 2);
INSERT INTO car_pool VALUES ( 9, 'VW',     'Beetle',  2008, 4);
INSERT INTO car_pool VALUES (10, 'Toyota', 'Corolla', 2005, 4);
INSERT INTO car_pool VALUES (11, 'Toyota', 'Corolla', 2006, 3);
INSERT INTO car_pool VALUES (12, 'Toyota', 'Corolla', 2007, 2);
INSERT INTO car_pool VALUES (13, 'Toyota', 'Corolla', 2008, 4);
INSERT INTO car_pool VALUES (14, 'Toyota', 'Prius',   2005, 1);
INSERT INTO car_pool VALUES (15, 'Toyota', 'Prius',   2006, 1);
INSERT INTO car_pool VALUES (16, 'Toyota', 'Hilux',   2005, 1);
INSERT INTO car_pool VALUES (17, 'Toyota', 'Hilux',   2006, 1);
INSERT INTO car_pool VALUES (18, 'Toyota', 'Hilux',   2008, 1);
--
COMMIT;

In the table, there are two different car producers, 6 models and 4 years. Typical questions to such tables are:

  • Number of cars per producer or per model.
  • Number of cars per combination of some criteria like: producer plus model or producer plus year.
  • Total number of cars (without any criteria).

ROLLUP[edit | edit source]

As we have seen, the keyword GROUP BY offers condensed data for exactly one grouping level, producer plus model in this case.

SELECT producer, model, sum(counter) as cnt
FROM   car_pool
GROUP BY producer, model
ORDER BY producer, cnt desc;
--
Toyota	Corolla	13
Toyota	Hilux	3
Toyota	Prius	2
VW	Golf	13
VW	Beetle	7
VW	Passat	6

In such situations, one would like to know also the corresponding values for upper groups: per producer or for the whole table. This can be achieved by submitting a slightly different SELECT.

SELECT producer, sum(counter) as cnt
FROM   car_pool
GROUP BY producer
ORDER BY producer, cnt desc;
--
Toyota	18
VW	26
--
--
SELECT sum(counter) as cnt
FROM   car_pool;
--
44

In principle, it is possible, to combine such SELECTs via UNION or to submit them sequentially. But because this is a standard requirement, SQL offers a more elegant solution, namely the extension of the GROUP BY with the ROLLUP keyword. Based on the results of the GROUP BY, it offers additional rows for every superordinate group, which arises by omitting the grouping criteria one after the other.

SELECT producer, model, sum(counter) as cnt
FROM   car_pool
GROUP BY ROLLUP (producer, model);  -- the MySQL syntax is: GROUP BY producer, model WITH ROLLUP
--
Toyota	Corolla	13
Toyota	Hilux	3
Toyota	Prius	2
Toyota		18   <-- the additional row per first producer
VW	Beetle	7
VW	Golf	13
VW	Passat	6
VW		26   <-- the additional row per next producer
		44   <-- the additional row per all producers

The simple GROUP BY clause creates rows at the level of producer plus model. The ROLLUP keyword leads to additional rows where first the model and then model and producer are omitted.

CUBE[edit | edit source]

The ROLLUP keyword offers solutions where a hierarchical point of view is adequate. But in data warehouse applications, one likes to navigate freely through the aggregated data, not only from top to bottom. To support this requirement, the SQL standard offers the keyword CUBE. It is an extension of ROLLUP and offers additional rows for all possible combinations of the GROUP BY columns.

In the case of our above example with the two columns producer and model, the ROLLUP has created rows for 'producer-only' and 'no criteria' (= complete table). Additional to that, CUBE creates rows for 'model-only'. (If different producer would use the same model-name, such rows will lead to only 1 additional row.)

SELECT producer, model, SUM(counter) AS cnt
FROM   car_pool
GROUP BY CUBE (producer, model);  -- not supported by MySQL
--
Toyota	Corolla	13
Toyota	Hilux	3
Toyota	Prius	2
Toyota	- 	18
VW	Beetle	7
VW	Golf	13
VW	Passat	6
VW	- 	26
- 	Beetle	7        <--
- 	Corolla	13       <--
- 	Golf	13       <-- additional rows for 'model-only'
- 	Hilux	3        <--
- 	Passat	6        <--
- 	Prius	2        <--
- 	- 	44

If there are tree grouping columns c1, c2, and c3, the keywords lead to the following grouping.

GROUP BY: (c1, c2, c3)
GROUP BY ROLLUP: (c1, c2, c3), (c1, c2), (c1) and ()
GROUP BY CUBE: (c1, c2, c3), (c1, c2), (c1, c3), (c2, c3), (c1), (c2), (c3) and ()