Structured Query Language/Rollup Cube

From Wikibooks, open books for an open world
Jump to: navigation, search

← Quantified Comparision |

In the chapter Grouping we have seen that the key word GROUP BY creates groups of rows within a result set. Additionally aggregat functions like SUM() computes condensed values for each of those groups.

As GROUP BY can work for more than one single column there is often the requirement to compute such condensed values also for 'super-groups', which arise by omitting successive one column after the next from the GROUP BY specification.

Example Table[edit]

To illustrate the situation we offer an example table and typical questions to such kind of tables.

CREATE TABLE car_pool (
  -- define columns (name / type / default value / nullable)
  id           DECIMAL      NOT NULL,
  producer     VARCHAR(50)  NOT NULL,
  model        VARCHAR(50)  NOT NULL,
  yyyy         DECIMAL      NOT NULL CHECK (yyyy BETWEEN 1970 AND 2020),
  counter      DECIMAL      NOT NULL CHECK (counter >= 0),
  CONSTRAINT   car_pool_pk PRIMARY KEY (id)
);
--
INSERT INTO car_pool VALUES ( 1, 'VW',     'Golf',    2005, 5);
INSERT INTO car_pool VALUES ( 2, 'VW',     'Golf',    2006, 2);
INSERT INTO car_pool VALUES ( 3, 'VW',     'Golf',    2007, 3);
INSERT INTO car_pool VALUES ( 4, 'VW',     'Golf',    2008, 3);
INSERT INTO car_pool VALUES ( 5, 'VW',     'Passat',  2005, 5);
INSERT INTO car_pool VALUES ( 6, 'VW',     'Passat',  2006, 1);
INSERT INTO car_pool VALUES ( 7, 'VW',     'Beetle',  2005, 1);
INSERT INTO car_pool VALUES ( 8, 'VW',     'Beetle',  2006, 2);
INSERT INTO car_pool VALUES ( 9, 'VW',     'Beetle',  2008, 4);
INSERT INTO car_pool VALUES (10, 'Toyota', 'Corolla', 2005, 4);
INSERT INTO car_pool VALUES (11, 'Toyota', 'Corolla', 2006, 3);
INSERT INTO car_pool VALUES (12, 'Toyota', 'Corolla', 2007, 2);
INSERT INTO car_pool VALUES (13, 'Toyota', 'Corolla', 2008, 4);
INSERT INTO car_pool VALUES (14, 'Toyota', 'Prius',   2005, 1);
INSERT INTO car_pool VALUES (15, 'Toyota', 'Prius',   2006, 1);
INSERT INTO car_pool VALUES (16, 'Toyota', 'Hilux',   2005, 1);
INSERT INTO car_pool VALUES (17, 'Toyota', 'Hilux',   2006, 1);
INSERT INTO car_pool VALUES (18, 'Toyota', 'Hilux',   2008, 1);
--
COMMIT;

In the table there are two different car producer, 6 models and 4 years. Typical questions to such tables are:

  • Number of cars per producer or per model.
  • Number of cars per combination of some criterias like: producer plus model or producer plus year.
  • Total number of cars (without any criteria).

ROLLUP[edit]

As we have seen, the key word GROUP BY offers condensed data for exactly one grouping level, producer plus model in this case.

SELECT producer, model, SUM(counter) AS cnt
FROM   car_pool
GROUP BY producer, model
ORDER BY producer, cnt DESC;
--
Toyota	Corolla	13
Toyota	Hilux	3
Toyota	Prius	2
VW	Golf	13
VW	Beetle	7
VW	Passat	6

In such situations one would like to know also the corresponding values for upper groups: per producer or for the whole table. This can be achieved by submitting slightly different SELECTs.

SELECT producer, SUM(counter) AS cnt
FROM   car_pool
GROUP BY producer
ORDER BY producer, cnt DESC;
--
Toyota	18
VW	26
--
--
SELECT SUM(counter) AS cnt
FROM   car_pool;
--
44

In principle it is possible, to combine such SELECTs via UNION or to submit them sequentially. But because this is a standard requirement SQL offers a more elegant solution, namely the extention of the GROUP BY with the ROLLUP key word. Based on the results of the GROUP BY it offers additional rows for every superordinate group, which arises by omitting the grouping criterias one after the other.

SELECT producer, model, SUM(counter) AS cnt
FROM   car_pool
GROUP BY ROLLUP (producer, model);  -- the MySQL syntax is: GROUP BY producer, model WITH ROLLUP
--
Toyota	Corolla	13
Toyota	Hilux	3
Toyota	Prius	2
Toyota		18   <-- the additional row per first producer
VW	Beetle	7
VW	Golf	13
VW	Passat	6
VW		26   <-- the additional row per next producer
		44   <-- the additional row per all producers

The simple GROUP BY clause creates rows at the level of producer plus model. The ROLLUP key word leads to additional rows where first the model and then model and producer are omitted.

CUBE[edit]

The ROLLUP key word offers solutions where a hierarchical point of view is adequate. But in data warehouse applications one likes to navigate freely through the aggregated data, not only from top to bottom. To support this requirement, the SQL standard offers the key word CUBE. It is an extention of ROLLUP and offers additional rows for all possible combinations of the GROUP BY columns.

In the case of our above example with the two columns producer and modell the ROLLUP has created rows for 'producer-only' and 'no criteria' (= complete table). Additional to that, CUBE creates rows for 'model-only'. (If different producer would use the same model-name, such rows will lead to only 1 additional row.)

SELECT producer, model, SUM(counter) AS cnt
FROM   car_pool
GROUP BY CUBE (producer, model);  -- not supported by MySQL
--
Toyota	Corolla	13
Toyota	Hilux	3
Toyota	Prius	2
Toyota	- 	18
VW	Beetle	7
VW	Golf	13
VW	Passat	6
VW	- 	26
- 	Beetle	7        <--
- 	Corolla	13       <--
- 	Golf	13       <-- additional rows for 'model-only'
- 	Hilux	3        <--
- 	Passat	6        <--
- 	Prius	2        <--
- 	- 	44


If there are tree grouping columns c1, c2 and c3, the key words leads to the following grouping.

GROUP BY: (c1, c2, c3)
GROUP BY ROLLUP: (c1, c2, c3), (c1, c2), (c1) and ()
GROUP BY CUBE: (c1, c2, c3), (c1, c2), (c1, c3), (c2, c3), (c1), (c2), (c3) and ()