Structured Query Language/MERGE
Hint: Be carefull and deactivate AUTOCOMMIT.
In many cases applications want to store rows in the database without knowing whether this rows previously exist in the database or not. If the rows exist, they must use the UPDATE command, if not, the INSERT command. To do so the following construct is often used:
-- pseudocode IF (SELECT COUNT(*) = 0 ...) THEN INSERT ... ELSE UPDATE ... ;
This situation is unpleasant in many ways:
- There are two roundtrips between application and DBMS, either SELECT + INSERT or SELECT + UPDATE.
- The application must transfer one row after the other. A 'bulk storing' is not possible because the evaluation of the criterion which decides between INSERT and UPDATE may lead to different results from row to row.
- The syntax is spread across three SQL statements. This is error-prone.
To overcome the disadvantages the SQL standard defines a MERGE command, which contains the complete code shown above in one single statement. The MERGE performs an INSERT or an UPDATE depending on the existence of individual rows at the target table.
-- Define target, source, match criterion, INSERT and UPDATE within one single command MERGE INTO <target_table> <target_table_alias> -- denote the target table USING <source_table> <source_table_alias> -- denote the source table ON (<match_criterion>) -- define the 'match criterion' which compares the source and -- target rows with the same syntax as in any WHERE clause WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...] -- a variant of the regular UPDATE command WHEN NOT MATCHED THEN INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...]) -- a variant of the regular INSERT command ;
The target table is named after the MERGE INTO key word, the source table after the USING key word.
The comparision between target rows and source rows, which is necessary to decide between INSERT and UPDATE, is specified after the ON key word with a syntax, which is identical to the syntax of a WHERE clause. If this comparision matches, the UPDATE will be performed, else the INSERT. In simple cases the comparision compares Primary Key or Foreign Key columns. But it is also possible to use very sophisticated conditions on any column.
In the 'MATCHED' case a variant of the UPDATE follows. It differs from the regular UPDATE command in that it has no table name (the table name is already denoted after the MERGE INTO) and no WHERE clause (it uses the match criterion after the ON key word).
In the 'NOT MATCHED' case a variant of the INSERT follows. For the same reason as before the target table is not named within the INSERT.
Create a table 'hobby_shadow' to store some of the 'hobby' rows. The subsequent MERGE command shall perform an INSERT or an UPDATE depending on the existence of correlating rows.
-- store every second row in a new table 'hobby_shadow' CREATE TABLE hobby_shadow AS SELECT * FROM hobby WHERE MOD(id, 2) = 0; SELECT * FROM hobby_shadow; -- INSERT / UPDATE depending on the column 'id'. MERGE INTO hobby_shadow t -- the target USING (SELECT id, hobbyname, remark FROM hobby) s -- the source ON (t.id = s.id) -- the 'match criterion' WHEN MATCHED THEN UPDATE SET remark = concat(s.remark, ' Merge / Update') WHEN NOT MATCHED THEN INSERT (id, hobbyname, remark) VALUES (s.id, s.hobbyname, concat(s.remark, ' Merge / Insert')) ; COMMIT; -- Check the result SELECT * FROM hobby_shadow;
The MERGE command handles all rows, but there is only 1 round-trip between the application and the DBMS. Some of the rows are handled by the INSERT part of MERGE, others by its UPDATE part. This distinction may be observed by the last part of the column 'remark'.
Typical use cases for the MERGE command are ETL processes. Often those processes have to aggregate some values for a grouping criterion (eg: a product line) over a time period. The first access per product line and period has to insert new rows with given values, subsequent accesses have to update them by increasing values.
The SQL standard defines some more features within the MERGE command.
The WHEN MATCHED and WHEN NOT MATCHED clauses may be extended by an optional query expression like
AND (place_of_birth = 'Dallas'). As a consequence, it's possible to use a series of WHEN MATCHED / WHEN NOT MATCHED clauses.
... WHEN MATCHED AND (t.hobby_name IN ('Fishing', 'Underwater Diving')) THEN UPDATE SET remark = concat('Water sports: ', t.remark) WHEN MATCHED AND (t.hobby_name IN ('Astronomy', 'Microscopy', 'Literature')) THEN UPDATE SET remark = concat('Semi-professional leisure activity: ', t.remark) WHEN MATCHED THEN UPDATE SET remark = concat('Leisure activity: ', t.remark) ... -- The same is possible with WHEN NOT MATCHED in combination with INSERT
Within a WHEN MATCHED clause it is possible to use a DELETE command instead of an UPDATE to remove the matched row. This feature may be combined with the previous presented extension by an optional query expression. In the SQL standard the DELETE command is not applicable to the WHEN NOT MATCHED clause.
-- Update 'Fishing' and 'Underwater Diving'. Delete all others which have a match between source and target. ... WHEN MATCHED AND (t.hobby_name IN ('Fishing', 'Underwater Diving')) THEN UPDATE SET remark = concat('Water sports: ', t.remark) WHEN MATCHED THEN DELETE ...
The MERGE command is clearly defined by standard SQL. The command itself as well as the extensions described before are implemented by a lot of DBMS. Deviating from the standard most implementations unfortunatelly use different and/or additional keywords and - sometimes - different concepts. Even the introductive key words MERGE INTO may differ from the standard.
A) Create a new table 'contact_merge' with the same structure as 'contact'.
B) Copy row number 3 from 'contact' to 'contact_merge'.
C) Use the MERGE command to insert/update all E-Mail-adresses from 'contact' to 'contact_merge' and add the e-mail-protocol name to the contact values (prepend column contact_value by the string 'mailto:').
-- Create table and copy one row CREATE TABLE contact_merge AS SELECT * FROM contact WHERE id = 3; SELECT * FROM contact_merge; -- INSERT / UPDATE depending on the column 'id'. MERGE INTO contact_merge t -- the target USING (SELECT id, person_id, contact_type, contact_value FROM contact WHERE contact_type = 'email') s -- the source ON (t.id = s.id) -- the 'match criterion' WHEN MATCHED THEN UPDATE SET contact_value = concat('mailto:', t.contact_value) WHEN NOT MATCHED THEN INSERT (id, person_id, contact_type, contact_value) VALUES (s.id, s.person_id, s.contact_type, concat('mailto:', s.contact_value)) ; COMMIT; -- Check the result SELECT * FROM contact_merge;