Multiversion Concurrency Control (MVCC) is a broadly used database technique to accomplish two goals: first, it provides every concurrently running transaction with a consistent state of the data as of the time of the transaction's start - this state is called a snapshot -, and second, it minimizes lock contention and guarantees high performance in a multi-user environment. MVCC implementation is very complex. This short introduction is provides a high-level overview.
Note: On this page, row is used in the conventional sense of relational database systems, and record denotes one or more versions of this row.
The basic idea is, that the modification of a column value does not lead to a modification of the value in the original record. Instead PostgreSQL creates an additional record which contains the changed value. Thus an UPDATE command does not really update values in a record. It creates a new record and leaves the values in the old record unchanged.
This behaviour raises the question, which one of the multiple records of the row shall be delivered to concurrent or later running processes. To resolve the situation PostgreSQL adds two additional hidden system columns to every record. The columns
xmax contain transaction IDs (xid). Those transaction IDs act like a serial data type . They are increased by 1 for every new transaction within the system. Therefore a lower xid indicates an older transaction.
xmin contains the xid of the transaction that created the record. Transactions with an xid greater that
xmin are younger and may see this record.
xmax is initially 0. During write operations it changes to the xid of the transaction, which has created the next version of this record - or has deleted the record. Transactions with an xid greater that
xmax are younger and cannot see this record, if
xmax differs from 0.
Example 1: INSERT
|Time||xmin||xmax||column value||Transaction #20||Transaction #21|
The initial INSERT creates one record.
xmin contains the xid of this transaction and
xmax the value 0 to indicate, that the record is not deleted. The SELECT of transaction #21 sees this record because its xid is greater than
Example 2: UPDATE
|Time||xmin||xmax||column value||Transaction #25||Transaction #33|
UPDATE stores its own xid in
xmax of the already existing record and creates a new record with the new column value in the same way as an INSERT. The SELECT of transaction #33 sees only the record with the column value 2. The first record is invisible as its
xmax is smaller (= older) than the transactions xid.
Example 3: DELETE
|Time||xmin||xmax||column value||Transaction #101||Transaction #120|
xmax in the last record to its own xid - nothing else is done! Transaction #120 cannot see any of the two records as its xid is heigher than
xmax is not 0).
The three examples show only the basic operations. The situation can become much more complicated, if the SELECT runs before the COMMIT of the writing transaction, if the writing transaction use multiple write-operations within its transaction context, if the writing transaction aborts, if the number of xids gets exhaused and wraps arround, ... .
VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
If no additional action is taken, the MVCC technique significantly increases the size of any database that performs a large number of write actions. The number of records continues to grow even if the overall number of rows keeps constant or decreases. It is absolutely necessary to periodically remove such records that are no longer valid. The utility
vacuumdb and the non-standard SQL command
vacuum remove the invalid records.
Vacuum retrieves the smallest (= oldest) xid from all active transactions. All records with an
xmax smaller than this xid cannot been seen by any of the running or future transactions. Therefore it is safe, to physically remove them.
This task is very I/O intensive. Therefore, it should not run constantly, but on a periodical basis.