To promote a consistent use and understanding of important terms we list and define them here. For some terms we include short annotations to give a first introduction to the subject.
- 1 Database Cluster
- 1.1 Overview
- 1.2 Server (or Node)
- 1.3 Cluster of Nodes
- 1.4 Installation
- 1.5 Server Database
- 1.6 Cluster (or 'Database Cluster')
- 1.7 Instance (or 'Database Server Instance' or 'Database Server' or 'Backend')
- 1.8 Database
- 1.9 Schema
- 1.10 Search Path (or 'Schema Search Path')
- 1.11 initdb (OS command)
- 1.12 createdb (OS command)
- 1.13 CREATE DATABASE (SQL command)
- 1.14 Directory Structure
- 2 Consistent Writes
- 3 Backup and Recovery
- 4 Replication
Server (or Node)
A server is some (real or virtual) hardware where PostgreSQL is installed. In this document, the word instance is a different concept from server. See the definition of instance later in this document.
Cluster of Nodes
A set of nodes, which interchange information via replication.
After you have downloaded and installed PostgreSQL, you have a set of programs, scripts, configuration- and other files on a server. This set is called the 'Installation'. It includes all instance programs as well as some client programs like
The term server database is often used in the context of client/server connections to refer to an instance or a single database.
Cluster (or 'Database Cluster')
A cluster is a storage area (directory, subdirectories and files) in the file system, where a collection of databases plus meta-information resides. Within the database cluster there are also the definitions of global objects like users and their rights. They are known across the entire database cluster. (Access rights for an user may be limited to individual objects like a certain table or a certain schema. In that case, the user will not have this access rights to the other objects of the cluster.)
Within a database cluster there are at least three databases: 'template0', 'template1', 'postgres' and possibly more.
- 'template0': A template database, which may be used by the command
CREATE DATABASE(template0 should never be modified)
- 'template1': A template database, which may be used by the command
CREATE DATABASE(template1 may be modified by DBA)
- 'postgres': An empty database, mainly for maintenance purposes
Most PostgreSQL installations use only one database cluster. Its name is 'main'. But you can create more clusters on the same PostgreSQL installation, see tools
initdb further down.
Instance (or 'Database Server Instance' or 'Database Server' or 'Backend')
An instance is a group of processes (on a UNIX server) or one service (on a Windows server) plus shared memory, which controls and manages exactly one cluster. Using IP terminology one can say that one instance occupies one IP/port combination, eg. the combination http://localhost:5432. It is possible that on a different port of the same server another instance is running. The processes (in a UNIX server), which build an instance, are called: postmaster (creates one 'postgres'-process per client-connection), logger, checkpointer, background writer, WAL writer, autovacuum launcher, archiver, stats collector. The role of each process is explained in the chapter architecture.
If you have many clusters on your server, you can run many instances at the same machine - one per cluster.
Hint: Other publications sometimes use the term server to refer to an instance. As the term server is widely used to refer to real or virtual hardware, we do not use server as a synonym for instance.
A database is a storage area in the file system, where a collection of objects is stored in files. The objects consist of data, metadata (table definitions, data types, constraints, views, ...) and other data like indices. Those objects are stored in the default database 'postgres' or in a newly created database.
The storage area for one database is organized as one subdirectory tree within the storage area of the database cluster. Thus a database cluster may contain multiple databases.
In a newly created database cluster (see below: initdb) there is an empty database with the name 'postgres'. In most cases this database stays empty and application data is stored in separate databases like 'finance' or 'engineering'. Nevertheless 'postgres' should not be dropped because some tools try to store temporary data within this database.
A schema is a namespace within a database: it contains named objects (tables, data types, functions, and operators) whose names can duplicate those of other objects existing in other schemas of this database.
Every database contains the default schema 'public' and may contain more schemas. All objects of one schema must reside within the same database. Objects of different schemas within the same database may have the same name.
There is another special schema in each database. The schema 'pg_catalog' contains all system tables, built-in data types, functions, and operators. See also 'Search Path' below.
Search Path (or 'Schema Search Path')
A Search Path is a list of schema names. If applications use unqualified object names (e.g.: 'employee_table' for a table name), the search path is used to locate this object in the given sequence of schemas. The schema 'pg_catalog' is always the first part of the search path although it is not explicitly listed in the search path. This behaviour ensures that PostgreSQL finds the system objects.
initdb (OS command)
Despite of its name the utility
initdb creates a new cluster, which contains the 3 databases 'template0', 'template1' and 'postgres'.
createdb (OS command)
createdb creates a new database within the actual cluster.
CREATE DATABASE (SQL command)
The SQL command
CREATE DATABASE creates a new database within the actual cluster.
A cluster and its databases consists of files, which hold data, actual status information, modification information and a lot more. Those files are organized in a fixed way under one directory node.
Shared bufferes are RAM pages, which mirror pages of data files on disc. They exist due to performance reasons. The term shared results from the fact that a lot of processes read and write to that area.
Pages in the shared buffers mirror pages of data files on disc. When clients request changes of data, those pages get changed without - provisionally - a change of the related pages on disc. Until the background writer writes those modified pages to disc, they are called 'dirty' pages.
A checkpoint is a special point in time where it is guarantied that the database files are in a consistent state. At checkpoint time all change records are flushed to the WAL file, all dirty data pages (in shared buffers) are flushed to disc, and at last a special checkpoint record is written to the WAL file.
The instance's checkpointer process automatically triggers checkpoints on a regular basis. Additionally they can be forced by issuing the command
CHECKPOINT in a client program. For the database system it takes a lot of time to perform a checkpoint - because of the physical writes to disc.
WAL files contain the changes which are applied to the data by modifying commands like
CREATE TABLE ... . This is redundant information as it is also recorded in the data files (for better performance at a later time). According to the configuration of the instance there may be more information within WAL files. WAL files reside in the
pg_wal directory (which was named
pg_xlog before version 10), have a binary format and a fixed size of 16MB. When they are no longer needed, they get recycled by renaming and reusing their already allocated space.
A single information unit within a WAL file is called a log record.
Hint: In the PostgreSQL documentation and in related documents there are a lot of other, similar terms which refer to what we denote as WAL file in this Wikibook: segment, WAL segment, logfile (don't mix it with the term logfile, see below), WAL log file, ... .
The instance logs and reports warning and error messages about special situations in readable text files. These logfiles can reside at any place in the directory structure of the server and are not part of the cluster.
Hint: The term 'logfile' does not relate to the other terms of this subchapter. It is mentioned here because the term sometimes is used as a synonym for what we call WAL file - see above.
A log record is a single information unit within a WAL file.
The term segment is sometimes used as a synonym for WAL file.
Multiversion Concurrency Control (MVCC) is a common database technique to accomplish two goals: First, it allows the management of parallel running transactions on a logical level and second, it ensures high performance for concurrent read and write actions. It is implemented as follows: Whenever some values of an existing row change, PostgreSQL writes a new version of this row to the database without deleting the old one. In such situations the database contains multiple versions of the row. In addition to their regular data the rows contain transaction IDs which allows to decide, which other transactions will see the new or the old row. Hence other transactions sees only those values (of other transactions), which are committed.
Outdated old rows are deleted at a later time by the utility
vacuumdb respectively the SQL command
Backup and Recovery
The term cold as an addition to the backup method name indicates that with this method the instance must be stopped to create a useful backup. In contrast, the addition 'hot' denotes methods where the instance MUST run (and hence changes to the data may occur during backup actions).
A cold backup is a consistent copy of all files of the cluster with OS tools like cp or tar. During the creation of a cold backup the instance must not run - otherwise the backup is useless. Hence you need a period of time in which applications do not use any database of the cluster - a continuous 7×24 operation mode is not possible. And secondly: the cold backup works only on the cluster level, not on any finer granularity like database or table.
Hint: A cold backup is sometimes called an "offline backup".
A logical backup is a consistent copy of the data within a database or some of its parts. It is created with the utility
pg_dump may run in parallel to applications (the instance must be up), it creates a consistent snapshot as of the time of its start.
pg_dump supports two output formats. The first one is a text format containing SQL commands like CREATE and INSERT. Files created in this format may be used by
psql to restore the backed-up data. The second format is a binary format and is called the 'archive format'. Files with this format can be used to restore its data with the tool
pg_dump works at the database level or smaller parts of databases like tables. If you want to refer to the cluster level, you must use
pg_dumpall. Please notice, that important objects like users/roles and their rights are always defined at cluster level.
Hint: A logical backup is one form of an "online backup".
A physical backup is a possibly inconsistent copy of the files of a cluster, created with an operating system utility like cp or tar. At first glance such a backup seems to be useless. To understand its purpose, you must know PostgreSQL's recover-from-crash strategy.
- At all times and independent from any backup/recovery action, PostgreSQL maintains WAL files - primarily for crash-safety purposes. WAL files contain log records, which reflect all changes made to the data. In the case of a system crash those log records are used to recover the cluster to a consistent state. The recover process searches the timestamp of the last checkpoint and replays all subsequent log records in chronological order against this version of the cluster. Through this action the cluster returns to a consistent state and will contain all changes up to the last COMMIT.
The existence of a physical backup plus WAL files in combination with this recovery-from-crash technique can be used for backup/recovery purposes. To implement this, you need a physical backup (which may reflect an inconsistent state of the cluster) and which acts as the starting point. Additionally you need all WAL files since the point in time when you have created this backup. The recover process uses the described recovery-from-crash technique and replays all log records in the WAL files against the backup. In the exact same way as before, the cluster comes to a consistent state and contains all changes up to the last COMMIT.
Please keep in mind, that physical backups work only on cluster level, not on any finer granularity like database or table.
Hint: A physical backup is one form of an "online backup".
If the previously mentioned physical backup is used during recovery, the recovery process is not forced to run up to the latest available timestamp. Via a parameter you can stop it at a time in the past. This leads to a state of the cluster at this moment. Using this technique you can restore your cluster to a time, which is between the time of creating the physical backup and the end of the last WAL file.
Standalone (Hot) Backup
The standalone backup is a special variant of the physical backup. It offers online backup (the instance keeps running) but it lacks the possibility of PITR. The recovery process recovers always up to the end of the standalone backup process. WAL files, which arise after this point in time, cannot be applied to this kind of backup. Details are described here.
Archiving is the process of copying WAL files to a failsafe location. When you plan to use PITR you must ensure that the sequence of WAL files is saved for a longer period. To support the process of copying WAL files at the right moment (when they are completely filled and a switch to the next WAL file has taken place), PostgreSQL runs the archiving process which is part of the instance. This process copies WAL files to a configurable destination.
Recovering is the process of playing WAL files against a physical backup. One of the involved steps is the copy of the WAL files from the failsafe archive location to its original location in '/pg_xlog'. The aim of recovery is bringing the cluster into a consistent state at a defined timestamp.
Archive Recovery Mode
When recovering takes place, the instance is in archive recovery mode.
A restartpoint is an action similar to a checkpoint. Restartpoints are only performed when the instance is in archive recovery mode or in standby mode.
After a successful recovery PostgreSQL transfers the cluster into a new timeline to avoid problems, which may occur when PITR is reset and WAL files reapplied (e.g.: to a different timestamp). Timeline names are sequential numbers: 1, 2, 3, ... .
Replication is a technique to send data, which was written within a master server, to one or more standby servers or even another master server.
The master server is an instance on a server which sends data to other instances in addition to its local processing of data.
The standby server is an instance on a server which receives information from a master server about changes of its data.
Warm Standby Server
A warm standby server is a running instance, which is in standby_mode (recovery.conf file). It continuously reads and processes incoming WAL files (in the case of log-shipping) or log records (in the case of streaming replication). It does not accept client connections.
Hot Standby Server
A hot standby server is a warm standby server with the additional flag hot_standby in postgres.conf. It accepts client connections and read-only queries.
Replication is called synchronous, when the standby server processes the received data immediately, sends a confirmation record to the master server and the master server delays its COMMIT action until he has received the confirmation of the standby server.
Replication is called asynchronous, when the master server sends data to the standby server and does not expect any feedback about this action.
The term is used when log entries are transfered from master server to standby server over a TCP connection - in addition to their transfer to the local WAL file. Streaming replication is asynchronous by default but can also be synchronous.
Log shipping is the process of transfering WAL files from a master server to a standby server. Log shipping is an asynchronous operation.