PostgreSQL/Parallel Queries

Since version 9.6, PostgreSQL supports parallel processing of queries. Most of today's servers have many CPU cores, and using them concurrently can shorten the elapsed time of a query significantly. Therefore the query optimizer tries to create a plan that uses more than one executing process per query. At runtime those processes work concurrently and in a coordinated way on different parts of the shared buffers.

Parallel execution is initiated by so-called gather nodes of the execution plan. When such a node is reached at runtime, the currently running process requests the planned number of additional processes (background worker processes). The original process plus the additional processes execute the child node of the plan in parallel. The gather node has the additional duty of collecting and accumulating the results of its child processes.
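
A gather node can be made visible with EXPLAIN. The following is a minimal sketch, not a guaranteed result: the table demo_items and its content are hypothetical and exist only for illustration, and whether the planner actually chooses a parallel plan depends on the table size, the cost settings, and the server version.

```sql
-- Hypothetical demo table (the name and data are assumptions for illustration).
CREATE TABLE demo_items (id integer, txt text);

INSERT INTO demo_items
SELECT i, md5(i::text)
FROM generate_series(1, 1000000) AS i;

ANALYZE demo_items;

-- Allow up to two additional worker processes per gather node in this session.
SET max_parallel_workers_per_gather = 2;

-- CPU-intensive filter without a usable index.
EXPLAIN
SELECT count(*)
FROM demo_items
WHERE txt LIKE '%abc%';
```

If a parallel plan is chosen, the output contains a Gather node with a "Workers Planned" line. EXPLAIN ANALYZE additionally reports "Workers Launched", the number of background worker processes that were actually started.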

The feature is not used in all situations. Three factors decide this: the type of query, the parameterization of PostgreSQL, and the current implementation.

If a query leads to an execution plan that is highly I/O intensive, it doesn't benefit greatly from parallelization, because additional processes speed up work on data in RAM but not the reading of data from disk. In contrast, queries that need a lot of CPU activity (e.g. ... where text like '%xyz%'; without an adequate index) benefit much more. Therefore it is more likely that parallelization is chosen for the second type of query.
By default, PostgreSQL (in version 9.6) keeps the traditional behaviour of using a single process per query. To use the parallelization feature, some parameters must be set. max_parallel_workers_per_gather defines the maximum number of worker processes that are allowed to run in parallel under each gather node. As it defaults to 0, the traditional behaviour applies unless the value is changed. As mentioned above, each process working in parallel under a gather node is realized as a background worker process. The total number of background worker processes per instance is limited by max_worker_processes, which defaults to 8, so it may be necessary to increase this value. Additionally, the parameter dynamic_shared_memory_type must be set to a value other than none.
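
The parameters named above can be inspected and changed as sketched below. The values 4 and 16 are arbitrary assumptions, not recommendations. ALTER SYSTEM needs superuser rights, and a change to max_worker_processes only takes effect after a server restart.

```sql
-- Inspect the current values.
SHOW max_parallel_workers_per_gather;
SHOW max_worker_processes;
SHOW dynamic_shared_memory_type;

-- Enable parallel plans for the current session only
-- (4 is an arbitrary example value).
SET max_parallel_workers_per_gather = 4;

-- Raise the instance-wide limit of background worker processes
-- (16 is an arbitrary example value; requires a server restart).
ALTER SYSTEM SET max_worker_processes = 16;
```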
The current implementation in version 9.6 contains a lot of limitations, which exist to ensure that this basic implementation is stable in all environments. It is likely that some of them will disappear in later releases.

Parallel execution is not used in the following cases:
If the command writes data. The feature is restricted to purely read-only commands: no INSERT, UPDATE, or DELETE, neither at the top level nor within a CTE.
If the query locks rows, for example with SELECT ... FOR UPDATE.
If the transaction isolation level is serializable.
If the query is running inside another query that is already parallelized, for example if a function called by a parallel query issues an SQL query itself.
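
Two of these limitations can be observed by comparing execution plans. This is a sketch under assumptions: the table demo_limits is hypothetical, the behaviour described in the comments refers to version 9.6, and the first query only serves as a baseline if the planner chooses a parallel plan for it at all.

```sql
-- Hypothetical demo table (the name and data are assumptions for illustration).
CREATE TABLE demo_limits (id integer, txt text);

INSERT INTO demo_limits
SELECT i, md5(i::text)
FROM generate_series(1, 1000000) AS i;

ANALYZE demo_limits;

SET max_parallel_workers_per_gather = 2;

-- Baseline: a read-only query that may receive a Gather node.
EXPLAIN
SELECT count(*) FROM demo_limits WHERE txt LIKE '%abc%';

-- Row locking: on 9.6 no Gather node is expected in this plan.
EXPLAIN
SELECT * FROM demo_limits WHERE txt LIKE '%abc%' FOR UPDATE;

-- Serializable isolation level: on 9.6 no Gather node is expected either.
BEGIN ISOLATION LEVEL SERIALIZABLE;
EXPLAIN
SELECT count(*) FROM demo_limits WHERE txt LIKE '%abc%';
ROLLBACK;
```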
