PostgreSQL/Index Types

From Wikibooks, open books for an open world
Jump to navigation Jump to search
Data Types PostgreSQL
Index Types
Parallel Queries

PostgreSQL supports the following index types: B-tree, GIN, GiST, SP-GiST, BRIN, and Hash (which is discouraged).

B-tree[edit]

The default index type of PostgreSQL (and may other database systems) is B-tree. Those indexes are suitable in situations where there is a well defined sort order for the affected values. In particular this applies to all numeric values, e.g.: ... WHERE numeric_column < 5;. The syntax is simple and conforms to those of other implementations - see wikibook SQL.

postgres=# CREATE INDEX indexname ON tablename (columnname);

There are two situations where the use of this basic index type is not possible:

  • The comparision operator may have no notion of a sort order, eg: does a text CONTAIN a given word?
  • The type of the column may be to complex to define a sort order, eg: two- or n-dimensional grafical objects like polygons.

GIN - for muliplicities[edit]

GIN (Generalized Inverted Index) indexes are appropriate for data which is constructed out of multiple components (of equal type). Examples are texts (= many words) or arrays (= many values of any basic type). Typically, queries (where-clauses) to such columns does not address the entire column value (complete text or complete array) but the values of its elements (words or array elements). A typical operator in such situation is 'contains' which may be abbreviated as @@ or @>, depending on the data type.

The syntax of the CREATE INDEX statement is extended by the key words USING GIN plus some more parameters in dependence of the data type.

postgres=# CREATE INDEX indexname ON tablename USING GIN (columnname);

Here is a first example: text retrieval

postgres=# -- create a table with a text column
postgres=# CREATE TABLE t1 (id serial, t text);
CREATE TABLE
postgres=# CREATE INDEX t1_idx ON t1 USING gin (to_tsvector('english', t));
CREATE INDEX
postgres=# INSERT INTO t1 VALUES (1, 'a fat cat sat on a mat and ate a fat rat');
INSERT 0 1
postgres=# INSERT INTO t1 VALUES (2, 'a fat dog sat on a mat and ate a fat chop');
INSERT 0 1
postgres=# -- is there a row where column t contains the two words? (syntax contains some magic to hit index)
postgres=# SELECT * FROM t1 WHERE to_tsvector('english', t) @@ to_tsquery('fat & rat');
 id |                    t                    
----+------------------------------------------
  1 | a fat cat sat on a mat and ate a fat rat
(1 row)

Hint: If you havn't created an index, the SELECTs will lead to the same result. But the performance is quite different. Because of the missing index, the SELECT executes a sequential scan over the complete table T1 and - even more costly - normalizes every term on the fly. This must be done with every SELECT whereas the INDEX terms are created and modified rarely: only during INSERT and UPDATE.

A second example: array elements

postgres=# -- create a table where one column exists of an integer array
postgres=# --
postgres=# CREATE TABLE t2 (id serial, temperatures INTEGER[]);
CREATE TABLE
postgres=# CREATE INDEX t2_idx ON t2 USING gin (temperatures);
CREATE INDEX
postgres=# INSERT INTO  t2 VALUES (1, '{11, 12, 13, 14}');
INSERT 0 1
postgres=# INSERT INTO  t2 VALUES (2, '{21, 22, 23, 24}');
INSERT 0 1
postgres=# -- Is there a row with the two array elements 12 and 11?
postgres=# SELECT * FROM t2 WHERE temperatures @> '{12, 11}';
 id | temperatures 
----+---------------
  1 | {11,12,13,14}
(1 row)

GiST - mainly for ranges[edit]

GiST (Generalized Search Tree) is an index type which supports - among others - range searches. GIN and B-tree index nodes consists of single values like '5' or 'rat'. In contrast GiST index nodes consists of multiple values - in mimimum (in an one-dimensional space) a 'from' value and a 'to' value. The nodes are organized within a tree in a way, that the from/to of every upper node covers the from/to of all its child nodes. This is the basis for an excellent performance of range searches - the number of consulted nodes during tree traversal gets minimal.

The syntax of the CREATE INDEX statement is extended by the key words USING GIST.

postgres=# CREATE INDEX indexname ON tablename USING GIST (columnname);

Range searches are essential for spatial data. The example checks whether given circles lie within another circle. Please note, that basic geometry data types like point or circle are part of PostgreSQL's core - without the extension PostGIS.

postgres=# -- create a table with a column of non-trivial type
postgres=# --
postgres=# CREATE TABLE t3 (id serial, c circle);
CREATE TABLE
postgres=# CREATE INDEX t3_idx ON t3 USING gist(c);
CREATE INDEX
postgres=# INSERT INTO t3 VALUES (1, circle '((0, 0), 0.5)');
INSERT 0 1
postgres=# INSERT INTO t3 VALUES (2, circle '((1, 0), 0.5)');
INSERT 0 1
postgres=# INSERT INTO t3 VALUES (3, circle '((0.3, 0.3), 0.3)');
INSERT 0 1
postgres=# -- which circles lie in the bounds of the unit circle?
postgres=# SELECT * FROM t3 WHERE circle '((0, 0), 1)' @> c;
 id |        c       
----+-----------------
  1 | <(0,0),0.5>
  3 | <(0.3,0.3),0.3>
(2 rows)

Hint: The GiST index type offers much more advantages, especially he is extensible. You can create new index types and operators which are suitable for particular use cases to support domain-specific queries, eg: 'is picture X similar to picture Y?'.

Data Types PostgreSQL
Index Types
Parallel Queries