SPARQL/Aggregate functions

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Aggregate functions are used in combination with modifier GROUP BY.

Available aggregate functions are:

  • COUNT: the number of elements. You can also write COUNT(*) to simply count all results.
  • SUM, AVG: the sum or average of all elements, respectively. If the elements aren't numbers, you'll get weird results.
  • MIN, MAX: the minimum or maximum value of all elements, respectively. This works for all value types; numbers are sorted numerically, strings and other types lexically.
  • SAMPLE: any element. This is occasionally useful if you know there's only one result, or if you don't care which one is returned.
  • GROUP_CONCAT: concatenates all elements.

Aggregate functions with GROUP BY[edit | edit source]

See modifier GROUP BY for examples.

# The syntax is like this
SELECT ?group1 ?group2
               (COUNT(?var1) AS ?var2)
               (GROUP_CONCAT(DISTINCT ?var3; SEPARATOR=", ") AS ?var4)
WHERE 
{
...
}
GROUP BY ?group1 ?group2

When using aggregate functions of a label, all labels should be defined explicitly in the SERVICE.

Additionally, you can add a DISTINCT modifier for any of these functions to eliminate duplicate results. For example, if there are two results but they both have the same value in ?var, then COUNT(?var) will return 2 but COUNT(DISTINCT ?var) will only return 1. You often have to use DISTINCT when your query can return the same item multiple times – this can happen if, for example, you use ?item wdt:P31/wdt:P279* ?class, and there are multiple paths from ?item to ?class: you will get a new result for each of those paths, even though all the values in the result are identical. (If you're not grouping, you can also eliminate those duplicate results by starting the query with SELECT DISTINCT instead of just SELECT.)

Aggregate functions with HAVING[edit | edit source]

Aggregate functions can also be used with the HAVING modifier.

See for instance this query to check for duplicate IATA codes for airports:

SELECT ?object (COUNT(?subject) AS ?count)
               (MIN(?subject) AS ?subject1) (MAX(?subject) AS ?subject2)
               (GROUP_CONCAT(DISTINCT ?subjectLabel; SEPARATOR=", ") AS ?subjectLabels)
WHERE
{
  ?subject wdt:P238 ?object.         # IATA airport code
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". 
                         ?subject rdfs:label ?subjectLabel.
                         }
}
GROUP BY ?object
HAVING(COUNT(?subject) > 1)
ORDER BY ?object

Try it!

References[edit | edit source]