SPARQL/Aggregate functions
Aggregate functions are used in combination with modifier GROUP BY
.
Available aggregate functions are:
COUNT
: the number of elements. You can also writeCOUNT(*)
to simply count all results.SUM
,AVG
: the sum or average of all elements, respectively. If the elements aren't numbers, you'll get weird results.MIN
,MAX
: the minimum or maximum value of all elements, respectively. This works for all value types; numbers are sorted numerically, strings and other types lexically.SAMPLE
: any element. This is occasionally useful if you know there's only one result, or if you don't care which one is returned.GROUP_CONCAT
: concatenates all elements.
Aggregate functions with GROUP BY
[edit | edit source]See modifier GROUP BY
for examples.
# The syntax is like this
SELECT ?group1 ?group2
(COUNT(?var1) AS ?var2)
(GROUP_CONCAT(DISTINCT ?var3; SEPARATOR=", ") AS ?var4)
WHERE
{
...
}
GROUP BY ?group1 ?group2
When using aggregate functions of a label, all labels should be defined explicitly in the SERVICE.
Additionally, you can add a DISTINCT
modifier for any of these functions to eliminate duplicate results. For example, if there are two results but they both have the same value in ?var
, then COUNT(?var)
will return 2
but COUNT(DISTINCT ?var)
will only return 1
. You often have to use DISTINCT
when your query can return the same item multiple times – this can happen if, for example, you use ?item wdt:P31/wdt:P279* ?class
, and there are multiple paths from ?item
to ?class
: you will get a new result for each of those paths, even though all the values in the result are identical. (If you're not grouping, you can also eliminate those duplicate results by starting the query with SELECT DISTINCT
instead of just SELECT
.)
Aggregate functions with HAVING
[edit | edit source]Aggregate functions can also be used with the HAVING
modifier.
See for instance this query to check for duplicate IATA codes for airports:
SELECT ?object (COUNT(?subject) AS ?count)
(MIN(?subject) AS ?subject1) (MAX(?subject) AS ?subject2)
(GROUP_CONCAT(DISTINCT ?subjectLabel; SEPARATOR=", ") AS ?subjectLabels)
WHERE
{
?subject wdt:P238 ?object. # IATA airport code
SERVICE wikibase:label { bd:serviceParam wikibase:language "en".
?subject rdfs:label ?subjectLabel.
}
}
GROUP BY ?object
HAVING(COUNT(?subject) > 1)
ORDER BY ?object
References
[edit | edit source]