SPARQL/Expressions and Functions

From Wikibooks, open books for an open world
Jump to navigation Jump to search
Variables SPARQL
Expressions and Functions
Federated query

Expressions[edit]

BIND[edit]

The BIND( expression AS ?variable ). clause can be used to assign the result of an expression to a variable (usually a new variable, but you can also overwrite existing ones).

# Persons died by capital punishment
SELECT ?person ?personLabel ?personDescription ?age
WHERE
{
  ?person wdt:P31 wd:Q5;
          wdt:P569 ?born;
          wdt:P570 ?died;
          wdt:P1196 wd:Q8454.
  BIND(?died - ?born AS ?ageInDays).
  BIND(?ageInDays/365.2425 AS ?ageInYears).
  BIND(FLOOR(?ageInYears) AS ?age).
  # or, as one expression:
  #BIND(FLOOR((?died - ?born)/365.2425) AS ?age).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

BIND can also be used to simply bind constant values to variables in order to increase readability. For example, a query that finds all female priests:

SELECT ?woman ?womanLabel
WHERE
{
  ?woman ?instanceOf ?human;
         ?sexOrGender ?female;
         ?occupation ?priest.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

  # linking to wikidata Q and P codes
  BIND(wdt:P31 AS ?instanceOf).
  BIND(wd:Q5 AS ?human).
  BIND(wdt:P21 AS ?sexOrGender).
  BIND(wd:Q6581072 AS ?female).
  BIND(wdt:P106 AS ?occupation).
  BIND(wd:Q42603 AS ?priest).
}

Try it!

The above gives the same result as

SELECT ?woman ?womanLabel
WHERE
{
  ?woman wdt:P31 wd:Q5;        # instance of human
         wdt:P21 wd:Q6581072;  # sex or gender female
         wdt:P106 wd:Q42603.   # occupation priest
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

IF[edit]

The IF( condition, thenExpression, elseExpression ) expressions are used to assign different values based on a condition.

For instance to calculate the age, precisely to the day. In the expression BIND( .... - (IF( condition,1,0 ) AS ?age) the condition determines to subtract 1 or 0 (nothing) from the ?age based on the month and the day of the birth and the death.

# Query to find all musicians who have already died 
# calculate their age (full years) at death
# count how many of them died at each age
#defaultView:LineChart
SELECT ?age  (COUNT (DISTINCT ?a) AS ?count) 
WHERE {
        ?a wdt:P31 wd:Q5.                   #instance of human
        ?a wdt:P106/wdt:P279 wd:Q639669.    #occupation a subclass of musician
        ?a p:P569/psv:P569 ?birth_date_node. 
        ?a p:P570/psv:P570 ?death_date_node.
        ?birth_date_node wikibase:timeValue ?birth_date.
        ?death_date_node wikibase:timeValue ?death_date.
        BIND( YEAR(?death_date) - YEAR(?birth_date) - 
              IF(MONTH(?death_date)<MONTH(?birth_date) || 
                 (MONTH(?death_date)=MONTH(?birth_date) && DAY(?death_date)<DAY(?birth_date)),1,0) AS ?age )
        # calculate the age, precisely to the day (times and timezones ignored)
        FILTER(?age > 10 && ?age < 100).    #ignore outlyers, several of which are probably errors
}
GROUP BY ?age
ORDER BY ?age

Try it!

Here an example to divide population in groups. The ?layer in the #defaultView:Map shows it in different colors.

#Big cities, grouped into map layers by population
#defaultView:Map
SELECT DISTINCT ?city ?cityLabel (SAMPLE(?location) AS ?location) (MAX(?population) AS ?population) (SAMPLE(?layer) AS ?layer)
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P625 ?location;
        wdt:P1082 ?population.
  FILTER(?population >= 500000).
  BIND(
    IF(?population < 1000000, "<1M",
    IF(?population < 2000000, "1M-2M",
    IF(?population < 5000000, "2M-5M",
    IF(?population < 10000000, "5M-10M",
    IF(?population < 20000000, "10M-20M",
    ">20M")))))
    AS ?layer).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?city ?cityLabel
ORDER BY ?population

Try it!

Operators[edit]

The familiar mathematical operators are available: +, -, *, / to add, subtract, multiply or divide numbers.

The operators <, >, =, <=, >= are available to compare values. Comparison is also defined for other types; for example, "abc" < "abd" is true (lexical comparison), as is "2016-01-01"^^xsd:dateTime > "2015-12-31"^^xsd:dateTime and wd:Q4653 != wd:Q283111. The inequality test ≠ is written !=. The ! can also be used as a prefix to functions that results a boolean like !BOUND and !REGEX.

Boolean conditions can be combined with && (logical and: a && b is true if both a and b are true) and || (logical or: a || b is true if either (or both) of a and b is true).

Examples IF( ?a != ?b, ... , ... ) and IF( ?a = ?b && ?c = ?d, ... , ... ).

VALUES[edit]

The VALUES ?var { val1 ... } clause generates a variable with one or more values. The values can be of any type, like numeric, strings, or even items, like VALUES ?number { 1 2 3 }, VALUES ?abc { "a" "b" "c" } or VALUES ?city { wd:Q84 wd:Q90 }.

SELECT ?city ?cityLabel WHERE {
     VALUES ?city { wd:Q84 wd:Q90 }. # London, Paris 
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }  
}

Try it!

Values can have more dimensions too. The dimensions are grouped by ( ) like VALUES ( ?varA ?varB ... ) { ( valA1 valB1 ... ) ... }.

SELECT ?country ?capital WHERE {
     VALUES ( ?country ?capital ) { 
            ( "United Kingdom" "London" )
            ( "France"         "Paris"  )
            ( "Etc"            UNDEF    ) # Use UNDEF for an undefined value
            }
}

Try it!

Functions[edit]

General functions[edit]

DATATYPE[edit]
See also: d:Special:ListDatatypes

Each value in SPARQL has a type, which tells you what kind of value it is and what you can do with it. The most important types are:

  • item, like wd:Q42 for Douglas Adams (Q42).
  • boolean, with the two possible values true and false. Boolean values aren’t stored in statements, but many expressions return a boolean value, e. g. 2 < 3 (true) or "a" = "b" (false).
  • string, a piece of text. String literals are written in double quotes.
  • monolingual text, a string with a language tag attached. In a literal, you can add the language tag after the string with an @ sign, e. g. "Douglas Adams"@en.
  • numbers, either integers (1) or decimals (1.23).
  • dates. Date literals can be written by adding ^^xsd:dateTime (case sensitive – ^^xsd:datetime won’t work!) to an ISO 8601 date string: "2012-10-29"^^xsd:dateTime.
# Date related properties of Bach
SELECT ?predicate ?pLabel ?object
WHERE
{
  wd:Q1339 ?predicate ?object.         # Bach
  FILTER(DATATYPE(?object) = xsd:dateTime).
  
  BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p). 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?object

Try it!

Some datatypes are rdf:langString, xsd:string, xsd:integer, xsd:dateTime, xsd:decimal or "<http://www.opengis.net/ont/geosparql#wktLiteral>".

STR[edit]

The STR( value ) function converts values to a string. It also resolves prefixes, for instance wd:Q1339 will be converted into http://www.wikidata.org/entity/Q1339. The reversal of this is the IRI( string ), which will convert a string to an IRI.

SELECT ?item ?itemLabel ?string ?StringLabel ?iri ?iriLabel 
WHERE {
    VALUES ?item { wd:Q1339 }.
    BIND( STR(?item) AS ?string ).
  	BIND( IRI(?string) AS ?iri ).
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

IRI[edit]

the IRI( string ), which will convert a string to an IRI, by applying the prefixes.

In the example below a Predicate like wdt:P569 get converted to a string http://www.wikidata.org/prop/direct/P569. To get a Label for the predicate it first needs to replaced to an entity (http://www.wikidata.org/entity/P569), which with IRI gets converted to wd:P569. The label of this can then be displayed as "date of birth (P569)".

# Date related properties of Bach
SELECT ?object ?predicate ?string ?entity ?p ?pLabel 
WHERE
{
  wd:Q1339 ?predicate ?object.                    # Bach
  FILTER(DATATYPE(?object) = xsd:dateTime).
  BIND( STR(?predicate) AS ?string ).
  BIND( REPLACE( ?string,"prop/direct/","entity/" ) AS ?entity ).
  BIND( IRI(?entity) AS ?p ).
  
#  or all on one line: 
#  BIND( IRI(REPLACE( STR(?predicate),"prop/direct/","entity/" )) AS ?p). 
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?object

Try it!

LANG[edit]

The LANG( string ) function retrieves the language tag of Labels, Descriptions, Aliases and of Monolingual texts.

#Countries in European Union with native name and language
SELECT ?country ?countryLabel ?nativename ?language
{
  wd:Q458 wdt:P150 ?country.   # European Union  contains administrative territorial entity
  OPTIONAL { ?country wdt:P1705 ?nativename.
              BIND( LANG(?nativename) AS ?language). }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?language)

Try it!

BOUND[edit]

The function BOUND( variable ) returns true if variable is bound to a value. It returns false otherwise. The BOUND expression is used often with IF(condition,thenExpression,elseExpression) expressions. For example, suppose you have a query that shows some humans, and instead of just showing their label, you’d like to display their pseudonym (P742) if they have one, and only use the label if a pseudonym doesn’t exist. For this, you select the pseudonym in an OPTIONAL clause (it has to be optional – you don’t want to throw out results that don’t have a pseudonym), and then use BIND(IF(BOUND(… to select either the pseudonym or the label.

SELECT ?writer ?label
WHERE
{
  ?writer wdt:P31 wd:Q5;                                         # French writer 
          wdt:P27 wd:Q142;
          wdt:P106 wd:Q36180;
          wdt:P569 ?dob.
  FILTER("1751-01-01"^^xsd:dateTime <= ?dob && ?dob < "1801-01-01"^^xsd:dateTime) # born in the second half of the 18th century
  ?writer rdfs:label ?writerLabel.                               # get the English label
  FILTER(LANG(?writerLabel) = "en")  
  OPTIONAL { ?writer wdt:P742 ?pseudonym. }                      # get the pseudonym, if it exists
  BIND(IF(BOUND(?pseudonym),?pseudonym,?writerLabel) AS ?label). # bind the pseudonym, or if it doesn’t exist the English label, as ?label
}

Try it!

IN and NOT IN[edit]

The function value IN( list of values ) checks if a value is found in a list of values. For example 2 IN( 1, 2, 3 ) is true. A similar function value NOT IN( list of values ) checks if a value is not found in a list of values.

# Railway stations in London or Paris
SELECT ?cityLabel ?station ?stationLabel ?location 
WHERE {
  ?station wdt:P31 wd:Q55488.         # is a railway station
  ?station wdt:P131* ?city.
  ?station wdt:P625  ?location
  FILTER( ?city IN(wd:Q84, wd:Q90) ). # in London or Paris
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr". }
}

Try it!

isBLANK[edit]

The isBLANK( variable ) expression checks if a value is an "unknown value".

#Demonstrates "unknown value" handling
SELECT ?human ?humanLabel
WHERE
{
	?human wdt:P21 ?gender
	FILTER isBLANK(?gender) .
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

Try it!

Wikidata value can also have special value "no value". With normal triples this simply results in the absence of a result.

# persons who were stateless (country of citizenship: no value) for some time (start time and end time qualifiers)
SELECT ?person ?personLabel ?start ?end 
WHERE {
  ?person wdt:P31 wd:Q5;
          p:P27 [
            a wdno:P27;			# no value for country of citizenship
            pq:P580 ?start;
            pq:P582 ?end
          ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?end - ?start)

Try it!

The prefix wdno: selects the "no value" node.
The code a is a special word.

Functions on Strings[edit]

STRLEN[edit]

The function STRLEN( string ) returns the length of a string. For instance STRLEN(?name) where ?name is "ABC" will return 3.

SUBSTR[edit]

The function SUBSTR( string, beginposition, stringlength ) returns a substring of a string beginning at the position marked by beginposition, and will have a length as indicated. For instance SUBSTR( ?name,2,3 ) where ?name = "ABCDEFGH" will return "BCD". A substring without length like SUBSTR( ?name,2 ) will return "BCDEFGH".

UCASE[edit]

The function UCASE( string ) returns the string in Upper case. For instance a string "Abc" will be returned as "ABC".

LCASE[edit]

The function LCASE( string ) returns the string in Lower case. For instance a string "Abc" will be returned as "abc".

STRSTARTS[edit]

The function STRSTARTS( string, comparestring ) checks if the string starts with the compare string. For instance STRSTARTS( "ABCDEFGH", "ABC" ) returns true, indicating that the string starts with "ABC".

STRENDS[edit]

The function STRENDS( string, comparestring ) checks if the string ends with the compare string. For instance STRENDS( "ABCDEFGH", "FGH" ) returns true, indicating that the string ends with "FGH".

CONTAINS[edit]

The function CONTAINS( string, comparestring ) checks if the string contains the compare string. For instance CONTAINS( "ABCDEFGH", "DEF" ) returns true, indicating that the string contains "DEF".

STRBEFORE[edit]

The function STRBEFORE( string, comparestring ) returns the part of the string before the compare string. For instance STRBEFORE( "ABCDEFGH", "DEF" ) returns "ABC". If the compare string is not found it returns "".

STRAFTER[edit]

The function STRAFTER( string, comparestring ) returns the part of the string after the compare string. For instance STRAFTER( "ABCDEFGH", "DEF" ) returns "GH". If the compare string is not found, it returns "".

ENCODE_FOR_URI[edit]

The function ENCODE_FOR_URI( string ) converts the special characters in the string, to be able to use it in an web URL. For instance ENCODE_FOR_URI( "ABC DËFGH" ) returns "ABC%20D%C3%8BFGH".

CONCAT[edit]

The function CONCAT( string1, string2 ) returns the concatenation of both strings. For instance CONCAT( "ABCDEFGH", "XYZ" ) returns "ABCDEFGHXYZ".

LANGMATCHES[edit]

The function LANGMATCHES( languagetag, languagerange ) checks if the language tag matches the language range. For example this code FILTER LANGMATCHES( LANG(?label), "fr" ). filters labels in the french language. It is similar to FILTER (LANG(?label) = "fr"). with the difference that the LANGMATCHES filter will also output language tags with regions, e.g. "fr-BE" while (FILTER (LANG(?label) = "fr"). will only output the strings with the exact tag "fr". A language range of "*" matches any non-empty language-tag string.

REGEX[edit]

The function REGEX( string, pattern, flag ) checks if the string matches the pattern. It is mostly used in a FILTER clause like FILTER REGEX( string, pattern )..

The pattern may contain different special characters. The table is partly based in this list, but there are more special characters.

Special characters Meaning
(a|b) a or b
[abc] Range (a or b or c)
[^abc] Not (a or b or c)
[a-q] Lower case letter from a to q
[A-Q] Upper case letter from A to Q
[0-7] Digit from 0 to 7
   
Special characters Meaning
* 0 or more
+ 1 or more
? 0 or 1
{3} Exactly 3
{3,} 3 or more
{3,5} 3, 4 or 5
(pattern) Matches pattern and saves the match
\1 Retrieves the saved match
(?:pattern) Matches pattern but does not save the match
   
Special characters Meaning
^ Start of string, or start of line in multi-line pattern
\A Start of string
\b \B Word boundary / Not word boundary
\d \D Digits [0-9] / Nondigit characters [^0-9]
\p{ name } Matches any single character in the Unicode general category or named block specified by name
\w \W The characters [A-Za-z0-9_] / None of the charachters [A-Za-z0-9_]
\< Start of word
\> End of word
$ End of string, or end of line in multi-line pattern
\Z End of string

The flag is optional. Flag "i" means the match is case-insensitive.

# Names of human settlements ending in "-ow" or "-itz" in Germany
#defaultView:Map
SELECT DISTINCT ?item ?itemLabel ?coord
WHERE 
{
   ?item wdt:P31/wdt:P279* wd:Q486972;   # instance/subclass of human settlement
         wdt:P17 wd:Q183;                # Germany
         rdfs:label ?itemLabel;
         wdt:P625 ?coord;
   FILTER (LANG(?itemLabel) = "de"). 
   FILTER REGEX (?itemLabel, "(ow|itz)$").
}
LIMIT 1000

Try it!

Examples

1. title might be a rhyme FILTER(REGEX(?title, "^\\w*(\\w{3})(\\W+\\w*\\1)+$", "i") && !REGEX(?title, "^(\\w+)(\\W+\\1)+$", "i")).
2. title is an alliteration FILTER(REGEX(STR(?title), "^(\\p{L})\\w+(?:\\W+\\1\\w+){2,}$", "i")).
PS: A single \ is used as an escape symbol in strings, so \\ is used to indicate a single \.

REPLACE[edit]

The function REPLACE( string, string1, string2 ) returns the string after replacing all occurrences of string 1 into string 2. For instance REPLACE( "ABCDEFGH", "DEF", "_def_" ) returns "ABC_def_GH".

Functions on Numerics[edit]

ABS[edit]

The function ABS( number ) returns the absolute value of a number. For instance ABS( -1 ) returns 1.

ROUND[edit]

The function ROUND( number ) returns the rounded value of a number. For instance ROUND( 1.4 ) returns 1, and ROUND( 1.6 ) returns 2.

CEIL[edit]

The function CEIL( number ) returns the largest number (round up). For instance both CEIL( 1.4 ) and CEIL( 1.6 ) returns 2.

FLOOR[edit]

The function FLOOR( number ) returns the smallest number (round down). For instance both FLOOR( 1.4 ) and FLOOR( 1.6 ) returns 1.

RAND[edit]

The function RAND( ) returns a random value between 0 and 1. For instance RAND( ) returns 0.7156405780739334.

COUNT, MIN, MAX, AVG and SUM[edit]

The functions COUNT, MIN, MAX, AVG, and SUM can only be used as Aggregate functions.

# average age of painters by century
SELECT ?century (AVG(?age) AS ?average_age) (ROUND(AVG(?age)) AS ?rounded_average_age)
WHERE 
{
  ?item wdt:P31 wd:Q5.          # is a human
  ?item wdt:P106 wd:Q1028181.   # occupation painter
  ?item wdt:P569 ?born.
  ?item wdt:P570 ?died.
  FILTER( ?died > ?born ).
  BIND( (?died - ?born)/365.2425 AS ?age )
  BIND( FLOOR(YEAR(?born)/100)*100 AS ?century ) 
}
GROUP BY ?century
ORDER BY ?century

Try it!

For instance BIND( MAX( ?var1, ?var2 ) AS ?max) does NOT work. Instead of that you could use the expression like BIND( IF( ?var1>?var2, ?var1, ?var2 ) AS ?max).

Functions on Dates and Times[edit]

NOW[edit]

The function NOW( ) returns the date and time of today.

YEAR, MONTH and DAY[edit]

The function YEAR( datevalue ) returns the year of the date value. The functions MONTH and DAY returns the month or day.

HOURS, MINUTES and SECONDS[edit]

The function HOUR( datevalue ) returns the hour of the date value. The functions MINUTES and SECONDS returns the minutes or seconds.
Currently Wikidata does not hold date values in hour, minutes or seconds.

TIMEZONE and TZ[edit]

The function TIMEZONE( datevalue ) returns the time zone of the date value. Currently in Wikidata all dates have a TIMEZONE of "PT0S" for UTC. Other valid values would be between -PT14H and PT14H indicating the time offset in hours.

The function TZ( datevalue ) returns the time zone as a simple literal of the date value. Currently in Wikidata all dates have a TZ of "Z" for UTC.

Example with dates

# Query to find all musicians who have already died 
# calculate their age (full years) at death
# count how many of them died at each age
#defaultView:LineChart
SELECT ?age  (COUNT (DISTINCT ?a) AS ?count) 
WHERE {
        ?a wdt:P31 wd:Q5.                   #instance of human
        ?a wdt:P106/wdt:P279 wd:Q639669.    #occupation a subclass of musician
        ?a p:P569/psv:P569 ?birth_date_node. 
        ?a p:P570/psv:P570 ?death_date_node.
        ?birth_date_node wikibase:timeValue ?birth_date.
        ?death_date_node wikibase:timeValue ?death_date.
        BIND( YEAR(?death_date) - YEAR(?birth_date) - 
              IF(MONTH(?death_date)<MONTH(?birth_date) || 
                 (MONTH(?death_date)=MONTH(?birth_date) && DAY(?death_date)<DAY(?birth_date)),1,0) AS ?age )
        # calculate the age, precisely to the day (times and timezones ignored)
        FILTER(?age > 10 && ?age < 100).    #ignore outlyers, several of which are probably errors
}
GROUP BY ?age
ORDER BY ?age

Try it!

Functions on Coordinates[edit]

geof:distance[edit]

The function geof:distance returns distance between two points, in kilometers.

Example usage:

# distance between 2 cities
SELECT ?city1 ?city1Label ?location1 ?city2 ?city2Label ?location2 ?dist 
WHERE
{
     VALUES ?city1 { wd:Q84 }. # London 
     VALUES ?city2 { wd:Q90 }. # Paris
     ?city1 wdt:P625 ?location1.
     ?city2 wdt:P625 ?location2.
     BIND(geof:distance(?location1, ?location2) as ?dist) 
     SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }  
}

Try it!

Longitude and Latitude[edit]

There are no functions to retrieve Longitude and Latitude from a coordinate value. They can however be retrieved from a coordinate node. See here for an explanation.

# Museums in Barcelona with coordinates
SELECT ?item ?itemLabel ?coord ?lon ?lat
WHERE
{
   ?item wdt:P131 wd:Q1492;   # in the administrative territory of Barcelona
         wdt:P31 wd:Q33506;   # is a museum
         p:P625 [
            ps:P625 ?coord;
            psv:P625 [
               wikibase:geoLongitude ?lon;
               wikibase:geoLatitude  ?lat; 
               ]
          ].
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

Try it!

References[edit]


Variables SPARQL
Expressions and Functions
Federated query