SPARQL/Basics

From Wikibooks, open books for an open world
Jump to navigation Jump to search

SPARQL may look complicated, but the simple basics will already get you a long way – if you want, you can stop reading after this chapter, and you’ll already know enough to write many interesting queries. The other chapters just add information about more topics that you can use to write different queries. Each of them will empower you to write even more awesome queries, but none of them are necessary – you can stop reading at any point and hopefully still walk away with a lot of useful knowledge!

Also, if you’ve never heard of Wikidata, SPARQL, or WDQS before, here’s a short explanation of those terms:

SPARQL basics[edit | edit source]

A simple SPARQL query looks like this:

SELECT ?a ?b ?c
WHERE
{
  x y ?a.
  m n ?b.
  ?b f ?c.
}

The SELECT clause lists variables that you want returned (variables start with a question mark), and the WHERE clause contains restrictions on them, mostly in the form of triples. All information in Wikidata (and similar knowledge databases) is stored in the form of triples; when you run the query, the query service tries to fill in the variables with actual values so that the resulting triples appear in the knowledge database, and returns one result for each combination of variables it finds.

A triple can be read like a sentence (which is why it ends with a period), with a subject, a predicate, and an object:

SELECT ?fruit
WHERE
{
  ?fruit hasColor yellow.
  ?fruit tastes sour.
}

The results for this query could include, for example, “lemon”. In Wikidata, most properties are “has”-kind properties, so the query might instead read:

SELECT ?fruit
WHERE
{
  ?fruit color yellow.
  ?fruit taste sour.
}

which reads like “?fruit has color ‘yellow’” (not?fruit is the color of ‘yellow’” – keep this in mind for property pairs like “parent”/“child”!).

However, that’s not a good example for WDQS. Taste is subjective, so Wikidata doesn’t have a property for it. Instead, let’s think about parent/child relationships, which are mostly unambiguous.

Our first query[edit | edit source]

Suppose we want to list all children of the baroque composer Johann Sebastian Bach. Using pseudo-elements like in the queries above, how would you write that query?

Hopefully you got something like this:

SELECT ?child
WHERE
{
  # either this...
  ?child parent Bach.
  # or this...
  ?child father Bach.
  # or this.
  Bach child ?child.
  # (note: everything after a ‘#’ is a comment and ignored by WDQS.)
}

The first two triples say that the ?child must have the parent/father Bach; the third says that Bach must have the child ?child. Let’s go with the second one for now.

So what remains to be done in order to turn this into a proper WDQS query? On Wikidata, items and properties are not identified by human-readable names like “father” (property) or “Bach” (item). (For good reason: “Johann Sebastian Bach” is also the name of a German painter, and “Bach” might also refer to the surname, the French commune, the Mercury crater, etc.) Instead, Wikidata items and properties are assigned an identifier. To find the identifier for an item, we search for the item and copy the Q-number of the result that sounds like it’s the item we’re looking for (based on the description, for example). To find the identifier for a property, we do the same, but search for “P:search term” instead of just “search term”, which limits the search to properties. This tells us that the famous composer Johann Sebastian Bach is Q1339, and the property to designate an item’s father is P22.

And last but not least, we need to include prefixes. For simple WDQS triples, items should be prefixed with wd:, and properties with wdt:. (But this only applies to fixed values – variables don’t get a prefix!)

Putting this together, we arrive at our first proper WDQS query:

SELECT ?child
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
}

Try it!

Click that “Try it” link, then “Run” the query on the WDQS page. What do you get?

child
wd:Q57225
wd:Q76428

Well that’s disappointing. You just see the identifiers. You can click on them to see their Wikidata page (including a human-readable label), but isn’t there a better way to see the results?

Well, as it happens, there is! (Aren’t rhetorical questions great?) If you include the magic text

SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

somewhere within the WHERE clause, you get additional variables: For every variable ?foo in your query, you now also have a variable ?fooLabel, which contains the label of the item behind ?foo. If you add this to the SELECT clause, you get the item as well as its label:

SELECT ?child ?childLabel
WHERE
{
# ?child  father   Bach
  ?child wdt:P22 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Try it!

Try running that query – you should see not only the item numbers, but also the names of the various children.

child childLabel
wd:Q57225 Johann Christoph Friedrich Bach
wd:Q76428 Carl Philipp Emanuel Bach

This completes the basics. Try amending this by varying the properties.

References[edit | edit source]