SPARQL/WIKIDATA Lexicographical data


The data on Wikidata contains more than triples about concepts, in which Q-items represent a thing or an idea. Since 2018, Wikidata has also stored a new type of data: words, phrases and sentences, in many languages, described in many languages. This information is stored in new types of entities, called Lexemes (L), Forms (F) and Senses (S).

This chapter is not yet complete. Please help expand it.

Glossary and SPARQL code

A Lexeme is a lexical element of a language, such as a word, a phrase, or a prefix (see Lexeme on Wikipedia). Lexemes are Entities in the sense of the Wikibase data model. A Lexeme is described using the following information:

  • An ID. Lexemes have IDs starting with an "L" followed by a natural number in decimal notation, e.g. L3746552. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Lexeme.
  • A Lemma for use as a human readable representation of the lexeme, e.g. "run".
  • The Language to which the lexeme belongs. This is a reference to a concrete Item, e.g. English (Q1860).
  • The Lexical category to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. adjective (Q34698).
  • A list of Lexeme Statements to describe properties of the lexeme that are not specific to a Form or Sense (e.g. derived from or grammatical gender or syntactic function)

?l a ontolex:LexicalEntry .
?l wikibase:lemma ?word .
?l dct:language wd:Q1860 . # English
?l wikibase:lexicalCategory ?category .

  • A list of Forms, typically one for each relevant combination of grammatical features, such as 2nd person / singular / past tense. A Form is described using the following information:
    • An ID. Forms have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation: e.g. L3746552-F7
    • A representation, spelling out the Form as a string.
    • A list of grammatical features that define the syntactic role for which the form applies. These are given as references to concrete Items, e.g. participle (Q814722).
    • A list of Form Statements further describing the Form or its relations to other Forms or Items (e.g. IPA transcription (P898), pronunciation audio, rhymes with, used until, used in region)

?l ontolex:lexicalForm ?form .
?form ontolex:representation ?word .
?form wikibase:grammaticalFeature ?feat .

  • A list of Senses, describing the different meanings of the lexeme (e.g. "financial institution" and "edge of a body of water" for the English noun bank). A sense is described using the following information:
    • An ID. Senses have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation: e.g. L3746552-S4. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Sense.
    • A Gloss, defining the meaning of the Sense using natural language.
    • A list of Sense Statements further describing the Sense and its relations to Senses and Items (e.g. translation, synonym, antonym, connotation, register, denotes, evokes).

?l ontolex:sense ?sense .
?sense skos:definition ?gloss .
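
The three fragments above can be combined into a single query. The following is a minimal sketch, assuming it is run on the Wikidata Query Service, where the prefixes used here are already declared. The lemma "bank" is taken from the Senses description above; the variable names ?rep, ?forms and ?glosses are chosen here for illustration only.

```sparql
# English lexemes with the lemma "bank": lexical category, forms and sense glosses
SELECT ?l ?category
       (GROUP_CONCAT(DISTINCT ?rep; SEPARATOR=", ") AS ?forms)
       (GROUP_CONCAT(DISTINCT ?gloss; SEPARATOR=" | ") AS ?glosses)
WHERE {
  # Lexeme level: type, language, lemma and lexical category
  ?l a ontolex:LexicalEntry ;
     dct:language wd:Q1860 ;                 # English
     wikibase:lemma "bank"@en ;
     wikibase:lexicalCategory ?category .

  # Form level: every spelled-out form of the lexeme
  OPTIONAL { ?l ontolex:lexicalForm ?form .
             ?form ontolex:representation ?rep . }

  # Sense level: glosses, restricted to those written in English
  OPTIONAL { ?l ontolex:sense ?sense .
             ?sense skos:definition ?gloss .
             FILTER(LANG(?gloss) = "en") }
}
GROUP BY ?l ?category
```

Forms and Senses are placed in separate OPTIONAL blocks so that a lexeme without forms or without senses is still returned. GROUP_CONCAT then collapses the combinations of forms and glosses into one row per lexeme.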

Prefixes

Prefixes used only for lexicographical data are:

PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct: <http://purl.org/dc/terms/>
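
The examples in this chapter omit these declarations, which suggests that the Wikidata Query Service already knows them. When a query is sent through another client or endpoint, it may help to spell the prefixes out. The sketch below does that for a small query; the wikibase: and wd: IRIs are the usual Wikidata ones and are an addition here, not part of the list above.

```sparql
PREFIX ontolex:  <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct:      <http://purl.org/dc/terms/>
PREFIX wikibase: <http://wikiba.se/ontology#>        # assumed: standard Wikibase ontology IRI
PREFIX wd:       <http://www.wikidata.org/entity/>   # assumed: standard Wikidata entity IRI

# ten English lexemes with their lemma
SELECT ?l ?word WHERE {
  ?l a ontolex:LexicalEntry ;
     dct:language wd:Q1860 ;    # English
     wikibase:lemma ?word .
}
LIMIT 10
```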

Examples

As an example, this query lists the longest words in English, taken from both lemmas and form representations:

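# longest English words, from lemmas and from form representations
SELECT DISTINCT ?l ?word ?len WHERE {
  {
    # the lemma of each English lexeme
    ?l a ontolex:LexicalEntry ; dct:language wd:Q1860 ; wikibase:lemma ?word .
    BIND(STRLEN(?word) AS ?len)
  } UNION {
    # the representation of each form of an English lexeme
    ?l a ontolex:LexicalEntry ; dct:language wd:Q1860 ; ontolex:lexicalForm/ontolex:representation ?word .
    BIND(STRLEN(?word) AS ?len)
  }
}
ORDER BY DESC(?len)
LIMIT 20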


This example shows (English) adjectives and their positive, comparative and superlative degrees. Replacing "en" in VALUES ?language { "en" } with another language code runs the same query for that language.

# adjectives
SELECT ?l (GROUP_CONCAT(DISTINCT ?subfeatLabel; SEPARATOR=", ") AS ?subfeatures) ?positive ?comparative ?superlative 
WHERE {
   VALUES ?language { "en" }
  
   ?l a ontolex:LexicalEntry ; wikibase:lemma ?word; wikibase:lexicalCategory ?category .
   FILTER(?category = wd:Q34698 ) # adjective
   FILTER(LANG(?word) = ?language)

OPTIONAL { 
   ?l ontolex:lexicalForm ?form1 .
   ?form1 ontolex:representation ?positive ; wikibase:grammaticalFeature ?feat1 .
   FILTER(?feat1 = wd:Q3482678 ) # positive
   FILTER(LANG(?positive) = ?language )
   OPTIONAL { ?form1 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != ?feat1 ) } 
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form2 .
   ?form2 ontolex:representation ?comparative ; wikibase:grammaticalFeature ?feat2 .
   FILTER(?feat2 = wd:Q14169499 ) # comparative
   FILTER(LANG(?comparative) = ?language )
   OPTIONAL { ?form2 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != ?feat2 ) } 
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form3 .
   ?form3 ontolex:representation ?superlative ; wikibase:grammaticalFeature ?feat3 .
   FILTER(?feat3 = wd:Q1817208 ) # superlative
   FILTER(LANG(?superlative) = ?language )        
   OPTIONAL { ?form3 wikibase:grammaticalFeature ?subfeat . FILTER(?subfeat != ?feat3 ) } 
   }

   # use ?word if ?positive is blank
   BIND(IF(BOUND(?positive),?positive,?word) AS ?positive).
  
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
                           ?subfeat rdfs:label ?subfeatLabel.
                          }
}
GROUP BY ?l ?positive ?comparative ?superlative
ORDER BY ?positive ?comparative ?superlative
LIMIT 2000
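
To see the effect of the ?language switch in isolation, here is a reduced sketch of the same idea that keeps only the comparative forms. It reuses the item IDs from the query above; the language code "nl" is just an example value, and whether a given language has forms tagged with this feature depends on how its lexemes were entered.

```sparql
# comparative forms only, for the language chosen in VALUES
SELECT ?l ?word ?comparative WHERE {
  VALUES ?language { "nl" }                          # example value; any language code can be used

  ?l a ontolex:LexicalEntry ;
     wikibase:lemma ?word ;
     wikibase:lexicalCategory wd:Q34698 ;            # adjective
     ontolex:lexicalForm ?form .
  ?form ontolex:representation ?comparative ;
        wikibase:grammaticalFeature wd:Q14169499 .   # comparative

  # keep only lemmas and representations tagged with the chosen language
  FILTER(LANG(?word) = ?language)
  FILTER(LANG(?comparative) = ?language)
}
ORDER BY ?word
LIMIT 100
```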


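This example shows (English) verbs and their conjugations. The query is very complex because conjugations are not modeled uniformly in Wikidata: person and number can be recorded as one combined grammatical feature (e.g. first-person singular) or as separate features (e.g. first person together with singular), and each UNION branch below covers one of these variants. Replacing "en" in VALUES ?language { "en" } with another language code runs the same query for that language. At the time of writing, only a few verbs have conjugated forms.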

# verbs
SELECT ?l ?word (GROUP_CONCAT(DISTINCT ?subfeatLabel; SEPARATOR=", ") AS ?subfeatures) 
          ?single1 ?single2 ?single3 ?plural1 ?plural2 ?plural3
WHERE {
   VALUES ?language { "en" }
  
   ?l a ontolex:LexicalEntry ; wikibase:lemma ?word; wikibase:lexicalCategory ?category .
   FILTER(?category = wd:Q24905 ) # verb
   FILTER(LANG(?word) = ?language)

OPTIONAL { 
   ?l ontolex:lexicalForm ?form1 .
   { ?form1 ontolex:representation ?single1 ; wikibase:grammaticalFeature wd:Q51929218 .  # first-person singular
   } UNION 
   { ?form1 ontolex:representation ?single1 ; wikibase:grammaticalFeature wd:Q21714344 .  # first person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q146786 .   }               # without plural
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929154 . }               # without plural person
   } UNION
   { ?form1 ontolex:representation ?single1 ; wikibase:grammaticalFeature wd:Q51929131 .  # singular person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person 
   } UNION
   { ?form1 ontolex:representation ?single1 ; wikibase:grammaticalFeature wd:Q110786 .    # singular
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form1 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   }
   FILTER(LANG(?single1) = ?language )
   OPTIONAL { ?form1 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929218 && ?subfeat != wd:Q21714344 )   # not first-person singular / first person
             FILTER(?subfeat != wd:Q51929131 && ?subfeat != wd:Q110786 )     # not singular person / singular
             FILTER(?subfeat != wd:Q51929049 && ?subfeat != wd:Q51929074 ) } # not second person / third person
   }  
OPTIONAL { 
   ?l ontolex:lexicalForm ?form2 .
   { ?form2 ontolex:representation ?single2 ; wikibase:grammaticalFeature wd:Q51929369 .  # second-person singular
   } UNION 
   { ?form2 ontolex:representation ?single2 ; wikibase:grammaticalFeature wd:Q51929049 .  # second person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q146786 .   }               # without plural
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929154 . }               # without plural person
   } UNION
   { ?form2 ontolex:representation ?single2 ; wikibase:grammaticalFeature wd:Q51929131 .  # singular person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person 
   } UNION
   { ?form2 ontolex:representation ?single2 ; wikibase:grammaticalFeature wd:Q110786 .    # singular
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form2 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   }
   FILTER(LANG(?single2) = ?language )
   OPTIONAL { ?form2 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929369 && ?subfeat != wd:Q51929049 )   # not second-person singular / second person
             FILTER(?subfeat != wd:Q51929131 && ?subfeat != wd:Q110786 )     # not singular person / singular
             FILTER(?subfeat != wd:Q21714344 && ?subfeat != wd:Q51929074 ) } # not first person / third person
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form3 .
   { ?form3 ontolex:representation ?single3 ; wikibase:grammaticalFeature wd:Q51929447 .  # third-person singular
   } UNION 
   { ?form3 ontolex:representation ?single3 ; wikibase:grammaticalFeature wd:Q51929074 .  # third person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q146786 .   }               # without plural
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929154 . }               # without plural person
   } UNION
   { ?form3 ontolex:representation ?single3 ; wikibase:grammaticalFeature wd:Q51929131 .  # singular person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person 
   } UNION
   { ?form3 ontolex:representation ?single3 ; wikibase:grammaticalFeature wd:Q110786 .    # singular
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form3 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   }
   FILTER(LANG(?single3) = ?language )        
   OPTIONAL { ?form3 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929447 && ?subfeat != wd:Q51929074 )   # not third-person singular / third person
             FILTER(?subfeat != wd:Q51929131 && ?subfeat != wd:Q110786 )     # not singular person / singular
             FILTER(?subfeat != wd:Q21714344 && ?subfeat != wd:Q51929049 ) } # not first person / second person
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form4 .
   { ?form4 ontolex:representation ?plural1 ; wikibase:grammaticalFeature wd:Q51929290 .  # first-person plural
   } UNION 
   { ?form4 ontolex:representation ?plural1 ; wikibase:grammaticalFeature wd:Q21714344 .  # first person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q110786 . }                 # without singular
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929131 . }               # without singular person
   } UNION
   { ?form4 ontolex:representation ?plural1 ; wikibase:grammaticalFeature wd:Q51929154 .  # plural person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   } UNION
   { ?form4 ontolex:representation ?plural1 ; wikibase:grammaticalFeature wd:Q146786 .    # plural
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q21714344 . }               # without first person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929049 . }               # without second person
     FILTER NOT EXISTS{ ?form4 wikibase:grammaticalFeature wd:Q51929074 . }               # without third person
   }
   FILTER(LANG(?plural1) = ?language )
   OPTIONAL { ?form4 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929290 && ?subfeat != wd:Q21714344 )     # not first-person plural / first person
             FILTER(?subfeat != wd:Q51929154 && ?subfeat != wd:Q146786 )       # not plural person / plural
             FILTER(?subfeat != wd:Q51929049 && ?subfeat != wd:Q51929074 ) }   # not second person / third person
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form5 .
   { ?form5 ontolex:representation ?plural2 ; wikibase:grammaticalFeature wd:Q51929403 . # second-person plural
   } UNION 
   { ?form5 ontolex:representation ?plural2 ; wikibase:grammaticalFeature wd:Q51929049 . # second person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q110786 . }                # without singular
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929131 . }              # without singular person
   } UNION
   { ?form5 ontolex:representation ?plural2 ; wikibase:grammaticalFeature wd:Q51929154 . # plural person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q21714344 . }              # without first person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929049 . }              # without second person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929074 . }              # without third person
   } UNION
   { ?form5 ontolex:representation ?plural2 ; wikibase:grammaticalFeature wd:Q146786 .   # plural
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q21714344 . }              # without first person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929049 . }              # without second person
     FILTER NOT EXISTS{ ?form5 wikibase:grammaticalFeature wd:Q51929074 . }              # without third person
   }
   FILTER(LANG(?plural2) = ?language )
   OPTIONAL { ?form5 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929403 && ?subfeat != wd:Q51929049 )     # not second-person plural / second person
             FILTER(?subfeat != wd:Q51929154 && ?subfeat != wd:Q146786 )       # not plural person / plural
             FILTER(?subfeat != wd:Q21714344 && ?subfeat != wd:Q51929074 ) }   # not first person / third person
   }
OPTIONAL { 
   ?l ontolex:lexicalForm ?form6 .
   { ?form6 ontolex:representation ?plural3 ; wikibase:grammaticalFeature wd:Q51929517 . # third-person plural
   } UNION 
   { ?form6 ontolex:representation ?plural3 ; wikibase:grammaticalFeature wd:Q51929074 . # third person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q110786 . }                # without singular
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929131 . }              # without singular person
   } UNION
   { ?form6 ontolex:representation ?plural3 ; wikibase:grammaticalFeature wd:Q51929154 . # plural person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q21714344 . }              # without first person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929049 . }              # without second person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929074 . }              # without third person
   } UNION
   { ?form6 ontolex:representation ?plural3 ; wikibase:grammaticalFeature wd:Q146786 .   # plural
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q21714344 . }              # without first person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929049 . }              # without second person
     FILTER NOT EXISTS{ ?form6 wikibase:grammaticalFeature wd:Q51929074 . }              # without third person
   }
   FILTER(LANG(?plural3) = ?language )        
   OPTIONAL { ?form6 wikibase:grammaticalFeature ?subfeat . 
             FILTER(?subfeat != wd:Q51929517 && ?subfeat != wd:Q51929074 )     # not third-person plural / third person
             FILTER(?subfeat != wd:Q51929154 && ?subfeat != wd:Q146786 )       # not plural person / plural
             FILTER(?subfeat != wd:Q21714344 && ?subfeat != wd:Q51929049 ) }   # not first person / second person
   }
  
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". 
                           ?subfeat rdfs:label ?subfeatLabel.
                          }
}
GROUP BY ?l ?word ?single1 ?single2 ?single3 ?plural1 ?plural2 ?plural3
ORDER BY ?word ?single1 ?single2 ?single3 ?plural1 ?plural2 ?plural3
LIMIT 20000
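
When the six-column layout is not needed, a much shorter query can list every form on its own row together with all of its grammatical features. This is only a sketch of an alternative, not a replacement for the query above: it does not sort the forms into person and number columns, and the variable names ?rep, ?feat and ?features are chosen here for illustration.

```sparql
# one row per verb form, with all its grammatical features concatenated
SELECT ?l ?word ?form ?rep
       (GROUP_CONCAT(DISTINCT ?featLabel; SEPARATOR=", ") AS ?features)
WHERE {
  VALUES ?language { "en" }

  ?l a ontolex:LexicalEntry ;
     wikibase:lemma ?word ;
     wikibase:lexicalCategory wd:Q24905 ;    # verb
     ontolex:lexicalForm ?form .
  ?form ontolex:representation ?rep ;
        wikibase:grammaticalFeature ?feat .

  FILTER(LANG(?word) = ?language)
  FILTER(LANG(?rep) = ?language)

  # fetch a readable label for each grammatical feature
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
                           ?feat rdfs:label ?featLabel.
                         }
}
GROUP BY ?l ?word ?form ?rep
ORDER BY ?word ?rep
LIMIT 200
```

Grouping by ?form as well as ?rep keeps two forms with the same spelling (for example an infinitive and a present-tense form) on separate rows.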

