XQuery/DBpedia with SPARQL - Football teams

From Wikibooks, open books for an open world
Jump to: navigation, search

DBpedia is a project to convert the contents of Wikipedia to RDF so that it can be linked into other datasets to add to the Semantic Web. A w:SPARQL endpoint is provided to query this database.


SPARQL to KML[edit]

This application uses DBpedia to create a kml file showing the birth places of the members of a selected UK Football team. Data quality is limited by a number of factors:

  • the age of the Wikipedia extract on which DBpedia is based
  • the existence or non-existence of individual pages in Wikipedia for players
  • the consistency of property labeling on Wikipedia infoboxes


SPARQL Query[edit]

declare variable  $query := "
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX p: <http://dbpedia.org/property/>          
SELECT * WHERE { 
      ?player p:currentclub  <http://dbpedia.org/resource/Arsenal_F.C.>.
      OPTIONAL {?player p:cityofbirth ?city}.
      OPTIONAL {?player p:dateOfBirth ?dob}.
      OPTIONAL {?player p:clubnumber ?no}.
      OPTIONAL {?player p:position ?position}.
      OPTIONAL {?player p:image ?image}.
      OPTIONAL {
               { ?city geo:long ?long. }
                 UNION
               { ?city p:redirect ?city2.
                 ?city2 geo:long ?long.
               }.
        }.
       OPTIONAL {
               { ?city geo:lat ?lat.}
                 UNION
               { ?city p:redirect ?city3.
                 ?city3 geo:lat ?lat.
               }.
         }.
   }
";

This query is complicated by the need to handle possible redirection of the city name - (can this be improved - this is a generic problem?). To obtain more complete data, the query should also handle the multiple synonyms used for place and date of birth

Changes to dbpedia lead to a short life for queries based om the data-model and vocabulary. As of Jan 2011, the query is being updated. Currently to get locations and birthdates for the current players at Arsenal, the following query seems to work.

PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX p: <http://dbpedia.org/property/>  
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> 
          
SELECT * WHERE { 
      <http://dbpedia.org/resource/Arsenal_F.C.> p:name ?player.
      ?player dbpedia-owl:birthPlace ?city;
              dbpedia-owl:birthDate ?dob.
      ?city geo:long ?long;
            geo:lat ?lat.
 }

However this yields multiple geocoded locations, of which it can be assumed that the first is most specific (but not possible ? to filter in SPARQL).

DBpedia Query[edit]

The prototype SPARQL query is targeted on Arsenal_F.C. This team name needs to be replaced by the supplied team name, the query then URI-encoded and passed to the DBpedia SPARQL endpoint.

let $club := request:get-parameter ("club","Arsenal_F.C.")
let $queryx := replace($query,"Arsenal_F.C.",$club)

Aside: Initially, the query was written with a generic placeholder ($team) rather than a protypical value (Arsenal_F.C.). The prototype idiom has the benefit of providing an executable SPARQL query without editing, is more expressive and less tricky - the $ in $team needs escaping in the replace expression since the second argument is a regular expression.


Execute the SPARQL Query[edit]

This query uses the SPARQL endpoint provided by the Virtuoso engine. The format of the result is defined to be XML i.e. the SPARQL Query Result format. A function tidies up the interface:

declare function local:execute-sparql($query as xs:string) {
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&default-graph-uri=http://dbpedia.org&query=",
                   encode-for-uri($query) 
                 )
  return doc($sparql)
};

DBpedia Result[edit]

The result is in SPARQL Query Results XML format. It is more convenient to convert this generic format to tuples with named elements for later processing.


declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare function local:sparql-to-tuples($rdfxml ) {
   for $result in $rdfxml//r:result
   return
     <tuple>
            { for $binding  in $result/r:binding
               return                
                 if ($binding/r:uri)
                 then   element {$binding/@name}  {
                           attribute type  {"uri"} , 
                           string($binding/r:uri) 
                         }
                 else   element {$binding/@name}  {
                           attribute type {$binding/r:literal/@datatype},
                           string($binding/r:literal)
                        }
             }
      </tuple>
 };


Query to Tuples[edit]


let $result:= local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)

KML output[edit]

Since we are generating kml, we need to set the media type and file name and create a Document node - in the appropriate places in the script:

declare option exist:serialize  "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none"; 


let $x := response:set-header('Content-disposition',concat('Content-disposition: inline;filename=',$team,'.kml;'))


return
<Document>
   <name>Birthplaces of players  in the  {$team} squad</name> 
   <Style id="player">
       <IconStyle>
          <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href>
        </Icon>
       </IconStyle>
    </Style>
   .....

</Document>

The icon is a stock GoogleEarth footballer icon.

Document construction[edit]

Due to the multiple values for some of the properties, for example cityofbirth is often expressed as an address path, there are multiple tuples for each player. These need grouping and compressing. Here we use the XQuery idiom which uses distinct-values to get a set of player names, and then accesses groups of rows with the name as the key. This scripts takes a simplistic approach of using only the first of multiple tuples which contains a latitude , pending a better resolution of the multiple cityofbirth values.

We are only interested in players whose place of birth has been geo-coded, so we filter for tuples with a latitude element:

    {
     for $playername in distinct-values($tuples[lat]/player)
     let $player := $tuples[player=$playername][lat][1]

Data cleanup[edit]

The wikiPedia data needs some clean-up before being usable in the kml. A generic clean function decodes the uri-encoded characters, removes some irrelevant text and replaces underscores with spaces. ( this hack needs improving )


declare function local:clean($text) {
    let $text:= util:unescape-uri($text,"UTF-8")
    let $text := replace($text,"http://dbpedia.org/resource/","")
    let $text := replace($text,"\(.*\)","")
    let $text := replace($text,"Football__positions#","")
    let $text := replace($text,"#",",")
    let $text := replace($text,"_"," ")
    return $text
};
     let $name := local:clean($player/player)
     let $city :=local:clean($player/city)
     let $position := local:clean($player/position)

Data typing[edit]

The date of birth is in the form xs:date, but is optional. If the value is a valid date, it is converted to a more readable form using an eXist function:

 
     let $dob := if ($player/dob castable as xs:date) then datetime:format-date(xs:date($player/dob),"dd MMM, yyyy" ) else ""

Similarly for the position number which should be an xs:integer:

     let $no := if ($player/no castable as xs:integer) then concat(" [# ", xs:integer($player/no) ,"] ") else ""

The latitude and longitude should be xs:decimal. Since sometimes several players in a team come from the same place, the mapped positions are dithered a little.

 
     let $lat :=xs:decimal($player/lat) + (math:random() - 0.5)* 0.01
     let $long :=xs:decimal($player/long) + (math:random() - 0.5)* 0.01

Placemark Construction[edit]

The body of the Placemark description will contain XHTML markup to display an image if there is one and to link to the DBpedia page. The XML needs to be serialised to a string for GoogleMap to render the description in a pop-up:

     let $description := 
         <div>
          {concat ($position, $no, " born ", $dob, " in ", $city)}
          <div>
              <a href="{$player/player}">DBpedia</a>
                
              <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>
          </div>
          {if ($player/image !="") 
           then <div><img src="{$player/image}"  height="200"/>  </div> 
           else ()
          }       
       </div>
     order by $name
     return 
     <Placemark>
        <name>{$name}</name>
       <description>
        {util:serialize($description,"method=xhtml")}
         </description>
       <Point> 
        <coordinates>{concat($long, ",",$lat,",0")}</coordinates>
       </Point>
       <styleUrl>#player</styleUrl>
     </Placemark>
   } 

Execute[edit]

Map of Arsenal players:


Note that the q parameter is URI-encoded.

Complete Script[edit]

(:  generate a sparql query on the dbpedia server
  This takes a team name  and generates a kml file showing the birth place of the players

:)
declare namespace r = "http://www.w3.org/2005/sparql-results#";
declare variable  $query := "
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>          
SELECT * WHERE { 
      ?player p:currentclub  <http://dbpedia.org/resource/Arsenal_F.C.>.
      OPTIONAL {?player p:cityofbirth ?city}. 
      OPTIONAL {?player p:birth ?dob}.
      OPTIONAL {?player p:clubnumber ?no}.
      OPTIONAL {?player p:position ?position}.
      OPTIONAL {?player p:image ?image}.
      OPTIONAL {
               { ?city geo:long ?long. }
                 UNION
               { ?city p:redirect ?city2.
                 ?city2 geo:long ?long.
               }.
        }.
       OPTIONAL {
               { ?city geo:lat ?lat.}
                 UNION
               { ?city p:redirect ?city3.
                 ?city3 geo:lat ?lat.
               }.
         }.
   }
";

declare function local:execute-sparql($query as xs:string) {
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&default-graph-uri=http://dbpedia.org&query=",
                   encode-for-uri($query)  
                 )
  return  doc($sparql)
};

declare function local:sparql-to-tuples($rdfxml ) {
   for $result in $rdfxml//r:result
   return
     <tuple>
            { for $binding  in $result/r:binding
               return                
                 if ($binding/r:uri)
                     then   element {$binding/@name}  {
                                    attribute type  {"uri"} , 
                                    string($binding/r:uri) 
                                }
                     else   element {$binding/@name}  {
                                    attribute type {$binding/@datatype}, 
                                    string($binding/r:literal)
                               }
             }
      </tuple>
 };


declare function local:clean($text) {
    let $text:= util:unescape-uri($text,"UTF-8")
    let $text := replace($text,"http://dbpedia.org/resource/","")
    let $text := replace($text,"\(.*\)","")
    let $text := replace($text,"Football__positions#","")
    let $text := replace($text,"#",",")
    let $text := replace($text,"_"," ")
    return $text
};

declare option exist:serialize  "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none"; 

let $club := request:get-parameter ("club","Arsenal_F.C.")
let $queryx := replace($query,"Arsenal_F.C.",$club)
let $result:= local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)

let $x := response:set-header('Content-disposition',concat('Content-disposition: inline;filename=',$club,'.kml;'))

return

<Document>
   <name>Birthplaces of   {local:clean($club)} players</name> 
   <Style id="player">
       <IconStyle>
          <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href>
        </Icon>
       </IconStyle>
    </Style>
    {$result}
     {
     for $playername in distinct-values($tuples[lat]/player)
     let $player := $tuples[player=$playername][lat][1]
     let $name := local:clean($player/player)
     let $city :=local:clean($player/city)
     let $position := local:clean($player/position)

     let $dob := if ($player/dob castable as xs:date) then datetime:format-date(xs:date($player/dob),"dd MMM, yyyy" ) else ""
     let $no := if ($player/no castable as xs:integer) then  concat(" [# ", xs:integer($player/no),"] ") else ""
     
     let $lat := if ($player/lat castable as xs:decimal) then  xs:decimal($player/lat) + (math:random() - 0.5)*0.01  else ""
     let $long := if ($player/long castable as xs:decimal) then  xs:decimal($player/long) + (math:random()  -0.5)* 0.01  else ""    
     let $description := 
         <div>
          {concat ($position, $no, " born ", $dob, " in ", $city)}
          <div><a href="{$player/player}">DBpedia</a>
             <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>
          
          </div>
            {if ($player/image !="") then <div><img src="{$player/image}"  height="200"/>  </div> else ()}       
         </div>
      order by $name
     return 
     <Placemark>
        <name>{$name}</name>
       <description>
        {util:serialize($description,"method=xhtml")}
         </description>
       <Point> 
        <coordinates>{concat($long, ",",$lat,",0")}</coordinates>
       </Point>
       <styleUrl>#player</styleUrl>
     </Placemark>
   } 
</Document>

 

Club Index[edit]

We also need an index page, selecting all Clubs in the major English and Scottish leagues. This script follows the same lines as the more complex script above, except that due to the simpler data, the raw SPARQL result is used without transformation.

The index is sorted alphabetically by club name and provides links to the player map and to the base DBpedia data.

XQuery Script[edit]

declare option exist:serialize "method=xhtml media-type=text/html";
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/> 
SELECT * WHERE { 
     ?club p:league ?league.
     { ?club  p:league :Premier_League.}
     UNION
     {?club p:league  :Football_League_One.}
     UNION
     {?club p:league  :Football_League_Two.}
     UNION
     {?club p:league  :Scottish_Premier_League.}
      UNION
     {?club p:league  :Football_League_Championship.}
   }
 ";
 
declare function local:execute-sparql($query as xs:string) {
  let $sparql := concat("http://dbpedia.org/sparql?format=xml&default-graph-uri=http://dbpedia.org&query=",escape-uri($query,true())  )
  return doc($sparql)
};

declare function local:clean($string as xs:string)  as xs:string {
    let $string := util:unescape-uri($string,"UTF-8")
    let $string := replace($string,"\(.*\)","")
    let $string := replace($string,"_"," ")
    return $string
};


<html>
<body>
  <h1>England and Scottish Football Clubs</h1>
  <table border="1">
   {  for $tuple in local:execute-sparql($query)//r:result
      let $club := $tuple/r:binding[@name="club"]/r:uri
      let $club :=substring-after($club,"/resource/")
      let $clubx := local:clean($club)
      let $league := $tuple/r:binding[@name="league"]/r:uri
      let $league  := local:clean(substring-after($league,"/resource/"))
      let $mapurl := concat("http://maps.google.co.uk/maps?q=",escape-uri(concat("http://www.cems.uwe.ac.uk/xmlwiki/RDF/club2kml.xq?club=",$club),true()))
      order by $club
      return      
        <tr>
           <td>{$clubx}</td>
           <td>{$league}</td>
           <td><a href="{$mapurl}">Player Map</a></td>
           <td><a href="http://dbpedia.org/resource/{$club}">DBpedia</a></td>
       </tr>
   }
   </table>
 </body>
 </html>

Club Index[edit]

Club Index