XQuery/SPARQLing Country Calling Codes
Motivation
Stimulated by Henry Story's blog entry, the following script works on the same problem. It uses the functions defined in the previous module to execute a SPARQL query on the DBpedia server and to convert the SPARQL query results to tuples.
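The conversion to tuples can be pictured with a small sketch. It is not the code of the fr module: local:sparql-to-tuples is a hypothetical stand-in, and the tuple shape (one child element per query variable) is an assumption inferred from the way the script below reads $country/resource and $country/callingCode. The sample input follows the W3C SPARQL Query Results XML Format and is inlined so that no network access is needed.

```xquery
xquery version "1.0";
declare namespace r = "http://www.w3.org/2005/sparql-results#";

(: Hypothetical stand-in for fr:sparql-to-tuples(); the real module may differ. :)
declare function local:sparql-to-tuples($results as element(r:sparql)) as element(tuple)* {
  for $result in $results/r:results/r:result
  return
    <tuple>{
      (: one child element per bound variable, named after the variable :)
      for $binding in $result/r:binding
      return element { string($binding/@name) } { string($binding/*) }
    }</tuple>
};

(: Illustrative sample in the SPARQL Query Results XML Format. :)
let $sample :=
  <r:sparql>
    <r:head>
      <r:variable name="resource"/>
      <r:variable name="callingCode"/>
    </r:head>
    <r:results>
      <r:result>
        <r:binding name="resource"><r:uri>http://dbpedia.org/resource/France</r:uri></r:binding>
        <r:binding name="callingCode"><r:literal>33</r:literal></r:binding>
      </r:result>
    </r:results>
  </r:sparql>
return local:sparql-to-tuples($sample)
```

For the sample above this yields a single tuple element with a resource child and a callingCode child.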
First attempt
import module namespace fr="http://www.cems.uwe.ac.uk/wiki/fr" at "fr.xqm";

(: SPARQL query: every resource that has a callingCode property :)
declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?resource p:callingCode ?callingCode.
}
";

declare option exist:serialize "method=xhtml media-type=text/html";

<html>
  <head>
    <title>Country Calling codes</title>
  </head>
  <body>
    <h1>Country Calling codes</h1>
    <table border="1">
    { (: one table row per result tuple, sorted by the cleaned resource name :)
      for $country in fr:sparql-to-tuples(fr:execute-sparql($query))
      let $name := fr:clean($country/resource)
      order by $name
      return
        <tr>
          <td><a href="{$country/resource}">{$name}</a></td>
          <td>{$country/callingCode}</td>
        </tr>
    }
    </table>
  </body>
</html>
In this script, the fr:clean() function parses the resource URI to extract its local name.
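What such a function might do can be sketched as follows. This is only a guess at the behaviour: local:clean is a hypothetical stand-in, the real fr:clean() may differ, and percent-encoded characters in the URI are not decoded here.

```xquery
xquery version "1.0";

(: Hypothetical stand-in for fr:clean(); the real function may behave differently. :)
declare function local:clean($uri as xs:string) as xs:string {
  (: take the text after the last "/" and turn underscores into spaces :)
  replace(tokenize($uri, "/")[last()], "_", " ")
};

local:clean("http://dbpedia.org/resource/United_Kingdom")
```

With this input the sketch returns the string United Kingdom.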
A sounder alternative is to take the name from the multilingual rdfs:label property, filtered by language:
SELECT * WHERE {
?resource p:callingCode ?callingCode.
?resource rdfs:label ?name.
FILTER (lang(?name) = 'en')
}
but this query runs much more slowly.
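With the label in the result, the table can use the name directly and no longer needs to parse the URI. The sketch below assumes that each tuple then carries a name element alongside resource and callingCode; the tuples are inlined, illustrative values rather than live DBpedia results.

```xquery
xquery version "1.0";

(: Sample tuples in the shape the script above appears to rely on,
   plus a name element for the ?name variable; values are illustrative. :)
let $tuples := (
  <tuple>
    <resource>http://dbpedia.org/resource/Germany</resource>
    <callingCode>49</callingCode>
    <name>Germany</name>
  </tuple>,
  <tuple>
    <resource>http://dbpedia.org/resource/France</resource>
    <callingCode>33</callingCode>
    <name>France</name>
  </tuple>
)
return
  <table border="1">{
    for $country in $tuples
    let $name := string($country/name)
    order by $name
    return
      <tr>
        <td><a href="{$country/resource}">{$name}</a></td>
        <td>{string($country/callingCode)}</td>
      </tr>
  }</table>
```

The rows come out sorted by label, so France precedes Germany.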
Discussion
This query returns a set of DBpedia resources which have a callingCode property. However, it includes resources that are not countries, and it proves quite difficult to identify which resources are countries. It might be expected that either the skos:subject or the rdf:type predicate would identify countries, but this is not the case.
Of course, which entities are classified as countries is a debatable issue, as is currently illustrated by Kosovo and by the documentation on ISO 3166. Perhaps countries are better identified by their properties. There is a property, countryCode, which looks promising.
The SPARQL query becomes:
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?resource p:callingCode ?callingCode.
?resource p:countryCode ?countryCode.
}
However, this shows that many countries have incomplete data in DBpedia, or that this property is coded inconsistently. This is not surprising, because there are several kinds of country code, each implying a different definition of country:
- ISO 3166-1 alpha-3
- ISO 3166-1 alpha-2
- ISO 3166-1 numeric
- IOC country codes
- License plate numbers
- Top-level domain codes
Wikipedia scraping
In fact, international calling codes are listed in a Wikipedia entry. Thus a more direct approach would be to generate the table by scraping Wikipedia directly. However, now we err in the opposite direction: there are calling codes for telecom services as well as for countries, and the format of numbers and names is inconsistent. Some entries have multiple numbers, some numbers have a leading +, some countries have appended synonyms, and so on.
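Some of these inconsistencies could be smoothed out after scraping. The following is a minimal sketch under stated assumptions: both helper functions are hypothetical, the separators (comma or slash) and the parenthesised-synonym convention are guesses about the page's formatting, and the sample strings are illustrative.

```xquery
xquery version "1.0";

(: Hypothetical helper: split a cell holding several codes and keep only the digits. :)
declare function local:codes($raw as xs:string) as xs:string* {
  for $part in tokenize($raw, "[,/]")
  let $digits := replace($part, "[^0-9]", "")
  where $digits ne ""
  return $digits
};

(: Hypothetical helper: drop a trailing parenthesised synonym and tidy whitespace. :)
declare function local:country-name($raw as xs:string) as xs:string {
  normalize-space(replace($raw, "\(.*\)\s*$", ""))
};

<examples>
  <codes>{ string-join(local:codes("+1 787, +1 939"), " ") }</codes>
  <name>{ local:country-name("East Timor (Timor-Leste)") }</name>
</examples>
```

Here the codes element contains 1787 1939 and the name element contains East Timor. Whether stripping the synonym is desirable depends on how the names are to be matched against DBpedia resources.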
In this script, the path expression selects the second table with the class "wikitable sortable"; an earlier version found the anchor "Alphabetical_Listing" and then took the following table.
declare namespace h = "http://www.w3.org/1999/xhtml";

let $url := "http://en.wikipedia.org/wiki/International_calling_codes"
let $wikipage := doc($url)
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
return
  $section
Jan 2010: the page layout had changed, so the previous path to this table:
let $section := $wikipage//h:a[@name="Alphabetical_Listing"]/../following-sibling::h:table[1]
was replaced by the current one:
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
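One subtlety of the positional predicate is worth checking when such a path is revised. In XPath, //h:table[...][2] selects every matching table that is the second match among the children of its own parent, whereas (//h:table[...])[2] selects the second match in document order. The sketch below shows the difference on a small inline fragment; the fragment and its id attributes are invented for illustration and do not reflect the actual Wikipedia markup.

```xquery
xquery version "1.0";
declare namespace h = "http://www.w3.org/1999/xhtml";

(: Invented fragment: three matching tables spread over two parent elements. :)
let $page :=
  <h:html>
    <h:body>
      <h:div><h:table class="wikitable sortable" id="t1"/></h:div>
      <h:div>
        <h:table class="wikitable sortable" id="t2"/>
        <h:table class="wikitable sortable" id="t3"/>
      </h:div>
    </h:body>
  </h:html>
return
  <result>
    <per-parent>{
      string-join($page//h:table[@class="wikitable sortable"][2]/@id, " ")
    }</per-parent>
    <document-order>{
      string(($page//h:table[@class="wikitable sortable"])[2]/@id)
    }</document-order>
  </result>
```

On this fragment the per-parent form yields t3 and the document-order form yields t2. When all the matching tables share one parent, as may have been the case on the page, the two forms agree.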
Export as RDF
An alternative is to export this table as RDF. Here the resource is the DBpedia resource and the property is defined in the DBpedia property namespace.
declare namespace h = "http://www.w3.org/1999/xhtml";
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace p = "http://dbpedia.org/property/";

let $url := "http://en.wikipedia.org/wiki/International_calling_codes"
let $wikipage := doc($url)
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
return
  <rdf:RDF xmlns:p = "http://dbpedia.org/property/">
  { (: one rdf:Description per data row; header rows have no h:td :)
    for $row in $section/h:tr[h:td]
    let $country := string($row/h:td[1])
    let $code := string($row/h:td[2]/h:a[1])
    (: strip footnote asterisks from the code :)
    let $code := replace($code,"\*","")
    let $resource := concat("http://dbpedia.org/resource/", replace($country," ","_"))
    return
      <rdf:Description rdf:about="{$resource}">
        <p:internationalcallingCode>{$code}</p:internationalcallingCode>
      </rdf:Description>
  }
  </rdf:RDF>
As with the scraping script, the structure of this table later changed, so this code needed to be updated.
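Because positional paths broke when the layout changed, one option that may prove less brittle is to select the table by the text of its header cells rather than by its position. This is a sketch only: the header labels "Country" and "Code" are assumptions, and the inline fragment stands in for the real page, whose markup is not reproduced here.

```xquery
xquery version "1.0";
declare namespace h = "http://www.w3.org/1999/xhtml";

(: Inline stand-in for the Wikipedia page; the header labels are assumptions. :)
let $wikipage :=
  <h:html>
    <h:body>
      <h:table class="wikitable sortable">
        <h:tr><h:th>Zone</h:th><h:th>Region</h:th></h:tr>
        <h:tr><h:td>3</h:td><h:td>Europe</h:td></h:tr>
      </h:table>
      <h:table class="wikitable sortable">
        <h:tr><h:th>Country</h:th><h:th>Code</h:th></h:tr>
        <h:tr><h:td>France</h:td><h:td><h:a href="#">33</h:a></h:td></h:tr>
      </h:table>
    </h:body>
  </h:html>
(: pick the first table whose header row mentions both labels, wherever it sits :)
let $section :=
  ($wikipage//h:table[h:tr/h:th[normalize-space(.) = "Country"]]
                     [h:tr/h:th[normalize-space(.) = "Code"]])[1]
return
  for $row in $section/h:tr[h:td]
  return
    <entry country="{string($row/h:td[1])}" code="{string($row/h:td[2]/h:a[1])}"/>
```

On the fragment above this returns one entry element for France with code 33. A header-based path still breaks if the labels are renamed, so it trades one kind of fragility for another.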