XQuery/World Temperature records
Introduction
[edit | edit source]The Met office recently released the temperature records for about 1600 stations world-wide. Each station record is available online as a text file, for example Stornoway.
This case study describes a project to make this data available as XML. The home page is http://www.cems.uwe.ac.uk/xmlwiki/Climate/index.html
Parsing temperature record to XML
[edit | edit source]The first task is to convert the plain text to XML. The main page explains the format of this text file. The code 030260 is the station code defined by the World Meteorological Organisation. It appears that the files are stored in country code directories. (actually these are Blocks in WMO parlance)
Remote data files
[edit | edit source]The task of using HTTP to GET a remote data file is a common task for which functions already exist in an XQuery module.
This module declares a constant used in the parsing:
declare variable $csv:newline:= " ";
And the basic function to get text, which may be plain text or base64encoded:
declare function csv:get-data ($uri as xs:string , $binary as xs:boolean) as xs:string? { (:~ : Get a file via HTTP and convert the body of the HTTP response to text : force the script to get the latest version using the HTTP Pragma header : @param uri - URI of the text file to read : @param binary - true if data is base64 encoded : @return - the body of the response as text or null :) let $headers := element headers { element header {attribute name {"Pragma" }, attribute value {"no-cache"}}} let $response := httpclient:get(xs:anyURI($uri), true(), $headers) return if ($response/@statusCode eq "200") then let $raw := $response/httpclient:body return if ($binary) then util:binary-to-string($raw) else xmldb:decode($raw) else () };
Parsing Function
[edit | edit source]We will create an XQuery module containing functions to carry out the parsing:
module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met";
The csv module needs to be imported:
import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";
Now the function to parse the MET climate data:
declare function met:station-to-xml ($station as xs:string) as element(TemperatureRecord)? { (:~ : GET and parse a MET office temperature record as documented in : http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html : @param the station number : @return the temperature record as an adhoc XML structure matched closely to the terms used in the original record :) let $country := substring($station,1,2) (: this is the directory for all temperature records in a country :) (: construct the URI for the corresponding record :) let $uri := concat("http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/",$country,"/",$station) (:GET and convert to plain text :) let $data := csv:get-data($uri,false()) return if (empty($data)) then () else (: split into two sections :) let $headertext := substring-before($data,"Obs:") (: the first section contains the meta data in the form of name=value statements :) let $headers := tokenize($headertext,$csv:nl) (: the second section is the temperature record, year by year :) let $temperatures := substring-after ($data,"Obs:") let $years := tokenize($temperatures, $csv:nl) return element TemperatureRecord { element sourceURI {$uri}, (: the original temperature record :) for $header in $headers (: split each line into a name and its value :) let $name := replace(substring-before($header,"=")," ","") (: to create a valid XML name, just remove any spaces :) let $value := normalize-space(substring-after ($header,"=")) where $name ne "" return element {$name} { (:create an XML element with the name :) if ($name = ("Normals","Standarddeviations")) (: these names have values which are a list of temperatures :) then for $temp in tokenize($value,"\s+") (: temperatures are space-separated :) return element temp_C {$temp} else if ($name = ("Name","Country")) (: these names contain redundant hyphens :) then replace ($value,"-","") else if ($name = "Long") (: the convention for signing longitudes in this data is the reverse of the usual E +, W - convention :) then - xs:decimal($value) else $value }, for $year in $years let $value := tokenize($year,"\s+") where $year ne "" return element monthlyAverages { attribute year {$value[1]}, (: the first value in the row is the year :) for $i in (2 to 13) (: the remainder are the temperatures for the months Jan to Dec :) let $temp := $value[$i] return element temp_C { if ($temp ne '-99.0') (: generate all months, but those with no reading indicated by -99 will be empty :) then $temp else () } } } };
Main Script
[edit | edit source]The main script uses these functions to convert a given station's record:
(:~ : convert climate file to XML : @param station id of station :) import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm"; let $station := request:get-parameter("station",()) return local:station-to-xml($station,false())
WMO stations
[edit | edit source]The station ids are based on those defined by the World Meteorological Organisation. There is a full list of all stations available online as a text file with supporting documentation.
A typical record is
00;000;PABL;Buckland, Buckland Airport;AK;United States;4;65-58-56N;161-09-07W;;;7;;
The format of these record is
- Block Number 2 digits representing the WMO-assigned block.
- Station Number 3 digits representing the WMO-assigned station.
- ICAO Location Indicator 4 alphanumeric characters, not all stations in this file have an assigned location indicator. The value "----" is used for stations that do not have an assigned location indicator.
- Place Name Common name of station location.
- State 2 character abbreviation (included for stations located in the United States only).
- Country Name Country name is ISO short English form.
- WMO Region digits 1 through 6 representing the corresponding WMO region, 7 stands for the WMO Antarctic region.
- Station Latitude DD-MM-SSH where DD is degrees, MM is minutes, SS is seconds and H is N for northern hemisphere or S for southern hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
- Station Longitude DDD-MM-SSH where DDD is degrees, MM is minutes, SS is seconds and H is E for eastern hemisphere or W for western hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
- Upper Air Latitude DD-MM-SSH where DD is degrees, MM is minutes, SS is seconds and H is N for northern hemisphere or S for southern hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
- Upper Air Longitude DDD-MM-SSH where DDD is degrees, MM is minutes, SS is seconds and H is E for eastern hemisphere or W for western hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
- Station Elevation (Ha) The station elevation in meters. Value is omitted if unknown.
- Upper Air Elevation (Hp) The upper air elevation in meters. Value is omitted if unknown.
- RBSN indicator P if station is defined by the WMO as belonging to the Regional Basic Synoptic Network, omitted otherwise.
Conversion to XML
[edit | edit source]A function is needed to convert from the DD-MM-SSH format of latitudes and longitudes. This is complicated by the variations in this format. These variations all appear in the data:
- DD-MMH
- DD-MH
- DD-MM-SH
- DD-MM-SSH
Because this format occurs in other data, it has been added to a general module of geographic functions.
declare function geo:lz ($n as xs:string?) as xs:integer { xs:integer(concat (string-pad("0",2 - string-length($n)),$n)) }; declare function geo:dms-to-decimal($s as xs:string) as xs:decimal { (:~ : @param $s - input string in the format of DD-MMH, DD-MH, DD-MM-SH,* DD-MM-SSH : where H is NSE or W : @return decimal degrees :) let $hemi := substring($s,string-length($s),1) let $rest := substring($s,1, string-length($s)-1) let $f := tokenize($rest,"-") let $deg := geo:lz($f[1]) let $min:= geo:lz($f[2]) let $sec := geo:lz($f[3]) let $dec :=$deg + ($min + $sec div 60) div 60 let $dec := round-half-to-even($dec,6) return if ($hemi = ("S","W")) then - $dec else $dec };
The geo module has to be imported:
import module namespace geo = "http://www.cems.uwe.ac.uk/xmlwiki/geo" at "../lib/geo.xqm";
Parsing the station data.
declare function met:WMO-to-xml ($station as xs:string ) as element (station) { (:~ : @param $station string describing a station : Upper Air data is ignored at present. :) let $f := tokenize(normalize-space($station),";") let $cid := concat($f[1],$f[2],"0") (: this constructs the equivalent id used in the temperature records :) return element station{ element block {$f[1]}, element number {$f[2]}, element id {$cid}, if ($f[3] eq "----") then () else element ICAO {$f[3]}, element placeName {$f[4]}, if ($f[5] ne "") then element state {$f[5]} else (), element country {$f[6]}, element WMORegion {$f[7]}, element latitude {geo:dms-to-dec($f[8])}, element longitude {geo:dms-to-dec($f[9])}, if ($f[12] ne "") then element elevation {$f[12]} else (), if ($f[14] = "P") then element RBSN {} else () } };
Generating the WMO XML file
[edit | edit source]The XQuery script GETs the text file and converts each line to an XML station element. The elements are then inserted into an empty XML file one by one.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm"; import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../csv.xqm"; <results> { (: create the empty XML document :) let $login := xmldb:login("/db/Wiki/Climate","user","password") let $store := xmldb:store("/db/Wiki/Climate/Stations","metstations.xml",<stations/>) let $doc := doc($store)/stations (: get the text list of stations and convert :) let $station-list := "http://weather.noaa.gov/data/nsd_bbsss.txt" let $csv := csv:get-data($station-list,false()) for $data in tokenize($csv,$nl) where $station ne "" return let $station := met:WMO-station-to-xml($data) let $update := update insert $station into $doc return <station>{$xml/id}</station> } </results>
Indexing
[edit | edit source]There are 11000 odd stations in total. These need to be indexed for efficient access. In eXist indexes are defined in a configuration file, one per collection (directory). For the collection in which the station XML document is to be written, the configuration file is:
<collection xmlns="http://exist-db.org/collection-config/1.0"> <index> <create qname="id" type="xs:string"/> <create qname="country" type="xs:string"/> </index> </collection>
This means that all XML documents in the collection will be indexed on the qnames id and country wherever these appear in the XML structure. Indexing will be performed when a document is added to the collection or an existing document is updated. A re-index can be forced if required.
If the station data is stored in the collection /db/Wiki/Climate/Stations, this configuration file will be stored in /db/system/config/db/Wiki/Climate/Stations as configuration.xconf
WMO Station set binding
[edit | edit source]Since the code will reference this collection in a number of places, we add a constant to reference the set of stations to the library module:
declare variable $met:WMOStations := doc ("/db/Wiki/Climate/Stations/metstations.xml")//station;
Temperature Station list
[edit | edit source]A full listing of stations is needed to provide an index. This data is not provided as a simple file, but they are encoded on the HTML page as a JavaScript array.
locations[1]=["409380|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Herat"",409480|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Kabul Airport","409900|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Kandahar Airport"]; ...
However there is no location data here, so we will get that from the WMO station list:
The approach taken to converting this to XML was:
- View source on the HTML page
- Locate the station list
- Copy the text
- Save as a text file in the eXIst data base
- A script reads this file and parses it to XML
- The resultant XML is augmented with latitude and longitude from the WMO station data.
- The final XML document is stored in the database in the same Station directory
(:~ : convert the text representation of MET stations from the WMO list to XML :) <stationList> { (: get the raw data from a text file stored as base64 in the eXist dataabse :) let $text := util:binary-to-string(util:binary-doc("/db/Wiki/Climate/cstations.txt")) (: ; separates the stations in each country :) for $country in tokenize($text,";") (: the station list is the array element content i.e. the string between =[ and ] :) let $stationlist := substring-before(substring-after($country,"=["),"]") (: The stations in each country are comma-separated, but commas are also used within the names of countries and stations. However a comma followed by a double quote is the required separator. :) let $stations := tokenize($stationlist,',"') for $station in $stations (: some cleanup of names is needed :) let $data :=replace ( replace($station,'"',"")," ","") (: Each station is in the format of Stationid | English name / French name :) let $f := tokenize($data,"\|") let $id := $f[1] let $country := tokenize($f[2],"/") let $WMOStation := $met:WMOStations[id=$id] (: create a station element containing the id , country and english station name :) return element station { element id {$f[1]}, element country {normalize-space($country[1])}, element location {$f[3]}, $WMOStation/latitude, $WMOStation/longitude } } </stationList>
Storing this file in the same Stations collection means that it will be indexed on the same element names, id and country,as the full WMO station data.
Climate station set binding
[edit | edit source]This set of stations will also be referenced in several places so we define a variable:
declare variable $met:tempStations := doc ("/db/Wiki/Climate/Stations/tempstations.xml")//station;
Visualizing the data
[edit | edit source]We will use XSLT to transform this XML to a presentation of the location of the station and charts of the temperatures. The initial stylesheet was developed by Dave Challender.
( explanation to be added )
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" exclude-result-prefixes="msxsl">
<!-- Authored by Dave Callender, minor mods by Chris Wallace -->
<xsl:output method="html"/>
<xsl:param name="start-year" select="1000"/>
<xsl:param name="end-year" select="3000"/>
<xsl:template match="Station">
<html>
<head>
<script type="text/javascript" src="http://www.google.com/jsapi"/>
<title>
<xsl:value-of select="station/placeName"/>
<xsl:text> </xsl:text>
<xsl:value-of select="station/country"/>
</title>
</head>
<body>
<xsl:apply-templates select="station"/>
<xsl:apply-templates select="TemperatureRecord" mode="googlevis"/>
<xsl:apply-templates select="TemperatureRecord" mode="table"/>
<xsl:apply-templates select="TemperatureRecord" mode="smoothed"/>
</body>
</html>
</xsl:template>
<!-- Visualization of the full temperature record -->
<xsl:template match="TemperatureRecord" mode="googlevis">
<p/>
<p>Google visualization timeline (takes no account of standard deviation etc.)</p>
<div id="chart_div" style="width: 700px; height: 440px;"/>
<p/>
<script type="text/javascript">
google.load('visualization', '1', {'packages':['annotatedtimeline']});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = new google.visualization.DataTable();
data.addColumn('date', 'Date');
data.addColumn('number', 'temp');
data.addRows([
<xsl:apply-templates select="monthlyAverages[@year][@year >= $start-year][@year <= $end-year]" mode="googlevis"/>
[null,null]
]);
var chart = new google.visualization.AnnotatedTimeLine(document.getElementById('chart_div'));
chart.draw(data, {displayAnnotations: true});
}
</script>
</xsl:template>
<xsl:template match="temp_C" mode="googlevis">
<xsl:if test="(node())">
<xsl:text>[new Date(</xsl:text>
<xsl:value-of select="../@year"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="position() - 1 "/>
<!-- Google viz uses 0-based arrays -->
<xsl:text>,15),</xsl:text>
<xsl:value-of select="."/>
<xsl:text>],
</xsl:text>
</xsl:if>
</xsl:template>
<!-- Vizualisation of the smoothed data -->
<xsl:template match="TemperatureRecord" mode="smoothed">
<p/>
<p>Almost totally meaningless - sum all temps for a year and divide by 12 (only do if all 12
data points) but shows a bit of playing with data</p>
<p/>
<div id="smoothed_chart_div" style="width: 700px; height: 440px;"/>
<script type="text/javascript">
google.load('visualization', '1', {'packages':['annotatedtimeline']});
google.setOnLoadCallback(drawChartSmoothed);
function drawChartSmoothed()
{
var data = new google.visualization.DataTable();
data.addColumn('date', 'Date');
data.addColumn('number', 'temp');
data.addRows([
<xsl:apply-templates select="monthlyAverages[@year][@year >= $start-year][@year <=$end-year]" mode="smoothed"/>
[null,null]
]);
var chart = new google.visualization.AnnotatedTimeLine(document.getElementById('smoothed_chart_div'));
chart.draw(data, {displayAnnotations: true});
}
</script>
</xsl:template>
<xsl:template match="monthlyAverages" mode="smoothed">
<xsl:if test="count(temp_C[node()])=12">
<xsl:text>[new Date(</xsl:text>
<xsl:value-of select="@year"/>
<xsl:text>,5,15),</xsl:text>
<xsl:value-of select="sum(temp_C[node()]) div 12"/>
<xsl:text>],
</xsl:text>
</xsl:if>
</xsl:template>
<!-- Data tabulated -->
<xsl:template match="TemperatureRecord" mode="table">
<table border="1">
<tr>
<td>Year</td>
<td>Jan</td>
<td>Feb</td>
<td>Mar</td>
<td>Apr</td>
<td>May</td>
<td>Jun</td>
<td>Jul</td>
<td>Aug</td>
<td>Sep</td>
<td>Oct</td>
<td>Nov</td>
<td>Dec</td>
<tr/>
</tr>
<xsl:apply-templates
select="monthlyAverages[@year][@year >= $start-year][@year < $end-year]"
mode="table"/>
</table>
</xsl:template>
<xsl:template match="monthlyAverages" mode="table">
<tr>
<td>
<xsl:value-of select="@year"/>
</td>
<xsl:apply-templates select="temp_C" mode="table"/>
</tr>
</xsl:template>
<xsl:template match="temp_C" mode="table">
<td>
<xsl:value-of select="."/>
</td>
</xsl:template>
<xsl:template match="Number">
<p> Station Number:  <xsl:value-of select="."/>
</p>
</xsl:template>
<xsl:template match="station">
<h1>
<xsl:value-of select="placeName"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="country"/>
<xsl:text> </xsl:text>
</h1>
<a href="http://maps.google.com/maps?q={latitude},{longitude}">
<img
src="http://maps.google.com/maps/api/staticmap?zoom=11&maptype=hybrid&size=400x300&sensor=false&key=ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ&markers=color:blue|{latitude},{longitude}"
alt="{placeName}"/>
</a>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Multiple formats
[edit | edit source]We would like to present either the original XML or the HTML visualisation page. We could use two scripts, or combine them into one script with a parameter to indicate how the output is to be rendered. eXist functions allow the serialization of the output and the mime-type to be set dynamically.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm"; let $id := request:get-parameter("station",()) let $render := request:get-parameter("render",()) let $station := doc ("/db/Wiki/Climate/Stations/metstations.xml")//station[id = $id] let $tempStation := doc("/db/Wiki/Climate/Stations/tempstations.xml")//station[id = $id] let $temp := if ($tempStaion) then met:station-to-xml($id) else () let $station := <Station> {$station} {$temp} </Station> return if ($render="HTML") then let $ss := doc("/db/Wiki/Climate/FullHTMLMet-V2.xsl") let $options := util:declare-option("exist:serialize","method=xhtml media-type=text/html") let $start-year := request:get-parameter("start","1000") let $end-year := request:get-parameter("end","2100") let $params := <parameters> <param name="start-year" value="{$start-year}"/> <param name="end-year" value="{$end-year}"/> </parameters> return transform:transform($station,$ss,$params) else let $header := response:set-header("Access-Control-Allow-Origin","*") return $station
Simple HTML index
[edit | edit source]We can use the stored station list to create a simple HTML index.
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm"; declare option exist:serialize "method=xhtml media-type=text/html"; <html> <head> <title>Index of Temperature Record Stations </title> </head> <body> <h1>Index of Temperature Record Stations </h1> { for $country in distinct-values($met:tempStations/country) order by $country return <div> <h3>{$country} </h3> {for $station in $met:tempStations[country=$country] let $id := $station/id order by $station/location return <span><a href="station.xq?station={$id}&render=HTML">{string($station/location)}</a> </span> } </div> } </body> </html>
Station Map
[edit | edit source]We can also generate a (large) KML overlay, with links to each station's page.
We need a function transform a station into a PlaceMark with a link to the HTML station page:
declare function met:station-to-placemark ($station) { let $description := <div> <a href="http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station={$station/id}&render=HTML">Temperature Record</a> </div> return <Placemark> <name>{string($station/location)}, {string($station/country)}</name> <description>{util:serialize($description,"method=xhtml")} </description> <Point> <coordinates>{string($station/longitude)},{string($station/latitude)},0</coordinates> </Point> </Placemark> };
Then the main script iterates over all the temperature stations to generate the full KML file.
import module namespace met ="http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm"; declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes omit-xml-declaration=yes"; let $x := response:set-header('Content-Disposition','attachment;filename=country.kml') return <kml xmlns="http://www.opengis.net/kml/2.2"> <Folder> <name>Stations</name> { for $station in $met:tempStations return met:station-to-placemark($station) } </Folder> </kml>
Work in progress
[edit | edit source]- Resource URIs
- RDF