XQuery/UK shipping forecast


Motivation

The UK shipping forecast is prepared by the UK Met Office four times a day and published on the radio, on the Met Office web site and (formerly) on the BBC web site. However, it is not available in a computer-readable form.

Tim Duckett recently blogged about creating a Twitter stream. He uses Ruby to parse the text forecast. The textual form of the forecast is included on both the Met Office and BBC sites. However, as Tim points out, the format is designed for speech, compresses similar areas to reduce the time slot, and is hard to parse. The approach taken here is to scrape a JavaScript file containing the raw area forecast data.

Implementation

Dependencies

eXist-db Modules

The following scripts use these eXist modules:

  • request - to get HTTP request parameters
  • httpclient - to GET and POST
  • scheduler - to schedule scraping tasks
  • datetime - to format dateTimes
  • util - base64 conversions
  • xmldb - for database access

Other

  • UK Met Office web site

Met Office page

[This approach is no longer viable since the JavaScript file retrieved is no longer being updated]

The Met Office page shows an area-by-area forecast, but this part of the page is generated by JavaScript from data in a generated JavaScript file. In this file, the data is assigned to multiple arrays. A typical section looks like this:

// Bailey
gale_in_force[28] = "0";
gale[28] = "0";
galeIssueTime[28] = "";
shipIssueTime[28] = "1725 Sun 06 Jul";
wind[28] = "Northeast 5 to 7.";
weather[28] = "Showers.";
visibility[28] = "Moderate or good.";
seastate[28] = "Moderate or rough.";
area[28] = "Bailey";
area_presentation[28] = "Bailey";
key[28] = "Bailey";

// Faeroes
...

Area Forecast

JavaScript conversion

The first function fetches the current JavaScript data using the eXist httpclient module and converts the base64 data to a string:

xquery version "3.0";

declare namespace httpclient = "http://exist-db.org/xquery/httpclient";
declare namespace met = "http://kitwallace.co.uk/wiki/met";
declare variable $met:javascript-file := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js";

declare function met:get-forecast() as xs:string {
  (: fetch the JavaScript source and locate the text of the body of the response :)
  let $base64 := httpclient:get(xs:anyURI($met:javascript-file), true(), ())/httpclient:body/text()
  (: the body is base64 encoded, so decode it back to text :)
  return util:binary-to-string($base64)
};

The second function picks out an area forecast from the JavaScript and parses the code to generate an XML structure using the JavaScript array names.

declare function met:extract-forecast($js as xs:string, $area as xs:string) as element(forecast)? {
  (: isolate the section for the required area, prefixed with a comment :)
  let $areajs := normalize-space(substring-before(substring-after($js, concat("// ", $area)), "//"))
  return
    if ($areajs = "")  (: area not found :)
    then ()
    else
      (: build an XML element containing an element for each data item, using the array names as the element names :)
      <forecast>
      {
        for $d in tokenize($areajs, ";")[position() < last()]  (: JavaScript statements are terminated by ";" - ignore the last, empty token :)
        let $ds := tokenize(normalize-space($d), ' *= *"')  (: separate the LHS and RHS of the assignment statement :)
        let $name := replace(substring-before($ds[1], "["), "_", "")  (: element name is the array name with underscores removed :)
        let $val := replace($ds[2], '"', '')  (: element text is the RHS minus quotes :)
        let $val := replace($val, "<.*>", "")  (: remove embedded annotation - in shipIssueTime :)
        return element {$name} {$val}
      }
      </forecast>
};

To fetch an area forecast:

let $js := met:get-forecast()
return 
    met:extract-forecast($js,"Fastnet")

For example, the output for one selected area is:

<forecast>
    <galeinforce>0</galeinforce>
    <gale>0</gale>
    <galeIssueTime/>
    <shipIssueTime>1030  Tue 28 Oct</shipIssueTime>
    <wind>Southwest veering northeast, 5 or 6.</wind>
    <weather>Rain at times.</weather>
    <visibility>Good, occasionally poor.</visibility>
    <seastate>Moderate or rough.</seastate>
    <area>Fastnet</area>
    <areapresentation>Fastnet</areapresentation>
    <key>Fastnet</key>
</forecast>
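
Because the live JavaScript file is no longer updated, the parsing step can be tried offline. The following is a minimal sketch, not the module's own code: it assumes a hypothetical inline sample ($sample) built from a few of the Bailey lines shown earlier, and a local function (local:extract) that applies the same splitting logic as met:extract-forecast.

```xquery
xquery version "3.0";

(: Hypothetical stand-in for the downloaded file: a few lines copied from the Bailey section. :)
declare variable $sample := string-join((
  '// Bailey',
  'gale_in_force[28] = "0";',
  'shipIssueTime[28] = "1725 Sun 06 Jul";',
  'wind[28] = "Northeast 5 to 7.";',
  'weather[28] = "Showers.";',
  'area[28] = "Bailey";',
  '',
  '// Faeroes'
), "&#10;");

(: Same splitting logic as met:extract-forecast, without any HTTP access. :)
declare function local:extract($js as xs:string, $area as xs:string) as element(forecast)? {
  let $areajs := normalize-space(substring-before(substring-after($js, concat("// ", $area)), "//"))
  where $areajs ne ""
  return
    <forecast>{
      for $d in tokenize($areajs, ";")[normalize-space(.) ne ""]
      let $ds := tokenize(normalize-space($d), ' *= *"')
      return element { replace(substring-before($ds[1], "["), "_", "") }
                     { replace($ds[2], '"', '') }
    }</forecast>
};

local:extract($sample, "Bailey")
```

This should return a forecast element with galeinforce, shipIssueTime, wind, weather and area children. Note that the approach relies on a following "//" comment to end the section, and that a value containing ";" would be split incorrectly.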

Format the forecast as text

The forecast data needs to be formatted into a string:

declare function met:forecast-as-text($forecast as element(forecast)) as xs:string {
      concat( $forecast/weather,
              " Wind ",  $forecast/wind,
              " Visibility ", $forecast/visibility, 
              " Sea ", $forecast/seastate
            )
}; 

let $js := met:get-forecast()
let $forecast := met:extract-forecast($js,"Fastnet")
return
    <report>{met:forecast-as-text($forecast)}</report>

which returns

<report>Rain at times. Wind Southwest veering northeast, 5 or 6. Visibility Good, occasionally poor. Sea Moderate or rough.</report>

Area Forecast

Finally these functions can be used in a script which accepts a shipping area name and returns an XML message:

let $area := request:get-parameter("area","Fastnet")
let $js := met:get-forecast()
let $forecast := met:extract-forecast($js,$area)
return
    <message area="{$area}"  dateTime="{$forecast/shipIssueTime}">
       {met:forecast-as-text($forecast)} 
    </message>

Message abbreviation

To create a message suitable for texting (160 characters) or tweeting (140-character limit), the message can be compressed by abbreviating common words.


Abbreviation dictionary

A dictionary of words and abbreviations is created and stored locally. The dictionary has been developed using some of the abbreviations in Tim Duckett's Ruby implementation.

<dictionary>
 <entry  full="west" abbrev="W"/>
 <entry  full="westerly" abbrev="Wly"/>
..
 <entry  full="variable" abbrev="vbl"/>
 <entry  full="visibility" abbrev="viz"/>
 <entry  full="occasionally" abbrev="occ"/>
 <entry  full="showers" abbrev="shwrs"/>

</dictionary>

The full dictionary

Abbreviation function

The abbreviation function breaks down the text into words, replaces words with abbreviations and builds the text up again:

declare function met:abbreviate($forecast as xs:string) as xs:string {
   string-join(
(: lowercase the string, append a space (to ensure a final . is matched) and tokenise :)
      for $word in tokenize(concat(lower-case($forecast)," "),"\.? +")
      return
(: if there is an entry for the word , use its abbreviation, otherwise use the unabbreviated word :)
        ( /dictionary/entry[@full=$word]/@abbrev,$word) [1]
        ,
      " ") (: join the words back up with space separator :)  
};
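
The function above finds the dictionary through an absolute path, which depends on the database context. As a sketch for experimenting outside eXist, the variant below assumes a hypothetical inline $dictionary variable holding three of the entries listed earlier, and a local function name (local:abbreviate); it also drops the empty token left by the trailing space.

```xquery
xquery version "3.0";

(: Hypothetical inline copy of a few dictionary entries from the text. :)
declare variable $dictionary :=
  <dictionary>
    <entry full="visibility" abbrev="viz"/>
    <entry full="occasionally" abbrev="occ"/>
    <entry full="showers" abbrev="shwrs"/>
  </dictionary>;

declare function local:abbreviate($text as xs:string) as xs:string {
  string-join(
    (: lowercase, append a space so a final "." is matched, tokenise, drop empty tokens :)
    for $word in tokenize(concat(lower-case($text), " "), "\.? +")[. ne ""]
    (: use the abbreviation if the word has an entry, otherwise the word itself :)
    return ($dictionary/entry[@full = $word]/@abbrev/string(), $word)[1],
    " ")
};

local:abbreviate("Showers. Visibility good, occasionally poor.")
```

With this input the result is "shwrs viz good, occ poor". A word followed by a comma, such as "good," here, keeps its comma and so would not match a dictionary entry.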

Abbreviated Message

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $area := request:get-parameter("area","Lundy")
let $forecast := met:extract-forecast(met:get-forecast(), $area)
return
   <message area="{$area}"  dateTime="{$forecast/shipIssueTime}">
       {met:abbreviate(met:forecast-as-text($forecast))} 
   </message>

All Areas forecast

This function extends the area forecast to all areas. The parser splits the script on the comment separator, ignores the first and last sections, and discards the area name in each comment:

declare function met:get-forecast() as element(forecast)* {
  let $jsuri := "http://www.metoffice.gov.uk/lib/includes/marine/gale_and_shipping_table.js"
  let $base64:= httpclient:get(xs:anyURI($jsuri),true(),())/httpclient:body/text()
  let $js :=  util:binary-to-string($base64)
  for $js in tokenize($js,"// ")[position() > 1] [position()< last()]
  let $areajs := concat("gale",substring-after($js,"gale"))
  return      
<forecast>
{
for $d in tokenize($areajs,";")[position() < last()]
   let $ds := tokenize(normalize-space($d)," *= *")
   return 
     element {replace(substring-before($ds[1],"["),"_","")}
                     {replace($ds[2],'"','')}
}
</forecast>
};


XML version of forecast

This script returns the full Shipping forecast in XML:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

<ShippingForecast>
    {met:get-forecast()}
</ShippingForecast>

RSS version of forecast

XSLT would be suitable for transforming this XML to RSS format ...
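
The same transformation can also be sketched directly in XQuery. The example below is illustrative only: it assumes a hypothetical inline $shippingForecast sample (the Fastnet values shown earlier), an invented channel title, and the Met Office home page as the channel link. It produces one RSS 2.0 item per area.

```xquery
xquery version "3.0";

(: Hypothetical sample using the Fastnet values from the earlier output. :)
declare variable $shippingForecast :=
  <ShippingForecast>
    <forecast>
      <shipIssueTime>1030  Tue 28 Oct</shipIssueTime>
      <wind>Southwest veering northeast, 5 or 6.</wind>
      <weather>Rain at times.</weather>
      <visibility>Good, occasionally poor.</visibility>
      <seastate>Moderate or rough.</seastate>
      <area>Fastnet</area>
    </forecast>
  </ShippingForecast>;

<rss version="2.0">
  <channel>
    <title>UK Shipping Forecast</title>
    <link>http://www.metoffice.gov.uk/</link>
    <description>Area forecasts issued {normalize-space(($shippingForecast/forecast/shipIssueTime)[1])}</description>
    {
      (: one item per sea area, with the same text layout as met:forecast-as-text :)
      for $f in $shippingForecast/forecast
      return
        <item>
          <title>{string($f/area)}</title>
          <description>{
            concat($f/weather, " Wind ", $f/wind,
                   " Visibility ", $f/visibility, " Sea ", $f/seastate)
          }</description>
        </item>
    }
  </channel>
</rss>
```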

SMS service

One possible use of this data would be to provide an SMS on-request service, taking an area name and returning the abbreviated forecast. The complete set of forecasts is created, and the forecast for the area supplied in the message is selected and returned in abbreviated form.

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $area := lower-case(request:get-parameter("text",()))
let $forecast := met:get-forecast()[lower-case(area) = $area]
return
   if (exists($forecast))
   then 
      concat("Reply: ", met:abbreviate(met:forecast-as-text($forecast)))
    else 
      concat("Reply: Area ",$area," not recognised")

The calling protocol is determined here by the SMS service installed at UWE and described here

Caching

Fetching the JavaScript on demand is neither efficient nor acceptable net behaviour, and since the forecast times are known, it is preferable to fetch the data on a schedule, convert to the XML form and save in the eXist database and then use the cached XML for later requests.

Store XML forecast

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
declare variable $col := "/db/Wiki/Met/Forecast";

if (xmldb:login($col, "user", "password"))  (: a user who has write access to the Forecast collection :)
then 
       let $forecast := met:get-forecast()
       let $forecastDateTime := met:timestamp-to-xs-date(($forecast/shipIssueTime)[1])  (: convert to xs:dateTime :)
       let $store :=  xmldb:store( 
              $col,                    (: collection to store forecast in :)
              "shippingForecast.xml",  (: file name - overwrite is OK here as we only want the latest :)
                                       (: then the constructed XML to be stored :)
              <ShippingForecast  at="{$forecastDateTime}" >
                   {$forecast}
              </ShippingForecast>
              ) 
       return
           <result>
               Shipping forecast for {string($forecastDateTime)} stored in  {$store}
          </result>
else ()

The timestamp used on the source data is converted to an xs:dateTime for ease of later processing.

declare function met:timestamp-to-xs-date($dt as xs:string) as xs:dateTime {
(: convert timestamps in the form 0505 Tue 08 Jul to xs:dateTime :)
   let $year := year-from-date(current-date())  (: assume the current year since none provided :)
   let $dtp := tokenize(normalize-space($dt)," ")  (: collapse repeated spaces first, as in 1030  Tue 28 Oct :)
   let $mon := index-of(("Jan","Feb", "Mar","Apr","May", "Jun","Jul","Aug","Sep","Oct","Nov","Dec"),$dtp[4])
   let $monno := if($mon < 10) then concat("0",$mon) else $mon
   return xs:dateTime(concat($year,"-",$monno,"-",$dtp[3],"T",substring($dtp[1],1,2),":",substring($dtp[1],3,2),":00"))
};

Reducing the forecast data

The raw data contains redundant elements (several versions of the area name) and elements which are normally empty (all gale related elements when no gale warning) but lacks a case-normalised area name as a key. The following function performs this restructuring:

declare function met:reduce($forecast as element(forecast)) as element(forecast) {
  <forecast>
    { attribute area {lower-case($forecast/area)} }
    { $forecast/*
        [not(name(.) = ("shipIssueTime","area","key"))]
        [ if (../galeinforce = "0")
          then not(name(.) = ("galeinforce","gale","galeIssueTime"))
          else true()
        ]
    }
  </forecast>
};

There would be a case to make for using XSLT for this transformation. The caching script applies this transformation to the forecast before saving.
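
As a rough illustration of that step, the sketch below applies the reduction to each forecast before wrapping the results for storage. It is self-contained for experimentation: local:reduce is a local copy of the logic above, the sample forecast reuses the Fastnet values, and the at attribute and the xmldb:store call are left out.

```xquery
xquery version "3.0";

(: Local copy of the reduction logic so the sketch runs on its own. :)
declare function local:reduce($forecast as element(forecast)) as element(forecast) {
  <forecast area="{lower-case($forecast/area)}">{
    $forecast/*[not(name(.) = ("shipIssueTime", "area", "key"))]
               [not(../galeinforce = "0") or not(name(.) = ("galeinforce", "gale", "galeIssueTime"))]
  }</forecast>
};

(: Sample input: the Fastnet forecast shown earlier. :)
let $forecast :=
  <forecast>
    <galeinforce>0</galeinforce>
    <gale>0</gale>
    <galeIssueTime/>
    <shipIssueTime>1030  Tue 28 Oct</shipIssueTime>
    <wind>Southwest veering northeast, 5 or 6.</wind>
    <weather>Rain at times.</weather>
    <visibility>Good, occasionally poor.</visibility>
    <seastate>Moderate or rough.</seastate>
    <area>Fastnet</area>
    <areapresentation>Fastnet</areapresentation>
    <key>Fastnet</key>
  </forecast>
return
  (: the element that the caching script would store :)
  <ShippingForecast>{
    for $f in $forecast return local:reduce($f)
  }</ShippingForecast>
```

The stored forecast then carries area="fastnet" as an attribute and keeps only the wind, weather, visibility, seastate and areapresentation elements, since no gale is in force.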

SMS via cache

The revised SMS script can now access the cache. First a function to get the stored forecast:

declare function met:get-stored-forecast($area as xs:string) as element(forecast)? {
  doc("/db/Wiki/Met/Forecast/shippingForecast.xml")/ShippingForecast/forecast[@area = $area]
};

and then the revised script:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $area := lower-case(normalize-space(request:get-parameter("text",())))
let $forecast := met:get-stored-forecast($area)
return
   if (exists($forecast))
   then 
      concat("Reply: ", datetime:format-dateTime($forecast/../@at,"HH:mm")," ",met:abbreviate(met:forecast-as-text($forecast)))
    else 
      concat("Reply: Area ",$area," not recognised")

In this script, the selected forecast for the input area extracted by the met function call is a reference to the database element, not a copy. Thus it is still possible to navigate back to the parent element containing the timestamp.

The eXist datetime functions are wrappers for the Java class java.text.SimpleDateFormat which defines the date formatting syntax.
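
Where the processor supports XQuery 3.0, the standard fn:format-dateTime function offers a similar result with its own picture-string syntax rather than the SimpleDateFormat syntax. A minimal sketch, assuming an arbitrary example timestamp (the year is invented for illustration):

```xquery
xquery version "3.0";

(: Arbitrary example value; the year is an assumption made for illustration. :)
let $at := xs:dateTime("2008-10-28T10:30:00")
(: [H01] is the two-digit 24-hour clock hour, [m01] the two-digit minute :)
return format-dateTime($at, "[H01]:[m01]")
```

This returns "10:30", the equivalent of the "HH:mm" pattern used above.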

Job scheduling

eXist includes a scheduler module which is a wrapper for the Quartz scheduler. Jobs can only be created by a DBA user.

For example, to set a job to fetch the shipping forecast on the hour,

let $login := xmldb:login( "/db", "admin", "admin password" ) 
let $job := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq" , "0 0 * * * ?")
return $job

where "0 0 * * * ?" means to run at 0 seconds, 0 minutes past every hour of every day of every month, ignoring the day of the week.

To check on the set of scheduled jobs, including system scheduled jobs:

let $login := xmldb:login( "/db", "admin", "admin password" ) 
return scheduler:get-scheduled-jobs()

It would be better to schedule jobs on the basis of the update schedule for the forecast. These times are 0015, 0505, 1130 and 1725. These times cannot be fitted into a single cron pattern, so multiple jobs are required. Because jobs are identified by their path, the same URL cannot be used for all instances, so a dummy parameter is added.

Discussion: The scheduled times are one minute later than the published times. This may not be enough slack to account for timing discrepancies on either side. Clearly a push from the UK Met Office would be better than pull scraping. The scheduler clock runs in local time (BST), as do the publication times.

let $login := xmldb:login( "/db", "admin", "admin password" ) 
let $job1 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=1" , "0 16 0 * * ?")
let $job2 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=2" , "0 6 5 * * ?")
let $job3 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=3" , "0 31 11 * * ?")
let $job4 := scheduler:schedule-xquery-cron-job("/db/Wiki/Met/getandsave.xq?t=4" , "0 26 17 * * ?")
return ($job1, $job2, $job3, $job4)
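
The cron patterns used here can be derived from the publication times by adding the one-minute slack. The following sketch only shows the arithmetic; it is plain XQuery and does not call the scheduler module.

```xquery
xquery version "3.0";

(: Publication times as given in the text, in HHMM form. :)
for $t in ("0015", "0505", "1130", "1725")
(: minutes since midnight, plus one minute of slack :)
let $minutes := xs:integer(substring($t, 1, 2)) * 60 + xs:integer(substring($t, 3, 2)) + 1
(: Quartz field order: seconds minutes hours day-of-month month day-of-week :)
return concat("0 ", $minutes mod 60, " ", ($minutes idiv 60) mod 24, " * * ?")
```

This yields "0 16 0 * * ?", "0 6 5 * * ?", "0 31 11 * * ?" and "0 26 17 * * ?". A longer slack can be tried by changing the constant added to $minutes.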

Forecast as kml

Sea area coordinates

The UK Met Office provides a clickable map of forecasts but a KML map would be nice. The coordinates of the sea areas can be captured and manually converted to XML.

<?xml version="1.0" encoding="UTF-8"?>
<boundaries>
    <boundary area="viking">
        <point latitude="61" longitude="0"/>
        <point latitude="61" longitude="4"/>
        <point latitude="58.5" longitude="4"/>
        <point latitude="58.5" longitude="0"/>
      </boundary>
...

The boundary for an area is accessed by two functions. In this idiom, one function hides the document location and returns the root of the document. Subsequent functions use this base function to get the document and then apply further predicates to filter as required.

declare function met:area-boundaries() as element(boundaries) {
  doc("/db/Wiki/Met/shippingareas.xml")/boundaries
};

declare function met:area-boundary($area as xs:string) as element(boundary) {
   met:area-boundaries()/boundary[@area=$area]
};

The centre of an area can be roughly computed by averaging the latitudes and longitudes:

declare function met:area-centre($boundary as element(boundary)) as element(point) {
   <point 
      latitude="{round(sum($boundary/point/@latitude) div count($boundary/point) * 100) div 100}"
      longitude="{round(sum($boundary/point/@longitude) div count($boundary/point) * 100) div 100}"
   />
};
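
As a worked example, the same averaging can be applied to the Viking boundary shown earlier. This sketch inlines the boundary data instead of reading it from the database.

```xquery
xquery version "3.0";

(: The Viking boundary from the sample data above. :)
let $boundary :=
  <boundary area="viking">
    <point latitude="61" longitude="0"/>
    <point latitude="61" longitude="4"/>
    <point latitude="58.5" longitude="4"/>
    <point latitude="58.5" longitude="0"/>
  </boundary>
let $n := count($boundary/point)
return
  (: average each coordinate and round to two decimal places :)
  <point
    latitude="{round(sum($boundary/point/@latitude) div $n * 100) div 100}"
    longitude="{round(sum($boundary/point/@longitude) div $n * 100) div 100}"/>
```

The result is a point at latitude 59.75 and longitude 2: (61 + 61 + 58.5 + 58.5) / 4 = 59.75 and (0 + 4 + 4 + 0) / 4 = 2. For irregular areas the average of the corners is only a rough label position, not a true centroid.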

kml Placemark

We can generate a kml Placemark from a forecast:

declare function met:forecast-to-kml($forecast as element(forecast)) as element(Placemark) {
   let $area := $forecast/@area
   let $boundary := met:area-boundary($area)
   let $centre := met:area-centre($boundary)
   
   return 
     <Placemark >
        <name>{string($forecast/areapresentation)}</name>
         <description>
           {met:forecast-as-text($forecast)}
         </description>
         <Point>
             <coordinates>
                 {string-join(($centre/@longitude,$centre/@latitude),",")}
             </coordinates>
         </Point>
    </Placemark>
};

kml area boundary

Since we have the area coordinates, we can also generate the boundaries as a line in kml.

declare function met:sea-area-to-kml(
    $area as xs:string, 
    $showname as xs:boolean
    ) as element(Placemark)
 {
   let $boundary := met:area-boundary($area)
   return 
     <Placemark >
        {if($showname) then <name>{$area}</name> else()}
        <LineString>
            <coordinates>
            {string-join(
               for $point in $boundary/point
               return
                   string-join(($point/@longitude,$point/@latitude,"0"),",")
                , " "
                )
             }
            </coordinates>
         </LineString>
      </Placemark>
  };
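
If the boundary data lists each corner only once, as in the Viking sample, a LineString built this way stops at the last corner and leaves one side undrawn. One possible adjustment, sketched here on inline sample data rather than the stored document, is to repeat the first point at the end.

```xquery
xquery version "3.0";

(: The Viking boundary from the sample data above. :)
let $boundary :=
  <boundary area="viking">
    <point latitude="61" longitude="0"/>
    <point latitude="61" longitude="4"/>
    <point latitude="58.5" longitude="4"/>
    <point latitude="58.5" longitude="0"/>
  </boundary>
return
  <LineString>
    <coordinates>{
      string-join(
        (: append the first point again so the outline is closed :)
        for $point in ($boundary/point, $boundary/point[1])
        return string-join(($point/@longitude, $point/@latitude, "0"), ","),
        " ")
    }</coordinates>
  </LineString>
```

The coordinates become "0,61,0 4,61,0 4,58.5,0 0,58.5,0 0,61,0", in KML's longitude,latitude,altitude order.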

Generate the kml file

import module  namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

(: set the media type for a kml file :)
declare option exist:serialize  "method=xml indent=yes 
     media-type=application/vnd.google-earth.kml+xml"; 
     
(: set the file name and extension when saved to allow GoogleEarth to be invoked :)
let $dummy := response:set-header('Content-Disposition','inline;filename=shipping.kml;')

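(: get the latest forecast - the whole stored document, so that the at attribute is available :)
let $shippingForecast := doc("/db/Wiki/Met/Forecast/shippingForecast.xml")/ShippingForecast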

return
<kml >
   <Folder>
       <name>{datetime:format-dateTime($shippingForecast/@at,"EEEE HH:mm")} UK Met Office Shipping forecast</name>
       {for $forecast in $shippingForecast/forecast
        return 
         (met:forecast-to-kml($forecast),
          met:sea-area-to-kml($forecast/@area,false())
         )
       }
   </Folder>
</kml>


Push messages

An alternative use of this data is to provide a channel to push the forecasts through as soon as they are received. The channel could be an SMS alert to subscribers or a dedicated Twitter stream which users could follow.

Subscription SMS

This service should allow a user to request an alert for a specific area or areas. The application requires:

  • a data structure to record subscribers and their areas
  • a web service to register a user, their mobile phone number and initial area [to do]
  • an SMS service to change the required area and turn messaging on or off
  • a scheduled task to push the SMS messages when the new forecast has been obtained

Document Structure

<subscriptions>
  <subscription>
     <username>Fred Bloggs</username>
     <password>hafjahfjafa</password>
     <mobilenumber>447777777</mobilenumber>
     <area>lundy</area>
     <status>off</status>
  </subscription>
  ...
</subscriptions>


XML Schema

(to be completed)

Access control

Access to this document needs to be controlled.

The first level of access control is to place the file in a collection which is not accessible via the web. On the UWE server, the root (via mod-rewrite) is the collection /db/Wiki, so resources in this directory and its subdirectories are accessible, subject to the access settings on the file, but files in parent or sibling directories are not. So this document is stored in the directory /db/Wiki2. The URL of this file, relative to the external root, is http://www.cems.uwe.ac.uk/xmlwiki/../Wiki2/shippingsubscriptions.xml but access fails.

The second level of control is to set the owner and permissions on the file. This is needed because a user on a client behind the firewall, using the internal server address, will gain access to this file. By default, world permissions are set to read and update. Once this access is removed, the script must log in as the owner or a group member in order to read the file.

Ownership and permissions can be set either via the web client or by functions in the eXist xmldb module.

SMS push

This function takes a subscription, formulates a text message and calls a general sms:send function to send. This interfaces with our SMS service provider.

declare function met:push-sms($subscription as element(subscription))  as element(result) {
  let $area := $subscription/area
  let $forecast := met:get-stored-forecast($area)
  let $time := datetime:format-dateTime($forecast/../@at,"EE HH:mm")
  let $text := encode-for-uri(concat($area, " ",$time," ",met:abbreviate(met:forecast-as-text($forecast))))
  let $number := $subscription/mobilenumber
  let $sent := sms:send($number,$text)
  return 
       <result number="{$number}" area="{$area}" sent="{$sent}"/>
};

SMS push subscriptions

First we need to get the active subscriptions. The functions follow the same idiom used for boundaries:

declare function met:subscriptions() {
    doc("/db/Wiki2/shippingsubscriptions.xml")/subscriptions
};

declare function met:active-subscriptions() as element(subscription) *  {
    met:subscriptions()/subscription[status="on"]
};


Then we iterate through the active subscriptions and report the results:

declare function met:push-subscriptions() as element(results) {
<results>
   { 
     let $dummy := xmldb:login("/db","webuser","password")
     for  $subscription in  met:active-subscriptions()
     return     
        met:push-sms($subscription) 
   }
</results>
};

This script iterates through the subscriptions currently active and calls the push-SMS function for each one.

import module  namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
met:push-subscriptions()

This task could be scheduled to run after the caching task has run, or the caching script could be modified to invoke the subscription task when it has completed. However, eXist also supports triggers, so the task could also be triggered by the database event raised when the forecast file store has been completed.

Subscription editing by SMS

A message format is required to edit the status of the subscription and to change the subscription area:

 metsub [ on |off |<area> ]

If the area is changed the status is set to on.

The area is validated against a list of area codes. These are extracted from the boundary data:

declare function met:area-names() as xs:string* {
   met:area-boundaries()/boundary/string(@area)
};


import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $login:= xmldb:login("/db","user","password")
let $text := lower-case(normalize-space(request:get-parameter("text",())))  (: lower-case so that commands match and the stored area matches the cache key :)
let $number := request:get-parameter("from",())
let $subscription := met:get-subscription($number)

return
   if (exists($subscription))
   then       
        let $update :=
           if ( $text= "on") 
           then update replace $subscription/status with <status>on</status>
           else if( $text = "off") 
           then update replace $subscription/status with <status>off</status>
           else if ( lower-case($text) = met:area-names())
           then ( update replace $subscription/area with <area>{$text}</area>,
                  update replace $subscription/status with <status>on</status>
                )
           else ()
       return
         let $subscription := met:get-subscription($number)(: get the subscription post update :)
         return 
             concat("Reply: forecast is ",$subscription/status," for area ",$subscription/area)
 else ()
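
The command handling can be tried without a database by treating it as a pure function from the old subscription and the message text to the new subscription. The sketch below is an illustration under stated assumptions: $areas is a hypothetical subset of the area names, local:apply-command is an invented helper, and the eXist update statements are replaced by constructing a new element.

```xquery
xquery version "3.0";

(: Hypothetical subset of area names; the real list comes from met:area-names(). :)
declare variable $areas := ("viking", "lundy", "fastnet");

(: Returns the subscription as it would look after applying the command. :)
declare function local:apply-command(
    $sub as element(subscription),
    $text as xs:string
  ) as element(subscription) {
  let $cmd := lower-case(normalize-space($text))
  (: on/off set the status directly; a valid area name switches the status on :)
  let $status :=
    if ($cmd = ("on", "off")) then $cmd
    else if ($cmd = $areas) then "on"
    else string($sub/status)
  let $area := if ($cmd = $areas) then $cmd else string($sub/area)
  return
    <subscription>
      { $sub/*[not(self::area or self::status)] }
      <area>{ $area }</area>
      <status>{ $status }</status>
    </subscription>
};

local:apply-command(
  <subscription>
    <username>Fred Bloggs</username>
    <mobilenumber>447777777</mobilenumber>
    <area>lundy</area>
    <status>off</status>
  </subscription>,
  " Fastnet ")
```

Here the subscription comes back with area fastnet and status on. An unrecognised command leaves both values unchanged.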


Twitter

Twitter has a simple REST API to update the status. We can use this to tweet the forecasts to a Twitter account. Twitter uses Basic Access Authentication, and a suitable XQuery function to send a message for a given username and password, using the eXist httpclient module, is:

declare function met:send-tweet ($username as xs:string,$password as xs:string,$tweet as xs:string )  as xs:boolean {
   let $uri :=  xs:anyURI("http://twitter.com/statuses/update.xml")
   let $content :=concat("status=", encode-for-uri($tweet))
   let $headers := 
      <headers>
          <header name="Authorization" 
                  value="Basic {util:string-to-binary(concat($username,":",$password))}"/>
         <header name="Content-Type"
                  value="application/x-www-form-urlencoded"/>
     </headers>
   let $response :=   httpclient:post( $uri, $content, false(), $headers ) 
   return
        $response/@statusCode='200'
 };

A script is needed to access the stored forecast and tweet the forecast for an area. Different twitter accounts could be set up for each shipping area. The script will need to be scheduled to run after the full forecast has been acquired.

In this example, the forecast for a given area is tweeted to a hard-coded Twitter account:

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare variable $username := "kitwallace";
declare variable $password := "mypassword";
declare variable $area := request:get-parameter("area","lundy");

let $forecast := met:get-stored-forecast($area)
let $time := datetime:format-dateTime($forecast/../@at,"HH:mm")
let $message := concat($area," at ",$time,":",met:abbreviate(met:forecast-as-text($forecast)))
return 
    <result>{met:send-tweet($username,$password,$message)}</result>

Chris Wallace's Twitter

To do

Creating and editing subscriptions

This task is ideal for XForms.

Triggers

Use a trigger to push the SMS messages when the update has been done.