XQuery/String Analysis
XQuery analyze-string
XSLT 2.0 includes the analyze-string construct, which captures the matching groups (the parts in parentheses) of a regular expression. Strangely, XQuery 1.0 has no equivalent. The XSLT construct can still be used from XQuery by wrapping a generated XSLT stylesheet in an XQuery function, although this is rather painful. In this installation of eXist, the XSLT engine is Saxon 8.
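(: Library module, imported by the scripts below from ../lib/string.xqm :)
module namespace str = "http://www.cems.uwe.ac.uk/string";

(: Return captured groups 1 to $n of each match of $regex in $string as <match> elements,
   by generating an XSLT 2.0 stylesheet that uses xsl:analyze-string
   and running it with transform:transform. :)
declare function str:analyze-string($string as xs:string, $regex as xs:string, $n as xs:integer)
{
  transform:transform(
    <any/>,
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="/">
        <xsl:analyze-string regex="{$regex}" select="'{$string}'">
          <xsl:matching-substring>
            <xsl:for-each select="1 to {$n}">
              <match>
                <xsl:value-of select="regex-group(.)"/>
              </match>
            </xsl:for-each>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:template>
    </xsl:stylesheet>,
    ()
  )
};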
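XQuery 3.0 later added a standard fn:analyze-string function, which makes the XSLT round trip unnecessary on a processor that supports it. The sketch below is a possible drop-in replacement for str:analyze-string under that assumption; it has not been tried on the Saxon 8 / eXist installation described here, and local:analyze-string is a hypothetical name. It returns the same shape of result: for each match, groups 1 to $n wrapped in <match> elements.

```xquery
xquery version "3.0";

(: Sketch: same result as str:analyze-string, but built on the standard
   XQuery 3.0 fn:analyze-string function instead of a generated stylesheet.
   For each match of $regex in $string, return captured groups 1 to $n,
   each wrapped in a <match> element (empty if the group took no part). :)
declare function local:analyze-string($string as xs:string, $regex as xs:string,
                                      $n as xs:integer) as element(match)*
{
  for $m in fn:analyze-string($string, $regex)/fn:match
  for $i in 1 to $n
  return <match>{ string(($m//fn:group[@nr = $i])[1]) }</match>
};

local:analyze-string("AB51XYZ", "^([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]$", 2)
```

Because the regular expression is passed as an ordinary string argument rather than written into a stylesheet attribute, this version should also be unaffected by the repetition-modifier problem described below.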
UK Vehicle Registration numbers
To illustrate the use of this function, here is a decoder for UK vehicle license plates. Their format has changed several times, so the script must first decide which format is used, then analyze the number to find the significant codes for the area and date of registration. The patterns are defined in XML, giving the regular expression for each format and the meaning of each matched group.
Problem: passing repetition modifiers such as {2} through to the regular expression fails.
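A likely cause, based on our reading of the XSLT 2.0 specification rather than on testing this installation: the regex attribute of xsl:analyze-string is an attribute value template, so a quantifier such as {2} that reaches the generated stylesheet is evaluated as the XPath expression 2, and [A-Z]{2} silently becomes [A-Z]2. The specification requires curly brackets in that attribute to be doubled. The sketch below shows a hypothetical helper, local:escape-avt, that does the doubling; in str:analyze-string it would be applied to $regex before it is placed in the stylesheet, i.e. regex="{local:escape-avt($regex)}" (renamed into the str namespace).

```xquery
xquery version "1.0";

(: Hypothetical helper: double every curly bracket in a regular expression so
   that it survives being placed in an XSLT attribute value template such as
   the regex attribute of xsl:analyze-string, where "{{" stands for a literal
   "{" and "}}" for a literal "}". :)
declare function local:escape-avt($regex as xs:string) as xs:string
{
  replace(replace($regex, "\{", "{{"), "\}", "}}")
};

(: returns "([A-Z]{{2}})(\d{{2}})[A-Z]{{3}}" :)
local:escape-avt("([A-Z]{2})(\d{2})[A-Z]{3}")
```

Expressions without braces, such as the patterns used below, pass through unchanged.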
import module namespace str = "http://www.cems.uwe.ac.uk/string" at "../lib/string.xqm";

(: Each pattern gives the regular expression for one plate format and,
   in order, the meaning of each captured group. :)
declare variable $patterns :=
  <patterns>
    <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]">
      <field>Area</field><field>Date</field>
    </pattern>
    <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])">
      <field>Date</field><field>Area</field>
    </pattern>
    <pattern version="63" regexp="([A-Z][A-Z])[A-Z]?\d+([A-Z])">
      <field>Area</field><field>Date</field>
    </pattern>
  </patterns>;

declare function local:decode-regno($regno)
{
  (: normalize: upper case, spaces removed :)
  let $regno := upper-case($regno)
  let $regno := replace($regno, " ", "")
  return
    for $pattern in $patterns/pattern
    let $regexp := concat("^", $pattern/@regexp, "$")
    return
      if (matches($regno, $regexp))
      then
        let $analysis := str:analyze-string($regno, $regexp, count($pattern/field))
        return
          <regno version="{$pattern/@version}">{
            for $field at $i in $pattern/field
            (: the i-th captured group is the code for the i-th field :)
            let $value := string($analysis[position() = $i])
            (: code table id, e.g. Area83 :)
            let $table := concat($field, $pattern/@version)
            let $value := /CodeList[@id = $table]/Entry[Code = $value]
            return element {$field} {$value/*}
          }</regno>
      else ()
};

let $regno := request:get-parameter("regno", ())
return local:decode-regno($regno)
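The core of the decoder is the mapping from captured groups to named fields. To see which pattern a plate selects and which raw codes it yields before any table lookup, here is a sketch that uses XQuery 3.0's fn:analyze-string in place of str:analyze-string, so it runs without eXist or the XSLT wrapper. local:codes is a hypothetical name; the patterns are copied from the script above.

```xquery
xquery version "3.0";

(: Same patterns as in the decoder above. :)
declare variable $patterns :=
  <patterns>
    <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]">
      <field>Area</field><field>Date</field>
    </pattern>
    <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])">
      <field>Date</field><field>Area</field>
    </pattern>
    <pattern version="63" regexp="([A-Z][A-Z])[A-Z]?\d+([A-Z])">
      <field>Area</field><field>Date</field>
    </pattern>
  </patterns>;

(: For each pattern the plate matches, return the raw code captured
   for each field (no code-table lookup). :)
declare function local:codes($regno as xs:string) as element(regno)*
{
  let $regno := replace(upper-case($regno), " ", "")
  for $pattern in $patterns/pattern
  let $regexp := concat("^", $pattern/@regexp, "$")
  where matches($regno, $regexp)
  return
    <regno version="{$pattern/@version}">{
      let $groups := fn:analyze-string($regno, $regexp)//fn:group
      for $field at $i in $pattern/field
      return element { string($field) } { string(($groups[@nr = $i])[1]) }
    }</regno>
};

for $plate in ("H251GBU", "WRA870Y", "AB51 XYZ")
return local:codes($plate)
```

With these sample plates, H251GBU selects the 83 pattern (Date H, Area BU), WRA870Y the 63 pattern (Area WR, Date Y) and AB51 XYZ the 01 pattern (Area AB, Date 51). Note that the order of the field elements follows the pattern, which is why the 83 format lists Date first.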
Decode tables
Separate tables decode codes to date ranges or areas. These tables are plain XML created from CSV files via Excel. The pre-83 area codes are currently incorrect.
e.g.
<CodeList id="Area83">
  <Entry>
    <Code>AA</Code>
    <Location>Bournemouth</Location>
  </Entry>
  <Entry>
    <Code>AB</Code>
    <Location>Worcester</Location>
  </Entry>
  <Entry>
    <Code>AC</Code>
    <Location>Coventry</Location>
  </Entry>
  ...
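The decoder finds a code in these tables by building the table id from the field name and the pattern version (Area83, for instance) and then selecting the Entry whose Code matches. A minimal sketch of that lookup on an in-memory copy of the three Area83 entries shown above; the CodeLists wrapper element and the name local:lookup are hypothetical, and in the decoder itself the tables are stored documents reached through /CodeList.

```xquery
xquery version "1.0";

(: In-memory copy of the first three Area83 entries shown above,
   wrapped in a hypothetical CodeLists element. :)
declare variable $codelists :=
  <CodeLists>
    <CodeList id="Area83">
      <Entry><Code>AA</Code><Location>Bournemouth</Location></Entry>
      <Entry><Code>AB</Code><Location>Worcester</Location></Entry>
      <Entry><Code>AC</Code><Location>Coventry</Location></Entry>
    </CodeList>
  </CodeLists>;

(: Find the entry for $code in the table named by field + version,
   e.g. "Area" + "83" gives the table id "Area83". :)
declare function local:lookup($field as xs:string, $version as xs:string,
                              $code as xs:string) as element(Entry)*
{
  $codelists/CodeList[@id = concat($field, $version)]/Entry[Code = $code]
};

(: returns "Worcester" :)
string(local:lookup("Area", "83", "AB")/Location)
```

An unknown code simply yields an empty sequence, which is why the decoder produces an empty field element rather than an error when a code is missing from a table.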
Examples
Location Mapping
One use of this conversion is to display the locations on a map. Here we take a file of observed registration numbers, decode them all, group by location and generate a KML file with the locations geocoded through the Google API.
<NumberList>
  <Regno>H251GBU</Regno>
  <Regno>WRA870Y</Regno>
  <Regno>ENB427T</Regno>
  <Regno>C406OUY</Regno>
  <Regno>N62VNF</Regno>
  <Regno>R895KCV</Regno>
  <Regno>C758HOV</Regno>
  <Regno>H541HEM</Regno>
  ...
(: this script plots the registration locations of a set of UK vehicle license plates using kml. :)
import module namespace geo = "http://www.cems.uwe.ac.uk/exist/geo" at "../lib/geo.xqm";
import module namespace str = "http://www.cems.uwe.ac.uk/string" at "../lib/string.xqm";
declare namespace reg = "http://www.cems.uwe.ac.uk/wiki/reg";

declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes omit-xml-declaration=yes";

declare variable $reg:icon := "http://maps.google.com/mapfiles/kml/paddle/ltblu-blank.png";

declare variable $reg:patterns :=
  <patterns>
    <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]">
      <field>Area</field><field>Date</field>
    </pattern>
    <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])">
      <field>Date</field><field>Area</field>
    </pattern>
    <pattern version="63" regexp="([A-Z][A-Z])[A-Z]?\d+([A-Z])">
      <field>Area</field><field>Date</field>
    </pattern>
  </patterns>;

(: same decoder as in the previous script :)
declare function reg:decode-regno($regno)
{
  let $regno := upper-case($regno)
  let $regno := replace($regno, " ", "")
  return
    for $pattern in $reg:patterns/pattern
    let $regexp := concat("^", $pattern/@regexp, "$")
    return
      if (matches($regno, $regexp))
      then
        let $analysis := str:analyze-string($regno, $regexp, count($pattern/field))
        return
          <regno version="{$pattern/@version}">{
            for $field at $i in $pattern/field
            let $value := string($analysis[position() = $i])
            let $table := concat($field, $pattern/@version)
            let $value := /CodeList[@id = $table]/Entry[Code = $value]
            return element {$field} {$value/*}
          }</regno>
      else ()
};

(: the decoded location, if any, of each registration number :)
declare function reg:regno-locations($regnos)
{
  for $regno in $regnos
  let $analysis := reg:decode-regno($regno)
  return
    if (exists($analysis//Location))
    then string($analysis//Location)
    else ()
};

let $url := request:get-parameter("url", ())
let $x := response:set-header('Content-Disposition', 'inline;filename=regnos.kml;')
return
  <Document>
    <name>Reg nos</name>
    {
      (: ten icon styles, size1 to size10, used to scale the placemarks :)
      for $i in (1 to 10)
      return
        <Style id="size{$i}">
          <IconStyle>
            <scale>{$i}</scale>
            <Icon><href>{$reg:icon}</href></Icon>
          </IconStyle>
        </Style>
    }
    {
      let $locations := reg:regno-locations(doc($url)//Regno)
      let $max := count($locations)
      for $place in distinct-values($locations)
      let $latlong := geo:geocode(concat($place, ',UK'))
      let $count := count($locations[. = $place])
      (: icon size 1 to 10, in proportion to this place's share of all decoded plates :)
      let $scale := max((round($count div $max * 10), 1))
      order by $count descending
      return
        <Placemark>
          <name>{$place} ({$count})</name>
          <styleUrl>#size{$scale}</styleUrl>
          <Point><coordinates>{geo:position-as-kml($latlong)}</coordinates></Point>
        </Placemark>
    }
  </Document>
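The icon-size rule in the KML script can be checked in isolation, without geocoding. A sketch with a made-up list of decoded locations; local:place-styles is a hypothetical name, and the rule is copied from the script: each place's count is divided by the total number of decoded plates ($max), scaled to 1..10 and floored at 1.

```xquery
xquery version "1.0";

(: Same grouping and scaling rule as the KML script, returning one
   <place> element per distinct location instead of a Placemark. :)
declare function local:place-styles($locations as xs:string*) as element(place)*
{
  let $max := count($locations)
  for $place in distinct-values($locations)
  let $count := count($locations[. = $place])
  let $scale := max((round($count div $max * 10), 1))
  order by $count descending
  return <place name="{$place}" count="{$count}" style="#size{$scale}"/>
};

(: Made-up decoded locations, for illustration only. :)
local:place-styles(("Coventry", "Coventry", "Worcester", "Coventry", "Bournemouth"))
```

Here Coventry (3 of 5 plates) gets #size6 and the other two places #size2. Because $max is the total count rather than the largest single count, only a place holding every plate reaches #size10; dividing by the largest count instead would always draw the most frequent place at full size.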
SMS service
The Department of Information Science and Digital Media supports an SMS service with facilities to send and receive text messages. The service is paid for by the University of the West of England, Bristol and all traffic is logged.
A decoder for UK vehicle license numbers is one of the demonstration services which are supported for mobile-originated (MO) text messages.
The format of the text message is REG followed by the registration number, e.g.
REG L052
A text message in this format sent to our SMS mobile number 447624803759 passes through a PHP script which allows multiple SMS services to be supported. The script uses the first word of the message to identify the associated service endpoint and then invokes that endpoint via HTTP, passing the prefix as code, the rest of the message as text and the originating mobile number as from.
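The PHP switch itself is not shown here. As a rough illustration of the dispatch it performs, and of what an XQuery replacement might look like, here is a sketch that only builds the endpoint URL and makes no HTTP call. The services table is hypothetical (only its REG entry, the smsregno.xq endpoint, comes from this page), as are the name local:endpoint-url, the upper-casing of the prefix and the sender number in the example.

```xquery
xquery version "1.0";

(: Hypothetical service table: message prefix -> endpoint. :)
declare variable $services :=
  <services>
    <service code="REG" endpoint="http://www.cems.uwe.ac.uk/xmlwiki/regno/smsregno.xq"/>
  </services>;

(: Build the endpoint URL for an incoming message, passing the prefix as code,
   the rest of the message as text and the sender's number as from.
   Returns the empty sequence if no service is registered for the prefix. :)
declare function local:endpoint-url($message as xs:string, $from as xs:string) as xs:string?
{
  let $message := normalize-space($message)
  let $code := upper-case(substring-before(concat($message, " "), " "))
  let $text := substring-after($message, " ")
  let $endpoint := $services/service[@code = $code]/@endpoint
  return
    if (exists($endpoint))
    then concat($endpoint,
                "?code=", encode-for-uri($code),
                (: "&amp;" in a string literal stands for a single "&" :)
                "&amp;text=", encode-for-uri($text),
                "&amp;from=", encode-for-uri($from))
    else ()
};

(: made-up sender number :)
local:endpoint-url("reg L052", "447700900123")
```

Looking services up in an XML table keeps the switch generic: adding a service means adding a service element, not changing code.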
For the prefix REG, the associated endpoint is an XQuery script:
http://www.cems.uwe.ac.uk/xmlwiki/regno/smsregno.xq
The smsregno.xq script is essentially the parseregno script above.
declare option exist:serialize "method=text media-type=text/plain";
...
let $regno := request:get-parameter("text",())
let $data := local:decode-regno($regno)
return
concat("Reply: ",
$regno ,
" was registered in ",
$data/Area/Location,
" between ",
$data/Date/From ,
" and ",
$data/Date/To
)
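As written, the reply is sent even when no pattern matches, giving a message with a blank location and dates. A sketch of a guarded version as a hypothetical local:reply function taking the decoder's output; the From and To element names follow the script above, and the sample decoder output, including its dates, is made up.

```xquery
xquery version "1.0";

(: Build the SMS reply text from the decoder output, falling back to a
   short message when the plate could not be decoded. :)
declare function local:reply($regno as xs:string, $data as element(regno)*) as xs:string
{
  let $location := ($data/Area/Location)[1]
  let $date := ($data/Date)[1]
  return
    if (exists($location) and exists($date))
    then concat("Reply: ", $regno, " was registered in ", $location,
                " between ", $date/From, " and ", $date/To)
    else concat("Reply: sorry, ", $regno, " could not be decoded")
};

(: Made-up decoder output, for illustration only. :)
local:reply("A123BAA",
  <regno version="83">
    <Date><From>1983</From><To>1984</To></Date>
    <Area><Location>Bournemouth</Location></Area>
  </regno>)
```

Calling local:reply with an empty sequence as $data, as happens when no pattern matches, produces the fallback message instead.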
The SMS switch then sends the Reply on to the originating mobile phone.
To do
- solve the problem with repetition modifiers (or get function support for analyze-string)
- Pre-83 area code data
- Switch implementation in XQuery to replace the PHP application (awaits the move to eXist v2)