XQuery/Delivery Status Report

A common task is integrating local data with related data held on another site. Although an increasing number of sites provide an RSS feed, it is often necessary to scrape web pages to get the relevant data.


Delivery Status Reporting

One such case is where a company needs to monitor the status of their deliveries which use a courier service.

In this mock example, a company maintains its own records of all deliveries commissioned, including the delivery service used and the service's consignment number. As an XML file, this might look like:

<DeliveryList>
   <Delivery><CustomerName>Fred Flintstone</CustomerName><Service>CityLink</Service><ConsignmentNo>RZL14823</ConsignmentNo></Delivery>
   <Delivery><CustomerName>Bill Bailey</CustomerName><Service>CityLink</Service><ConsignmentNo>RZL14869</ConsignmentNo></Delivery>
   <Delivery><CustomerName>Jack and Jill</CustomerName><Service>CityExpress</Service><ConsignmentNo>RXL9999</ConsignmentNo></Delivery>
</DeliveryList>
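
As an illustration of querying this structure, the consignment numbers of the CityLink deliveries could be listed with a query along these lines (the storage path /db/deliveries.xml is only an assumption for this sketch):

(: list the consignment numbers of all CityLink deliveries;
   the document path is illustrative :)
for $delivery in doc("/db/deliveries.xml")//Delivery[Service = "CityLink"]
return string($delivery/ConsignmentNo)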

Integrated Report

The following script shows how the local delivery data can be combined with the status data for each delivery obtained from the delivery company. In this case, the courier City-Link provides a page for each consignment reporting its status.

The script loops over the relevant deliveries and constructs the appropriate URL to read the page for each delivery. The page is passed through an HTML-to-XML conversion (the same one used in the Yahoo Weather feed example), and specific elements are then retrieved from the resulting HTML to build an XML extract of the page. This XML data is then combined with the local data to create an integrated report.

import module namespace fwiki = "http://www.cems.uwe.ac.uk/xmlwiki" at "../reports/util.xqm";
declare option exist:serialize "method=xhtml media-type=text/html";

(: template URL for a City-Link proof-of-delivery page;
   ZZZZ is a placeholder for the consignment number :)
declare variable $citylinkURL := "http://www.city-link.co.uk/pod/podfrm.php?JobNo=ZZZZ";

(: fetch the status page for one consignment and extract the
   fields of interest into a small XML summary :)
declare function local:get-consignment($consNo) {
let $url := replace($citylinkURL, "ZZZZ", $consNo)
let $page := fwiki:html-to-xml($url)
return
 <Consignment>
     <CustomerReference>
        {string($page//table[@id="this_table_holds_the_summary_info"]/tr[1]/td[2])}
    </CustomerReference>
    <ScheduledDeliveryDate>
        {string($page//table[@id="this_table_holds_the_summary_info"]/tr[1]/td[4])}
    </ScheduledDeliveryDate>
    <DeliveryStatus>
        {string($page//table[@id="this_table_holds_the_detailed_status_desc"]/tr[1]/td[2])}
    </DeliveryStatus>
 </Consignment>
};

let $report :=
<Report>
{(: the Delivery records are assumed to be in the database query context :)
 for $delivery in //Delivery[Service = "CityLink"]
 let $citylink := local:get-consignment($delivery/ConsignmentNo)
 return
   <Delivery>
        {$delivery/*}
        {$citylink/*}
    </Delivery>
}
</Report>
return
  fwiki:element-seq-to-table($report)
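
The helper fwiki:html-to-xml is imported from util.xqm and not shown here. A minimal sketch of such a helper, assuming eXist's httpclient module (which parses a text/html response body into XML via the NekoHTML parser), might look like the following; the actual function in util.xqm may differ:

declare namespace httpclient = "http://exist-db.org/xquery/httpclient";

(: sketch only: fetch a page and return the parsed (X)HTML root element :)
declare function local:html-to-xml($url as xs:string) as element()? {
    let $response := httpclient:get(xs:anyURI($url), false(), ())
    return $response//httpclient:body/*
};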


Notes

  1. In production, a simple script to extract and store the delivery data in the database could be scheduled to run every hour, reducing the load placed on the scraped sites.
  2. The script uses a generic function to convert any simple tabular XML to an HTML table (a sketch of such a function appears after these notes).
  3. The mapping between HTML elements and the extracted XML depends on the stability of the scraped page. The paths are simplified by the presence of ids on the relevant tables.
  4. A production system must be able to detect HTTP errors and act accordingly. This requires more control over the HTTP requests and responses, a facility provided by the HTTP client module in later releases of eXist. The simplistic approach taken here to obtain the XML would need to be replaced (see the second sketch after these notes).
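
For note 2, the shape of such a generic function might be sketched as follows. This is only a plausible reconstruction of fwiki:element-seq-to-table, whose real definition lives in util.xqm: it takes a root element whose uniform children become table rows.

(: sketch: render a flat sequence of uniform child elements as an HTML table :)
declare function local:element-seq-to-table($report as element()) as element() {
    <table border="1">
        <tr>
            {for $field in $report/*[1]/* return <th>{local-name($field)}</th>}
        </tr>
        {for $row in $report/*
         return <tr>{for $field in $row/* return <td>{string($field)}</td>}</tr>}
    </table>
};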
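
For note 4, error detection could be layered onto the html-to-xml sketch above by checking the status code reported by eXist's httpclient module before the page is used. Again, this is a hedged sketch, not the approach taken in this example:

declare namespace httpclient = "http://exist-db.org/xquery/httpclient";

(: sketch: return the parsed page only when the request succeeded :)
declare function local:safe-html-to-xml($url as xs:string) as element()? {
    let $response := httpclient:get(xs:anyURI($url), false(), ())
    return
        if ($response/@statusCode = "200")
        then $response//httpclient:body/*
        else ()  (: on an HTTP error, log it or return an error element instead :)
};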