XQuery/Compare with XQuery

From Wikibooks, open books for an open world
Jump to: navigation, search

Motivation[edit]

You want to use XQuery to compare two lists of items and find out how the lists are different

Methods[edit]

We will use a variety of functions that iterate through the lists. For each example we will perform some comparison with the second list.

Simple iteration and test for missing elements[edit]

In this first example, we will use a simple for loop to go through each item on a linear list. We then will check to see if that item is anywhere on the second list (regardless of order). If it is on the second list we will display the item from the first list. If not we will output "missing". This is useful if you want to find out if a local collection is missing files from some remote collection.

xquery version "1.0";
 
(: Compare of two linear lists :)
 
let $list1 :=
<list1>
   <item>a</item>
   <item>b</item>
   <item>c</item>
   <item>d</item>
</list1>
 
 
let $list2 :=
<list1>
   <item>a</item>
   <item>c</item>
   <item>e</item>
</list1>
 
return
<missing>{
  for $item1 in $list1/item
  let $item-text := $item1/text()
  return
    <test item="{$item-text}">
    {if ($list2/item/text()=$item-text)
       then ($item1)
       else <missing>{$item-text}</missing>
    }
     </test>
}</missing>

Note that the conditional expression:

 if ($list2/item/text() = $item-text)

Tests to see if the $item-text is anywhere in list2. If it occurs anywhere this expression will return true().

Sample Results[edit]

<missing>
    <test item="a">
        <item>a</item>
    </test>
    <test item="b">
        <missing>b</missing>
    </test>
    <test item="c">
        <item>c</item>
    </test>
    <test item="d">
        <missing>d</missing>
    </test>
</missing>

Note that this will not report any items on the second list that are missing from the first list.

Using Quantified Expressions[edit]

This can be rewritten using XQuery quantified expressions. There are two reasons for this. First the XQuery optimizer can frequently run quantified expressions much faster and some people feel they are easier to read. See XQuery/Quantified Expressions for more details.

In this second example the list assignments are the same but we will only display the items from list 1 that are missing from list 2.

<missing>{
  for $item1 in $list1/item
  return
    if (some $item2 in $list2/item satisfies $item2/text() = $item1/text())
       then ()
       else $item1
}</missing>

This returns:

<missing>
   <item>b</item>
   <item>d</item>
</missing>

We are now ready to modularize this missing function so that we can pass any two lists to find missing elements.

Creating a Missing XQuery Function[edit]

Our next step is to create an XQuery function that compare any two lists and returns the items in the second list that are not in the first list.

declare function local:missing($list1 as node()*, $list2 as node()*) as node()* {
  for $item1 in $list1/item
  let $item-text := $item1/text()
  return
    if (some $item2 in $list2/item satisfies $item2/text() = $item1/text())
       then ()
       else $item1
};

We can rewrite the output function to use this function:

<results>
   <missing-from-2>{local:missing($list1, $list2)}</missing-from-2>
   <missing-from-1>{local:missing($list2, $list1)}</missing-from-1>
</results>

Note that the order of the lists has been reversed in the second call to the missing() function. The second pass looks for items on list2 that are not on list1.

Running this query generates the following output:

<results>
    <missing-from-2>
        <item>b</item>
        <item>d</item>
    </missing-from-2>
    <missing-from-1>
        <item>e</item>
    </missing-from-1>
</results>

Creating HTML Difference Lists[edit]

We can use CSS to style the output of these reports.

Screen Image[edit]

HTML Diff Report using CSS

Sample Data[edit]

This example uses full words of items to show text highlighting:

let $list1 :=
<list>
   <item>apples</item>
   <item>bananas</item>
   <item>carrots</item>
   <item>kiwi</item>
</list>
 
 
let $list2 :=
<list>
   <item>apples</item>
   <item>carrots</item>
   <item>grapes</item>
</list>

The following function uses HTML div and span elements and adds class="missing" to each div that is missing. The CSS file will highlight this background.

declare function local:missing($list1 as node()*, $list2 as node()*) as node()* {
  for $item1 in $list1/item
  return
    if (some $item2 in $list2/item satisfies $item2/text() = $item1/text())
       then <div>{$item1/text()}</div>
       else
       <div>
        {attribute {'class'} {'missing'}}
        {$item1/text()}         
       </div>
};

We then use the following CSS file to highlight the differences. Each missing element must have class="missing" attribute for the missing element to be highlighted in this report.

body {font-family: Ariel,Helvetica,sans-serif; font-size: large;}
h2 {padding: 3px; margin: 0px; text-align: center; font-size: large; background-color: silver;}
.left, .right {border: solid black 1px; padding: 5px;}
.missing {background-color: pink;}
.left {float: left; width: 190px}
.right {margin-left: 210px; width: 190px}
<body>
      <h1>Missing Items Report</h1>
      <div class="left">
         <h2>List 1</h2>
         {for $item in $list1/item return <div>{$item/text()}</div>}
      </div>
      <div class="right">
         <h2>List 2</h2>
         {for $item in $list2/item return <div>{$item/text()}</div>}
      </div>
      <br/>
      <div class="left">
         <h2>List 1 Missing from 2</h2>
         {local:missing($list1, $list2)}
      </div>
      <div class="right">
         <h2>List 2 Missing from 1</h2>
         {local:missing($list2, $list1)}
      </div>
   </body>

Collation[edit]

If the lists are in sorted order, or can be sorted into order, an alternative approach is to recursively collate the two lists. The core algorithm looks like:

declare function local:merge($a, $b  as item()* )  as item()* {
    if (empty($a) and empty($b))
    then ()
    else if (empty ($b) or $a[1] lt $b[1])
    then ($a[1], local:merge(subsequence($a, 2), $b))
    else  if (empty($a) or $a[1] gt $b[1])
    then  ($b[1],local:merge($a, subsequence($b,2)))           
    else (: a and b matched :)
          ($a[1], $b[1],  local:merge(subsequence($a,2), subsequence($b,2)))
   };

With the example above, we can merge two lists.

 
let $list1 := 
<list>
   <item>apples</item>
   <item>bananas</item>
   <item>carrots</item>
   <item>kiwi</item>
</list>
 
 
let $list2 := 
<list>
   <item>apples</item>
   <item>carrots</item>
   <item>grapes</item>
</list>
 
return 
<result>
{local:merge($list1/item,$list2/item) }
</result>

Execute

The actions on merge will depend on the application and the algorithm can be modified to output only mismatched items on one or other list, and handle matching items appropriately. For example, to display the merged list as HTML, we might modify the algorithm to:

declare function local:merge($a, $b  as item()*  )  as item()* {
    if (empty($a) and empty ($b)) 
    then ()
    else if (empty ($b) or $a[1] lt $b[1])
    then (<div class="left">{$a[1]/text()}</div>, local:merge(subsequence($a, 2), $b))
    else if (empty ($a) or $a[1] gt $b[1])
    then  (<div class="right">{$b[1]/text()}</div>,local:merge($a, subsequence($b,2)))  
    else  (<div class="match">{$a[1]/text()}</div>,  local:merge(subsequence($a,2), subsequence($b,2)))
  };

Execute