XQuery/Sequences

Motivation

You want to manipulate a sequence of items. These items may be very similar to each other or they may be of very different types.

Method

We begin with some simple examples of sequences. We then look at the most common sequence operators. XQuery uses the word sequence as a generic name for an ordered container of items.

Understanding how sequences work in XQuery is central to understanding how the language works. The use of generic sequences of items is central to functional programming and stands in sharp contrast to other programming languages such as Java or JavaScript that provide multiple methods and functions to handle key-value pairs, dictionaries, arrays and XML data. The wonderful thing about XQuery is that you only need to learn one set of concepts and a very small list of functions to learn how to quickly manipulate data.

Examples

Creating sequences of characters and strings

You use parentheses to contain a sequence, commas to delimit items and quotes to contain string values:

   let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')

Note that you can use single or double quotes, but for most character strings a single quote is used.

   let $sequence := ("apple", 'banana', "carrot", 'dog', "egg", 'fig')

You can also intermix data types. For example the following sequence has three strings and three integers in the same sequence.

   let $sequence := ('a', 'b', 'c', 1, 2, 3)

You can then pass the sequence to any XQuery function that works with sequences of items. For example the count() function takes a sequence as an input and returns the number of items in the sequence.

   let $count := count($sequence)

To see the results of these items you can create a simple XQuery that displays the items using a FLWOR expression.

Viewing items in a sequence

   xquery version "1.0";
 
   let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
 
   let $count := count($sequence)
 
   return
   <results>
      <count>{$count}</count>
      <items>
      {for $item in $sequence
       return
          <item>{$item}</item>
      }
      </items>
   </results>

Results:

   <results>
      <count>6</count>
      <items>
         <item>a</item>
         <item>b</item>
         <item>c</item>
         <item>d</item>
         <item>e</item>
         <item>f</item>
      </items>
   </results>

Viewing select items in a sequence

Items within a sequence can be selected using a predicate expression.

Items can be selected by position (1-based):

   let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
   return
      <items>{
         for $item in $sequence[position() = (1, 3, 4)]
         return
           <item>{$item}</item>
      }</items>

Results:

<items>
    <item>a</item>
    <item>c</item>
    <item>d</item>
</items>

or by value:

    let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
    return
       <items>{
          for $item in $sequence[. = ('a','e')]
          return
             <item>{$item}</item>
       }</items>

Results:

<items>
    <item>a</item>
    <item>e</item>
</items>
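
Predicates also accept ranges and general conditions. The following is a minimal sketch that reuses the letter sequence from above; the element names first, range and after-c are only illustrative.

   xquery version "1.0";

   let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
   return
      <results>
         <!-- a single number selects the item at that position -->
         <first>{$sequence[1]}</first>
         <!-- comparing position() with a range selects a slice -->
         <range>{$sequence[position() = (2 to 4)]}</range>
         <!-- any boolean test on the current item (.) also works -->
         <after-c>{$sequence[. gt 'c']}</after-c>
      </results>

This should give a for first, b c d for range and d e f for after-c.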

Adding XML elements to your sequence

You can also store XML elements in a sequence.

let $sequence := ('apple', <banana/>, <fruit type="carrot"/>, <animal type='dog'/>, <vehicle>car</vehicle>)

Although you can use parentheses to create a sequence of XML items, a common convention is to use XML tags to begin and end a sequence and to store all items as XML elements. One suggestion is to use items as the element name to hold generic sequences of items.

Here is an example of this:

let $items := 
   <items>
      <banana/>
      <fruit type="carrot"/>
      <animal type='dog'/>
      <vehicle>car</vehicle>
   </items>

The other convention is to put all individual items in their own item element tags and to place each item on a separate line if the list of items gets long.

let $items := 
   <items>
      <item>banana</item>
      <item>
         <fruit type="carrot"/>
      </item>
      <item>
         <animal type='dog'/>
      </item>
      <item>
         <vehicle>car</vehicle>
      </item>
   </items>

The following FLWOR expression can then be used to display each of these items:

xquery version "1.0";
 
let $sequence :=
    <items>
       <item>banana</item>
       <item>
          <fruit type="carrot"/>
       </item>
       <item>
          <animal type='dog'/>
       </item>
       <item>
          <vehicle>car</vehicle>
       </item>
    </items>
 
 
return
   <results>{
      for $item in $sequence/item
      return
         <item>{$item}</item>
   }</results>

This will return the following XML:

<results>
    <item>
        <item>banana</item>
    </item>
    <item>
        <item>
            <fruit type="carrot"/>
        </item>
    </item>
    <item>
        <item>
            <animal type="dog"/>
        </item>
    </item>
    <item>
        <item>
            <vehicle>car</vehicle>
        </item>
    </item>
</results>

Note that when the resulting XML is returned, only double quotes are present in the output.

Counting Items

When a sequence is stored as the children of a wrapper element, you can count its items by using the count function and adding /* to the end of the sequence path, for example count($items/*).
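
As a small sketch, using the items wrapper element from the earlier example:

   xquery version "1.0";

   let $items :=
      <items>
         <banana/>
         <fruit type="carrot"/>
         <animal type='dog'/>
         <vehicle>car</vehicle>
      </items>
   (: $items/* selects the four child elements of the wrapper :)
   return count($items/*)

This returns 4. Note that count($items) would return 1, because $items itself is a single element.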

Common Sequence Functions

There are only a handful of functions you will need to use with sequences. We will review these functions and also show you how to create new functions using combinations of these functions.

Here are the three most common non-mathematical functions used with sequences. These three are the real workhorses of XQuery sequences. You can spend days writing XQueries and never need functions beyond these three functions.

  count($seq as item()*) - used to count the number of items in a sequence.
   Returns a non-negative integer.
  distinct-values($seq as xs:anyAtomicType*) - used to remove duplicate values from a sequence.
   Returns another sequence.
  subsequence($seq as item()*, $startingLoc as xs:double, $length as xs:double) - used to return only a subset of items in a sequence.
   Returns another sequence.

The datatype item()* is read "zero or more items". distinct-values() works on atomic values, so any nodes passed to it are atomized (reduced to their values) first. Note that both the distinct-values() function and the subsequence() function take in a sequence and return a sequence. This comes in very handy when you are creating recursive functions.
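
The three functions can be tried together in one query. This is only a sketch; the sample sequence and the element names are made up for illustration.

   xquery version "1.0";

   let $seq := ('a', 'b', 'c', 'b', 'a', 'd')
   return
      <results>
         <count>{count($seq)}</count>
         <distinct>{distinct-values($seq)}</distinct>
         <!-- three items starting at position 1 -->
         <first-three>{subsequence($seq, 1, 3)}</first-three>
         <!-- the length argument is optional: everything from position 4 on -->
         <from-fourth>{subsequence($seq, 4)}</from-fourth>
      </results>

The count is 6, first-three holds a b c and from-fourth holds b a d. The distinct values are a, b, c and d; the order in which distinct-values() returns them is implementation dependent, although most processors keep the order of first appearance.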

Along with count() there are a few aggregate functions that calculate sums, averages, minimums and maximums:

  sum($seq as xs:anyAtomicType*) - used to sum the values of numbers in a sequence
  avg($seq as xs:anyAtomicType*) - used to calculate the average (arithmetic mean) of numbers in a sequence
  min($seq as xs:anyAtomicType*) - used to find the minimum value of a sequence of numbers
  max($seq as xs:anyAtomicType*) - used to find the maximum value of a sequence of numbers

These functions are designed to work on the numeric values of items, and for numeric input they return numeric values. You may want to use the number() function when the items are strings that contain numbers.

You may find that you can perform many tasks just by learning these few XQuery functions. You can also create most other sequence operators from these functions.
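
Here is a short sketch of the four aggregate functions. The $prices and $strings sequences are invented sample data.

   xquery version "1.0";

   let $prices := (4, 8, 15, 16)
   let $strings := ('10', '20', '30')
   return
      <results>
         <sum>{sum($prices)}</sum>
         <avg>{avg($prices)}</avg>
         <min>{min($prices)}</min>
         <max>{max($prices)}</max>
         <!-- strings are converted with number() before they are summed -->
         <sum-of-strings>{sum(for $s in $strings return number($s))}</sum-of-strings>
      </results>

The values are 43, 10.75, 4, 16 and 60. Calling sum($strings) directly should raise a type error, because xs:string values are not numeric.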

Occasionally Used Sequence Functions

In addition there are some functions that are occasionally used:

  insert-before($seq as item()*, $position as xs:integer, $inserts as item()*) - for inserting
     new items anywhere in a sequence
  remove($seq as item()*, $position as xs:integer) - removes the item at the given position from a sequence
  reverse($seq as item()*) - reverses the order of items in a sequence
  index-of($seq as xs:anyAtomicType*, $target as xs:anyAtomicType) - returns a sequence of integers that
     indicate where an item is within a sequence (index counting starts at 1)
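
A quick sketch of these four functions applied to one sample sequence (the sequence and element names are illustrative only):

   xquery version "1.0";

   let $seq := ('a', 'b', 'c', 'b')
   return
      <results>
         <!-- insert 'x' and 'y' before the item at position 2 -->
         <insert-before>{insert-before($seq, 2, ('x', 'y'))}</insert-before>
         <!-- drop the item at position 1 -->
         <remove>{remove($seq, 1)}</remove>
         <reverse>{reverse($seq)}</reverse>
         <!-- 'b' occurs twice, so two positions come back -->
         <index-of>{index-of($seq, 'b')}</index-of>
      </results>

The four elements contain a x y b c b, then b c b, then b c b a, and finally 2 4.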

The following two functions can be used inside the bracketed predicate expressions '[]', which operate on an item's position within a sequence.

  last() - returns the position of the last item (the size of the sequence), so (1,2,3)[last()] returns 3
  position() - returns the position of the current item within the sequence being filtered, so
     for $x in ('a', 'b', 'c', 'd')[position() mod 2 eq 1] return $x returns ('a', 'c')

You can use these two functions together to remove the last item in a sequence by comparing the current position to the last:

  ('a', 'b', 'c')[position() ne (last())]

Which works the same as:

  reverse(remove(reverse($input-sequence), 1))

but may run faster since the list does not need to be reversed.

Example of Sum Function

Let's imagine that we have a basket of items and we want to count the total number of items in the basket.

let $basket :=
   <basket>
      <item>
         <department>produce</department>
         <type>apples</type>
         <count>2</count>
      </item>
      <item>
         <department>produce</department>
         <type>banana</type>
         <count>3</count>
      </item>
      <item>
         <department>produce</department>
         <type>pears</type>
         <count>5</count>
      </item>
      <item>
         <department>hardware</department>
         <type>nuts</type>
         <count>7</count>
      </item>
      <item>
         <department>packaged-goods</department>
         <type>nuts</type>
         <count>20</count>
      </item>
   </basket>

To sum the counts of each item we will need to use an XPath expression to get the item counts:

  $basket/item/count


We can then total this sequence and return the result:

return
   <total>
      {sum($basket/item/count)}
   </total>

The result is 37 (2 + 3 + 5 + 7 + 20).
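
The same approach works for subtotals if you add a predicate to the path. The sketch below repeats the basket in a compact form so that it can be run on its own; the totals, all, produce and nuts element names are made up for this example.

   xquery version "1.0";

   let $basket :=
      <basket>
         <item><department>produce</department><type>apples</type><count>2</count></item>
         <item><department>produce</department><type>banana</type><count>3</count></item>
         <item><department>produce</department><type>pears</type><count>5</count></item>
         <item><department>hardware</department><type>nuts</type><count>7</count></item>
         <item><department>packaged-goods</department><type>nuts</type><count>20</count></item>
      </basket>
   return
      <totals>
         <all>{sum($basket/item/count)}</all>
         <!-- only the items whose department is produce -->
         <produce>{sum($basket/item[department = 'produce']/count)}</produce>
         <!-- only the items whose type is nuts, from any department -->
         <nuts>{sum($basket/item[type = 'nuts']/count)}</nuts>
      </totals>

The three totals are 37, 10 and 27.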

Tests on Sequences

You can also test to see if a sequence contains one or all of the items in another set. There are several methods to do this.

Finding if an Item is in a Sequence

Users find that XQuery is easy to use since it tries to do the right thing based on the data types you give it. XQuery checks whether you have a sequence, an XML element or a single string and performs the most logical operation. This behavior keeps your code compact and easy to read. If you are comparing an element with a string, XQuery will look inside the element and get the string for you, so you do not need to tell XQuery explicitly to use the content of the element. If you compare a sequence of items with a string using the "=" operator, XQuery will look for that string in the sequence and return true() if any item in the sequence equals it. It just works!

For example if we have the sequence:

 let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')

Now if we execute:

  return
  if ($sequence = 'd') then 
     true()
  else
     false()

it will return true() because it finds 'd' in the sequence of letters, and:

  if ($sequence = 'x') then
     true()
  else 
     false()

will return false() because 'x' is not in the sequence.

  • You can use the index-of() function to see where the item appears in the sequence. It returns the position of every match, or the empty sequence if the item is not present, so wrapping it in exists() gives true() or false().
  let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
  let $item := 'x'
  return 
     if (exists(index-of($sequence, $item))) then 
        true() 
     else 
        false()

Recall that index-of() returns the empty sequence, not 0, if the $item is not found in the $sequence. Testing the result directly in the if condition would raise an error when the item occurs more than once, because a sequence of several integers has no boolean value.

  • You can use a "Quantified" expression.
  some $str in $sequence satisfies ($str = 'e')

Which will also return the correct result. See the Wikibook article here: XQuery/Quantified_Expressions
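
A quantified expression can also test every item at once. In this sketch the some expression is the one shown above, and the every expression is an added illustration that checks that each item is a single character.

   xquery version "1.0";

   let $sequence := ('a', 'b', 'c', 'd', 'e', 'f')
   return
      <results>
         <!-- true if at least one item equals 'e' -->
         <some>{some $str in $sequence satisfies ($str = 'e')}</some>
         <!-- true only if all items are one character long -->
         <every>{every $str in $sequence satisfies (string-length($str) = 1)}</every>
      </results>

Both elements contain true for this sequence.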

Sorting Sequence

XQuery 1.0 has no "sort" function (a sort() function was only added in XQuery 3.1). To sort your sequence you create a new sequence with a FLWOR expression that loops over your items and has an order by clause in it.

  • For example if you have a list of items with titles as one of the elements you can use the following to sort the items by title:
  let $sorted-items :=
     for $item in $items
     order by $item/title/text()
     return $item
  • You can return the items sorted by their element name :
  let $sorted-items :=
     for $item in $items
     order by name($item)
     return $item
  • You can also use descending with order by to reverse the order :
  for $item in $items
  order by name($item) descending
  return $item
  • If you want to sort in your own order, create a separate sequence that lists the names in that order and use the index-of() function to find where each item's name is in that sequence:
  let $order := ("b", "a", "c")
  for $i in /root/*
  let $name := name($i)
  order by index-of($order, $name)
  return $i

Set Operations: Concatenation, Unions, Intersections and Exclusions

XQuery also provides functions to join sets and to find items that are in both sets.

Assume that we have two sets that contain overlapping items:

  let $sequence-1 := ('a', 'b', 'c', 'd')
  let $sequence-2 := ('c', 'd', 'e', 'f')

Concatenation

You can concatenate the two sequences by doing the following:

  let $both := ($sequence-1, $sequence-2)

or

  for $item in ( ($sequence-1, $sequence-2)) return $item

Which will return:

 a b c d c d e f

Union of sequence

You can also create a "union" set that removes duplicates for all items that are in both sets by using the distinct-values() function:

  distinct-values(($sequence-1, $sequence-2))

This will return the following:

  a b c d e f

Note that the "c d" pair is not repeated.

Intersection

You can now use a variation of this to find the intersection of all items in sequence-1 that are also in sequence-2:

  distinct-values($sequence-1[.=$sequence-2])

This will return only items that are in BOTH sequence-1 AND sequence-2:

  c d

The way you read this is "for each item in sequence-1, if this item (.) is also in sequence-2 then return it."

Exclusion

The last set operation you might want to do is the "exclusion" function, where we find all items in the first sequence that are NOT in the second sequence.

  distinct-values($sequence-1[not(.=$sequence-2)])

This will return

  a b
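
The expressions above work on atomic values such as strings. For sequences of nodes, XQuery also has the built-in operators union, intersect and except, which compare nodes by identity rather than by value. The following sketch assumes a small made-up letters document with l elements; it is not part of the examples above.

   xquery version "1.0";

   let $doc :=
      <letters>
         <l>a</l><l>b</l><l>c</l><l>d</l>
      </letters>
   (: two overlapping selections of the same nodes :)
   let $first := $doc/l[position() le 3]
   let $second := $doc/l[position() ge 2]
   return
      <results>
         <union>{$first union $second}</union>
         <intersect>{$first intersect $second}</intersect>
         <except>{$first except $second}</except>
      </results>

The union holds the l elements a, b, c and d, the intersection holds b and c, and the except result holds only a. These operators raise a type error if you give them strings or numbers.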

Returning Duplicates

The following example returns a list of all items that occur more than once in a sequence. This process is known as "duplicate detection."

Method 1: using distinct-values()

We can use the distinct-values() function on a sequence to find all the unique values in a sequence. We can then check to see if there are any items that occur twice.

xquery version "1.0";
 
let $seq := ('a', 'b', 'c', 'd', 'e', 'f', 'b', 'c')
let $distinct-value := distinct-values($seq)
 
(: for each distinct item if the count is greater than 1 then return it :)
let $duplicates :=
   for $item in $distinct-value
   return
      if (count($seq[.=$item]) > 1) then
         $item
      else 
         ()
 
return
   <results>
      <sequence>{string-join($seq, ', ')}</sequence>
      <distinct-values>{$distinct-value}</distinct-values>
      <duplicates>{$duplicates}</duplicates>
   </results>

This returns:

<results>
   <sequence>a, b, c, d, e, f, b, c</sequence>
   <distinct-values>a b c d e f</distinct-values>
   <duplicates>b c</duplicates>
</results>

You can also return only the items that occur exactly once by swapping the two branches of the if expression, so that () is in the then branch and $item is in the else branch:

     if (count($seq[. = $item]) > 1) then 
        ()
     else
        $item

Method 2: Using index-of()

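The following method uses the index-of() function to find the duplicates. For each value, index-of($values, .)[2] is the position of its second occurrence, so the outer predicate keeps only the item at that position. For the input below it returns (6, 3, 2). This method does not run under eXist 2.0.

let $values := (3, 4, 6, 6, 2, 7, 3, 1, 2)
return $values[index-of($values, .)[2]]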

Creating Sequences of Letters

You can use the codepoints functions to convert letters to numbers and numbers to letters. For example to generate a list of all the letters from a to z you can write the following XQuery:

let $number-for-a := string-to-codepoints('a')
let $number-for-z := string-to-codepoints('z')
for $letter in ($number-for-a to $number-for-z)
return
  codepoints-to-string($letter)

This will return a sequence of the following:

  ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z')


Creating Letter Collections

You can also use this to create a list of subcollections:

let $data-collection := '/db/apps/terms/data'
let $number-for-a := string-to-codepoints('a')
let $number-for-z := string-to-codepoints('z')
for $letter in ($number-for-a to $number-for-z)
  return
     xmldb:create-collection($data-collection, codepoints-to-string($letter) )

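This is a very common way to store related files in subcollections. Note that xmldb:create-collection() is an eXist-db extension function rather than part of standard XQuery.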

Counting Items

It is very common to need to count your items as you go through them. You can do this by adding the "at $count" to your FLWOR loop:

for $item at $count in $sequence
return
  <item>
     <count>{$count}</count>
     {if ($count mod 2) then <odd/> else <even/>}
  </item>

Note that the modulo (remainder after division) operator:

 ($count mod 2)

returns 1 for odd numbers, which gets converted to true(), and zero for even numbers, which gets converted to false. You can use this technique to make alternating rows of tables different colors.
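
As a sketch of the table idea, the query below writes an odd or even class attribute on each row. The sample data and the class names are assumptions, and a stylesheet would still be needed to give the two classes different colors.

   xquery version "1.0";

   let $sequence := ('apple', 'banana', 'carrot')
   return
      <table>{
         for $item at $count in $sequence
         return
            (: the braces inside the attribute value are evaluated as XQuery :)
            <tr class="{if ($count mod 2) then 'odd' else 'even'}">
               <td>{$count}</td>
               <td>{$item}</td>
            </tr>
      }</table>

The three rows get the classes odd, even and odd.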

Converting Sequences to a String

One of the most common things we do with a sequence is to convert it to a single string for display. We frequently want to put a separator string between the values but not after the final value. XQuery includes a very handy function for this called string-join. The format is:

  string-join($input-sequence as xs:string*, $separator-string as xs:string) as xs:string

For example the output of:

 let $sequence := ('a', 'b', 'c')
 return string-join($sequence, '--')

would be:

 a--b--c

Note that there is no "--" after the last string. The separator is only used between the items of a sequence.
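
string-join() is also handy for turning element content into a comma-separated list. This sketch uses a cut-down, made-up basket; the untyped type elements are atomized to their string values when they are passed to the function.

   xquery version "1.0";

   let $basket :=
      <basket>
         <item><type>apples</type></item>
         <item><type>banana</type></item>
         <item><type>pears</type></item>
      </basket>
   (: join the text of every type element with a comma and a space :)
   return string-join($basket/item/type, ', ')

This returns the single string apples, banana, pears.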

Combining Sequence Operations

It is very common to need to "chain" sequence operations in a linear sequence of steps. For example if you wanted to sort a sequence of items and then select the first 10 items your query might look like the following:

(: this gets the name of each item from the input :)
let $input-sequence := doc('/db/apps/items-manager/data')//item/name/text()
 
let $sorted-items :=
  for $item in $input-sequence
    order by $item
    return $item
return
<ol>{
for $item at $count in subsequence($sorted-items, 1, 10)
return
  <li>{
     (: this puts an even or odd class attribute in the li;
        the attribute has to come before the text content :)
     attribute class { if ($count mod 2) then 'odd' else 'even' },
     $item
  }</li>
}</ol>

This technique can be used to paginate search results so that users see the first 10 results of a search. A control can then be used to get the next N items from the search result.
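
A minimal paging sketch follows. The $start and $page-size variables are hypothetical; in a real application they would usually come from request parameters, and the item list here is invented sample data.

   xquery version "1.0";

   let $sorted-items := ('apple', 'banana', 'carrot', 'dog', 'egg', 'fig', 'grape')
   (: hypothetical paging values: show 3 items, starting at item 4 :)
   let $page-size := 3
   let $start := 4
   return
      <ol start="{$start}">{
         for $item in subsequence($sorted-items, $start, $page-size)
         return <li>{$item}</li>
      }</ol>

This returns the second page of three items: dog, egg and fig. Adding $page-size to $start gives the starting position of the next page.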