XQuery/Typeswitch Transformations

Motivation

You have an XML document that you want to transform into a different format of XML. You want to control and customize the transformation process, and you want a modular way to store the transformation rules so that you or others can easily modify and maintain them.

Background on using XQuery vs. XSLT for Document Transformation

You may have heard the conventional wisdom that "XQuery is best for querying or selecting XML, and XSLT is best for transforming it." In reality, both methods are capable of transforming XML. Despite XSLT's somewhat longer history and larger install base, the "XQuery typeswitch" method of transforming XML provides numerous advantages. These are covered in more detail in XQuery Benefits.

Method

We will use XQuery's typeswitch expression to transform an XML document from one form into another. The basic approach is straightforward: for each XML node in the input document, we specify what should be created in the output document. The typeswitch expression performs this core function of deciding what happens to each node in the source document. We will write an XQuery function that takes a node, tests it using a typeswitch expression, and dispatches it to the appropriate handler function. The handler transforms the node into the new format and sends the node's children back to the main function by way of the passthru function. This recursive routine walks through a node and all of its descendants, transforming them into the target format. Once the structure has been set up, the transform is easy to modify, even if the tags in the input document are deeply nested. (This style of recursive descent will be familiar to users of XSLT, where template rules and xsl:apply-templates play the same role, but there is no XSLT prerequisite for this article.)

Example Data

Suppose you have a simple XML document that you would like to transform:

Sample Input Document

<bill>
  <!-- This is a XML comment -->
  <btitle>This is the Bill title</btitle>
  <section-id>1</section-id>
  <bill-text>This is the text with <strike>many</strike> examples.</bill-text>
</bill>

Sample Output Document

Here is the format that you would like to turn the source input into:

<Bill>
  <!-- This is a XML comment -->
  <BillTitle>This is the Bill title</BillTitle>
  <BillSectionID>1</BillSectionID>
  <BillText>This is the text with <del>many</del> examples.</BillText>
</Bill>

Options for Typeswitch Transforms

There are two important choices to make when you create a typeswitch transform. The first is whether the main function takes a single node, declared as node(), or a sequence of nodes, declared as node()*. A single-node function needs a helper to loop over child nodes; a sequence function loops over its own argument.

The second is the default action for nodes that no case in your typeswitch expression matches. The default can either pass such nodes through, continuing to process their children, or remove them from the output.
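
The two default behaviours can be sketched side by side. This is only an illustration under stated assumptions, not part of the transformation built in the rest of this article: the function names local:keep-unmatched and local:drop-unmatched, the <note> element, and the <result> wrapper are all hypothetical. Both functions use the sequence-of-nodes form of the parameter.

```xquery
xquery version "1.0";

(: Hypothetical sketch: two possible defaults for unmatched nodes. :)

(: Default 1: unwrap an unmatched element but keep processing its children. :)
declare function local:keep-unmatched($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case element(strike) return <del>{local:keep-unmatched($node/node())}</del>
            default return local:keep-unmatched($node/node())
};

(: Default 2: discard an unmatched node together with everything inside it. :)
declare function local:drop-unmatched($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case element(strike) return <del>{local:drop-unmatched($node/node())}</del>
            default return ()
};

(: <note> is not matched by any case, so the two defaults treat it differently. :)
let $input := <p>Some <strike>old</strike> and <note>extra <strike>x</strike></note> text</p>
return
    <result>
        <kept>{local:keep-unmatched($input/node())}</kept>
        <dropped>{local:drop-unmatched($input/node())}</dropped>
    </result>
```

In the <kept> branch the contents of <note> survive without the <note> tags; in the <dropped> branch the <note> element and its contents disappear entirely.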

Example Transformation With Typeswitch

The most effective way to use the typeswitch expression to transform XML is to create a series of XQuery functions. In this way, we can cleanly separate the major actions of the transformation into modular functions. (In fact, the library of functions can be saved into an XQuery library module, which can then be reused by other XQueries.) The "magic" of this typeswitch-style transformation is that once you understand the basic pattern and structure of the functions, you can adapt them to your own data. You'll find that the structure is so modular and straightforward that it's even possible to teach others the basics of the pattern in a short period of time and empower them to maintain and update the transformation rules themselves.

The first function in our module is where the typeswitch expression is located. This function is conventionally called the "dispatch" function:

declare function local:dispatch($node as node()) as item()* {
    typeswitch($node)
        case text() return $node
        case comment() return $node
        case element(bill) return local:bill($node)
        case element(btitle) return local:btitle($node)
        case element(section-id) return local:section-id($node)
        case element(bill-text) return local:bill-text($node)
        case element(strike) return local:strike($node)
        default return local:passthru($node)
};

Notice that the typeswitch expression tests the input node against a list of criteria: is the node a text node, a comment node, a bill element, a btitle element, a section-id element, and so on? The cases are tested in order, and the first one that matches is used. If the node is a text node (e.g. "This is the Bill title"), we simply return it unmodified. (The text() test comes first because text() is likely to be the most plentiful node type in a text-rich document, and testing for the most common type first can save work.)

If instead the node is a bill element, we pass it to the aptly named local:bill() function for bill-specific handling. The local:bill() function (see below) turns the <bill> element into a <Bill> element and hands the node to local:passthru(), which sends its children back through local:dispatch().

If the node doesn't match any of the preceding cases, the typeswitch expression falls back to the required final "default" clause. In our example, the default sends unmatched nodes to local:passthru(), which (as defined in the next section) skips the unmatched node itself but still processes its children.

(Typeswitch isn't limited to matching text() and element() nodes; it can also match the other node kinds, such as comment() and processing-instruction(). An attribute() test is legal too, but it is rarely useful here because the recursion follows child nodes, and attributes are not children. Attributes are conventionally dealt with inside the handler function of the attribute's parent element rather than in the core typeswitch function.)
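
As a sketch of the convention of handling attributes in the parent element's handler, the function below renames an attribute while building the output element. The @id attribute, the BillTextID name, and the function name local:bill-text-with-attrs are assumptions made for illustration; the sample input in this article has no attributes. To keep the sketch self-contained, child nodes are copied as they are instead of being sent through a dispatch function.

```xquery
xquery version "1.0";

(: Hypothetical handler: the attribute is dealt with here, inside the handler
   of its parent element, rather than by a case in the typeswitch. :)
declare function local:bill-text-with-attrs($node as element(bill-text)) as element() {
    <BillText>{
        (: rename @id to @BillTextID if present; attributes must precede child content :)
        (for $id in $node/@id return attribute BillTextID {string($id)}),
        (: children are copied unchanged in this sketch :)
        $node/node()
    }</BillText>
};

local:bill-text-with-attrs(<bill-text id="s1">Some text.</bill-text>)
```

This should return <BillText BillTextID="s1">Some text.</BillText>. In a full transformation the handler would recurse into the children instead of copying them.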

The Passthru Function

The passthru() function recurses through a given node's children, handing each of them back to the main typeswitch operation.

declare function local:passthru($nodes as node()*) as item()* {
    for $node in $nodes/node() return local:dispatch($node)
};

(Note: This is such a simple function that it may appear extraneous. Why not simply call local:dispatch() directly on $node/node()? Because local:dispatch() is declared to take exactly one node, passing it a sequence of children would raise a type error, so each call site would need its own for loop. The passthru function writes that loop once, which keeps the handler functions short. A secondary benefit is that it provides a single place to filter nodes before they are sent to the typeswitch routine.)
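
The filtering benefit mentioned in the note can be sketched as follows. This variant skips whitespace-only text nodes before dispatching. The name local:passthru-filtered is hypothetical, and the small local:dispatch() shown here is a stand-in so that the example runs on its own; it is not the dispatch function of this article.

```xquery
xquery version "1.0";

(: Keep whitespace-only text in the literal input below, so there is something to filter. :)
declare boundary-space preserve;

(: Stand-in dispatch function with a single element rule. :)
declare function local:dispatch($node as node()) as item()* {
    typeswitch ($node)
        case text() return $node
        case element(strike) return <del>{local:passthru-filtered($node)}</del>
        default return local:passthru-filtered($node)
};

(: Like local:passthru(), but drops whitespace-only text nodes first. :)
declare function local:passthru-filtered($nodes as node()*) as item()* {
    for $node in $nodes/node()[not(self::text()) or normalize-space(.) ne '']
    return local:dispatch($node)
};

let $input := <bill><btitle>Title</btitle>   <strike>many</strike></bill>
return <out>{local:dispatch($input)}</out>
```

With this input the whitespace between the two child elements is dropped, and the query should return <out>Title<del>many</del></out>.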

The Alternative Passthru Function

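Because node() selects only child nodes, the above local:passthru() function never copies attributes, and it discards the element it is given, keeping only the transformed children. If you would like unmatched elements to be copied to the output together with their attributes, use the following passthru() function as an alternative in the default branch of local:dispatch(). It rebuilds the element it receives, so handler functions that wrap their result in a new element should keep using the first version; define the two under different names if you need both in one module.

declare function local:passthru($node as node()) as item()* {
    (: assumes $node is an element: rebuild it, copy its attributes, dispatch each child :)
    element {node-name($node)} {
        $node/@*, for $child in $node/node() return local:dispatch($child)
    }
};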

Functions to Handle Each Element

declare function local:bill($node as element(bill)) as element() {
    <Bill>{local:passthru($node)}</Bill>
};
declare function local:btitle($node as element(btitle)) as element() {
    <BillTitle>{local:passthru($node)}</BillTitle>
};
declare function local:section-id($node as element(section-id)) as element() {
    <BillSectionID>{local:passthru($node)}</BillSectionID>
};
declare function local:strike($node as element(strike)) as element() {
    <del>{local:passthru($node)}</del>
};
declare function local:bill-text($node as element(bill-text)) as element() {
    <BillText>{local:passthru($node)}</BillText>
};

Execute the transformation

We can now write a query that takes the source XML and uses the local:dispatch() function to transform the input into the target format:

let $input :=
  <bill>
    <!-- This is a XML comment --> 
    <btitle>This is the Bill title</btitle>
    <section-id>1</section-id>
    <bill-text>This is the text with <strike>many</strike> examples.</bill-text>
  </bill>
return 
  local:dispatch($input)

Compact approach

While the above approach is recommended as the most modular, extensible approach, it is perfectly acceptable to express the same transformation using a more compact, self-contained function:

declare function local:transform($nodes as node()*) as item()* {
    for $node in $nodes
    return 
        typeswitch($node)
            case text() return $node
            case comment() return $node
            case element(bill) return element Bill {local:transform($node/node())}
            case element(btitle) return element BillTitle {local:transform($node/node())}
            case element(section-id) return element BillSectionID {local:transform($node/node())}
            case element(strike) return element del {local:transform($node/node())}
            case element(bill-text) return element BillText {local:transform($node/node())}
            default return local:transform($node/node())
};

Besides the fact that this function is entirely self-contained (beginning with a FLWOR expression and using $node/node() to recurse through child nodes), notice that the function uses computed element constructors to accomplish the transformation.
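
One reason to prefer computed element constructors is that the element name can be calculated at run time. The sketch below is an illustration only: the function name local:rename and the upper-casing rule are hypothetical and are not part of the transformation in this article.

```xquery
xquery version "1.0";

(: Hypothetical helper: derive each output element name from the input name,
   e.g. "bill-text" becomes "BILL-TEXT". :)
declare function local:rename($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case element() return
                (: the name expression is evaluated for every element :)
                element {upper-case(local-name($node))} {local:rename($node/node())}
            default return $node
};

local:rename(<bill><bill-text>Some <strike>old</strike> text</bill-text></bill>)
```

This should return <BILL><BILL-TEXT>Some <STRIKE>old</STRIKE> text</BILL-TEXT></BILL>. A direct constructor such as <Bill>{...}</Bill> cannot do this, because its name is fixed when the query is written.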

Conclusion

This is the heart of the XQuery typeswitch approach to XML document transformation. On the basis of this simple pattern, entire libraries have been written to transform source formats like TEI, DocBook, and Office Open XML documents into other formats like XHTML, XSL-FO, and each other.

While we can create typeswitch modules by hand, building them up element by element, we can also use XQuery to generate a skeleton typeswitch module; see this article's companion article, XQuery/Generating_Skeleton_Typeswitch_Transformation_Modules. In addition to the "skeleton generator", the companion article provides examples of more complex transformation patterns with XQuery typeswitch: changing an element's name, ignoring an element, transforming an element differently depending on its context, and reordering elements. It also provides a detailed comparison of the XQuery and XSLT approaches to the same example transformation, so it is useful for readers coming from the world of XSLT.
