XQuery/All Paths

Motivation

You want to generate a list of all unique path expressions in a document.

This is a quick way to get familiar with a new data set. It also helps you check that your document-style transforms handle every element, and the resulting list can serve as a basis for generating index files for a new data set.

Example Output

Generating the list of unique paths for a sample file from the Shakespeare demos on the eXist demo system, /db/shakespeare/plays/hamlet.xml, produces the following results.

PLAY
PLAY/TITLE
PLAY/FM
PLAY/FM/P
PLAY/PERSONAE
PLAY/PERSONAE/TITLE
PLAY/PERSONAE/PERSONA
PLAY/PERSONAE/PGROUP
PLAY/PERSONAE/PGROUP/PERSONA
PLAY/PERSONAE/PGROUP/GRPDESCR
PLAY/SCNDESCR
PLAY/PLAYSUBT
PLAY/ACT
PLAY/ACT/TITLE
PLAY/ACT/SCENE
PLAY/ACT/SCENE/TITLE
PLAY/ACT/SCENE/STAGEDIR
PLAY/ACT/SCENE/SPEECH
PLAY/ACT/SCENE/SPEECH/SPEAKER
PLAY/ACT/SCENE/SPEECH/LINE
PLAY/ACT/SCENE/SPEECH/STAGEDIR
PLAY/ACT/SCENE/SPEECH/LINE/STAGEDIR

Note that these path expressions are listed in document order, that is, the order in which each path first appears in the document. You can see that the cast list in PERSONAE appears before the ACT and SCENE elements. The output can also be sorted alphabetically, which is what the functx:sort() call in the query below does.
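The difference between the two orderings can be sketched on a small, made-up fragment (the element names are borrowed from the listing above; the content is hypothetical). The sketch uses the same path-building expression that the Method section explains. The XQuery specification leaves the order of distinct-values() results implementation-dependent, so the first list shows whatever order your processor returns, which is often but not necessarily document order.

```xquery
xquery version "1.0";

(: Hypothetical sample fragment, not the real hamlet.xml. :)
let $doc :=
    <PLAY>
        <TITLE>Sample</TITLE>
        <PERSONAE><PERSONA>A</PERSONA></PERSONAE>
        <ACT><TITLE>Act 1</TITLE></ACT>
    </PLAY>

(: One path string per element, with duplicates removed. :)
let $paths := distinct-values(
    $doc/descendant-or-self::*/string-join(ancestor-or-self::*/name(.), '/')
)

return
    <result>
        <as-returned>{ for $p in $paths return <path>{ $p }</path> }</as-returned>
        <alphabetical>{ for $p in $paths order by $p return <path>{ $p }</path> }</alphabetical>
    </result>

(: The alphabetical list is PLAY, PLAY/ACT, PLAY/ACT/TITLE,
   PLAY/PERSONAE, PLAY/PERSONAE/PERSONA, PLAY/TITLE. :)
```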

Method

We will use the FunctX library.

In particular, the function

 functx:distinct-element-paths($nodes)

takes a sequence of nodes as its input and returns a sequence of strings, one for each distinct element path found at or below those nodes.

See the documentation on xqueryfunctions.com.

distinct-element-paths function

xquery version "1.0";
declare namespace functx = "http://www.functx.com";

(: Build the slash-separated path of element names from the root to each node. :)
declare function functx:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

(: Collect the path of every element at or below $nodes and drop duplicates. :)
declare function functx:distinct-element-paths($nodes as node()*) as xs:string* {
    distinct-values(functx:path-to-node($nodes/descendant-or-self::*))
};

(: Return the items in ascending order. :)
declare function functx:sort($seq as item()*) as item()* {
    for $item in $seq
    order by $item
    return $item
};

(: Replace NAMEOFCOLLECTION with the path to your collection. :)
let $in-xml := collection("NAMEOFCOLLECTION")

return functx:sort(functx:distinct-element-paths($in-xml))

The heart of this query is the single expression:

  ancestor-or-self::*/name(.)

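This says, in effect, "get me the names of the current element and of all its ancestors, outermost first". The surrounding string-join() call joins those names with "/" to form the path of a single element. The next step is to build this path for every element in the document and reduce the result to a list of distinct paths. This is done by the function functx:distinct-element-paths().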
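To see what the expression returns for a single element, here is a minimal sketch on a made-up fragment. The element names are borrowed from the Shakespeare markup, but the content and the variable names are hypothetical.

```xquery
xquery version "1.0";

(: Hypothetical sample fragment, not the real hamlet.xml. :)
let $play :=
    <PLAY>
        <TITLE>Sample</TITLE>
        <ACT>
            <SCENE>
                <SPEECH>
                    <SPEAKER>A</SPEAKER>
                    <LINE>First line</LINE>
                </SPEECH>
            </SCENE>
        </ACT>
    </PLAY>

(: Pick one deeply nested element to start from. :)
let $line := $play/ACT/SCENE/SPEECH/LINE

(: Names of that element and all of its ancestors, outermost first. :)
let $names := $line/ancestor-or-self::*/name(.)

(: Returns the string "PLAY/ACT/SCENE/SPEECH/LINE". :)
return string-join($names, '/')
```

Although ancestor-or-self is a reverse axis, the nodes selected by a path step are delivered in document order, which is why the root element name comes first in the joined string.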

Working with a single test document

To run the query against a single test document, load it with the doc() function instead of collection().
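A minimal sketch for a single document, using the standard XQuery doc() function. It assumes the Hamlet sample is stored at the path given in the example output above and that your processor can resolve that path. The path-building expression is inlined rather than taken from FunctX so that the query runs on its own.

```xquery
xquery version "1.0";

(: Assumed location of the sample file; adjust it to where your copy lives. :)
let $in-xml := doc("/db/shakespeare/plays/hamlet.xml")

(: One path string per element in the document, then duplicates removed. :)
return distinct-values(
    $in-xml//*/string-join(ancestor-or-self::*/name(.), '/')
)
```

Wrap the result in an order by clause, or in functx:sort() if the FunctX functions are declared, to get the paths alphabetically.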

Working with a document collection

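To run the query against every document in a collection, pass the collection path to the collection() function, as the query above does with the NAMEOFCOLLECTION placeholder.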

Creating a Web Service

Acknowledgments

David Elwell posted this suggestion on the open-exist list on July 22, 2010.