XQuery/All Paths
Motivation
You want to generate a list of all the unique path expressions that occur in a document.
This is a quick way to get familiar with a new data set. It also helps you check that your document-style transforms handle every element that actually occurs. The list can also serve as a basis for generating index files for a new data set.
Example Output
Listing the unique paths for a sample file from the Shakespeare demos on the eXist demo system, /db/shakespeare/plays/hamlet.xml, generates the following results.
PLAY
PLAY/TITLE
PLAY/FM
PLAY/FM/P
PLAY/PERSONAE
PLAY/PERSONAE/TITLE
PLAY/PERSONAE/PERSONA
PLAY/PERSONAE/PGROUP
PLAY/PERSONAE/PGROUP/PERSONA
PLAY/PERSONAE/PGROUP/GRPDESCR
PLAY/SCNDESCR
PLAY/PLAYSUBT
PLAY/ACT
PLAY/ACT/TITLE
PLAY/ACT/SCENE
PLAY/ACT/SCENE/TITLE
PLAY/ACT/SCENE/STAGEDIR
PLAY/ACT/SCENE/SPEECH
PLAY/ACT/SCENE/SPEECH/SPEAKER
PLAY/ACT/SCENE/SPEECH/LINE
PLAY/ACT/SCENE/SPEECH/STAGEDIR
PLAY/ACT/SCENE/SPEECH/LINE/STAGEDIR
Note that these path expressions are listed in document order, that is, the order in which each path first appears in the document. You can see, for example, that the cast list under PERSONAE appears before the ACT/SCENE elements. The output can also be sorted alphabetically.
Method
We will use the FunctX library.
In particular, the function
functx:distinct-element-paths($nodes)
takes a sequence of nodes as its input and returns a sequence of strings, one for each distinct element path.
See the documentation on xqueryfunctions.com.
distinct-element-paths function
xquery version "1.0";

declare namespace functx = "http://www.functx.com";

(: Build the slash-separated path of element names for each node. :)
declare function functx:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

(: Collect the paths of all descendant elements and drop duplicates. :)
declare function functx:distinct-element-paths($nodes as node()*) as xs:string* {
    distinct-values(functx:path-to-node($nodes/descendant-or-self::*))
};

(: Return the items in ascending order. :)
declare function functx:sort($seq as item()*) as item()* {
    for $item in $seq
    order by $item
    return $item
};

(: Replace NAMEOFCOLLECTION with the path to your collection. :)
let $in-xml := collection("NAMEOFCOLLECTION")
return functx:sort(functx:distinct-element-paths($in-xml))
The heart of this query is the single expression:
ancestor-or-self::*/name(.)
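This says, in effect, "get me the names of the current element and all of its ancestor elements, starting from the root". The string-join() call around it joins those names with '/' to form the path to the node. The next step is to turn the paths of all the elements into a list of distinct element paths. This is done by the function functx:distinct-element-paths(), which applies functx:path-to-node() to every descendant element and removes the duplicates with distinct-values().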
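To see what this expression produces, here is a minimal sketch that evaluates it for one element. The inline document is made up for illustration and is not part of the Shakespeare data.

```xquery
xquery version "1.0";

(: A small inline document, invented for this illustration. :)
let $doc :=
    <PLAY>
        <ACT>
            <SCENE>
                <SPEECH><LINE>To be, or not to be</LINE></SPEECH>
            </SCENE>
        </ACT>
    </PLAY>
let $line := ($doc//LINE)[1]
return (
    (: The names of LINE and its ancestors, outermost element first. :)
    $line/ancestor-or-self::*/name(.),
    (: The same names joined into a single path string. :)
    string-join($line/ancestor-or-self::*/name(.), '/')
)
(: Result: "PLAY", "ACT", "SCENE", "SPEECH", "LINE",
   "PLAY/ACT/SCENE/SPEECH/LINE" :)
```

The names come out root-first because the ancestor-or-self axis, used in a path step, returns its nodes in document order.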
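Working with a single test document
Use the doc() function to load a single document, for example doc("/db/shakespeare/plays/hamlet.xml"), and pass the result to functx:distinct-element-paths().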
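As a rough sketch, the query below lists the distinct paths of the Hamlet sample file without declaring any functions. It inlines the bodies of functx:path-to-node() and functx:distinct-element-paths(), and it assumes the document is stored at the path given in the example output section.

```xquery
xquery version "1.0";

(: Assumption: the sample play is stored at this path in the database. :)
let $play := doc("/db/shakespeare/plays/hamlet.xml")
return
    (: For every element, build its path, then keep each path once. :)
    distinct-values(
        $play/descendant-or-self::*/string-join(ancestor-or-self::*/name(.), '/')
    )
```

Wrap the result in functx:sort(), or add an order by clause, if you prefer alphabetical output.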
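Working with a document collection
Use the collection() function to process every document in a collection, as in the query above, where NAMEOFCOLLECTION stands for the path to your collection.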
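The following sketch runs the same function over a whole collection and wraps the sorted result in XML. It rests on two assumptions: that the FunctX library is installed so the module import resolves by namespace alone, and that the plays are stored in /db/shakespeare/plays. The paths and path element names are chosen here for illustration.

```xquery
xquery version "1.0";

(: Assumption: the FunctX library is installed so that this import resolves;
   otherwise declare the functions inline as in the listing above. :)
import module namespace functx = "http://www.functx.com";

(: Assumption: the plays are stored in /db/shakespeare/plays. :)
let $paths := functx:distinct-element-paths(collection("/db/shakespeare/plays"))
return
    (: The paths and path element names are illustrative only. :)
    <paths count="{count($paths)}">{
        for $path in $paths
        order by $path
        return <path>{$path}</path>
    }</paths>
```

Because distinct-values() is applied across the whole collection, a path that occurs in several documents is reported only once.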
Creating a Web Service
Acknowledgments
David Elwell posted this suggestion on the open-exist list on July 22, 2010.