XQuery/All Leaf Paths


Motivation

You want to generate a list of all leaf paths in a document or document collection.

This process is a useful way to get to know a new data set. In data-style markup, the leaf elements of an XML file carry most of the data, and therefore most of the semantics or meaning of the document. They form the basis for a semantic inventory of the document: each leaf element should be associated with a data definition.

Leaf elements are also good targets for indexing within your index configuration file.

Example

Method

We will use the FunctX leaf-elements() function:

  functx:leaf-elements($root as node()?) as element()*

This function takes a single, optional node as input and returns a sequence of the leaf elements at or below that node.

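Example Output

For the demo play Hamlet, which is included in the eXist demo set as /db/shakespeare/plays/hamlet.xml, the example query generates the output below. The list is shown in document order and contains every distinct element name, so it also includes container elements such as PLAY and ACT, not only leaves: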

PLAY
TITLE
FM
P
PERSONAE
PERSONA
PGROUP
GRPDESCR
SCNDESCR
PLAYSUBT
ACT
SCENE
STAGEDIR
SPEECH
SPEAKER
LINE

Source Code to leaf-elements

declare namespace functx = "http://www.functx.com"; 
declare function functx:leaf-elements ($root as node()?) as element()* {
   $root/descendant-or-self::*[not(*)]
};

This function uses the descendant-or-self::* axis step with the predicate [not(*)] to select only elements that have no child elements. A leaf element may still contain text.
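As a minimal sketch of how the predicate behaves, the query below applies the same function to a small inline document. The sample data is hypothetical and is not part of the eXist demo set.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

(: Same definition as shown above. :)
declare function functx:leaf-elements ($root as node()?) as element()* {
   $root/descendant-or-self::*[not(*)]
};

(: Hypothetical sample data, used only for illustration. :)
let $sample :=
  <book>
    <title>Hamlet</title>
    <author>
      <first>William</first>
      <last>Shakespeare</last>
    </author>
  </book>

(: book and author have child elements, so they are filtered out. :)
for $leaf in functx:leaf-elements($sample)
return local-name($leaf)
```

Under these assumptions the query returns the names title, first and last. The container elements book and author are skipped because each has at least one child element.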

Example XQuery

xquery version "1.0";
declare namespace functx = "http://www.functx.com";
declare function functx:distinct-element-names($nodes as node()*) as xs:string* {
   distinct-values($nodes/descendant-or-self::*/local-name(.))
};
 
let $doc := doc('/db/shakespeare/plays/hamlet.xml')
 
let $distinct-element-names := functx:distinct-element-names($doc)
 
let $distinct-element-names-count := count($distinct-element-names)
 
return
<ol>{
  for $distinct-element-name in $distinct-element-names
  order by $distinct-element-name
  return
      <li>{$distinct-element-name}</li>
}</ol>
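The query above lists every distinct element name. To list the distinct paths that end in a leaf element instead, one possible variation combines the [not(*)] predicate with a path-building helper. This is a sketch, not the FunctX implementation: local:path-to is a hypothetical helper introduced here for illustration.

```xquery
xquery version "1.0";

(: Hypothetical helper, not part of FunctX: joins the names of all
   ancestors of $element, and of $element itself, with slashes. :)
declare function local:path-to($element as element()) as xs:string {
   concat('/', string-join($element/ancestor-or-self::*/local-name(.), '/'))
};

let $doc := doc('/db/shakespeare/plays/hamlet.xml')

(: Keep only elements without child elements, then de-duplicate their paths. :)
let $leaf-paths :=
   distinct-values(
      for $leaf in $doc/descendant-or-self::*[not(*)]
      return local:path-to($leaf)
   )

return
<ol>{
  for $path in $leaf-paths
  order by $path
  return
      <li>{$path}</li>
}</ol>
```

Each list item is then a full path from the document element down to a leaf, so the same leaf name used in two different contexts shows up as two separate entries.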

Adding Attributes

You can also run a query that returns all the distinct attribute names. Attributes always count as leaf data, since an attribute can never have child elements.

declare function functx:distinct-attribute-names($nodes as node()*)  as xs:string* {
   distinct-values($nodes//@*/name(.))
};

This query says, in effect, "get all the distinct attribute names in the input nodes".

For the MODS demo file: doc('/db/mods/01c73f2b05650de2e6124d9d113f40be.xml')

You will get the following attributes:

  1. type
  2. encoding
  3. authority
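To try the function without the MODS demo file, the following sketch runs it against a small inline fragment. The sample record is hypothetical and only loosely modelled on MODS-style markup.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

(: Same definition as shown above. :)
declare function functx:distinct-attribute-names($nodes as node()*)  as xs:string* {
   distinct-values($nodes//@*/name(.))
};

(: Hypothetical sample data, used only for illustration. :)
let $sample :=
  <record>
    <name type="personal" authority="naf">Shakespeare, William</name>
    <date encoding="w3cdtf">1603</date>
    <title type="main">Hamlet</title>
  </record>

(: type occurs twice in the sample but is reported only once. :)
return functx:distinct-attribute-names($sample)
```

With this sample the result contains the three names type, authority and encoding. Their order may differ between XQuery processors, because neither attribute order nor the order of distinct-values() results is fixed by the language.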


References

Documentation on xqueryfunctions.com web site.