XQuery/Lorum Ipsum text

From Wikibooks, open books for an open world

Motivation

You want to create realistically sized example XML for testing or demonstration. Lorem ipsum text is often used to fill out the contents, and it would be useful to be able to add this text wherever it is needed in an XML file.

We explore two approaches: one that modifies the serialized text of the document, and one that modifies the XML tree directly.

Approach 1: string replacement

The places in the incomplete XML file where lorem ipsum text is to go are marked with an ellipsis, written as three full stops ("..."). The XML file is read, serialised to a string and split at each ellipsis; the parts are then reassembled with a randomly chosen stretch of lorem ipsum text in place of each ellipsis. Finally, the string is parsed back into XML for output. The base lorem ipsum text is stored as an XML file:

http://www.cems.uwe.ac.uk/xmlwiki/apps/lorumipsum/words.xml

Concepts used

  • XML <> string conversion: the script uses a pair of functions from the eXist util module (util:serialize and util:parse) to convert back and forth between XML and a string. This allows the XML to be operated on as a simple string before being converted back to XML.
  • recursion: interpolating the random text into the original string is done with a recursive function.
  • regular expressions: regular expressions are used to tokenise the lorem ipsum text and the incomplete XML file containing the ellipses.
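The regular-expression step can be tried on its own. The sketch below uses only standard XQuery functions (no eXist modules), and its sample strings are invented for illustration. It shows why the full stops in the marker pattern must be escaped, and how splitting on "\s+" after normalize-space() yields clean words.

```xquery
(: Illustration only: the sample strings are invented for this sketch. :)
let $marked := "<p>...</p><p>...</p>"
(: "\.\.\." matches three literal full stops :)
let $escaped := tokenize($marked, "\.\.\.")
(: "..." matches ANY three characters, so the string is chopped into 3-character chunks :)
let $unescaped := tokenize($marked, "...")
(: normalize-space() removes leading/trailing blanks, so no empty first token appears :)
let $words := tokenize(normalize-space("  lorem ipsum   dolor sit "), "\s+")
return (
  count($escaped),    (: 3 parts: "<p>", "</p><p>", "</p>" :)
  count($unescaped),  (: 7 parts, six of them empty strings :)
  count($words)       (: 4 words :)
)
(: returns (3, 7, 4) :)
```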

XQuery

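(: Join the parts back together, putting a random stretch of words between each pair :)
declare function local:join-random($parts as xs:string*, $words as xs:string*) as xs:string? {
  if (count($parts) > 1)
  then
    (: random start position and random length, each taken from util:random(100) :)
    let $randomtext := string-join(subsequence($words, util:random(100), util:random(100)), " ")
    (: first part + random text + the remaining parts, joined the same way :)
    return string-join(($parts[1], $randomtext, local:join-random(subsequence($parts, 2), $words)), "")
  else $parts
};

let $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum
(: normalize-space avoids an empty first word when the text starts with whitespace :)
let $words := tokenize(normalize-space($lorumipsum), "\s+")
(: path of the incomplete XML file, taken from the "file" request parameter :)
let $file := request:get-parameter("file", ())
let $doc := doc($file)/*
(: serialize the root element and split the string at each "..." :)
let $docText := util:serialize($doc, "media-type=text/xml method=xml")
let $parts := tokenize($docText, "\.\.\.")
let $completedText := local:join-random($parts, $words)
(: parse the completed string back into an XML element :)
return util:parse($completedText)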

Example

Explanation

  • the lorem ipsum text is split into words by tokenising on whitespace.
  • the incomplete XML is fetched and its root element accessed.
  • this element is converted to a string using the util:serialize function, then tokenized with the pattern "\.\.\." (not "...", since . matches any single character in regular expressions).
  • the recursive function join-random() concatenates the first string in the sequence, a random stretch of lorem ipsum text, and the result of applying itself to the remaining strings; when only one string is left, it is returned unchanged.
  • the expanded text is converted back to an XML element using util:parse().
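For readers without eXist, the same steps can be sketched in standard XQuery 3.1, assuming a processor that supports fn:serialize, fn:parse-xml and fn:random-number-generator. This is not the original script: the word list and the sample element are invented stand-ins for words.xml and the "file" parameter, and util:random is replaced by the standard seeded generator, whose state has to be passed along the recursion.

```xquery
xquery version "3.1";
(: Sketch only: word list and sample document are invented for illustration. :)
declare variable $words as xs:string* :=
  tokenize("lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor", "\s+");

(: Join the parts, inserting a random run of words between each pair.
   The generator map is threaded through the recursion so that every gap
   gets fresh random numbers. :)
declare function local:join-random($parts as xs:string*, $rng as map(*)) as xs:string {
  if (count($parts) le 1)
  then string-join($parts, "")
  else
    let $next := $rng?next()
    let $start := 1 + xs:integer(floor($rng?number * count($words)))
    let $length := 1 + xs:integer(floor($next?number * 5))
    let $text := string-join(subsequence($words, $start, $length), " ")
    return $parts[1] || $text || local:join-random(subsequence($parts, 2), $next?next())
};

let $doc := <article><title>...</title><p>...</p></article>
(: XML -> string, then split at each literal "..." :)
let $parts := tokenize(serialize($doc), "\.\.\.")
(: string -> document node; /* selects the root element :)
return parse-xml(local:join-random($parts, random-number-generator(42)))/*
```

Each run of the recursion consumes two random numbers (start and length) before handing the advanced generator to the next call.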

Improvements

  • the lorem ipsum text itself could be generated rather than stored.
  • the script could be parameterized for the lorem ipsum file, allowing different, perhaps more realistic, text to be used.
  • the lorem ipsum words are passed as a parameter to the recursive function. They could be held in a global variable instead.
  • it would be better to fetch the files with the httpclient module and control caching through HTTP headers; as written, the file is cached.
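The first improvement, generating the text instead of storing it, could look something like the sketch below. The syllable list, the seed and the word count are arbitrary choices made for this illustration and do not come from the original script; it assumes an XQuery 3.1 processor with fn:random-number-generator.

```xquery
xquery version "3.1";
(: Hypothetical syllable list, chosen only for illustration. :)
declare variable $syllables as xs:string* :=
  ("lo", "rem", "ip", "sum", "do", "lor", "sit", "a", "met", "con");

(: Return $n generated words; each word is two syllables picked
   with the standard XQuery 3.1 random number generator. :)
declare function local:generate($n as xs:integer, $rng as map(*)) as xs:string* {
  if ($n le 0) then ()
  else
    let $second := $rng?next()
    let $i := 1 + xs:integer(floor($rng?number * count($syllables)))
    let $j := 1 + xs:integer(floor($second?number * count($syllables)))
    return ($syllables[$i] || $syllables[$j],
            local:generate($n - 1, $second?next()))
};

(: twelve generated words, joined with spaces :)
string-join(local:generate(12, random-number-generator(7)), " ")
```

The generated sequence could then be used wherever the tokenized contents of words.xml are used now.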

Approach 2: XML replacement

Using an ellipsis as the marker is a problem if the document also needs to contain a literal ellipsis. Converting the XML to text and back again also adds overhead.

An alternative approach is to mark the places where lorem ipsum text is to appear with an XML element, for example <ipsum/>, and replace every occurrence with a random run of words. Replacing a specific element anywhere in the XML tree can be done by adapting the identity transformation discussed in XQuery/Filtering_Nodes.
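A portable version of this identity transformation might be sketched in standard XQuery 3.1 as follows. This sketch rests on several assumptions: the word list and sample document are invented, and instead of util:random each marker seeds its own fn:random-number-generator with its generate-id(), which avoids threading generator state through the tree walk.

```xquery
xquery version "3.1";
(: Sketch only: word list and sample document are invented for illustration. :)
declare variable $words as xs:string* :=
  tokenize("lorem ipsum dolor sit amet consectetur adipiscing elit", "\s+");
declare variable $marker := "ipsum";

(: Random run of words for one marker, seeded from a per-node string. :)
declare function local:random-words($seed as xs:string) as xs:string* {
  let $rng := random-number-generator($seed)
  let $start := 1 + xs:integer(floor($rng?number * count($words)))
  let $length := 1 + xs:integer(floor($rng?next()?number * 6))
  return subsequence($words, $start, $length)
};

(: Identity transformation: copy each element with its attributes and
   children, replacing every marker element with random words. :)
declare function local:copy-with-random($element as element()) as element() {
  element {node-name($element)} {
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element())
      then if (name($child) = $marker)
           then local:random-words(generate-id($child))
           else local:copy-with-random($child)
      else $child
  }
};

local:copy-with-random(<book><title><ipsum/></title><p>Intro: <ipsum/></p></book>)
```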

Concepts

  • recursion - to copy an arbitrary XML tree, replacing a given element with random text.

XQuery

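(: lorem ipsum words, held in global variables so the recursive function need not pass them on :)
declare variable $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum;
declare variable $words := tokenize(normalize-space($lorumipsum), "\s+");
(: name of the marker element to be replaced by random text :)
declare variable $marker := "ipsum";

(: identity transformation: copy the element, its attributes and its children,
   replacing each marker element with a random run of words :)
declare function local:copy-with-random($element as element()) as element() {
   element {node-name($element)}
      {$element/@*,
          for $child in $element/node()
          return
               if ($child instance of element())
               then
                  if (name($child) = $marker)
                  (: the selected strings become one text node, separated by spaces :)
                  then subsequence($words, util:random(100), util:random(100))
                  else local:copy-with-random($child)
               else $child
      }
};

let $file := request:get-parameter("file", ())
let $root := doc($file)/*
return
    local:copy-with-random($root)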

Explanation

  • the sequence of lorem ipsum words is held in a global variable to avoid passing it as a parameter to the recursive function.
  • the copy-with-random() function recursively copies the elements and other nodes of a tree into a new tree.
  • when an element named "ipsum" is encountered, a selection of lorem ipsum words is returned instead of the original element.
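The words selected for a marker are a sequence of strings, and it is the element constructor that turns them into readable text: adjacent atomic values in element content are joined with single spaces, whereas adjacent text nodes are merged with no separator. A small standard-XQuery illustration, with invented sample words:

```xquery
let $words := ("lorem", "ipsum", "dolor")
return (
  (: atomic values: joined with single spaces :)
  <p>{ $words }</p>,                                  (: <p>lorem ipsum dolor</p> :)
  (: one pre-joined string: no spaces added :)
  <p>{ string-join($words, "") }</p>,                 (: <p>loremipsumdolor</p> :)
  (: separate text nodes: merged without separators :)
  <p>{ for $w in $words return text { $w } }</p>      (: <p>loremipsumdolor</p> :)
)
```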

Example

Discussion

The second approach is simpler. Performance is about the same.

Acknowledgements

  • the sample XML is an extract from "Search: The Graphics Web Guide", Ken Coupland, Laurence King Publishing (2002)