XQuery/Using Intermediate Documents

Processing XML often involves the creation of intermediate XML fragments for subsequent processing. Here is an example of two approaches, one using multiple passes on the same data, the other a constructed intermediate view of the data.

MusicXML

MusicXML is an XML application for recording music scores. There is a range of software which produces and consumes MusicXML.


There are two styles of MusicXML with two related schemas, one in which measures are within parts (partwise), the other in which parts are within measures (timewise).

An example of a MusicXML partwise score is Mozart's Piano Sonata in A Major, K. 331.

Here is a sample definition of a note:

<note>
   <pitch>
      <step>A</step>
      <octave>3</octave>
   </pitch>
   <duration>2</duration>
   <voice>3</voice>
   <type>eighth</type>
   <stem>down</stem>
   <staff>2</staff>
   <beam number="1">begin</beam>
   <notations>
      <slur type="stop" number="1"/>
   </notations>
</note>

Notes Range

The Recordare site has some sample code to demonstrate the use of XQuery to process MusicXML [1]. The first script finds the lowest and highest notes in the score. The script shown on the site does not conform to the current XQuery standard, but a few minor changes bring it up to date.

declare function local:MidiNote($thispitch as element(pitch) ) as xs:integer
{
  let $step := $thispitch/step
  let $alter :=
    if (empty($thispitch/alter)) then 0
    else xs:integer($thispitch/alter)
  let $octave := xs:integer($thispitch/octave)
  let $pitchstep :=
    if ($step = "C") then 0
    else if ($step = "D") then 2
    else if ($step = "E") then 4
    else if ($step = "F") then 5
    else if ($step = "G") then 7
    else if ($step = "A") then 9
    else if ($step = "B") then 11
    else 0
  return 12 * ($octave + 1) + $pitchstep + $alter
} ;

let $doc := doc("/db/Wiki/Music/examples/MozartPianoSonata.xml")
let $part := $doc//part[./@id = "P1"]

let $highnote := max(for $pitch in $part//pitch return local:MidiNote($pitch))
let $lownote := min(for $pitch in $part//pitch return local:MidiNote($pitch)) 

let $highpitch := $part//pitch[local:MidiNote(.) = $highnote]
let $lowpitch := $part//pitch[local:MidiNote(.) = $lownote]

let $highmeas := string($highpitch[1]/../../@number)
let $lowmeas := string($lowpitch[1]/../../@number) 

return
  <result>
  <low-note>{$lowpitch[1]}
    <measure>{$lowmeas}</measure>
  </low-note>
  <high-note>{$highpitch[1]}
    <measure>{$highmeas}</measure>
  </high-note>
  </result>

With output:

<result>
    <low-note>
        <pitch>
            <step>D</step>
            <octave>2</octave>
        </pitch>
        <measure>3</measure>
    </low-note>

    <high-note>
        <pitch>
            <step>E</step>
            <octave>6</octave>
        </pitch>
        <measure>5</measure>
    </high-note>

</result>

Ancestor access

The path to the measure in which a note is located

let $highmeas := string($highpitch[1]/../../@number)

uses a fixed number of parent steps back up the hierarchy. This ties the script to one style of MusicXML, because the measure sits at a different position in the hierarchy in the partwise and timewise schemas. When the script was written, the ancestor axis was not supported, but it is now, so those lines can be expressed more generally as:

let $highmeas := string($highpitch[1]/ancestor::measure/@number)
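
The difference between the two path styles can be seen on a pair of small fragments. The following is a minimal sketch, not taken from the Recordare scripts: the two inline fragments are hand-made, heavily simplified stand-ins for partwise and timewise scores, and the result element names (score, fixed-steps, ancestor-axis) are invented for illustration.

```xquery
xquery version "1.0";

(: Two tiny hand-made fragments, not real scores: the same pitch nested
   partwise (part > measure > note) and timewise (measure > part > note). :)
let $partwise :=
  <score-partwise>
    <part id="P1">
      <measure number="7">
        <note><pitch><step>A</step><octave>3</octave></pitch></note>
      </measure>
    </part>
  </score-partwise>
let $timewise :=
  <score-timewise>
    <measure number="7">
      <part id="P1">
        <note><pitch><step>A</step><octave>3</octave></pitch></note>
      </part>
    </measure>
  </score-timewise>
return
  <result>
    {
      for $score in ($partwise, $timewise)
      let $pitch := ($score//pitch)[1]
      return
        <score style="{name($score)}">
          <!-- two parent steps: pitch, note, then whatever contains the note -->
          <fixed-steps>{string($pitch/../../@number)}</fixed-steps>
          <!-- ancestor axis: finds the measure wherever it sits above the pitch -->
          <ancestor-axis>{string($pitch/ancestor::measure/@number)}</ancestor-axis>
        </score>
    }
  </result>
```

On the partwise fragment both paths give 7. On the timewise fragment the fixed path lands on the part element, which has no number attribute, so fixed-steps is empty while ancestor-axis is still 7.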

Note-to-midi

The function that converts notes to MIDI numbers uses nested if-then-else expressions. XQuery 1.0 has no switch expression (one was added in XQuery 3.0), but a clearer approach in any case is to use a lookup table, defined either locally in the script or stored in the database.

Here a sequence of note elements is created as a lookup table. It is bound to a global variable, which is used in a revised note-to-MIDI function:

declare variable  $NOTESTEP := 
(
   <note name="C" stepNo="0"/>,
   <note name="D" stepNo="2"/>,
   <note name="E" stepNo="4"/>,
   <note name="F" stepNo="5"/>,
   <note name="G" stepNo="7"/>,
   <note name="A" stepNo="9"/>,
   <note name="B" stepNo="11"/>
);

declare function local:MidiNote($thispitch as element(pitch) ) as xs:integer
{
  let $alter := xs:integer(($thispitch/alter,0)[1])
  let $octave := xs:integer($thispitch/octave)
  let $pitchstepNo := xs:integer($NOTESTEP[@name = $thispitch/step]/@stepNo)
  return 12 * ($octave + 1) + $pitchstepNo + $alter
} ;
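
On a processor that supports XQuery 3.0 or later, the nested conditionals can also be written with a switch expression. The following sketch is an assumption about how that variant might look, not part of the original script. It keeps the original function's behaviour of treating an unrecognised step as 0, and the pitch element passed in at the end is a hand-made test value.

```xquery
xquery version "3.0";

declare function local:MidiNote($thispitch as element(pitch)) as xs:integer
{
  let $alter := xs:integer(($thispitch/alter, 0)[1])
  let $octave := xs:integer($thispitch/octave)
  (: string() makes the operand an xs:string so it compares cleanly
     with the string literals in the case clauses :)
  let $pitchstep :=
    switch (string($thispitch/step))
      case "C" return 0
      case "D" return 2
      case "E" return 4
      case "F" return 5
      case "G" return 7
      case "A" return 9
      case "B" return 11
      default return 0
  return 12 * ($octave + 1) + $pitchstep + $alter
};

(: hand-made test value: A in octave 3, no alteration :)
local:MidiNote(<pitch><step>A</step><octave>3</octave></pitch>)
```

For the sample pitch this evaluates to 12 * 4 + 9 = 57.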

Intermediate XML

The original script required repeated access to the original MusicXML source. An alternative approach is to create an intermediate structure to hold the MIDI notes and use it in the subsequent analysis. This structure is a computed view of the original notes, augmented with derived data: the MIDI note number and the measure.

 
let $midiNotes :=
       for $pitch in  $part//pitch
       return 
             <pitch>
                     {$pitch/*}
                     <midi>{local:MidiNote($pitch)}</midi>
                     <measure>{string($pitch/../../@number)}</measure>
             </pitch>

and this view is then used to locate the high and low notes and their position in the score:

let $highnote := max($midiNotes/midi)
let $lownote  := min($midiNotes/midi) 

let $highpitch := $midiNotes[midi = $highnote]
let $lowpitch := $midiNotes[midi = $lownote]

Revised script

declare variable  $NOTESTEP := 
(
   <note name="C" step="0"/>,
   <note name="D"  step="2"/>,
   <note name="E" step="4"/>,
   <note name="F" step="5"/>,
   <note name="G" step="7"/>,
   <note name="A" step="9"/>,
   <note name="B" step="11"/>
);

declare function local:MidiNote($thispitch as element(pitch) ) as xs:integer
{
  let $alter := xs:integer(($thispitch/alter,0)[1])
  let $octave := xs:integer($thispitch/octave)
  let $name := $thispitch/step
  let $pitchstep := xs:integer($NOTESTEP[@name = $name]/@step)
  return 12 * ($octave + 1) + $pitchstep + $alter
} ;

let $doc := doc("/db/Wiki/Music/examples/MozartPianoSonata.xml")
let $part := $doc//part[./@id = "P1"]
 
let $midiNotes :=
       for $pitch in $part//pitch
       return
             <pitch>
                     {$pitch/*}
                     <midi>{local:MidiNote($pitch)}</midi>
                     <measure>{string($pitch/ancestor::measure/@number)}</measure>
             </pitch>
   
let $highnote := max($midiNotes/midi)
let $lownote  := min($midiNotes/midi) 
 
return
<result>
  <low-note>
      {$midiNotes[midi = $lownote]}
  </low-note>
  <high-note>
     { $midiNotes[midi = $highnote]}
  </high-note>
</result>

Discussion

Although arguably a cleaner, more direct design, the second script relies on constructing temporary XML nodes which are then the subject of XPath expressions. Implementations handle these temporary nodes differently. In older versions of eXist, each is written to a temporary document in the database, which creates a performance overhead and garbage-collection problems. In the 1.3 release, intermediate XML nodes remain in memory, resulting in a major performance improvement.

There is, however, another problem with this approach. The size of the intermediate node may exceed pre-set, but configurable, limits on the size of constructed nodes.
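
Where an XQuery 3.1 processor is available, one possible way to sidestep constructed nodes for the intermediate view is to hold the derived data in maps that point back at the original pitch elements. This is a sketch under that assumption, not something from the original script. The inline part is a hand-made three-note fragment, the map keys and the midi and measure attributes on the result are invented for illustration, and whether it performs better or avoids the size limit depends on the implementation.

```xquery
xquery version "3.1";

(: lookup table as a map instead of a sequence of elements :)
declare variable $STEPS :=
  map { "C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11 };

declare function local:MidiNote($thispitch as element(pitch)) as xs:integer
{
  let $alter := xs:integer(($thispitch/alter, 0)[1])
  let $octave := xs:integer($thispitch/octave)
  return 12 * ($octave + 1) + $STEPS(string($thispitch/step)) + $alter
};

(: hand-made stand-in for $doc//part[@id = "P1"] :)
let $part :=
  <part id="P1">
    <measure number="1">
      <note><pitch><step>D</step><octave>2</octave></pitch></note>
      <note><pitch><step>A</step><octave>3</octave></pitch></note>
    </measure>
    <measure number="2">
      <note><pitch><step>E</step><octave>6</octave></pitch></note>
    </measure>
  </part>

(: one map per pitch; "pitch" refers to the original node, nothing is copied :)
let $midiNotes :=
  for $pitch in $part//pitch
  return map {
    "pitch": $pitch,
    "midi": local:MidiNote($pitch),
    "measure": string($pitch/ancestor::measure/@number)
  }

let $highnote := max($midiNotes?midi)
let $lownote := min($midiNotes?midi)

return
  <result>
    {
      for $entry in $midiNotes[?midi = $lownote]
      return
        <low-note midi="{$entry?midi}" measure="{$entry?measure}">{$entry?pitch}</low-note>,
      for $entry in $midiNotes[?midi = $highnote]
      return
        <high-note midi="{$entry?midi}" measure="{$entry?measure}">{$entry?pitch}</high-note>
    }
  </result>
```

With this fragment, the low note is D in octave 2 (MIDI 38, measure 1) and the high note is E in octave 6 (MIDI 88, measure 2). Only the final result element is constructed; the intermediate sequence holds no new nodes.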