XQuery/Parsing CSV

From Wikibooks, open books for an open world
< XQuery
Jump to: navigation, search

Motivation[edit]

You want to parse a file of comma-separated values (CSV) text into an xml structure or you have a flat file with very similar line/field structure that you want to convert into XML format.

Method[edit]

We will use the tokenize($input, '\n') function to break the input file into separate lines. We will then use the tokenize($input, ',') function to parse each line into separate fields.

Basic Example[edit]

xquery version "1.0";
 
let $title := 'XQuery CSV Parser'
 
let $csv-input-sample :=
'John,Smith,x123
Peg,Jones,x456
Sue,Adams,x789
Dan,McCoy,x321'
 
let $lines := tokenize($csv-input-sample, '\n')
 
return
<results>{
  for $line in $lines
  let $fields := tokenize($line, ',')
  return
  <row>{
    for $field in $fields
    return
      <field>{$field}</field>
  }</row>
}</results>

Sample Output[edit]

<results>
   <row>
      <field>John</field>
      <field>Smith</field>
      <field>x123</field>
   </row>
   <row>
      <field>Peg</field>
      <field>Jones</field>
      <field>x456</field>
   </row>
   <row>
      <field>Sue</field>
      <field>Adams</field>
      <field>x789</field>
   </row>
   <row>
      <field>Dan</field>
      <field>McCoy</field>
      <field>x321</field>
   </row>
</results>

Example With Data Dictionary in Row 1[edit]

This second example will use the first row of the CSV file as a data dictionary of the element names for each column of each row.

let $csv :=
'name,faculty
alice,anthropology
bob,biology'
 
let $lines := tokenize($csv, '\n')
let $head := tokenize($lines[1], ',')
let $body := remove($lines, 1)
return
    <people>
        {
            for $line in $body
            let $fields := tokenize($line, ',')
            return
                <person>
                    {
                        for $key at $pos in $head
                        let $value := $fields[$pos]
                        return
                            element { $key } { $value }
                    }
                </person>
        }
    </people>

We will receive below result (tested with eXide in eXist 2.0):

<people>
  <person>
    <name>alice</name>
    <faculty>anthropology</faculty>
  </person>
  <person>
    <name>bob</name>
    <faculty>biology</faculty>
  </person>
</people>

Beware though, that this requires the very strict input format proposed. In practice CSV can look quite different.

Adding Configuration File Options[edit]

Many times you have a family of CSV files that all may have very similar options for import. In this case it is useful to be able to pass a series of configuration parameter to a single XQuery function. These configuration parameters include:

  1. What the field delimiter is (comma is the defult"
  2. The element name of the root node
  3. The element name of each line or row
  1. If each field could be wrapped in quotes
<file-import-config>
   <field-delimiter>,<field-delimiter>
   <root-element-name>people</root-element-name>
   <line-element-name>person</line-element-name>
</file-import-config>

You can then pass this configuration file to your XQuery function:

  import-lib:flat-file-import($input-file, $config)

References[edit]

Zorba CSV importer