Ruby Programming/XML Processing/REXML


REXML is an XML processing API. It has been included in Ruby's standard library since Ruby 1.8; as of Ruby 3.0 it ships as a bundled gem.

REXML can read and write XML documents. Validation against a DTD or a schema is not yet fully implemented.

Other XML parsing gems exist as well, such as nokogiri and hpricot. They are typically faster than REXML.

Basics

DOM

Definitions

Using the DOM API, REXML can parse a document and build a tree containing its elements, attributes, and text.

For example, the following XML might be used to store a wikibook:

 <wikibook title="Programming:Ruby">
   <section title="Getting started">
     <chapter title="Overview">Ruby is a programming language of the Perl and Python ilk; [...]
     </chapter>
   </section>
 </wikibook>

In this case, the chapter is an element. It has an attribute title with the value Overview and a text with the value "Ruby is a programming language [...]".

The section is also an element. It has an attribute, too, but no text. Instead, it has an element, the chapter element.

In short, elements can have attributes, text and child elements.
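As a minimal sketch of these definitions, the wikibook document above can be parsed and inspected with REXML. The variable names are chosen for illustration only, and the chapter text is stripped because the parsed text node keeps the line break before the closing tag.

```ruby
require 'rexml/document'

# The example document from above, embedded as a string
xml = <<~XML
  <wikibook title="Programming:Ruby">
    <section title="Getting started">
      <chapter title="Overview">Ruby is a programming language of the Perl and Python ilk; [...]
      </chapter>
    </section>
  </wikibook>
XML

doc     = REXML::Document.new(xml)
section = doc.root.elements['section']    # first child element named "section"
chapter = section.elements['chapter']     # first child element named "chapter"

chapter_title    = chapter.attributes['title']          # attribute value as a String
chapter_text     = chapter.text.strip                   # text of the chapter, whitespace trimmed
section_children = section.elements.to_a.map(&:name)    # names of the section's child elements

puts chapter_title             # Overview
puts chapter_text
puts section_children.inspect  # ["chapter"]
```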

Representation in REXML

When an XML document is parsed, an instance of the REXML::Document class is created. (REXML::Document.new accepts a String, an IO, or another REXML::Document.) This instance represents the whole document, including the <?xml...?> declaration. REXML::Document is itself a subclass of REXML::Element, an important class.
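The following sketch builds a Document from a String and from an IO. A StringIO stands in for a real file here so that the example is self-contained; an opened File should work the same way. The one-line source document is an assumption made for illustration.

```ruby
require 'rexml/document'
require 'stringio'

# A tiny document with an XML declaration (illustrative only)
source = '<?xml version="1.0" encoding="UTF-8"?><wikibook title="Programming:Ruby"/>'

from_string = REXML::Document.new(source)                # parse from a String
from_io     = REXML::Document.new(StringIO.new(source))  # parse from an IO-like object

# The declaration is part of the Document
puts from_string.xml_decl.version

# A Document is also an Element
puts from_string.is_a?(REXML::Element)

# Both sources yield the same root element
puts from_io.root.attributes['title']
```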

In the DOM API, instances of the Element class represent the elements of the XML document. Each can have attributes (accessible through the attributes method), text, and child elements.

The Document is an Element itself, but you will usually be more interested in the root element of the XML document. As defined in the XML specification, every document has exactly one root element; it can be obtained by calling root on the REXML::Document instance.
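A short sketch of the difference between the Document and its root element follows. The comment placed before the root element is an assumption added for illustration. It shows that root skips nodes that are not elements.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <?xml version="1.0"?>
  <!-- a comment before the root element -->
  <wikibook title="Programming:Ruby">
    <section title="Getting started"/>
  </wikibook>
XML

root = doc.root                 # the single top-level element

puts root.name                  # wikibook
puts root.attributes['title']   # Programming:Ruby
puts root.parent.equal?(doc)    # the root's parent is the Document itself
```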

Once you have the root Element, you can descend the tree using the elements method defined in Element, which returns a collection of all child elements, or access attributes and text as needed.
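Walking down the tree might look like the sketch below. The outline helper is hypothetical, and the extra chapter and section titles are made up for the example. Note that integer indexes into elements are 1-based.

```ruby
require 'rexml/document'

doc = REXML::Document.new(<<~XML)
  <wikibook title="Programming:Ruby">
    <section title="Getting started">
      <chapter title="Overview">Ruby is a programming language [...]</chapter>
      <chapter title="Installing Ruby"/>
    </section>
    <section title="Basic Ruby"/>
  </wikibook>
XML

# Hypothetical helper: collects one "name: title" line per element,
# indented by its depth in the tree.
def outline(element, depth = 0, lines = [])
  lines << "#{'  ' * depth}#{element.name}: #{element.attributes['title']}"
  element.elements.each { |child| outline(child, depth + 1, lines) }
  lines
end

lines = outline(doc.root)
puts lines

first_section = doc.root.elements[1]   # integer indexes start at 1
section_count = doc.root.elements.size # number of child elements of the root
```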

The tree can be modified, too. In addition, the to_s methods are overridden to return the XML code of elements, attributes, and text. Element#to_s returns the XML code of the whole element, including its attributes, text, and child elements. You can call it on the Document, too.
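As a rough sketch, a tree can be modified and serialized as follows. The starting document and the new attribute value are assumptions made for illustration. The exact quoting style of the serialized output is left to REXML, so the result is checked by parsing it again rather than by comparing strings.

```ruby
require 'rexml/document'

doc = REXML::Document.new(
  '<wikibook title="Programming:Ruby"><section title="Getting started"/></wikibook>'
)
section = doc.root.elements['section']

# Add a child element with an attribute, then give it some text
chapter = section.add_element('chapter', 'title' => 'Overview')
chapter.text = 'Ruby is a programming language [...]'

# Change an existing attribute on the root element
doc.root.attributes['title'] = 'Ruby Programming'

puts chapter.to_s   # XML code of the new chapter element only
puts doc.to_s       # XML code of the whole document

# Parsing the serialized document again yields the modified tree
reparsed = REXML::Document.new(doc.to_s)
puts reparsed.root.elements['section/chapter'].text
```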

External links

Standard API Documentation at ruby-doc.org, including the rexml package