Ruby Programming/XML Processing/REXML
REXML is an XML processing API. It has been part of the standard library since Ruby 1.8; since Ruby 3.0 it ships as a bundled gem.
REXML can read and write XML documents. Validation against a DTD or a schema is not yet fully implemented.
Other XML parsing gems exist as well, such as Nokogiri and Hpricot. They are typically faster than REXML.
Basics
DOM
Definitions
Using the DOM API, REXML can parse documents and build a tree containing the elements, attributes, and text.
For example, this might be used to save a wikibook:
<wikibook title="Programming:Ruby">
  <section title="Getting started">
    <chapter title="Overview">Ruby is a programming language of the Perl and Python ilk; [...] </chapter>
  </section>
</wikibook>
In this case, chapter is an element. It has an attribute, title, with the value Overview, and text with the value "Ruby is a programming language [...]".
section is also an element. It has an attribute, too, but no text. Instead, it has a child element, chapter.
In short, elements can have attributes, text and child elements.
Representation in REXML
When parsing an XML document, an instance of the REXML::Document class is created. (The new message of REXML::Document just has to be fed with a REXML::Document itself, a String, or an IO.) This represents the whole document, including the XML declaration.
REXML::Document itself is a subclass of REXML::Element, an important class.
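As a minimal sketch of the above, the following builds a document once from a String and once from an IO. The constant name SOURCE and the sample XML are made up for illustration, and a StringIO stands in for an opened file.

```ruby
require 'rexml/document'
require 'stringio'

# Illustrative input; any well-formed XML string would do.
SOURCE = '<?xml version="1.0"?><wikibook title="Programming:Ruby"/>'

# Build a document from a String.
from_string = REXML::Document.new(SOURCE)

# Build a document from an IO; StringIO stands in for an open File here.
from_io = REXML::Document.new(StringIO.new(SOURCE))

puts from_string.root.name
puts from_io.root.name

# The XML declaration is kept as part of the document.
puts from_string.xml_decl.version

# A Document is itself an Element.
puts from_string.is_a?(REXML::Element)
```

In a real program the IO would more likely come from File.open, which is left out here to keep the example self-contained.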
When using DOM, instances of the Element class represent the elements of the XML document. They might have attributes, accessible using the attributes message, text, and child elements.
Document is an Element itself, but usually you will be more interested in the root element of the XML document. As defined in the XML specification, any document has exactly one root element; it can easily be obtained by calling root on the Document.
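A short sketch of obtaining the root element and reading one of its attributes, assuming the wikibook example from the Definitions section is held in a string (the constant name BOOK_XML is only illustrative):

```ruby
require 'rexml/document'

# The wikibook example, shortened, as a Ruby heredoc.
BOOK_XML = <<~XML
  <wikibook title="Programming:Ruby">
    <section title="Getting started">
      <chapter title="Overview">Ruby is a programming language [...]</chapter>
    </section>
  </wikibook>
XML

doc = REXML::Document.new(BOOK_XML)

# The single root element of the document.
root = doc.root

puts root.class                # the root is a plain Element
puts root.name                 # its tag name
puts root.attributes['title']  # attribute lookup by name
```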
Once you have obtained the root Element, you can go down the tree using the elements message defined in Element, which returns a collection of all child elements, or access attributes or text, whatever you need.
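Going down the tree might look like the following sketch. It assumes the same wikibook example as input; BOOK_XML is an illustrative name, not part of REXML.

```ruby
require 'rexml/document'

BOOK_XML = <<~XML
  <wikibook title="Programming:Ruby">
    <section title="Getting started">
      <chapter title="Overview">Ruby is a programming language [...]</chapter>
    </section>
  </wikibook>
XML

root = REXML::Document.new(BOOK_XML).root

# elements yields child elements only; whitespace between tags is skipped.
root.elements.each do |section|
  puts "section: #{section.attributes['title']}"
  section.elements.each do |chapter|
    puts "  chapter: #{chapter.attributes['title']}"
    puts "  text: #{chapter.text}"
  end
end

# elements[] accepts a 1-based index or an XPath expression.
first_section = root.elements[1]
overview = root.elements['section/chapter']
puts first_section.attributes['title']
puts overview.attributes['title']
```

Iterating with each suits a full walk of the tree, while the XPath form is handier when only one known element is needed.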
The tree can be modified, too. In addition, the to_s methods have been overridden to return the XML code of elements, attributes and text. Element.to_s returns the XML code of the whole element, including attributes, text, and child elements' XML code. You can call that on the Document, too.
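As a rough sketch of modifying the tree and turning it back into XML, the following adds a chapter to an existing section and changes an attribute. The input string and the constant name START_XML are assumptions made for this example.

```ruby
require 'rexml/document'

# Illustrative starting point: a section without chapters.
START_XML = '<wikibook title="Programming:Ruby"><section title="Getting started"/></wikibook>'

doc = REXML::Document.new(START_XML)
section = doc.root.elements['section']

# Add a child element with an attribute, then give it some text.
chapter = section.add_element('chapter', 'title' => 'Overview')
chapter.text = 'Ruby is a programming language [...]'

# Change an existing attribute.
section.attributes['title'] = 'Introduction'

# to_s on an element returns that element's XML, children included.
puts chapter.to_s

# to_s on the document returns the XML of the whole tree.
puts doc.to_s
```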