Ruby Programming/XML Processing/REXML
REXML is an XML processing API for Ruby. It has been part of the standard library since Ruby 1.8; since Ruby 3.0 it ships as a bundled gem.
REXML can read and write XML documents. Validation against a DTD or a schema is not yet fully implemented.
Other capable XML parsing gems exist as well, such as nokogiri and hpricot. They are typically faster than REXML.
Basics
DOM
Definitions
Using the DOM API, REXML can parse documents and build a tree containing the elements, attributes, and text.
For example, this might be used to save a wikibook:
<wikibook title="Programming:Ruby">
<section title="Getting started">
<chapter title="Overview">Ruby is a programming language of the Perl and Python ilk; [...]
</chapter>
</section>
</wikibook>
In this case, chapter is an element. It has an attribute title with the value Overview and a text with the value "Ruby is a programming language [...]".
The section is also an element. It has an attribute, too, but no text. Instead, it has a child element, the chapter element.
In short, elements can have attributes, text and child elements.
Representation in REXML
When an XML document is parsed, an instance of the REXML::Document class is created. (REXML::Document.new accepts another REXML::Document, a String, or an IO.) This instance represents the whole document, including the <?xml...?> declaration. REXML::Document is itself a subclass of REXML::Element, an important class.
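As a minimal sketch of the constructor described above, the following builds a document from a String and from an IO-like object. The sample XML is a shortened version of the wikibook example (the chapter text is abbreviated for illustration), and StringIO stands in for a real file here as an assumption; an opened File should behave the same way.

```ruby
require "rexml/document"
require "stringio"

# Shortened, illustrative version of the wikibook example.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <wikibook title="Programming:Ruby">
    <section title="Getting started">
      <chapter title="Overview">Ruby is a programming language.</chapter>
    </section>
  </wikibook>
XML

# Build a document from a String.
doc_from_string = REXML::Document.new(xml)

# Build a document from an IO-like object.
doc_from_io = REXML::Document.new(StringIO.new(xml))

# Document inherits from Element, so Element methods are available on it.
puts REXML::Document < REXML::Element       # true

# The XML declaration is part of the document object.
puts doc_from_string.xml_decl.version       # 1.0

# Both sources yield the same root element name.
puts doc_from_string.root.name              # wikibook
puts doc_from_io.root.name                  # wikibook
```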
When using DOM, instances of the Element class represent the elements of the XML document. They can have attributes (accessible through the attributes message), text, and child elements.
The Document is itself an Element, but usually you will be more interested in the root element of the XML document. As defined in the XML specification, every document has exactly one root element; it can be obtained by calling root on the REXML::Document instance.
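A short sketch of obtaining the root element, using a cut-down one-line version of the wikibook document (the inline XML string is only for illustration):

```ruby
require "rexml/document"

doc = REXML::Document.new(
  '<wikibook title="Programming:Ruby"><section title="Getting started"/></wikibook>'
)

# root is an instance method: it is called on the parsed document,
# not on the REXML::Document class.
root = doc.root

puts root.class              # REXML::Element
puts root.name               # wikibook
puts root.parent.equal?(doc) # true: the document is the parent of the root
```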
Once you have obtained the root Element, you can walk down the tree using the elements message defined in Element, which returns a collection of all child elements, or access attributes or text as needed.
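The traversal described above might look like the following sketch. It reuses a shortened wikibook document (chapter text abbreviated for illustration) and prints each section title followed by its chapters.

```ruby
require "rexml/document"

# Shortened, illustrative version of the wikibook example.
xml = <<~XML
  <wikibook title="Programming:Ruby">
    <section title="Getting started">
      <chapter title="Overview">Ruby is a programming language.</chapter>
    </section>
  </wikibook>
XML

root = REXML::Document.new(xml).root

# elements.each yields the child elements matching the given name.
root.elements.each("section") do |section|
  puts section.attributes["title"]
  section.elements.each("chapter") do |chapter|
    # attributes[] returns the attribute value; text returns the first text node.
    puts "  #{chapter.attributes['title']}: #{chapter.text}"
  end
end

# elements[] accepts an XPath expression and returns the first match.
first_chapter = root.elements["section/chapter"]
puts first_chapter.attributes["title"]
```

Note that elements[] also accepts an integer, and that index is 1-based rather than 0-based.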
The tree can be modified, too. In addition, the to_s methods are overridden to return the XML code of elements, attributes and text. Calling to_s on an Element returns the XML code of the whole element, including its attributes, text, and child elements. You can call it on the Document, too.
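As a sketch of modifying the tree and serializing it again, the example below adds a chapter to a section, changes an attribute, and prints the result with to_s. The starting document and the new title "Introduction" are made up for illustration.

```ruby
require "rexml/document"

doc = REXML::Document.new(
  '<wikibook title="Programming:Ruby"><section title="Getting started"/></wikibook>'
)
section = doc.root.elements["section"]

# Add a child element with an attribute, then set its text.
chapter = section.add_element("chapter", { "title" => "Overview" })
chapter.text = "Ruby is a programming language."

# Change an existing attribute value.
section.attributes["title"] = "Introduction"

# to_s on an element returns that element's XML, including its children.
puts chapter.to_s

# to_s on the document returns the XML of the whole tree.
puts doc.to_s
```

The quoting style of attributes in the serialized output may differ from the input, so comparisons against the original string are best avoided.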
External links
Standard API Documentation at ruby-doc.org, including the rexml package