XML - Managing Data Exchange/Print version


Note: current version of this book can be found at http://en.wikibooks.org/wiki/XML:_Managing_Data_Exchange


Learning Objectives
  • define the purpose of SGML, HTML, and XML


There are four central problems in data management: capture, storage, retrieval, and exchange of data. The purpose of this book is to address XML, a technology for managing data exchange. The foundational XML chapters in this book are structured by a 'data model' approach. The first chapter introduces the reader to the XML document, XML schema, and XML stylesheet with a single entity example. Subsequent chapters expand upon the XML basics with multiple-entity examples and a one-to-one relationship, a one-to-many relationship, or a many-to-many relationship.

XML is a tool used for data exchange. Data exchange has long been an issue in information technology, but the Internet has elevated its importance. Electronic data interchange (EDI), the traditional data exchange standard for large organizations, is giving way to XML, which is likely to become the data exchange standard for all organizations, irrespective of size.

EDI supports the electronic exchange of standard business documents and is currently the major data format for electronic commerce. A structured format is used to exchange common business documents (e.g., invoices and shipping orders) between trading partners. In contrast to the free form of e-mail messages, EDI supports the exchange of repetitive, routine business transactions. Standards mean that routine electronic transactions can be concise and precise. The main standard used in the United States and Canada is known as X.12, and the major international standard is UN/EDIFACT. Firms adhering to the same standard can share data electronically.

The Internet is a global network potentially accessible by nearly every firm, with communication costs typically less than those of traditional EDI. Consequently, the Internet has become the electronic transport path of choice between trading partners. The simplest approach is to use the Internet as a means of transporting EDI documents. But because EDI was developed in the 1960s, another approach is to reexamine the technology of data exchange. A result of this rethinking is XML, but before considering XML we need to learn about SGML, the parent of XML.

SGML

For a typical U.S. firm, it is estimated that document management consumes up to 15 percent of its revenue, nearly 25 percent of its labour costs, and anywhere between 10 and 60 percent of an office worker’s time. The Standard Generalized Markup Language (SGML) is designed to reduce the cost and increase the efficiency of document management.

A markup language embeds information about a document within the document's text. In the following example, the markup tags indicate that the text contains details of a city. Note also that the city's name, state, and population are identified by specific tags. Thus, the reader (a person or a computer) is left in no doubt as to the meaning of Athens, GA, or 100,000. The location, latitude, and longitude of the city are likewise explicitly identified with appropriate tags. SGML’s usefulness is based upon recording both text and the meaning of that text.

Exhibit 1: Markup language

<city> 
       <cityname>Athens</cityname> 
       <state>GA</state>
       <description> Home of the University of Georgia</description>
       <population>100,000</population>
       <location>Located about 60 miles Northeast of Atlanta</location>
       <latitude>33 57' 39" N</latitude>
       <longitude>83 22' 42" W</longitude>
</city>
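
To illustrate how a program can rely on such tags, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree parser (an XML parser rather than an SGML one, which works here because this fragment is also well-formed XML) and a shortened copy of Exhibit 1. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the city markup from Exhibit 1.
CITY_MARKUP = """
<city>
    <cityname>Athens</cityname>
    <state>GA</state>
    <population>100,000</population>
</city>
""".strip()

city = ET.fromstring(CITY_MARKUP)

# Each value is retrieved by the name of the tag that surrounds it,
# so the program never has to guess which piece of text is which.
name = city.findtext("cityname")
state = city.findtext("state")
# The population is stored as text with a thousands separator; remove it
# before converting to a number.
population = int(city.findtext("population").replace(",", ""))

print(f"{name}, {state}: population {population}")
```

Because each value is looked up by tag name, the order of the child elements does not matter to this program.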

SGML is a vendor-independent International Standard (ISO 8879) that defines the structure of documents. Standardized in 1986 as a meta language, SGML is the parent of both HTML and XML. Because SGML documents are standard text files, SGML provides cross-system portability. When technology is rapidly changing, SGML provides a stable platform for managing data exchange. Furthermore, SGML files can be transformed for publication in a variety of media. The use of SGML preserves textual information independent of how and when it is presented. Organizations reap long-term benefits when they can store documents in a single, independent standard that can then be converted for display in any desired medium.

SGML has three major advantages for data management:

  • Reuse: Information can be created once and reused many times.
  • Flexibility: SGML documents can be published in any format. The same content can be printed, presented on the Web, or delivered by speech synthesis. Because SGML is content-oriented, presentation decisions can be delayed until the output format is decided.
  • Revision: SGML supports revision and version control. With content version control, a firm can readily track the changes in documents.

A short section of SGML demonstrates clearly the features and strength of SGML (see Exhibit 2). The tags surrounding a chunk of text describe its meaning and thus support presentation and retrieval. For example, the pair of tags <airline> and </airline> surrounding “Delta” identify the airline making the flight.

Exhibit 2: SGML example

   <flight>
       <airline>Delta</airline>
       <flightno>22</flightno>
       <origin>Atlanta</origin>
       <destination>Paris</destination>
       <departure>5:40pm</departure>
       <arrival>8:10am</arrival>
   </flight>

The preceding SGML code can be presented in several ways by applying a style sheet to the file. For example, it might appear as

Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am

or as

Airline  Flight  Origin   Destination  Departure  Arrival
Delta    22      Atlanta  Paris        5:40pm     8:10am
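
The idea of one source with several presentations can be sketched in Python. This is only an illustration under stated assumptions: the two formatting functions stand in for style sheets, the standard library's XML parser is used because the fragment in Exhibit 2 is also well-formed XML, and all function and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The flight markup from Exhibit 2.
FLIGHT_MARKUP = """<flight>
    <airline>Delta</airline>
    <flightno>22</flightno>
    <origin>Atlanta</origin>
    <destination>Paris</destination>
    <departure>5:40pm</departure>
    <arrival>8:10am</arrival>
</flight>"""

flight = ET.fromstring(FLIGHT_MARKUP)
# Map each child tag name to its text content.
fields = {child.tag: child.text for child in flight}

# Column heading paired with the tag that supplies its value.
COLUMNS = [("Airline", "airline"), ("Flight", "flightno"),
           ("Origin", "origin"), ("Destination", "destination"),
           ("Departure", "departure"), ("Arrival", "arrival")]


def as_sentence(f):
    """First presentation: a single sentence."""
    return (f"{f['airline']} flight {f['flightno']} flies from {f['origin']} "
            f"to {f['destination']} leaving {f['departure']} "
            f"and arriving {f['arrival']}")


def as_table(f):
    """Second presentation: a heading row and a data row in aligned columns."""
    widths = [max(len(head), len(f[tag])) for head, tag in COLUMNS]
    header = "  ".join(head.ljust(w) for (head, _), w in zip(COLUMNS, widths))
    row = "  ".join(f[tag].ljust(w) for (_, tag), w in zip(COLUMNS, widths))
    return header.rstrip() + "\n" + row.rstrip()


print(as_sentence(fields))
print(as_table(fields))
```

The stored data never changes; only the function applied to it does.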


If the data are stored in HTML format and rendered on a Web site (as in Exhibit 3), then the meaning of the data has to be inferred by the reader. This is generally quite easy for humans, but very difficult for machines. Furthermore, the presentation format is fixed and can only be altered by rewriting the HTML. If you are not familiar with HTML, you should read the Wikibooks chapter on XHTML, a reformulation of HTML as an XML language, before reading the next chapter.

Exhibit 3: HTML rendering example

    Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am

Meaning and presentation should be independent, and this is an important reason why SGML is more powerful than HTML.

SGML is a markup language that defines the structure of documents and is preferred to HTML as it can be transformed into a variety of media.

XML

Many computer systems contain data in incompatible formats. A time-consuming challenge is to exchange data between such systems. XML is a generic data storage format that comes bundled with a number of tools and technologies that should make it easier to exchange specific XML 'applications' between incompatible systems. Since XML is open and generic, more and more organizations and people, both developers and data users, are expected to adopt it over time. This should make XML a leading technology for certain types of data exchange.

XML is used not only for exchanging information, but also for publishing Web pages. XML's very strict syntax allows parsers, and hence Web browsers, to be smaller and faster, which makes XML well suited for use with Personal Digital Assistants (PDAs) and cellphones. Web browsers that interpret HTML documents, on the other hand, carry a great deal of extra programming code to compensate for HTML’s less strict syntax.

The types of data generally well suited for encoding as XML are those where field lengths are unknown and unpredictable and where field contents are predominantly textual.

An XML schema allows for the exchange of information in a standardized structure. A schema defines custom markup tags that can contain attributes to describe the content that is enclosed by these tags. Information from the tagged data in the XML document can be extracted using an application called a “parser”, and with the use of an XML stylesheet the data can be formatted for a Web page.

XML's power lies in the combination of custom markup tags and content in a defined XML document. The purpose of eXtensible Markup Language (XML) is to make information self-describing. Based on SGML, XML is designed to support electronic commerce. The definition of XML, completed in early 1998 by the World Wide Web Consortium (W3C), describes it as a meta language — a language to generate languages. XML should steadily replace HTML on many Web sites because of some key advantages. The major differences between XML and HTML are captured in the following table.

Exhibit 4: XML vs HTML

XML                         HTML
Information content         Information presentation
Extensible set of tags      Fixed set of tags
Data exchange language      Data presentation language
Greater hypertext linking   Limited hypertext linking


The eXtensible in XML means that a new data exchange language can be created by defining its structure and tags. For example, the OpenGIS Consortium designed the Geography Markup Language (GML) to facilitate the electronic exchange of geographic information. Similarly, the Open Tourism Consortium is working on the definition of TourML to support the exchange of tourism information. The insurance industry uses data conforming to the XML-based ACORD standard for electronic data exchange. Another good example of XML in action is NewsML™.

In this text we will cover all the features of XML, but at this point let us introduce a few of the key features.


Applications of XML:

Before we start learning more about how an XML document is structured, let us point out what XML can be used for. The four major implementations of XML are:

Publication: Database content can be converted into XML and afterwards into HTML by using an XSLT stylesheet. Using this technique, complex websites as well as print media such as PDF files can be generated. Information no longer has to be stored in different formats (e.g. RTF, DOC, PDF, HTML). Content can be stored in the neutral XML format and then, using appropriate layout style sheets and transformations, brochures, websites, or data lists can be generated (see Chapter 17 for more).

An example of the capability of XML and XSLT can be found at http://www.emimusic.de: This website contains approximately 20,000 pages with profiles of the artists, their products, and the titles of the songs. These pages are generated using an XSLT script. Based on the same script, it will also be possible to create a catalog in PDF format. See the EMI Music case study later in this chapter for more details.

Interaction: XML can be used for accessing and changing data interactively. This human-machine communication usually happens via a web browser (see Chapter 12).

Integration: Using XML, homogeneous and heterogeneous applications can be integrated. In this case, XML is used to describe data, interfaces, and protocols. This machine-machine communication helps integrate relational databases (e.g. by importing and exporting different formats).

Transaction: XML helps to process transactions in applications like online marketplaces, supply chain management, and e-procurement systems.

Key features of XML

  • Elements have both an opening and a closing tag
  • Elements follow a strict hierarchy, with documents containing only one root element
  • Elements cannot overlap other elements
  • Element names must obey XML naming conventions
  • XML is case sensitive
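
These rules can be checked mechanically. The sketch below, in Python, assumes the standard library parser xml.etree.ElementTree, which raises ParseError when a document is not well-formed. The helper name is_well_formed and the sample strings are hypothetical and serve only to exercise each rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule in the list above; only the first obeys them all.
samples = {
    "properly nested": "<city><cityName>Athens</cityName></city>",
    "missing closing tag": "<city><cityName>Athens</city>",
    "two root elements": "<city>Athens</city><city>Paris</city>",
    "overlapping elements": "<city><a><b>Athens</a></b></city>",
    "name starts with a digit": "<1city>Athens</1city>",
    "case mismatch": "<city>Athens</City>",
}

for label, text in samples.items():
    verdict = "well-formed" if is_well_formed(text) else "rejected"
    print(f"{label}: {verdict}")
```

Only the first sample is accepted; each of the others breaks exactly one of the rules listed above.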

XML will improve the efficiency of data exchange in several important ways, which include:

  • write once and format many times: Once an XML file is created it can be presented in multiple ways by applying different XML stylesheets. For instance, the information might be displayed on a web page or printed in a book.
  • hardware and software independence: XML files are standard text files, which means they can be read by any application.
  • write once and exchange many times: Once an industry agrees on an XML standard for data exchange, data can be readily exchanged between all members using that standard.
  • Faster and more precise web searching: When the meaning of information can be determined by a computer (by reading the tags), web searching will be enhanced. For example, if you are looking for a specific book title, it is far more efficient for a computer to search for text between the pair of tags <booktitle> and </booktitle> than search an entire file looking for the title. Furthermore, spurious results should be eliminated.
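
The searching point can be illustrated with a small sketch in Python. Everything in it is an assumption made for illustration: the catalog, book, and summary elements and their contents are invented, and only the booktitle tag comes from the text above.

```python
import xml.etree.ElementTree as ET

# A hypothetical catalog: the first book mentions "XML" only in its summary.
CATALOG = """<catalog>
    <book>
        <booktitle>Data Management</booktitle>
        <summary>A text that mentions XML in passing.</summary>
    </book>
    <book>
        <booktitle>XML</booktitle>
        <summary>An introduction to data exchange.</summary>
    </book>
</catalog>"""

catalog = ET.fromstring(CATALOG)
books = catalog.findall("book")

# Plain text search: looks for the word anywhere in the book's text.
plain_hits = [b for b in books if "XML" in "".join(b.itertext())]

# Tag-aware search: compares only the text between <booktitle> tags.
tagged_hits = [b for b in books if b.findtext("booktitle") == "XML"]

print(len(plain_hits), "plain-text hits;", len(tagged_hits), "title hits")
```

The plain search returns both books, including the spurious one, while the tag-aware search returns only the book whose title is XML.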

10 reasons to use XML

  1. XML is a widely accepted open standard.
  2. XML allows content to be clearly separated from form (appearance).
  3. XML is text-oriented.
  4. XML is extensible.
  5. XML is self-describing.
  6. XML is universal, meaning that internationalization is no problem.
  7. XML is independent from platforms and programming languages.
  8. XML provides a robust and durable format for information storage.
  9. XML is easily transformable.
  10. XML is a future-oriented technology.

The major XML elements

The major XML elements are:

  • XML document: An XML file containing XML code.
  • XML schema: An XML file that describes the structure of a document and its tags.
  • XML stylesheet: An XML file containing formatting instructions for an XML file.

In the next few chapters you will learn how to create and use each of these elements of XML.

Creating a markup file

Any text editor can be used to create a markup file (e.g. an HTML file). In this book, we use the text editor within NetBeans, an open source Integrated Development Environment (IDE) for Java, because NetBeans supports editing and validation of XML files. Before proceeding, you should download and install NetBeans from http://www.NetBeans.org/.

The examples in this book use NetBeans to illustrate proper XML code. For an alternative to NetBeans, see Exchanger XML Lite.

Case Studies in XML Implementation

XML at United Parcel Service (UPS)

“UPS is a service company and it is all about scale and speed,” says Geoff Chalmers, Project Leader at UPS eSolutions Department. In 2003, UPS had $33.5 billion annual revenue and 357,000 employees worldwide. Six percent of the United States' Gross Domestic Product (GDP) on any given day is in the UPS system.

UPS uses technology extensively. The Information Systems department employs 4,000 people. The company's web site has 166 different country home pages and is supported by 44 applications.

UPS delivers around 13 million packages every day, and customers can track these shipments via the UPS Web site, which receives around 200 million hits daily. Nineteen of the applications within ups.com are XML OnLine Tool (Web services) applications.

UPS’s online tools are developed specifically to be integrated with customers’ applications. This makes the customer’s task simpler, easier, and faster. UPS confirmed the importance of simplicity and speed with “CampusShip,” one of its most successful products of the last 10 years. UPS CampusShip® is a Web-based, UPS-hosted shipping system. Using an Internet connection, employees can ship their own packages and letters from any desktop, while management maintains overall control of shipping activities. UPS CampusShip® thus combines shipper autonomy with managerial cost control within the organization. This product has been successful because no installation or software maintenance is required and it is quick to implement. XML Online Tools enabled cheap and fast evolution of CampusShip®.

UPS favors XML especially because it is agnostic: it is independent of platform and programming language. These features make XML very flexible and powerful. It is also decoupled and scalable. XML has enabled UPS to target a broader market and reduce customer interaction, and thus the cost of customer service. Another positive feature of XML is that it is backward compatible. The adoption of XML has reduced maintenance, implementation, and usage costs significantly within UPS.

However, these advantages don’t come without a price. “XML is inefficient in so many ways,” says Chalmers. XML unfortunately takes more CPU and bandwidth than other technologies. Yet bandwidth and CPU are cheap and getting cheaper every day, so this is a gradually disappearing problem.

Nevertheless, Chalmers also thinks that XML doesn’t work well in databases. He says that it is too wordy and it is an exchange medium rather than a database medium. There were some early attempts to tightly integrate XML and databases. Because databases do supply structure and identification to data as does XML, the value-add of XML-database integration is limited to applying hierarchical structure. On the other hand, if data is to be stored as a blob, then XML makes sense. Another problem that he points out about XML is that business rules cannot be expressed in XML schemas.

Finally, raw XML programming and debugging can be challenging. Therefore, UPS’s enterprise customers are starting to explore the code generators and embedded facilities to be found in .NET and BEA. However, hand coding by experienced in-house engineers is a must for the high availability, scalability, and performance that UPS requires for the UPS OnLine Tools.

XML at EMI Music

How is it used?

EMI Music Germany GmbH & Co. KG, a famous German record label, displays information about the artists it is affiliated with on its website. Visitors are able to explore all their audio or video productions. The whole website consists of nearly 20,000 pages that contain information about artists and their products (CD, DVD, LP). Everything is properly linked and systematically grouped.

For every artist there is data to be provided: albums, samples, pictures, descriptions, and article codes. The site is updated daily and can be changed by a web editor whenever necessary. This is a fairly large and complex amount of data to handle.

This is where XML comes into play. The data, which is stored in a database, has been transformed into XML code. Now an XSLT stylesheet converts this data into HTML code, which can be easily read by any web browser (e.g. Internet Explorer or Firefox).
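
The shape of such a pipeline can be sketched in Python. This is not EMI's implementation: the table, column names, and sample rows are invented, and because the Python standard library has no XSLT processor, a plain Python function stands in for the XSLT stylesheet in the second step.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical product table held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (artist TEXT, title TEXT, medium TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("Example Artist", "First Album", "CD"),
     ("Example Artist", "Live Concert", "DVD")],
)

# Step 1: database rows -> XML.
catalog = ET.Element("catalog")
query = "SELECT artist, title, medium FROM product ORDER BY title"
for artist, title, medium in conn.execute(query):
    product = ET.SubElement(catalog, "product", medium=medium)
    ET.SubElement(product, "artist").text = artist
    ET.SubElement(product, "title").text = title
conn.close()


# Step 2: XML -> HTML. A real site would apply an XSLT stylesheet here;
# this function is only a stand-in for that transformation.
def to_html(root):
    ul = ET.Element("ul")
    for product in root.findall("product"):
        li = ET.SubElement(ul, "li")
        li.text = (f"{product.findtext('artist')}: "
                   f"{product.findtext('title')} ({product.get('medium')})")
    return ET.tostring(ul, encoding="unicode", method="html")


html = to_html(catalog)
print(html)
```

Because the XML produced in step 1 is independent of presentation, a different second step could generate another output format from the same data.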

What's the benefit?

The advantage of XML is that the programming effort is considerably lower than with other formats. This is because XML serves as the common intermediate format: the database content is exported once as XML, and XSLT then transforms it into HTML.

It’s also no problem for the web editor to update the website. Using XML makes it easy for the person in charge to deal with this large amount of data.

Going beyond… On the basis of the XML scripts thus far produced by EMI Music, the company could easily produce a PDF-formatted catalog or design i-Mode pages for the current mobile phone generation. Thanks to XML, this can be done with little extra effort.

A brief history of XML

In the late 1960s, Charles Goldfarb, Raymond Lorie, and Edward Mosher, all working for IBM, started to develop GML (Generalized Markup Language), a text formatting language. The language was successfully applied to internal documentation procedures. As was common at the time, document editing was performed in batch mode. GenCode, another procedure for defining generic formatting codes for the typesetting systems of various software producers, was developed by the GCA (Graphic Communications Association) at about the same time. Both of these technologies, GML syntactically and GenCode semantically, served as the basis for the development of SGML (Standard Generalized Markup Language). The standardization process started at ANSI, the U.S. standards institute, in the early 1980s, and in 1986 SGML was finally approved as the ISO standard ISO 8879:1986.

SGML is regarded as a complex and comprehensive language (the specification runs to 500 pages). However, the success of HTML (HyperText Markup Language) proved that the concepts of SGML were sound. HTML, which is based on SGML, was developed by Tim Berners-Lee in Geneva in the early 1990s in order to present and link documents on the Internet. HTML has since become the most successful format for electronic documents. The Internet was originally designed as a space for human-human and human-machine communication, but machine-machine communication has lately gained tremendous importance, posing a completely new challenge for the computer languages used.

HTML is a descriptive language for the presentation of documents. The main focus is on presentation, meaning that an HTML document mixes the presented data with its formatting instructions. A human being may recognize the meaning of the displayed data from the presentation and the context; a machine (or, more precisely, software) cannot.

In 1996, a team led by Jon Bosak was established at the W3C to make SGML suitable for the Web. The result was a 30-page specification, which received the status of a "W3C Recommendation" in February 1998 and was named "Extensible Markup Language (XML)".

The most important goals in developing XML were:

  • XML should be compatible with SGML
  • XML should be easy to use on the Internet
  • The number of optional characteristics should be minimized
  • XML documents should be easy to generate and human-readable
  • XML should be supported by a variety of applications
  • It should be easy to write programs for XML
  • XML should be put into practice on time

In the terminology of markup languages, a description formulated in XML is called an XML document, even if its content has nothing to do with text processing.

Why is this book not an XML document?

If you have accepted the ideas presented in this chapter, the question is very pertinent. The simple answer is that we have been unable to find the technology to support the creation of an open textbook in XML. We need several pieces of technology:

  • An XML language for describing a book. DocBook is such a language, but the structure of a book is quite complex, and DocBook (reflecting this complexity) cannot be quickly mastered
  • A Wiki that works with a language such as DocBook
  • An XML stylesheet that converts XML into HTML for displaying the book's content

There is a project to create WikiMl (Wiki MarkupLanguage), and this might be used at some point.

References

Initiating author Richard T. Watson, University of Georgia






Learning objectives


  • introduce XML documents, schemas, and stylesheets
  • describe and create an XML document
  • describe and create an XML schema
  • describe and create an XML stylesheet


Introduction

In this chapter, we start to practice working with XML using XML documents, schemas, and stylesheets. An XML document organizes data and information in a structured, hierarchical format. An XML schema provides standards and rules for the structure of a given XML document. An XML schema also enables data transfer. An XML stylesheet (XSL) allows different presentations of the material found within an XML document.

In the first chapter, Introduction to XML, you learned what XML is, why it is useful, and how it is used. So, now you want to create your very own XML documents. In this chapter, we will show you the basic components used to create an XML document. This chapter is the foundation for all subsequent chapters--it is a little lengthy, but don't be intimidated. We will take you through the fundamentals of XML documents.


This chapter is divided into three parts:

  • XML Document
  • XML Schema
  • XML Stylesheets (XSL)


As you learned in the previous chapter, the XML Schema and Stylesheet are essentially specialized XML Documents. Within each of these three parts we will examine the layout and components required to create the document. There are links at the end of the XML document, schema, and stylesheet sections that show you how to create the documents using an XML editor. At the bottom of the page there is a link to Exercises for this chapter and a link to the Answers.

The first thing you will need before starting to create XML documents is a problem--something you want to solve by using XML to store and share data or information. You need some entity you can collect information about and then access in a variety of formats. So, we created one for you.

To develop an XML document and schema, start with a data model depicting the reality of the actual data that is exchanged. Once a high fidelity model has been created, the data model can be readily converted to an XML document and schema. In this chapter, we start with a very simple situation and in successive chapters extend the complexity to teach you more features of XML.

Our starting point is a single entity, CITY, which is shown in the following figure. While our focus is on this single entity, to map CITY to an XML schema, we need to have an entity that contains CITY. In this case, we have created TOURGUIDE. Think of a TOURGUIDE as containing many cities, and in this case TOURGUIDE has no attributes nor an identifier. It is just a container for data about cities.


Exhibit 1: Data model - Tourguide

Data Model - Tourguide.png


XML document

An XML document is a file containing XML code and syntax. XML documents have an .xml file extension.

We will examine the features & components of the XML document.


  • Prologue (XML Declaration)
  • Elements
  • Attributes
  • Rules to follow
  • Well-formed & Valid XML documents


Below is a sample XML document using our TourGuide model. We will refer to it as we describe the parts of an XML document.

Exhibit 2: XML document for city entity

  <?xml version="1.0" encoding="UTF-8"?>
  <tourGuide xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:noNamespaceSchemaLocation='city.xsd'>
    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>
        <population>11100</population>
        <area>5</area>
        <elevation>130</elevation>
        <longitude>88.44</longitude>
        <latitude>17.27</latitude>
        <description>Belmopan is the capital of Belize</description>
        <history>Belmopan was established following the devastation of the
           former capital, Belize City, by Hurricane Hattie in 1961. High 
           ground and open space influenced the choice and ground-breaking 
           began in 1966.  By 1970 most government offices and operations had 
           already moved to the new location.
        </history>
    </city>
    <city>
        <cityName>Kuala Lumpur</cityName>
        <adminUnit>Selangor</adminUnit>
        <country>Malaysia</country>
        <population>1448600</population>
        <area>243</area>
        <elevation>111</elevation>
        <longitude>101.71</longitude>
        <latitude>3.16</latitude>
        <description>Kuala Lumpur is the capital of Malaysia and the largest 
            city in the nation</description>
        <history>The city was founded in 1857 by Chinese tin miners and  
            preceded Klang.  In 1880 the British government transferred their 
            headquarters from Klang to Kuala Lumpur, and in 1896 it became the 
            capital of Malaysia. 
        </history>
    </city>
    <city>
        <cityName>Winnipeg</cityName>
        <adminUnit>St. Boniface</adminUnit>
        <country>Canada</country>
        <population>618512</population>
        <area>124</area>
        <elevation>40</elevation>
        <longitude>97.14</longitude>
        <latitude>49.54</latitude>
        <description>Winnipeg has two seasons. Winter and Construction.</description>
        <history>The city was founded by people at the forks (Fort Garry)
         trading in pelts with the Hudson Bay Company. Ironically, 
         The Bay was bought by America.
        </history>
    </city>
  </tourGuide>

Prologue (XML declaration)

The XML document starts off with the prologue. The prologue informs both a reader and the computer of certain specifications that make the document XML compliant. The first line is the XML declaration (and the only line in this basic XML document).

Exhibit 3: XML document - prologue

     <?xml version="1.0" encoding="UTF-8"?>

xml   =   this is an XML document
version="1.0"   =   the XML version (XML 1.0 is the W3C-recommended version)
encoding="UTF-8"   =   the character encoding used in the document. UTF-8 is a variable-width encoding of Unicode that uses one to four 8-bit bytes per character and is the usual choice for international documents. Unicode provides a unique number for every character.
Another potential attribute of the XML declaration:
standalone="yes"   =   the dependency of the document ('yes' indicates that the document does not require another document to complete content)
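
To make the encoding attribute concrete, here is a minimal sketch in Python using only the standard library. The city name Orléans is a hypothetical value chosen because it contains a non-ASCII character; it does not appear in the sample document. The characters are written as Unicode escapes so that the listing itself stays plain ASCII.

```python
import xml.etree.ElementTree as ET

# UTF-8 stores an ASCII letter in one byte and other characters in more.
# "\u00e9" is e with an acute accent; "\u20ac" is the euro sign.
byte_counts = {ch: len(ch.encode("utf-8")) for ch in ("A", "\u00e9", "\u20ac")}
print(byte_counts)

# A document whose declaration names the encoding actually used for its bytes.
document = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<cityName>Orl\u00e9ans</cityName>"
).encode("utf-8")

# The parser reads the declaration and decodes the bytes accordingly.
city_name = ET.fromstring(document).text
print(city_name)
```

If the bytes were written in a different encoding from the one declared, the parser could misread or reject the non-ASCII character, which is why the declaration and the file's real encoding need to agree.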

Elements

The majority of what you see in the XML document consists of XML elements. Elements are identified by their tags that open with < or </ and close with > or />. The start tag looks like this: <element attribute="value">, with a left angle bracket (<) followed by the element type name, optional attributes, and finally a right angle bracket (>). The end tag looks like this: </element>, similar to the start tag, but with a slash (/) between the left angle bracket and the element type name, and no attributes.

When there's nothing between a start tag and an end tag, XML allows you to combine them into an empty element tag, which can include everything a start tag can: <img src="Belize.gif" />. This one tag must be closed with a slash and right angle bracket (/>), so that it can be distinguished from a start tag.
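
A short sketch in Python, assuming the standard library's xml.etree.ElementTree parser, suggests that the two notations are equivalent as far as a parser is concerned. The helper describe is hypothetical.

```python
import xml.etree.ElementTree as ET

# The empty element tag and the equivalent start tag plus end tag pair.
short_form = ET.fromstring('<img src="Belize.gif" />')
long_form = ET.fromstring('<img src="Belize.gif"></img>')


def describe(element):
    """Summarize an element: name, attributes, text, number of children."""
    return (element.tag, dict(element.attrib), element.text, len(element))


same = describe(short_form) == describe(long_form)
print(describe(short_form), same)
```

Both forms yield an element named img with one attribute, no text, and no children.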

The XML document is designed around a major theme, an umbrella concept covering all other items and subjects; this theme is analyzed to determine its component parts, creating categories and subcategories. The major theme and its component parts are described by elements. In our sample XML document, 'tourGuide' is the major theme; 'city' is a category; 'population' is a subcategory of 'city'; and the hierarchy may be carried even further: 'males' and 'females' could be subcategories of 'population'. Elements follow several rules of syntax that will be described in the Rules to Follow section.


We left out the attributes within the <tourGuide> start tag — that part will be explained in the XML Schema section.

Exhibit 4: Elements of the city entity XML document

  <tourGuide>
    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>
        <population>11100</population>
        <area>5</area>
        <elevation>130</elevation>
        <longitude>88.44</longitude>
        <latitude>17.27</latitude>
        <description>Belmopan is the capital of Belize</description>
        <history>Belmopan was established following the devastation of the
           former capital, Belize City, by Hurricane Hattie in 1961. High 
           ground and open space influenced the choice and ground-breaking 
           began in 1966.  By 1970 most government offices and operations had 
           already moved to the new location.
        </history>
    </city>
  </tourGuide>


Element hierarchy

  • root element  -   This is the XML document's major theme element. Every document must have exactly one root element. All other elements are contained within this one root element. The root element follows the XML declaration. In our example, <tourGuide> is the root element.
  • parent element  -   This is any element that contains other elements, the child elements. In our example, <city> is a parent element.
  • child element  -   This is any element that is contained within another element, the parent element. In our example, <population> is a child element of <city>.
  • sibling element  -   These are elements that share the same parent element. In our example, <cityName>, <adminUnit>, <country>, <population>, <area>, <elevation>, <longitude>, <latitude>, <description>, and <history> are all sibling elements.
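
The hierarchy can also be explored programmatically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a shortened copy of the city document from Exhibit 4; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the city document from Exhibit 4.
DOC = """<tourGuide>
  <city>
    <cityName>Belmopan</cityName>
    <adminUnit>Cayo</adminUnit>
    <country>Belize</country>
    <population>11100</population>
  </city>
</tourGuide>"""

root = ET.fromstring(DOC)   # the root element, <tourGuide>
city = root.find("city")    # a child of the root and the parent of the rest

# Iterating over an element yields its child elements, which are siblings.
siblings = [child.tag for child in city]

print(root.tag)
print(city.tag)
print(siblings)
```

Every element reached by iterating over city shares the same parent, which is what makes them siblings.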


Attributes

Attributes qualify the content of a given element by providing additional or required information. They are contained within the element's opening tag. In our sample XML document we could have used attributes to specify the unit of measure for the area and the elevation (feet, yards, meters, kilometers, etc.); in that case, we could have called the attribute 'measureUnit' and defined it within the opening tags of 'area' and 'elevation'. The example below uses a 'class' attribute to record what kind of administrative unit each 'adminUnit' is.


       <adminUnit class="state">Cayo</adminUnit>
       <adminUnit class="region">Selangor</adminUnit>


The above attribute example can also be written as:


1. using child elements

     <adminUnit>
          <class>state</class>
          <name>Cayo</name>
     </adminUnit>
     <adminUnit>
          <class>region</class>
          <name>Selangor</name>
     </adminUnit>

2. using an empty element

    <adminUnit class="state" name="Cayo" />
    <adminUnit class="region" name="Selangor" />


Attributes can be used to:

  • provide more information that is not defined in the data
  • define a characteristic of the element (size, color, style)
  • ensure the inclusion of information about an element in all instances

Attributes can, however, be a bit more difficult to manipulate and they have some constraints. Consider using a child element if you need more freedom.
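
To see how the three notations differ for a program that reads them, here is a small sketch, assuming Python's standard xml.etree.ElementTree module. The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same information written three ways, as in the examples above.
with_attribute = ET.fromstring('<adminUnit class="state">Cayo</adminUnit>')
with_children = ET.fromstring(
    "<adminUnit><class>state</class><name>Cayo</name></adminUnit>"
)
empty_element = ET.fromstring('<adminUnit class="state" name="Cayo" />')

# Attributes are read with get(); element content with text or findtext().
attr_view = (with_attribute.get("class"), with_attribute.text)
child_view = (with_children.findtext("class"), with_children.findtext("name"))
empty_view = (empty_element.get("class"), empty_element.get("name"))

print(attr_view, child_view, empty_view)
```

All three views carry the same two values; only the way they are retrieved changes.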


Rules to follow

These rules are designed to aid the computer reading your XML document.

  • The XML declaration (part of the prolog) must be the first line of an XML document. XML 1.0 allows the declaration to be omitted, but including it is good practice.
  • The main theme of the XML document is established in the root element and all other elements must be contained within the opening and closing tags of this root element.
  • Every element must have an opening tag and a closing tag

(e.g. <element>data stuff</element>). The only shorthand is the empty-element tag, which opens and closes an element in a single tag.

  • Tags must be nested in a particular order

=> the parent element's opening and closing tags must contain all of its child elements' tags; in this way, you close first the tag that was opened last:

<parentElement>
      <childElement1>data</childElement1>
      <childElement2>
              <subChildElementA>data</subChildElementA>
              <subChildElementB>data</subChildElementB>
      </childElement2>
      <childElement3>data</childElement3>
</parentElement>
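  • Attribute values must be enclosed in quotation marks (single or double). Attribute names may not contain spaces.
  • Empty elements must end with a slash (/) before the closing angle bracket, e.g. <img src="Belize.gif" />. The space before the slash is optional.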
  • Comments in the XML language begin with "<!--" and end with "-->".
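
A parser enforces these rules for you. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper is_well_formed is a hypothetical name introduced here. It reports whether a fragment obeys the syntax rules, without saying anything about validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

nested_ok = "<parent><child>data</child></parent>"
nested_bad = "<parent><child>data</parent></child>"  # closed in the wrong order
unquoted = "<area unit=km>5</area>"                  # attribute value not quoted
empty_ok = '<img src="Belize.gif"/>'                 # empty-element tag

for sample in (nested_ok, nested_bad, unquoted, empty_ok):
    print(is_well_formed(sample), sample)
```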


XML Element Naming Convention

Any name can be used but the idea is to make names meaningful to those who might read the document.

  • Element names must start with a letter or an underscore character.
  • The name must not start with the string "xml" (in any combination of upper and lower case), which is reserved for the XML specification.
  • The name may not contain spaces.
  • The ":" should not be used in element names because it is reserved for namespaces (this will be covered in more detail in a later chapter).
  • After the first character, the name may contain a mixture of letters, digits, hyphens, underscores, and periods.


XML documents often have a corresponding database. The database will contain fields which correspond to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.

DTD (Document Type Definition) Validation - Simple Example

Simple Internal DTD

 <?xml version="1.0"?>
 <!DOCTYPE cdCollection [
    <!ELEMENT cdCollection (cd)>
    <!ELEMENT cd (title, artist, year)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT artist (#PCDATA)>
    <!ELEMENT year (#PCDATA)>
 ]>
 <cdCollection>
  <cd>
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>
 </cdCollection>

Every element that will be used MUST be declared in the DTD. Don’t forget to declare the root element: even though it is already named at the beginning of the DTD (in the <!DOCTYPE> line), it needs its own <!ELEMENT> declaration.

 <!ELEMENT cdCollection (cd)>

The root element, <cdCollection>, contains all the other elements of the document, but it has only one direct child element: <cd>. Only direct child elements are listed in the parentheses.

 <!ELEMENT cd (title, artist, year)>

This line defines the <cd> element. It contains the child elements <title>, <artist>, and <year>, spelled out in a particular order. This order must be followed when creating the XML document. If you change the order of the elements (with this particular DTD), the document won’t validate.

 <!ELEMENT title (#PCDATA)>

The remaining three elements, <title>, <artist>, and <year>, don’t contain other tags. They do, however, contain text that needs to be parsed. This data is called Parsed Character Data, or #PCDATA, so #PCDATA is specified in the parentheses.

This simple DTD outlines exactly what you see in the XML file. Nothing can be added or taken away as long as we stick to this DTD. The only thing you can change is the #PCDATA text between the tags.

Adding complexity

There may be times when you will want to put more than just character data, or more than just child elements into a particular element. This is referred to as mixed content. For example, let’s say you want to be able to put character data OR a child element, such as the <b> tag into a <description> element:

 <!ELEMENT description (#PCDATA | b | i )*>

This declaration allows a <description> element to contain any mix of character data, <b> elements, and <i> elements. One caveat: when PCDATA is mixed with other elements, #PCDATA must be listed first and the group must be followed by the asterisk (*) suffix. The declaration allows us to add the following to the XML document (after declaring the individual elements, of course):

  <cd>
    <title>Love. Angel. Music. Baby</title>
    <artist>Gwen Stefani</artist>
    <year>2004</year>
    <genre>pop</genre>
    <description>
      This is a great album from former  
      <i>No Doubt</i> singer <b>Gwen Stefani</b>.
    </description>
  </cd>

Attributes are declared differently from elements. Consider the following example:

  <cd remaster_date="1992">
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>

For this document to validate, the attribute must be declared in the DTD. Attribute declarations take the form:

 <!ATTLIST element_name attribute_name attribute_type default_value>

Let’s use this to validate our CD example:

 <!ATTLIST cd remaster_date CDATA #IMPLIED>

Choices

An attribute can be restricted to a list of permitted values, with one of them given as the default:

 <!ATTLIST person gender (male|female) "female">

Grouping Attributes for an Element

If a particular element is to have many different attributes, group them together like so:

<!ATTLIST car horn CDATA #REQUIRED
             seats CDATA #REQUIRED
     steeringwheel CDATA #REQUIRED
             price CDATA #IMPLIED>

Adding STATIC validation, for items that must have a certain value

<!ATTLIST classList   classNumber CDATA #IMPLIED
                      building (UWINNIPEG_DCE|UWINNIPEG_MAIN) "UWINNIPEG_MAIN"
                      originalDeveloper CDATA #FIXED "Khal Shariff">

Suffixes

So what happens to our last example, the CD collection, when we want to add more CDs? With the current DTD, we cannot add any more CDs without getting an error. Try it and see. When you specify a child element (or elements) the way we did, exactly one of each child element can be used. Not very suitable for a CD collection, is it? We can use suffixes to add flexibility to the <!ELEMENT> declaration. A suffix is added to the end of a child element name (or group). There are three suffixes; together with the no-suffix case this gives four possibilities:

  • ( No suffix ): Only 1 child can be used.
  • ( + ): One or more elements can be used.
  • ( * ): Zero or more elements can be used.
  • ( ? ): Zero or one element may be used.
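
The suffixes behave much like the quantifiers of a regular expression. The sketch below is only an analogy, not how a validating parser is implemented: it assumes Python's standard re and xml.etree.ElementTree modules, and the helper children_match is a hypothetical name introduced for illustration. It checks the sequence of child element names against a pattern.

```python
import re
import xml.etree.ElementTree as ET

def children_match(element, pattern: str) -> bool:
    """Return True if the sequence of child tag names matches the regex."""
    names = "".join(child.tag + " " for child in element)
    return re.fullmatch(pattern, names) is not None

# Regex counterparts of the DTD content models (cd), (cd+), (cd*) and (cd?).
EXACTLY_ONE = r"(cd )"
ONE_OR_MORE = r"(cd )+"
ZERO_OR_MORE = r"(cd )*"
ZERO_OR_ONE = r"(cd )?"

two_cds = ET.fromstring("<cdCollection><cd/><cd/></cdCollection>")
no_cds = ET.fromstring("<cdCollection/>")

print(children_match(two_cds, EXACTLY_ONE))   # two CDs do not fit (cd)
print(children_match(two_cds, ONE_OR_MORE))   # but they fit (cd+)
print(children_match(no_cds, ZERO_OR_MORE))   # an empty collection fits (cd*)
```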

Validating for multiple children with a DTD

So in the case of our CD collection XML file, we can add more CDs to the list by adding a + suffix:

<!ELEMENT cdCollection (cd+)>

Using more internal formatting tags

Inline formatting tags, such as <b> for bold and <i> for italic, are also declared in the DTD as elements. Listing them in a mixed-content model makes them optional:

<!ELEMENT notes (#PCDATA | b | i)*>
   <!ELEMENT b (#PCDATA)*>
   <!ELEMENT i (#PCDATA)*>
]>

_______________

<classList classNumber="303" building="UWINNIPEG_DCE" originalDeveloper="Khal Shariff">
 <student>
   <firstName>Kenneth
   </firstName>
   <lastName>Branaugh
   </lastName>
   <studentNumber>
   </studentNumber>
   <notes><b>Excellent </b>, Kenneth is doing well.
   </notes>
etc

Case Study on BMEcat

One of the first major national projects to use XML as a B2B exchange format was initiated by the German federal association for material management, purchasing and logistics (BME) in cooperation with leading German companies, e.g. Bayer, BMW, SAP and Siemens. Together they created a standard for the exchange of product catalogues. This project was named BMEcat. The result of this initiative is a DTD collection for the description of product catalogues and related transactions (new catalogue, updating of product data and updating of prices).

Companies operating in electronic commerce (suppliers, purchasing companies and marketplaces) exchange increasingly large amounts of data, and the variety of data exchange formats quickly becomes a limiting factor. BMEcat provides a common basis for the straightforward transfer of catalogue data between various data formats. This lays the foundation for advancing the trade in goods over the Internet in Germany. Using BMEcat reduces costs for all parties because standard interfaces can be used.

The XML-based BMEcat standard has been implemented successfully in many projects. Today a variety of companies use this established standard to exchange their product catalogues.


A BMEcat catalogue (Version 1.2) consists of the following main elements:

CATALOG This element contains the essential information of a shopping catalog, e.g. language version and validity. BMEcat expects exactly one language per catalog.

SUPPLIER This element includes the identification and address of the catalog supplier. BMEcat expects exactly one supplier per catalog.

BUYER This element contains the name and address of the catalogue recipient. BMEcat expects no more than one recipient per catalog.

AGREEMENT This element contains one or more framework agreement IDs, each associated with its validity period. BMEcat expects all prices in a catalogue to belong to the agreements named here.

CLASSIFICATION SYSTEM This element allows the full transfer of one or more classification systems, including feature definitions and key words.

CATALOG GROUP SYSTEM This element originates from version 1.0. It is mainly used for the transfer of tree-structures which facilitate the navigation of a user in the target system (Browser).

ARTICLE (since 2005 PRODUCT) This element represents a product. It contains a set of standard attributes.

ARTICLE PRICE (since 2005 PRODUCT PRICE) This element represents a price. The support of different pricing models is very powerful in comparison with other exchange formats. Season prices, country prices, different currencies and different validity periods, etc. will be supported.

ARTICLE FEATURE (since 2005 PRODUCT FEATURE) This element allows the transfer of characteristic values. You can either record predefined group characteristics or individual product characteristics.

VARIANT This element allows product variants to be listed without duplicating the product. However, BMEcat variants only cover individual changes in value that lead to a change of Article ID. Variants cannot otherwise depend on other attributes (prices in particular).

MIME This element includes any number of additional documents such as product images, data sheets, or websites.

ARTICLE REFERENCE (since 2005 REFERENCE PRODUCT) This element allows cross-referencing between articles within a catalogue as well as between catalogues. These references can be used, to a limited extent, to model product bundles.

USER DEFINED EXTENSION This element enables the transport of data that lies outside the BMEcat standard. Sender and receiver have to agree on its use.

You can find a typical BMEcat file here.

ONLINE Validator

http://www.stg.brown.edu/service/xmlvalid/

Well-formed and valid XML

Well-formed XML  -  An XML document that correctly abides by the rules of XML syntax.

Valid XML  -  An XML document that adheres to the rules of a DTD or an XML schema (schemas are discussed shortly). To be valid an XML document must first be well-formed.


The reverse does not hold: a well-formed XML document, which meets the criteria for XML syntax, might not meet the criteria of its XML schema, and is then invalid.

For example, suppose your XML document contains the following, written against the city schema described in the XML schema section:

  <city>
    <cityName>Boston</cityName>
    <country>United States</country>
    <adminUnit>Massachusetts</adminUnit>
  :
  :
  :
  </city>

Notice that the elements do not appear in the correct sequence according to the schema (cityName, adminUnit, country). The XML document can be validated (using validation software) against its declared schema – the validation software would then catch the out of sequence error.
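
The following sketch illustrates the distinction. It is not real schema validation: it assumes Python's standard xml.etree.ElementTree module, hard-codes the first three element names of the schema's sequence, and uses a hypothetical helper, first_out_of_sequence, to stand in for validation software.

```python
import xml.etree.ElementTree as ET

# The order required by the schema for the first three children of <city>.
EXPECTED_ORDER = ["cityName", "adminUnit", "country"]

def first_out_of_sequence(city, expected=EXPECTED_ORDER):
    """Return the first child tag that breaks the expected order, or None."""
    for child, wanted in zip(city, expected):
        if child.tag != wanted:
            return child.tag
    return None

# Parsing succeeds, so the fragment is well-formed ...
boston = ET.fromstring(
    "<city>"
    "<cityName>Boston</cityName>"
    "<country>United States</country>"
    "<adminUnit>Massachusetts</adminUnit>"
    "</city>"
)

# ... but the order check finds the element that is out of sequence.
print(first_out_of_sequence(boston))
```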


Using an XML Editor

Check chapter XML Editor for instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML document and paste it into the XML editor. Then check your results. Is the XML document well-formed? Is the XML document valid? (you will need to have copied and pasted the schema in order to validate - we will look at schemas next)


XML schema

An XML schema is an XML document. XML schemas have an .xsd file extension.

An XML schema is used to govern the structure and content of an XML document by providing a template for XML documents to follow in order to be valid. It is a guide for how to structure your XML document as well as indicating your XML document's components (elements and attributes - and their relationships). An XML editor will examine an XML document to ensure that it conforms to the specifications of the XML schema it is written against - to ensure it is valid.

XML schemas engender confidence in data transfer. With schemas, the receiver of data can feel confident that the data conforms to expectations. The sender and the receiver have a mutual understanding of what the data represent.

Because an XML schema is an XML document, you use the same language - standard XML markup syntax - with elements and attributes specific to schemas.


A schema defines:

  • the structure of the document
  • the elements
  • the attributes
  • the child elements
  • the number of child elements
  • the order of elements
  • the names and contents of all elements
  • the data type for each element

For more detailed information on XML schemas and reference lists of: Common XML Schema Primitive Data Types, Summary of XML Schema Elements, Schema Restrictions and Facets for data types, and Instance Document Attributes, click on this wikibook link => http://en.wikibooks.org/wiki/XML_Schema


Schema reference

This is the part of the XML Document that references an XML Schema:

Exhibit 5: XML document's schema reference

  <tourGuide
      xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
      xsi:noNamespaceSchemaLocation='city.xsd'>

This is the part we left out when we described the root element in the basic XML document from the previous section. The additional attributes of the root element <tourGuide> reference the XML schema (specifically, the xsi:noNamespaceSchemaLocation attribute names the schema file).

xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'  -  references the W3C Schema-instance namespace
xsi:noNamespaceSchemaLocation='city.xsd'  -  references the XML schema document (city.xsd)
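
A program can read the schema reference like any other attribute. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which exposes namespaced attribute names in the form {namespace}name; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The root element from Exhibit 5, closed immediately to keep the sample short.
doc = ET.fromstring(
    "<tourGuide xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' "
    "xsi:noNamespaceSchemaLocation='city.xsd'/>"
)

# The xsi: prefix is replaced by the namespace it stands for.
schema_file = doc.get("{%s}noNamespaceSchemaLocation" % XSI)
print(schema_file)
```

The prefix itself (xsi) is only a local abbreviation; what identifies the attribute is the namespace the prefix is bound to.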

Schema document

Below is a sample XML schema using our TourGuide model. We will refer to it as we describe the parts of an XML schema.

Exhibit 6: XML schema document for city entity

  <?xml version="1.0" encoding="UTF-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   elementFormDefault="unqualified">  
    <xsd:element name="tourGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="city" type="cityDetails" minOccurs = "1" maxOccurs="unbounded" />
            </xsd:sequence>
        </xsd:complexType>
     </xsd:element>
     <xsd:complexType name="cityDetails">
        <xsd:sequence> 
             <xsd:element name="cityName" type="xsd:string"/>
             <xsd:element name="adminUnit" type="xsd:string"/>
             <xsd:element name="country" type="xsd:string"/>
             <xsd:element name="population" type="xsd:integer"/>
             <xsd:element name="area" type="xsd:integer"/>
             <xsd:element name="elevation" type="xsd:integer"/>
             <xsd:element name="longitude" type="xsd:decimal"/>
             <xsd:element name="latitude" type="xsd:decimal"/>
             <xsd:element name="description" type="xsd:string"/>
             <xsd:element name="history" type="xsd:string"/>
         </xsd:sequence>
     </xsd:complexType>
  </xsd:schema>
  <!--
    Note: Latitude and Longitude are decimal data types.
    The conversion is from the usual form (e.g., 50º 17' 35")
    to a decimal by using the formula degrees+min/60+secs/3600.
  -->
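
The conversion described in the comment above can be sketched as a small function. The name dms_to_decimal is an assumption made for this illustration, and the sketch ignores the hemisphere (N/S, E/W), which a real application would have to handle.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float) -> float:
    """Convert degrees, minutes and seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600

# The example from the schema comment: 50 degrees, 17 minutes, 35 seconds.
value = dms_to_decimal(50, 17, 35)
print(round(value, 4))  # 50.2931
```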


Prolog

Remember that the XML schema is essentially an XML document and therefore must begin with the prolog, which in the case of a schema includes:

  • the XML declaration
  • the schema element declaration


The XML declaration:

  <?xml version="1.0" encoding="UTF-8"?>

The schema element declaration:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

The schema element is similar to a root element - it contains all other elements in the schema.

Attributes of the schema element include:

xmlns  -  XML NameSpace - the URI that identifies the vocabulary (the XML elements and data types) used in the schema. It serves as a unique name and does not have to point to a retrievable page.

You can find more about namespaces here => Namespace.

xmlns:xsd  -  All the elements and attributes with the 'xsd' prefix adhere to the vocabulary designated in the given namespace.

elementFormDefault  -  indicates whether elements from the target namespace must be qualified with the namespace prefix. This matters mostly when more than one namespace is referenced: in that case 'elementFormDefault' should be set to 'qualified', because you must indicate which namespace each element belongs to. If you are referencing only one namespace, 'elementFormDefault' can be 'unqualified'. Using 'qualified' as a matter of habit is perhaps the most prudent choice, because you then cannot forget to indicate which namespace you are referencing.

Element declarations

Define the elements in the schema.

Include:

  • the element name
  • the element data type (optional)

Basic element declaration format: <xsd:element name="name" type="type">

Simple type

declares elements that:

  • do NOT have Child Elements
  • do NOT have Attributes

example: <xsd:element name="cityName" type="xsd:string" />

Default Value

If an element is not assigned a value then the default value is assigned.

example: <xsd:element name="description" type="xsd:string" default="really cool place to visit!" />

Fixed Value

An element that is defined as fixed must be empty or contain the specified fixed value. No other values are allowed.

example: <xsd:element name="description" type="xsd:string" fixed="you must visit this place - it is awesome!" />

Complex type

declares elements that:

  • can have Child Elements
  • can have Attributes

examples:

1. The root element 'tourGuide' contains a child element 'city'. This is shown here:

Nameless complex type

     <xsd:element name="tourGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="city" type="cityDetails" minOccurs = "1" maxOccurs="unbounded" />
            </xsd:sequence>
        </xsd:complexType>
     </xsd:element>

Occurrence Indicators:

  • minOccurs = the minimum number of times an element can occur (here it is 1 time)
  • maxOccurs = the maximum number of times an element can occur (here it is an unlimited number of times, 'unbounded')
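
The occurrence indicators amount to a simple count check. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper occurrence_ok is hypothetical, and a real validator does far more than this.

```python
import xml.etree.ElementTree as ET

def occurrence_ok(parent, tag, min_occurs=1, max_occurs="unbounded"):
    """Check that the number of <tag> children lies within the given limits."""
    count = len(parent.findall(tag))
    if count < min_occurs:
        return False
    return max_occurs == "unbounded" or count <= max_occurs

no_city = ET.fromstring("<tourGuide/>")
three_cities = ET.fromstring("<tourGuide><city/><city/><city/></tourGuide>")

print(occurrence_ok(no_city, "city"))       # fails minOccurs="1"
print(occurrence_ok(three_cities, "city"))  # allowed by maxOccurs="unbounded"
```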


2. The parent element 'city' contains many child elements: 'cityName', 'adminUnit', 'country', 'population', etc. Why does this complex element set not start with the line: <xsd:element name="city" type="cityDetails">? The element 'city' was already defined above within the complex element 'tourGuide' and it was given the type, 'cityDetails'. This data type, 'cityDetails', is utilized here in identifying the sequence of child elements for the parent element 'city'.

Named Complex Type - and therefore can be reused in other parts of the schema

   <xsd:complexType name="cityDetails">
        <xsd:sequence>
             <xsd:element name="cityName" type="xsd:string"/>
             <xsd:element name="adminUnit" type="xsd:string"/>
             <xsd:element name="country" type="xsd:string"/>
             <xsd:element name="population" type="xsd:integer"/>
             <xsd:element name="area" type="xsd:integer"/>
             <xsd:element name="elevation" type="xsd:integer"/>
             <xsd:element name="longitude" type="xsd:decimal"/>
             <xsd:element name="latitude" type="xsd:decimal"/>
             <xsd:element name="description" type="xsd:string"/>
             <xsd:element name="history" type="xsd:string"/>
         </xsd:sequence>
   </xsd:complexType>

The <xsd:sequence> tag indicates that the child elements must appear in the order, the sequence, specified here.

Compare the sample XML Schema and the sample XML Document - try to observe patterns in the code and how the XML Schema sets up the XML Document.


3. Elements that have attributes are also designated as complex type.

a. this XML Document line: <adminUnit class="state" name="Cayo" /> would be defined in the XML Schema as:

     <xsd:element name="adminUnit">
          <xsd:complexType>
               <xsd:attribute name="class" type="xsd:string" />
               <xsd:attribute name="name" type="xsd:string" />
          </xsd:complexType>
     </xsd:element>

b. this XML Document line: <adminUnit class="state">Cayo</adminUnit> would be defined in the XML Schema as:

     <xsd:element name="adminUnit">
          <xsd:complexType>
               <xsd:simpleContent>
                    <xsd:extension base="xsd:string">
                         <xsd:attribute name="class" type="xsd:string" />
                    </xsd:extension>
               </xsd:simpleContent>
          </xsd:complexType>
     </xsd:element>

Attribute declarations

Attribute declarations are used in complex type definitions. We saw some attribute declarations in the third example of the Complex Type Element.

<xsd:attribute name="class" type="xsd:string" />


Data type declarations

These are contained within element and attribute declarations as: type=" " .

Common XML Schema Data Types

XML schema has a lot of built-in data types. The most common types are:

string  -  a string of characters
decimal  -  a decimal number
integer  -  an integer
boolean  -  the values true or false (or 1 or 0)
date  -  a date, written in the form YYYY-MM-DD
time  -  a time of day, written in the form HH:MM:SS
dateTime  -  a date and time combination
anyURI  -  a URI, for elements that will contain a URL
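
Declaring data types means a receiving program can convert the text of each element with confidence. The sketch below assumes Python's standard library; the CONVERTERS table is a made-up mapping from a few element names of the city schema to Python types, not part of XML Schema itself.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Python counterparts of the types declared in the city schema (assumed mapping):
# xsd:integer -> int, xsd:decimal -> Decimal, xsd:string -> str (the default).
CONVERTERS = {
    "population": int,
    "area": int,
    "elevation": int,
    "longitude": Decimal,
    "latitude": Decimal,
}

city = ET.fromstring(
    "<city>"
    "<cityName>Belmopan</cityName>"
    "<population>11100</population>"
    "<longitude>88.44</longitude>"
    "</city>"
)

# Convert each child's text using its declared type.
typed = {child.tag: CONVERTERS.get(child.tag, str)(child.text) for child in city}
print(typed)
```

A value that does not fit its type, such as a population of "eleven thousand", would raise a ValueError here, much as it would be rejected by a schema validator.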


For an entire list of built-in simple data types see http://www.w3.org/TR/xmlschema-2/#built-in-datatypes



Using an XML Editor => XML Editor

This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML schema document and paste it into the XML editor. Then check your results. Is the XML schema well-formed? Is the XML schema valid?


XML stylesheet (XSL)

An XML Stylesheet is an XML Document. XML Stylesheets have an .xsl file extension.

The eXtensible Stylesheet Language (XSL) provides a means to transform and format the contents of an XML document for display. Since an XML document does not contain tags a browser understands, such as HTML tags, browsers cannot present the data without a stylesheet that contains the presentation information. By separating the data and the presentation logic, XSL allows people to view the data according to their different needs and preferences.

The XSL Transformation Language (XSLT) is used to transform an XML document from one form to another, such as creating an HTML document to be viewed in a browser. An XSLT stylesheet consists of a set of formatting instructions that dictate how the contents of an XML document will be displayed in a browser, with much the same effect as Cascading Stylesheets (CSS) do for HTML. Multiple views of the same data can be created using different stylesheets. The output of a stylesheet is not restricted to a browser.


During the transformation process, XSLT analyzes the XML document and converts it into a node tree – a hierarchical representation of the entire XML document. Each node represents a piece of the XML document, such as an element, attribute or some text content. The XSL stylesheet contains predefined “templates” that contain instructions on what to do with the nodes. XSLT will use the match attribute to relate XML element nodes to the templates, and transform them into the resulting document.

Exhibit 7: XML stylesheet document for city entity

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/> 
    <xsl:template match="/">
        <html>
            <head>
                <title>Tour Guide</title>
            </head>
            <body>
                <h2>Cities</h2>
                <xsl:apply-templates select="tourGuide"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="tourGuide">
        <xsl:for-each select="city">
            <br/><xsl:value-of select="continentName"/><br/>
            <xsl:value-of select="cityName"/><br/>
            <xsl:text>Population: </xsl:text>
            <xsl:value-of select='format-number(population, "##,###,###")'/><br/>
            <xsl:value-of select="country"/>
            <br/>
        </xsl:for-each>     
    </xsl:template>
</xsl:stylesheet>


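Applied to a tour guide document that lists Madrid and Shanghai (with a continentName element for each city), the city.xsl stylesheet in Exhibit 7 produces output like the following: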

Cities

Europe
Madrid
Population: 3,128,600
Spain

Asia
Shanghai
Population: 18,880,000
China


You will notice that the stylesheet consists of HTML to inform the media tool (a web browser) of the presentation design. If you do not already know HTML this may seem a little confusing. Online resources such as the W3Schools tutorials can help with the basic understanding you will need =>(http://www.w3schools.com/html/default.asp).

Incorporated within the HTML is the XML that supplies the data, the information, contained within our XML document. The XML of the stylesheet indicates what information will be displayed and how. So, the HTML constructs a display and the XML plugs in values within that display. XSL is the tool that transforms the information into presentational form, but at the same time keeps the meaning of the data.

XML at Bertelsmann - a case study

The German Bertelsmann Inc. is a privately owned media conglomerate operating in 56 countries. It has interests in businesses such as TV broadcasting (RTL), magazines (Gruner & Jahr) and books (Random House). In 2005 its 89,000 employees generated 18 billion € of revenue.

A major concern of such a diversified business is exploiting synergies. Management needs to make sure that Random House employees don't spend time and money figuring out what RTL TV journalists have already worked out.

IT-based knowledge management therefore promises huge time savings. In 2002 Bertelsmann started a project called BeCom, whose purpose was to enable the different Bertelsmann businesses to use the same data for their different media applications. XML is crucial to this project because it separates data (the document) from presentation (the stylesheet). The data can thus both be examined statistically and be adapted to different media such as TV and newspapers.

Statistical XML data management, for example, enables employees to benefit from CBR (Case Based Reasoning). CBR allows a Bertelsmann employee who searches for specific content to profit from the earlier search findings of other Bertelsmann employees, gaining information that is much richer in context than isolated research results. In addition to XML data management, the Bertelsmann TV and book units can use this optimized data in their specific media through a variety of layout applications such as 3B2 or QuarkXPress.


Prolog

The opening lines of the stylesheet in Exhibit 7 contain:

  • the XML declaration;
  • the stylesheet declaration;
  • the namespace declaration;
  • the output document format.
 <?xml version="1.0" encoding="UTF-8"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html"/>


The XML declaration

 <?xml version="1.0" encoding="UTF-8"?>


The stylesheet & namespace declarations

     <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  • identifies the document as an XSL style sheet;
  • identifies the version number;
  • refers to the W3C XSL namespace - the URI that identifies the XSL elements used in the stylesheet. You can find more about namespaces here => Namespace. Every time the xsl: prefix is used it references the given namespace.


The output document format

      <xsl:output method="html"/>

This element designates the format of the output document and must be a child element of <xsl:stylesheet>.

Templates

The <xsl:template> element is used to create templates that describe how to display elements and their content. In the XSL introduction above, we mentioned that XSLT converts the XML document into a node tree, a hierarchical representation of the entire document in which each node represents a piece of the XML document, such as an element, an attribute or some text content. Wherever there is branching in the node tree, there is a node. Templates work on these individual nodes: each template within a stylesheet describes one kind of node. To identify which node a given template describes, use the 'match' attribute. The value given to the 'match' attribute is called a pattern. <xsl:template> defines the start of a template and contains the rules to apply when a matching node is found.


the match attribute

   <xsl:template match="/">

This template match attribute associates the XML document root (/), the whole branch of the XML source document, with the HTML document root. Contained within this template element is the typical HTML markup found at the beginning of any HTML document. This HTML is written to the output. The XSL looks for the root match and then outputs the HTML, which the browser understands.

   <xsl:template match="tourGuide">

This template match attribute associates the element 'tourGuide' with the display rules described within this element.
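
As a minimal sketch of how the two templates fit together, the stylesheet below pairs a root template with a 'tourGuide' template. The page title, the heading text, and the path city/cityName are assumptions made for illustration; they are not copied from the sample stylesheet.

 <?xml version="1.0" encoding="UTF-8"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html"/>
   <!-- Matches the document root and writes the HTML skeleton -->
   <xsl:template match="/">
     <html>
       <head><title>Tour Guide</title></head>
       <body>
         <!-- Hand the children of the root over to the other templates -->
         <xsl:apply-templates/>
       </body>
     </html>
   </xsl:template>
   <!-- Matches every tourGuide element reached by apply-templates -->
   <xsl:template match="tourGuide">
     <h2>Cities</h2>
     <!-- city/cityName is an assumed path to a child element -->
     <xsl:value-of select="city/cityName"/>
   </xsl:template>
 </xsl:stylesheet>

Processing starts at the root, so the root template runs first; the 'tourGuide' template runs only because the root template calls <xsl:apply-templates/>.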


Elements

Elements specific to XSL (from our sample XSL):

  • <xsl:text>: writes the literal text found between this element's tags to the output.
  • <xsl:value-of>: used with a 'select' attribute to look up the value of the selected node and insert it into the output.
  • <xsl:for-each>: used with a 'select' attribute to handle elements that repeat, by looping through all the nodes in the selected node set.
  • <xsl:apply-templates>: applies templates to a node or nodes. With a 'select' attribute, templates are applied only to the selected nodes, and a nested <xsl:sort> can set the order in which they are processed. Without a 'select' attribute, templates are applied to all child nodes of the current node, including text nodes.
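
The fragment below is a small sketch of these four elements working together. The element names city, cityName, and country follow the tour guide example, but the layout of the output is an assumption.

 <xsl:template match="tourGuide">
   <!-- Loop over every city child of the current tourGuide element -->
   <xsl:for-each select="city">
     <p>
       <!-- Insert the text value of the selected node -->
       <xsl:value-of select="cityName"/>
       <!-- Literal text, written exactly as typed -->
       <xsl:text>, </xsl:text>
       <xsl:value-of select="country"/>
     </p>
   </xsl:for-each>
   <!-- Apply matching templates to the city children, sorted by name -->
   <xsl:apply-templates select="city">
     <xsl:sort select="cityName"/>
   </xsl:apply-templates>
 </xsl:template>

If the stylesheet has no template that matches city, the built-in template rules are used for the second step, and they simply copy the text of each city to the output.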

See http://www.w3schools.com/xsl/xsl_w3celementref.asp for more XSL elements.

Language-Specific Validation and Transformation Methods

PHP Methods of XML Dom Validation

PHP's DOM (Document Object Model) extension can be used on a server to validate an XML document against a DTD (document type definition). For details, see http://wiki.cc/php/Dom_validation
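
As a sketch of the kind of document that DTD validation works on, the listing below carries its DTD in an internal subset. The element names follow the classlist example used in this chapter; the student data is invented for illustration.

 <?xml version="1.0" encoding="UTF-8"?>
 <!-- The DOCTYPE declaration names the root element and holds the DTD rules -->
 <!DOCTYPE classlist [
   <!ELEMENT classlist (student+)>
   <!ELEMENT student (firstName, lastName)>
   <!ELEMENT firstName (#PCDATA)>
   <!ELEMENT lastName (#PCDATA)>
 ]>
 <classlist>
   <student>
     <firstName>Ada</firstName>
     <lastName>Lovelace</lastName>
   </student>
 </classlist>

A validating parser accepts this document because every student contains exactly one firstName followed by one lastName; removing the lastName element would leave it well-formed but no longer valid.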

Browser Methods

Place this line of code in your .xml document after the XML declaration (prologue).

 <?xml-stylesheet type="text/xsl" href="tourGuide.xsl"?>

PHP XML Production

 <?php
 // Connect to the DBMS and open the database.
 // mysqli is used because the older mysql_* functions were removed in PHP 7.
 $db = new mysqli('localhost', 'root', '', 'issd');
 if ($db->connect_error) {
     die('Failed to connect to the DBMS or open the requested database');
 }
 $result = $db->query('SELECT * FROM students')
     or die('Query to get the records failed');
 if ($result->num_rows < 1) {
     die(''); // nothing to export
 }
 $xmlString = "<classlist>\n";
 while ($row = $result->fetch_assoc()) {
     // One <student> element per row; escape &, < and > so the output stays well-formed
     $xmlString .= "\t<student>\n";
     $xmlString .= "\t\t<firstName>" . htmlspecialchars($row['firstName'], ENT_XML1) . "</firstName>\n";
     $xmlString .= "\t\t<lastName>" . htmlspecialchars($row['lastName'], ENT_XML1) . "</lastName>\n";
     $xmlString .= "\t</student>\n";
 }
 $xmlString .= "</classlist>";
 echo $xmlString;
 $myFile = "classList.xml"; // any file
 $fh = fopen($myFile, 'w') or die("can't open file"); // create file handle
 fwrite($fh, $xmlString); // write the data into the file
 fclose($fh); // all done
 ?>

PHP Methods of XSLT Transformation

This method works with PHP 5 and later, for example under a current WampServer installation. Make sure that the xsl extension is enabled (NOT commented out) in the php.ini file.

 <?php
 // Load the XML source
 $xml = new DOMDocument;
 $xml->load('tourguide.xml');
 $xsl = new DOMDocument;
 $xsl->load('tourguide.xsl');
 // Configure the transformer
 $proc = new XSLTProcessor;
 $proc->importStyleSheet($xsl); // attach the xsl rules
 echo $proc->transformToXML($xml);
 ?>


Example 1: using the older xslt_* functions within PHP itself (call phpinfo() to check whether the XSLT extension is available; enable it if needed). This example might produce XHTML, but it could produce anything defined by the XSL.

 <?php
 $xhtmlOutput = xslt_create();
 $args = array();
 $params = array('foo' => 'bar');
 $theResult = xslt_process(
                         $xhtmlOutput,
                         'theContentSource.xml',
                         'theTransformationSource.xsl',
                         null,
                         $args,
                         $params
                        );
 xslt_free($xhtmlOutput); // free that memory
 // echo theResult or save it to a file or continue processing (perhaps instructions)
 ?>

Example 2: on PHP 5 and later, the old xslt library functions can be emulated with XSLTProcessor:

 <?php
 if (PHP_VERSION >= 5) {
   // Emulate the old xslt library functions
   function xslt_create() {
       return new XsltProcessor();
   }
   function xslt_process($xsltproc,
                         $xml_arg,
                         $xsl_arg,
                          $xslcontainer = null,
                         $args = null,
                         $params = null) {
       // Start with preparing the arguments
       $xml_arg = str_replace('arg:', '', $xml_arg);
       $xsl_arg = str_replace('arg:', '', $xsl_arg);
       // Create instances of the DomDocument class
       $xml = new DomDocument;
       $xsl = new DomDocument;
       // Load the xml document and the xsl template
       $xml->loadXML($args[$xml_arg]);
       $xsl->loadXML($args[$xsl_arg]);
       // Load the xsl template
       $xsltproc->importStyleSheet($xsl);
       // Set parameters when defined
       if ($params) {
           foreach ($params as $param => $value) {
               $xsltproc->setParameter("", $param, $value);
           }
       }
       // Start the transformation
       $processed = $xsltproc->transformToXML($xml);
       // Put the result in a file when specified
       if ($xslcontainer) {
           return @file_put_contents($xslcontainer, $processed);
       } else {
           return $processed;
       }
   }
   function xslt_free($xsltproc) {
       unset($xsltproc);
   }
 }
 $arguments = array(
   '/_xml' => file_get_contents("xml_files/201945.xml"),
   '/_xsl' => file_get_contents("xml_files/convertToSql_new2.xsl")
 );
 $xsltproc = xslt_create();
 $html = xslt_process(
   $xsltproc,
   'arg:/_xml',
   'arg:/_xsl',
   null,
   $arguments
 );
 xslt_free($xsltproc);
 print $html;
 ?>

PHP file writing code

 $myFile = "testFile.xml"; //any file
 $fh = fopen($myFile, 'w') or die("can't open file"); //create filehandler
 $stringData = "<foo>\n\t<bar>\n\thello\n"; // get a string ready to write
 fwrite($fh, $stringData); //write the data into the file
 $stringData2 = "\t</bar>\n</foo>";
 fwrite($fh, $stringData2); //write more data into the file
 fclose($fh); //ALL DONE!

XML Colors

For use in your stylesheet: these colors can be used for both background and font.

http://www.w3schools.com/html/html_colors.asp

http://www.w3schools.com/html/html_colorsfull.asp

http://www.w3schools.com/html/html_colornames.asp


Using an XML Editor

The XML Editor chapter gives instructions on how to start an XML editor. Once you have followed the steps to get started, you can copy the code in the sample XML stylesheet document and paste it into the XML editor. Then check your results. Is the XML stylesheet well-formed?


XML at Thomas Cook - a case study

As a leading travel company and one of the most widely recognized brands in the world, Thomas Cook works across the travel value chain (airlines, hotels, tour operators, travel and incoming agencies), providing its customers with the right product in all market segments across the globe. Employing over 11,000 staff, the Group has 33 tour operators, around 3,600 travel agencies, a fleet of 80 aircraft and a workforce numbering some 26,000. Thomas Cook operates through a network of 616 locations in Europe and overseas. The company is now the second largest travel group in Europe and the third largest in the world.

As Thomas Cook sells other companies' products, ranging from packaged holidays to car hire, it needs to change its online brochure regularly. Before Thomas Cook started using XML, it put information into HTML format, and it would take up to six weeks to get an online brochure up and running. XML helps do this job in about three days. Thomas Cook's current and potential customers, and its agencies in different geographical locations, therefore get updated information without having to wait six weeks for it to be released.

XML allows Thomas Cook to put content information into a single database, which can be re-used as many times as required. "We did not want to keep having to re-do the same content, we wanted the ability to switch it on immediately," said Gwyn Williams, who is content manager at Thomascook.com. "This has brought internal benefits such as being able to re-deploy staff into more value added areas." Thomascook.com currently holds 65,000 pages of brochure and travel guide information and an online magazine in XML format.

Thomas Cook started using XML at a relatively early stage. As Thomas Cook has a large database, the early use of XML will stand it in good stead. At some point, the databases will have to be incorporated into XML, and it is reported that XML databases are quicker than conventional databases, giving Thomas Cook a slight competitive advantage against those who do not use XML.

Thomas Cook has found that this can lead to substantial cost reductions as well as consistency of information across all channels. By implementing a central content management system to facilitate brochure production and web publications, they have centralized the production, maintenance and distribution of content across their brands and channels.

Summary

From the previous chapter Introduction to XML, you have learned the need for data exchange and the usefulness of XML in data exchange. In this chapter, you have learned more about the three major XML files: the XML document, the XML schema, and the XML stylesheet. You learned the correct documentation required for each type of file. You learned basic rules of syntax applicable for all XML documents. You learned how to integrate the three types of XML documents. And you learned the definition and distinction between a well-formed document and a valid document. By following the XML Editor links, you were able to see the results of the sample code and learn how to use an XML Editor.

Below are Exercises and Answers for further practice. Good Luck!


Definitions

XML
SGML
Dan Connolly
RSS
XML Declaration
parent
child
sibling
element
attribute
Well-formed XML
PCDATA

Exercises

Exercise 1.

a) Using "tourguide" above as an example, create an XML document whose root is "classlist". The class list is built from a single entity, STUDENT. It can hold any number of students, each containing the elements firstname, lastname, and emailaddress.

Answers







Learning objectives

  • learn different techniques of implementing one-to-many relationships in XML
  • create custom data types in an XML schema
  • create empty elements with attributes in an XML document
  • define a presentation layout for an XML document using a table with varying background colors and font characteristics, and display images in an XML stylesheet



Introduction

In a one-to-many relationship, one object can reference several instances of another. A model is mapped into a schema whereby each data model entity becomes a complex element type. Each data model attribute becomes a simple element type, and the one-to-many relationship is recorded as a sequence.

Exhibit 1: Data model for 1:m relationship


In the previous chapter, we introduced a simple XML schema, XML document, and an XML stylesheet for a single entity data model. We now include more features of each of the key aspects of XML.

Implementing a one-to-many relationship

There are three different techniques for implementing a one-to-many relationship:

Containment relationship: A structure is defined where one element is contained within another. The "contained" element ceases to exist when the "container" element is removed. For instance, where a city has many hotels, the hotels are "contained" in the city.

  <cityDetails>
    <cityName>Belmopan</cityName>
    <hotelDetails>
      <hotelName>Bull Frog Inn</hotelName>
    </hotelDetails>
    <hotelDetails>
      <hotelName>Pook's Hill Lodge</hotelName>
    </hotelDetails>
  </cityDetails>
  <cityDetails>
    <cityName>Kuala Lumpur</cityName>
    <hotelDetails>
      <hotelName>Pan Pacific Kuala Lumpur</hotelName>
    </hotelDetails>
    <hotelDetails>
      <hotelName>Mandarin Oriental Kuala Lumpur</hotelName>
    </hotelDetails>
  </cityDetails>

Intra-document relationships: In a case where you have one city with many hotels, rather than a city containing hotels, a hotel has a "located in" relationship to a city. A city ID is used as a reference on the hotel element. Therefore, rather than the hotels being contained in the city, they now just reference the city's ID via the cityRef attribute. This is very similar to a foreign key in a relational database.

  <cityDetails>
   <city ID="c1">
    <cityName>Belmopan</cityName>
   </city>
   <city ID="c2">
    <cityName>Kuala Lumpur</cityName>
   </city>
  </cityDetails>
  <hotelDetails>
    <hotel cityRef="c1">
      <hotelName>Bull Frog Inn</hotelName>
    </hotel>
    <hotel cityRef="c2">
      <hotelName>Pan Pacific Kuala Lumpur</hotelName>
    </hotel>
  </hotelDetails>
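
The lookup from a hotel to its city can be sketched in XSLT with a key, much like a join on a foreign key. This is an illustration only: it assumes the two fragments above sit inside a single root element, and the key name cityById is made up for the example.

 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>
   <!-- Index every city element by the value of its ID attribute -->
   <xsl:key name="cityById" match="city" use="@ID"/>
   <xsl:template match="/">
     <xsl:for-each select="//hotel">
       <xsl:value-of select="hotelName"/>
       <xsl:text> is located in </xsl:text>
       <!-- Follow the hotel's cityRef to the city that carries the same ID -->
       <xsl:value-of select="key('cityById', @cityRef)/cityName"/>
       <xsl:text>&#10;</xsl:text>
     </xsl:for-each>
   </xsl:template>
 </xsl:stylesheet>

Each hotel produces one line of text that names the hotel and the city its cityRef points to.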

Inter-document relationships: The inter-document relationship is much like the intra-document relationship. The city still carries an id, but the hotel points to it with a link: an href attribute that names the city document, followed by # and the city's id. The difference is that the inter-document relationship is used when the related data, such as the cities and the hotels, live in separate documents, possibly on different file systems.

  <!-- cityDetails.xml -->
  <city id="c1">
    <cityName>Belmopan</cityName>
  </city>
  <city id="c2">
    <cityName>Kuala Lumpur</cityName>
  </city>

  <!-- a separate document that holds the hotels -->
  <hotel>
    <city href="cityDetails.xml#c1"/>
    <hotelName>Bull Frog Inn</hotelName>
  </hotel>
  <hotel>
    <city href="cityDetails.xml#c2"/>
    <hotelName>Pan Pacific Kuala Lumpur</hotelName>
  </hotel>
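
One way to follow such a link is the XSLT document() function. The template below is a sketch under assumptions: it splits the href at the # character, and it expects cityDetails.xml to be a well-formed document (with a single root element) that can be found relative to the stylesheet. The variable names are made up for the example.

 <xsl:template match="hotel">
   <!-- Part of the link before '#': the file that holds the cities -->
   <xsl:variable name="file" select="substring-before(city/@href, '#')"/>
   <!-- Part of the link after '#': the id of the city -->
   <xsl:variable name="cityId" select="substring-after(city/@href, '#')"/>
   <xsl:value-of select="hotelName"/>
   <xsl:text> is located in </xsl:text>
   <!-- Load the other document and pick the city with the matching id -->
   <xsl:value-of select="document($file)//city[@id = $cityId]/cityName"/>
 </xsl:template>

The extra step of loading a second document is the price paid for the flexibility of keeping cities and hotels apart.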


Exhibit 2: Checklist for deciding what technique to use:

<table width="100%" border="1" cellspacing="0" cellpadding="0">
  <tr>
    <th width="30%">Technique</th>
    <th width="25%">Passing Data</th>
    <th width="15%">Flexibility</th>
    <th width="30%">Ease of Use</th>
  </tr>
  <tr style="text-align:center">
    <td>Containment</td>
    <td>Excellent</td>
    <td>Fair</td>
    <td>Excellent</td>
  </tr>
  <tr>
    <td style="text-align:center">Intra-Document</td>
    <td style="text-align:center">Good</td>
    <td style="text-align:center">Good</td>
    <td style="text-align:center">Good</td>
  </tr>
  <tr>
    <td style="text-align:center">Inter-Document</td>
    <td style="text-align:center">Fair</td>
    <td style="text-align:center">Excellent</td>
    <td style="text-align:center">Fair</td>
  </tr>
</table>

XML schema

Some of the built-in data types for an XML schema were introduced in the previous chapter, but there are more that are very useful, such as anyURI, date, time, gYear, and gYearMonth. In addition to the built-in data types, a custom data type can be defined by the schema designer to accept specific data input. As we have learned, data are defined in XML documents using markup tags defined in an XML schema. However, some elements might not have values. An empty element tag can be used to address this situation. An empty element tag (and any custom markup tag) can contain attributes that add information about the tag without adding extra text to the element. An example using attributes in an empty element tag is shown later in this chapter.

Empty elements with attributes in XML document

Elements can have different content types depending on how each element is defined in the XML schema. The different types are element content, mixed content, simple content, and empty content. An XML element consists of everything from the start of the element tag to the close of that element tag.

  • An element with element content has only other elements between its opening and closing tags. The root element is usually of this kind.
Example: <tourGuide>
      :
  </tourGuide>
  • A mixed content element is one that has text as well as other elements between its opening and closing tags.
Example: <restaurant>My favorite restaurant is
  <restaurantName>Provino's Italian Restaurant</restaurantName>
      :
  </restaurant>
  • A simple content element is one that contains only text between its opening and closing tags.
Example: <restaurantName>Provino's Italian Restaurant</restaurantName>
  • An empty content element, or empty element, is one that does not contain anything between its opening and closing tags (or the element is opened and closed with a single tag, by placing / before the closing > of the opening tag).
Example: <hotelPicture filename="pan_pacific.jpg" size="80"
          value="Image of Pan Pacific"/>

An empty element is useful when there is no need to specify its content, or when the information describing the element is fixed. Two examples illustrate this concept. First, a picture element references the source of an image through its attributes and has no need for text content. Second, if the owner's name is fixed for a company, the related information can be given as attributes of the owner tag. An attribute is meta-information: information that describes the content of the element.
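
How an empty element with attributes might be declared in a schema is sketched below. The attribute names come from the hotelPicture example above; the types chosen for them and the use="required" setting are assumptions for illustration.

 <xsd:element name="hotelPicture">
   <!-- A complex type with no content model: the element must be empty -->
   <xsd:complexType>
     <xsd:attribute name="filename" type="xsd:string" use="required"/>
     <xsd:attribute name="size" type="xsd:positiveInteger"/>
     <xsd:attribute name="value" type="xsd:string"/>
   </xsd:complexType>
 </xsd:element>

With a declaration like this, <hotelPicture filename="pan_pacific.jpg" size="80" value="Image of Pan Pacific"/> is valid, while a hotelPicture element that contains text or child elements is not.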

European Central Bank's use of XML

<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" 
xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time="2004-05-28">
            <Cube currency="USD" rate="1.2246"/>
            <Cube currency="JPY" rate="135.77"/>
            <Cube currency="DKK" rate="7.4380"/>
            <Cube currency="GBP" rate="0.66730"/>
            <Cube currency="SEK" rate="9.1150"/>
            <Cube currency="CHF" rate="1.5304"/>
            <Cube currency="ISK" rate="87.72"/>
            <Cube currency="NOK" rate="8.2120"/>
        </Cube>
    </Cube>
 
<!--For the sake of illustration, some of the currencies are omitted 
in the preceding code. Banks, consultants, currency traders, 
and firms involved in international trade are the major users 
of this information.-->
 
</gesmes:Envelope>
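
The rates in the listing above are carried entirely by the attributes of empty Cube elements. The stylesheet below is a sketch of how they could be read; the prefix ecb is an arbitrary choice, bound here to the default namespace declared in the ECB document.

 <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:ecb="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
   <xsl:output method="text"/>
   <xsl:template match="/">
     <!-- Only the innermost Cube elements have a currency attribute -->
     <xsl:for-each select="//ecb:Cube[@currency]">
       <xsl:value-of select="@currency"/>
       <xsl:text> </xsl:text>
       <xsl:value-of select="@rate"/>
       <xsl:text>&#10;</xsl:text>
     </xsl:for-each>
   </xsl:template>
 </xsl:stylesheet>

The namespace prefix is needed because an unprefixed name in an XPath expression never matches an element that belongs to a namespace, even a default one.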

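XML schema data types

Some of the commonly used data types, such as string, decimal, integer, and boolean, were introduced in chapter 2. The following are a few more data types that are useful.

Exhibit 3: Other data types:

<table width="100%" border="1" cellspacing="0" cellpadding="0">
  <tr>
    <th>Type</th>
    <th>Format</th>
    <th>Example</th>
    <th>Comment</th>
  </tr>
  <tr>
    <td>gYear</td>
    <td>YYYY</td>
    <td>1999</td>
    <td></td>
  </tr>
  <tr>
    <td>gYearMonth</td>
    <td>YYYY-MM</td>
    <td>1999-03</td>
    <td>This type is used when the day is irrelevant for the data element</td>
  </tr>
  <tr>
    <td>time</td>
    <td>hh:mm:ss.sss with optional time zone indicator</td>
    <td>20:14:05</td>
    <td>The time zone indicator is Z for UTC, or one of -hh:mm or +hh:mm to indicate the difference from UTC. This time type is used when you want the content to represent a particular time of day that recurs every day, such as 4:15 pm.</td>
  </tr>
  <tr>
    <td>date</td>
    <td>YYYY-MM-DD</td>
    <td>1999-03-14</td>
    <td></td>
  </tr>
  <tr>
    <td>anyURI</td>
    <td>A URI, for example a web address beginning with http://</td>
    <td>http://www.panpacific.com</td>
    <td></td>
  </tr>
</table>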
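
As a sketch of these types in use, the declarations below give each one to an element, followed by matching content from a document. All element names except websiteURL are hypothetical and are not part of the tour guide schema.

 <!-- Schema: one element for each of the data types in the exhibit -->
 <xsd:element name="yearOpened" type="xsd:gYear"/>
 <xsd:element name="seasonStart" type="xsd:gYearMonth"/>
 <xsd:element name="checkInTime" type="xsd:time"/>
 <xsd:element name="lastRenovated" type="xsd:date"/>
 <xsd:element name="websiteURL" type="xsd:anyURI"/>

 <!-- Document: values that satisfy the declarations above -->
 <yearOpened>1999</yearOpened>
 <seasonStart>1999-03</seasonStart>
 <checkInTime>20:14:05Z</checkInTime>
 <lastRenovated>1999-03-14</lastRenovated>
 <websiteURL>http://www.panpacific.com</websiteURL>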

More data types

Besides the built-in data types, custom data types can be created as required. A custom data type can be a simple type or a complex type. For simplicity, we create a custom data type that is a simple type, which means that the element does not contain other elements or attributes; it contains text only. A custom simple type is created by starting from a built-in simple type and applying restrictions, or facets, to limit the acceptable values of the element. A custom simple type can be nameless or named. If the custom simple type is to be used only once, it makes sense not to name it; that custom type can then be used only where it is defined. A named custom type can be referenced by its name, so it can be used wherever necessary.

A pattern can be used to specify exactly how the content of the element should look. For example, one might want to specify the format of a telephone number, a postal code, or a product code. By having a defined pattern for certain elements, the data exchanged will be uniform and the values will be consistent when stored in a database. A useful way to set patterns is through regular expressions (Regex), which are discussed in later chapters.
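
A pattern for a telephone number might look like the sketch below. The type name phoneType and the three-three-four digit format are assumptions for illustration; the schemas in this chapter declare phone as a plain string.

 <!-- Named custom simple type: digits grouped as 3-3-4, separated by hyphens -->
 <xsd:simpleType name="phoneType">
   <xsd:restriction base="xsd:string">
     <xsd:pattern value="\d{3}-\d{3}-\d{4}"/>
   </xsd:restriction>
 </xsd:simpleType>
 <!-- Because the type is named, any element can reference it -->
 <xsd:element name="phone" type="phoneType"/>

A value such as 501-822-3425 satisfies this pattern, while 822-3425 does not, because a schema pattern must match the whole value.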

Schema examples

The following is a schema that extends the schema introduced in the previous chapter to include a one-to-many relationship of city to hotels with two examples of custom data types.

Exhibit 1: Data model for 1:m relationship (City, Hotel)

Important: this is a continuing example, so new code is added to the previous chapter's example!

Containment example

 <?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
 
  <!--Tour Guide-->
 
  <xsd:element name="tourGuide">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
 
  <!--This will contain the City details-->
 
  <xsd:complexType name="cityDetails">
    <xsd:sequence>
      <xsd:element name="cityName" type="xsd:string"/>
      <xsd:element name="adminUnit" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>
 
      <!--The element Continent uses a Nameless Custom Simple Type-->
 
      <xsd:element name="continent">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Asia"/>
            <xsd:enumeration value="Africa"/>
            <xsd:enumeration value="Australia"/>
            <xsd:enumeration value="Europe"/>
            <xsd:enumeration value="North America"/>
            <xsd:enumeration value="South America"/>
            <xsd:enumeration value="Antarctica"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="population" type="xsd:integer"/>
      <xsd:element name="area" type="xsd:integer"/>
      <xsd:element name="elevation" type="xsd:integer"/>
      <xsd:element name="longitude" type="xsd:decimal"/>
      <xsd:element name="latitude" type="xsd:decimal"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="history" type="xsd:string"/>
      <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
  <!-- This will contain the Hotel details-->
 
  <xsd:complexType name="hotelDetails">
    <xsd:sequence>
      <xsd:element name="hotelName" type="xsd:string"/>
      <xsd:element name="hotelPicture"/>
      <xsd:element name="streetAddress" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
 
      <!-- The named custom simple type, emailAddressType, defined at the end of this schema,
           is used as the type of the emailAddress element. -->
 
      <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
      <xsd:element name="hotelRating" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>
 
  <!-- NOTE: Since postalCode, emailAddress, and websiteURL are not standard elements that
          must be provided, the minOccurs="0" indicates that they are optional -->

  <!--This is a Named Custom SimpleType that is referenced by the emailAddress element of
      hotelDetails to check the format of an email address-->
 
  <xsd:simpleType name="emailAddressType">
    <xsd:restriction base="xsd:string">
 
      <!--You can learn more about this pattern by reading the Regex section.-->
 
      <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Intra-document example

 <?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
 
  <!--Tour Guide-->
 
  <xsd:element name="tourGuide">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
        <!-- Hotels are siblings of the cities and point back to them through cityRef -->
        <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!--This will contain the City details-->

  <xsd:complexType name="cityDetails">
    <xsd:sequence>
      <xsd:element name="cityID" type="xsd:ID"/>
      <xsd:element name="cityName" type="xsd:string"/>
      <xsd:element name="adminUnit" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>

      <!--The element Continent uses a Nameless Custom Simple Type-->

      <xsd:element name="continent">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Asia"/>
            <xsd:enumeration value="Africa"/>
            <xsd:enumeration value="Australia"/>
            <xsd:enumeration value="Europe"/>
            <xsd:enumeration value="North America"/>
            <xsd:enumeration value="South America"/>
            <xsd:enumeration value="Antarctica"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="population" type="xsd:integer"/>
      <xsd:element name="area" type="xsd:integer"/>
      <xsd:element name="elevation" type="xsd:integer"/>
      <xsd:element name="longitude" type="xsd:decimal"/>
      <xsd:element name="latitude" type="xsd:decimal"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="history" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- This will contain the Hotel details-->

  <xsd:complexType name="hotelDetails">
    <xsd:sequence>
      <xsd:element name="cityRef" type="xsd:IDREF"/>
      <xsd:element name="hotelName" type="xsd:string"/>
      <xsd:element name="hotelPicture"/>
      <xsd:element name="streetAddress" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
 
      <!-- The named custom simple type, emailAddressType, defined at the end of this schema,
           is used as the type of the emailAddress element. -->
 
      <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
      <xsd:element name="hotelRating" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>
 
  <!-- NOTE: Since postalCode, emailAddress, and websiteURL are not standard elements that
          must be provided, the minOccurs="0" indicates that they are optional -->

  <!--This is a Named Custom SimpleType that is referenced by the emailAddress element of
      hotelDetails to check the format of an email address-->
 
  <xsd:simpleType name="emailAddressType">
    <xsd:restriction base="xsd:string">
 
      <!--You can learn more about this pattern by reading the Regex section.-->
 
      <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Inter-document example

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
 
  <!--Tour Guide-->
 
  <xsd:element name="tourGuide">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
 
  <!--This will contain the City details-->
 
  <xsd:complexType name="cityDetails">
    <xsd:sequence>
      <xsd:element name="cityID" type="xsd:ID"/>
      <xsd:element name="cityName" type="xsd:string"/>
      <xsd:element name="adminUnit" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>
 
      <!--The element Continent uses a Nameless Custom Simple Type-->
 
      <xsd:element name="continent">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Asia"/>
            <xsd:enumeration value="Africa"/>
            <xsd:enumeration value="Australia"/>
            <xsd:enumeration value="Europe"/>
            <xsd:enumeration value="North America"/>
            <xsd:enumeration value="South America"/>
            <xsd:enumeration value="Antarctica"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="population" type="xsd:integer"/>
      <xsd:element name="area" type="xsd:integer"/>
      <xsd:element name="elevation" type="xsd:integer"/>
      <xsd:element name="longitude" type="xsd:decimal"/>
      <xsd:element name="latitude" type="xsd:decimal"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="history" type="xsd:string"/>
     </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

  <!--Tour Guide 2: a second schema, for the separate document that contains the Hotel details-->

  <xsd:element name="tourGuide2">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="hotelDetails">
    <xsd:sequence>
      <!-- An IDREF can only point to an ID in the same document, so the link to a city
           in the other document is stored as a URI such as cityDetails.xml#c1 -->
      <xsd:element name="cityRef" type="xsd:anyURI"/>
      <xsd:element name="hotelName" type="xsd:string"/>
      <xsd:element name="hotelPicture"/>
      <xsd:element name="streetAddress" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
 
      <!-- The named custom simple type, emailAddressType, defined at the end of this schema,
           is used as the type of the emailAddress element. -->
 
      <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
      <xsd:element name="hotelRating" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>
 
  <!-- NOTE: Since postalCode, emailAddress, and websiteURL are not standard elements that
          must be provided, the minOccurs="0" indicates that they are optional -->

  <!--This is a Named Custom SimpleType that is referenced by the emailAddress element of
      hotelDetails to check the format of an email address-->
 
  <xsd:simpleType name="emailAddressType">
    <xsd:restriction base="xsd:string">
 
      <!--You can learn more about this pattern by reading the Regex section.-->
 
      <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Refer to Chapter 2 - A single entity for the steps to create the above XML schema using NetBeans.

XML document

Attributes

  • The valid element naming structure applies to attribute names as well.
  • In a given element, all attributes' names must be unique.
  • An attribute value may not contain the symbol '<'. The character string '&lt;' can be used to represent it.
  • Each attribute must have a name and a value (e.g. in <hotelPicture filename="pan_pacific.jpg" />, filename is the name and pan_pacific.jpg is the value).
  • If the assigned value itself contains a quoted string, the type of quotation marks must differ from those used to enclose the entire value. (For instance, if double quotes are used to enclose the whole value then use single quotes for the string: <name familiar="'Jack'">John Smith</name>)

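
The rules above can be seen together in one short sketch. The attributes rating, note, and nickname are made up for illustration; they are not part of the tour guide schema.

 <!-- Each attribute name is unique within the element and every attribute has a quoted value -->
 <hotel rating="4"
        note="fewer than 50 rooms (rooms &lt; 50)"
        nickname="'The Frog'"/>
 <!-- Not well-formed: the name rating is repeated, and < appears inside a value
 <hotel rating="4" rating="5" note="rooms < 50"/>
 -->
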
  <?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="city_hotel.xsl"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="TourGuide3.xsd">
 
    <!--This is where you define the first city and all its attributes-->
 
    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>
 
        <!--The content of the element “continent” must be one of the values specified in the set of 
            acceptable values in the XML schema for the element “continent”-->
 
        <continent>North America</continent>
        <population>11100</population>
        <area>5</area>
        <elevation>130</elevation>
        <longitude>12.3</longitude>
        <latitude>123.4</latitude>
        <description>Belmopan is the capital of Belize</description>
        <history>Belmopan was established following the devastation of the former capital, Belize City,
            by Hurricane Hattie in 1961. High ground and open space influenced the choice and
            ground-breaking began in 1966. By 1970 most government offices and operations had
            already moved to the new location. </history>
 
        <!--This is where you would store the name of the Hotel and its attributes-->
 
        <!--Notice that the hotel elements below do not contain the postalCode element. The document is 
            still valid, because postalCode is optional-->
 
        <hotel>
            <hotelName>Bull Frog Inn</hotelName>
 
            <!--The empty element, hotelPicture, contains the attributes "filename", "size", "value", and
                "imageURL", which indicate the name of the image file, the desired size, a description
                of the image, and a related URL-->
 
            <hotelPicture filename="bull_frog_inn.jpg" size="80" value="Image of Bull Frog Inn"
                imageURL="http://www.bullfroginn.com"/>
            <streetAddress>25 Half Moon Avenue</streetAddress>
            <phone>501-822-3425</phone>
 
            <!--The emailAddress elements must match the pattern specified in the schema to be valid -->
 
            <emailAddress>bullfrog@btl.net</emailAddress>
            <websiteURL>http://www.bullfroginn.com/</websiteURL>
            <hotelRating>4</hotelRating>
        </hotel>
 
        <!--This is where you put the information for another Hotel-->
 
        <hotel>
            <hotelName>Pook's Hill Lodge</hotelName>
            <hotelPicture filename="pook_hill_lodge.jpg" size="80" value="Image of Pook's Hill Lodge"
                imageURL="http://www.global-travel.co.uk/pook1.htm"/>
            <streetAddress>Roaring River</streetAddress>
            <phone>440-126-854-1732</phone>
            <emailAddress>info@global-travel.co.uk</emailAddress>
            <websiteURL>http://www.global-travel.co.uk/pook1.htm</websiteURL>
            <hotelRating>3</hotelRating>
        </hotel>
    </city>
 
    <!--This is where you define another city and its attributes-->
 
    <city>
        <cityName>Kuala Lumpur</cityName>
        <adminUnit>Selangor</adminUnit>
        <country>Malaysia</country>
        <continent>Asia</continent>
        <population>1448600</population>
        <area>243</area>
        <elevation>111</elevation>
        <longitude>101.71</longitude>
        <latitude>3.16</latitude>
        <description>Kuala Lumpur is the capital of Malaysia and is the largest city in the nation.
        </description>
        <history>The city was founded in 1857 by Chinese tin miners and superseded Klang. In 1880
            the British government transferred their headquarters from Klang to Kuala Lumpur, and
            in 1896 it became the capital of the Federated Malay States. </history>
 
        <!--This is where you put the information for a Hotel-->
 
        <hotel>
            <hotelName>Pan Pacific Kuala Lumpur </hotelName>
            <hotelPicture filename="pan_pacific.jpg" size="80" value="Image of Pan Pacific"
             imageURL="http://www.malaysia-hotels-discount.com/hotels/kualalumpur/pan_pacific_hotel/index.shtml"/>
            <streetAddress>Jalan Putra</streetAddress>
            <postalCode>50746</postalCode>
            <phone>1-866-260-0402</phone>
            <emailAddress>president@panpacific.com</emailAddress>
            <websiteURL>http://www.panpacific.com</websiteURL>
            <hotelRating>5</hotelRating>
        </hotel>
 
        <!--This is where you put the information for another Hotel-->
 
        <hotel>
            <hotelName>Mandarin Oriental Kuala Lumpur </hotelName>
            <hotelPicture filename="mandarin_oriental.jpg" size="80" value="Image of Mandarin Oriental"
                imageURL="http://www.mandarinoriental.com/kualalumpur"/>
            <streetAddress>Kuala Lumpur City Centre</streetAddress>
            <postalCode>50088</postalCode>
            <phone>011-603-2380-8888</phone>
            <emailAddress>mokul-sales@mohg.com</emailAddress>
            <websiteURL>http://www.mandarinoriental.com/kualalumpur/</websiteURL>
            <hotelRating>5</hotelRating>
        </hotel>
    </city>
</tourGuide>

Table 3-2: XML Document for a one-to-many relationship – city_hotel.xml

Refer to Chapter 2 - A single entity for the steps in using NetBeans to create the above XML document.

XML style sheet

<?xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>    
        <xsl:template match="/">
            <html>
                <head>
                    <title>Tour Guide</title>
                </head>
                <body>
                    <h2>Cities</h2>
                    <xsl:apply-templates select="tourGuide"/>
                </body>
            </html>
        </xsl:template>
        <xsl:template match="tourGuide">
            <xsl:for-each select="city">
                <xsl:text>City: </xsl:text>
                <xsl:value-of select="cityName"/>
                <br/>
                <xsl:text>Population: </xsl:text>
                <xsl:value-of select="population"/>
                <br/>
                <xsl:text>Country: </xsl:text>
                <xsl:value-of select="country"/>
                <br/>
 
                <xsl:for-each select="hotel">
                    <xsl:text>Hotel: </xsl:text>
                    <xsl:value-of select="hotelName"/>
                    <br/>
                </xsl:for-each>
 
               <br/>
            </xsl:for-each>     
        </xsl:template>    
  </xsl:stylesheet>

Summary

Besides the simple built-in data types (e.g., gYear, gMonth, time, anyURI, and date), schema designers may create custom data types to suit their needs. A simple custom data type can be created from one of the built-in data types by applying restrictions (facets) to it, such as an enumeration that specifies a set of acceptable values or a pattern that the value must match.

An empty element does not contain any text; however, it may contain attributes that provide additional information about that element.

The presentation layout for displaying an HTML page can include code for style tags, background color, font size, font weight, and alignment. Table tags can be used to organize the layout of content in an HTML page, and images can also be displayed using an image tag.
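
As a rough sketch of the layout ideas mentioned above, the following stylesheet arranges the hotels of city_hotel.xml in an HTML table and displays each hotelPicture with an image tag. It assumes that the image files sit in the same folder as the generated page and that the size attribute is meant as a pixel width; neither assumption is confirmed by the schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Tour Guide</title>
            </head>
            <body style="background-color: #ffffff; font-size: 12pt">
                <h2>Hotels</h2>
                <table border="1">
                    <tr>
                        <th>City</th>
                        <th>Hotel</th>
                        <th>Picture</th>
                    </tr>
                    <!-- One table row per hotel, in document order -->
                    <xsl:for-each select="tourGuide/city/hotel">
                        <tr>
                            <!-- The parent city supplies the first column -->
                            <td><xsl:value-of select="../cityName"/></td>
                            <td style="font-weight: bold"><xsl:value-of select="hotelName"/></td>
                            <td align="center">
                                <!-- Attribute value templates copy attribute values into the img tag -->
                                <img src="{hotelPicture/@filename}"
                                     width="{hotelPicture/@size}"
                                     alt="{hotelPicture/@value}"/>
                            </td>
                        </tr>
                    </xsl:for-each>
                </table>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

The curly braces are attribute value templates: the XPath expression inside them is evaluated for the current hotel and its result becomes the attribute value in the output.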


Exercises

In order to learn more about the one-to-many relationship, exercises are provided.

Answers

In order to learn more about the one-to-many relationship, answers are provided to go with the exercises above.





Learning objectives

  • Create a schema for a data model containing a 1:1 relationship
  • Place restrictions on elements or attributes in an XML schema
  • Specify fixed or default values for an element in an XML schema


Introduction

In the previous chapter, some new features of XML schemas, documents, and stylesheets were introduced as well as how to model a one-to-many relationship. In this chapter, we will introduce the modeling of a one-to-one relationship in XML. We will also introduce more features of an XML schema.


A one-to-one (1:1) relationship

The following diagram shows a one-to-one and a one-to-many relationship. The one-to-one relationship records that each country has a single top destination.



Exhibit 4-1: Data model for a 1:1 relationship

XML schema

A one-to-one (1:1) relationship is represented in the data model in Exhibit 4-1. The addition of country and destination to the data model allows the 1:1 relationship named topDestination. A country has many different destinations, but only one top destination. The XML schema in Exhibit 4-2 shows how to represent a 1:1 relationship in an XML schema.

XML schema example

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified"> 
<!--
Tour Guide
--> 
 <xsd:element name="tourGuide"> 
  <xsd:complexType> 
   <xsd:sequence> 
    <xsd:element name="country" type="countryDetails" minOccurs="1" maxOccurs="unbounded" /> 
   </xsd:sequence> 
  </xsd:complexType> 
 </xsd:element> 
<!--
Country
--> 
 <xsd:complexType name="countryDetails"> 
  <xsd:sequence> 
   <xsd:element name="countryName" type="xsd:string" minOccurs="1" maxOccurs="1"/> 
   <xsd:element name="population" type="xsd:integer" minOccurs="0" maxOccurs="1" default="0"/> 
   <xsd:element name="continent" minOccurs="0" maxOccurs="1"> 
    <xsd:simpleType> 
     <xsd:restriction base="xsd:string"> 
      <xsd:enumeration value="Asia"/> 
      <xsd:enumeration value="Africa"/> 
      <xsd:enumeration value="Australasia"/> 
      <xsd:enumeration value="Europe"/> 
      <xsd:enumeration value="North America"/> 
      <xsd:enumeration value="South America"/> 
      <xsd:enumeration value="Antarctica"/> 
     </xsd:restriction> 
    </xsd:simpleType> 
   </xsd:element> 
   <xsd:element name="topDestination" type="destinationDetails" minOccurs="0" maxOccurs="1"/> 
   <xsd:element name="destination" type="destinationDetails" minOccurs="0" maxOccurs="unbounded"/> 
  </xsd:sequence> 
 </xsd:complexType> 
<!--
Destination
--> 
 <xsd:complexType name="destinationDetails"> 
  <xsd:all> 
   <xsd:element name="destinationName" type="xsd:string"/> 
   <xsd:element name="description" type="xsd:string"/> 
   <xsd:element name="streetAddress" type="xsd:string" minOccurs="0"/> 
   <xsd:element name="telephoneNumber" type="xsd:string" minOccurs="0"/> 
   <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/> 
  </xsd:all> 
 </xsd:complexType> 
</xsd:schema>


Exhibit 4-2: XML Schema for a one-to-one relationship

New elements in schema


Let’s examine the new elements and attributes in the schema in Exhibit 4-2.

  • country is an element of the complex type countryDetails. It is declared in tourGuide, so a tour guide can contain one or more countries.
  • destination is an element of the complex type destinationDetails. It is declared in countryDetails with maxOccurs="unbounded" to represent the 1:M relationship between a country and its many destinations.
  • topDestination is an element of the same complex type, destinationDetails. It is declared in countryDetails with maxOccurs="1" to represent the 1:1 relationship between a country and its top destination.
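
The population element in Exhibit 4-2 carries default="0". The fragment below is a minimal sketch of how default and fixed values might be declared side by side; the countrySummary and currency names are hypothetical and do not appear in the Tour Guide schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <!-- Hypothetical element used only to contrast default and fixed -->
    <xsd:element name="countrySummary">
        <xsd:complexType>
            <xsd:sequence>
                <!-- default: an empty population element is treated as if it contained 0 -->
                <xsd:element name="population" type="xsd:integer" default="0"/>
                <!-- fixed: if the element has content, the content must be exactly USD -->
                <xsd:element name="currency" type="xsd:string" fixed="USD"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

A default or fixed value on an element does not make a missing element appear; it applies only when the element is present in the document.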

Restrictions in schema


Placing restrictions on elements was introduced in the previous chapter; however, there are more potentially useful restrictions that can be placed on an element. Restrictions can be placed on elements and attributes that affect how the processor handles whitespace characters:

<xsd:element name="streetAddress">
 <xsd:simpleType>
  <xsd:restriction base="xsd:string">
   <xsd:whiteSpace value="preserve"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:element>

White space & length constraints

The whiteSpace constraint is set to "preserve", which means that the XML processor will not remove any white space characters. Other useful restrictions include the following:

  • Replace – the XML processor will replace each tab, line feed, and carriage return with a space.
<xsd:whiteSpace value="replace"/>
  • Collapse – the processor will perform the replacement above, then remove leading and trailing spaces and reduce each run of spaces to a single space.
<xsd:whiteSpace value="collapse"/>
  • Length, maxLength, minLength – the length of the element's value can be fixed or can be restricted to a predefined range.
<xsd:length value="8"/>
<xsd:minLength value="5"/>
<xsd:maxLength value="8"/>
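
The facets listed above can be combined in a single restriction. The following is a sketch only, assuming a postalCode element that should hold between 5 and 8 characters; the element name and the limits are chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <!-- Hypothetical element: collapse white space, then require 5 to 8 characters -->
    <xsd:element name="postalCode">
        <xsd:simpleType>
            <xsd:restriction base="xsd:string">
                <xsd:whiteSpace value="collapse"/>
                <xsd:minLength value="5"/>
                <xsd:maxLength value="8"/>
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:element>
</xsd:schema>
```

Under these assumptions, a postalCode whose content is 50746 surrounded by extra spaces would be treated as 50746 (five characters) and accepted, because white space is normalized before the length facets are checked.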

Order indicators

In addition to placing restrictions on elements, order indicators can be used to define in what order elements should occur.

All indicator

The <all> indicator specifies that the child elements can appear in any order and, by default, that each child element must occur once and only once (a child can be made optional with minOccurs="0"):

<xsd:element name="person">
 <xsd:complexType>
  <xsd:all>
   <xsd:element name="firstname" type="xsd:string"/>
   <xsd:element name="lastname" type="xsd:string"/>
  </xsd:all>
 </xsd:complexType>
</xsd:element>

Choice indicator

The <choice> indicator specifies that either one child element or another can occur:

<xsd:element name="person">
 <xsd:complexType>
  <xsd:choice>
   <xsd:element name="employee" type="employee"/>
   <xsd:element name="visitor" type="visitor"/>
  </xsd:choice>
 </xsd:complexType>
</xsd:element>

Sequence indicator

The <sequence> indicator specifies that the child elements must appear in a specific order:

<xsd:element name="person">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element name="firstname" type="xsd:string"/>
   <xsd:element name="lastname" type="xsd:string"/>
  </xsd:sequence>
 </xsd:complexType>
</xsd:element>
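
To make the difference between the all and sequence indicators concrete, the fragment below shows two person instances. The people wrapper element is hypothetical and only serves to hold both instances in one well-formed document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper element, used only to hold two sample instances -->
<people>
    <!-- Accepted by both the xsd:all and the xsd:sequence declaration of person -->
    <person>
        <firstname>John</firstname>
        <lastname>Smith</lastname>
    </person>
    <!-- Accepted by the xsd:all declaration only; xsd:sequence rejects this order -->
    <person>
        <lastname>Smith</lastname>
        <firstname>John</firstname>
    </person>
</people>
```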

XML document


The XML document in Exhibit 4-3 shows how the new elements (country and destination) defined in the XML schema found in Exhibit 4-2 are used in an XML document. Note that the child elements of <topDestination> can appear in any order because of the <xsd:all> order indicator used in the schema.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="newXMLSchema.xsl" media="screen"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="XMLSchema.xsd">   
<!--
Malaysia
-->   
  <country> 
   <countryName>Malaysia</countryName> 
   <population>22229040</population> 
   <continent>Asia</continent> 
   <topDestination> 
    <description>A popular duty-free island north of Penang.</description> 
    <destinationName>Pulau Langkawi</destinationName> 
   </topDestination> 
   <destination> 
    <destinationName>Muzium Di-Raja</destinationName> 
    <description>The original palace of the Sultan</description>
    <streetAddress>122 Muzium Road</streetAddress>
    <telephoneNumber>48494030</telephoneNumber>
    <websiteURL>www.muziumdiraja.com</websiteURL> 
   </destination> 
   <destination> 
    <destinationName>Kinabalu National Park</destinationName> 
    <description>A national park</description>
    <streetAddress>54 Ocean View Drive</streetAddress>
    <telephoneNumber>4847101</telephoneNumber>
    <websiteURL>www.kinabalu.com</websiteURL> 
   </destination> 
  </country>
<!--
Belize
--> 
  <country> 
   <countryName>Belize</countryName> 
   <population>249183</population> 
   <continent>North America</continent> 
   <topDestination> 
    <destinationName>San Pedro</destinationName> 
    <description>San Pedro is an island off the coast of Belize</description> 
   </topDestination> 
   <destination> 
    <destinationName>Belize City</destinationName> 
    <description>Belize City is the former capital of Belize</description>
    <websiteURL>www.belizecity.com</websiteURL> 
   </destination> 
   <destination> 
    <destinationName>Xunantunich</destinationName> 
    <description>Mayan ruins</description>
    <streetAddress>4 High Street</streetAddress>
    <telephoneNumber>011770801</telephoneNumber> 
   </destination> 
  </country> 
</tourGuide>

Exhibit 4-3: XML Document for a one-to-one relationship

Summary

Schema designers may place restrictions on the length of elements and on how the processor handles white space. Schema designers may also specify fixed or default values for an element. Order indicators can be used to specify the order in which elements must appear in an XML document.

Exercises


Answers






Learning objectives
  • Learn different methods to represent a many-to-many relationship using XML
  • Create XML schemas using the "Eliminate" and "ID/IDREF" methods to structure content based on a many-to-many relationship
  • Create the corresponding XML documents for the "Eliminate" and "ID/IDREF" methods
  • Learn to use the key function in an XML stylesheet to format data structured with the "ID/IDREF" method
  • Create a basic XML stylesheet that incorporates the key function


Introduction

In the previous chapters, you learned how to use XML to structure and format data based on one-to-one and one-to-many relationships. Because XML provides the means to model data using hierarchical parent-child relationships, the one-to-one and one-to-many relationships are relatively simple to represent in XML. However, this hierarchical parent-child structure is difficult to use to model the many-to-many relationship, a common relationship between entities in many situations.

In this chapter, we will explore the pros and cons of a few methods that are used to model a many-to-many relationship in XML; these methods offer compromises in overcoming the problems that arise when applying this relationship to XML. In particular, we will see examples of how to model the many-to-many relationship using two different methods, "Eliminate" and "ID/IDREF." Additionally, in the XML stylesheet, we will learn how to implement the key function to display the data that was modeled using the "ID/IDREF" method.

Problems: many-to-many relationship

In XML, the parent-child relationship is most commonly used to represent a relationship. This can easily be applied to a one-to-one or one-to-many relationship. A many-to-many relationship is not supported directly by XML; the parent-child relationship will not work because each element may have only a single parent element. There are a couple of possible solutions to get around this.

Solutions: many-to-many relationship

Eliminate

Create XML documents that eliminate the need for a many-to-many relationship
By limiting the extent of the information that is conveyed, you can get around the need for a many-to-many relationship. Instead of trying to have one XML document encompass all of the information, separate the information so that each document describes only one of the entities that participate in the many-to-many relationship. Using our tourGuide example, one way to accomplish this would be to create a separate XML document for each hotel. The relationship with amenity would then become a one-to-many relationship. This method is more suitable for situations in which the scope of data exchange can be limited to subsets of data. However, if you use this method for more broadly scoped data exchange, you may repeat data several times, especially if there are many attributes. To avoid this redundancy, use the ID/IDREF method.

ID/IDREF

Represent the many-to-many relationship using unique identifiers
Although not the most user-friendly way to handle this problem, one way of getting around the many-to-many relationship is to create keys that uniquely identify each entity. To do this, elements or attributes of type ID and IDREF must be declared in the XML schema. To use a data modeling analogy, ID is similar to the primary key, and IDREF is similar to the foreign key.
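
ID and IDREF can be declared either as element types or as attribute types. The schema below is a sketch of the attribute form, assuming that a pared-down hotelGuide is acceptable for illustration; the amenityRef element and the id and ref attribute names are hypothetical and are not used elsewhere in this chapter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="hotelGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="hotel" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:sequence>
                            <xsd:element name="hotelName" type="xsd:string"/>
                            <!-- Hypothetical empty element whose ref attribute plays the foreign key role -->
                            <xsd:element name="amenityRef" minOccurs="0" maxOccurs="unbounded">
                                <xsd:complexType>
                                    <xsd:attribute name="ref" type="xsd:IDREF" use="required"/>
                                </xsd:complexType>
                            </xsd:element>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
                <xsd:element name="amenity" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:sequence>
                            <xsd:element name="amenityType" type="xsd:string"/>
                        </xsd:sequence>
                        <!-- The id attribute plays the primary key role and must be unique in the document -->
                        <xsd:attribute name="id" type="xsd:ID" use="required"/>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

With this variant, a hotel would reference a pool as an empty amenityRef element with ref="k1", matching an amenity element that has id="k1".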

Many-to-many relationship data model

Exhibit 1: Data model for a m:m relationship

The relationship reads: a hotel can have many amenities, and an amenity can exist at many hotels.

As you will notice, two entities were added in order to represent the many-to-many relationship. The middle entity is an associative entity that stores data about the relationship between hotel and amenity. Using our Tour Guide example, "Amenity" was added to represent the list of possible amenities that a hotel can offer.

The following examples illustrate methods to represent a many-to-many relationship in XML.

Eliminate: sample solution

In this example, the many-to-many relationship has been converted to a one-to-many relationship.

XML schema

Exhibit 2: XML schema for "Eliminate" method

<?xml version="1.0" encoding="UTF-8" ?>
<!--
     Document   : amenity1.xsd
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
    <xsd:element name="hotelGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:simpleType name="emailAddressType">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+\.\w+.*\w*"/>
        </xsd:restriction>
    </xsd:simpleType>
    <xsd:complexType name="hotelDetails">
        <xsd:sequence>
            <xsd:element name="hotelPicture"/>
            <xsd:element name="hotelName" type="xsd:string"/>
            <xsd:element name="streetAddress" type="xsd:string"/>
            <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
            <xsd:element name="telephoneNumber" type="xsd:string"/>
            <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
            <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
            <xsd:element name="hotelRating" type="xsd:integer" default="0"/>
            <xsd:element name="lowerPrice" type="xsd:positiveInteger"/>
            <xsd:element name="upperPrice" type="xsd:positiveInteger"/>
            <xsd:element name="amenity" type="amenityValue" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="amenityValue">
        <xsd:sequence>
            <xsd:element name="amenityType" type="xsd:string"/>
            <xsd:element name="amenityOpenHour" type="xsd:time"/>
            <xsd:element name="amenityCloseHour" type="xsd:time"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

XML document

Exhibit 3: XML document for "Eliminate" method

<?xml version="1.0" encoding="UTF-8"?>
<!--
     Document   : amenity1.xml
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<hotelGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="amenity1.xsd">
    <hotel>
        <hotelPicture/>
        <hotelName>Narembeen Hotel</hotelName>
        <streetAddress>Churchill Street</streetAddress>
        <telephoneNumber>+61 (08) 9064 7272</telephoneNumber>
        <emailAddress>narempub@oz.com.au</emailAddress>
        <hotelRating>1</hotelRating>
        <lowerPrice>50</lowerPrice>
        <upperPrice>100</upperPrice>
        <amenity>
            <amenityType>Restaurant</amenityType>
            <amenityOpenHour>06:00:00</amenityOpenHour>
            <amenityCloseHour>22:00:00 </amenityCloseHour>
        </amenity>
        <amenity>
            <amenityType>Pool</amenityType>
            <amenityOpenHour>06:00:00</amenityOpenHour>
            <amenityCloseHour>18:00:00 </amenityCloseHour>
        </amenity>
        <amenity>
            <amenityType>Complimentary Breakfast</amenityType>
            <amenityOpenHour>07:00:00</amenityOpenHour>
            <amenityCloseHour>10:00:00 </amenityCloseHour>
        </amenity>
    </hotel>
    <hotel>
        <hotelPicture/>
        <hotelName>Narembeen Caravan Park</hotelName>
        <streetAddress>Currall Street</streetAddress>
        <telephoneNumber>+61 (08) 9064 7308</telephoneNumber>
        <emailAddress>naremcaravan@oz.com.au</emailAddress>
        <hotelRating>1</hotelRating>
        <lowerPrice>20</lowerPrice>
        <upperPrice>30</upperPrice>
        <amenity>
            <amenityType>Pool</amenityType>
            <amenityOpenHour>10:00:00</amenityOpenHour>
            <amenityCloseHour>22:00:00 </amenityCloseHour>
        </amenity>
    </hotel>
</hotelGuide>

ID/IDREF: sample solution

To avoid redundancy, we create a separate element, "amenity," which is included at the top level of the schema along with "hotel." Remember, the data types ID and IDREF are analogous to the primary key and foreign key, respectively. For every foreign key (IDREF), there must be a matching primary key (ID). Note that ID and IDREF values must be valid XML names: they must begin with a letter or an underscore, so a purely numeric value such as "1" is not allowed.

The following example illustrates the ID/IDREF approach. Notice that the ID for the amenity pool is defined as "k1," and every hotel with a pool as an amenity references "k1," using IDREF. If the IDREF does not match any ID, then the document will not validate.

XML schema

Exhibit 4: XML schema for "ID/IDREF" method

<?xml version="1.0" encoding="UTF-8" ?>
<!--
     Document   : amenity2.xsd
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
    <xsd:element name="hotelGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
                <xsd:element name="amenity" type="amenityList" minOccurs="1" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:simpleType name="emailAddressType">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+\.\w+.*\w*"/>
        </xsd:restriction>
    </xsd:simpleType>
    <xsd:complexType name="hotelDetails">
        <xsd:sequence>
            <xsd:element name="hotelPicture"/>
            <xsd:element name="hotelName" type="xsd:string"/>
            <xsd:element name="streetAddress" type="xsd:string"/>
            <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
            <xsd:element name="telephoneNumber" type="xsd:string"/>
            <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
            <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
            <xsd:element name="hotelRating" type="xsd:integer" default="0"/>
            <xsd:element name="lowerPrice" type="xsd:positiveInteger"/>
            <xsd:element name="upperPrice" type="xsd:positiveInteger"/>
            <xsd:element name="amenities" type="amenityDesc" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="amenityDesc">
        <xsd:sequence>
            <xsd:element name="amenityIDREF" type="xsd:IDREF"/>
            <xsd:element name="amenityOpenHour" type="xsd:time"/>
            <xsd:element name="amenityCloseHour" type="xsd:time"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="amenityList">
        <xsd:sequence>
            <xsd:element name="amenityID" type="xsd:ID"/>
            <xsd:element name="amenityType" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

XML document

Exhibit 5: XML document for "ID/IDREF" method

<?xml version="1.0" encoding="UTF-8"?>
<!--
     Document   : amenity2.xml
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<?xml-stylesheet href="amenity2.xsl" type="text/xsl" media="screen"?>
<hotelGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="amenity2.xsd">
    <hotel>
        <hotelPicture/>
        <hotelName>Narembeen Hotel</hotelName>
        <streetAddress>Churchill Street</streetAddress>
        <telephoneNumber>+61 (08) 9064 7272</telephoneNumber>
        <emailAddress>narempub@oz.com.au</emailAddress>
        <hotelRating>1</hotelRating>
        <lowerPrice>50</lowerPrice>
        <upperPrice>100</upperPrice>
        <amenities>
            <amenityIDREF>k2</amenityIDREF>
            <amenityOpenHour>06:00:00</amenityOpenHour>
            <amenityCloseHour>22:00:00 </amenityCloseHour>
        </amenities>
        <amenities>
            <amenityIDREF>k1</amenityIDREF>
            <amenityOpenHour>06:00:00</amenityOpenHour>
            <amenityCloseHour>18:00:00 </amenityCloseHour>
        </amenities>
        <amenities>
            <amenityIDREF>k4</amenityIDREF>
            <amenityOpenHour>07:00:00</amenityOpenHour>
            <amenityCloseHour>10:00:00 </amenityCloseHour>
        </amenities>
    </hotel>
    <hotel>
        <hotelPicture/>
        <hotelName>Narembeen Caravan Park</hotelName>
        <streetAddress>Currall Street</streetAddress>
        <telephoneNumber>+61 (08) 9064 7308</telephoneNumber>
        <emailAddress>naremcaravan@oz.com.au</emailAddress>
        <hotelRating>1</hotelRating>
        <lowerPrice>20</lowerPrice>
        <upperPrice>30</upperPrice>
        <amenities>
            <amenityIDREF>k1</amenityIDREF>
            <amenityOpenHour>10:00:00</amenityOpenHour>
            <amenityCloseHour>22:00:00 </amenityCloseHour>
        </amenities>
    </hotel>
    <amenity>
        <amenityID>k1</amenityID>
        <amenityType>Pool</amenityType>
    </amenity>
    <amenity>
        <amenityID>k2</amenityID>
        <amenityType>Restaurant</amenityType>
    </amenity>
    <amenity>
        <amenityID>k3</amenityID>
        <amenityType>Fitness room</amenityType>
    </amenity>
    <amenity>
        <amenityID>k4</amenityID>
        <amenityType>Complimentary breakfast</amenityType>
    </amenity>
    <amenity>
        <amenityID>k5</amenityID>
        <amenityType>in-room data port</amenityType>
    </amenity>
    <amenity>
        <amenityID>k6</amenityID>
        <amenityType>Water slide</amenityType>
    </amenity>
</hotelGuide>

Key function: XML stylesheet

In order to set up an XML stylesheet using the ID/IDREF method for a many-to-many relationship, the key function should be used. In the stylesheet, the <xsl:key> element specifies the index, which is used to return a node-set from the XML document.

A key consists of the following:

1. the node that has the key (the match attribute of <xsl:key>)
2. the name of the key (the name attribute)
3. the value of the key (the use attribute)

The following XML stylesheet illustrates how to use the key function to present content that is structured in a many-to-many relationship.

XML stylesheet

Exhibit 6: XML stylesheet for "ID/IDREF" method

<?xml version="1.0" encoding="UTF-8"?>
<!--
     Document   : amenity2.xsl
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="amList" match="amenity" use="amenityID"/>
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Hotel Guide</title>
            </head>
            <body>
                <h2>Hotels</h2>
                <xsl:apply-templates select="hotelGuide"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="hotelGuide">
        <xsl:for-each select="hotel">
            <xsl:value-of select="hotelName"/>
            <br/>
            <xsl:for-each select="amenities">
                <xsl:value-of select="key('amList',amenityIDREF)/amenityType"/>
                <xsl:text>   </xsl:text>
                <xsl:value-of select="amenityOpenHour"/> - 
                <xsl:value-of select="amenityCloseHour"/>
                <br/>
            </xsl:for-each>
            <br/>
            <br/>
        </xsl:for-each>
        <br/>
    </xsl:template>
</xsl:stylesheet>
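
The same mechanism can be turned around to answer the opposite question, namely which hotels offer a given amenity. The stylesheet below is a sketch rather than part of the original example: the key name hotelsByAmenity and the page title are assumptions, and it expects a document shaped like Exhibit 5.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Hypothetical key: index each hotel under every amenityIDREF it contains -->
    <xsl:key name="hotelsByAmenity" match="hotel" use="amenities/amenityIDREF"/>
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Amenity Guide</title>
            </head>
            <body>
                <h2>Amenities</h2>
                <xsl:for-each select="hotelGuide/amenity">
                    <xsl:value-of select="amenityType"/>
                    <xsl:text>: </xsl:text>
                    <!-- Look up all hotels that reference the current amenity's ID -->
                    <xsl:for-each select="key('hotelsByAmenity', amenityID)">
                        <xsl:value-of select="hotelName"/>
                        <!-- Separate hotel names with commas, but not after the last one -->
                        <xsl:if test="position() != last()">
                            <xsl:text>, </xsl:text>
                        </xsl:if>
                    </xsl:for-each>
                    <br/>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

Because a hotel can contain several amenities elements, the use expression yields several values per hotel, and the hotel is indexed under each of them. An amenity that no hotel references is listed with an empty hotel list.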

Expedia.de: XML and affiliate marketing

Expedia.de is the German subsidiary of expedia.com, the Internet-based travel agency headquartered in Bellevue, Washington, USA. It lets customers book airline tickets, car rentals, vacation packages, and various other attractions and services via its website and by phone. Its websites attract more than 70 million visitors each month. Currently expedia.com has 4,600 employees serving customers in the United States, Canada, the UK, France, Germany, Italy, and Australia.

For marketing purposes, expedia.de set up an affiliate marketing program. Affiliate marketing is a way to reach potential customers without any financial risk for the company intending to advertise (the merchant). The merchant gives website owners, who are called affiliates, the opportunity to refer visitors to the merchant page, offering commission-based monetary rewards as incentives. In the case of Expedia.de, the affiliate partners receive a commission every time users from their websites book travel on expedia.de. The affiliates can therefore concentrate on selling while the merchant takes care of handling the transactions.

To ease the business of the affiliate partners, and of course to make the program more attractive, Expedia.de offers its partners a service called xmlAdEd. xmlAdEd provides current product information using XML. Affiliates using this service can request more than 8 million travel offerings in XML format via HTTP request. The data is updated several times a day. In the HTTP request, you can set certain parameters such as location, price, airport code, …

The use of XML in this case gives the affiliates several advantages:
- Efficient and flexible processing of the data because of separation of structure, content and style.
- Platform-independent processing of the data.
- Lossless conversion into other file formats.
- Easy integration in their websites.
- The possibility of creating their own web shop with an individual design.

By providing their affiliates product information in XML, expedia.de not only eases the business of their partners, but also ensures that customers receive consistent, up-to-date information on their services.

Summary

When describing a many-to-many relationship in XML, there are a few solutions available for designers to use. In choosing how to represent the many-to-many relationship, the designer not only must consider the most efficient way to represent the information, but also the audience for which the document is intended and how the document will be used.

References

http://www-128.ibm.com/developerworks/xml/library/x-xdm2m.html

http://www.w3.org/TR/xslt#key




Learning objectives

  • Understand the concept of a recursive relationship
  • Create a schema for a one-to-one recursive relationship
  • Create a schema for a one-to-many recursive relationship
  • Create a schema for a many-to-many recursive relationship
  • Define a unique identifier in a schema
  • Create a primary key/foreign key relationship




Introduction

Recursive relationships are an interesting and more complex concept than the relationships you have seen in the previous chapters. A recursive relationship occurs when there is a relationship between an entity and itself. For example, a one-to-many recursive relationship occurs when an employee is the manager of other employees. The employee entity is related to itself, and there is a one-to-many relationship between one employee (the manager) and many other employees (the people who report to the manager). Because of the more complex nature of these relationships, we will need slightly more complex methods of mapping them to a schema and displaying them in a style sheet.

The one-to-one recursive relationship

Continuing with the tour guide model, we will develop a schema that shows cities that have hosted the Olympics and, for each, the previous host city. Since the previous host is another city, and only one city can be the previous host, this is a one-to-one recursive relationship.



host.xsd (XML schema for a one-to-one recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">
 
<xsd:element name="cities">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
 
<xsd:complexType name="cityType">
  <xsd:sequence>
    <xsd:element name="cityID" type="xsd:ID"/>
    <xsd:element name="cityName" type="xsd:string"/>
    <xsd:element name="cityCountry" type="xsd:string"/>
    <xsd:element name="cityPop" type="xsd:integer"/>
    <xsd:element name="cityHostYr" type="xsd:integer"/>
    <xsd:element name="cityPreviousHost" type="xsd:IDREF" minOccurs="0" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>

Exhibit 1: XML schema for Host City Entity

host.xml (XML document for a one-to-one recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<cities xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='host.xsd'>
  <city>
    <cityID>c1</cityID>
    <cityName>Atlanta</cityName>
    <cityCountry>USA</cityCountry>
    <cityPop>4000000</cityPop>
    <cityHostYr>1996</cityHostYr>    
  </city>
  <city>
    <cityID>c2</cityID>
    <cityName>Sydney</cityName>
    <cityCountry>Australia</cityCountry>
    <cityPop>4000000</cityPop>
    <cityHostYr>2000</cityHostYr>  
    <cityPreviousHost>c1</cityPreviousHost>   
  </city>
  <city>
    <cityID>c3</cityID>
    <cityName>Athens</cityName>
    <cityCountry>Greece</cityCountry>
    <cityPop>3500000</cityPop>
    <cityHostYr>2004</cityHostYr>  
    <cityPreviousHost>c2</cityPreviousHost>   
  </city>
</cities>

Exhibit 2: XML Document for Olympic Host City
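
To display this relationship, a stylesheet can resolve each cityPreviousHost value back to the city it identifies. The following is a minimal sketch and is not one of the original exhibits: the file name host.xsl and the key name cityList are assumptions chosen for illustration. It relies on the key function introduced in the previous chapter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- host.xsl: hypothetical stylesheet, not part of the original chapter -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Index every city by its cityID so that an IDREF value can be resolved -->
    <xsl:key name="cityList" match="city" use="cityID"/>
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Olympic Host Cities</title>
            </head>
            <body>
                <h2>Olympic Host Cities</h2>
                <xsl:apply-templates select="cities/city"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="city">
        <xsl:value-of select="cityName"/>
        <xsl:text> (</xsl:text>
        <xsl:value-of select="cityHostYr"/>
        <xsl:text>)</xsl:text>
        <!-- cityPreviousHost is optional in host.xsd, so test for it first -->
        <xsl:if test="cityPreviousHost">
            <xsl:text>, previous host: </xsl:text>
            <xsl:value-of select="key('cityList', cityPreviousHost)/cityName"/>
        </xsl:if>
        <br/>
    </xsl:template>
</xsl:stylesheet>
```

Applied to host.xml, the Sydney entry would be followed by "previous host: Atlanta", while the Atlanta entry, which has no cityPreviousHost element, would show only the city name and year.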

The one-to-many recursive relationship

A hypothetical sports team is divided into squads with each squad having a captain. Every person on the team is a player, regardless of whether they are a squad captain. Since a squad captain is a player, this situation meets the definition of a recursive relationship—a squad captain is also a player and has a one-to-many relationship with the other players. This is a one-to-many recursive relationship because one captain has many players under him/her. See the example below for how to model the relationship.

team.xsd (XML schema for a one-to-many recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<xsd:element name="team">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="player" type="playerType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
  <xsd:complexType name="playerType">
    <xsd:sequence>
      <xsd:element name="playerID" type="xsd:ID"/>
      <xsd:element name="playerName" type="xsd:string"/>
      <xsd:element name="playerCap" type="playerC" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="playerC">
    <xsd:sequence>
      <xsd:element name="memberOf" type="xsd:IDREF"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Exhibit 3: XML schema for Team Entity

team.xml (XML document for a one-to-many recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<team xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='team.xsd'>
<player>
   <playerID>c1</playerID>
   <playerName>Tommy Jones</playerName>
   <playerCap>
      <memberOf>c3</memberOf>
   </playerCap>
</player>
<player>
   <playerID>c2</playerID>
   <playerName>Eddie Thomas</playerName>
   <playerCap>
      <memberOf>c3</memberOf>
   </playerCap>
</player>
<player>
   <playerID>c3</playerID>
   <playerName>Sean McCombs</playerName>
</player>
<player>
   <playerID>c4</playerID>
   <playerName>Patrick O’Shea</playerName>
   <playerCap>
      <memberOf>c3</memberOf>
   </playerCap>
</player>
</team>

Exhibit 4: XML Document for Team Entity
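
A stylesheet can also follow the relationship in the other direction, from a captain to the players who report to that captain. The sketch below is an illustration only: the file name team.xsl and the key name squad are assumptions, and it expects the element name memberOf exactly as declared in team.xsd.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- team.xsl: hypothetical stylesheet, not part of the original chapter -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Index players by the ID of the captain they report to -->
    <xsl:key name="squad" match="player" use="playerCap/memberOf"/>
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Team</title>
            </head>
            <body>
                <h2>Squads</h2>
                <!-- A captain is any player referenced by at least one memberOf -->
                <xsl:for-each select="team/player[key('squad', playerID)]">
                    <b><xsl:value-of select="playerName"/></b>
                    <br/>
                    <!-- List every player whose memberOf points at this captain -->
                    <xsl:for-each select="key('squad', playerID)">
                        <xsl:value-of select="playerName"/>
                        <br/>
                    </xsl:for-each>
                    <br/>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

Because the key is built on the foreign key (memberOf) rather than on the primary key (playerID), a single lookup returns all players of a squad.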

Natural one-to-many recursive structure

A more natural approach for most one-to-many recursive relationships is to use XML's hierarchical nature to represent the hierarchy directly. Consider locations:

<?xml version="1.0" encoding="UTF-8"?>
<location type="country">
  <name>USA</name>
  <sub-locations>
    <location type="state">
      <name>Ohio</name>
      <sub-locations>
        <location type="city"><name>Akron</name></location>
        <location type="city"><name>Columbus</name></location>
      </sub-locations>
    </location>
  </sub-locations>
</location>
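
A schema for such a nested structure can be recursive itself: the type that describes a location contains further elements of the same type. The following is a minimal sketch under the assumption that every location has a name, an optional list of sub-locations, and a free-text type attribute; the type name locationType is invented for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <xsd:element name="location" type="locationType"/>
  <!-- locationType (hypothetical name) refers to itself through sub-locations -->
  <xsd:complexType name="locationType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <!-- Optional, so the lowest level (e.g. a city) can omit it -->
      <xsd:element name="sub-locations" minOccurs="0">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="location" type="locationType" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="type" type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>
```

No ID or IDREF values are needed here, because the parent of each location is given by its position in the document.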

The many-to-many recursive relationship

Think you're getting a feel for recursive relationships yet? Well, there is still the third and final relationship to add to your repertoire: the many-to-many recursive. A common example of a many-to-many recursive relationship is when one item can be composed of many items of the same data type as itself, and each of those sub-items may belong to another parent item of the same data type. Sound confusing? Let's look at the example of a product that can consist of a single item or multiple items (i.e., a packaged product). The example below describes tourist products that can be packaged together to create a new product.

product.xsd (XML schema for a many-to-many recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
    <xsd:element name="products">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="product" type="prodType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:complexType name="prodType">
        <xsd:sequence>
            <xsd:element name="prodID" type="xsd:ID"/>
            <xsd:element name="prodName" type="xsd:string"/>
            <xsd:element name="prodCost" type="xsd:decimal" minOccurs="0"/>
            <xsd:element name="prodPrice" type="xsd:decimal"/>
            <xsd:element name="components" type="componentsType" minOccurs="0" maxOccurs="1"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="componentsType">
        <xsd:sequence>
            <xsd:element name="component" type="xsd:IDREF"/>
            <xsd:element name="componentqty" type="xsd:integer"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

Exhibit 5: XML schema for Product Entity

product.xml (XML document for a many-to-many recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="product.xsl"?>
<products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="product.xsd">
    <product>
        <prodID>p1000</prodID>
        <prodName>Animal photography kit</prodName>
        <prodPrice>725</prodPrice>
        <components>
            <component>p101</component>
            <componentqty>1</componentqty>
        </components>
    </product>
    <product>
        <prodID>p101</prodID>
        <prodName>Camera case</prodName>
        <prodCost>150</prodCost>
        <prodPrice>300</prodPrice>
    </product>
</products>

Exhibit 6: XML Document for Product Entity
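
The document above names a stylesheet, product.xsl, in its processing instruction, but that stylesheet is not shown in this chapter. The following is only a guess at what a simple version might look like, written as a sketch: the key name prodList and the page layout are assumptions. It lists each product and, for a packaged product, looks up the name of its component.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- product.xsl: hypothetical sketch; the original stylesheet is not shown -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Index every product by its prodID so a component IDREF can be resolved -->
    <xsl:key name="prodList" match="product" use="prodID"/>
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Products</title>
            </head>
            <body>
                <h2>Products</h2>
                <xsl:for-each select="products/product">
                    <xsl:value-of select="prodName"/>
                    <xsl:text>: </xsl:text>
                    <xsl:value-of select="prodPrice"/>
                    <br/>
                    <!-- Only packaged products have a components element -->
                    <xsl:for-each select="components">
                        <xsl:text>contains </xsl:text>
                        <xsl:value-of select="componentqty"/>
                        <xsl:text> x </xsl:text>
                        <xsl:value-of select="key('prodList', component)/prodName"/>
                        <br/>
                    </xsl:for-each>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```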

Summary

When the child has the same type of data as its parent in a parent-child data relationship, this is a sign that a recursive relationship exists. The xsd:ID and xsd:IDREF data types can be used in a schema to create primary key/foreign key values in an XML document.






Learning objectives

  • Overview of Data Schemas
  • Starting your schema the right way
  • Entities in general
  • The Parent Child Structure
  • Attributes and Restrictions
  • Ending your schema the right way

Initiated by:

The University of Georgia

Terry College of Business

Department of Management Information Systems


Introduction

Data schemas are the foundation of all valid XML documents. They define objects, their relationships, their attributes, and the structure of the data model. Without them, there would be no way to check that an XML document follows the intended structure. In this chapter, you will come to understand the purpose of XML data schemas, their component parts, and how to use them. Examples are included for you to copy when creating your own data schema, making your job a lot easier. The appendix at the end of this chapter contains a complete schema, parts of which appear in the sections throughout the chapter. Refer to it if you would like to see how the whole schema works as one.

Overview of Data Schemas

The data schema, all technicalities aside, is the data model with which all the XML information is conveyed. It has a hierarchical structure that starts with a root element (to be explained later) and goes all the way down to the most minute detail of the model, with detailed steps in between. Data schemas have two main parts: the entities and their relationships. The entities contained in a data schema represent objects from the model. They have unique identifiers, attributes, and names for the kind of object they are. The relationships in the schema represent the relationships between the objects. Relationships can be one-to-one, one-to-many, many-to-many, recursive, or any other kind you could find in a data model. Now we will begin to create our own data schema.

Starting your schema the right way

All schemas begin the same way, no matter what type of objects they represent. The first line in every Schema is this declaration:

<?xml version="1.0" encoding="UTF-8"?>

Exhibit 1: XML Declaration

Exhibit 1 simply tells the browser, or whatever file or program is accessing this schema, that it is an XML file and uses the "UTF-8" encoding. You can copy this line to start your own XML file. Next comes the Namespace declaration:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

Exhibit 2: Namespace Declaration

Namespaces are basically dictionaries containing definitions of most of the coding in the schema. For example, when creating a schema, if you declare an object to be of type "string", the definition of the type "string" is contained in the Namespace along with all of its attributes. This is true for most of the code you write. If you have made or seen other schemas, most of the code is prefaced by "xsd:". Good examples are "xsd:sequence" and "xsd:complexType". sequence and complexType are both objects defined in the Namespace that has been linked to the prefix "xsd". In fact, you could theoretically give the prefix any name, as long as you referenced it the same way throughout the Schema. The most common Namespace, which contains most of the XML Schema objects, is http://www.w3.org/2001/XMLSchema. Now onto Exhibit 2.

The first part, xsd:schema, lets any file or program know that this file is a schema. Like the XML declaration, this is universal to XML schemas and you can use it in yours. The second part is the actual Namespace declaration; xmlns stands for XML NameSpace. It binds the prefix "xsd" to the Namespace given in the code, which is the one the rest of the Schema refers to. Again, I would recommend using this code to start your Schemas. The last part, elementFormDefault, is harder to understand. In short, "unqualified" means that locally declared elements in an XML document are not qualified with a Namespace. Using "unqualified" is most applicable until you get to some really complicated code.
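
Putting the two declarations together, the smallest complete schema might look like the sketch below. The element name note is a placeholder invented for this illustration; the point is only the frame around it, including the closing </xsd:schema> tag that ends every schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <!-- The root element and all type definitions go between the two schema tags. -->
  <!-- "note" is a hypothetical placeholder element. -->
  <xsd:element name="note" type="xsd:string"/>
</xsd:schema>
```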

Entities in general

Entities are basically the objects a Schema is created to represent. As stated before, they have attributes and relationships. We will now go much further into explaining exactly what they are and how to write code for them.

There are two types of Entities: simpleType and complexType. A simpleType object has one value associated with it. A string is a perfect example of a simpleType object, as it contains only the value of the string. Most simpleTypes you use will be defined in the default Namespace; however, you can define your own simpleType at the bottom of the Schema (this will be brought up in the restrictions section). Because of this, the only objects you will usually need to include in your Schema are complexTypes. A complexType is an object with more than one attribute associated with it, and it may or may not have child elements attached to it. Here is an example of a complexType object:

<xsd:complexType name="GenreType">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="description" type="xsd:string"/>
    <xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

Exhibit 3: The complexType Element

This code begins with the declaration of a complexType and its name. When other entities, such as a parent element, refer to it, they will refer to this name. The second line begins the sequence of attributes and child elements, each of which is declared as an "element". The first part of each line declares an element, and the second part, "name", gives the name that XML documents will use for it. After these comes the "type" declaration. Note that for the name and description elements the type is "xsd:string", showing that the type string is defined in the Namespace linked to "xsd". For the movie element, the type is "MovieType", and because there is no Namespace prefix before "MovieType", it is assumed that this type is defined in this Schema. (It could refer to a type defined in another Schema if the other Schema were included at the top of this Schema, but don't worry about that now.) "minOccurs" and "maxOccurs" represent the relationship between Genres and Movies. "minOccurs" can be either 0 or an arbitrary number, depending only on the data model. "maxOccurs" can be 1 (a one-to-one relationship), an arbitrary number (a one-to-many relationship), or "unbounded" (a one-to-many relationship with no upper limit).
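
To make the effect of these two settings concrete, here is a variation on Exhibit 3. The particular values (an optional description and an upper limit of 100 movies) are assumptions chosen only for illustration. When minOccurs or maxOccurs is left out, as on the name element, XML Schema uses the default value of 1 for each.

```xml
<xsd:complexType name="GenreType">
  <xsd:sequence>
    <!-- No occurrence settings: exactly one name is required -->
    <xsd:element name="name" type="xsd:string"/>
    <!-- Optional: may be absent, but may appear at most once -->
    <xsd:element name="description" type="xsd:string" minOccurs="0" maxOccurs="1"/>
    <!-- At least one movie and, in this hypothetical variant, at most 100 -->
    <xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="100"/>
  </xsd:sequence>
</xsd:complexType>
```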

For each schema, there must be one root element. This entity contains every other entity underneath it in the hierarchy. For instance, when creating a schema for a list of movies, the root element would be something like MovieDatabase, or maybe MovieCollection: something that would logically contain all the other objects (genre, movie, actor, director, plotline, etc.). It always starts with the line <xsd:element name="xxx">, showing that it is the root element, and then goes on as a normal complexType. All other objects will begin with either simpleType or complexType. Here is sample code for a MovieDatabase root element:

<xsd:element name="MovieDatabase">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Genre" type="GenreType" minOccurs="1" maxOccurs="unbounded"/>            
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>

Exhibit 4: The Root Element

This represents a MovieDatabase where the child element of MovieDatabase is a Genre. From there it goes on to movie, and so on. We will continue to use this example to help you better understand.

The Parent / Child Relationship

The Parent / Child Relationship is a key topic in Data Schemas. It represents the basic structure of the data model's hierarchy by clearly laying out the top down configuration. Look at this piece of code which shows how movies have actors associated with them:

<xsd:complexType name="MovieType">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>       
    <xsd:element name="actor" type="ActorType" minOccurs="1" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
 
<xsd:complexType name="ActorType">
  <xsd:sequence>
    <xsd:element name="lname" type="xsd:string"/>
    <xsd:element name="fname" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

Exhibit 5: The Parent/Child Relationship

Within each MovieType, there is an element named "actor" which is of "ActorType". When the XML document is populated with information, the surrounding tags for actor will be <actor></actor> and not <ActorType></ActorType>. To keep your Schema flowing smoothly and without error, the type field in the Parent Element will always equal the name field in the declaration of the complexType Child Element.
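
The fragment below shows what this looks like in an XML document that follows Exhibit 5. The movie title and the actor's name are made-up sample data; the enclosing element is called movie because that is the name GenreType gives to its MovieType child.

```xml
<movie>
  <name>Example Movie</name>
  <!-- The tag is the element's name ("actor"), never the type's name ("ActorType") -->
  <actor>
    <lname>Doe</lname>
    <fname>Jane</fname>
  </actor>
</movie>
```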

Attributes and Restrictions

An attribute of an entity is a simpleType object in that it only contains one value. <xsd:element name="lname" type="xsd:string"/> is a good example of an attribute. It is declared as an element, has a name associated with it, and has a type declaration. Located in the appendix of this chapter is a long list of simpleTypes built into the default Namespace. Attributes are incredibly simple to use, until you try and restrict them.

In some cases, certain data must abide by a standard to maintain data integrity. An example of this would be a Social Security number or an email address. If you have a database of email addresses to which you send mass emails, you need all of them to be valid addresses, or else you will get tons of error messages each time you send out a mass email. To avoid this problem, you can take a known simpleType and add a restriction to it to better suit your needs. You can do this in two ways, but one is simpler and better to use in Data Schemas. You could edit the simpleType within its declaration in the Parent Element, but this gets messy, and if another Schema wants to use it, the code must be written again. The better way is to list a new type at the bottom of the Schema that restricts a previously known simpleType. Here are examples of this for an email address and a Social Security number:

<xsd:simpleType name="emailaddressType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="[^@]+@[^\.]+\..+"/>
  </xsd:restriction>
</xsd:simpleType>
 
<xsd:simpleType name="ssnType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>

Exhibit 6: Restriction on a simpleType

These definitions were included in the Schema below the last Child Element and before the closing </xsd:schema>. Take the second definition as the example. The first line declares the simpleType and gives it a name, "ssnType". You could name yours anything you want, as long as you reference it correctly throughout the Schema. By doing this, you can use this type anywhere in the Schema, or anywhere in another Schema, provided the references are correct. The second line lets the Schema know it is a restricted type and that its base is a string defined in the default Namespace. Basically, this type is a string with a restriction on it, and the third line is the actual restriction. It can be one of many types of restrictions, which are listed in the Appendix of this chapter. This one happens to be of type "pattern". A "pattern" means that only a certain sequence of characters will be allowed in the XML document; the sequence is defined in the value field. This particular one means three digits, a hyphen, two digits, a hyphen, and four digits. To learn more about how to use restrictions, see the W3Schools section on restrictions.
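
For comparison, the first of the two ways mentioned earlier, restricting the simpleType inside the element declaration itself, might look like the following sketch. It is shown only to illustrate why the named type is preferred: the restriction has no name, so no other element or Schema can refer to it.

```xml
<xsd:element name="ssn">
  <!-- Anonymous simpleType: usable only by this one element -->
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <!-- Accepts 123-45-6789; rejects 123456789 (no hyphens) -->
      <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
```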

Not of little import: Introducing the <xsd:import> tag

The <xsd:import> tag is used to import a schema document and the namespace associated with the data types defined within the schema document. This allows an XML schema document to reference a type library using namespace names (prefixes). Let's take a closer look at a simple XML instance document for a store that uses these multiple namespace names:

<?xml version="1.0" encoding="UTF-8"?>
<store:SimpleStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.opentourism.org/xmltext/Store.xml http://www.opentourism.org/xmltext/SimpleStore.xsd"
  xmlns:store="http://www.opentourism.org/xmltext/Store.xml"
  xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema">
  <!-- Note the explicitly defined namespace declarations: the prefix store
     represents data types defined in the
     http://www.opentourism.org/xmltext/Store.xml namespace, and the
     prefix MGR represents data types defined in the
     http://www.opentourism.org/xmltext/CoreSchema namespace.
     Also, notice that there is no default namespace declaration; every element
     must be associated with a namespace (we will see why this is
     necessary when we examine the schema document).
-->
  <store:Store>
    <MGR:Name xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema">
      <MGR:FirstName>Michael</MGR:FirstName>
      <MGR:MiddleNames>Jay</MGR:MiddleNames>
      <MGR:LastName>Fox</MGR:LastName>
    </MGR:Name>
    <store:StoreName>The Gap</store:StoreName>
    <store:StoreAddress>
      <store:Street>86 Nowhere Ave.</store:Street>
      <store:City>Los Angeles</store:City>
      <store:State>CA</store:State>
      <store:ZipCode>75309</store:ZipCode>
    </store:StoreAddress>
    <!-- More store information would go here. -->
  </store:Store>
  <!-- More stores would go here. -->
</store:SimpleStore>

Exhibit 7: XML Instance Document


Let's look at the schema document and see how the <xsd:import> tag was used to import data types from a type library (external schema document).

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns="http://www.opentourism.org/xmltext/Store.xml"
  xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema"
  targetNamespace="http://www.opentourism.org/xmltext/Store.xml" elementFormDefault="qualified">
  <!-- The prefix MGR is bound to the following namespace name:
          http://www.opentourism.org/xmltext/CoreSchema
          The ManagerTypeLib.xsd schema document is imported by associating the
          schema with the http://www.opentourism.org/xmltext/CoreSchema
          namespace name, which was bound to the MGR prefix.
          The elementFormDefault attribute has the value 'qualified', indicating that
          an XML instance document must use qualified names for every element.
-->
  <!-- The target namespace and default namespace are the same  -->
  <xsd:import namespace="http://www.opentourism.org/xmltext/CoreSchema"
    schemaLocation="ManagerTypeLib.xsd"/>
  <xsd:element name="SimpleStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Store" type="StoreType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="StoreType">
    <xsd:sequence>
      <xsd:element ref="MGR:Name"/>
      <xsd:element name="StoreName" type="xsd:string"/>
      <xsd:element name="StoreAddress" type="StoreAddressType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="StoreAddressType">
    <xsd:sequence>
      <xsd:element name="Street" type="xsd:string"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="State" type="xsd:string"/>
      <xsd:element name="ZipCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Exhibit 8: XML Schema (http://www.opentourism.org/xmltext/SimpleStore.xsd)


Like the include tag and the redefine tag, the import tag is another means of incorporating any data types from an external schema document into another schema document and must occur before any element or attribute declarations. These mechanisms are important when XML schemas are modularized and type libraries are being maintained and used in multiple schema documents.
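
The imported type library, ManagerTypeLib.xsd, is not shown in this chapter. Judging only from the MGR elements used in Exhibit 7, it could look something like the sketch below. The type name NameType and the decision to make MiddleNames optional are assumptions made for this illustration, not details taken from the original file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- ManagerTypeLib.xsd: hypothetical reconstruction for illustration only -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns="http://www.opentourism.org/xmltext/CoreSchema"
  targetNamespace="http://www.opentourism.org/xmltext/CoreSchema"
  elementFormDefault="qualified">
  <!-- Global element, so the importing schema can use ref="MGR:Name" -->
  <xsd:element name="Name" type="NameType"/>
  <xsd:complexType name="NameType">
    <xsd:sequence>
      <xsd:element name="FirstName" type="xsd:string"/>
      <xsd:element name="MiddleNames" type="xsd:string" minOccurs="0"/>
      <xsd:element name="LastName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

The targetNamespace of the library must match the namespace attribute of the <xsd:import> tag in the importing schema; otherwise the reference MGR:Name cannot be resolved.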

When the whole is greater than the sum of its parts: Schema Modularization

Now that we have covered all three methods of incorporating external XML schemas, let’s consider the importance of these mechanisms. As is typical with most programming code, redundancy is frowned upon; this is true for custom data type definitions as well. If a custom data type already exists that can be applied to an element in your schema document, does it not make sense to use this data type rather than create it again within your new schema document? Moreover, if you know that a single data type can be reused for several applications, should you not have a method for referencing that data type when you need it?

The idea behind modular schemas is to examine what your schema does, determine what data types are frequently used in one form or another and develop a type library. As your needs for more complex schemas increase you can continue to add to your library, reuse data types in your type library, and redefine those data types as needed. An example of this reuse would be a schema for customer information – different departments would use different schemas as they would need only partial customer information. However most, if not all, departments would need some specific customer information, like name and contact information, which could be incorporated in the individual departmental schema documents.

Schema modularization is a “best practice”. By maintaining a type library and reusing and redefining types in the type library, you can help ensure that your XML schema documents don't become overwhelming and difficult to read. Readability is important, because you may not be the only one using these schemas, and it is important that others can easily understand your schema documents.

“Choose, but choose wisely…”: Schema alternatives

Thus far in this book we have only discussed XML schemas as defined by the World Wide Web Consortium (W3C). There are other methods of defining the data contained within an XML instance document, but we will mention only the two most popular and well-known alternatives: Document Type Definition (DTD) and RELAX NG.

We will cover DTDs in the next chapter. RELAX NG is newer and has many of the same features as W3C XML Schema; it also claims to be simpler and easier to learn, but this is very subjective. For more about RELAX NG, visit: http://www.relaxng.org/

Appendix

First is the full Schema used in the examples throughout this chapter:

<?xml version="1.0" encoding="UTF-8"?>
 
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="unqualified">
 
   <xsd:element name="MovieDatabase">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Genre" type="GenreType" minOccurs="1" maxOccurs="unbounded"/>            
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>
 
     <xsd:complexType name="GenreType">
       <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="description" type="xsd:string"/>
         <xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="unbounded"/>
       </xsd:sequence>
     </xsd:complexType>
 
     <xsd:complexType name="MovieType">
       <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="rating" type="xsd:string"/>
         <xsd:element name="director" type="xsd:string"/>
         <xsd:element name="writer" type="xsd:string"/>
         <xsd:element name="year" type="xsd:int"/>
         <xsd:element name="tagline" type="xsd:string"/>       
         <xsd:element name="actor" type="ActorType" minOccurs="1" maxOccurs="unbounded"/>
       </xsd:sequence>
     </xsd:complexType>
 
     <xsd:complexType name="ActorType">
       <xsd:sequence>
         <xsd:element name="lname" type="xsd:string"/>
         <xsd:element name="fname" type="xsd:string"/>
         <xsd:element name="gender" type="xsd:string"/>
         <xsd:element name="bday" type="xsd:string"/>
         <xsd:element name="birthplace" type="xsd:string"/>
         <xsd:element name="ssn" type="ssnType"/>
       </xsd:sequence>
     </xsd:complexType>
 
     <xsd:simpleType name="ssnType">
       <xsd:restriction base="xsd:string">
         <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
       </xsd:restriction>
     </xsd:simpleType>
 
</xsd:schema>

It’s time to go back to the beginning…and review all of the schema data types, elements, and attributes that we have covered thus far (and maybe a few that we have not). The following tables will detail the XML data types, elements and attributes that can be used in an XML Schema.

Primitive Types

This table lists built-in simple types (the primitive types and several types derived from them) that the elements and attributes in your schema can use.

Type | Syntax | Legal value example | Constraining facets
xsd:anyURI | <xsd:element name="url" type="xsd:anyURI"/> | http://www.w3.com | length, minLength, maxLength, pattern, enumeration, and whiteSpace
xsd:boolean | <xsd:element name="hasChildren" type="xsd:boolean"/> | true or false or 1 or 0 | pattern and whiteSpace
xsd:byte | <xsd:element name="stdDev" type="xsd:byte"/> | -128 through 127 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:date | <xsd:element name="dateEst" type="xsd:date"/> | 2004-03-15 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:dateTime | <xsd:element name="xMas" type="xsd:dateTime"/> | 2003-12-25T08:30:00 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:decimal | <xsd:element name="pi" type="xsd:decimal"/> | 3.1415927 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, fractionDigits, and totalDigits
xsd:double | <xsd:element name="pi" type="xsd:double"/> | 3.1415927 or INF or NaN | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:duration | <xsd:element name="MITDuration" type="xsd:duration"/> | P8M3DT7H33M2S | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:float | <xsd:element name="pi" type="xsd:float"/> | 3.1415927 or INF or NaN | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:gDay | <xsd:element name="dayOfMonth" type="xsd:gDay"/> | ---11 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:gMonth | <xsd:element name="monthOfYear" type="xsd:gMonth"/> | --02 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:gMonthDay | <xsd:element name="valentine" type="xsd:gMonthDay"/> | --02-14 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:gYear | <xsd:element name="year" type="xsd:gYear"/> | 1999 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:gYearMonth | <xsd:element name="birthday" type="xsd:gYearMonth"/> | 1972-08 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace
xsd:ID | <xsd:attribute name="id" type="xsd:ID"/> | id-102 | length, minLength, maxLength, pattern, enumeration, and whiteSpace
xsd:IDREF | <xsd:attribute name="version" type="xsd:IDREF"/> | id-102 | length, minLength, maxLength, pattern, enumeration, and whiteSpace
xsd:IDREFS | <xsd:attribute name="versionList" type="xsd:IDREFS"/> | id-102 id-103 id-100 | length, minLength, maxLength, pattern, enumeration, and whiteSpace
xsd:int | <xsd:element name="age" type="xsd:int"/> | 77 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:integer | <xsd:element name="age" type="xsd:integer"/> | 77 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:long | <xsd:element name="channelNumber" type="xsd:long"/> | 214 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:negativeInteger | <xsd:element name="belowZero" type="xsd:negativeInteger"/> | -123 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:nonNegativeInteger | <xsd:element name="numOfchildren" type="xsd:nonNegativeInteger"/> | 2 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:nonPositiveInteger | <xsd:element name="debit" type="xsd:nonPositiveInteger"/> | 0 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:positiveInteger | <xsd:element name="credit" type="xsd:positiveInteger"/> | 500 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:short | <xsd:element name="numOfpages" type="xsd:short"/> | 476 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits
xsd:string | <xsd:element name="name" type="xsd:string"/> | Joseph | length, minLength, maxLength, pattern, enumeration, and whiteSpace
xsd:time | <xsd:element name="credit" type="xsd:time"/> | 13:02:00 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace

Schema Elements
( from http://www.w3schools.com/schema/schema_elements_ref.asp )

Here is a list of all the elements which can be included in your schemas.

Element | Explanation
all | Specifies that the child elements can appear in any order. Each child element can occur 0 or 1 time
annotation | Specifies the top-level element for schema comments
any | Enables the author to extend the XML document with elements not specified by the schema
anyAttribute | Enables the author to extend the XML document with attributes not specified by the schema
appinfo | Specifies information to be used by the application (must go inside annotation)
attribute | Defines an attribute
attributeGroup | Defines an attribute group to be used in complex type definitions
choice | Allows only one of the elements contained in the <choice> declaration to be present within the containing element
complexContent | Defines extensions or restrictions on a complex type that contains mixed content or elements only
complexType | Defines a complex type element
documentation | Defines text comments in a schema (must go inside annotation)
element | Defines an element
extension | Extends an existing simpleType or complexType element
field | Specifies an XPath expression that specifies the value used to define an identity constraint
group | Defines a group of elements to be used in complex type definitions
import | Adds multiple schemas with different target namespaces to a document
include | Adds multiple schemas with the same target namespace to a document
key | Specifies an attribute or element value as a key (unique, non-nullable, and always present) within the containing element in an instance document
keyref | Specifies that an attribute or element value must correspond to those of the specified key or unique element
list | Defines a simple type element as a list of values
notation | Describes the format of non-XML data within an XML document
redefine | Redefines simple and complex types, groups, and attribute groups from an external schema
restriction | Defines restrictions on a simpleType, simpleContent, or a complexContent
schema | Defines the root element of a schema
selector | Specifies an XPath expression that selects a set of elements for an identity constraint
sequence | Specifies that the child elements must appear in a sequence. Each child element can occur from 0 to any number of times
simpleContent | Contains extensions or restrictions on a text-only complex type or on a simple type as content and contains no elements
simpleType | Defines a simple type and specifies the constraints and information about the values of attributes or text-only elements
union | Defines a simple type as a collection (union) of values from specified simple data types
unique | Defines that an element or an attribute value must be unique within the scope

Schema Restrictions and Facets for data types
( from http://www.w3schools.com/schema/schema_elements_ref.asp )

Here is a list of all the types of restrictions which can be included in your schema.

Constraint | Description
enumeration | Defines a list of acceptable values
fractionDigits | Specifies the maximum number of decimal places allowed. Must be equal to or greater than zero
length | Specifies the exact number of characters or list items allowed. Must be equal to or greater than zero
maxExclusive | Specifies the upper bound for numeric values (the value must be less than this value)
maxInclusive | Specifies the upper bound for numeric values (the value must be less than or equal to this value)
maxLength | Specifies the maximum number of characters or list items allowed. Must be equal to or greater than zero
minExclusive | Specifies the lower bound for numeric values (the value must be greater than this value)
minInclusive | Specifies the lower bound for numeric values (the value must be greater than or equal to this value)
minLength | Specifies the minimum number of characters or list items allowed. Must be equal to or greater than zero
pattern | Defines the exact sequence of characters that is acceptable
totalDigits | Specifies the maximum number of digits allowed. Must be greater than zero
whiteSpace | Specifies how white space (line feeds, tabs, spaces, and carriage returns) is handled

Regex

A special regular expression (regex) language can be used to construct a pattern. The regex language in XML Schema is based on Perl's regular expression language. The following are some common notations:

. (the period) for any character at all
\d for any digit
\D for any non-digit
\w for any word (alphanumeric) character
\W for any non-word character (e.g. -, +, =)
\s for any white space (including space, tab, newline, and return)
\S for any character that is not white space
x* to have zero or more x's
(xy)* to have zero or more xy's
x+ repetition of the x, at least once
x? to have one or zero x's
(xy)? to have one or no xy's
[abc] to include one of a group of values
[0-9] to include the range of values from 0 to 9
x{5} to have exactly 5 x's (in a row)
x{5,} to have at least 5 x's (in a row)
x{5,8} at least 5 but at most 8 x's (in a row)
(xyz){2} to have exactly 2 xyz's (in a row)
For example, the pattern for validating a Social Security Number is \d{3}-\d{2}-\d{4}
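
The Social Security Number pattern can be tried outside a schema processor. The following is a minimal sketch in Python, assuming that Python's re module is a close enough stand-in for the XML Schema regex language for this particular pattern; the function name matches_ssn_type is a hypothetical name chosen for this illustration. Because a schema pattern must match the whole value, the sketch uses fullmatch rather than search.

```python
import re

# The same pattern used for ssnType in the chapter's schema.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

def matches_ssn_type(value):
    """Return True if the whole value matches the ssnType pattern."""
    # XML Schema patterns are implicitly anchored at both ends,
    # so fullmatch is the closest equivalent in Python.
    return SSN_PATTERN.fullmatch(value) is not None

print(matches_ssn_type("123-45-6789"))   # a well-formed value
print(matches_ssn_type("123-45-678"))    # one digit short
```

Note that the two regex dialects are not identical (for example, they disagree about whether the underscore is a word character), so a schema validator remains the authority on what a pattern accepts.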

The schema pattern for emailAddressType is \w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*

\w+ at least one word (alphanumeric) character, e.g. answer
\W* followed by zero or more non-word characters, e.g. -
\w*@{1} followed by zero or more word characters and exactly one at-sign, e.g. my@
\w+ followed by at least one word character, e.g. mail
\W* followed by zero or more non-word characters, e.g. _
\w+. followed by at least one word character and one further character, e.g. please. (an unescaped . matches any character, not only a period; write \. to require a literal period)
\w+.* followed by at least one word character and then any number of further characters, e.g. opentourism.
\w* finally followed by zero or more word characters, e.g. org
email-address: answer-my@mail_please.opentourism.org

Instance Document Attributes
These attributes do NOT need to be declared within the schemas.

xsi:nil
Explanation: Indicates that a certain element does not have a value or that the value is unknown. The element must be set to nillable inside the schema document:

<xsd:element name="last_name" type="xsd:string" nillable="true"/>

Example:

<full_name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <first_name>Madonna</first_name>
   <last_name xsi:nil="true"/>
</full_name>

xsi:noNamespaceSchemaLocation
Explanation: Locates the schema for elements that are not in any namespace.
Example:

<radio xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="http://www.opentourism.org/xmtext/radio.xsd">
   <!-- radio stuff goes here -->
</radio>

xsi:schemaLocation
Explanation: Locates schemas for elements and attributes that are in a specified namespace. The value pairs a namespace URI with the location of its schema, separated by white space.
Example:

<radio xmlns="http://www.opentourism.org/xmtext/NS/radio"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.opentourism.org/xmtext/NS/radio http://www.opentourism.org/xmtext/radio.xsd">
   <!-- radio stuff goes here -->
</radio>

xsi:type
Explanation: Can be used in instance documents to indicate the type of an element.
Example:

<height xsi:type="xsd:decimal">78.9</height>
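
An application that reads an instance document has to look for xsi:nil itself. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser; the helper name is_nil is hypothetical, and the sketch only inspects the attribute without validating the document against a schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the XML Schema instance attributes (xsi:nil, xsi:type, ...).
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

doc = """<full_name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <first_name>Madonna</first_name>
  <last_name xsi:nil="true"/>
</full_name>"""

def is_nil(element):
    """Return True if the element carries xsi:nil with a true value."""
    # ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
    return element.get("{%s}nil" % XSI_NS) in ("true", "1")

root = ET.fromstring(doc)
for child in root:
    print(child.tag, "is nil" if is_nil(child) else "has a value")
```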


For more information on XML Schema structures, data types, and tools you can visit http://www.w3.org/XML/Schema.






Learning objectives

  • List the differences between XHTML and HTML
  • Create a valid, well-formed XHTML document
  • Convert an existing HTML document to XHTML
  • Decide when XHTML is more appropriate than HTML


In previous chapters, we have learned how to generate HTML documents from XML documents and XSL stylesheets. In this chapter, we will learn how to convert those HTML documents into valid XHTML. We will discuss why XHTML has evolved as a standard and when it should be used.

The Evolution of XHTML

Originally, Web pages were designed in HTML. Unfortunately, most implementations of this markup language allow all sorts of mistakes and bad formatting. Major browsers were designed to be forgiving, and poor code would display with few problems in most cases. This poor code was often not portable between browsers, e.g. a page would render in Netscape but not Internet Explorer or vice versa. Compensating for human error and bad formatting takes processing power that small handheld devices might not have. Thus, when displaying data on handhelds, a tiny mistake can crash the device.

XHTML partially mitigates these problems. The processing burden is reduced by requiring XHTML documents to conform to the much stricter rules defined in XML. Aside from the stricter rules, HTML 4.01 and XHTML 1.0 are functionally equivalent. If a document breaks XML's well-formedness rules, an XHTML-compliant browser must not render the page. If a document is well-formed but invalid, an XHTML-compliant browser may render the page, so a significant number of mistakes still slip through.

In this chapter, we will examine in detail how to create an XHTML document.

The biggest problem with HTML from a design standpoint is that it was never meant to be a graphical design language. The original version of HTML was intended to structure human readable content (e.g. marking a section of text as a paragraph), not to format it (e.g. this paragraph should be displayed in 14pt Arial). HTML has evolved far past its original purpose and is being stretched and manipulated to cover cases that the original HTML designers never imagined.

The recommended solution is to use a separate language to describe the presentation of a group of documents. Cascading Style Sheets (CSS) is a language used for describing presentation. From version 1.1 of XHTML upwards, Web pages must be formatted using CSS or a language with equivalent capabilities such as XSLT (XSL Transformations). The use of CSS or XSLT is optional in XHTML 1.0 unless the strict variant is used. HTML 4.01 supports CSS but not XSLT.

So What is XHTML?

As you might have guessed, XHTML stands for eXtensible HyperText Markup Language. It is a cross between HTML and XML. It fulfills two major purposes that were ignored by HTML:

  1. XHTML is a stricter standard than HTML. XHTML documents must be well-formed just like regular XML. This reduces vagaries and inconsistency between browsers, because browsers do not have to decide how to display a badly-formed page. Malformed XHTML is not allowed.
    Note 1: Browsers only enforce well-formedness if the MIME type is set to application/xhtml+xml. If the MIME type is set to text/html, the browser will allow badly-formed documents. There are a large number of 'XHTML' documents on the web that are badly-formed and get away with it because their MIME type is text/html.
    Note 2: Browsers are not required to check for validity. See Invalid XHTML below for an example.
  2. XHTML allows for modularization (m12n). For different environments different element and attribute subsets can be defined.

The best thing about XHTML is that it is almost the same as HTML! If you know how to write an HTML document, you can create an XHTML document without too much trouble. The biggest thing that you must keep in mind is that unlike with HTML, where simple errors like missing a closing tag are ignored by the browser, XHTML code must be written according to an exact specification. We will see later that adhering to these strict specifications actually allows XHTML to be more flexible than HTML.

XHTML Document Structure

At a minimum, an XHTML document must contain a DOCTYPE declaration and four elements: html, head, title, and body:

<!DOCTYPE ... >
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="...">
   <head>
      <title></title>
   </head>
   <body></body>
</html>

The opening html tag of an XHTML document must include a namespace declaration for the XHTML namespace.
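
The minimum structure can be checked mechanically. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser; the function name has_minimal_structure and the sample title text are hypothetical, and the sketch does not check the DOCTYPE declaration or validate against a DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def has_minimal_structure(markup):
    """Check for html, head, title and body in the XHTML namespace."""
    root = ET.fromstring(markup)

    def q(name):
        # ElementTree writes namespaced names as "{namespace-uri}local-name".
        return "{%s}%s" % (XHTML_NS, name)

    if root.tag != q("html"):
        return False          # wrong root element or missing namespace
    head = root.find(q("head"))
    body = root.find(q("body"))
    return (head is not None
            and head.find(q("title")) is not None
            and body is not None)

page = """<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
      <title>Example</title>
   </head>
   <body></body>
</html>"""

print(has_minimal_structure(page))
```

A document whose html element lacks the namespace declaration fails the first test, because its root element is then not in the XHTML namespace.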

The DOCTYPE declaration should appear immediately before the html tag in an XHTML document. It can follow one of three formats.

XHTML 1.0 Strict

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The Strict declaration is the least forgiving. This is the preferred DOCTYPE for new documents. Strict documents tend to be streamlined and clean, because all formatting appears in Cascading Style Sheets rather than in the document itself. Presentational markup that should be moved into the style sheet includes, but is not limited to:
<body text="blue">, <u>nderline</u>, <b>old</b>, <i>talics</i>, and <font color="#9900FF" face="Arial" size="+2">

In the Strict DTD, text and inline elements inside the body must also be nested within block-level elements.

Incorrect Example:
<p>I hope that you enjoy</p> your stay.
Correct Example:
<p>I hope that you enjoy your stay.</p>

XHTML 1.0 Transitional

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This declaration is intended as a halfway house for migrating legacy HTML documents to XHTML 1.0 Strict. The W3C encourages authors to use the Strict DOCTYPE for new documents. (The XHTML 1.0 Transitional DTD refers readers to the relevant note in the HTML4.01 Transitional DTD.)

This DOCTYPE does not require CSS for formatting, although CSS is recommended. It generally tolerates inline elements found where block-level elements are expected.

There are a couple of reasons why you might choose this DOCTYPE for new documents.

  • You require backwards compatibility with browsers that support the formatting elements of XHTML but do not support CSS. This is a very small fraction of general users (less than 1%). Many browsers that don't support CSS don't support HTML 4.0 or XHTML either. However, it may be useful on a corporate intranet that has a larger than normal fraction of very old (pre-2000) browsers.
  • You need to link to frames. Using frames is discouraged as they work badly in many browsers.

XHTML 1.0 Frameset

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

If you are creating a page with frames, this declaration is appropriate. However, since frames are generally discouraged when designing Web pages, this declaration should be used rarely.

XML Prolog

Additionally, XHTML authors are encouraged by the W3C to include the following XML declaration as the first line of each document:

<?xml version="1.0" encoding="UTF-8"?>

Although it is recommended by the standard, this declaration may cause errors in older Web browsers including Internet Explorer version 6. It is up to the individual author to decide whether to include the prolog.

Language

It is good practice to include the optional xml:lang attribute [2] on the html element to describe the document's primary language. For compatibility with HTML the lang attribute should also be specified with the same value. For an English language document use:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

The xml:lang and lang attributes can also be specified on other elements to indicate changes of language within the document, e.g. a French quotation in an English document.

Converting HTML to XHTML

In this section, we will discover how to transform an HTML document into an XHTML document. We will examine each of the following rules:

  • Documents must be well-formed
    • Tags must be properly nested
    • Elements must be closed
  • Tags must be lowercase
  • Attribute names must be lowercase
  • Attribute values must be quoted
  • Attributes cannot be minimized
  • The name attribute is replaced with the id attribute (in XHTML 1.0 both name and id should be used with the same value to maintain backwards-compatibility).
  • Plain ampersands are not allowed
  • Scripts and CSS must be escaped (enclosed within the markers <![CDATA[ and ]]>) or, preferably, moved into external files.

Documents must be well-formed

Because XHTML conforms to all XML standards, an XHTML document must be well-formed according to the W3C's recommendations for an XML document. Several of the rules here reemphasize this point. We will consider both incorrect and correct examples.

Tags must be properly nested

Browsers widely tolerate badly nested tags in HTML documents.

<b><u>
This text is probably bold and underlined, but inside incorrectly nested tags.
</b></u>

The text above would display as bold and underlined, even though the end tags are not in the proper order. An XHTML page will not display if the tags are improperly nested, because it would not be a well-formed XML document. The problem can be easily fixed.

<b><u>
This text is bold and underlined and inside properly nested tags.
</u></b>
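
Any XML parser will report this kind of nesting error. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser stands in for an XHTML-aware browser; the function name is_well_formed is hypothetical, and the fragments are wrapped in a p element so that each has a single root.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for crossed, unclosed, or otherwise malformed tags.
        return False

badly_nested = "<p><b><u>bold and underlined</b></u></p>"
properly_nested = "<p><b><u>bold and underlined</u></b></p>"

print(is_well_formed(badly_nested))
print(is_well_formed(properly_nested))
```

The same check rejects other well-formedness errors, such as an element that is never closed or a start tag and end tag that differ in case.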

Elements must be closed

Again, XHTML documents must be well-formed XML documents. For this reason, all elements must be closed. The HTML specifications listed some elements as having "optional" end tags, such as <p> and <li>.

<p>Here is a list:
<ul>
   <li>Item 1
   <li>Item 2
   <li>Item 3
</ul>

In XHTML, the end tags must be included.

<p>Here is a list: </p>
<ul>
   <li>Item 1</li>
   <li>Item 2</li>
   <li>Item 3</li>
</ul>

What should we do about HTML tags that do not have a closing tag? Some special tags do not require or imply a closing tag.

<img src="titlebar.gif" alt="Title">
<hr>
<br>
<p>Welcome to my Web page!</p>

In XHTML, the XML rule of including a closing slash within the tag must be followed.

<img src="titlebar.gif" alt="Title" />
<hr />
<br />
<p>Welcome to my Web page!</p>

Note that some of today's browsers will incorrectly render a page if the closing slash does not have a space before it (<br/>). Although it is not part of the official recommendation, you should always include the space (<br />) for compatibility purposes.

Here are the common empty tags in HTML:

  • area
  • base
  • basefont
  • br
  • hr
  • img
  • input
  • link
  • meta
  • param

Tags must be lowercase

In HTML, tags could be written in either lowercase or uppercase. In fact, some Web authors preferred to write tags in uppercase to make them easier to read. XHTML requires that all tags be lowercase.

<H1>This is an example of bad case.</h1>

This difference is necessary because XML differentiates between cases. XML would read <H1> and <h1> as different tags, causing problems in the above example. The problem can be easily fixed by changing all tags to lowercase.

<h1>This is an example of good case.</h1>

Attribute names must be lowercase

Following the pattern of writing all tags in lowercase, all attribute names must also be in lowercase.

<p CLASS="specialText">Important Notice</p>

The correct tags are easy to create.

<p class="specialText">Important Notice</p>

Attribute values must be quoted

Some HTML values do not require quotation marks around them. They are understood by browsers.

<table border=1 width=100%>
</table>

XHTML requires all attribute values to be quoted. Even numeric, percentage, and hexadecimal values must appear in quotation marks for them to be considered part of a proper XHTML document.

<table border="1"  width="100%">
</table>

Attributes cannot be minimized

HTML allowed some attributes to be written in shorthand, such as selected or noresize.

<form>
   <input checked ... />
   <input disabled ... />
</form>

When using XHTML, attribute minimization is forbidden. Instead, use the syntax x="x", where x is the attribute that was formerly minimized.

<form>
   <input checked="checked"  .../>
   <input disabled="disabled"  .../>
</form>

A complete list of minimized attributes follows:

  • checked
  • compact
  • declare
  • defer
  • disabled
  • ismap
  • nohref
  • noresize
  • noshade
  • nowrap
  • readonly
  • selected
  • multiple
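
The rules for lowercase names, quoted values, and minimized attributes can be applied mechanically. The following is a minimal sketch in Python, assuming the standard library's html.parser module is used to read the old HTML; the class name StartTagRewriter and the function rewrite_start_tags are hypothetical names chosen for this illustration, and the sketch rewrites start tags only, leaving text and end tags aside.

```python
from html import escape
from html.parser import HTMLParser

# Empty elements listed in this chapter; they receive a closing slash.
EMPTY_ELEMENTS = {"area", "base", "basefont", "br", "hr",
                  "img", "input", "link", "meta", "param"}

class StartTagRewriter(HTMLParser):
    """Collects every start tag, rewritten in XHTML style."""

    def __init__(self):
        super().__init__()
        self.rewritten = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:      # minimized attribute, e.g. CHECKED
                value = name       # expand it to checked="checked"
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        closing = " />" if tag in EMPTY_ELEMENTS else ">"
        self.rewritten.append("<" + " ".join(parts) + closing)

def rewrite_start_tags(html_text):
    parser = StartTagRewriter()
    parser.feed(html_text)
    parser.close()
    return parser.rewritten

print(rewrite_start_tags("<INPUT TYPE=checkbox CHECKED>"))
print(rewrite_start_tags("<TABLE BORDER=1 WIDTH=100%>"))
```

For real conversions a dedicated tool is a better choice than a hand-written rewriter, since it also repairs nesting and missing end tags.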

The name attribute is replaced with the id attribute

HTML 4.01 standards define a name attribute for the tags a, applet, frame, iframe, img, and map.

<a name="anchor">
<img src="banner.gif" name="mybanner" />
</a>

XHTML has deprecated the name attribute. Instead, the id attribute is used. However, to ensure backwards compatibility with today's browsers, it is best to use both the name and id attributes.

<a name="anchor" id="anchor" >
<img src="banner.gif" name="mybanner" id="mybanner"  />
</a>

As technology advances, it will eventually be unnecessary to use both attributes. XHTML 1.1 has already removed the name attribute from the a and map elements in favor of id.

Ampersands are not supported

Plain (unescaped) ampersands are illegal in XHTML.

<a href="home.aspx?status=done&itWorked=false">Home & Garden</a>

They must instead be replaced with the equivalent entity reference &amp;.

<a href="home.aspx?status=done&amp;itWorked=false">Home &amp; Garden</a>
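
When markup is generated by a program, the escaping is best left to a library function. The following is a minimal sketch in Python, assuming the standard library's xml.sax.saxutils helpers escape and quoteattr; the variable names are hypothetical. The sketch builds the link from raw strings and then parses it back to confirm that the original text is recovered.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = "Home & Garden"
raw_href = "home.aspx?status=done&itWorked=false"

# quoteattr escapes the value and adds the surrounding quotation marks;
# escape handles &, < and > in element content.
link = "<a href=%s>%s</a>" % (quoteattr(raw_href), escape(raw_text))
print(link)

# Parsing the escaped markup gives back the unescaped strings.
parsed = ET.fromstring(link)
print(parsed.get("href"))
print(parsed.text)
```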

Image alt attributes are mandatory

Because XHTML is designed to be viewed on different types of devices, some of which are not image-capable, alt attributes must be included for all images.

<img src="titlebar.gif">

Remember that the img tag must include a closing slash in XHTML!

<img src="titlebar.gif" alt="title"  />

Scripts and CSS must be escaped

Internal scripts and CSS often include characters like the ampersand and less-than characters.

<script language="JavaScript">
   <!--
      document.write('Hello World!'); 
   //-->
</script>

If you are using internal scripts or CSS, enclose them within the tags <![CDATA[ and ]]>. This will mark them as character data that should not be parsed. If you do not use these tags, characters like & and < will be treated as start-of-character entities (like &nbsp;) and tags (like <b>) respectively. This will cause your page to behave unpredictably, and it may invalidate your code.

Additionally, the type attribute is mandatory for scripts. The comment tags <!-- and --> that have traditionally been used to hide JavaScript from noncompliant browsers should not be included. The XML standard states that text enclosed in comment tags may be completely excluded from rendered documents, which would lose all script enclosed in the tags.

<script type="text/javascript" language="javascript">
/*<![CDATA[*/
   document.write('Hello World!');
/*]]>*/
</script>

Also, document.write(); is not permitted in XHTML documents. You must use node creation methods such as document.createElementNS(); instead. Confusingly, document.write(); will appear to work as expected if the document is incorrectly served with a MIME type of text/html (the type for HTML documents), instead of application/xhtml+xml (the type for XHTML documents). If the MIME type is text/html, the document will be parsed as HTML, which allows document.write();. Parsing the document as HTML defeats the purpose of writing it in XHTML.

Similar changes must be made for internal stylesheets.

<style>
<!--
   .SpecialClass {
      color: #000000;
   }
-->
</style>

The type attribute must be included, and the CDATA tags should be used.

<style type="text/css">
/*<![CDATA[*/
   .SpecialClass {
      color: #000000;
   }
/*]]>*/
</style>

Because scripts and CSS may complicate an XHTML document, it is strongly recommended that they be placed in external .js and .css files, respectively. They can then be linked to from your XHTML document.

<script src="myscript.js" type="text/javascript"></script>
 
<link href="styles.css" type="text/css" rel="stylesheet" />

Some elements may not be nested

The W3C recommendations state that certain elements may not be contained within others in an XHTML document, even when no XML rules are violated by the inclusion. Elements affected are listed below.

Element | Cannot contain ...
a | a
pre | big, img, object, small, sub, sup
button | button, fieldset, form, iframe, input, isindex, label, select, textarea
label | label
form | form
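
Because an XML parser does not report these prohibitions, they need a separate check. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser and assuming that the prohibitions apply at any depth of nesting; the names PROHIBITED, local_name, and find_prohibited_nesting are hypothetical.

```python
import xml.etree.ElementTree as ET

# The table above: outer element -> elements it may not contain.
PROHIBITED = {
    "a": {"a"},
    "pre": {"big", "img", "object", "small", "sub", "sup"},
    "button": {"button", "fieldset", "form", "iframe", "input",
               "isindex", "label", "select", "textarea"},
    "label": {"label"},
    "form": {"form"},
}

def local_name(tag):
    """Strip a "{namespace-uri}" prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def find_prohibited_nesting(markup):
    """Return (outer, inner) pairs that break the nesting rules."""
    root = ET.fromstring(markup)
    problems = []
    for outer in root.iter():
        banned = PROHIBITED.get(local_name(outer.tag))
        if not banned:
            continue
        # iter() also yields the element itself, so skip it.
        for inner in outer.iter():
            if inner is not outer and local_name(inner.tag) in banned:
                problems.append((local_name(outer.tag), local_name(inner.tag)))
    return problems

sample = '<p><a href="x">outer <b><a href="y">inner</a></b></a></p>'
print(find_prohibited_nesting(sample))
```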

When to convert

By now, it probably sounds as though converting an HTML document into XHTML is easy, but tedious. When would you want to convert your existing pages into XHTML? Before deciding to change your entire Web site, consider these questions.

  • Do you want your pages to be easily viewed over a nontraditional Internet-capable device, such as a PDA or Web-enabled telephone? Will this be a goal of your site in the future? XHTML is the language of choice for Web-enabled portable devices. Now may be a good time for you to commit to creating an all-XHTML site.
  • Do you plan to work with XML in the future? If so, XHTML may be a logical place to begin. If you head up a team of designers who are accustomed to using HTML, XHTML is a small step away. It may be less intimidating for beginners to learn XHTML than it is to try teaching them all about XML from scratch.
  • Is it important that your site be current with the most recent W3C standards? Staying on top of current standards will make your site more stable and help you stay updated in the future, as you will only have to make small changes to upgrade your site to the newest versions of XHTML as they are approved by the W3C.
  • Will you need to convert your documents to another format? As a valid XML document, XHTML can utilize XSL to be converted into text, plain HTML, another XHTML document, or another XML document. HTML cannot be used for this purpose.

If you answered yes to any of the above questions, then you should probably convert your Web site to XHTML.

MIME Types

XHTML 1.0 documents should be served with a MIME type of application/xhtml+xml to Web browsers that can accept this type. XHTML 1.0 may be served with the MIME type text/html to clients that cannot accept application/xhtml+xml, provided that the XHTML complies with the additional constraints in Appendix C of the XHTML 1.0 specification. If you cannot configure your Web server to serve documents as different MIME types, you probably should not convert your Web site to XHTML.

You should check that your XHTML documents are served correctly to browsers that support application/xhtml+xml, e.g. Mozilla Firefox. Use 'Page Info' to verify that the type is correct.

XHTML 1.1 documents are often not backwards compatible with HTML and should not be served with a MIME type of text/html.[3]
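
The choice of MIME type for an XHTML 1.0 page can be made per request from the Accept header that the browser sends. The following is a minimal sketch in Python, assuming that the server-side script has access to that header as a string; the function name choose_content_type is hypothetical, and the sketch deliberately ignores wildcards such as */* and the relative ordering of quality values.

```python
def choose_content_type(accept_header):
    """Pick a MIME type for an XHTML 1.0 page from an HTTP Accept header."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0                     # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0         # treat a malformed q as "not accepted"
        # Only an explicit, non-zero entry counts as support.
        if media_type == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml"
    return "text/html"

print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("text/html, */*"))
```

The returned value would then be sent as the Content-Type response header.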

Help Converting

HTML Tidy

When creating HTML, it's very easy to make a mistake by leaving out an end tag or not properly nesting tags. HTML Tidy is an application that can correct a number of errors in poorly formed HTML documents and convert them into XHTML. Tidy can also format ugly code to be more readable, including code generated by WYSIWYG editors. HTML Tidy can't generate clean code when it encounters problems it isn't sure how to fix. In these cases, it will generate an error to let you know where the mistake is located in your document.

A few examples of problems that HTML Tidy can remedy:

  • Missing or mismatched end tags.
  • Improperly nested elements.
  • Mixed up tags.
  • Missing "/" characters needed to properly close tags.
  • Missing tags in lists.
  • Missing quotes around attribute values.
  • A missing or incorrect DOCTYPE (Tidy inserts the correct value based on your code, and can also recognize and report proprietary elements).

HTML Tidy can also be customized at runtime using a wide array of command line arguments. It is capable of indenting code to make it more readable as well as replacing FONT, NOBR, and CENTER tags with style tags and rules using CSS. Tidy can also be taught new tags by declaring them in the configuration file.

You can read more about HTML Tidy at the W3C's HTML Tidy site, as well as download the application as a binary or get the source code. There are several sites that offer HTML Tidy as an online service including the W3C and Site Valet.

You can also validate your page using the validator available at http://validator.w3.org/.

When not to convert

You shouldn't convert your Web pages if they will always be served with a MIME type of text/html. Make sure you know how to configure your server or server-side script to perform HTTP content negotiation so that XHTML capable browsers receive XHTML marked as application/xhtml+xml. If you can't set up content negotiation, stick to HTML 4.01. People viewing your Web pages with mainstream browsers will be unable to tell the difference between a valid HTML 4.01 web page and a valid XHTML 1.0 Web page.
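
The negotiation described above can be sketched in a few lines of server-side code. The following Python function is only an illustration, not a complete implementation of HTTP content negotiation: the name choose_content_type is made up for this example, and it looks only for an explicit application/xhtml+xml entry in the Accept header (wildcards such as */* are deliberately treated as "not XHTML capable").

```python
def choose_content_type(accept_header: str) -> str:
    """Pick the MIME type for an XHTML 1.0 page from an HTTP Accept header.

    Hypothetical helper: returns application/xhtml+xml only when the client
    lists that type explicitly with a quality value above zero.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as "not accepted"
        if media_type == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml"
    # Fall back to text/html for every other client.
    return "text/html"


firefox_like = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
legacy_like = "text/html, image/gif, */*"
print(choose_content_type(firefox_like))
print(choose_content_type(legacy_like))
```

A client that only sends wildcards receives text/html here, which matches the cautious advice above: serve XHTML as application/xhtml+xml only to browsers that say they accept it.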

Make sure the automated tests you run on your site simulate connections from both XHTML-compatible browsers, e.g. Mozilla Firefox, and non-XHTML-compatible browsers, e.g. Internet Explorer 6.0. This is particularly important if you use JavaScript on your Web site. If maintaining two copies of your test suite is too time-consuming, don't convert.

Bear in mind that valid HTML 4.01 Strict documents generally require less effort to convert to XHTML 1.1 than valid XHTML 1.0 Transitional documents. A valid HTML 4.01 Strict document can only contain elements that are valid in XHTML 1.1, although a few attributes may need changing. XHTML 1.0 Transitional documents on the other hand can contain ten element types and more than a dozen attributes that are not valid in XHTML 1.1. The XHTML 1.0 Transitional body element alone has six attributes that are not supported in XHTML 1.1.

Don't be pressured into using XHTML by people talking vaguely about bad practice. Pin them down to what they mean by bad practice. If they start talking about separation of content and presentation, they have confused the differences between HTML and XHTML with the differences between the Transitional and Strict doctypes. Both XHTML 1.0 Transitional and HTML 4.01 Transitional allow you to mix presentation and content in the same document, i.e. they allow this type of bad practice. Both HTML 4.01 Strict and XHTML 1.0 Strict force you to move the bulk of the presentation (but not all of it) into CSS or an equivalent language. All four doctypes allow you to use embedded stylesheets, whereas true separation requires that all CSS and JavaScript be moved to external files.

XHTML 1.1

XHTML 1.0 is a suitable markup language for most purposes. It provides the option to separate content and presentation, which fits the needs of most Web authors. XHTML 1.1 enforces the separation of content and presentation. All deprecated elements and attributes have been removed. It also removes two attributes that were retained in XHTML 1.0 purely for backwards compatibility: the lang attribute is replaced by xml:lang, and name is replaced by id. Finally, it adds support for ruby text found in East Asian documents.

DOCTYPE

The DOCTYPE for XHTML 1.1 is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

Modularization

The modularization of XHTML, or XHTML m12n, provides suggestions for customizing XHTML, either by integrating subsets of XHTML into other XML applications or by extending the XHTML element set. The framework defines two processes:

  • How to group elements and attributes into "modules"
  • How to combine modules to create new markup languages

The resulting languages, which the W3C calls "XHTML Host Languages", are based on the familiar XHTML structure but specialized for specific purposes. XHTML 1.1 is an example of a host language. It was created by grouping the different elements available to XHTML.

XHTML variations, while possible in theory, have not been widely adopted. There is continuing work being done to develop host languages, but their details are beyond the scope of this discussion.

Invalid XHTML

XHTML-compliant browsers are allowed to render invalid XHTML documents provided that the documents are well-formed. A simple example is given below:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Invalid XHTML</title>
  </head> 
  <body>
     <p>This sentence contains a <p>nested paragraph.</p></p>
  </body>
</html>

Save the example as invalid.xhtml (the .xhtml extension is important) and open the page with Mozilla Firefox. The page will render even though it is invalid.
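
The difference between well-formed and valid can also be checked outside a browser. The sketch below uses the XML parser from Python's standard library, which checks well-formedness only and does not validate against a DTD. The helper name is_well_formed and the second sample string are assumptions made for this illustration, and the XML declaration and DOCTYPE are left out of the samples to keep them short.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (no DTD validation is done)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Invalid XHTML (a paragraph nested in a paragraph), but every tag is closed
# and properly nested, so the document is still well-formed.
nested_paragraph = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Invalid XHTML</title></head>"
    "<body><p>This sentence contains a <p>nested paragraph.</p></p></body>"
    "</html>"
)

# Not well-formed: the <p> element is never closed.
unclosed_paragraph = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Broken</title></head>"
    "<body><p>This paragraph is never closed.</body>"
    "</html>"
)

print(is_well_formed(nested_paragraph))
print(is_well_formed(unclosed_paragraph))
```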


Summary

XHTML stands for eXtensible HyperText Markup Language. XHTML is very similar to HTML, but it is stricter and easier to parse. XHTML documents must be well-formed just like regular XML. XHTML allows for modularization. XHTML code must be written according to an exact specification, unlike HTML, where simple errors like a missing closing tag are ignored by the browser. Adhering to these strict specifications actually allows XHTML to be more flexible than HTML. The benefits described in this summary are only gained if the MIME type of the document is application/xhtml+xml. XHTML documents can be validated, but most browsers do not validate them.

Exercises

Answers






Learning objectives

  • Be able to conceptualize an XML document as a node tree
  • Refer to groups of elements in an XML document
  • Understand the differences between abbreviated and unabbreviated XPath syntax
  • Understand the differences between absolute and relative paths
  • Be able to use XPath predicates and functions to refine an XPath's node-set


Introduction

In the previous chapters you learned the basic concepts of XSL and how to refer to nodes in an XML document when performing an XSL transformation. The straightforward syntax you have used so far for referring to nodes is XPath, but XPath has many more functions and capabilities, which you will learn in this chapter. As you begin to comprehend how a path language is used to refer to nodes in an XML document, your understanding of XML as a tree structure will begin to fall into place. This chapter contains examples that demonstrate many of the common uses of XPath, but for the full XPath specification, see the latest version of the standard at:

http://www.w3.org/TR/xpath

XSL uses XPath heavily.

XPath

When you go to copy a file or ‘cd’ into a directory at a command prompt you often type something along the lines of ‘/home/darnell/’ to refer to folders. This enables you to change into or refer to folders throughout your computer’s file system. XML has a similar way of referring to elements in an XML document. This special syntax is called XPath, which is short for XML Path Language.

XPath is a language for finding information in an XML document. XPath is used to navigate through elements and attributes in an XML document.

XPath, although used for referring to nodes in an XML tree, is not itself written in XML. This was a wise choice on the part of the W3C, because trying to specify path information in XML would be a very cumbersome task. Any characters that form XML syntax would need to be escaped so that they are not confused with XML when being processed. XPath is also very succinct, allowing you to call upon nodes in the XML tree with a great degree of specificity without being unnecessarily verbose.

XML as a tree structure

The great benefit of XML is that the document itself describes the structure of the data. If you have researched your family history, you have probably come across a family tree. At the top of the tree is some early ancestor and at the bottom of the tree are the latest children.

With a tree structure you can see which children belong to which parents, which grandchildren belong to which grandparents and many other relationships.

The neat thing about XML is that it also fits nicely into this tree structure, often referred to as an XML Tree.

Understanding node relationships

We will use the following example to demonstrate the different node relationships.

<bookstore>
	<book> 
		<title>Less Than Zero</title>
		<author>Bret Easton Ellis</author>
		<year>1985</year>
		<price>13.95</price>
	</book> 
</bookstore>
Parent
Each element and attribute has one parent.
The book element is the parent of the title, author, year, and price elements.
Children
Element nodes may have zero, one or more children.
The title, author, year, and price elements are all children of the book element.
Siblings
Nodes that have the same parent.
The title, author, year, and price elements are all siblings.
Ancestors
A node's parent, parent's parent, etc.
The ancestors of the title element are the book element and the bookstore element.
Descendants
A node's children, children's children, etc.
Descendants of the bookstore element are the book, title, author, year, and price elements.

Also, it is still useful in some ways to think of an XML file as simultaneously being a serialized file, like you would view it in an XML editor. This is so you can understand the concepts of preceding and following nodes. A node is said to precede another if the original node is before the other in document order. Likewise, a node follows another if it is after that node in document order. Ancestors and descendants are not considered to be either preceding or following a node. This concept will come in handy later when discussing the concept of an axis.
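
The relationships listed above can be explored with a short script. This is a minimal sketch using Python's standard xml.etree.ElementTree module. The parent_of dictionary and the ancestors helper are constructions made for this illustration, because that module does not keep a link from an element to its parent.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""
<bookstore>
    <book>
        <title>Less Than Zero</title>
        <author>Bret Easton Ellis</author>
        <year>1985</year>
        <price>13.95</price>
    </book>
</bookstore>
""")

# Map every element to its parent (the root has no entry).
parent_of = {child: parent for parent in bookstore.iter() for child in parent}


def ancestors(node):
    """Return the tags of the parent, the parent's parent, and so on."""
    tags = []
    while node in parent_of:
        node = parent_of[node]
        tags.append(node.tag)
    return tags


title = bookstore.find("book/title")
book = parent_of[title]                      # parent of <title>
children = [child.tag for child in book]     # children of <book>
siblings = [child.tag for child in book if child is not title]
# iter() walks the tree in document order, starting with the element itself.
descendants = [node.tag for node in bookstore.iter() if node is not bookstore]

print(book.tag, children, siblings, ancestors(title), descendants)
```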

Abbreviated vs. Unabbreviated XPath syntax

XPath was created so that nodes can be referred to very succinctly, while retaining the ability to search on many options. Most uses of XPath will involve searching for child nodes, parent nodes, or attribute nodes of a particular node. Because these uses are so common, an abbreviated syntax can be used to refer to these commonly searched nodes. Following is an XML document that simulates a tree (the type that has leaves and branches). It will be used to demonstrate the different types of syntax.

<?xml version="1.0" encoding="UTF-8"?>
<trunk name="the_trunk">
    <bigBranch name="bb1" thickness="thick">
        <smallBranch name="sb1">
            <leaf name="leaf1" color="brown" />
            <leaf name="leaf2" weight="50" />
            <leaf name="leaf3" />
        </smallBranch>
        <smallBranch name="sb2">
            <leaf name="leaf4" weight="90" />
            <leaf name="leaf5" color="purple" />
        </smallBranch>
    </bigBranch>
    <bigBranch name="bb2">
        <smallBranch name="sb3">
            <leaf name="leaf6" />
        </smallBranch>
        <smallBranch name="sb4">
            <leaf name="leaf7" />
            <leaf name="leaf8" />
            <leaf name="leaf9" color="black" />
            <leaf name="leaf10" weight="100" />
        </smallBranch>
    </bigBranch>
</trunk>

Exhibit 9.2: tree.xml – Example XML page

Following are a few examples of XPath location paths in English, Abbreviated XPath, then Unabbreviated XPath.


Selection 1:

English: All <leaf> elements in this document that are children of <smallBranch> elements that are children of <bigBranch> elements, that are children of the trunk, which is a child of the root.
Abbreviated: /trunk/bigBranch/smallBranch/leaf
Unabbreviated: /child::trunk/child::bigBranch/child::smallBranch/child::leaf

Selection 2:

English: The <bigBranch> elements with ‘name’ attribute equal to ‘bb2’, that are children of the trunk element, which is a child of the root.
Abbreviated: /trunk/bigBranch[@name='bb2']
Unabbreviated: /child::trunk/child::bigBranch[attribute::name='bb2']

Notice how we can specify which bigBranch elements we want by using a predicate in the previous example. This narrows the search down to only bigBranch nodes that satisfy the predicate. The predicate is the part of the XPath statement that is in square brackets. In this case, the predicate is asking for bigBranch nodes with their ‘name’ attribute set to ‘bb2’. Note that the quotes inside an XPath expression must be straight quotes (' or "), not typographic ones.
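
The first two selections can be tried out with Python's standard xml.etree.ElementTree module, which implements only a subset of abbreviated XPath. One assumption to keep in mind: that module evaluates paths relative to the element you call it on, so the leading /trunk step is dropped and the search starts at the trunk element itself. The document below is a compact copy of Exhibit 9.2.

```python
import xml.etree.ElementTree as ET

TREE_XML = """
<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1">
      <leaf name="leaf1" color="brown"/><leaf name="leaf2" weight="50"/>
      <leaf name="leaf3"/>
    </smallBranch>
    <smallBranch name="sb2">
      <leaf name="leaf4" weight="90"/><leaf name="leaf5" color="purple"/>
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3"><leaf name="leaf6"/></smallBranch>
    <smallBranch name="sb4">
      <leaf name="leaf7"/><leaf name="leaf8"/>
      <leaf name="leaf9" color="black"/><leaf name="leaf10" weight="100"/>
    </smallBranch>
  </bigBranch>
</trunk>
"""

trunk = ET.fromstring(TREE_XML)  # the document element, <trunk>

# Selection 1: /trunk/bigBranch/smallBranch/leaf
leaves = trunk.findall("bigBranch/smallBranch/leaf")

# Selection 2: a predicate on the name attribute.
bb2 = trunk.findall("bigBranch[@name='bb2']")
# A name that does not occur in the document selects nothing.
bb3 = trunk.findall("bigBranch[@name='bb3']")

print(len(leaves), [b.get("name") for b in bb2], bb3)
```

An empty result is not an error in XPath: a predicate that no node satisfies simply yields an empty node-set.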

The last two examples assume we want to specify the path from the root. Let’s now assume that we are specifying the path from a <smallBranch> node.

Selection 3:

English: The parent node of the current <smallBranch>. (Notice that this selection is relative to a <smallBranch>.)
Abbreviated: ..
Unabbreviated: parent::node()

When using the unabbreviated syntax, you may notice that parent or child is followed by two colons (::). Each of these is called an axis. You will learn more about axes shortly.

Also, this may be a good time to explain the concept of a location path. A location path is the series of location steps taken to reach the node or nodes being selected. Location steps are the parts of an XPath statement separated by / characters. Each is one step on the way to finding the nodes you would like to select.

A location step is composed of three parts: an axis (child, parent, descendant, etc.), a node test (the name of a node, or a node type test such as node()), and zero or more predicates (tests on the retrieved nodes that narrow the results, eliminating nodes that do not pass the predicate’s test).

So, in a location path, each of its location steps returns a node-set. If there are further steps on the path after a location step, the next step is evaluated for each of the nodes returned by that step.
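
The way each location step feeds the next can be made visible by evaluating a path one step at a time. The sketch below does this with Python's standard xml.etree.ElementTree module on a compact copy of Exhibit 9.2. The loop is an illustration of the idea, not how an XPath processor is actually implemented, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

trunk = ET.fromstring("""
<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1">
      <leaf name="leaf1"/><leaf name="leaf2"/><leaf name="leaf3"/>
    </smallBranch>
    <smallBranch name="sb2"><leaf name="leaf4"/><leaf name="leaf5"/></smallBranch>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3"><leaf name="leaf6"/></smallBranch>
    <smallBranch name="sb4">
      <leaf name="leaf7"/><leaf name="leaf8"/><leaf name="leaf9"/><leaf name="leaf10"/>
    </smallBranch>
  </bigBranch>
</trunk>
""")

# Evaluate bigBranch/smallBranch/leaf one location step at a time.
nodes = [trunk]
sizes = []
for step in ("bigBranch", "smallBranch", "leaf"):
    # Run the step on every node returned by the previous step.
    nodes = [match for node in nodes for match in node.findall(step)]
    sizes.append(len(nodes))

stepwise = [leaf.get("name") for leaf in nodes]
in_one_go = [leaf.get("name") for leaf in trunk.findall("bigBranch/smallBranch/leaf")]

# Selection 3: ".." steps from a smallBranch up to its parent.
parents = trunk.findall("bigBranch/smallBranch[@name='sb2']/..")

print(sizes, stepwise == in_one_go, [p.get("name") for p in parents])
```

The node-set grows from two bigBranch elements to four smallBranch elements to ten leaf elements, and the step-by-step result is the same as evaluating the whole path at once.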

Relative vs. Absolute paths

When specifying a path with XPath, there are times when you will already be ‘in’ a node. But other times, you will want to select nodes starting from the root node. XPath lets you do both. If you have ever worked with websites in HTML, it works the same way as referring to other files in HTML hyperlinks. In HTML, you can specify an Absolute Path for the hyperlink, describing where another page is with the server name, folders, and filename all in the URL. Or, if you are referring to another file on the same site, you need not enter the server name or all of the path information. This is called a Relative Path. The concept can be applied similarly in XPath.

You can tell the difference by whether there is a ‘/’ character at the beginning of the XPath expression. If so, the path is being specified from the root, which makes it an Absolute Path. But if there is no ‘/’ at the beginning of the path, you are specifying a Relative Path, which describes where the other nodes are relative to the context node, or the node for which the next step is being taken.

Below is an XSL stylesheet (Exhibit 9.3) for use with our tree.xml file above (Exhibit 9.2).

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Example of an absolute path. The element '/child::trunk'
       is specified from the root node. -->
  <xsl:template match="/child::trunk">
    <html>
      <head>
        <title>XPath Tree Tests</title>
      </head>
      <body>
        <!-- Example of a relative path. The xsl:for-each statement will
             execute for every bigBranch child of the current node,
             which is the trunk node. -->
        <xsl:for-each select="child::bigBranch">
          <xsl:call-template name="print_out"/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

  <!-- Named template: prints the name attribute of the current node. -->
  <xsl:template name="print_out">
    <xsl:value-of select="attribute::name"/>
    <br/>
  </xsl:template>
</xsl:stylesheet>

Exhibit 9.3: xsl_tree.xsl – Example of both a relative and absolute path
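
For readers who want to check what the relative path in Exhibit 9.3 selects without running an XSLT processor, the same selection can be imitated in Python. This is only a rough sketch of the stylesheet's effect, using the standard xml.etree.ElementTree module and a trimmed copy of Exhibit 9.2 (the leaves are omitted because they play no part here). The variable names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

document = ET.ElementTree(ET.fromstring("""
<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1"/><smallBranch name="sb2"/>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3"/><smallBranch name="sb4"/>
  </bigBranch>
</trunk>
"""))

# Absolute part: start at the top of the document and take the trunk element.
trunk = document.getroot()

# Relative part: child::bigBranch is evaluated from the context node (trunk),
# and attribute::name is evaluated from each bigBranch in turn.
names = [branch.get("name") for branch in trunk.findall("bigBranch")]

# Roughly what the print_out template writes for each branch.
body = "".join(f"{name}<br/>" for name in names)
print(body)
```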

Four types of XPath location paths

In the last two sections you learned about two distinctions between location paths: Unabbreviated vs. Abbreviated and Relative vs. Absolute. Combining these two concepts can be helpful when talking about XPath location paths. Not to mention, it could make you sound really smart in front of your friends when you say things like:

  1. Abbreviated Relative Location Paths - Use of abbreviated syntax while specifying a relative path.
  2. Abbreviated Absolute Location Paths - Use of abbreviated syntax while specifying an absolute path.
  3. Unabbreviated Relative Location Paths - Use of unabbreviated syntax while specifying a relative path.
  4. Unabbreviated Absolute Location Paths - Use of unabbreviated syntax while specifying an absolute path.

I only mention this four-way distinction now because it could come in handy while reading the specification, or other texts on the subject.

XPath axes

In XPath, some node selections can only be expressed in the unabbreviated syntax. In these cases, you name an axis explicitly in each location step on your way through the location path.

From any node in the tree, there are 13 axes along which you can step. They are as follows:


Axis Meaning
ancestor:: The parent of the current node, its parent, and so on up to the root node
ancestor-or-self:: The current node and all of its ancestors
attribute:: Attributes of the current node
child:: Immediate children of the current node
descendant:: Children of the current node, their children, and so on
descendant-or-self:: The current node and all of its descendants
following:: Nodes after the current node in document order (excluding its descendants)
following-sibling:: Siblings of the current node that come after it in document order
namespace:: Namespace nodes of the current node
parent:: Immediate parent of the current node
preceding:: Nodes before the current node in document order (excluding its ancestors)
preceding-sibling:: Siblings of the current node that come before it in document order
self:: The current node
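
To make the table more concrete, here is a minimal sketch that imitates a few of the axes in Python. The standard xml.etree.ElementTree module does not support named axes, so the functions below (ancestors, following, preceding, and so on) are hand-written stand-ins based on the definitions in the table, not part of any library. They consider element nodes only and return their results in document order. The sample document is a cut-down version of the tree example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<trunk>"
    '<bigBranch name="bb1">'
    '<smallBranch name="sb1"><leaf name="leaf1"/><leaf name="leaf2"/></smallBranch>'
    '<smallBranch name="sb2"><leaf name="leaf4"/></smallBranch>'
    "</bigBranch>"
    '<bigBranch name="bb2">'
    '<smallBranch name="sb3"><leaf name="leaf6"/></smallBranch>'
    "</bigBranch>"
    "</trunk>"
)

parent_of = {child: parent for parent in root.iter() for child in parent}
document_order = list(root.iter())


def label(node):
    """Name attribute if present, otherwise the tag (used for printing)."""
    return node.get("name", node.tag)


def ancestors(node):
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def descendants(node):
    return [d for d in node.iter() if d is not node]


def following(node):
    """Nodes after `node` in document order, excluding its descendants."""
    below = set(descendants(node))
    start = document_order.index(node) + 1
    return [n for n in document_order[start:] if n not in below]


def preceding(node):
    """Nodes before `node` in document order, excluding its ancestors."""
    above = set(ancestors(node))
    return [n for n in document_order[:document_order.index(node)] if n not in above]


def preceding_siblings(node):
    siblings = list(parent_of[node]) if node in parent_of else []
    return siblings[:siblings.index(node)] if siblings else []


def following_siblings(node):
    siblings = list(parent_of[node]) if node in parent_of else []
    return siblings[siblings.index(node) + 1:] if siblings else []


sb2 = root.find("bigBranch/smallBranch[@name='sb2']")
for axis in (ancestors, descendants, preceding, following,
             preceding_siblings, following_siblings):
    print(axis.__name__, [label(n) for n in axis(sb2)])
```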

XPath predicates and functions

Sometimes, you may want to use a predicate in an XPath Location Path to further filter your selection. Normally, you would get a set of nodes from a location path. A predicate is a small expression that gets evaluated for each node in a set of nodes. If the expression evaluates to ‘false’, then the node is not included in the selection. An example is as follows:

//p[@class='alert']

In the preceding example, every <p> tag in the document is checked to see if its ‘class’ attribute is set to ‘alert’. Only those <p> tags with a ‘class’ attribute with value ‘alert’ are included in the set of nodes for this location path.

The following example uses a function, which can be used in a predicate to get information about the context node.

/book/chapter[position()=3]

The previous example selects only the third chapter of the book. So, for something to be returned, the <book> element must have at least 3 <chapter> elements.

Also notice that the position function returns a number. There are many functions in the XPath specification. For a complete list, see the W3C specification at http://www.w3.org/TR/xpath#corelib
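
Both predicates can be tried with Python's standard xml.etree.ElementTree module. Two assumptions apply to this sketch. First, the book document below is invented for the illustration. Second, that module supports only a subset of XPath and does not accept position()=3, so the example uses the equivalent shorthand chapter[3], which XPath defines to mean the same thing. Paths are also written relative to the book element rather than starting with a slash.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three chapters.
book = ET.fromstring("""
<book>
    <chapter><title>One</title><p class="alert">Back up your files.</p></chapter>
    <chapter><title>Two</title><p>Plain paragraph.</p></chapter>
    <chapter><title>Three</title><p class="alert">Check your quotes.</p></chapter>
</book>
""")

# //p[@class='alert'] : every p element whose class attribute is "alert".
alerts = [p.text for p in book.findall(".//p[@class='alert']")]

# /book/chapter[position()=3], written with the [3] shorthand.
third = book.findall("chapter[3]")

# A position beyond the last chapter selects nothing.
fourth = book.findall("chapter[4]")

print(alerts, [c.findtext("title") for c in third], fourth)
```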

Here are a few more functions that may be helpful:

number last() – the number of nodes in the current node-set, which is also the position of the last node

number position() – position of the context node being tested

number count(node-set) – the number of nodes in a node-set

boolean starts-with(string, string) – returns true if the first argument starts with the second

boolean contains(string, string) – returns true if the first argument contains the second

number sum(node-set) – the sum of the numeric values of the nodes in the node-set

number floor(number) – the number, rounded down to the nearest integer

number ceiling(number) – the number, rounded up to the nearest integer

number round(number) – the number, rounded to the nearest integer
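
The behaviour of these functions can be compared with their nearest Python counterparts. The sketch below is an approximation for illustration, not an XPath engine: the second book and its price are made-up sample data, and xpath_round is a helper written for this example. It is needed because XPath's round() sends values that lie exactly halfway between two integers toward positive infinity, whereas Python's built-in round() sends them to the nearest even integer.

```python
import math
import xml.etree.ElementTree as ET

# The second <book> is hypothetical sample data added for this illustration.
store = ET.fromstring("""
<bookstore>
    <book><title>Less Than Zero</title><price>13.95</price></book>
    <book><title>Sample Book</title><price>24.50</price></book>
</bookstore>
""")

books = store.findall("book")
prices = [float(p.text) for p in store.findall("book/price")]
first_title = books[0].findtext("title")


def xpath_round(number):
    """Round like XPath's round(): halfway cases go toward positive infinity."""
    return math.floor(number + 0.5)


book_count = len(books)                          # count(//book)
last_position = len(books)                       # last() within that node-set
price_total = sum(prices)                        # sum(//price)
starts = first_title.startswith("Less")          # starts-with(title, 'Less')
has_zero = "Zero" in first_title                 # contains(title, 'Zero')
floored = math.floor(prices[0])                  # floor(13.95)
ceiled = math.ceil(prices[0])                    # ceiling(13.95)
rounded = xpath_round(prices[0])                 # round(13.95)

print(book_count, last_position, price_total, starts, has_zero,
      floored, ceiled, rounded, xpath_round(2.5), xpath_round(-2.5))
```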

Example

The following XML documents, XSD schemas, and XSL stylesheet are examples to help you put together everything you have learned in this chapter using real-life data. As you study this example, you will notice how XPath can be used in the stylesheet to select specific information from the documents and control how it is output.

Below is an XML document (Exhibit 9.4)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="movies.xsl" type="text/xsl" media="screen"?>
<movieCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="movies.xsd"> 
 
<movie>
    <movieTitle>Meet the Parents</movieTitle>
    <movieSynopsis>
    Greg Focker is head over heels in love with his girlfriend Pam, and is ready to
    pop the big question. When his attempt to propose is thwarted by a phone call
    with the news that Pam's younger sister is getting married, Greg realizes that
    the key to Pam's hand in marriage lies with her formidable father.
    </movieSynopsis>
    <role>
        <roleIDREF>bs1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>tp1</roleIDREF>
        <roleType>Lead Actress</roleType>
    </role>
    <role>
        <roleIDREF>rd1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>bd1</roleIDREF>
        <roleType>Supporting Actress</roleType>
    </role>      
</movie>
 
<movie>
    <movieTitle>Elf</movieTitle>
    <movieSynopsis>
    One Christmas Eve, a long time ago, a small baby at an orphanage crawled into
    Santa’s bag of toys, only to go undetected and accidentally carried back to Santa’s 
    workshop in the North Pole. Though he was quickly taken under the wing of a surrogate
    father and raised to be an elf, as he grows to be three sizes larger than everyone else, 
    it becomes clear that Buddy will never truly fit into the elf world. What he needs is
    to find his real family. This holiday season, Buddy decides to find his true place in the
    world and sets off for New York City to track down his roots.
    </movieSynopsis>
    <role>
        <roleIDREF>wf1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>jc1</roleIDREF>
        <roleType>Supporting Actor</roleType>
    </role>
    <role>
        <roleIDREF>zd1</roleIDREF>
        <roleType>Lead Actress</roleType>
    </role>
    <role>
        <roleIDREF>ms1</roleIDREF>
        <roleType>Supporting Actress</roleType>
    </role>      
</movie>
 
<castMember>
    <castMemberID>rd1</castMemberID>
    <castFirstName>Robert</castFirstName>
    <castLastName>De Niro</castLastName>
    <castSSN>489-32-5984</castSSN>
    <castGender>male</castGender>     
</castMember> 
 
<castMember>
    <castMemberID>bs1</castMemberID>
    <castFirstName>Ben</castFirstName>
    <castLastName>Stiller</castLastName>
    <castSSN>590-59-2774</castSSN>
    <castGender>male</castGender>     
</castMember>
 
<castMember>
    <castMemberID>tp1</castMemberID>
    <castFirstName>Teri</castFirstName>
    <castLastName>Polo</castLastName>
    <castSSN>099-37-8765</castSSN>
    <castGender>female</castGender>      
</castMember>  
 
<castMember>
    <castMemberID>bd1</castMemberID>
    <castFirstName>Blythe</castFirstName>
    <castLastName>Danner</castLastName>
    <castSSN>273-44-8690</castSSN>
    <castGender>female</castGender>
</castMember> 
 
<castMember>
    <castMemberID>wf1</castMemberID>
    <castFirstName>Will</castFirstName>
    <castLastName>Ferrell</castLastName>
    <castSSN>383-56-2095</castSSN>
    <castGender>male</castGender>     
</castMember>
 
<castMember>
    <castMemberID>jc1</castMemberID>
    <castFirstName>James</castFirstName>
    <castLastName>Caan</castLastName>
    <castSSN>389-49-3029</castSSN>
    <castGender>male</castGender>      
</castMember> 
 
<castMember>
    <castMemberID>zd1</castMemberID>
    <castFirstName>Zooey</castFirstName>
    <castLastName>Deschanel</castLastName>
    <castSSN>309-49-4005</castSSN>
    <castGender>female</castGender>      
</castMember>
 
<castMember>
    <castMemberID>ms1</castMemberID>
    <castFirstName>Mary</castFirstName>
    <castLastName>Steenburgen</castLastName>
    <castSSN>988-43-4950</castSSN>
    <castGender>female</castGender>      
</castMember>
 
</movieCollection>

Exhibit 9.4: movies_xpath.xml

Below is the second XML document (Exhibit 9.5)

<?xml version="1.0" encoding="UTF-8"?>
 
<cities xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="cities.xsd">
 
<city>
    <cityID>c2</cityID>
    <cityName>Mandal</cityName>
    <cityPopulation>13840</cityPopulation>
    <cityCountry>Norway</cityCountry>
    <tourismDescription>A small town with a big atmosphere.  Mandal provides comfort
away from normal luxuries.
    </tourismDescription>
    <capitalCity>c3</capitalCity>
</city>
 
<city>
    <cityID>c3</cityID>
    <cityName>Oslo</cityName>
    <cityPopulation>533050</cityPopulation>
    <cityCountry>Norway</cityCountry>
    <tourismDescription>Oslo is the capital of Norway for many reasons.
    It is also the capital location for tourism.  The culture, shopping,
    and attractions can all be experienced in Oslo.  Just remember
    to bring your wallet.
    </tourismDescription>
</city>
 
</cities>

Exhibit 9.5: cities_xpath.xml

Below is the Movies schema (Exhibit 9.6)

<?xml version="1.0" encoding="UTF-8"?>
 
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
 
  <!--Movie Collection-->
 
  <xsd:element name="movieCollection">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="movie" type="movieDetails" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element name="castMember" type="castDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
 
  <!--This contains the movie details.-->
 
  <xsd:complexType name="movieDetails">
    <xsd:sequence>
      <xsd:element name="movieTitle" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="movieSynopsis" type="xsd:string"/>
      <xsd:element name="role" type="roleDetails" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
 
  <!--This contains the role details.-->
 
  <xsd:complexType name="roleDetails">
    <xsd:sequence>
       <xsd:element name="roleIDREF" type="xsd:IDREF"/>
       <xsd:element name="roleType" type="xsd:string"/>     
    </xsd:sequence>
  </xsd:complexType>
 
  <xsd:simpleType name="ssnType">
       <xsd:restriction base="xsd:string">
           <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
       </xsd:restriction>
   </xsd:simpleType>
 
 <xsd:complexType name="castDetails">
    <xsd:sequence>
       <xsd:element name="castMemberID" type="xsd:ID"/>
       <xsd:element name="castFirstName" type="xsd:string"/>
       <xsd:element name="castLastName" type="xsd:string"/>
       <xsd:element name="castSSN" type="ssnType"/>
       <xsd:element name="castGender" type="xsd:string"/>   
    </xsd:sequence>
  </xsd:complexType>
 
</xsd:schema>

Exhibit 9.6: movies.xsd

Below is the Cities schema (Exhibit 9.7)

<?xml version="1.0" encoding="UTF-8"?>
 
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">
 
<xsd:element name="cities">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="cityType">
  <xsd:sequence>
    <xsd:element name="cityID" type="xsd:ID"/>
     <xsd:element name="cityName" type="xsd:string"/>
     <xsd:element name="cityPopulation" type="xsd:integer"/>
     <xsd:element name="cityCountry" type="xsd:string"/>
     <xsd:element name="tourismDescription" type="xsd:string"/>
     <xsd:element name="capitalCity" type="xsd:IDREF" minOccurs="0" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>

Exhibit 9.7: cities.xsd


Below is the XSL stylesheet (Exhibit 9.8)

<?xml version="1.0" encoding="UTF-8"?>
 
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="castList" match="castMember" use="castMemberID"/>
<xsl:output method="html"/>
 
<!-- example of using an abbreviated absolute path to pull info 
from cities_xpath.xml for the city "Oslo" specifically -->
 
<!-- specify absolute path to select cityName and assign it the variable "city" -->
<xsl:variable name="city" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/cityName" />
 
<!-- specify absolute path to select cityCountry and assign it the variable "country" -->
<xsl:variable name="country" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/cityCountry" />
 
<!-- specify absolute path to select tourismDescription and assign it the variable "description" -->
<xsl:variable name="description" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/tourismDescription" />
 
<xsl:template match="/">
<html>
    <head>
        <title>Movie Collection</title>
    </head>
    <body>
        <h2>Movie Collection</h2>
    <xsl:apply-templates select="movieCollection"/>
    </body>
</html>
</xsl:template>
<xsl:template match="movieCollection">
 
<!-- let's say we just want to see the actors. -->
<!--
<xsl:for-each select="movie">
<hr />
<br />
<b><xsl:text>Movie Title: </xsl:text></b>
<xsl:value-of select="movieTitle"/>
<br />
<br />
<b><xsl:text>Movie Synopsis: </xsl:text></b>           
<xsl:value-of select="movieSynopsis"/>
<br />
<br />-->
 
<!-- actor info begins here. -->
<b><xsl:text>Cast: </xsl:text></b>
<br />
<!-- specify an abbreviated relative path here for "role." 
NOTE: there is no predicate in this one; it's just a path. -->
 
<xsl:for-each select="movie/role"> 
<xsl:sort select="key('castList',roleIDREF)/castLastName"/>
<xsl:number value="position()" format="&#xa; 0. " />
<xsl:value-of select="key('castList',roleIDREF)/castFirstName"/>
<xsl:text>   </xsl:text>               
<xsl:value-of select="key('castList',roleIDREF)/castLastName"/>               
<xsl:text>,   </xsl:text>
<xsl:value-of select="roleType"/>
<br />
<xsl:value-of select="key('castList',roleIDREF)/castGender"/>
<xsl:text>,   </xsl:text>
<xsl:value-of select="key('castList',roleIDREF)/castSSN"/>
<br />
<br />
</xsl:for-each>      
<!--
</xsl:for-each>-->
<hr />
 
<!--calling the variables -->
 
<font color="red">
<p><b>Travel Advertisement</b></p>
 
<!-- reference the city, followed by a comma, and then the country -->
<p><xsl:value-of select="$city" />, <xsl:value-of select="$country" /></p>
 
<!-- reference the description -->
<xsl:value-of select="$description" />
 
</font>      
</xsl:template>
</xsl:stylesheet>

Exhibit 9.8: movies.xsl
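
The cast listing in the stylesheet relies on xsl:key and key() to turn each roleIDREF into the matching castMember. The lookup can be pictured as a dictionary, as in the following Python sketch. It is an analogy rather than a description of how an XSLT processor works internally, it uses a shortened version of the movie data (one movie, three cast members, fewer fields), and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

collection = ET.fromstring("""
<movieCollection>
  <movie>
    <movieTitle>Meet the Parents</movieTitle>
    <role><roleIDREF>bs1</roleIDREF><roleType>Lead Actor</roleType></role>
    <role><roleIDREF>tp1</roleIDREF><roleType>Lead Actress</roleType></role>
    <role><roleIDREF>rd1</roleIDREF><roleType>Lead Actor</roleType></role>
  </movie>
  <castMember>
    <castMemberID>rd1</castMemberID>
    <castFirstName>Robert</castFirstName><castLastName>De Niro</castLastName>
  </castMember>
  <castMember>
    <castMemberID>bs1</castMemberID>
    <castFirstName>Ben</castFirstName><castLastName>Stiller</castLastName>
  </castMember>
  <castMember>
    <castMemberID>tp1</castMemberID>
    <castFirstName>Teri</castFirstName><castLastName>Polo</castLastName>
  </castMember>
</movieCollection>
""")

# Like <xsl:key name="castList" match="castMember" use="castMemberID"/>.
cast_by_id = {
    member.findtext("castMemberID"): member
    for member in collection.findall("castMember")
}

# Like <xsl:for-each select="movie/role"> with an <xsl:sort> on the last name.
roles = collection.findall("movie/role")
roles.sort(key=lambda r: cast_by_id[r.findtext("roleIDREF")].findtext("castLastName"))

lines = []
for position, role in enumerate(roles, start=1):
    member = cast_by_id[role.findtext("roleIDREF")]  # like key('castList', roleIDREF)
    first = member.findtext("castFirstName")
    last = member.findtext("castLastName")
    lines.append(f"{position}. {first} {last}, {role.findtext('roleType')}")

print("\n".join(lines))
```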

Summary

Throughout this chapter we have learned many of the features and capabilities of the XML Path Language. You should now have a good understanding of node relationships through the use of the XML tree structure. Location paths can be written in abbreviated or unabbreviated syntax: the abbreviated syntax covers the most common selections (children, parents, and attributes), while the unabbreviated syntax names the axis of each location step explicitly. A location path is also either relative or absolute. A relative path is evaluated starting from the context node, while an absolute path begins with ‘/’ and is evaluated starting from the root of the document. These two distinctions can be combined to come up with four types of XPath location paths: Abbreviated Relative, Abbreviated Absolute, Unabbreviated Relative, and lastly Unabbreviated Absolute. If further filtering is required, XPath predicates and functions can be used. A predicate, written in square brackets, is evaluated for each node in a node-set, and only the nodes for which it is true are kept; functions such as position() and count() can be used inside predicates. When used correctly, XPath can be a very powerful tool for working with XML.




Learning objectives

  • learn different techniques for implementing XLinks in XML
  • create a custom XLink
  • learn the functionality behind various XLink attributes

sponsored by:

The University of Georgia

Terry College of Business

Department of Management Information Systems



Introduction

Through the use of Uniform Resource Identifiers (URIs), XLink allows elements to be inserted into XML documents that create links between resources such as documents, images, files and other pages. An XLink is similar in concept to an HTML hyperlink, but is more powerful and flexible.

This chapter will be a general overview of the XLink syntax. It will also provide exposure to some of XLink's basic concepts. For the full XLink specification, see the latest version of the standard at:

http://www.w3.org/TR/xlink

XLink

XLinks create a linking relationship between two or more resources. Any XML element can act as a link, and the linked resources can be XML documents, images, text or other markup files.

Using an approach similar to the centralized formatting of XSL stylesheets, XLinks allow a document's hyperlinks to be isolated and centralized in a separate document. When a linked document's address changes, only that central set of links needs to be updated.

The use of XLink requires the declaration of the XLink namespace. This namespace provides the global attributes for type, href, role, arcrole, title, show, actuate, label, from and to. The following example would make the prefix xlink available within the tourGuide element.

<tourGuide
  xmlns:xlink="http://www.w3.org/1999/xlink">
  ...
</tourGuide>

XLink global attributes

The following table outlines the attributes that can be used with the xlink namespace. The global attributes are type, href, role, arcrole, title, show, actuate, label, from, and to. The table also includes descriptions of how the attributes can be used.


Exhibit 1: Table of global attributes

Attribute: description and valid values

type

Identifies the kind of XLink element, which determines which other XLink attributes apply to it

  • simple - a basic link between two resources, similar to an HTML hyperlink
  • extended - a more complex link that can connect any number of resources
  • resource - supplies a local resource that participates in the link
  • locator - identifies a remote resource that participates in the link
  • arc - defines a traversal rule from one resource to another
  • title - a human-readable name or explanation of the link

href

Location of the remote resource

  • value is a URI

role

Describes the meaning of a link or resource

  • value is a URI
  • describes the element that carries the attribute

arcrole

Describes the relationship between the resources at the two ends of an arc

  • value is a URI

title

Human-readable name for the link, usually a short description

show

Describes how the resource should be presented once the link has been actuated and loaded

  • new - load in a new window or frame
  • replace - load in the same window or frame
  • embed - load in place of the link element, inside the current document
  • other - behavior is determined by other markup in the link
  • none - behavior is not specified

actuate

Specifies when the resource is retrieved or link processing occurs

  • onRequest - when the user requests the link
  • onLoad - when the page is loaded
  • other - behavior is determined by other markup in the link
  • none - behavior is not specified

label, from & to

Specify link direction: label names a resource or locator, and the from and to attributes of an arc refer to those labels
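
As a rough illustration of how these attributes combine, the sketch below shows one simple link and one extended link in the same document. The element names attractionLinks, site and go are hypothetical and chosen only for this example; the attribute names and values are those listed in Exhibit 1. The two arc elements point in opposite directions, which is one way to express a bidirectional link.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<tourGuide xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- Simple link: one element pointing at one remote resource -->
  <attractionName xlink:type="simple"
                  xlink:href="http://www.georgiaaquarium.org/"
                  xlink:title="Georgia Aquarium home page"
                  xlink:show="new"
                  xlink:actuate="onRequest">Georgia Aquarium</attractionName>

  <!-- Extended link: the element names here are hypothetical -->
  <attractionLinks xlink:type="extended">
    <!-- locators identify the remote resources and give each a label -->
    <site xlink:type="locator" xlink:label="aquarium"
          xlink:href="http://www.georgiaaquarium.org/"/>
    <site xlink:type="locator" xlink:label="museum"
          xlink:href="http://www.high.org/"/>
    <!-- arcs use from and to to set the direction of traversal -->
    <go xlink:type="arc" xlink:from="aquarium" xlink:to="museum"/>
    <go xlink:type="arc" xlink:from="museum" xlink:to="aquarium"/>
  </attractionLinks>

</tourGuide>
```

How such links are presented depends on the application that processes the document. Web browsers generally do not act on extended links by themselves, so a stylesheet or other program is usually needed to turn them into something a reader can follow.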



XML schema


The following XML schema defines a tour guide that contains at least one city. Each city contains one or more attractions. The name of each attraction is an XLink.

Exhibit 2: XML schema for TourGuide

<?xml version="1.0" encoding="UTF-8"?>
<!--
      Document   : TourGuide.xsd
      Created on : February 28, 2006
      Author     : Billy Timmins
-->
<!--
      Declaration of usage of xlink Namespace
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified"
            xmlns:xlink="http://www.w3.org/1999/xlink">  
    <xsd:element name="tourGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded" />
            </xsd:sequence>
        </xsd:complexType>
     </xsd:element>
     <!--
     This section will contain the City details
     -->
     <xsd:complexType name="cityDetails">
         <xsd:sequence>
           <xsd:element name="cityName" type="xsd:string"/>
           <xsd:element name="adminUnit" type="xsd:string"/>
           <xsd:element name="country" type="xsd:string"/>
           <xsd:element name="continent">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                  <xsd:enumeration value="Asia"/>
                  <xsd:enumeration value="Africa"/>
                  <xsd:enumeration value="Australia"/>
                  <xsd:enumeration value="Europe"/>
                  <xsd:enumeration value="North America"/>
                  <xsd:enumeration value="South America"/>
                  <xsd:enumeration value="Antarctica"/>
                  </xsd:restriction>
                </xsd:simpleType>
            </xsd:element>
            <xsd:element name="population" type="xsd:integer"/>
            <xsd:element name="description" type="xsd:string"/>
            <xsd:element name="attraction" type="attractionDetails" minOccurs="1" maxOccurs="unbounded"/>
         </xsd:sequence>
     </xsd:complexType>
     <xsd:complexType name="attractionDetails">
         <xsd:sequence>
         <!--   
         Note use of xlink
         -->
            <xsd:element name="attractionName" xlink:type="simple"/>
            <xsd:element name="attractionDescription" type="xsd:string"/>
            <xsd:element name="attractionRating" type="xsd:integer"/>
        </xsd:sequence>
     </xsd:complexType>
</xsd:schema>

XML document


The following XML document shows how the XLink, attractionName, defined in the XML schema, is used in an XML document. Note that the xlink:href attribute must be included in the start tag of each attractionName element in order to define the linked website.

Exhibit 3: XML document for TourGuide.xsd (using XLink)

<?xml version="1.0" encoding="UTF-8"?>
<!--
      Document   : SomeTourGuide.xml
      Created on : February 28, 2006
      Author     : Billy Timmins
-->
<!--
      Declaration of usage of XLink Namespace
-->
<?xml-stylesheet href="TourGuide.xsl" type="text/xsl"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xlink="http://www.w3.org/1999/xlink" xsi:noNamespaceSchemaLocation="TourGuide.xsd">
    <city>
        <cityName>Atlanta</cityName>
        <adminUnit>Georgia</adminUnit>
        <country>USA</country>
        <continent>North America</continent>
        <population>425000</population>
        <description>Atlanta is the capital of and largest city in the U.S. state of Georgia.</description>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.georgiaaquarium.org/"> Georgia Aquarium </attractionName>
            <attractionDescription>World’s Largest Aquarium</attractionDescription>
            <attractionRating>5</attractionRating>
        </attraction>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.high.org/"> High Museum of Art </attractionName>
            <attractionDescription>The High Museum of Art, founded in 1905 as the Atlanta Art Association, is the leading art museum in the Southeastern United States.</attractionDescription>
            <attractionRating>4</attractionRating>
        </attraction>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.underground-atlanta.com/"> Underground Atlanta </attractionName>
            <attractionDescription> Go beneath the streets of a bustling downtown, to the heart of a great American city.  Underground Atlanta is at the center of it all.</attractionDescription>
            <attractionRating>2</attractionRating>
        </attraction>        
    </city>
    <city>
        <cityName>Tampa</cityName>
        <adminUnit>Florida</adminUnit>
        <country>USA</country>
        <continent>North America</continent>
        <population>303000</population>
        <description>Tampa is a major United States city located in Hillsborough County, on the west coast of Florida.</description>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.buschgardens.com/buschgardens/fla/default.aspx"> Busch Gardens </attractionName>
            <attractionDescription>The nation's fourth largest zoo, Busch Gardens is where you can see African animals roaming free and an exciting amusement park featuring its world-famous rides like Kumba and the new inverted roller-coaster, Montu.</attractionDescription>
            <attractionRating>5</attractionRating>
        </attraction>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.plantmuseum.com/"> Henry B. Plant Museum </attractionName>
            <attractionDescription>Discover a museum which transports you to turn-of-the-century Florida.</attractionDescription>
            <attractionRating>1</attractionRating>
        </attraction>      
    </city>
</tourGuide>

XML stylesheet


The following XML stylesheet displays the contents of the XML document.

Exhibit 4: XML stylesheet TourGuide

<?xml version="1.0" encoding="UTF-8"?>
<!--
      Document   : TourGuide.xsl
      Created on : February 28, 2006
      Author     : Billy Timmins
-->
<!--
      Declaration of usage of XLink Namespace
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink" exclude-result-prefixes="xlink" version="1.0">
    <xsl:output method="html"/>
    <!--
    Attribute XLink defined as an href of simple type
    -->
    <xsl:template match="*[@xlink:type = 'simple' and @xlink:href]">
        <a href="{@xlink:href}">
            <xsl:apply-templates/>
        </a>
    </xsl:template>
    <xsl:template match="/">
        <html>
            <head>
                <title>Tour Guide XLink Example</title>
            </head>
            <body>
                <h2>Cities</h2>
                <xsl:apply-templates select="tourGuide"/>
            </body>
        </html>
    </xsl:template>
    <!-- 
    template for handling a link 
    -->
    <xsl:template match="attractionName">
        <a href="{@xlink:href}">
            <xsl:value-of select="."/>
        </a>
    </xsl:template>
    <xsl:template match="tourGuide">
        <table border="1" width="100%">
            <xsl:for-each select="city">
                <tr>
                    <td colspan="3">
                        <br/>
                        <xsl:text>City: </xsl:text>
                        <xsl:value-of select="cityName"/>
                        <br/>
                        <xsl:text>Administrative unit: </xsl:text>
                        <xsl:value-of select="adminUnit"/>
                        <br/>    
                        <xsl:text>Continent: </xsl:text>
                        <xsl:value-of select="continent"/>
                        <br/>
                        <xsl:text>Population: </xsl:text>
                        <xsl:value-of select="population"/>
                        <br/>
                        <xsl:text>Description: </xsl:text>
                        <xsl:value-of select="description"/>
                        <br/>              
                        <br/>
                    </td>
                </tr>
                <tr>
                    <td>
                        <xsl:text>Attraction: </xsl:text>
                    </td>
                    <td>
                        <xsl:text>Attraction Description: </xsl:text>
                    </td>
                    <td>
                        <xsl:text>Attraction Rating: </xsl:text>
                    </td>
                </tr>
                <xsl:for-each select="attraction">
                    <tr>
                        <td>
                            <!--
                            application of the template
                            -->
                            <xsl:apply-templates select="attractionName"/>
                        </td>
                        <td>
                            <xsl:value-of select="attractionDescription"/>
                        </td>
                        <td>
                            <xsl:value-of select="attractionRating"/>
                        </td>
                    </tr>
                </xsl:for-each>
            </xsl:for-each>
        </table>
    </xsl:template>
</xsl:stylesheet>

Summary

XLink is an extremely versatile specification that standardizes the process of linking to other data sources. XLink not only supports unidirectional linking, similar to an anchor tag in HTML, but can also be used to create bidirectional links. Additionally, XLink allows any XML element to act as a link, which gives the developer great freedom.





Cascading Stylesheets (CSS)

Learning objectives

Upon completion of this chapter, you will

  • know the benefits of using CSS
  • know the limitations of CSS, so you are able to find the best solution for your document
  • know how to implement and use CSS on an XML document

Introduction

CSS (Cascading Style Sheets) is a language that describes how a structured document is presented.

An XML or HTML document does not have a set style; it consists of structured text without style information. How the document looks when printed on paper or viewed in a browser or on a cellphone is determined by a style sheet. Using CSS is a good way to give documents a consistent look that is easy to update; Wikipedia is a good example.

History of CSS

Style sheets have been around in one form or another since the beginnings of HTML in the early 1990s. Various browsers included their own style language which could be used to customize the appearance of web documents. Originally, style sheets were targeted towards the end-user; early revisions of HTML did not provide many facilities for presentational attributes, so it was often up to the user to decide how web documents would appear.

As the HTML language grew, however, it came to encompass a wider variety of stylistic capabilities to meet the demands of web developers. With these capabilities, style sheets became less important, and an external language for the purposes of defining style attributes was not widely accepted until the development of CSS.

The concept of Cascading Style Sheets was originally proposed in 1994 by Håkon Wium Lie. Bert Bos was at the time working on a browser called Argo which used its own style sheets; the two decided to work together to develop CSS.

A number of other style sheet languages had already been proposed, but CSS was the first to incorporate the idea of "cascading" -- the capability for a document's style to be inherited from more than one "style sheet." This permitted a user's preferred style to override the site author's specified style in some areas, while inheriting, or "cascading" the author's style in other areas. The capability to cascade in this way permits both users and site authors added flexibility and control; it permitted a mixture of stylistic preferences.

Håkon's proposal was presented at the "Mosaic and the Web" conference in Chicago in 1994, and again with Bert Bos in 1995. Around this time, the World Wide Web Consortium was being established; the W3C took an interest in the development of CSS, and organized a workshop toward that end. Håkon and Bert were the primary technical staff on the project, with additional members, including Thomas Reardon of Microsoft, participating as well. By the end of 1996, CSS was nearly ready to become official. The CSS level 1 Recommendation was published in December 1996.

Early in 1997, CSS was assigned its own working group within the W3C. The group began tackling issues that had not been addressed with CSS level 1, resulting in the creation of CSS level 2, which was published as an official Recommendation in May 1998. CSS level 3 is still under development as of 2005.

Why use CSS?

Cleaner Looking Code

A mass of HTML tags that manage design elements generally obscures the content of a page, making the code harder to read and maintain. With CSS, the content of the page is separated from the design, which keeps content production in formats such as HTML, XHTML, and XML as simple as possible.

Pages Will Load Faster

Non-CSS design typically consists of more code than a CSS-designed website.

In a non-CSS design, the information about the design is reloaded every time a visitor accesses a new page. Additionally, the finer points of design are executed awkwardly. For example, a common method of defining the spacing of a web page is to use blank GIF images inside tables.

Using CSS keeps content and design separated, so much less code will be needed. The CSS file loads only once per session, and is saved locally in the user's cache. All information about dimensions is defined in this stylesheet, rendering awkward constructions like blank GIF images unnecessary.

Although an increasing number of Internet users have broadband, the size of a web page can be important to users who are limited to dial-up connections. Suppose a dial-up user accesses a company's website and experiences lengthy loading times. It is quite possible that the visitor would abandon the visit or form an opinion of this company as "slow." In this way, a seemingly small difference in page size could mean added revenue.

Furthermore, bandwidth is not free and most webhosting firms limit the amount used. In fact, many hosts charge based on bandwidth usage, so less code could also reduce costs.

Redesign Becomes Trivial

When used properly, CSS is a very powerful tool that gives a web architect complete control over a site's presentation. It is a notation for expressing the rules that govern a design. This becomes very useful for a large website that requires a consistent appearance for every type of element (such as a title, a subtitle, a piece of code, or a paragraph).

For example, suppose a company has a 1,200-page website that took many months to complete. The company then undergoes a rebranding, so the font, the background, the style of hyperlinks, and so forth need to be updated to match the new corporate design. If the site was engineered properly using CSS, this change would be as simple as editing the appropriate lines of a single CSS file (assuming it is an external stylesheet). If CSS is not used, the code that manages the appearance is stored in each of the pages, and each file would have to be updated individually.

Accessibility

People with low vision and people who use special web browsers, such as screen readers for blind users, will probably prefer a website designed with CSS to one designed without it. Because CSS allows you to define the reading order separately from the visual layout, it is easier for these browsers to read the page. Bear in mind that anyone who wears glasses or contact lenses can be considered to have low vision.

Many designers lock the font size in pixels, which prevents the user from changing the font size. Good CSS design allows the user to increase or decrease the font size at will, making pages more usable. Some users prefer a magnification of 300% or more.

Giving the user the opportunity to change the font size will not make any difference for the typical user, but it can make a difference for people who have low vision. Ask yourself the question: who is the website made for? The visitors or the designer?

Websites designed with CSS tend to display better than table-based designs in the web browsers used in PDAs and cellphones. The use of cellphones for browsing will probably continue to increase. A table-based design can make web pages inaccessible to these users.

Be careful with your CSS designs. Misuse of absolute positioning and absolute rather than relative sizes can make your webpages less accessible rather than more accessible. A good table design is better than a bad CSS design.

Better results in search engines

Extensive use of tables confuses search engines, which can have trouble separating content from code. Search engine robots start reading at the top of the page, and they want to find out how relevant the webpage is as quickly as possible. Again, less code makes it easier for search engines to find the content that is relevant, and it will probably give your webpage a better ranking.

Disadvantages of CSS

The use of CSS for styling has few disadvantages. However, some browsers, especially older ones, will sometimes present the page incorrectly. When I was gathering information for this chapter, it became clear to me that many experts think that formatting XML with CSS is not the future of the web. The main view is that XSL will be the new standard, so make sure you read through the previous chapter of this book one more time. The formatting parts of XSL and CSS will be quite similar. For example, you will be able to use all CSS1 and CSS2 properties and values in XSL with the same meaning as in CSS.

CSS levels

The first CSS specification to become an official W3C Recommendation was CSS level 1, published in December 1996. Among its capabilities is support for:

  • Font properties such as typeface and emphasis
  • Color of text, backgrounds, and other elements
  • Text attributes such as spacing between words, letters, and lines of text
  • Alignment of text, images, tables and other elements
  • Margin, border, padding, and positioning for most elements
  • Unique identification and generic classification of groups of attributes

The W3C maintains the CSS1 Recommendation.

CSS level 2 was developed by the W3C and published as a Recommendation in May 1998. A superset of CSS1, CSS2 includes a number of new capabilities, among them the absolute, relative, and fixed positioning of elements, the concept of media types, support for aural style sheets and bidirectional text, and new font properties such as shadows. The W3C maintains the CSS2 Recommendation.

CSS level 2 revision 1, or CSS 2.1, fixes errors in CSS2, removes poorly supported features and adds already-implemented browser extensions to the specification. It became a W3C Recommendation in June 2011.

CSS level 3 is being developed as a set of separate modules, each of which progresses on its own schedule. The W3C maintains a CSS3 progress report.

CSS Syntax and Properties

This section introduces CSS syntax and some of the most common CSS properties; a complete list of properties can be found in the W3C CSS Recommendations. The syntax for using CSS with an XML document is the same as that for HTML. The difference is in how you link your CSS file to the XML document. To do this, write <?xml-stylesheet href="X.css" type="text/css"?> before the root element of your XML document, where X.css is the name of the CSS file.
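
As a small sketch of where this processing instruction goes, the document below links a stylesheet to a short list of cities. The file name cities.css and the element names are hypothetical; any CSS file with rules for these element names could be used instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The processing instruction comes after the XML declaration
     and before the root element -->
<?xml-stylesheet href="cities.css" type="text/css"?>
<cityList>
  <city>
    <cityName>Atlanta</cityName>
    <country>USA</country>
    <description>Capital of the U.S. state of Georgia.</description>
  </city>
  <city>
    <cityName>Tampa</cityName>
    <country>USA</country>
    <description>A city on the west coast of Florida.</description>
  </city>
</cityList>
```

A browser that supports CSS for XML looks for rules whose selectors match the element names, such as cityName or description, and applies their declarations. Because XML elements have no default presentation, a rule usually needs a display property, for example display: block, to put an element on its own line.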

As mentioned earlier in this chapter, CSS is a set of rules that determines how elements in a document will be shown. The rule has two parts: a selector and a group of one or more declarations surrounded by braces (curly brackets):

selector { declaration; ...}

The selector is normally the tag you wish to style. Here is an example of a simple rule containing a single declaration:

h1 { color: red; }

Result: All h1-elements in the document are shown with the text color red.

The general syntax

Rules are usually defined like this:

selector { declaration; ...}

The declaration is formed like this:

property: value;

Remember that there can be several declarations in one rule. A common mistake is to mix up colons, which separate the property and value of a declaration, and semicolons, which separate declarations. A selector chooses the elements for which the rule applies and the declaration sets the value for the different properties of the elements that are chosen.

Back to our example:

h1 { color: red; }

In our example:

selector is the element h1
declaration color: red

The property color gets the value red

Multiple declarations can be written either on a single line or over several lines, because whitespace between declarations is ignored:

h1 { color:red; background-color:white; }

or

h1 {
color:red;
background-color:white;
}

Details of the properties defined by CSS can be found in the Wikibook CSS Programming, in the section on CSS1 properties.

Summary

Cascading Style Sheets (CSS) are used with webpages to define how information stored in HTML or XML is displayed. While XML and HTML create and preserve a document's structure, CSS defines the appearance and placement of the objects within the document. All of this information is saved in a separate file, the .css file. Text size, background color and fonts, for example, are defined in the CSS file, as is the placement of pictures and animations. Used correctly, CSS makes a website much easier to create and, even more importantly, to maintain, because you only have to change the CSS file to change the whole website.

[Figure: CSS Zen Garden without CSS]
[Figure: CSS Zen Garden with CSS]


Exercises

Exercise 1

Using the CSS file provided below, create a price list for books as an XML document.

Sample XML content:

<?xml version="1.0"?>
<book> Lord of the rings</book>
<isbn>1.000.56439 </isbn>
<title> The Two Towers </title>
<author> J.R.R. Tolkien </author>
<publisher> Penguin </publisher>
<price> 48 EUR </price>

Exercise1.css:

book {
  display: block;
  background-color: transparent;
  margin: 20px 10px 10px 200px;
}
isbn {
  display: block;
  font: 12pt/15pt georgia, serif;
}
title {
  display: block;
  font: 14pt/18pt verdana, sans-serif;
}
author {
  display: block;
  font: italic 12pt/15pt georgia, serif;
}
publisher {
  display: block;
  font: 12pt/15pt georgia, serif;
}
price {
  display: block;
  font: bold 12pt/15pt georgia, serif;
  color: #ff0000;
  background-color: transparent;
}

Exercise 2

Create a personal homepage, where you introduce yourself.

The page should contain one header, one footer, and navigation as a list of links.

CSS Challenges

Copy and paste the HTML, then take up the challenge to create a stylesheet to match the picture! See Web Design:CSS Challenges






Learning objectives

  • Understand the function of Cocoon
  • Create a working sitemap
  • Make available a stylesheet-formatted XML document
  • Create a simple Cocoon form
  • Create a simple XSP

sponsored by:

The University of Georgia

Terry College of Business

Department of Management Information Systems



Introduction

Cocoon is a product of the Apache Software Foundation. It is a powerful server heavily based on Java and XML technology. While it does have a command line interface, most users will be able to do everything they need to with it simply through careful editing of a few configuration files, formatted as XML documents. If you want to see some examples of what Cocoon can do, go to http://MIST5730.terry.uga.edu:8080/cocoon/.

Assumptions

This tutorial assumes that the user has access to an installation of Cocoon on Terry’s Blaze server. If you do not have this access, simply replace file locations and access methods with those provided by your server administrator. Some programs described may be Windows-only; if you are a Macintosh or Linux user you will need to find a suitable replacement, although such utilities are often included with the operating system. jEdit is a free text editor that, with the proper plugins, can read and save files on an FTP or SFTP server as easily as on a hard disk and can properly handle many different types of files. It is available for Windows, Macintosh and some Linux distributions, and as a platform-independent Java application, at http://www.jedit.org/.

The Sitemap

The primary Cocoon file to be concerned with is sitemap.xmap, located in the root Cocoon directory. It uses XML tags to define things such as different ways to present data, the location of important files, identification of browsers, and the most important aspect, pipelines. The default xmap will be fine for our purposes, and we will only need to look at the last few lines of it, where pipeline matches are defined. This section begins at the tag <map:pipeline>. A pipeline match looks like this:

<map:match pattern="test">
	<map:generate type="file" src="content/test.xml"/>
	<map:transform type="xslt" src="stylesheets/test.xslt"/>
	<map:serialize type="html"/>
</map:match>

Let’s look at what each line does. The first line tells Cocoon to watch for someone browsing to http://blaze.terry.uga.edu:8080/cocoon/otc/test. When this happens, the actions on the next three lines take place. Cocoon will take the information from the file test.xml within the content directory, and apply the stylesheet test.xslt from the stylesheets directory. It formats this result as an html page, as specified on the fourth line. Cocoon can use different serializers to format data as an html or xhtml page, flash object, pdf, or even OpenOffice document. Unlike when working with XML for other purposes, no XSD schema is needed – simply create and populate fields in the XML file as necessary.
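
To show where such a match sits, here is a heavily trimmed sketch of a sitemap. It assumes the usual Cocoon 2.x sitemap namespace and leaves out the component declarations (generators, transformers, serializers and so on) that the default sitemap.xmap already contains.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <!-- map:components and the other sections of the default
       sitemap are omitted from this sketch -->

  <map:pipelines>
    <map:pipeline>
      <!-- matches are tried in document order; the first
           pattern that fits the requested URL is used -->
      <map:match pattern="test">
        <map:generate type="file" src="content/test.xml"/>
        <map:transform type="xslt" src="stylesheets/test.xslt"/>
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>

</map:sitemap>
```

Additional matches are added as siblings of the one shown, inside the same map:pipeline element.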

Cocoon Forms

Cocoon forms, or CForms, are a way to use XML structure to create validating form field objects and then arrange them in a template for use. The primary advantage of CForms over using HTML forms is that fields can be validated either with built-in functionality or simple XML attributes. There are several elements required for this. A definition XML file, which holds the fields, called "widgets":

<fd:field id="email" required="true">
<fd:label>Email address:</fd:label>
<fd:datatype base="string"/>
<fd:validation>
<fd:email/>
</fd:validation>
</fd:field>
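
In a complete definition file the fields are wrapped in a form element. The sketch below assumes the definition namespace used by the CForms block in Cocoon 2.1 and adds a name field, because the flow script later in this section reads a widget with that id. Treat the details as assumptions to check against your Cocoon version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fd:form xmlns:fd="http://apache.org/cocoon/forms/1.0#definition">
  <fd:widgets>
    <!-- read later by form.getChild("name") in the flow script -->
    <fd:field id="name" required="true">
      <fd:label>Name:</fd:label>
      <fd:datatype base="string"/>
    </fd:field>
    <!-- the email field shown above -->
    <fd:field id="email" required="true">
      <fd:label>Email address:</fd:label>
      <fd:datatype base="string"/>
      <fd:validation>
        <fd:email/>
      </fd:validation>
    </fd:field>
  </fd:widgets>
</fd:form>
```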


A template XML file calls on these widgets, adding HTML code to help with look and feel:

<br/>
<ft:widget-label id="email"/>
<ft:widget id="email"/>

A Javascript file that controls the flow of data from one file to the next:

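// Load the CForms library that provides the Form object
cocoon.load("resource://org/apache/cocoon/forms/flow/javascript/Form.js");
function registration() {
    var form = new Form("registration_definition.xml");
    // Display the form and wait until valid data is submitted
    form.showForm("registration-display-pipeline");
    // Pass the submitted name to the success page
    var viewData = { "username" : form.getChild("name").getValue() };
    cocoon.sendPage("registration-success-pipeline", viewData);
}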

Pipelines in the sitemap that also control flow:

<map:match pattern="registration">
    <map:call function="registration"/>
</map:match>
...
<map:match pattern="registration-display-pipeline">
    <map:generate type="jx" src="registration_template.xml"/>
    <map:transform type="i18n">
        <map:parameter name="locale" value="en-US"/>
    </map:transform>
    <map:transform src="forms-samples-styling.xsl"/>
    <map:serialize/>
</map:match>
...
<map:match pattern="registration-success-pipeline">
    <map:generate type="jx" src="registration_success.jx"/>
    <map:serialize/>
</map:match>

An XSP can be used in this flow in order to pass submissions to a database.

XSPs

XSPs function similarly to JSPs and servlets - they are server-side applications that can support many users at once. Unlike JSPs and servlets, XSPs can use XML tags to accomplish much of their functionality, although they can also use Java code between <xsp:logic></xsp:logic> tags. One good use for XSPs is passing information to a database or recalling and displaying stored data. While JSPs and servlets have to either call a specific database connector or contain all of the code for connecting within them, Cocoon has a configuration file which holds this information, and XSPs just call the name of the database as specified in WEB-INF/cocoon.xconf:

<esql:pool>dbname</esql:pool>

XSP code to enter data from a form might look like this:

  <esql:execute-query>
  <esql:query>
  INSERT into otc_users (name,email,password,age,spam) values  ('<xsp:expr>esc_name</xsp:expr>','<xsp-request:get-parameter name="email"/>','<xsp-request:get-parameter name="password"/>','<xsp-request:get-parameter name="age"/>','<xsp-request:get-parameter name="spam"/>')
  </esql:query>
  </esql:execute-query>

Exercises

  1. Create a basic XML file and accompanying html stylesheet. Upload them into the proper folders (content and stylesheets respectively) on the Blaze server, and write a pipeline match that would enable you to view the XML content with your stylesheet applied in a browser. Files and match pattern should be named after your own name, for example Bob Jones would use “bjones.” It is not necessary to upload the pipeline code - simply browse to http://blaze.terry.uga.edu:8080/cocoon/otc/yourname and it should be visible.
  2. Follow along with the CForms example located at http://cocoon.apache.org/2.1/userdocs/basics/sample.html. Create and implement at least one widget of your own making. You can view this at work by browsing to http://blaze.terry.uga.edu:8080/cocoon/cforms/registration.
  3. Browse to opt/tomcat5/webapps/cocoon/cforms on Blaze. Examine sitemap-modified.xmap to see how the pipelines could be modified to pass CForm data to an XSP. Test.xsp shows how that data could be inserted into or called from a database.

Appendix - Accessing the Blaze server

When you have an account set up on the Blaze server, there are several steps you will need to take in order to be able to work with files in the Cocoon directory. Generally, new user accounts are set up with the user’s UGA MyId as the username, and social security number as the password. This password must be changed at the user’s first login, which has to be done with an SSH client. UGA students can download Secure Shell Utilities 3.1 at http://sitesoft.uga.edu/. This download installs two programs, Secure Shell Client and Secure File Transfer Client.

Open the Secure Shell Client and click the “Quick Connect” button located near the top of the window. In the resulting window, enter “blaze.terry.uga.edu” as the Host Name, and your specified username as User Name. Port Number should be set to “22”, and Authentication Method should be “Passworded”. Click “Connect”. In the resulting window, enter your given password and click “Ok”. If a window asks whether to save the new host key, click “Yes”. You will now be presented with a text box. It will notify you that your password has expired and must be changed. Enter your given password once and hit enter, then enter your desired new password and hit enter, and finally enter your desired new password again and hit enter. Be aware that, for security purposes, nothing you type will show up, and you will not be able to delete any typos; if you make a mistake you will have to log in and start over. This is all we will be using the Secure Shell Client application for; you can click the “Disconnect” button in the row of small buttons at the top of the screen, and then exit the program.

In order to actually access files on the Blaze server, the Secure File Transfer Client is used. Open it and click the “Quick Connect” button located near the top of the window, then enter the same Host Name as with the Secure Shell Client and your new password, and make sure the other settings are the same. Click “Connect.” You will be presented with a Windows Explorer-type screen where you can browse through the files on the Blaze server. To access our Cocoon installation go to the “opt” folder, then the “tomcat5” folder, then the “webapps” folder, then the “cocoon” folder. Most of our work will be done in the “otc” folder within. To download a file for editing, simply highlight it and click the “Download” button in the row of small buttons at the top of the screen. Once you select a download location and click “Download,” you can open the file in your editor of choice. To upload a file to the server, simply do the reverse – click the “Upload” button in the row of small buttons at the top of the screen, select a file to upload, and click “Upload,” which will put the file in the folder you are currently viewing on the Blaze server.





Learning objectives

  • Understand the concept of parsing XML files
  • Use different APIs for processing XML files
  • Be aware of the differences between different approaches for parsing XML files
  • Decide when to use a particular technique


The earlier chapters showed in detail how to create XML files: developing XML documents, stylesheets and schemas, and validating them. In this chapter, we will focus on different approaches for parsing XML files and when to use them.

But first, it is time to refresh what we have learned about parsing.

The Process of Parsing XML files

One goal of the XML format was to enhance raw data formats like plain text by including detailed descriptions of the meaning of the content. Now, in order to be able to read XML files, we use a parser which basically exposes the document’s content through a so-called API (application programming interface). In other words, a client application accesses the content of the XML document through an interface, instead of having to interpret the XML code on its own!

Simple Text Parsing

One way to extract data from an XML document is simple text parsing – scanning all the characters in the document and checking for a desired pattern:


<house>
<value><int>150,000</int></value>
</house>

Let’s say we are interested in the value of the house. Using straight text parsing, we would scan the file for the character sequence <value><int> and call it the start pattern. Then, we would further scan the document for the end pattern (i.e. </int></value>). Finally, we declare the text string in between these two patterns to be the value of the surrounding <house> tag.
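
The scan described above could be sketched in Java as follows. The class and method names (NaiveTextScan, between) are invented for this illustration. The second call shows, under the assumption that the same data arrives with a little extra whitespace between the tags, how easily this kind of pattern matching breaks.

```java
class NaiveTextScan {
    // Returns the text between the first occurrence of startPattern and the
    // next occurrence of endPattern, or null if either pattern is missing.
    static String between(String doc, String startPattern, String endPattern) {
        int start = doc.indexOf(startPattern);
        if (start < 0) {
            return null;
        }
        start += startPattern.length();
        int end = doc.indexOf(endPattern, start);
        if (end < 0) {
            return null;
        }
        return doc.substring(start, end);
    }

    public static void main(String[] args) {
        String doc = "<house>\n<value><int>150,000</int></value>\n</house>";
        // prints 150,000
        System.out.println(between(doc, "<value><int>", "</int></value>"));

        // Same data, but with whitespace between the tags: the start
        // pattern no longer matches, so the scan finds nothing.
        String reformatted = "<house><value> <int>150,000</int> </value></house>";
        // prints null
        System.out.println(between(reformatted, "<value><int>", "</int></value>"));
    }
}
```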

Why it doesn't work that way

Obviously, this approach is not suitable for extracting information from large and complex XML documents, since we would have to know exactly what the file looks like and where the information needed is located. From a more general point of view, the structure and semantics of an XML file are determined by the makeup of the document, its tags and attributes – hence, we need a device that is able to recognize and understand this structure and can point out any errors in it. Moreover, it has to provide the content of the document through an interface, so that other applications can access it without difficulty. This device is known as an XML parser.

What a parser does

Almost all programs that need to process XML documents use an XML parser to extract the information stored in the XML document in order to avoid any of the difficulties that occur when reading and interpreting raw XML data. The parser usually is a class library (e.g. a set of Java class files) that reads a given document and checks if it is well-formed according to the W3C specification. Then, any client software can use methods of the interface provided by the parser API to access the information the parser retrieved from the XML file.

All in all, the parser shields the user from dealing with the complex details of XML like assembling information distributed over several XML files, checking for well-formedness constraints, and so on.
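
As a small illustration of the well-formedness check, the following sketch hands two strings to the SAX parser that ships with Java (through the standard javax.xml.parsers API) and reports whether the parser accepted them. The class name WellFormedCheck and the sample strings are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the parser reads the whole document without
    // reporting a fatal (well-formedness) error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // the parser rejected the document
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be set up or read", e);
        }
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(isWellFormed("<house><value>150,000</value></house>"));
        // prints false: the <value> element is never closed
        System.out.println(isWellFormed("<house><value>150,000</house>"));
    }
}
```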

Parsing: an Example

To illustrate more clearly what parsing an XML file really means, the following example contains information about some cities and keeps track of who is on vacation there. It is used to demonstrate the parsing process with the currently most common parsing methods.

Example: cities.xml

<?xml version="1.0" encoding="UTF-8" ?>
<cities>
<city vacation="Sam">
<cityName>Atlanta</cityName>
<cityCountry>USA</cityCountry> 
</city>
<city vacation="David">
<cityName>Sydney</cityName>
<cityCountry>Australia</cityCountry> 
</city>
<city vacation="Ashley">
<cityName>Athens</cityName>
<cityCountry>Greece</cityCountry> 
</city>
</cities>

Based on the information stored in this XML document, we can easily check who is on vacation and where. The parser will read the file using one of the various techniques presented later in this chapter.

Writing the parsing code ourselves would be very complicated and prone to errors of all kinds. Luckily, we will never have to, because there are plenty of free, fully functional parsers on the Web. All we do is download a parser class library and access the XML document through the interface provided by the parser software. Recent versions of Java already ship with an XML parser, so often nothing has to be downloaded at all. In other words, we use the functions or methods included in the class library for extracting the information.

Basically, a parser reads the XML document and tries to recognize the structure of the file itself while checking for errors. It simply checks for start/end tags, attributes, namespaces, prefixes, and so on. Then, the client software can access the information derived from this structure using methods provided by the parser software (i.e. the interface).

The best way to learn about the functionality of a parser is to actually use one; therefore, the next section demonstrates the different methods of parsing.

Parser APIs (Application Programming Interface)

Overview

There are two “traditional” approaches that dominate the market right now, an event-based push-model as represented by SAX (Simple API for XML) and a tree-based model using the DOM (document object model) approach.

However, there is a movement towards newer approaches and techniques that try to overcome the flaws inherent in these traditional models – an event-based pull-model and a “cursor model”, such as VTD-XML, which allows us to browse the XML document just like in the tree-based approach, but simpler and easier to use.
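
To give an impression of the event-based pull model, here is a minimal sketch using StAX (javax.xml.stream), the pull API included in Java. StAX is not named in this chapter, so treat it as one possible representative of the pull approach. The class name PullSketch is invented, and the input is a shortened copy of the cities document used later in this chapter. Note that the client code asks for the next event with reader.next() instead of waiting to be called back.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullSketch {
    // Collects the text of every <cityName> element by pulling events
    // from the parser one at a time.
    static List<String> cityNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();   // the client decides when to read on
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("cityName")) {
                    names.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<cities>"
                + "<city vacation=\"Sam\"><cityName>Atlanta</cityName></city>"
                + "<city vacation=\"David\"><cityName>Sydney</cityName></city>"
                + "</cities>";
        // prints [Atlanta, Sydney]
        System.out.println(cityNames(xml));
    }
}
```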

SAX (Simple API for XML)

Description

The push model, typically exemplified by SAX (www.saxproject.org), is often regarded as the “gold standard” of XML parsing, since it is probably the most complete and accurate method so far. The SAX classes provide an interface between the input streams from which XML documents are read and the client software which receives the data made available by the parser. The parser browses through the whole document and fires an event every time it recognizes an XML construct (e.g. it recognizes a start tag and fires an event – the client software is notified and can use this information… or not).

Evaluation

The advantage of such a model is that we don’t need to store the whole XML document in memory, since we are only reading one piece of information at a time. If you recall that the XML structure is a set of nodes of various types (like an element node) – parsing the document with a SAX parser means going through each node one at a time. This makes it possible to read even very large XML documents in a memory-efficient way. However, the fact that the parser only provides information about the node currently read also implies that the programmer of the client software is in charge of saving certain information in a separate data structure (e.g. the parents or children of the currently processed node). Moreover, the SAX approach is pretty much read-only, since it is hard to modify the XML structure when we do not have some sort of global view.

In fact, the parser is in control of what is read when. The user can only wait until a certain event has occurred and then use the information stored in the currently processed node.
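
The bookkeeping described above can be sketched as follows: because SAX reports only the current node, the handler keeps its own stack of open elements in order to know where in the document it is. The class name PathTracker and the path notation are assumptions made for this illustration; the parser used is the one bundled with Java.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PathTracker extends DefaultHandler {
    // elements that have been opened but not yet closed, outermost first
    private final Deque<String> open = new ArrayDeque<>();
    // one entry per start tag, e.g. /cities/city/cityName
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
        paths.add("/" + String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    // Parses the given XML text and returns the path of every element.
    static List<String> pathsOf(String xml) {
        PathTracker handler = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
        return handler.paths;
    }

    public static void main(String[] args) {
        // prints [/cities, /cities/city, /cities/city/cityName]
        System.out.println(pathsOf(
                "<cities><city><cityName>Atlanta</cityName></city></cities>"));
    }
}
```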

Example: TGSAXParser.java

As mentioned before, the best way to fully understand the concept of the parsing process is to actually use it. The following code sample displays the name and country of each city that someone is vacationing in. The SAX API that is part of the Xerces parser package was used for the implementation (see the Xerces 2 homepage):


// import the basic SAX API classes
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
 
public class TGSAXParser extends DefaultHandler
{
    // true while the parser is inside a <city> element with a vacation attribute
    private boolean onVacation = false;
 
    // collects the character data of the element currently being read
    private final StringBuilder text = new StringBuilder();
 
    // what to do when a start-element event was triggered
    @Override
    public void startElement(String uri, String name, String qName, Attributes atts)
    {
        // forget any text left over from a previous element
        text.setLength(0);
 
        // vacation is an attribute of <city>; getValue returns null if it is missing
        String vacationer = atts.getValue("vacation");
 
        // if the start tag is "city" and names a vacationer, set onVacation to true
        if (qName.equals("city") && (vacationer != null))
        {
            onVacation = true;
            System.out.print(vacationer + " is on vacation in ");
        }
    }
 
    /**This method prints the collected text once a cityName or cityCountry
    *element has been read completely.  It also resets the onVacation
    *variable at the end of each city element.
    */
    @Override
    public void endElement(String uri, String name, String qName)
    {
        if (onVacation && qName.equals("cityName"))
        {
            System.out.print(text.toString().trim() + ", ");
        }
        else if (onVacation && qName.equals("cityCountry"))
        {
            System.out.println(text.toString().trim());
        }
        else if (qName.equals("city"))
        {
            // reset flag
            onVacation = false;
        }
    }
 
    /**This method is triggered for the text between the XML tags.  The text
    *is stored only if onVacation == true.  The parser may deliver one piece
    *of text in several chunks, so the chunks are appended, not printed.
    */
    @Override
    public void characters(char[] ch, int start, int length)
    {
        if (onVacation)
        {
            text.append(ch, start, length);
        }
    }
 
    public static void main(String[] args)
    {
        System.out.println("People on vacation in the following cities:");
 
        try
        {
            // create a SAX parser (the default XMLReader, e.g. Xerces)
            XMLReader xml = XMLReaderFactory.createXMLReader();
            TGSAXParser handler = new TGSAXParser();
            xml.setContentHandler(handler);
            xml.setErrorHandler(handler);
            FileReader r = new FileReader("cities.xml");
            xml.parse(new InputSource(r));
        }
        catch (SAXException se)
        {
            System.out.println("XML Parsing Error: " + se);
        } 
        catch (IOException io) 
        {
            System.out.println("File I/O Error: " + io);
        }
    }
}

The DefaultHandler: As mentioned before, SAX is completely event-driven. Therefore, we need a handler that “listens” to the input stream coming from the input file (cities.xml in this case).

The SAX API defines handler interfaces, and the DefaultHandler class implements them with empty methods. To read our own specific XML document, we extend DefaultHandler with our own class and register an instance of it as the content handler. Our class overrides three methods: startElement, endElement and characters.

The startElement() and endElement() methods: These methods are invoked whenever the SAX parser finds a start or end tag respectively. The SAX API provides blank stubs for both methods and we have to fill them with code of our own.

In this case, we want our program to do something whenever the vacation attribute is set, so we set a Boolean variable to true whenever we find such an element and process the node by printing out the character sequence in between the start and end tags. The characters() method is not tied to startElement or endElement; the parser calls it whenever it reads character data between tags. It acts on that text only while the onVacation flag is set.

DOM (Document Object Model)

Description

The other popular approach is the tree-based model as represented by the DOM (document object model, see W3C Recommendation). This method actually works similarly to a SAX parser, since it reads the XML document from an input stream by browsing through the file and recognizing XML structures.

This time, instead of returning the content of the document in a series of small fragments, the DOM method maps the XML hierarchy to a DOM tree object that contains everything from the original XML document. Elements, comments, textual information and processing instructions are all stored in the tree object as nodes, starting with the document itself as the root node.

Now that all the information we need is stored in memory, we access the data by using methods provided by the parser software to read or modify objects within the tree. This facilitates random access to the content of the XML document and provides the possibility to modify the data it contains or even create new XML files by transforming a DOM back to an XML document.

Evaluation

The major downside of this approach is that it requires much more memory and is therefore not suitable for situations where large XML files are used. More importantly, it is somewhat more complex than the SAX method, even for small and simple problems.

Example: MyDOMParser.java

In the following code sample, a list of cities with people on vacation is again created but this time with the tree-based approach:

// import all necessary DOM API classes
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;

public class MyDOMParser {
    public static void main(String[] args) {
        System.out.println("People on vacation in the following cities:");
        try {
            // creates a DOM parser object
            DOMParser parser = new DOMParser();
            parser.parse("cities.xml");

            // stores the tree object in a variable
            Document doc = parser.getDocument();

            // returns a list of all city elements in my city list
            NodeList list = doc.getElementsByTagName("city");

            // now, for every element in the city list, check if the
            // "vacation" attribute is set and if yes, print out the
            // information about the vacationer.
            for (int i = 0, length = list.getLength(); i < length; i++) {
                Element city = (Element) list.item(i);
                Attr vacationer = city.getAttributeNode("vacation");
                if (vacationer != null) {
                    String v = vacationer.getValue();
                    System.out.print(v + " is vacationing in ");

                    // grab information about city name and country from the
                    // children of this city element (searching the whole
                    // document here would always return the first city)
                    Node cityname = city.getElementsByTagName("cityName").item(0);
                    Node country = city.getElementsByTagName("cityCountry").item(0);
                    System.out.println(cityname.getTextContent() + ", " + country.getTextContent());
                }
            }
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

parser.getDocument(): Once we have parsed the XML document, the tree object is held by the parser. In order to work with the DOM object, we store it in a variable of type org.w3c.dom.Document.

Then, we create a list of nodes holding all elements with the tag name city. The parser finds these nodes by browsing through the DOM tree. Then, we just go through each one of the city-elements and check if the vacation attribute is set and display all the information about the vacationer if so.

The getTextContent() method, which is part of the standard DOM Node interface (DOM Level 3) and is implemented by Xerces, lets us directly access the text of an element node, avoiding the difficulties that arise from unneeded white space and the like.
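
The description of the DOM approach also mentions modifying the tree and turning it back into an XML document. The following is a minimal sketch of that idea. It uses the parser and transformer bundled with Java (the javax.xml.parsers and javax.xml.transform packages) rather than the Xerces download, and the class name DomEditSketch as well as the added city (Maria in Lima, Peru) are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomEditSketch {
    // Parses the XML text into a DOM tree, appends one <city> element to the
    // root element, and writes the modified tree back out as XML text.
    static String addCity(String xml, String vacationer, String name, String country) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // build the new subtree
            Element city = doc.createElement("city");
            city.setAttribute("vacation", vacationer);
            Element cityName = doc.createElement("cityName");
            cityName.setTextContent(name);
            Element cityCountry = doc.createElement("cityCountry");
            cityCountry.setTextContent(country);
            city.appendChild(cityName);
            city.appendChild(cityCountry);
            doc.getDocumentElement().appendChild(city);

            // an "empty" transformer simply copies the tree to the output
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalStateException("document could not be processed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cities><city vacation=\"Sam\"><cityName>Atlanta</cityName>"
                + "<cityCountry>USA</cityCountry></city></cities>";
        System.out.println(addCity(xml, "Maria", "Lima", "Peru"));
    }
}
```

Doing the same with SAX would require writing out every event by hand, which is why the DOM is usually the more convenient choice when a document has to be changed.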

Summary

Choosing an API at the beginning of your XML project is a very important decision. Once you decide which one to use, it is easy to try different vendors without having much trouble, but switching to a different API will be a very time-consuming and costly process, since you will have to redesign your whole program code.

The SAX API is a widely accepted and well-working parser that is easy to implement and works especially well with streaming content (e.g. an online XML source). Because it is a read-only API, you would not be able to modify the underlying XML data source. Since it only reads one node at a time, it is very memory-efficient and fast. However, this implies that your application expects the information to be close together and ordered.

If you want to randomly access the entire document at any point in time, then the DOM approach might be a better choice for you. The DOM API is more complex and harder to implement, but gives you full control over the whole document and lets you modify the data as well. However, it reads the whole XML document into memory, so the DOM API is not suitable for projects with very large XML files.

Exercise

Recommended optional exercise

Use the code samples for the SAX and DOM parsers from this chapter and play around with them. You may want to print out different nodes or add more constraints. This is absolutely optional, but will give you an idea of the main differences between SAX and DOM.

Now for the exercise

  • Create a SAX parser to parse the file movies.xml. The output only needs to appear in your IDE; it does not need to be sent to a web page.


To help you, a starter project is available for download; it provides a structure for the problem so that you can more easily run the application in NetBeans 5.0.

If you’re interested in using Xerces – just download the following file:

           http://www.apache.org/dist/xml/xerces-j/Xerces-J-bin.2.8.0.zip

If the above link is dead, go to http://www.apache.org/dist/xml/xerces-j/ and download the latest zip binary file. Its name should have the format "Xerces-J-bin.#.#.#.zip".

Then put the content into the \lib\ext subfolder of your NetBeans directory and start up NetBeans IDE. Now, the Xerces package is successfully installed on your machine.

Useful Links

  1. http://www.cafeconleche.org
  2. http://www.xml.com
  3. http://www.xmlpull.org
  4. http://workshop.bea.com/xmlbeans/reference/com/bea/xml/XmlCursor.html








Learning objectives
  • Get a brief overview of what XUL is.
  • Learn about the basic tag/widget library.
  • Create some simple, static XUL web pages.
  • Add event handlers to a XUL page.

Introduction

XUL (pronounced zool, rhyming with cool), which stands for XML User Interface Language, is an XML-based user interface language originally developed for use in the Netscape browser. It is now maintained by Mozilla. It is a part of Mozilla Firefox and many other Mozilla applications, and is available as part of Gecko, the rendering engine developed by Mozilla. In fact, XUL is powerful enough that the entire user interface in the Firefox application is implemented in XUL.

Like HTML, in XUL you can create an interface using a relatively simple markup language, define the appearance with CSS style sheets, and use JavaScript to manipulate behavior. Unlike HTML, however, XUL provides a rich set of user interface widgets to create, for example, menus, toolbars and tabbed panels.

To put it in simple terms, XUL can be used to create lightweight, cross-platform, cross-device user interfaces.

Many applications are developed using features of a specific platform, which makes building cross-platform software time-consuming and costly. Some users may also want to use an application on devices other than traditional computers, such as small handhelds. Some cross-platform solutions have already been developed; Java, for example, was created for just such a purpose. However, creating GUIs with Java is cumbersome at best. XUL, by contrast, has been designed for building portable user interfaces easily and quickly. It is available on most versions of Windows, Mac OS X, Linux and Unix. Yahoo! currently uses XUL and related technologies for its Yahoo! tool bar (a Firefox extension) and Photomail application.

To illustrate XUL’s potential, this chapter will work through a few examples. Potential is the correct word here. The full capabilities of XUL are beyond the scope of this chapter but it is designed to give the reader a first look at the power of XUL. One more thing needs to be noted: you’ll need a Gecko-based browser (such as Firefox or the Mozilla Suite) or XULRunner to work with XUL.

The Basics

XUL is XML, and like all good XML files, a good XUL file begins with the standard XML version declaration. Currently, XUL is using the XML version 1.0.

To make your XUL page look good, you must include a global stylesheet in it. The URI of the default stylesheet is chrome://global/skin/. While you can load as many stylesheets as you like, it is best practice to load the global stylesheet first. Look at Fig. 1. Notice the reference to “chrome”. ‘The chrome is the part of the application window that lies outside of a window's content area. Toolbars, menu bars, progress bars, and window title bars are all examples of elements that are typically part of the chrome.’(1) Chrome is the descriptive term used to name all of the elements in a XUL application. Think of it like the chrome on the outside of a car. It’s what catches your eye. The elements in a XUL file are what you see in the browser window.

A XUL document must declare the XUL namespace on its root element. The namespace URI chosen by the developers of XUL shows where they came up with the name. (For the uninitiated, the reference is to the movie ‘Ghostbusters’.)

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
<window
   id="window identifier"
   title="XUL page"
   orient="horizontal"
   xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
   . . . (add elements here)
</window>

The next thing to note is the tag <window>. This tag is analogous to the <body> tag in HTML. All the elements will live inside the window tag. In Fig. 1 the window tag has three attributes that are very important. The ‘id’ attribute identifies the window so that scripts can refer to it. While the title attribute is not necessary, it is good practice to provide a descriptive name; the value of title is displayed in the title bar of the window. The third attribute, orient, tells the browser in what direction to lay out the elements described in the XUL file. Horizontal lays the elements out in succession across the window. Vertical is the opposite; it stacks the elements in a column. Vertical is the default value, so if you do not declare this attribute you’ll get vertical orientation.

As was stated earlier, a XUL document is used to create user interfaces. UIs are generally full of interactive components such as text boxes, buttons and the like. A XUL document accomplishes this with the use of widgets, which are self-contained components with pre-defined behavior. For example, buttons will respond to mouse clicks and menu bars can hold buttons. All the normally accepted actions of GUI components are built in to the widgets. There is already a rich library of predefined widgets, but because this is open source, anyone can define a widget or a set of widgets for themselves.

The widgets are ‘disconnected’ until they are programmed to work together. This can be done simply with JavaScript or a more complex application can be made using something like C++ or Java. In this chapter we will use JavaScript to illustrate XUL’s uses and potential.

Also, a XUL file should have .xul extension. The Mozilla browser will automatically recognize it and know what to do with it when you click on it. Optionally, an .xml extension could be used but you would have to open the file within the browser.

One more thing needs to be mentioned. There are a few syntax rules to follow and they are:

  • All events and attributes must be written in lowercase.
  • All strings must be double quoted.
  • Every XUL widget must use close tags (either <tag></tag> or <tag/>) to be well-formed.
  • All attributes must have a value.

A First Example

What better way to start than with the good old ‘Hello World’ example? Open up a text editor (not MS Word) like Notepad or TextPad and type in:


<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
 
<window
id="Hello"
title="Hello World Example"
orient="vertical"
persist="screenX screenY width height"
xmlns= "http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
 
<description style='font-size:24pt'>Hello World</description>
<description value='Hello World' style='font-size:24pt'/>
<label value = 'Hello World'  style='font-size:24pt'/>
</window>

Save it anywhere but be sure to give the file the .xul extension. Now just double click on it and it should open in your Mozilla or Netscape browser. You should get ‘Hello World’ three times, one on top of the other. Notice the different ways that ‘Hello World’ was printed: twice from a description tag and once from a label tag. Both <description> and <label> are text related tags. Using the description tag is the only way to write text that is not contents of a ‘value’ attribute. This means that you can write text that isn't necessarily assigned to a variable. In the second and third examples the text is expressed as an attribute to the tag description or label, respectively. You can see here that the orient attribute in window is set to ‘vertical’. That is why the text is output in a column. Otherwise, if orient was set to ‘horizontal’, all the text would be on one line. Try it.

Now let’s start adding some more interesting elements.

Adding Widgets

As stated earlier, XUL has an existing rich library of elements fondly called widgets. These include buttons, text boxes, progress bars, sliders and a host of other useful items. One good listing is the XUL Programmer's Reference.

Let us take a look at some simple buttons. Enter the following code into Notepad or another text editor that is not MS Word.

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
 
<window
id="findfile-window"
title="Find Files"
orient="horizontal"
xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
 
<button id="find-button" label="Find" default="true"/>
<button id="cancel-button" label="Cancel"/>
 
</window>


Save it and give the file the .xul extension. Open a Mozilla or Netscape browser and open the file from the browser. You should see a "Find" button and a "Cancel" button. From here it is possible to add more functionality and build up elaborate interfaces.

There has to be some place to put all of these things. Just as the <body> tag does in HTML, the <box> tag in XUL houses the widgets. In other words, boxes are containers that encapsulate other elements. There are a number of different <box> types. In this example we’ll use <hbox>, <vbox>, <toolbox> and <tabbox>.

<hbox> and <vbox> behave like boxes with the attributes 'orient = "horizontal"' and 'orient = "vertical"' respectively, the same orient attribute that is used on the <window> tag. By using these two boxes, discrete sections of the window can have their own orientation. These two elements can hold all of the other elements and can even be nested.

The tags <toolbox> and <tabbox> serve special purposes. <toolbox> is used to create tool bars at the top or bottom of the window while <tabbox> sets up a series of tabbed sheets in the window.

Take the XUL framework from Fig. 1 and replace ". . .( add elements here)" with a <vbox> tag pair (that's both open and close tags). This will be the outside container for the rest of the elements. Remember, the <vbox> means that elements will be positioned vertically in order of appearance. Add the attribute 'flex="1"'. This will make the menu bar extend all the way across the window.

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
 
<window
id="findfile-window"
title="Find Files"
orient="horizontal"
xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
 
<vbox flex="1">
    (... add elements here)
</vbox>
 
</window>

The 'flex' attribute needs some explanation, since it is a primary way of sizing and positioning the elements in a window. Flex is a relative measure rather than a ranking: any space left over in a container is shared among its flexible children in proportion to their flex values. A widget with flex="2" therefore receives twice as much of the extra space as a sibling with flex="1", and a widget with no flex attribute keeps its natural size. All elements have size attributes, such as width and/or height, that can be set to an exact number of pixels, but using flex ensures the same relative sizing and positioning when the window is resized.
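
The proportional sharing that flex performs can be illustrated with a small calculation. The following Python sketch is only a simplified model, not Mozilla's actual layout code (which also takes minimum and maximum sizes into account); the function name and the widget figures are made up for the illustration.

```python
def distribute_extra_space(widgets, container_width):
    """Share leftover width among widgets in proportion to their flex values.

    widgets: list of (name, fixed_width, flex) tuples; flex 0 means "not flexible".
    Returns a dict mapping each widget name to its final width in pixels.
    """
    used = sum(width for _, width, _ in widgets)
    extra = max(container_width - used, 0)          # space left to hand out
    total_flex = sum(flex for _, _, flex in widgets)
    sizes = {}
    for name, width, flex in widgets:
        # Each widget receives extra space in the ratio flex / total_flex.
        share = extra * flex / total_flex if total_flex else 0
        sizes[name] = width + share
    return sizes


# Hypothetical row: two buttons and a spacer inside a 300-pixel-wide box.
layout = distribute_extra_space(
    [("find-button", 60, 1), ("spacer", 0, 2), ("cancel-button", 60, 0)],
    container_width=300,
)
print(layout)
```

In this model the spacer, with flex 2, grows twice as much as the Find button, while the Cancel button, with flex 0, stays at its fixed width.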

Now put a pair each of <toolbox> and <tabbox> tags inside the <vbox> tags, with <toolbox> first. As mentioned, <toolbox> is used to create tool bars, so let's add a toolbar similar to the one at the top of the browser.

This is the code so far:

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
 
<window
id="findfile-window"
title="Find Files"
orient="horizontal"
xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
 
<vbox flex="1">
 
<toolbox>
 
<menubar id="MenuBar">
<menu id="File" label="File" accesskey="f">
<menupopup id="FileMenu">
<menuitem label="New" accesskey="n"/>
<menuitem label="Open..." accesskey="o"/>
<menuitem label="Save" accesskey="s"/>
<menuitem label="Save As..." accesskey="a"/>
<menuitem label=" ... "/> 
<menuseparator/>
<menuitem label="Close" accesskey="c" />
</menupopup>
</menu>
 
<menu id="Edit" label="Edit" accesskey="e">
<menupopup id="EditMenu">
<menuitem label="Cut" accesskey="t" acceltext="Ctrl + X"/>
<menuitem label="Copy" accesskey="c"  acceltext="Ctrl + C"/>
<menuitem label="Paste" accesskey="p" disabled="true"/>
</menupopup>
</menu>
 
<menu id="View" label="View" accesskey="v">
<menupopup id="ViewMenu">
<menuitem id="Tool Bar1" label="Tool Bar1"
type="checkbox" accesskey="1" checked="true"/>
<menuitem id="Tool Bar2" label="Tool Bar2"
type="checkbox" accesskey="2" checked="false"/>
</menupopup>
</menu>
</menubar>
 
</toolbox>
 
<tabbox>
 
</tabbox>
 
</vbox>
 
</window>


There should now be a menu bar with “File Edit View” in it and they should each expand when you click on them. Let’s examine the elements and their attributes more closely to see how they work.

First, the <menubar> holds all of the menus (File, Edit, View). Next come the three menus themselves, each with its own set of elements and attributes. The <menupopup> does just what it says: it creates the popup menu that appears when the menu label is clicked. Inside the popup menu is the list of menu items. Most of these have an 'accesskey' attribute, which underlines the given letter and makes it a hot key for that menu item. Notice that in the Edit menu both 'Cut' and 'Copy' have accelerator text labels. In the File menu there is a <menuseparator/> tag, which places a line across the menu as a visual separator. In the Edit menu, the menu item labeled 'Paste' has the attribute disabled="true", which causes the Paste label to be grayed out. Finally, the menu items in the View menu are actually checkboxes: the first one is checked by default and the second one is not.

Now on to the <tabbox>. Let's make three different sheets with different elements on them. Replace the empty <tabbox> tag pair with this code:

<tabbox flex="1">
<tabs>
   <tab id="Tab1" label="Sheet1" selected="true"/>
   <tab id="Tab2" label="Sheet2"/>
   <tab id="Tab3" label="Sheet3"/>
</tabs>
 
<tabpanels flex="1">
   <tabpanel flex="1" id="Tab1Sheet" orient="vertical" >
   <description style="color:teal;">
      This doesn't do much.
      Just shows some of the style attributes.
   </description>
   </tabpanel>
 
   <tabpanel flex="1" id="Tab2Sheet" orient="vertical">
   <description class="normal">
      Hey, the slider works (for free).
   </description>
   <scrollbar/>
   </tabpanel>
 
   <tabpanel flex="1" id="Tab3Sheet" orient="vertical">
   <hbox>
      <text value="Progress Meter" id="txt" style="display:visible;"/>
      <progressmeter id="prgmeter" mode="undetermined"
         style="display:visible;" label="Progress Bar"/>		  
   </hbox>
   <description value="Wow, XUL! I mean cool!"/>   
   </tabpanel>
</tabpanels>
</tabbox>


The tabs are first defined with <tab>; each is given an id and a label. Next, a set of associated panels is created, each with different content. The first panel shows that, as in HTML, styles can be applied in line. The other two panels contain component-type elements: notice that the scrollbar on the second sheet responds to the mouse and that the progress meter on the third runs on its own.

XUL has several types of elements for creating list boxes. A list box displays items in the form of a list, and any item in the list can be selected. XUL provides two elements for creating lists: the listbox element creates multi-row list boxes, and the menulist element creates drop-down list boxes, as we have already seen.

The simplest list box uses the listbox element for the box itself, and the listitem element for each item. For example, this list box will have four rows, one for each item.

<listbox>
  <listitem label="Butter Pecan"/>
  <listitem label="Chocolate Chip"/>
  <listitem label="Raspberry Ripple"/>
  <listitem label="Squash Swirl"/>
</listbox>

As with the HTML option element, a value can be assigned to each item using the value attribute. The list box appears at a default size, but you can change this with the rows attribute: set it to the number of rows to display in the list box. If the box is too small to show every item, a scroll bar appears automatically so that the user can see the rest.

<listbox rows="3">
  <listitem label="Butter Pecan" value="bpecan"/>
  <listitem label="Chocolate Chip" value="chocchip"/>
  <listitem label="Raspberry Ripple" value="raspripple"/>
  <listitem label="Squash Swirl" value="squash"/>
</listbox>

Assigning a value to each listitem lets a script refer to the items later. This way, other elements can reference these items and use them for other purposes.
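
The idea of using the value attribute as a stable key can be tried outside the browser. In a XUL window the lookup would normally be written in JavaScript; the Python sketch below only illustrates the principle by parsing the same listbox markup with the standard library. The function name labels_by_value is invented for this example.

```python
import xml.etree.ElementTree as ET

# The listbox markup from the example above, held as a string.
LISTBOX_XUL = """
<listbox rows="3">
  <listitem label="Butter Pecan" value="bpecan"/>
  <listitem label="Chocolate Chip" value="chocchip"/>
  <listitem label="Raspberry Ripple" value="raspripple"/>
  <listitem label="Squash Swirl" value="squash"/>
</listbox>
"""


def labels_by_value(xul_text):
    """Map each listitem's value attribute to its visible label."""
    root = ET.fromstring(xul_text.strip())
    return {item.get("value"): item.get("label") for item in root.iter("listitem")}


flavours = labels_by_value(LISTBOX_XUL)
print(flavours["chocchip"])
```

Because the script refers to items by value rather than by label, the visible text can be reworded or translated without breaking the lookup.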

All these elements are very nice and easy to put into a window, but by themselves they don't do anything. Now we have to connect things with some other code.

Adding Event Handlers and Responding to Events

To make things really useful, some type of scripting or application-level coding has to be done. In our example, JavaScript will be used to add functionality to the components, in much the same way as scripting is done with HTML. With HTML, an event handler is associated with an element, and some action is initiated when that handler is activated. Most of the handlers used with HTML are also found in XUL, in addition to some unique ones. Scripts can be written in line, inside the XUL file, but a cleaner way is to create a separate file with the needed scripts inside of it. This keeps markup and behavior apart and can let the page load faster, since the rendering engine doesn’t have to decide what to do with embedded script tags.

That being said, we’ll first add a simple script, in line, as a first example.

Let’s add an ‘onclick’ event handler to fire an alert box when an element is selected. Inside the <window> tag add the line beginning with onclick:


<window
   onclick="alert(event.target.tagName); return false;"
   id="findfile-window"
   title="Find Files"
   orient="horizontal"
   xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
       (... add elements here)
</window>


Now when you click on any element in the window, an alert box pops up telling you the name of the element. One interesting thing to note: when you click on the text enclosed by the description tag the response is undefined, but when you click on the text supplied by the label tag you get the tag name, label.

This does not mean that description is not really an element. The likely explanation is that the click lands on the text node inside the description element, and a text node has no tagName property, whereas the label's text comes from an attribute, so the label element itself is the target. After playing with the alert box, delete that line and add this inside the opening tag of the ‘Close’ menu item in the ‘File’ menu:

oncommand="window.close()"


Now when you click on ‘Close’ or use the ‘C’ as a hot key, the entire window will close. The oncommand event handler is actually preferred over onclick because oncommand can handle hot keys and other non-mouse events.

Let’s try one more thing. Add this right after the opening <window> tag.

<script>
function show()
{
  var meter=document.getElementById('prgmeter');
  meter.setAttribute("style","display: visible;");
  var tx=document.getElementById('txt');
  tx.setAttribute("style","display: visible;");
}
 
function hide()
{
  var meter=document.getElementById('prgmeter');
  meter.setAttribute("style","display: none;");
  var tx=document.getElementById('txt');
  tx.setAttribute("style","display: none;");
}
 
</script>


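These two functions first retrieve a reference to the progress meter and the text element using their ids. Each function then replaces the style attribute of both elements: hide() sets the display to ‘none’, which hides them, and show() sets it to 'visible', which displays them again. (Strictly speaking, 'visible' is not a valid value for the CSS display property; show() works because overwriting the style attribute discards the earlier ‘display: none’, so the elements fall back to their default display.) The tabpanel for the progress meter has to be selected in order to see these actions.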

Now add two buttons that will provide the events to call these two functions. First, add a new box element to hold the buttons. The width attribute of the box needs to be set; otherwise the buttons will be laid out to extend the length of the window.

<box width="200px">
  <button id="show" label="Show" default="true" oncommand="show();"/>
  <button id="hide" label="Hide" oncommand="hide();"/>
</box>


Style Sheets

Style sheets may be used both for creating themes and for modifying elements to build a more elaborate user interface. XUL uses CSS (Cascading Style Sheets) for this. A style sheet is a file which contains style information for elements. The style sheet makes it possible to apply certain fonts, colors, borders, and sizes to the elements of your choice. Mozilla applies a default style sheet to each XUL window. So far, this is the style sheet that has been used for all the XUL documents:

<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>


That line gives the XUL document the default chrome://global/skin/ style sheet. In Mozilla, this is resolved to the file global.css, which contains default style information for XUL elements. The file will still display if this line is left out, but it will not be as aesthetically pleasing. The style sheet applies theme-specific fonts, colors and borders to make the elements look more suitable. Style sheets do not solve every layout problem, however. Some CSS properties, such as those that change the size or margins, do not affect the appearance of a widget. In XUL, the "flex" attribute should be used instead of specific sizes. There are other cases where CSS does not apply, but they are too advanced for this tutorial.

To use a style sheet that you have already made, insert one extra line that points to your CSS file.

<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
<?xml-stylesheet href="findfile.css" type="text/css"?>


This second line references your own style sheet, which is applied in addition to the default one; where the two conflict, the rules that come later generally take precedence under the normal CSS cascade. If you do not want the styling that comes with the default CSS file at all, leave out the first line.

Answers

Conclusion

The examples shown in this chapter merely scratch the surface of XUL’s capabilities. Even though these examples are very simple, one can see how easy it would be to create more complex UIs with XUL. With a complete set of standard components such as buttons and text boxes at the programmer’s disposal, the programmer can code anything in XUL that can be coded in HTML. The cross-platform ability of XUL is another bonus, but the fact that it doesn’t work with Microsoft’s Internet Explorer may suppress XUL’s widespread use. There is some hope that, due to the delay in the development of the next version of IE, XUL may find its way into IE, but don’t hold your breath.

References

  1. 'Configurable Chrome' by Dave Hyatt (hyatt@netscape.com) (Last Modified 4/7/99)
  2. XML User Interface Language (XUL) - The Mozilla Organization
  3. XulPlanet
  4. XUL Programmer's Reference Manual, Fifth Draft: Updated for XUL 1.0






Learning objectives

  • To understand web services, and what they can do.
  • To know what SOAP is, and what web services use it for.
  • To know what Web Services Description Language is, and how to read a WSDL file.
  • To know what the Universal Description, Discovery, and Integration (UDDI) standard is.
  • To understand how to use Java to connect to a web service.

sponsored by:

The University of Georgia

Terry College of Business

Department of Management Information Systems


Web Services Overview

Web Services are a new breed of Web application. They are self-contained, self-describing, modular applications that can be published, located, and invoked across the Web. Web services perform functions, which can be anything from simple requests to complicated business processes. Once a Web service is deployed, other applications (and other Web services) can discover and invoke the deployed service. Web services use XML to describe the request and response, and HTTP as their network transport.

The primary difference between a Web Service and a web application relates to collaboration. Web applications are simply business applications which are located or invoked using web protocols. Similarly, Web Services also perform computing functions remotely over a network. However, Web Services use Internet protocols with the specific intent of enabling interoperable machine-to-machine coordination.

Web Services have emerged as a solution to problems associated with distributed computing. Distributed computing is the use of multiple systems to perform a function rather than having a single system perform it. The previous technologies used in distributed computing, primarily Common Object Request Broker Architecture (CORBA) and Distributed Component Object Model (DCOM), had some limitations. For example, neither has achieved complete platform independence or easy transport over firewalls. Additionally, DCOM is not vendor independent, being a Microsoft product.

Some of the primary needs for a distributed computing standard were:

  • Cross-platform support for Business to Business, as well as internal, communication.
  • Concordance with existing Internet infrastructure as much as possible.
  • Scalability, both in number and complexity of nodes.
  • Internationalization.
  • Tolerance of failure.
  • Vendor independence.
  • Suitability for trivial and non-trivial requests.

Over time, business information systems became highly configured and differentiated. This inevitably made system interaction extremely costly and time consuming. Developers began realizing the benefits of standardizing Web Service development. Using web standards seemed to be an intuitive and logical step toward attaining these goals. Web standards already provided a platform independent means for system communication and were readily accepted by information system users.

The end result was the development of Web Services. A Web Service forms a distributed environment, in which objects can be accessed remotely via standardized interfaces. It uses a three-tiered model, defining a service provider, a service consumer, and a service broker. This keeps the relationship between the parties loose, so that if a service provider goes down, the broker can direct consumers to another one. Similarly, there are many brokers, so consumers can find an available one. For communication, Web Services use open Web standards: TCP/IP, HTTP, and XML-based SOAP.

At higher levels, technologies such as XAML and XLANG (transactional support for complex web transactions involving multiple web services) and XKMS (ongoing work by Microsoft and Verisign to support authentication and registration) might be added.

SOAP

SOAP structure

Simple Object Access Protocol (SOAP) is a method for sending information to and from Web Services in an extensible format. SOAP can be used to send information or remote procedure calls encoded as XML. Essentially, SOAP serves as a universally accepted method of communication with web services. Businesses adhere to the SOAP conventions in order to simplify the process of interacting with Web Services.

<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
 <SOAP:Header>
 
  <!-- SOAP header -->
 
 </SOAP:Header>
 <SOAP:Body SOAP:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
 
  <!-- SOAP body -->
 
 </SOAP:Body>
</SOAP:Envelope>

A SOAP message contains either a request method for invoking a Web Service, or contains response information to a Web Service request.

Adhering to this layout when developing independent Web Services provides notable benefits to businesses. Because Web Services are designed to be used by a myriad of actors, developers want them to be easy to adopt. Using established and familiar standards of communication ultimately reduces the effort it takes users to interact with a Web Service.

The SOAP Envelope is used for defining and organizing the content contained in Web Service messages. Primarily, the SOAP Envelope indicates that the document will be used for service interaction. It contains an optional SOAP Header and a SOAP Body. The message itself is sent in the SOAP Body, and the SOAP Header is used for sending other information that wouldn't be expected in the body. For example, if the SOAP:actor attribute is present on an entry in the SOAP Header, it indicates who the recipient of that entry should be.
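
The envelope layout described above can be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper name build_envelope and the <ping> payload are hypothetical and serve only to show where the optional Header and the Body sit inside the Envelope.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Ask ElementTree to serialise this namespace with the "SOAP" prefix.
ET.register_namespace("SOAP", SOAP_NS)


def build_envelope(body_element, header_elements=()):
    """Wrap a payload element in a SOAP Envelope with an optional Header."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    if header_elements:
        # The Header is optional and, when present, comes before the Body.
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
        header.extend(header_elements)
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(body_element)
    return envelope


# Hypothetical payload, used only for illustration.
payload = ET.Element("ping")
payload.text = "hello"

envelope = build_envelope(payload)
xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
```

Passing a list of elements as header_elements adds a Header in front of the Body; leaving it out produces the smallest valid envelope.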

A web service transaction involves a SOAP request and a SOAP response. The example we will be using is a Web Service provided by Weather.gov. The input is latitude, longitude, a start date, how many days of forecast information desired, and the format of the data. The SOAP request will look like this:

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <SOAP-ENV:Envelope
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 
     <SOAP-ENV:Body>
        <m:NDFDgenByDayRequest xmlns:m="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl">
           <latitude xsi:type="xsd:decimal">33.955464</latitude>
           <longitude xsi:type="xsd:decimal">-83.383245</longitude>
           <startDate xsi:type="xsd:date"></startDate>
           <numDays xsi:type="xsd:integer">1</numDays>
           <format>24 hourly</format>
        </m:NDFDgenByDayRequest>
     </SOAP-ENV:Body>
 
  </SOAP-ENV:Envelope>

The startDate was left empty because this will automatically get the most recent data. The format element carries no xsi:type attribute because its type, formatType, is defined in the WSDL document.

The SOAP response looks like this:

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <SOAP-ENV:Envelope
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 
     <SOAP-ENV:Body>
        <m:NDFDgenByDayResponse xmlns:m="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl">
           <dwmlByDayOut xsi:type="xsd:string">.....</dwmlByDayOut>
        </m:NDFDgenByDayResponse>
     </SOAP-ENV:Body>
 
  </SOAP-ENV:Envelope>
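
On the client side, the response has to be unwrapped to get at the returned string. The Python sketch below shows one way this might be done with the standard library. The sample response is a stand-in written for this illustration: the real service returns a complete DWML document in dwmlByDayOut, and the placeholder text used here is invented.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Stand-in response; the escaped <dwml> string is a made-up placeholder.
SAMPLE_RESPONSE = """
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <NDFDgenByDayResponse>
      <dwmlByDayOut>&lt;dwml&gt;placeholder forecast&lt;/dwml&gt;</dwmlByDayOut>
    </NDFDgenByDayResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def extract_result(response_text, part_name="dwmlByDayOut"):
    """Return the text of the named message part from a SOAP response."""
    root = ET.fromstring(response_text.strip())
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None:
        raise ValueError("not a SOAP envelope: no Body element")
    # The part element is unqualified, so a plain descendant search finds it.
    part = body.find(f".//{part_name}")
    if part is None:
        raise ValueError(f"no <{part_name}> element in the SOAP body")
    return part.text


forecast_xml = extract_result(SAMPLE_RESPONSE)
print(forecast_xml)
```

Since the part is typed as xsd:string, the forecast arrives as escaped text; a client would parse that string a second time to work with the forecast data itself.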

SOAP handles data by encoding it on the sender side and decoding it on the receiver side. The data types handled by SOAP are based on the W3C XML Schema specification. Simple types include strings, integers, floats, and doubles, while compound types, such as arrays, are built up from simple types.

  <element name="name" type="xsd:string" />
  <SOAP:Array SOAP:arrayType="xsd:string[2]">
     <string>Web</string>
     <string>Services</string>
  </SOAP:Array>
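
The encode-then-decode round trip for a compound type can be sketched in a few lines of Python. The function names here are hypothetical, and the sketch covers only arrays of strings; a real SOAP toolkit handles many more types and also declares the xsd prefix on an enclosing element.

```python
import xml.etree.ElementTree as ET

SOAP_ENC_NS = "http://schemas.xmlsoap.org/soap/encoding/"
ET.register_namespace("SOAP-ENC", SOAP_ENC_NS)


def encode_string_array(values):
    """Sender side: turn a list of strings into a SOAP-encoded Array element."""
    array = ET.Element(f"{{{SOAP_ENC_NS}}}Array")
    # arrayType records the item type and the number of items.
    array.set(f"{{{SOAP_ENC_NS}}}arrayType", f"xsd:string[{len(values)}]")
    for value in values:
        item = ET.SubElement(array, "string")
        item.text = value
    return array


def decode_string_array(array):
    """Receiver side: recover the list of strings from the Array element."""
    return [item.text for item in array]


encoded = encode_string_array(["Web", "Services"])
decoded = decode_string_array(encoded)
print(ET.tostring(encoded, encoding="unicode"))
print(decoded)
```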

Because they are plain text carried over HTTP, SOAP messages generally have no problem getting through firewalls or other barriers. This makes them well suited to passing information to and from web services.

Service Description - WSDL

Web Services Description Language (WSDL) was created to provide information about how to connect to and query a specific Web Service. A WSDL document also adheres to strict formatting and organizational guidelines, although the methods, parameters, and service information it describes are application specific. Web Services perform different functions and contain independent information, but their descriptions are all organized the same way. By creating a standard organizational architecture for these services, developers can invoke and use them with little to no familiarization. To use a web service, a developer can follow the structure of its WSDL document to determine the information and procedures associated with its usage.

Essentially, a WSDL document serves as a set of instructions for interacting with a Web Service. It contains no application logic, giving the service a level of autonomy. This enables users to interact with the service without having to understand its inner workings.

The following is an example of a WSDL file: the one for the Weather.gov forecast service used in the SOAP example above, which returns forecast data for a given latitude and longitude.

<?xml version="1.0"?>
<definitions xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" 
xmlns:si="http://soapinterop.org/xsd" xmlns:tns="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
xmlns:typens="http://www.weather.gov/forecasts/xml/DWMLgen/schema/DWML.xsd" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" 
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns="http://schemas.xmlsoap.org/wsdl/" 
targetNamespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl">
 
 <types>
    <xsd:schema targetNamespace="http://www.weather.gov/forecasts/xml/DWMLgen/schema/DWML.xsd">
       <xsd:import namespace="http://schemas.xmlsoap.org/soap/encoding/" />
       <xsd:import namespace="http://schemas.xmlsoap.org/wsdl/" />
       <xsd:simpleType name="formatType">
          <xsd:restriction base="xsd:string">
             <xsd:enumeration value="24 hourly" />
             <xsd:enumeration value="12 hourly" />
          </xsd:restriction>
       </xsd:simpleType>
       <xsd:simpleType name="productType">
          <xsd:restriction base="xsd:string">
             <xsd:enumeration value="time-series" />
             <xsd:enumeration value="glance" />
          </xsd:restriction>
       </xsd:simpleType>
       <xsd:complexType name="weatherParametersType">
          <xsd:all>
             <xsd:element name="maxt" type="xsd:boolean" />
             <xsd:element name="mint" type="xsd:boolean" />
             <xsd:element name="temp" type="xsd:boolean" />
             <xsd:element name="dew" type="xsd:boolean" />
             <xsd:element name="pop12" type="xsd:boolean" />
             <xsd:element name="qpf" type="xsd:boolean" />
             <xsd:element name="sky" type="xsd:boolean" />
             <xsd:element name="snow" type="xsd:boolean" />
             <xsd:element name="wspd" type="xsd:boolean" />
             <xsd:element name="wdir" type="xsd:boolean" />
             <xsd:element name="wx" type="xsd:boolean" />
             <xsd:element name="waveh" type="xsd:boolean" />
             <xsd:element name="icons" type="xsd:boolean" />
             <xsd:element name="rh" type="xsd:boolean" />
             <xsd:element name="appt" type="xsd:boolean" />
          </xsd:all>
       </xsd:complexType>
    </xsd:schema>
</types>
 
<message name="NDFDgenRequest">  
   <part name="latitude" type="xsd:decimal"/>
   <part name="longitude" type="xsd:decimal" />
   <part name="product" type="typens:productType" />
   <part name="startTime" type="xsd:dateTime" />
   <part name="endTime" type="xsd:dateTime" />
   <part name="weatherParameters" type="typens:weatherParametersType" />
</message>
 
<message name="NDFDgenResponse">
   <part name="dwmlOut" type="xsd:string" />
</message>
 
<message name="NDFDgenByDayRequest">  
   <part name="latitude" type="xsd:decimal" />
   <part name="longitude" type="xsd:decimal" />
   <part name="startDate" type="xsd:date" />
   <part name="numDays" type="xsd:integer" />
   <part name="format" type="typens:formatType" />
</message>
 
<message name="NDFDgenByDayResponse">
   <part name="dwmlByDayOut" type="xsd:string" />
</message>
 
<portType name="ndfdXMLPortType">
   <operation name="NDFDgen">
      <documentation> Returns National Weather Service digital weather forecast data </documentation>
      <input message="tns:NDFDgenRequest" />
      <output message="tns:NDFDgenResponse" />
   </operation>
   <operation name="NDFDgenByDay">
      <documentation> Returns National Weather Service digital weather forecast data summarized over either 24- or 12-hourly periods </documentation>
      <input message="tns:NDFDgenByDayRequest" />
      <output message="tns:NDFDgenByDayResponse" />
   </operation>
</portType>
 
<binding name="ndfdXMLBinding" type="tns:ndfdXMLPortType">
   <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http" />
   <operation name="NDFDgen">
      <soap:operation soapAction="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl#NDFDgen" style="rpc" />
      <input>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </input>
      <output>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </output>
   </operation>
   <operation name="NDFDgenByDay">
      <soap:operation soapAction="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl#NDFDgenByDay" style="rpc" />
      <input>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </input>
      <output>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </output>
   </operation>
</binding>
 
<service name="ndfdXML">
   <documentation>The service has two exposed functions, NDFDgen and NDFDgenByDay. 
                        For the NDFDgen function, the client needs to provide a latitude and 
                        longitude pair and the product type. The client also needs to provide 
                        the start and end time of the period that it wants data for. For the 
                        time-series product, the client needs to provide an array of boolean values 
                        corresponding to which weather values should appear in the time series product. 
                        For the NDFDgenByDay function, the client needs to provide a latitude and longitude 
                        pair, the date it wants to start retrieving data for and the number of days worth 
                        of data. The client also needs to provide the format that is desired.</documentation>
   <port name="ndfdXMLPort" binding="tns:ndfdXMLBinding">
      <soap:address location="http://www.weather.gov/forecasts/xml/SOAP_server/ndfdXMLserver.php" />
   </port>
  </service>
</definitions>

The WSDL file defines a service, made up of different endpoints, called ports. The port is made up of a network address and a binding.

<service name="ndfdXML">
   <documentation>The service has two exposed functions, NDFDgen and NDFDgenByDay. 
                        For the NDFDgen function, the client needs to provide a latitude and 
                        longitude pair and the product type. The client also needs to provide 
                        the start and end time of the period that it wants data for. For the 
                        time-series product, the client needs to provide an array of boolean values 
                        corresponding to which weather values should appear in the time series product. 
                        For the NDFDgenByDay function, the client needs to provide a latitude and longitude 
                        pair, the date it wants to start retrieving data for and the number of days worth 
                        of data. The client also needs to provide the format that is desired.</documentation>
   <port name="ndfdXMLPort" binding="tns:ndfdXMLBinding">
      <soap:address location="http://www.weather.gov/forecasts/xml/SOAP_server/ndfdXMLserver.php" />
   </port>
</service>
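
A client can read the service element to find out where to send its requests. The Python sketch below does this for a trimmed copy of the excerpt above, using only the standard library; the function name list_endpoints is an invented helper, and the documentation element is left out of the copy to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the search paths below.
NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

WSDL_EXCERPT = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             xmlns:tns="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl">
  <service name="ndfdXML">
    <port name="ndfdXMLPort" binding="tns:ndfdXMLBinding">
      <soap:address location="http://www.weather.gov/forecasts/xml/SOAP_server/ndfdXMLserver.php" />
    </port>
  </service>
</definitions>
"""


def list_endpoints(wsdl_text):
    """Return one record per port: service name, port name, binding, address."""
    root = ET.fromstring(wsdl_text.strip())
    endpoints = []
    for service in root.findall("wsdl:service", NS):
        for port in service.findall("wsdl:port", NS):
            address = port.find("soap:address", NS)
            endpoints.append({
                "service": service.get("name"),
                "port": port.get("name"),
                "binding": port.get("binding"),
                "location": address.get("location") if address is not None else None,
            })
    return endpoints


endpoints = list_endpoints(WSDL_EXCERPT)
print(endpoints[0]["location"])
```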

The binding identifies the binding style and protocol for each operation. In this case, it uses Remote Procedure Call style binding, using SOAP.

   <binding name="ndfdXMLBinding" type="tns:ndfdXMLPortType">
   <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http" />
   <operation name="NDFDgen">
      <soap:operation soapAction="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl#NDFDgen" style="rpc" />
      <input>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </input>
      <output>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </output>
   </operation>
   <operation name="NDFDgenByDay">
      <soap:operation soapAction="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl#NDFDgenByDay" style="rpc" />
      <input>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </input>
      <output>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </output>
   </operation>
</binding>

Port types are abstract collections of operations. In this case, the port type defines two operations, NDFDgen and NDFDgenByDay.

   <portType name="ndfdXMLPortType">
      <operation name="NDFDgen">
         <documentation> Returns National Weather Service digital weather forecast data     </documentation>
         <input message="tns:NDFDgenRequest" />
         <output message="tns:NDFDgenResponse" />
      </operation>
      <operation name="NDFDgenByDay">
         <documentation> Returns National Weather Service digital weather forecast data summarized over either 24- or 12-hourly periods </documentation>
         <input message="tns:NDFDgenByDayRequest" />
         <output message="tns:NDFDgenByDayResponse" />
      </operation>
   </portType>

Finally, messages are used by the operations to communicate - in other words, to pass parameters and return values.

   <message name="NDFDgenByDayRequest">  
      <part name="latitude" type="xsd:decimal" />
      <part name="longitude" type="xsd:decimal" />
      <part name="startDate" type="xsd:date" />
      <part name="numDays" type="xsd:integer" />
      <part name="format" type="typens:formatType" />
   </message>
 
   <message name="NDFDgenByDayResponse">
      <part name="dwmlByDayOut" type="xsd:string" />
   </message>

From the WSDL file, a consumer should be able to access data in a web service.
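
As a rough illustration of how a consumer might read a WSDL file, the sketch below uses the standard Java DOM parser to list each port type operation with its input and output messages. The class name WsdlOperations, the cut-down SAMPLE_WSDL string, and the tns namespace URI in it are assumptions made for this example; a real client would load the complete WSDL document from the service.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WsdlOperations {

    // Namespace of WSDL 1.1 elements such as <portType> and <operation>.
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    // Cut-down copy of the portType shown above (hypothetical sample input).
    static final String SAMPLE_WSDL =
        "<definitions xmlns=\"http://schemas.xmlsoap.org/wsdl/\""
      + " xmlns:tns=\"http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl\">"
      + "<portType name=\"ndfdXMLPortType\">"
      + "<operation name=\"NDFDgen\">"
      + "<input message=\"tns:NDFDgenRequest\"/>"
      + "<output message=\"tns:NDFDgenResponse\"/>"
      + "</operation>"
      + "<operation name=\"NDFDgenByDay\">"
      + "<input message=\"tns:NDFDgenByDayRequest\"/>"
      + "<output message=\"tns:NDFDgenByDayResponse\"/>"
      + "</operation>"
      + "</portType>"
      + "</definitions>";

    // Returns one "operation(inputMessage) -> outputMessage" entry per
    // operation found inside a <portType> element.
    static List<String> listOperations(String wsdlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wsdlXml)));

            List<String> result = new ArrayList<>();
            NodeList portTypes = doc.getElementsByTagNameNS(WSDL_NS, "portType");
            for (int i = 0; i < portTypes.getLength(); i++) {
                Element portType = (Element) portTypes.item(i);
                NodeList ops = portType.getElementsByTagNameNS(WSDL_NS, "operation");
                for (int j = 0; j < ops.getLength(); j++) {
                    Element op = (Element) ops.item(j);
                    result.add(op.getAttribute("name")
                            + "(" + messageOf(op, "input") + ") -> "
                            + messageOf(op, "output"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse WSDL", e);
        }
    }

    // Reads the message attribute of the first <input> or <output> child.
    private static String messageOf(Element operation, String childName) {
        NodeList nodes = operation.getElementsByTagNameNS(WSDL_NS, childName);
        return nodes.getLength() == 0
                ? ""
                : ((Element) nodes.item(0)).getAttribute("message");
    }

    public static void main(String[] args) {
        for (String line : listOperations(SAMPLE_WSDL)) {
            System.out.println(line);
        }
    }
}
```

Only operations under a portType are listed, because the binding section repeats the same operation names with protocol details rather than message definitions.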

For a more detailed analysis of how this particular web service works, please visit Weather.gov.

Service Discovery - UDDI

You've seen how WSDL can be used to share interface definitions for Web Services, but how do you go about finding a Web Service in the first place? There are countless independent Web Services, developed and maintained by just as many different organizations. As developers adopted Web Service practices and methodologies, they sought to encourage the reuse of their systems. It soon became apparent that there was a need for an enumerated record of these services and their respective locations. Such a record lets developers find and use existing Web Services quickly and easily. A central reference of current Web Service capabilities also helps developers avoid building redundant applications.

UDDI (Universal Description, Discovery and Integration) defines registries in which services can be published and found. The UDDI specification was created by Microsoft, Ariba, and IBM. UDDI defines a data structure and an Application Programming Interface (API).

In the three-tier model mentioned before, UDDI is the service broker. Its function is to enable service consumers to find appropriate service providers.

Connecting to UDDI registries using Java can be accomplished through the Java API for XML Registries (JAXR). JAXR creates a layer of abstraction, so that it can be used with UDDI and other types of XML Registries, such as the ebXML Registry and Repository standard.

Using Java With Web Services

To send a SOAP message, an application must communicate with the service provider. Because SOAP is flexible, almost any programming language can be used to send a SOAP message. For our purposes, however, we will focus on using Java to interact with Web Services.

Using Java with web services requires some external libraries.

  • Apache SOAP Toolkit
  • Java Mail Framework
  • JavaBeans Activation Framework
  • Xerces XML parser

Let's go through using Java to query the Temperature Web Service we talked about earlier.

import java.io.*;
import java.net.*;
import java.util.*;
import org.apache.soap.util.xml.*;
import org.apache.soap.*;
import org.apache.soap.rpc.*;
 
public class TempClient
{
 
 public static float getTemp (URL url, String zipcode) throws Exception 
 {
 
  Call call = new Call ();
 
  // Service uses standard SOAP encoding
  String encodingStyleURI = Constants.NS_URI_SOAP_ENC;
  call.setEncodingStyleURI(encodingStyleURI);
 
  // Set service locator parameters
  call.setTargetObjectURI ("urn:xmethods-Temperature");
  call.setMethodName ("getTemp");
 
  // Create input parameter vector
  Vector params = new Vector ();
  params.addElement (new Parameter("zipcode", String.class, zipcode, null));
  call.setParams (params);
 
  // Invoke the service ....
  Response resp = call.invoke (url,"");
 
  // ... and evaluate the response
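  if (resp.generatedFault ()) 
  {
   // Report the fault returned by the service instead of an empty exception
   Fault fault = resp.getFault ();
   throw new Exception("SOAP fault: " + fault.getFaultString ());
  } 
  else 
  {
   // Call was successful. Extract response parameter and return result
   Parameter result = resp.getReturnValue ();
   Float temp = (Float) result.getValue();
   return temp.floatValue();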
  }
 }
 
 // Driver to illustrate service invocation
 public static void main(String[] args)
 {
  try
  {
   URL url=new URL("http://services.xmethods.net:80/soap/servlet/rpcrouter");
   String zipcode= "30605";
   float temp = getTemp(url,zipcode);
   System.out.println(temp);
  }
  catch (Exception e) 
  {
   e.printStackTrace();
  }
 }
}

This Java code effectively hides all the SOAP from the user. It invokes the target object by name and URL, and sets the parameter zipcode. But what does the underlying SOAP Request look like?

  <?xml version="1.0" encoding="UTF-8"?>
  <soap:Envelope xmlns:n="urn:xmethods-Temperature"
      xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:xs="http://www.w3.org/2001/XMLSchema" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 
     <soap:Body soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
        <n:getTemp>
           <zipcode xsi:type="xs:string">30605</zipcode>
        </n:getTemp>
     </soap:Body>
 
  </soap:Envelope>

As you see, the SOAP request uses the parameters passed in by the Java Call to fill out the SOAP envelope and direct the message. Similarly, the response comes back into the Java program as '70.0'. The response SOAP is also hidden by the Java program.

  <?xml version='1.0' encoding='UTF-8'?>
  <SOAP-ENV:Envelope 
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 
     <SOAP-ENV:Body>
        <ns1:getTempResponse xmlns:ns1="urn:xmethods-Temperature" 
            SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
           <return xsi:type="xsd:float">70.0</return>
        </ns1:getTempResponse>
     </SOAP-ENV:Body>
 
  </SOAP-ENV:Envelope>
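
To make the hidden step concrete, the following sketch shows one way a program could pull the temperature out of a response envelope like the one above using only the standard Java XML parser. It is not how the Apache SOAP toolkit does it internally; the class name SoapResponseReader and the method readTemperature are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SoapResponseReader {

    // Namespace of the SOAP 1.1 Envelope and Body elements.
    static final String SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // The response envelope shown above, without the XML declaration.
    static final String SAMPLE_RESPONSE =
        "<SOAP-ENV:Envelope"
      + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
      + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
      + " xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<SOAP-ENV:Body>"
      + "<ns1:getTempResponse xmlns:ns1=\"urn:xmethods-Temperature\""
      + " SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">"
      + "<return xsi:type=\"xsd:float\">70.0</return>"
      + "</ns1:getTempResponse>"
      + "</SOAP-ENV:Body>"
      + "</SOAP-ENV:Envelope>";

    // Finds the <return> element inside the SOAP Body and parses it as a float.
    static float readTemperature(String soapResponse) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapResponse)));

            NodeList bodies = doc.getElementsByTagNameNS(SOAP_ENV_NS, "Body");
            if (bodies.getLength() == 0) {
                throw new IllegalArgumentException("No SOAP Body element found");
            }
            Element body = (Element) bodies.item(0);

            // <return> carries no prefix, so it is looked up by its plain tag name.
            NodeList returns = body.getElementsByTagName("return");
            if (returns.getLength() == 0) {
                throw new IllegalArgumentException("No return element found");
            }
            return Float.parseFloat(returns.item(0).getTextContent().trim());
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse SOAP response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTemperature(SAMPLE_RESPONSE));
    }
}
```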

Here's an additional example of using Java and SOAP to interact with Web Services. This particular Web Service is called the "US Zip Validator"; it takes a ZIP code as a parameter and returns the corresponding latitude and longitude. When developing applications that interact with Web Services, the first step should be to review the WSDL document.

The WSDL document for this service is located here: http://www.webservicemart.com/uszip.asmx?WSDL

This document will contain all the necessary instructions for interacting with the "US Zip Validator" Web Service.

SOAPClient4XG

Modified by - Duncan McAllister From: http://www.ibm.com/developerworks/xml/library/x-soapcl/

import java.io.*;
import java.net.*;
import java.util.*;
 
public class SOAPClient4XG {
 
    public static void main(String[] args) throws Exception {
 
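        // With no command-line arguments, fall back to the sample
        // endpoint and request file used in this chapter.
        if (args.length == 0) {
            args = new String[] {
                "http://services.xmethods.net:80/soap/servlet/rpcrouter",
                "SOAPRequest.xml"
            };
        }

        if (args.length < 2) {
            System.err.println("Usage:  java SOAPClient4XG " +
                               "http://soapURL soapEnvelopefile.xml" +
                               " [SOAPAction]");
            System.err.println("SOAPAction is optional.");
            System.exit(1);
        }

        String SOAPUrl      = args[0];
        String xmlFile2Send = args[1];

        // SOAPAction is optional; send an empty value unless one is given.
        String SOAPAction = (args.length > 2) ? args[2] : "";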
 
 
        // Create the connection where we're going to send the file.
        URL url = new URL(SOAPUrl);
        URLConnection connection = url.openConnection();
        HttpURLConnection httpConn = (HttpURLConnection) connection;
 
        // Open the input file. After we copy it to a byte array, we can see
        // how big it is so that we can set the HTTP Content-Length
        // property.
 
        FileInputStream fin = new FileInputStream(xmlFile2Send);
 
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
 
        // Copy the SOAP file into the in-memory byte array.
        copy(fin,bout);
        fin.close();
 
        byte[] b = bout.toByteArray();
 
        // Set the appropriate HTTP parameters.
        httpConn.setRequestProperty( "Content-Length",
                                     String.valueOf( b.length ) );
        httpConn.setRequestProperty("Content-Type","text/xml; charset=utf-8");
        httpConn.setRequestProperty("SOAPAction",SOAPAction);
        httpConn.setRequestMethod( "POST" );
        httpConn.setDoOutput(true);
        httpConn.setDoInput(true);
 
        // Everything's set up; send the XML that was read in to b.
        OutputStream out = httpConn.getOutputStream();
        out.write( b );    
        out.close();
 
        // Read the response and write it to standard out.
 
        InputStreamReader isr =
            new InputStreamReader(httpConn.getInputStream());
        BufferedReader in = new BufferedReader(isr);
 
        String inputLine;
 
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
 
        in.close();
    }
 
  // copy method from E.R. Harold's book "Java I/O"
  public static void copy(InputStream in, OutputStream out) 
   throws IOException {
 
    // do not allow other threads to read from the
    // input or write to the output while copying is
    // taking place
 
    synchronized (in) {
      synchronized (out) {
 
        byte[] buffer = new byte[256];
        while (true) {
          int bytesRead = in.read(buffer);
          if (bytesRead == -1) break;
          out.write(buffer, 0, bytesRead);
        }
      }
    }
  } 
}

This Java class refers to an XML document (SOAPRequest.xml), which is used as the SOAP message. This document should be placed in the same project folder as the Java application invoking the service.

The SOAP body names the method to invoke and carries the appropriate parameters. The sample request below reuses the "getTemp" call from the Temperature service shown earlier; a request to the "US Zip Validator" would instead name the operation and parameters listed in its WSDL document.

SOAPRequest.xml

<?xml version="1.0" encoding="UTF-8"?>
  <soap:Envelope xmlns:n="urn:xmethods-Temperature"
      xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:xs="http://www.w3.org/2001/XMLSchema" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 
     <soap:Body soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
        <n:getTemp>
           <zipcode xsi:type="xs:string">30605</zipcode>
        </n:getTemp>
     </soap:Body>
 
  </soap:Envelope>
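
Rather than editing SOAPRequest.xml by hand for each ZIP code, the request file could be generated. The sketch below builds the same envelope as a string and writes it to SOAPRequest.xml; the class name SoapRequestBuilder and its methods are hypothetical, and the hand-written escaping covers only the five predefined XML entities.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class SoapRequestBuilder {

    // Escapes the characters that are not allowed verbatim in XML text.
    static String escapeXml(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;")
                    .replace("'", "&apos;");
    }

    // Builds the getTemp request envelope shown above for the given ZIP code.
    static String buildGetTempRequest(String zipcode) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<soap:Envelope xmlns:n=\"urn:xmethods-Temperature\"\n"
             + "    xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"\n"
             + "    xmlns:soapenc=\"http://schemas.xmlsoap.org/soap/encoding/\"\n"
             + "    xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n"
             + "    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n"
             + "  <soap:Body soap:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">\n"
             + "    <n:getTemp>\n"
             + "      <zipcode xsi:type=\"xs:string\">" + escapeXml(zipcode) + "</zipcode>\n"
             + "    </n:getTemp>\n"
             + "  </soap:Body>\n"
             + "</soap:Envelope>\n";
    }

    public static void main(String[] args) throws IOException {
        String zipcode = args.length > 0 ? args[0] : "30605";
        String xml = buildGetTempRequest(zipcode);
        // Write the file that SOAPClient4XG expects to find.
        Files.write(Paths.get("SOAPRequest.xml"), xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(xml);
    }
}
```

Escaping the parameter keeps the envelope well formed even if the input contains characters such as < or &.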

Following a successful interaction, the Web Service provider returns a response that is similar in format to the request. When developing in NetBeans, run this project and examine the resulting SOAP response in the Tomcat output window.

Web Services with NetBeans

The NetBeans version used for this explanation is 5.0.

After NetBeans is open, click the "Runtime" tab in the left pane, then right-click "Web Services" and select "Add Web Service." In the "URL" field, enter the address of the web service's WSDL file (in our example above, "http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl") and click "Get Web Service Description." This brings up information about the web service.

Summary

Web services are applications that use XML to communicate with many different systems to perform a task. To facilitate the use of web services, protocols were developed that allow them to be flexible and scalable. SOAP defines the format of the messages that are exchanged, WSDL describes how to connect to and query a web service, and UDDI provides registries in which web services can be published and found.

Exercises

Answers

Appendix

References and Links

  • Weather.gov
  • WebServices.org
  • W3C's Web Services Reference
  • UDDI.org
  • XMethods.net
  • Java API for XML Registries
  • Apache SOAP Toolkit
  • JavaMail Framework
  • Java Activation Framework
  • Xerces Java Parser
  • Jasnowski, Mike. Java, XML, and Web Services Bible
  • Microsoft Web Services
  • http://www.xml.com/
  • http://www.w3schools.com/soap/soap_intro.asp
  • http://en.wikibooks.org/wiki/XML_-_Managing_Data_Exchange/Web_services
  • http://www.w3.org/TR/2007/REC-soap12-part0-20070427/
  • http://www.eweek.com/article2/0,1895,1589730,00.asp
  • http://www.ibm.com/developerworks/xml/library/x-soapcl/





    History

    The XMLHttpRequest object enables JavaScript to make HTTP requests to a remote server without the need to reload the page. It was first implemented by Microsoft as an ActiveX object but is now available as a native object in both Mozilla and Apple's Safari browser. JavaScript sends information to the server, where it is processed, and the result is returned to the user without a page reload.

    Purpose

    The main function of the XMLHttpRequest object is that it provides an easy way for webpages to receive updated information from servers without having to refresh the whole webpage. As a result, the webserver’s processing load is reduced and the user receives information faster without seeing any interruptions in service.


    Future Application

    The XMLHttpRequest object offers several improvements over existing data exchange methods. Many developers still rely on the Common Gateway Interface (CGI) for data exchange. Because CGI places no adequate restriction on the format of data, XML usage is relatively pointless from a data exchange standpoint. Using the abilities of XMLHttpRequest removes the inadequacies that originate from the widespread use of CGI. The XMLHttpRequest object provides a more suitable approach to real-time content delivery than existing development methods.


    Tutorials

    http://developer.apple.com/library/safari/#documentation/appleapplications/Conceptual/SafariJSProgTopics/Articles/XHR.html





    Learning objectives
    • Learn about Native XML Databases
    • Learn about the conversion technology available
    • Create Table and retrieve information


    Native XML Database

    The term "native XML database" has become popular since 1999, when Software AG released the first version of its native XML server Tamino, which included a native XML database. One definition of a native XML database is that it:

    "[d]efines a (logical) model for an XML document and stores and retrieves documents according to that model." (Bourret, 2002)

    To model data in XML, two principal approaches are used: data-centric documents and document-centric documents.

    • Data-centric documents (for data transport) have a fairly regular structure, an element order that typically does not matter, and little or no mixed content.
    • Document-centric documents (usually for human consumption) have a less regular or irregular structure, a significant element order, and lots of mixed content.
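
    Mixed content, meaning an element that contains both child elements and non-whitespace text, is one of the easier criteria above to check mechanically. The sketch below is only a rough heuristic for telling the two kinds of document apart; the class name MixedContentCheck and the two sample documents are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContentCheck {

    // Hypothetical data-centric sample: text appears only in leaf elements.
    static final String DATA_CENTRIC =
        "<person id=\"1\">\n"
      + "  <firstname>John</firstname>\n"
      + "  <lastname>Smith</lastname>\n"
      + "</person>";

    // Hypothetical document-centric sample: text and markup are interleaved.
    static final String DOCUMENT_CENTRIC =
        "<para>Please call <name>John Smith</name> before <b>noon</b>.</para>";

    // True if any element in the document has mixed content.
    static boolean hasMixedContent(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return isMixed(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // An element is mixed if it has at least one child element and at least
    // one non-whitespace text child; descendants are checked recursively.
    private static boolean isMixed(Element element) {
        boolean hasElementChild = false;
        boolean hasText = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                if (isMixed((Element) child)) {
                    return true;
                }
            } else if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                if (!child.getNodeValue().trim().isEmpty()) {
                    hasText = true;
                }
            }
        }
        return hasElementChild && hasText;
    }

    public static void main(String[] args) {
        System.out.println("data-centric sample mixed?     " + hasMixedContent(DATA_CENTRIC));
        System.out.println("document-centric sample mixed? " + hasMixedContent(DOCUMENT_CENTRIC));
    }
}
```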


    Examples of Native databases


    Product | Developer                               | License     | DB Type
    Tamino  | Software AG                             | Commercial  | Proprietary. Relational through ODBC.
    XediX   | Multimedia Solution XediX Tera Solution | Commercial  | Proprietary
    eXist   | Wolfgang Meier                          | Open Source | Relational
    dbXML   | dbXML Group                             | Open Source | Proprietary
    Xindice | Apache Software Foundation              | Open Source | Proprietary (Model-based)


    eXist

    eXist is an Open Source effort to develop a native XML database system, tightly integrated with existing XML development tools like Apache's Cocoon. The database may be easily deployed, running either standalone, inside a servlet engine, or directly embedded in an application.

    Some features that are available in eXist, and that can be found in most native XML databases, are:


    • Schema-less storage - Documents do not have to be associated with a schema or document type, meaning they only need to be well formed.
    • Collections - A collection plays a similar role to a directory in a file system. When submitting a query the user can choose a distinct part of the collection hierarchy or even all the documents contained in the database.
    • Query languages - The most popular query languages supported by Native XML databases are XPath (with extensions for queries over multiple documents) and XQuery.

    Relational Databases

    Database vendors such as IBM, Microsoft, Oracle, and Sybase have developed tools to assist in converting XML documents into relational tables.

    Let us look at IBM and Oracle:


    IBM Technology

    DB2 XML Extender provides access, storage and transformation for XML data through user-defined functions and stored procedures. It offers two key storage models: XML Columns and XML Collections.


    1. XML Column: stores and retrieves entire XML documents as DB2 column data. Use of XML Columns is recommended when XML documents already exist and/or when there is a need to store XML documents in their entirety.

    2. XML Collection: composes XML Documents from a collection of relational tables.


    A data access definition (DAD) file is used for both XML Column and XML Collection approaches to define the "mapping" between the database tables and the structure of the XML document.

    <Xcollection> Specifies that the XML data is either to be decomposed from XML documents into a collection of relational tables, or to be composed into XML documents from a collection of relational tables.

    The DAD file defines the XML document tree structure, using the following kinds of nodes:

    • root_node - Specifies the root element of the document.
    • element_node - Identifies an element, which can be the root element or a child element.
    • text_node - Represents the CDATA text of an element.
    • attribute_node - Represents an attribute of an element.


    <?xml version="1.0"?>
    <!DOCTYPE DAD SYSTEM "c:\dxx\samples\db2xml\dtd\dad.dtd">
    <DAD>
      ...
    <Xcollection>
    <SQL_stmt>
           ...
    </SQL_stmt>
    <prolog>?xml version="1.0"?</prolog>
    <doctype>!DOCTYPE Order SYSTEM
                      "c:\dxx\samples\db2xml\dtd\getstart.dtd"</doctype>
    <root_node> 
     <element_node name="Order">      --> Identifies the element <Order>
      <attribute_node name="key">     --> Identifies the attribute "key" 
       <column name="order_key"/>     --> Defines the name of the column, 
                                          "order_key", to which the
                                          element and attribute are
                                          mapped
      </attribute_node> 
      <element_node name="Customer">  --> Identifies a child element of 
                                          <Order> as <Customer>
       <text_node>                    --> Specifies the CDATA text for
                                          the element <Customer>
        <column name="customer"/>     --> Defines the name of the column,
                                          "customer", to which the child
                                          element is mapped
       </text_node> 
      </element_node> 
            ...
     </element_node>
     
          ...
    </root_node> 
    </Xcollection>
    </DAD>
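
    The mapping in the DAD sample above can be read as two rules: the key attribute of <Order> goes to the order_key column, and the text of <Customer> goes to the customer column. The sketch below applies those two rules with the standard Java XPath API to decompose one document into a row of column values. It illustrates the idea only and is not how DB2 XML Extender is invoked; the class name OrderDecomposer and the sample order document are assumptions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OrderDecomposer {

    // Hypothetical order document that matches the structure in the DAD sample.
    static final String SAMPLE_ORDER =
        "<Order key=\"1\"><Customer>American Motors</Customer></Order>";

    // Decomposes an <Order> document into column name -> value pairs,
    // following the attribute_node and text_node mappings of the DAD sample.
    static Map<String, String> decompose(String orderXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(orderXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            Map<String, String> row = new LinkedHashMap<>();
            // attribute_node "key" of <Order> -> column order_key
            row.put("order_key", xpath.evaluate("/Order/@key", doc));
            // text_node of <Customer> -> column customer
            row.put("customer", xpath.evaluate("/Order/Customer", doc));
            return row;
        } catch (Exception e) {
            throw new IllegalStateException("Could not decompose order", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decompose(SAMPLE_ORDER));
    }
}
```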
    



    Oracle

    Oracle's XML SQL Utility (XSU) uses a schematic mapping that defines how to map tables and views, including object-relational features, to XML documents. Oracle translates the chain of object references from the database into the hierarchical structure of XML elements.


    -- The object type must exist before a table can use it
    CREATE TYPE AddressType AS OBJECT
    (
         ZIP       VARCHAR2(100),
         CITY      VARCHAR2(100),
         STREET    VARCHAR2(100)
    );
    /

    CREATE TABLE Customers
    (
         FIRSTNAME      VARCHAR2(100),
         LASTNAME       VARCHAR2(100),
         PHONENO        INT,
         ADDRESS        AddressType  -- column of the object type
    );
    


    A corresponding XML document generated from the given object-relational model looks like:

    <?xml version="1.0"?>
    <ROWSET>
         <ROW num="1"> 
     
            <FIRSTNAME>JOHN</FIRSTNAME>
            <LASTNAME>SMITH</LASTNAME>
            <PHONENO>7061234567</PHONENO>
     
            <ADDRESS>
               <ZIP>30601</ZIP>
               <CITY>ATHENS</CITY>
               <STREET>123 MAIN STREET</STREET>
            </ADDRESS>
     
        </ROW>
     
        <!-- additional rows ... -->
     
     </ROWSET>
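
    The shape of this output can be imitated without a database: each row becomes a <ROW> element, each column becomes a child element, and a column holding an object becomes a nested element. The following sketch composes such a document from in-memory rows. It is not part of XSU; the class name RowsetWriter and the use of nested maps to stand in for object-typed columns are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowsetWriter {

    // Serializes rows as <ROWSET><ROW num="n">...</ROW></ROWSET>.
    static String toRowset(List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<ROWSET>\n");
        int num = 1;
        for (Map<String, Object> row : rows) {
            sb.append("  <ROW num=\"").append(num++).append("\">\n");
            appendColumns(sb, row, "    ");
            sb.append("  </ROW>\n");
        }
        return sb.append("</ROWSET>\n").toString();
    }

    // Writes one element per column; a nested map (standing in for an
    // object-typed column such as ADDRESS) becomes a nested element.
    @SuppressWarnings("unchecked")
    private static void appendColumns(StringBuilder sb, Map<String, Object> columns, String indent) {
        for (Map.Entry<String, Object> column : columns.entrySet()) {
            String name = column.getKey();
            Object value = column.getValue();
            if (value instanceof Map) {
                sb.append(indent).append('<').append(name).append(">\n");
                appendColumns(sb, (Map<String, Object>) value, indent + "  ");
                sb.append(indent).append("</").append(name).append(">\n");
            } else {
                sb.append(indent).append('<').append(name).append('>')
                  .append(escape(String.valueOf(value)))
                  .append("</").append(name).append(">\n");
            }
        }
    }

    // Escapes the characters that are not allowed verbatim in element text.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("ZIP", "30601");
        address.put("CITY", "ATHENS");
        address.put("STREET", "123 MAIN STREET");

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("FIRSTNAME", "JOHN");
        row.put("LASTNAME", "SMITH");
        row.put("PHONENO", 7061234567L);
        row.put("ADDRESS", address);

        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        System.out.print(toRowset(rows));
    }
}
```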
    


    XSU can be used in a Java environment to execute queries and retrieve XML from the database.

    import oracle.jdbc.driver.*;
    import oracle.xml.sql.query.OracleXMLQuery;
    import java.lang.*;
    import java.sql.*;
     
    // class to test XML document generation as String
    class testXMLSQL {
     
       public static void main(String[] args)
       {
         try {
          // Create the connection
          Connection conn  = getConnection("root","");
     
          // Create the query class
          OracleXMLQuery qry = new OracleXMLQuery(conn,
             "SELECT  * FROM Customers");
     
          // Get the XML string
          String str = qry.getXMLString();
     
          // Print the XML output
          System.out.println("The XML output is:\n"+str);
     
          // Always close the query to get rid of any resources..
          qry.close();
         } catch(SQLException e) {
          System.out.println(e.toString());
         }
       }
     
       // Get the connection given the user name and password.
       private static Connection getConnection(String username,
            String password)
            throws SQLException
       {
          // register the JDBC driver..
           DriverManager.registerDriver(new 
              oracle.jdbc.driver.OracleDriver());
     
          // Create the connection using the thin JDBC driver
           Connection conn =
            DriverManager.getConnection(
               "jdbc:oracle:thin:@dlsun489:1521:ORCL",username,password);
     
          return conn;
       }
    }
    

    Query Languages

    XPath

    XPath is a language for addressing parts of an XML document, and is the common locator used by both XSLT and XPointer. An XPath expression is a series of location steps separated by " / ". Each step selects a set of nodes that become the current node(s) for the next step. The set of nodes selected by the expression are the nodes remaining after processing each step in order.
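
    To see how each location step narrows the set of current nodes, the sketch below evaluates a path one step at a time with the standard javax.xml.xpath API and reports how many nodes remain after each step. The class name XPathSteps and the sample document are hypothetical, and the step splitting is deliberately naive: it assumes an absolute path whose predicates contain no "/" character.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSteps {

    // Hypothetical sample document with two person elements.
    static final String SAMPLE =
        "<people>"
      + "<person id=\"1\"><firstname>John</firstname></person>"
      + "<person id=\"3\"><firstname>Lamine</firstname></person>"
      + "</people>";

    // For a path such as /a/b/c, evaluates /a, then /a/b, then /a/b/c and
    // returns the number of nodes selected after each location step.
    static List<Integer> stepCounts(String xml, String absolutePath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            List<Integer> counts = new ArrayList<>();
            StringBuilder prefix = new StringBuilder();
            // Naive split: only valid when no predicate contains "/".
            for (String step : absolutePath.substring(1).split("/")) {
                prefix.append('/').append(step);
                NodeList nodes = (NodeList) xpath.evaluate(
                        prefix.toString(), doc, XPathConstants.NODESET);
                counts.add(nodes.getLength());
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + absolutePath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stepCounts(SAMPLE, "/people/person/firstname"));
        System.out.println(stepCounts(SAMPLE, "/people/person[@id='3']/firstname"));
    }
}
```

    In the second path, the predicate on the person step discards one of the two person nodes, so only one firstname node is left for the final step.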


    XQuery

    XQuery is a query language developed by the World Wide Web Consortium (W3C) with the ambitious goal of providing a world standard for querying Web documents; XQuery 1.0 became a W3C Recommendation in 2007. XML is a versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. XQuery uses the structure of XML to express queries across all these kinds of data.

    MySQL 5.1

    MySQL has a command-line utility, mysql, for executing queries against a MySQL database; it allows users to execute any SQL query and has an option for producing XML as its output format. The mysqldump utility can also produce XML; it allows users to specify which tables to dump and to specify a WHERE clause to restrict the rows that are dumped. More information can be found in "Converting MySQL to XML". In the beta release of MySQL 5.1, several features have been added, including new XML functions.

    To understand these new functions, we will use the following table:

    CREATE TABLE Customers (doc VARCHAR(150));
     
    INSERT INTO Customers VALUES
    ('
    <person id="1">
          <firstname>John</firstname>
          <lastname>Smith</lastname>
          <phoneno>123-5678</phoneno>
    </person>
    ');
     
    INSERT INTO Customers VALUES
    ('
    <person id="2">
          <firstname>Aminata</firstname>
          <lastname>Cisse</lastname>
          <phoneno>123-5679</phoneno>
    </person>
    ');
     
    INSERT INTO Customers VALUES
    ('
    <person id="3">
          <firstname>Lamine</firstname>
          <lastname>Smith</lastname>
          <phoneno>123-5680</phoneno>
    </person>
    ');
    


    XML Functions

    MySQL version 5.1 has functions for searching and changing XML documents: ExtractValue() and UpdateXML().


    • EXTRACTVALUE (XML_document, XPath_string);

    This function takes two string arguments: the first is the XML document, and the second is an XPath expression (locator). It returns a string containing the value that the expression selects from the document.

    mysql> SELECT EXTRACTVALUE(doc,'//firstname') FROM Customers;
    
    +------------------------------------------+
    | EXTRACTVALUE(doc,'//firstname')          |
    +------------------------------------------+
    | John                                     | 
    | Aminata                                  | 
    | Lamine                                   | 
    +------------------------------------------+
    3 rows in set (0.01 sec)
    


    mysql> SELECT ExtractValue(doc,'/person[@id="3"]/firstname') as fname FROM Customers;
    
    +---------+
    | fname   |
    +---------+
    |         | 
    |         | 
    | Lamine  | 
    +---------+
    3 rows in set (0.02 sec)
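
    The blank rows in this result appear because the expression is evaluated against each row's document separately, and a document with no matching node yields an empty string. The sketch below imitates that behaviour in Java with the standard XPath API. It is an approximation written for illustration, not MySQL's implementation; the class name ExtractValueSketch is hypothetical, and the three documents are compact copies of the rows inserted earlier.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExtractValueSketch {

    // Compact copies of the three documents stored in the Customers table.
    static final String[] CUSTOMERS = {
        "<person id=\"1\"><firstname>John</firstname><lastname>Smith</lastname>"
            + "<phoneno>123-5678</phoneno></person>",
        "<person id=\"2\"><firstname>Aminata</firstname><lastname>Cisse</lastname>"
            + "<phoneno>123-5679</phoneno></person>",
        "<person id=\"3\"><firstname>Lamine</firstname><lastname>Smith</lastname>"
            + "<phoneno>123-5680</phoneno></person>"
    };

    // Returns the text of the nodes selected by the expression, separated by
    // single spaces, or an empty string when nothing matches.
    static String extractValue(String xml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(nodes.item(i).getTextContent());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        // One result per row, as in the SELECT above.
        for (String doc : CUSTOMERS) {
            System.out.println("[" + extractValue(doc, "/person[@id=\"3\"]/firstname") + "]");
        }
    }
}
```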
    



    • UPDATEXML (XML_document, XPath_string, new_value);

    This function takes three string arguments. The first two, XML_document and XPath_string, are the same as those used with ExtractValue(). The third is the new XML fragment that replaces the element matched by the XPath expression. The function returns the changed XML; if the expression matches nothing, the original XML is returned unchanged.

    mysql> SELECT UpdateXML(doc,'/person[@id="3"]/phoneno', '<phoneno>111-2233</phoneno>') FROM Customers;


    +-------------------------------------------------------------------------+
    | UpdateXML(doc,'/person[@id="3"]/phoneno','<phoneno>111-2233</phoneno>') |
    +-------------------------------------------------------------------------+
    |
    <person id="1">
          <firstname>John</firstname>
          <lastname>Smith</lastname>
          <phoneno>123-5678</phoneno>
    </person>
             |
    |
    <person id="2">
          <firstname>Aminata</firstname>
          <lastname>Cisse</lastname>
          <phoneno>123-5679</phoneno>
    </person>
          |
    |
    <person id="3">
          <firstname>Lamine</firstname>
          <lastname>Smith</lastname>
          <phoneno>111-2233</phoneno>
    </person>
     |
    +-------------------------------------------------------------------------+
    3 rows in set (0.00 sec)
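
    The same replace-or-leave-unchanged behaviour can be sketched in Java with the standard DOM, XPath and Transformer APIs. This is an illustration of the idea rather than MySQL's implementation: the class name UpdateXmlSketch is hypothetical, the serialized output may differ in whitespace from what MySQL returns, and the replacement must itself be well-formed XML or parsing fails.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UpdateXmlSketch {

    // Replaces the single element matched by xpathExpr with newXml.
    // If the expression does not match exactly one node, the original
    // string is returned unchanged.
    static String updateXml(String xml, String xpathExpr, String newXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODESET);
            if (matches.getLength() != 1) {
                return xml;
            }

            // Parse the replacement fragment and copy it into the target document.
            Document fragment = builder.parse(new InputSource(new StringReader(newXml)));
            Node replacement = doc.importNode(fragment.getDocumentElement(), true);
            Node target = matches.item(0);
            target.getParentNode().replaceChild(replacement, target);

            // Serialize the modified document back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update XML", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<person id=\"3\"><firstname>Lamine</firstname>"
                   + "<phoneno>123-5680</phoneno></person>";
        System.out.println(updateXml(doc, "/person[@id=\"3\"]/phoneno",
                "<phoneno>111-2233</phoneno>"));
    }
}
```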
    



    Installation

    As of 04/05/06, MySQL 5.1 (beta version) does not come with an installer.


    Quick Windows installation.

    Detailed information can be found in the online manual:

    Summary

    Exercises

    Answers







    Learning objectives

    Upon completion of this chapter, you will be able to

    • Understand SyncML fundamentals and general syntax.
    • Understand how and why SyncML is implemented.
    • Quickly locate and use SyncML technical specifications.

    Introduction

    Mobile devices such as PDAs, pagers, mobile phones and laptops are, by nature, not always connected to a network. Yet these devices contain applications that require information obtained from a network in order to be useful. While most PDAs and mobile phones contain applications such as calendars, task lists, and address books for storing useful information, this information is far less useful when it is static and only available on the device itself. For example, two copies of the same information diverge as soon as a change is made to one copy but not the other. Synchronization lets a device connect to a network and update either the information on the device or the information on the network, so that both sets of information are identical and up-to-date.

    Given the proliferation of proprietary mobile devices and protocols, as well as the increasing consumer demand for ubiquitous mobile access of information, leading technology companies saw the need to create a standard, universal language for describing the synchronization actions between devices and applications. They formed a consortium to sponsor the SyncML initiative to create this language.

    The SyncML consortium has since been incorporated into the Open Mobile Alliance, a larger group of over 300 companies that sponsors many collaborative technology projects and protocols.

    What is SyncML?

    SyncML, or Synchronization Markup Language, is an XML-based, industry-standard protocol for synchronizing mobile data across a variety of networks, platforms and devices. SyncML started as an initiative in mid-2000 by major technology companies such as Ericsson, IBM, Palm Inc., Lotus, Matsushita Ltd. (Panasonic), Motorola, Nokia, Openwave, Starfish Software, Psion and Symbian. The initiative's goals were to create a universal language to replace the myriad proprietary synchronization protocols used by mobile devices and to provide a complete set of synchronization functionality for future devices. The consortium released version 1.0 in December 2000. It then implemented new features and resolved issues in subsequent releases, finalizing the protocol with version 1.1 in February 2002.

    The SyncML protocol is designed with these goals in mind:

    • As a common language, any device should be able to synchronize with any SyncML service (a networked data repository).
    • Any service speaking SyncML should be able to synchronize with any SyncML-capable device.
    • The protocol must address the limitations of mobile devices, specifically with respect to memory storage.
    • It must support a variety of transport protocols such as HTTP, SMTP, Bluetooth and others.
    • It must deliver common synchronization commands to all devices.
    • It builds upon existing web technologies, specifically XML.
    • It must support asynchronous communication and error handling, since the Internet has latency.

    SyncML consists of client and server commands enclosed within DTD-defined...

    SyncML Fundamentals

    Vocabulary

    Let's begin by defining a vocabulary:

    • Client - the mobile device, its application and local database.
    • Server - a remote system communicating with the system database or application.
    • Modifications - changes made to the data in the fields of a database.
    • Sync - an exchange of SyncML messages, with commands, between the client and server.
    • Package - SyncML DTD-conformant XML markup describing requests or actions to be taken by either a SyncML client or server. A package is a collection of actions to be performed.
    • Message - the smallest unit of SyncML markup. Large packages are broken into separate messages.
    • Mapping - using an intermediate identifier to tie two pieces of information together. Example: let's say 'green' is '5', and '5' is nice. What is nice? If you said 'green', you are correct. You've just done mapping!

    Abbreviations:

    IMEI - International Mobile Equipment Identity
    GUID - Globally Unique Identifier
    LUID - Locally Unique Identifier

    Messages and Packages

    SyncML messages are requests from either a client or a server to perform some action. The action may be to synchronize data, perform some checks on data, update a status, or handle any errors arising from these actions. Messages are bundled together into packages, which act as a kind of to-do list. The messages can be pieced together out of order, provided that enough mapping information is given to identify the package to which each message belongs.

    SyncML is designed this way to accommodate errors and dropped messages. Should one message be dropped, a SyncML client or server will know there is a problem because the mapping cannot be completed. It will then issue a request for the information to be resent. Once the data is received, the updates can proceed.
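    The reassembly idea described above can be sketched in a few lines of Python. This is only an illustration, not the SyncML wire format: the Message class, its final flag and the reassemble function are hypothetical names, and the sketch assumes that messages in a package are numbered consecutively from 1 and that the last one is marked as final.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    session_id: str   # which client/server dialogue the message belongs to
    msg_id: int       # position of the message within that dialogue
    final: bool       # hypothetical flag: True on the last message of a package
    commands: tuple   # the requests carried by this message


def reassemble(messages):
    """Order received messages by msg_id and report any gaps.

    Returns (ordered_commands, missing_ids). The package is complete only
    when missing_ids is empty.
    """
    if not messages:
        return [], []
    by_id = {m.msg_id: m for m in messages}
    finals = [m.msg_id for m in messages if m.final]
    # Without the final message we cannot know how long the package is,
    # so we treat the next id after the highest one seen as still missing.
    last_id = finals[0] if finals else max(by_id) + 1
    missing = [i for i in range(1, last_id + 1) if i not in by_id]
    ordered = [c for i in sorted(by_id) for c in by_id[i].commands]
    return ordered, missing


# Message 3 arrives before message 1, and message 2 never arrives.
received = [
    Message("104050403", 3, True, ("Replace",)),
    Message("104050403", 1, False, ("Status", "Add")),
]
commands, missing = reassemble(received)
print(commands)  # commands in msg_id order
print(missing)   # ids the receiver should ask to have resent
```

    A receiver built along these lines would hold back the updates until the list of missing ids is empty, which matches the resend-then-proceed behaviour described above.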

    Structure of a SyncML message

    Like a SOAP message, a SyncML message has two parts: a Sync Header <SyncHdr> and a Sync Body <SyncBody>. The header contains meta-information about the request, such as the target database <Target> and source database <Source> URIs, authentication information <Cred>, the session ID <SessionID>, the message ID <MsgID>, and the SyncML version declaration <VerDTD>. The body contains the actual requests, alerts and data.

    Addressing

    Addressing is done through the <Source> and <Target> tags, each of which holds its address in a <LocURI> tag. A server will have a familiar URI such as http://www.chris.syncml.org/sync, and a client mobile device will have an IMEI identification number such as 30400495959596904.

    Mapping

    SyncML is based on the idea that clients and servers can have their own way of mapping information in their databases. Therefore, clients and servers must each have their own set of unique identifiers.

    • Locally Unique Identifiers (LUIDs) are non-reusable numbers assigned by the SyncML client to data objects in a local database (such as a field or a row).
    • Globally Unique Identifiers (GUIDs) are numbers assigned by the server to data objects in a remote database.

    LUIDs and GUIDs only have to be unique within the table shared by two communicating parties. In other words, these numbers are temporary: they are used for mapping data to tables, and they exist only for the duration of the transactions between client and server.

    The server will create a mapping table to tie the LUID and GUID together.

    Client-side data

    LUID      Data
    ----      ----
    5         Green

    Server-side data

    GUID      Data
    ----      ----
    5050505   Green

    Server Mapping

    GUID      LUID
    ----      ----
    5050505   5
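    The three tables can be modelled with a pair of dictionaries. The following Python sketch is only an illustration: MappingTable and its methods are hypothetical names, and the values are the ones from the tables above.

```python
class MappingTable:
    """Server-side table tying each client LUID to the server's GUID."""

    def __init__(self):
        self._guid_by_luid = {}
        self._luid_by_guid = {}

    def add(self, guid, luid):
        self._guid_by_luid[luid] = guid
        self._luid_by_guid[guid] = luid

    def guid_for(self, luid):
        return self._guid_by_luid[luid]

    def luid_for(self, guid):
        return self._luid_by_guid[guid]


client_db = {"5": "Green"}          # LUID -> data
server_db = {"5050505": "Green"}    # GUID -> data

mapping = MappingTable()
mapping.add(guid="5050505", luid="5")

# The client reports a change to its item 5; the server uses the mapping
# to find its own copy of the record and updates it.
guid = mapping.guid_for("5")
server_db[guid] = "Blue"
print(server_db)
```

    Keeping the table on the server means that the client never needs to know the server's identifiers, and each side stays free to number its records in its own way.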
    

    Change Logs

    The Server and Client keep track of changes made to their databases through "change logs". SyncML does not define the change logs; it only requires that the changes and corrections be negotiated between client and server through messages. Using change logs, the Client and Server know which fields need to be updated. How an application that uses SyncML implements its change tracking is left undefined.

    Sync Anchors

    During synchronization, the Client and Server need to know which fields to update. If a client/server application checks the fields before updating or modifying them, how does it keep track of its current position in the database? The answer is "by using Sync Anchors".

    There are two kinds of anchors: Last and Next. The 'Last' anchor describes which updates occurred during the last synchronization event. The 'Next' anchor describes the current and future synchronization requests. These anchors describe the events from the standpoint of the sending device.

    Anchors are sent back and forth between client and server to keep track of what is happening to the database fields, and of what is going on overall, throughout the lifetime of the sync operation.

    By coordinating Sync Anchors and change logs with the type of sync that is requested, the server application can determine and track (with change logs) which information is the most up-to-date. For example, it is possible to overwrite 'newer' information (that is, information with the most recent time-stamp in the change log) with older information. This could be done by choosing a sync in which the client tells the server to overwrite its information with client data. This is called a 'refresh sync from client'. The types of syncs are described below.
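    One way a server might use the anchors is sketched below in Python. The sketch assumes that the server stores the client's 'Next' anchor after every successful sync and compares it with the 'Last' anchor that the client sends the next time. It also assumes that a mismatch is treated as a synchronization failure and answered with a slow sync. The function names and the time-stamp format of the anchors are illustrative only.

```python
def choose_sync_type(stored_next, client_last, requested="two-way"):
    """Decide which sync to run from the anchors a client sends.

    stored_next: the 'Next' anchor the server saved at the end of the
                 previous sync with this client (None if there was none).
    client_last: the 'Last' anchor the client sends in the new request.
    """
    if stored_next is None or stored_next != client_last:
        # The two sides disagree about the previous sync, so the change
        # logs cannot be trusted: compare every field instead.
        return "slow sync"
    return requested


def finish_sync(server_state, client_next):
    """After a successful sync, remember the client's 'Next' anchor."""
    server_state["stored_next"] = client_next


state = {"stored_next": None}
first = choose_sync_type(state["stored_next"], client_last=None)
finish_sync(state, client_next="20040406T101500Z")
second = choose_sync_type(state["stored_next"], client_last="20040406T101500Z")
third = choose_sync_type(state["stored_next"], client_last="20040401T090000Z")
print(first, second, third, sep=" | ")
```

    Here the first request falls back to a slow sync because nothing has been stored yet, the second proceeds as requested because the anchors agree, and the third falls back again because the client's 'Last' anchor does not match what the server remembers.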

    Syncs

    SyncML 1.1 defines seven types of sync:

    1. Two-way sync - the client and server exchange information about modified data. The client sends its modifications first.
    2. Slow sync - a two-way sync in which all fields in the database are checked on a field-by-field basis. This type of sync is used for the first sync, or after a synchronization failure.
    3. One-way sync, client only - the client sends its modified data. The server accepts and updates the data and does not send its own modifications.
    4. Refresh sync from client - the client sends its entire database to the server. The server does not sync. Rather, the server replaces the target database with the client's database.
    5. One-way sync, server only - the server sends its modified data. The client accepts and updates the data and does not send its own modifications.
    6. Refresh sync from server - the server sends all the information in its database to the client, replacing the client's database.
    7. Server alerted sync - the server remotely commands the client to initiate one of the above sync types with the server. In this way, the server remotely controls the client.
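    The seven types differ mainly in the direction in which data flows and in how much of the database is sent. The Python sketch below records those two properties as described in the list above. The names SyncType, SYNC_TYPES, server_alert and overwrites_receiver are hypothetical, and the sketch does not use the numeric codes that the specification assigns to each type.

```python
from typing import NamedTuple


class SyncType(NamedTuple):
    direction: str  # which way data flows
    scope: str      # "modified" fields only, or the "entire" database


SYNC_TYPES = {
    "two-way": SyncType("both", "modified"),
    "slow": SyncType("both", "entire"),
    "one-way from client": SyncType("client to server", "modified"),
    "refresh from client": SyncType("client to server", "entire"),
    "one-way from server": SyncType("server to client", "modified"),
    "refresh from server": SyncType("server to client", "entire"),
}


def server_alert(requested):
    """A server-alerted sync carries no data of its own: the server only
    names which of the six syncs above the client should start."""
    if requested not in SYNC_TYPES:
        raise ValueError(f"unknown sync type: {requested}")
    return {"initiated_by": "server", "client_should_start": requested}


def overwrites_receiver(name):
    """True when the receiving side's database is replaced wholesale."""
    sync = SYNC_TYPES[name]
    return sync.scope == "entire" and sync.direction != "both"


print(overwrites_receiver("refresh from client"))
print(overwrites_receiver("slow"))
print(server_alert("two-way"))
```

    Treating the server alerted sync as a wrapper around the other six, rather than as a seventh row in the table, reflects the fact that it only changes who starts the exchange.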

    Sync Initiation

    Sync Initiation is the process the client and server must go through before an actual synchronization. The first step is for the client and server to establish that they speak the same language by exchanging their capabilities (those of the device, such as the amount of memory, and those of the protocol, as defined by the DTD). The second step is identification of the databases to be synchronized. The third step is for the two to agree on the type of synchronization. The fourth and final step is authentication. Once this step is completed successfully, the synchronization activities can begin.


    Authentication

    The SyncML server can send the client a message containing the <Chal> tag, which represents an authentication challenge for the information the client is attempting to access. The client must then respond, giving the username and password within the <Cred> tag.

    SyncML can use MD5 digest access authentication. The Client and Server exchange credentials during the authentication process, returning error codes if the process breaks down at some point. The <Cred> tag in the <SyncHdr> holds the credentials to be used for authentication.

    Common SyncML implementations

    Nokia was the first company to make a SyncML-enabled phone, which synchronized its calendar database. SyncML can synchronize to-do lists, calendars, address books and phone books: pretty much anything an organizer holds. It is capable of much more, however, and is appropriate any time two disparate, remote applications need to share the same data.

    SyncML Syntax

    SyncML Example

    Abbreviated SyncML example

     1  <SyncML>
     2    <SyncHdr>
     3      <VerDTD>1.1</VerDTD>
     4      <VerProto>SyncML/1.1</VerProto>
     5      <SessionID>104050403</SessionID>
     6      <MsgID>5</MsgID>
     7      <Cred>...</Cred>
     8    </SyncHdr>
     9    <SyncBody>
    10      <Status>...</Status>
    11      <Sync>
    12        <Target>target database URI</Target>
    13        <Source>source database URI</Source>
    14        <Add>datafield and data</Add>
    15        <Replace>an existing data field with some data</Replace>
    16      </Sync>
    17    </SyncBody>
    18  </SyncML>
    

    Notice that lines {1} and {18} enclose the SyncML message in the root tags. Next, the SyncHdr is delimited by lines {2} and {8}. Within it, lines {3,4} define the versioning information, line {5} defines the SessionID, which distinguishes the unique dialogue occurring between the client and server applications, and line {6} gives the MsgID, which uniquely identifies this set of requests (this entire markup) to be performed by the receiving application. The credentials are also in the SyncHdr, on line {7}.

    The SyncBody begins on line {9}. This part of the SyncML message gives the device/application status {10}, the target and source URIs {12,13}, and the requested actions: the sync itself, between lines {11,16}, and the Add and Replace commands {14,15}.
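    Because a SyncML message is ordinary XML, a receiving application can pick the header fields and the requested commands out of it with any XML parser. The Python sketch below does so for the abbreviated example, using the standard library's ElementTree module. It assumes the simplified markup shown above. A complete SyncML message declares an XML namespace and nests more detail inside each command, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

MESSAGE = """\
<SyncML>
  <SyncHdr>
    <VerDTD>1.1</VerDTD>
    <VerProto>SyncML/1.1</VerProto>
    <SessionID>104050403</SessionID>
    <MsgID>5</MsgID>
    <Cred>...</Cred>
  </SyncHdr>
  <SyncBody>
    <Status>...</Status>
    <Sync>
      <Target>target database URI</Target>
      <Source>source database URI</Source>
      <Add>datafield and data</Add>
      <Replace>an existing data field with some data</Replace>
    </Sync>
  </SyncBody>
</SyncML>
"""

root = ET.fromstring(MESSAGE)

# Header: which dialogue this is, and which message within it.
header = root.find("SyncHdr")
session_id = header.findtext("SessionID")
msg_id = int(header.findtext("MsgID"))

# Body: everything inside <Sync> other than the two addresses is a
# requested action.
sync = root.find("SyncBody/Sync")
commands = [child.tag for child in sync if child.tag not in ("Target", "Source")]

print(session_id, msg_id, commands)
```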

    WBXML and SyncML

    WAP Binary XML (WBXML) is a form of XML whereby the XML tags are abbreviated in order to shorten the markup for transmission to mobile devices, which commonly have bandwidth and memory limitations. The XML tags are encoded into a binary shorthand to save space. Let's take a look at an example so that this will make more sense.

    The following is WBXML binary code depicting a SyncML message. Notice that the first line contains the document type definition, represented here in hexadecimal tokens. Can you see what happens to the following string? "//SYNCML//DTD SYNCML 1.1//EN"

    [Figure: wbxml]

    Immediately following this string are the bytes '6D 6C 71'. Each of these represents a SyncML tag.

    wbxml abbreviations

    6D = "<SyncML>"
    6C = "<SyncHdr>"
    71 = "<VerDTD>"

    wbxml abbreviations (cont.)

    C3 = tells the SyncML processor that this is the beginning of opaque (xml) data
    03 = the length of this opaque data
    "1" "." "1" = the characters "1" followed by "." and "1"
    01 = represents "</VerDTD>"


    All together this WBXML code snippet, 6D6C71C303"1.1"01 represents:

    SyncML header snippet

    <SyncML>
      <SyncHdr>
        <VerDTD>1.1</VerDTD>
    

    So you can see how using WBXML shorthand would be a more compact means of representing XML, saving bandwidth for mobile devices.

    For more information, please refer to Edd Dumbill's articles on SyncML with WBXML, listed in the References below.

    SyncML specifications

    The best source of information on SyncML is the protocol itself. Visit the Open Mobile Alliance for the SyncML specifications.

    Open Mobile Alliance

    Download OMA SyncML Specifications and white papers at the Open Mobile Alliance. Or check out the SyncML Articles at the Open Mobile Alliance.

    SyncML Implementations

    Although the SyncML specifications are useful, you still have to implement the protocol in your application. There are a few toolkits and implementations out there that you can use to get a head start.

    SyncML Reference Toolkit

    The Open Mobile Alliance has released a toolkit written in C to demonstrate SyncML. A sample application that uses the toolkit is also available, with documentation in German.

    Funambol

    Interested in developing SyncML for Java? Check out the open source project Funambol. It offers a Java and C++ SDK that implements the SyncML data synchronization protocol, a Java-based application framework for building SyncML server applications, and a standalone SyncML server.

    Summary

    Mobile device technology is improving and changing at a rapid pace. As US telecommunication companies implement third-generation (3G) WCDMA (wide-band code-division multiple access) technology, or wireless broadband, we will begin to see powerful devices emerge on the market. These devices will be able to deliver full color, video, streaming multimedia and a variety of data services such as Multimedia Messaging Service (MMS) through WAP. Because infrastructure is becoming cheaper, these telecommunication companies are starting to shift towards being service providers and media vendors as opposed to communications utilities. Cingular Wireless's multimedia messaging and ringtone services are a good example of a company shifting towards being a media platform. The companies that survive will be the ones that listen to customers' needs and make easy-to-use services.

    Telecommunications companies can add value to their services by creating custom applications and services that use SyncML for synchronization.


    Exercises

    1. Visit the Open Mobile Alliance website, download the PDF of the SyncML v. 1.1 protocol and review it. Reading this reference is a valuable exercise in learning.
    2. Answer these questions:
      • What is WBXML and why is it used?
      • How do you foresee SyncML being used in the future?
      • Name a problematic situation whereby SyncML is the best 'tool' for the job.

    Answers:

    2a) WBXML is WAP Binary XML, a form of XML whereby the XML tags are abbreviated in order to shorten the markup for transmission to mobile devices, which commonly have bandwidth and memory limitations.

    2b) SyncML will likely be used as a general, standard mechanism for synchronizing data sets between systems, not just for mobile devices.

    2c) A ticket-tracking system called TNT helpdesk is a web-based open work request management system. The staff running this system would like to have live data from this system on their PDAs, listing open requests. Currently, the PDA database is synced via a docking sync station attached to the staff members' PCs. Staff members have to download the request list as a CSV file, convert it into a usable PDA database and upload it to the PDA, making this process cumbersome, prone to error, and always out of date. Recommendation: create a custom app to push live updates to the PDAs using SyncML over Bluetooth/wireless.

    References

    Dumbill, E. (2002, January 1). XML Watch: Have data, will travel. IBM.com. Retrieved April 6, 2004 from
    http://www-106.ibm.com/developerworks/xml/library/x-synchml/index.html
    Dumbill, E. (2003, March 1). XML Watch: WBXML and basic SyncML server requirements. IBM.com. Retrieved April 6, 2004 from
    http://www-106.ibm.com/developerworks/xml/library/x-syncml2.html
    How [SyncML] works (n.d.). Nokia.com. Retrieved April 6, 2004 from
    http://www.nokia.com/nokia/0,8764,2559,00.html
    The New SyncML Standard. Cellular Dot Co Dot Za Website. Retrieved April 6, 2004 from
    http://www.cellular.co.za/syncml.htm
    Open Mobile Alliance (2002, April 2). SyncML version 1.0, 1.1 specification, white paper, errata. Retrieved April 6, 2004 from
    http://www.openmobilealliance.org/tech/affiliates/syncml/syncmlindex.html
    Pabla, C. (2002, April 1). SyncML Intensive: A beginner's look at the SyncML protocol and procedures. IBM.com. Retrieved April 6, 2004 from
    http://www-106.ibm.com/developerworks/xml/library/wi-syncml2/
    SyncML Initiative, Ltd. (2000, December 7). SyncML Specification Protocol version 1.0. The Open Mobile Alliance. Retrieved April 6, 2004 from
    http://www.openmobilealliance.org/tech/affiliates/syncml/syncml_represent_v10_20001207.pdf
    SyncML Initiative, Ltd. (2002, February 15). SyncML Device Information DTD version 1.1. Retrieved April 6, 2004 from
    http://www.openmobilealliance.org/tech/affiliates/syncml/syncml_devinf_v11_20020215.pdf
    Saarilahti, A., Group SyncML, et al. (2001, April 23). Tik-76.115 Short introduction to SyncML. Retrieved April 6, 2004 from
    http://www.hut.fi/u/asaarila/syncml/syncml_intro.html
    Stemberger, S. (2002, October). Syncing Data: An introduction to SyncML. IBM.com. Retrieved April 6, 2004 from
    http://www-106.ibm.com/developerworks/wireless/library/wi-syncml/
    Synchronica Software GmbH (n.d.). SyncML for Microsoft Exchange. Synchronica Software Website. Retrieved May 24, 2004 from
    http://www.synchronica.com/products/syncml/corporate_syncml.html
    Weblicon Technologies AG (n.d.). SyncML for SunOne. Weblicon Technologies AG Website. Retrieved April 6, 2004 from
    http://www.weblicon.net/html/products_syncml.html
    XML Cover Pages (2003, April 29). The SyncML Initiative. XML Cover Pages Dot Org Website. Retrieved April 6, 2004 from
    http://xml.coverpages.org/syncML.html






    Learning objectives
    • define SVG and its purpose
    • discuss differences between raster graphics and SVG
    • define similarities and differences between Flash and SVG
    • create a simple SVG document
    Initiated by:

    The University of Georgia

    Terry College of Business

    Department of Management Information Systems

    What is SVG?

    Based on XML, Scalable Vector Graphics (SVG) is an open-standard vector graphics file format and Web development language created by the W3C, and has been designed to be compatible with other W3C standards such as DOM, CSS, XML, XSLT, XSL, SMIL, HTML, and XHTML. SVG enables the creation of dynamically generated, high-quality graphics from real-time data. SVG allows you to design high-resolution graphics that can include elements such as gradients, embedded fonts, transparency, animation, and filter effects.

    SVG files are different from raster or bitmap formats such as GIF and JPEG, which have to include every pixel needed to display a graphic. Because of this, GIF and JPEG files tend to be bulky, are limited to a single resolution, and consume large amounts of bandwidth. SVG files are typically much smaller than their raster counterparts. Additionally, the use of vectors means SVG graphics retain their resolution at any zoom level. SVG allows you to scale your graphics, use any font, and print your designs, all without compromising resolution. SVG is XML-based and written in plain text, meaning SVG code can be edited with any text editor. SVG also offers important advantages over bitmap or raster formats, such as:

    • Zooming: Users can magnify their view of an image without negatively affecting the resolution.
    • Text stays text: Text remains editable and searchable. Additionally, any font may be used.
    • Small file size: SVG files are typically smaller than other Web-graphic formats and can be downloaded more quickly.
    • Display independence: SVG images always appear crisp on your screen, no matter the resolution. You will never experience “pixelated” images.
    • Superior color control: SVG offers a palette of 16 million colors.
    • Interactivity and intelligence: Since SVG is XML-based, it offers dynamic interactivity that can respond to user actions.

    Data-driven graphics

    Because it is written in XML, SVG content can be linked to back-end business processes, databases, and other sources of information. SVG documents use existing standards such as Cascading Stylesheets (CSS) and Extensible Stylesheet Language (XSL), enabling graphics to be easily customized. This results in:

    • Reduced maintenance costs: Because SVG allows image attributes to be changed dynamically, it eliminates the need for numerous image files. SVG allows you to specify rollover states and behaviors via scriptable attributes. Complex navigation buttons, for example, can be created using only one SVG file where normally this would require multiple raster files.
    • Reduced development time: SVG separates the three elements of traditional Web workflow – content (data), presentation (graphics), and application logic (scripting). With raster files, entire graphics must be completely recreated if changes are made to content.
    • Scalable server solutions: Both the client and the server can render SVG graphics. Because the “client” can be utilized to render the graphic, SVG can reduce server loads. Client-side rendering can enhance the user-experience by allowing users to “zoom in” on an SVG graphic. Additionally, the server can be used to render the graphic if the client has limited processing resources, such as a PDA or cell phone. Either way the file is rendered, the source content is the same.
    • Easily updated: SVG separates design from content, allowing easy updates to either.
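    As an illustration of a data-driven graphic, the Python sketch below turns a small set of figures into an SVG bar chart using the standard library's ElementTree module. The data, the bar_chart function, the dimensions and the color are all made up for this sketch. In practice the values would come from a database or another back-end source.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def bar_chart(values, bar_width=40, gap=10, height=120):
    """Return an SVG document (as text) with one bar per data value."""
    # Serialize SVG elements without a namespace prefix.
    ET.register_namespace("", SVG_NS)
    width = len(values) * (bar_width + gap) + gap
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": str(width), "height": str(height)})
    tallest = max(values.values())
    for index, (label, value) in enumerate(values.items()):
        # Scale each bar against the tallest one, leaving room for labels.
        bar_height = round(value / tallest * (height - 20))
        x = gap + index * (bar_width + gap)
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(x), "y": str(height - 20 - bar_height),
            "width": str(bar_width), "height": str(bar_height),
            "fill": "steelblue",
        })
        text = ET.SubElement(svg, f"{{{SVG_NS}}}text",
                             {"x": str(x), "y": str(height - 5)})
        text.text = label
    return ET.tostring(svg, encoding="unicode")


sales = {"Q1": 30, "Q2": 45, "Q3": 60}
document = bar_chart(sales)
print(document)
```

    Changing the figures and running the script again yields an updated graphic with no image editing, which is the separation of content from presentation described above.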

    Interactive graphics

    SVG allows you to create Web-based applications, tools, or user interfaces. Additionally, you can incorporate scripting and programming languages such as JavaScript, Java, and Visual Basic. Any SVG element can be used to modify or control any other SVG or HTML element. Because SVG is text based, the text inside graphics can be translated for other languages quickly, which simplifies localization efforts. Additionally, if there is a connection to a database, SVG allows drill-down functionality for charts and graphs. This results in:

    • Improved end user experience: Users can input their own data, modify data, or even generate new graphics from two or more data sources.
    • In SVG, text is text: As mentioned previously, SVG treats text as text. This makes SVG-based graphics searchable by search engines.
    • SVG can create SVG: Enterprise applications such as an online help feature can be developed.

    Personalized graphics

    SVG can be targeted to people to overcome issues of culture, accessibility, and aesthetics, and can be customized for many audiences and demographic groups. SVG can also be dynamically generated using information gathered from databases or user interaction. The overall goal is to have one source file, which transforms seamlessly in a wide variety of situations. This results in:

    • One source, customized appearances: SVG makes it possible to change color and other properties based on aesthetics, culture, and accessibility issues. SVG can use stylesheets to customize its appearance for different situations.
    • Internationalization, localization: SVG supports Unicode characters in order to effectively display text in many languages and fashions – vertically, horizontally, and bi-directionally.
    • Utilizing existing standards: SVG works seamlessly with stylesheets in order to control presentation. Cascading Stylesheets (CSS) can be used for typical font characteristics as well as for other SVG graphic elements. For example, you can control the stroke color, fill color, and fill opacity of an element from an external stylesheet.

    SVG vs. Macromedia Flash

    Macromedia has been the dominant force behind vector-based graphics on the web for the last 10 years. It is apparent, however, that SVG provides alternatives to many of the functions of Flash and incorporates many others. The creation of vector-based graphical elements is the basis of both SVG and Flash. Much like Flash, SVG also includes the ability to create time-based animations for each element and allows scripting of elements via DOM, JavaScript, or any other scripting language that the SVG viewer supports. Many basic elements are available to the developer, including elements for creating circles, rectangles, lines, ellipses, polygons, and text. Much like HTML, elements are styled with Cascading Stylesheets (CSS2), using a style element or directly on a particular graphical element via the style attribute. Styling properties may also be specified with presentation attributes: for each CSS property applicable to an element, an XML attribute specifying the same styling property can also be used. There is an ongoing debate about whether Flash or SVG is better for web development. Both have advantages, and the choice usually comes down to the situation.

    Flash Advantages:

    • Use Flash if you want to make a Flash-like website – replicating the same effect using SVG is hard.
    • Use Flash if you want complex animations or complex games (SVG's built-in SMIL animation engine is extremely processor-intensive).
    • Use Flash if your users are unlikely to be very computer literate, for instance on a children's site or a site appealing to a wide audience.
    • Use Flash if sound is important – SVG/SMIL supports sound, but it's pretty basic.
    • Use Flash if you prefer WYSIWYG to script.

    SVG advantages:

    • It's fully scriptable, using a DOM1 interface and JavaScript. That means you can start with an empty SVG image, and build it up using JavaScript.
    • SVG can easily be created by ASP, PHP, Perl, etc., and extracted from a database.
    • It has a built-in ECMAScript (JavaScript) engine, so you don't have to code per browser, and you don't need to learn Flash's ActionScript.
    • SVG is XML, meaning it can be read by anything that can read XML. Flash can use XML, but needs to convert it before use.
    • This also allows SVG to be transformed through an XSLT stylesheet/parser.
    • SVG supports standard CSS1 stylesheets.
    • Text used in SVG remains selectable and searchable.
    • You only need a text editor to create SVG, as opposed to buying Flash.
    • SVG is a real web standard (not just “de facto”), supported by various programs, some of which are free software (and thus available for most free computer operating systems).

    Why use SVG?

    SVG is emerging through the efforts of the W3C and its members. It is an open standard and as such does not require the use of proprietary languages and development tools, as Macromedia Flash does. Because it is XML-based, it looks familiar to developers and allows them to use existing skills. SVG is text-based and can be learned by leveraging the work (or code) of others, which significantly reduces the overall learning curve. Additionally, because SVG can incorporate JavaScript, DOM, and other technologies, developers familiar with these technologies can create graphics in much the same way. SVG is also highly compatible because it works with HTML, GIF, JPEG, PNG, SMIL, ASP, JSP, and JavaScript. Finally, graphics created in SVG are scalable and do not lose quality across platforms and devices. SVG can therefore be used for the Web, in print, and on portable devices while retaining full quality.

    SVG Viewer

    The Adobe SVG Viewer

    The Adobe SVG Viewer is available as a downloadable plug-in that allows SVG to be viewed on Windows, Linux and Mac operating systems in all major browsers, including Internet Explorer (versions 4.x, 5.x, 6.x), Netscape (versions 4.x, 6.x), and Opera.

    The Adobe SVG Viewer is the most widely deployed SVG Viewer and it supports almost all of the SVG Specification, including support for the SVG DOM, animation and scripting.

    Features of the Adobe SVG Viewer

    Click the right mouse button (CTRL-Key + mouse click on a Mac) over your SVG image to get a context menu. The context menu gives you several options, which can all be accessed through the menu itself or through “hotkeys”:

    Table 1: Features of the Adobe SVG Viewer

    Function | Description
    Zoom In | Hold down the CTRL-Key (or Apple-Key) and drag your mouse to make a rectangle that specifies the area you will zoom to.
    Zoom Out | This works just like “Zoom In”, except that you press the CTRL-Key and the SHIFT-Key at the same time.
    Panning | Press the ALT-Key and move the mouse cursor while a hand icon appears.
    Copy SVG | The purpose of the SVG Viewer's “Copy SVG” option is to let users cut and paste graphics and/or source code into other applications. Using “Copy SVG”, developers are able to make a copy of the source code, which can be pasted into any text editor. Also, after selecting “Copy SVG” and switching to a desktop application such as MS Office, users can choose either the Edit/Paste option, which produces a snapshot of the SVG's DOM-tree code (this contains the current structure of the dynamic SVG image), or the Edit/Paste Special option, which translates the SVG into a bitmap image. These options are likely to improve and increase as support for SVG improves in other applications.
    View Source | The SVG Viewer's “View Source” menu option allows both compressed and uncompressed SVG source code to be viewed instantly as text in a new browser window. This is a very handy option for designers and developers.
    Save SVG as… | This option allows quick saving of SVG content to your local computer by popping up a “save SVG as” form that lets you enter the name and location of the file. In version 3 of the Adobe SVG Viewer, the option of saving as GZip-compressed SVG (.svgz) was added to the “save as” dialog box.

    SMIL

    The Synchronized Multimedia Integration Language (SMIL, pronounced “smile”) enables simple authoring of interactive audiovisual presentations. SMIL is typically used for “rich media”/multimedia presentations which integrate streaming audio and video with images, text or any other media type. SMIL is an easy-to-learn HTML-like language, and many SMIL presentations are written using a simple text editor. SMIL can be used with XML to enable video and sound when viewing an SVG.

    Attention Microsoft Windows Mozilla users!

    The SeaMonkey and Mozilla Firefox browsers have SVG support enabled natively. If desired, the Adobe SVG Viewer plug-in will also work with Mozilla Firefox or SeaMonkey. [4] WebKit-based browsers also have some native SVG support.

    Native SVG (Firefox)

    The Mozilla SVG implementation is a native SVG implementation. This is as opposed to plug-in SVG viewers such as the Adobe viewer (which is currently the most popular SVG viewer).

    Some of the implications of this are:

    • Mozilla can handle documents that contain SVG, MathML, XHTML, XUL, etc. all mixed together in the same 'compound' document. This is being made possible by using XML namespaces.
    • Mozilla is 'aware' of the SVG content. It can be accessed through the SVG DOM (which is compatible with the XML DOM) and manipulated by Mozilla's script engine.
    • Other Mozilla technologies can be used with SVG. XBL coupled with SVG is a particularly interesting combination. It can be used to create graphical widgets (I wonder when we'll see the first SVG-based chrome!) or to extend Mozilla to recognize other specialized languages such as CML (chemical markup language). There are samples of these more advanced usage patterns on http://croczilla.com/svg/.

    rsvg-view

    The rsvg-view program is part of the librsvg package[1]. It may be used as the default SVG opener. It can resize SVG files and export them to PNG, which is often the only thing one needs to do with an SVG file.[2]

    Creating SVG files

    How to do it

    One can use four groups of programs:

    • general text editors, like Notepad++ (with XML syntax highlighting)
    • specialized SVG editors
    • programs that can export SVG (like gnuplot or Maxima CAS)
    • your own programs that create SVG files directly by concatenating strings

    SVG editors

    As you can see from the previous example of a path definition, SVG files are written in an extremely abbreviated format to help minimize file size. However, they can be very difficult to write depending on the complexity of your image. There are SVG editor tools that can help make this task easier. Some of these tools are:

    Table 3: SVG Editors

    SVG Editor | Platform | Availability | Description
    Adobe Illustrator 10.0 | Mac OS 9.1/9.2/10.1, Win98/ME, Win2000/XP | Commercial product | Illustrator version 9.01 had SVG export capability. Version 10, announced recently, adds SVG import and enhances SVG export, including data-driven graphics.
    Sodipodi | Linux / UNIX | Open source (free, with source) | Fast vector graphics WYSIWYG editor.
    Adobe LiveMotion 2 | Win98/ME, Win2000/XP | Commercial product | Adobe LiveMotion is an authoring tool similar to Macromedia Flash. It had SVG export capability in an earlier version, but that support was withdrawn in Version 2. Even Adobe's support of SVG appears dubious.
    Beez | Win95/98/ME, WinNT/2000/XP | Free download | Beez is a WYSIWYG editor for creating a single animated SVG path, consisting of multiple Bézier curves, which can then be used in an SVG file. A nice utility for hand coders. It is an open-source project, hosted on SourceForge and written in Delphi.
    Corel Draw! | Win95/98/ME, WinNT/2000, Mac OS X version 11 | Commercial product | Has SVG import and export capability.
    Gill (Gnome Illustration Application) | Linux / UNIX (with Gnome) | Free, with source | Drawing program with SVG import and export; has a full DOM; continuously updated; can embed SVG in other Gnome programs (such as Gnumeric, the spreadsheet). See the CVS changelog for the latest status.
    IMS Web Dwarf | Win95/98/ME, WinNT/2000/XP | Free download | WYSIWYG editor; exports to either HTML or SVG.
    IMS Web Engine | Win95/98/ME, WinNT/2000/XP | 14-day trial downloadable | IMS Web Engine is an interactive animation editor and Web Top publisher for the creation of content-rich interactive Dynamic HTML and SVG.
    Inkscape | Linux, Windows, Mac | Free, with source | WYSIWYG editor, but allows editing the XML directly. No animation yet.

    C

    Here is an example in C:

    /*

    C console program based on:
    C++ code by Claudio Rocchini

    http://commons.wikimedia.org/wiki/File:Poincare_halfplane_eptagonal_hb.svg


    http://validator.w3.org/
    The uploaded document “circle.svg” was successfully checked as SVG 1.1.
    This means that the resource in question identified itself as “SVG 1.1”
    and that we successfully performed a formal validation using an SGML, HTML5 and/or XML
    Parser(s) (depending on the markup language used).

    */

    #include <stdio.h>
    #include <stdlib.h>

    const int  iXmax = 1000,   /* viewBox width in user units */
               iYmax = 1000,   /* viewBox height in user units */
               radius = 100,
               cx = 200,
               cy = 200;
    const char *black = "#000000", /* hexadecimal number as a string for svg color */
               *white = "#FFFFFF";

    /* write one circle with a black outline and a white interior */
    void draw_circle(FILE *FileP, int radius, int cx, int cy)
    {
        /* the arguments are integers, so they are printed with %d */
        fprintf(FileP, "<circle cx=\"%d\" cy=\"%d\" r=\"%d\" style=\"stroke:%s; stroke-width:2; fill:%s\"/>\n",
                cx, cy, radius, black, white);
    }

    int main(void)
    {
        const char *filename = "circle.svg";
        const char *comment = "<!-- sample comment in SVG file  \n can be multi-line -->";
        FILE *fp = fopen(filename, "w");

        if (fp == NULL) {
            perror(filename);
            return EXIT_FAILURE;
        }

        /* XML declaration, comment, and opening <svg> tag */
        fprintf(fp,
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n"
                "%s\n"
                "<svg width=\"20cm\" height=\"20cm\" viewBox=\"0 0 %d %d\"\n"
                " xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
                comment, iXmax, iYmax);

        draw_circle(fp, radius, cx, cy);

        fprintf(fp, "</svg>\n");
        fclose(fp);
        printf(" file %s saved \n", filename);
        return 0;
    }
    

    Haskell

    Haskell code: Lavaurs' algorithm in Haskell with SVG output, by Claude Heiland-Allen

    JavaScript

    SVG image with JavaScript code

    Matlab

    Based on code by Guillaume JACQUENOT:[3]

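    % filename is assumed to already hold the base name of the output file
    filename = [filename '.svg'];
    fid = fopen(filename,'w');
    fprintf(fid,'<?xml version="1.0" standalone="no"?>\n');
    fprintf(fid,'<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"\n');
    fprintf(fid,'"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n');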
    fprintf(fid,'<svg width="620" height="620" version="1.1"\n');
    fprintf(fid,'xmlns="http://www.w3.org/2000/svg">\n');
    fprintf(fid,'<circle cx="100" cy="100" r="10" stroke="black" stroke-width="1" fill="none"/>\n');
    fprintf(fid,'</svg>\n');
    fclose(fid);
    

    Lisp

    One can use the cl-svg library or write one's own procedure.

    Maxima

    BeginSVG(file_name,cm_width,cm_height,i_width,i_height):=
    block(
    destination : openw (file_name),
    printf(destination, "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>~%"),
    printf(destination,"<svg width=\"~dcm\" height=\"~dcm\" viewBox=\"0 0 ~d ~d\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">~%",
    cm_width,cm_height,i_width,i_height),
    return(destination)
    );
    CircleSVG(dest,center_x,center_y,_radius):=printf(dest,"<circle cx=\"~d\" cy=\"~d\" r=\"~d\" fill=\"white\" stroke=\"black\" stroke-width=\"2\"/>~%",
    center_x,center_y,_radius);
    CloseSVG(destination):=
    (
    printf(destination,"</svg>~%"),
    close (destination)
    );
    /* ---------------------------------------------------- */
    cmWidth:10;
    cmHeight:10;
    iWidth:800;
    iHeight:600;
    radius:200;
    centerX:400;
    centerY:300;
    f_name:"b.svg";
    /* ------------------------------------------------------*/
    f:BeginSVG(f_name,cmWidth,cmHeight,iWidth,iHeight);
    CircleSVG(f,centerX,centerY,radius);
    CloseSVG(f);
    

    Python

    Image with Python code
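
    As a rough Python counterpart to the C and Maxima examples above, the following sketch builds the same kind of one-circle SVG document by string formatting. The function names circle_svg and save_circle, the default viewBox size, and the colors are illustrative assumptions, not code taken from the linked image.

```python
def circle_svg(cx, cy, r, width=1000, height=1000):
    """Return a complete SVG document (as a string) containing one circle.

    width and height are the viewBox size in user units (assumed defaults).
    """
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
        f'<svg width="20cm" height="20cm" viewBox="0 0 {width} {height}"\n'
        ' xmlns="http://www.w3.org/2000/svg" version="1.1">\n'
        # black outline, white interior
        f'<circle cx="{cx}" cy="{cy}" r="{r}" '
        'style="stroke:#000000; stroke-width:2; fill:#FFFFFF"/>\n'
        '</svg>\n'
    )


def save_circle(path, cx, cy, r):
    """Write the one-circle document to the file named by path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(circle_svg(cx, cy, r))
```

    Calling save_circle("circle.svg", 200, 200, 100) would write a file comparable to the one produced by the C program.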

    Getting started

    Because it is based on XML, SVG follows standard XML conventions. Every SVG document has an <svg> element as its root element. SVG can be embedded within a parent document or used independently. For example, the following shows an independent SVG document:

    Exhibit 1: Creating an SVG

    <?xml version="1.0" standalone="no"?>
    <svg width="100%" height="100%" version="1.1" xmlns="http://www.w3.org/2000/svg">
        ...
    </svg>
    

    The first line declares that the code that follows is XML. Note the “standalone” attribute: standalone="no" indicates that the document may depend on markup declarations defined outside the file, such as an external DTD.

    In SVG 1.0 and 1.1 documents, a second line often provides a reference to the Document Type Definition, or DTD; the exhibit above omits it. As mentioned in Chapter 7: XML Schemas, the DTD is an alternate way to define the data contained within an XML instance document. Developers familiar with HTML will notice that the DTD declaration is similar to that of an HTML document, but it is specific to SVG. For more information about DTDs, visit: http://www.w3schools.com/dtd/dtd_intro.asp

    Hint: Many IDEs (e.g., NetBeans) do not have SVG “templates” built in to the tool. Therefore, it may be easier to use a simple text editor when creating SVG documents. Once you have an SVG viewer installed, you should be able to open and view your SVG document with any browser. When creating your SVG documents, remember to:

    • Declare your document as an XML file
    • Make sure your SVG document elements are between <svg> element tags, including the SVG namespace declaration.
    • Save your file with a .svg file extension.
    • It is not necessary to include a DOCTYPE statement, which identifies the file as an SVG document (from SVG 1.2 onward there is no longer a DTD to reference).[4][5][6]

    The <svg> element on line 2 defines the SVG document and can specify, among other things, the user coordinate system and various CSS unit specifiers. Just as with XHTML documents, the document element must include a namespace declaration to declare the element as being a member of the relevant namespace (in this case, the SVG namespace). Within the <svg> element, there can be three types of drawing elements: text, shapes, and paths.
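
    Part of the checklist above can be tested mechanically. The following is a minimal sketch, using Python's standard xml.etree.ElementTree module, that checks whether a string is well-formed XML whose root element is <svg> in the SVG namespace. The function name is_svg_document is an assumption made for illustration, and the check assumes the text is UTF-8; it does not validate the drawing elements themselves.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def is_svg_document(text):
    """Return True if text is well-formed XML with an <svg> root in the SVG namespace."""
    try:
        # Parse from bytes so that an XML declaration with an encoding is accepted.
        root = ET.fromstring(text.encode("utf-8"))
    except ET.ParseError:
        return False  # not well-formed XML
    # ElementTree writes namespaced tags as "{namespace}localname".
    return root.tag == f"{{{SVG_NS}}}svg"
```

    A document that omits the xmlns declaration is still well-formed XML, but this check rejects it because its root element is not in the SVG namespace.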

    Text

    The following is an example of the text element:

    Exhibit 2: Using text with SVG

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <svg width="5.5in" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" height="0.5in">
       <text y="15" fill="red">This is SVG.</text>
    </svg>
    


    The <svg> element on line 2 specifies: 1) that white space within text elements will be retained, and 2) the width and height of the SVG document, which is particularly important for specifying print output size. In this example, the text is positioned in an image area 5.5 inches wide by 0.5 inches tall. The “y” attribute on line 3 declares that the text element’s baseline is 15 pixels down from the top of the SVG document. An omitted “x” attribute on a text element implies an x coordinate of 0.

    Because SVG 1.1 is defined by a W3C DTD, you can use the W3C Validator to validate your document. Notice that the presentation attribute fill="red" is used to describe the presentation of the text element. The text could equivalently have been given a red color by use of a style attribute, style="fill:red".

    Shapes

    SVG contains the following basic shape elements:

    • Rectangles
    • Circles
    • Ellipses
    • Lines
    • Polylines
    • Polygons

    These basic shapes, along with “paths”, which are covered later in the chapter, constitute the graphic shapes of SVG. Because this is an introduction to SVG, we will cover only rectangle, circle, and polygon shapes here. For more information on all shapes, please visit: http://www.w3schools.com/svg/default.asp

    Rectangles

    The <rect> element defines a rectangle which is axis-aligned with the current user coordinate system, the coordinate system that is currently active and which is used to define how coordinates and lengths are located and computed on the current canvas. Rounded rectangles can be created by setting values for the rx and ry attributes.

    The following example produces a blue rectangle with its top left corner aligning with the top left corner of the image area. This utilizes the default value of “0” for the x and y attributes.

    Exhibit 3: Creating a rectangle in SVG

    <?xml version="1.0"?>
    <svg xmlns="http://www.w3.org/2000/svg" width="5.5in" height="2in">
        <rect fill="blue" width="250" height="200"/>
    </svg>
    

    It will produce this result:

    Example of SVG use of rectangle shape

    Circles

    A circle element is described by three attributes: cx, cy, and r. The cx and cy values specify the location of the center of the circle, while the r value specifies the radius. If the cx and cy attributes are not specified, the circle's center point is assumed to be (0, 0). Unlike cx and cy, the r attribute is not optional and must be specified; if r is set to zero, the circle will not appear. In addition, the stroke attribute creates an outline of the shape. Both the width and the color of the outline can be changed.

    Exhibit 4: Creating a circle in SVG

    <?xml version="1.0"?>
    <svg xmlns="http://www.w3.org/2000/svg" width="350" height="300">
        <circle cx="100" cy="50" r="40" stroke="darkslategrey" stroke-width="2" fill="red"/>
    </svg>
    

    It will produce this result:

    Example of SVG use of circle shape

    Polygons

    A polygon is any geometric shape consisting of three or more sides. The 'points' attribute lists the (x,y) coordinates of the corner points of the polygon. In this specific example there are three points, so a triangle is produced.

    Exhibit 5: Creating a Polygon in SVG

    <?xml version="1.0" standalone="no"?>
    <svg width="100%" height="100%" version="1.1" xmlns="http://www.w3.org/2000/svg">
       <polygon points="220,100 300,210 170,250" style="fill:blue;stroke:red;stroke-width:2"/>
    </svg>
    

    It will produce this result:

    XML example polygon.svg
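
    Writing the 'points' attribute by hand becomes tedious for polygons with many corners. As a hedged illustration, the following Python sketch computes a 'points' value for a regular polygon. The function name regular_polygon_points, the choice to start at the top corner, and the one-decimal rounding are all assumptions made for this example.

```python
import math


def regular_polygon_points(cx, cy, radius, sides):
    """Return a 'points' attribute value for a regular polygon centered at (cx, cy)."""
    if sides < 3:
        raise ValueError("a polygon needs at least three corners")
    points = []
    for i in range(sides):
        # Start at the top corner; y grows downward in SVG, so -pi/2 points up.
        angle = 2 * math.pi * i / sides - math.pi / 2
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        points.append(f"{x:.1f},{y:.1f}")
    return " ".join(points)
```

    The returned string can be placed directly into the points attribute of a <polygon> element.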

    Paths

    Paths are used to draw your own shapes in SVG. A path is described by the following commands, which are written in the d (data) attribute of a <path> element:

    Table 2: SVG Paths

    Command name | Letter | Parameters | Function | Description
    Moveto | M | x y | Set a new current point | Start a new sub-path at the given (x,y) coordinate.
    Lineto | L | x y | Draw a straight line | Draw a line from the current point to the given (x,y) coordinate, which becomes the new current point.
    Horizontal lineto | H | x | Draw a horizontal line | Draws a horizontal line from the current point (cpx, cpy) to (x, cpy).
    Vertical lineto | V | y | Draw a vertical line | Draws a vertical line from the current point (cpx, cpy) to (cpx, y).
    Curveto | C | x1 y1 x2 y2 x y | Draw a curve using a cubic Bézier | Draws a cubic Bézier curve from the current point to (x,y) using (x1,y1) as the control point at the beginning of the curve and (x2,y2) as the control point at the end of the curve.
    Smooth curveto | S | x2 y2 x y | Draw a shorthand/smooth curve using a cubic Bézier | Draws a cubic Bézier curve from the current point to (x,y). The first control point is assumed to be the reflection of the second control point on the previous command relative to the current point. (x2,y2) is the second control point (i.e., the control point at the end of the curve).
    Quadratic Bézier curveto | Q | x1 y1 x y | Draw a quadratic Bézier curve | Draws a quadratic Bézier curve from the current point to (x,y) using (x1,y1) as the control point.
    Smooth quadratic Bézier curveto | T | x y | Draw a shorthand/smooth quadratic Bézier curve | Draws a quadratic Bézier curve from the current point to (x,y). The control point is assumed to be the reflection of the control point on the previous command relative to the current point.
    Elliptical arc | A | rx ry x-axis-rotation large-arc-flag sweep-flag x y | Draw an elliptical or circular arc | Draws an elliptical arc from the current point to (x, y). The size and orientation of the ellipse are defined by two radii (rx, ry) and an x-axis-rotation, which indicates how the ellipse as a whole is rotated relative to the current coordinate system. The center (cx, cy) of the ellipse is calculated automatically to satisfy the constraints imposed by the other parameters. large-arc-flag and sweep-flag contribute to the automatic calculations and help determine how the arc is drawn.
    Closepath | Z | (none) | Close the current path by drawing a line to the last moveto point | Close the current sub-path by drawing a straight line from the current point to the current sub-path’s initial point.

    The following example produces the shape of a triangle. The “M” indicates a “moveto” to set the first point. The “L” indicates “lineto” to draw a line from “M” to the “L” coordinates. The “Z” indicates a “closepath”, which draws a line from the last set of L coordinates back to the M starting point.

    Exhibit 6: Creating paths in SVG

    <?xml version="1.0"?>
    <svg xmlns="http://www.w3.org/2000/svg" width="5.5in" height="2in">
        <path d="M 50 10 L 350 10 L 200 120 z"/>
    </svg>
    

    It produces this result:

    Example of SVG use of paths
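
    The moveto, lineto, and closepath sequence described above can be generated from a list of corner points. The following Python sketch shows one way to do this; the function name polygon_path and the closed parameter are hypothetical names chosen for illustration.

```python
def polygon_path(points, closed=True):
    """Build path data: moveto the first point, lineto each later point, then closepath."""
    if not points:
        raise ValueError("at least one point is required")
    x0, y0 = points[0]
    commands = [f"M {x0} {y0}"]                               # set the starting point
    commands += [f"L {x} {y}" for x, y in points[1:]]         # straight line segments
    if closed:
        commands.append("Z")                                  # back to the starting point
    return " ".join(commands)
```

    For the three corners used in Exhibit 6, polygon_path([(50, 10), (350, 10), (200, 120)]) yields path data equivalent to the d attribute shown there (the exhibit writes the closepath command in lower case, which has the same effect).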

    Validation

    After creating a file, check its code with the W3C Validator.[7]

    Optimisation

    Even code without errors can be improved. For example, grouping elements makes code shorter.
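
    One way to read this remark: presentation attributes shared by several shapes can be written once on a g element, because properties such as stroke and fill are inherited by child elements. The Python sketch below generates the same three circles both ways so the lengths can be compared. The circle coordinates and the names STYLE, CIRCLES, ungrouped, and grouped are made up for this illustration.

```python
# Shared presentation attributes (illustrative values).
STYLE = 'stroke="black" stroke-width="2" fill="none"'
# (cx, cy, r) for three hypothetical circles.
CIRCLES = [(50, 50, 40), (150, 50, 40), (250, 50, 40)]


def ungrouped(circles):
    """Repeat the presentation attributes on every circle."""
    return "\n".join(
        f'<circle cx="{cx}" cy="{cy}" r="{r}" {STYLE}/>' for cx, cy, r in circles
    )


def grouped(circles):
    """Write the presentation attributes once, on an enclosing g element."""
    body = "\n".join(
        f'  <circle cx="{cx}" cy="{cy}" r="{r}"/>' for cx, cy, r in circles
    )
    return f"<g {STYLE}>\n{body}\n</g>"
```

    With these three circles the grouped form is the shorter of the two, and the saving grows with the number of shapes that share the same attributes.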

    Including SVG in HTML

    There are three methods to include SVG in an HTML document. In each case, the SVG document is first created as a stand-alone file and is then referenced in the HTML document using one of the following commands (HTML elements):

    Table 4: Including SVG in HTML

    Command | Advantages | Disadvantages
    <embed> | 1. Supported in nearly any browser; 2. Allows html2svg and svg2html scripting; 3. Recommended by Adobe for their SVG Viewer | 1. Not clearly standardized within any HTML specification
    <object> | 1. HTML4 and higher standard; 2. Supported in newer browser generations | 1. Works on newer browsers but without html2svg and svg2html scripting
    <iframe> | 1. Works in most browsers, but not documented | 1. Generates a window-like border without specific styling

    Embed

    The syntax is as follows:

    Exhibit 7: Embedding SVG into HTML using keyword embed

    <embed src="canvas.svg" width="350" height="176" type="image/svg+xml" name="emap">
    

    An additional attribute, “pluginspage”, can be set to the URL where the plug-in can be downloaded:

    pluginspage="http://www.adobe.com/svg/viewer/install/main.html"

    Object

    The syntax is as follows and conforms to the HTML 4 Strict specification:

    Exhibit 8: Embedding SVG into HTML using keyword object

    <object type="image/svg+xml" name="omap" data="canvas_norelief.svg" width="350" height="176"></object>
    

    Between the opening and the closing <object> tags, information for browsers that do not support objects can be added:

    <object ...>You should update your browser</object>
    

    Unfortunately some browsers such as Netscape Navigator 4 do not show this alternative content if the type attribute has been set to something other than text/html.
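
    When pages are generated by a program, the object element and its fallback text can be produced together. The following Python sketch is one possible helper; the function name svg_object_tag and its parameters are assumptions for illustration, and the standard html.escape function is used so that special characters in the file name or fallback text do not break the markup.

```python
from html import escape


def svg_object_tag(src, width, height, fallback="You should update your browser"):
    """Return an <object> element that references an SVG file and carries fallback text."""
    return (
        f'<object type="image/svg+xml" data="{escape(src, quote=True)}" '
        f'width="{width}" height="{height}">{escape(fallback)}</object>'
    )
```

    For example, svg_object_tag("canvas_norelief.svg", 350, 176) produces an element like the one in Exhibit 8 with the fallback message placed between the tags.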

    Iframe

    The syntax is as follows and conforms to the HTML 4 Transitional specification:

    Exhibit 9: Embedding SVG into HTML using keyword iframe

    <iframe src="canvas_norelief.svg" width="350" height="176" name="imap"></iframe>
    

    Between the opening and the closing <iframe> tags, information for browsers that do not support iframes can be added:

    <iframe ...>You should update your browser</iframe>
    

    Creating 3D SVG images

    Section by Charles Gunti, UGA Master of Internet Technology Program, Class of 2007

    Sometimes we may want to view an SVG image in three dimensions. For this we will need to change the viewpoint of the graphic. So far we have created two-dimensional graphics, such as circles and squares, which exist on a simple x, y plane. If we want to look at something in three dimensions, we have to add the z axis. The z axis is already there, but we are looking straight along it, so if data is changed on z it doesn't look any different to the viewer. We need to add another parameter to the data file, the z parameter.

    <?xml version="1.0"?>
      <data>
      <subject x_axis="90" y_axis="118" z_axis="0" color="red" />
      <subject x_axis="113" y_axis="45" z_axis="75" color="purple" />
      <subject x_axis="-30" y_axis="-59" z_axis="110" color="blue" />
      <subject x_axis="60" y_axis="-50" z_axis="-25" color="yellow" />
    </data>
    

    Once we have the data we will use XSLT to create the SVG file. The SVG stylesheet is the same as other stylesheets, but we need to ensure an SVG file is created during the transformation. We call the SVG namespace with this line in the declarations:

     xmlns="http://www.w3.org/2000/svg"
    

    Another change we should make from previous examples is to change the origin of (0, 0). We change the origin in this example because some of our data is negative. The default origin is at the upper left corner of the SVG graphic, with x increasing to the right and y increasing downward. Points with negative coordinates are therefore not displayed: unlike on a traditional coordinate plane, they fall above or to the left of the visible area. To move the origin we simply add a line of code to the stylesheet. Before going over that line, let's look at the g element. The container element, g, is used for grouping related graphics elements. Here, we'll use g to group together our graphical elements and then apply the transform. Here is how we declare g and move the origin to a point 300 pixels to the right and 300 pixels down:

     <g transform="translate(300,300)">graphical elements</g>
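
    To see why the translation helps, the effect of translate(300,300) can be worked out by hand or with a few lines of Python. In the sketch below, the (x, y) pairs are taken from the sample data file above, while the function names translate and is_visible and the 600 by 600 canvas size are assumptions made for illustration.

```python
# (x, y) pairs from the sample data file.
points = [(90, 118), (113, 45), (-30, -59), (60, -50)]


def translate(point, dx=300, dy=300):
    """Apply translate(dx, dy) to one point: shift it right by dx and down by dy."""
    x, y = point
    return (x + dx, y + dy)


def is_visible(point, width, height):
    """True if the point lies inside a canvas whose origin is the upper left corner."""
    x, y = point
    return 0 <= x <= width and 0 <= y <= height
```

    Before the translation, the points with negative coordinates fall outside the assumed canvas; afterwards all four sample points lie inside it.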
    

    SVG transformations are pretty simple, until it comes to changing the viewpoint. SVG has features such as rotating and skewing the image in two dimensions, but it cannot rotate the coordinate system in three dimensions. For that we will need to use some math and a little Java. When rotating in three dimensions two rotations need to be made, one around the y axis, and another around the x axis. The first rotation will be around the y axis and the formula will look like this:

    z' = z*cos(-Az) - x*sin(-Az) (Az is the angle by which the z axis will be rotated)

    x' = z*sin(-Az) + x*cos(-Az)

    y' = y (y will not change because we are rotating around the y axis)

    The second rotation will be around the x axis. Keep in mind that one rotation has already been made, so instead of using x, y, and z values we need to use x', y', and z' (x-prime, y-prime and z-prime) found in the last rotation. The formula will look like this:

    z" = z'*cos(Ay) - y'*sin(Ay) (Ay is the angle by which the y axis will be rotated)

    y" = z'*sin(Ay) + y'*cos(Ay)

    x" = x' (remember we are rotating around the x axis, so this does not change)

    Remember from trig class the old acronym SOH CAH TOA? This means

    Sin = Opposite/Hypotenuse

    Cos = Adjacent/Hypotenuse

    Tan = Opposite/Adjacent

    We use those functions to find the angles needed for our rotations. Based on the previous two formulas, we can make the following statements about Az and Ay, where (Xv, Yv, Zv) is the viewpoint:

    tan(Az) = Xv/Zv

    sin(Ay) = Yv/sqrt(Xv^2 + Yv^2 + Zv^2)

    With so many steps to take to make the rotation we should drop all of this information into a Java class, then call the class in the stylesheet. The Java class should have methods for doing all of the calculations for determining where the new data points will go once the rotation is made. Creating that java class is beyond the scope of this section, but for this example I'll call it ViewCalc.class.
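
    The ViewCalc class itself is not shown here, but the arithmetic it has to perform can be sketched in a few lines. The following Python code is only an illustration of the two rotations and the angle formulas above, not the author's Java class. The function names view_angles and rotate are hypothetical; the code assumes that x' is computed with a plus sign (z*sin(-Az) + x*cos(-Az)), as in a standard rotation, that atan2 is an acceptable way to solve tan(Az) = Xv/Zv, and that the default viewpoint (0, 0, 0) means no rotation at all.

```python
import math


def view_angles(xv, yv, zv):
    """Return (Az, Ay) for a viewpoint, from tan(Az) = Xv/Zv and sin(Ay) = Yv/|V|."""
    length = math.sqrt(xv * xv + yv * yv + zv * zv)
    if length == 0:
        return 0.0, 0.0          # assumption: viewpoint (0, 0, 0) leaves the data unrotated
    az = math.atan2(xv, zv)      # assumption: atan2 picks the quadrant for tan(Az) = Xv/Zv
    ay = math.asin(yv / length)
    return az, ay


def rotate(point, az, ay):
    """Rotate (x, y, z) around the y axis by -Az, then around the x axis by Ay."""
    x, y, z = point
    # First rotation, around the y axis: y is unchanged.
    z1 = z * math.cos(-az) - x * math.sin(-az)
    x1 = z * math.sin(-az) + x * math.cos(-az)
    y1 = y
    # Second rotation, around the x axis: x is unchanged.
    z2 = z1 * math.cos(ay) - y1 * math.sin(ay)
    y2 = z1 * math.sin(ay) + y1 * math.cos(ay)
    x2 = x1
    return x2, y2, z2
```

    After the rotation, the x and y values of each point can be used as drawing coordinates, which is the role played by the locationX and locationY methods called from the stylesheet. Because both steps are pure rotations, the distance of each point from the origin is unchanged.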

    Now that we can rotate the image, we need to integrate that capability into the transformation. We will use parameters to pass viewpoints into the stylesheet during the transformation. The default viewpoint will be (0, 0, 0) and is specified on the stylesheet like so:

    Exhibit 10: 3D images with SVG

         <?xml version="1.0" ?>
         <xsl:stylesheet version="1.0"
               xmlns="http://www.w3.org/2000/svg"
               xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <!-- default viewpoint in case they are not specified  -->
            <!-- from the command line -->
            <xsl:param name="viewpoint_x">0</xsl:param>
            <xsl:param name="viewpoint_y">0</xsl:param>
            <xsl:param name="viewpoint_z">0</xsl:param>
         <xsl:template match="/">

    Java now needs to be added to the stylesheet so the processor will know what methods to call. Two lines are added to the namespace declarations:

         <?xml version="1.0" ?>
         <xsl:stylesheet version="1.0"
             xmlns="http://www.w3.org/2000/svg"
             xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
             xmlns:java="ViewCalc"
             exclude-result-prefixes="java">
    

    Notice the exclude-result-prefixes="java" line. That line is added so things in the stylesheet with the java: prefix will be processed, not output. Be sure to have the ViewCalc class in the CLASSPATH or the transformation will not run.

    The final step is to call the methods in the ViewCalc class from the stylesheet. For example:

       <xsl:template match="subject">
            <xsl:for-each select=".">
             <!-- read the three coordinates of the current data point -->
             <xsl:variable name="locationx" select="@x_axis"/>
             <xsl:variable name="locationy" select="@y_axis"/>
             <xsl:variable name="locationz" select="@z_axis"/>
             <!-- ask the Java class for the rotated drawing coordinates -->
             <xsl:variable name="thisx" select="java:locationX($locationx, $locationy,
               $locationz, $viewpoint_x, $viewpoint_y,
               $viewpoint_z)"/>
             <xsl:variable name="thisy" select="java:locationY($locationx,
               $locationy, $locationz, $viewpoint_x, $viewpoint_y,
               $viewpoint_z)"/>
             <!-- SVG elements positioned with $thisx and $thisy are output here -->
            </xsl:for-each>
       </xsl:template>
    

    Finally we pass new parameters and run the XSL transformation to create the SVG file with a different viewpoint.

    Summary

    SVG stands for Scalable Vector Graphics: an SVG image does not lose quality when it is moved or resized. SVG is similar to Flash in functionality; neither is better than the other overall, but each is better suited to particular situations (some of which were listed earlier). SVG can be used to create both 2D and 3D images, and it is supported by the W3C.

    Demos

    The following table provides a sampling of SVG documents that demonstrate varying degrees of functionality and complexity:

    Table 5: SVG Demos

    Function | URL | Browser Compatibility
    Basic | http://www.carto.net/papers/svg/samples/canvas.svg | All
    Fills | http://www.carto.net/papers/svg/samples/fill.svg | All
    HTML, JS, Java Servlet | http://www.adobe.com/svg/viewer/install/main.html (then follow to Inspiration, Fluent Solutions/Adobe Theater demo) | Does not provide full functionality in Mozilla
    HTML, JS, DOM | http://www.adobe.com/svg/viewer/install/main.html (then follow to Inspiration, Chart and Graph demo) | Does not provide full functionality in Mozilla
    PHP, MySQL | http://www.carto.net/papers/svg/samples/mysql_svg_php.shtml | All

    The Basic demo demonstrates the effects of zooming, panning, and anti-aliasing (high quality).

    The Fills demo demonstrates the effects of colors and transparency. The black circle is draggable: simply click and drag the circle within the square to see the changes.

    The HTML, JS, Java Servlet demo describes an interactive, database-driven, seating diagram, where chairs represent available seats for a performance. If the user moves the mouse pointer over a seat, it changes color, and the seat detail (section, row, and seat number) and pricing are displayed. On the client side of the application, SVG renders the seating diagram and works with JavaScript to provide user interactivity. The SVG application is integrated with a server-side database, which maintains ticket and event availability information and processes ticket purchases. The Java Servlet handles form submission and updates the database with seat purchases.

    The HTML, JS, DOM demo shows how SVG manages and displays data, generating SVG code from data on the fly. Although this kind of application can be written in a variety of different ways, SVG provides client-side processing to maintain and display the data, reducing the load on the server as well as overall latency. Using the DOM, developers can build documents, navigate their structure, and add, modify, or delete elements and content.

    The PHP, MySQL demo shows database-driven SVG generation using MySQL. It randomly generates a map of a European country. Each time you reload the page you will see a different country.


    Exercises

    1. Download and install the Adobe SVG Viewer. Once the Adobe SVG Viewer has been installed, go to this page to test that the install was successful: http://www.adobe.com/svg/viewer/install/svgtest.html
    2. Create your own stand-alone SVG file to produce an image containing a circle within a rectangle.
    3. Create your own stand-alone SVG file. Use 3 circles and 1 path element to create a yellow smiley face with black eyes and a black mouth. Use a text element so that the message “Have a nice day!” appears below the smiley face.
      • Hint: Because <path> elements can be difficult to write, here is a sample path you can utilize:
      • <path d="M 100, 120 C 100,120 140, 140 180,120" style="fill:none;stroke:black;stroke-width:1"/>

    References






    Learning objectives

    • Learn history of VoiceXML
    • Understand hardware/software requirements of VoiceXML
    • Learn basic VoiceXML elements


    VoiceXML examples

    According to the W3C, "VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications."

    Here are two short examples of VoiceXML. The first is the always fun example, "Hello World":

    Hello world

    <?xml version="1.0" encoding="UTF-8"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd" version="2.0">
    <form>
    <block>Hello World!</block>
    </form>
    </vxml>
    


    The top-level element is <vxml>, which is mainly a container for dialogs. The two main types of dialogs are forms and menus. Forms present information and gather input. Menus offer choices of what to do next. This example has a single form, which contains a block that synthesizes and presents "Hello World!" to the user. Since the form does not specify a dialog after "Hello World", the conversation ends.

    Our second example asks the user for a choice of drink and then submits it to a server script:


    Form example:

    <?xml version="1.0" encoding="UTF-8"?>
    <vxml xmlns="http://www.w3.org/2001/vxml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/vxml
    http://www.w3.o