XML - Managing Data Exchange/Print version

From Wikibooks, open books for an open world


XML - Managing Data Exchange

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/XML_-_Managing_Data_Exchange

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.



To do

Feel free to add your suggestions to this list of improvements. If you complete one of the tasks, please strike through the comment using <strike></strike> tags.

Current To-Dos (January 28, 2007 and later)

  1. Add template to all subpages, using the following code: {{XML-MDE}}
  2. Come up with a better design for template.
  3. Make sure navigation links are added to the top of every chapter. (Navigation fixed in book template)
  4. Group chapters by topic -- any suggestions for grouping schemes?
    1. I was thinking "Principles of XML," "Languages derived from XML," and "XML in Applications" (the last category referencing mainly AJAX) -- Runnerupnj
  5. Provide links from a chapter to the exercises it covers, and vice-versa.
  6. Mend links to previous module main page.
  7. Separate exercise questions and answers.
  8. Break chapters into shorter sections.
  9. Create a glossary with links from within the book.
  10. Create an Ajax page. There is no page here for Ajax help with XML!
    1. We can link the AJAX book here.
  11. Create a PDF version available from Wikibooks

To-Dos Previous to January 28, 2007

These to-dos remained from before the project was reinvigorated, and rested for a brief period on the Talk page. -- Runnerupnj 08:41, 29 January 2007 (UTC)

The list is in no particular order of priority.

  1. Convert all code examples to the format specified in Author guidelines
  2. A print version to make reading easier
  3. Breaking chapters into shorter sections
  4. Hints for common problems (hint box)
    1. FAQs
    2. a "Common Errors" section near the exercise section. That way when future students run into problems in the exercises, especially the stylesheets, they can hopefully find a common error and fix their problem quickly
  5. Glossary with links from within the book
  6. Chapter 2 on XHTML (move later chapter and make complete)
  7. Exercises and answers on separate pages (tell people how to open a second copy and use it – end of chapter 1)
  8. Good XML editor
  9. Check all answers (also indicate who validated the answer with person’s email)
  10. Major league baseball exercise
    1. Develop an XML schema to show the organization of Major League Baseball. There are many teams within MLB and the teams are all composed of different athletes.
    2. Set up the XML Document with a Division of either the American League or the National League. Enter a representative data into the document to justify your answer.
    3. Organize the XML stylesheet to nicely display the data.
  11. Move all Java parsing to a separate chapter
  12. Write BlueJ as per database access for XML parsing
  13. Move exercise 4 from chapter 3 (one-to-many relationship) and place it in chapter 5 (many-to-many relationship). My reason for this is as follows: This problem asks you to create a personal library. As we learned earlier, a library can have many books and books have many copies. Many different people can check out books; however, what they actually check out are copies of books, which makes this a many-to-many relationship, since a borrower can check out many copies of a book. I feel this exercise is a little misleading and would be better off in ch. 5. Most people who have some experience in data modeling and are trying to learn XML from this book would be confused by this exercise (as I was). It's hard to do something that you haven't learned how to do yet.
  14. Comments in the code and not elsewhere
  15. Instead of giving a complete explanation for an example of an xml/xsl/xsd after the problem, explain each piece of the code as you go through it. Or, after a given solution, repeat the entire line of code that you are trying to explain. I've found this layout in other technology-related books, and it has been easier to follow along. Also, when referring to a table or a different section of the book, create a bookmark or link to that section. Could this be in the instructions for authors, and what else could we add?
  16. Instructions on how to convert XML to HTML with NetBeans and any other editor
  17. Convert all slides to DocBook slide format
  18. Chapter on XQuery
  19. Chapter on Lenya
  20. Spellcheck the book on a regular basis



Preface







Goals

Book

The goal of this book is to provide comprehensive coverage of eXtensible Markup Language (XML) in a textbook format. This book is written and edited by students for students. Each student who uses the book should improve its quality by correcting errors, adding exercises, adding examples, starting new chapters, and so forth.

Chapters 2 through 6 take the perspective that an XML schema is a representation of a data model, and thus these chapters deal with mapping the complete range of relationships that occur between entities. As you learn how to convert each type of relationship into a schema, other aspects of XML are introduced. For example, stylesheets are initially introduced in Chapter 2 and progressively more stylesheet features are added in Chapters 3 through 6.

Consolidation chapters (e.g., Chapter 7 "Data schemas") bring together the material covered across previous chapters; in this case, Chapters 2 through 6. This means students see key skills twice: once in the context of gradually developing their broad understanding of XML and then again in the specific context of one dimension of XML.

Application chapters cover particular uses of XML (e.g., SVG for scalable vector graphics) to give the reader examples of the use of XML to solve particular types of problems. This part of the book is expected to grow as the use of XML extends.

Project

Professors typically throw away their students’ projects at the end of the term. This is a massive waste of intellectual resources that can be harnessed for the betterment of many by creating an appropriate infrastructure. In our case, we use wiki technology as the infrastructure to create a free open content textbook.

University students are an immense untapped global resource. They can be engaged in creating open textbooks if the right infrastructure is in place to sustain renewable student projects. This book is an example of how waste can be avoided.

History


Software

To complete the exercises in the book and view the slides, you will need access to the following software (or a suitable alternative):



Introduction to XML

Learning Objectives
  • define the purpose of SGML, HTML, and XML


There are four central problems in data management: capture, storage, retrieval, and exchange of data. The purpose of this book is to address XML, a technology for managing data exchange. The foundational XML chapters in this book are structured by a 'data model' approach. The first chapter introduces the reader to the XML document, XML schema, and XML stylesheet with a single entity example. Subsequent chapters expand upon the XML basics with multiple-entity examples and a one-to-one relationship, a one-to-many relationship, or a many-to-many relationship.

XML is a tool used for data exchange. Data exchange has long been an issue in information technology, but the Internet has elevated its importance. Electronic data interchange (EDI), the traditional data exchange standard for large organizations, is giving way to XML, which is likely to become the data exchange standard for all organizations, irrespective of size.

EDI supports the electronic exchange of standard business documents and is currently the major data format for electronic commerce. A structured format is used to exchange common business documents (e.g., invoices and shipping orders) between trading partners. In contrast to the free form of e-mail messages, EDI supports the exchange of repetitive, routine business transactions. Standards mean that routine electronic transactions can be concise and precise. The main standard used in the United States and Canada is known as ANSI X12, and the major international standard is UN/EDIFACT. Firms adhering to the same standard can share data electronically.

The Internet is a global network potentially accessible by nearly every firm, with communication costs typically less than those of traditional EDI. Consequently, the Internet has become the electronic transport path of choice between trading partners. The simplest approach is to use the Internet as a means of transporting EDI documents. But because EDI was developed in the 1960s, another approach is to reexamine the technology of data exchange. A result of this rethinking is XML, but before considering XML we need to learn about SGML, the parent of XML.

SGML

For a typical U.S. firm, it is estimated that document management consumes up to 15 percent of its revenue, nearly 25 percent of its labour costs, and anywhere between 10 and 60 percent of an office worker’s time. The Standard Generalized Markup Language (SGML) is designed to reduce the cost and increase the efficiency of document management.

A markup language embeds information about a document within the document's text. In the following example, the markup tags indicate that the text contains details of a city. Note also that the city's name, state, and population are identified by specific tags. Thus, the reader (a person or a computer) is left in no doubt as to the meaning of Athens, GA, or 100,000. Note also that the location, latitude, and longitude of the city are explicitly identified with appropriate tags. SGML's usefulness is based upon recording both text and the meaning of that text.

Exhibit 1: Markup language

<city> 
       <cityname>Athens</cityname> 
       <state>GA</state>
       <description> Home of the University of Georgia</description>
       <population>100,000</population>
       <location>Located about 60 miles Northeast of Atlanta</location>
       <latitude>33 57' 39" N</latitude>
       <longitude>83 22' 42" W</longitude>
</city>
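
To illustrate why explicit tags matter to a program, here is a minimal sketch in Python. It is not part of the original text; it assumes the markup in Exhibit 1 can be read by the XML parser in Python's standard library (xml.etree.ElementTree), and the variable names are illustrative. Only three of the fields are included to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# A subset of the markup from Exhibit 1.
CITY_MARKUP = """
<city>
    <cityname>Athens</cityname>
    <state>GA</state>
    <population>100,000</population>
</city>
"""

city = ET.fromstring(CITY_MARKUP)

# Each value is looked up by the name of its tag, so the program does not
# have to guess which piece of text is the city name and which is the state.
name = city.findtext("cityname")
state = city.findtext("state")

# The population is stored as text with a thousands separator; remove the
# separator before converting it to a number.
population = int(city.findtext("population").replace(",", ""))

print(f"{name}, {state}: population {population}")
```

Because the meaning travels with the text, the same lookup works no matter what order the fields appear in.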

SGML is a vendor-independent International Standard (ISO 8879) that defines the structure of documents. Standardized in 1986 as a meta language, SGML is the parent of both HTML and XML. Because SGML documents are standard text files, SGML provides cross-system portability. When technology is rapidly changing, SGML provides a stable platform for managing data exchange. Furthermore, SGML files can be transformed for publication in a variety of media. The use of SGML preserves textual information independent of how and when it is presented. Organizations reap long-term benefits when they can store documents in a single, independent standard that can then be converted for display in any desired media.

SGML has three major advantages for data management:

  • Reuse: Information can be created once and reused many times.
  • Flexibility: SGML documents can be published in any format. The same content can be printed, presented on the Web, or delivered through speech synthesis. Because SGML is content-oriented, presentation decisions can be delayed until the output format is decided.
  • Revision: SGML supports revision and version control. With content version control, a firm can readily track the changes in documents.

A short section of SGML demonstrates clearly the features and strength of SGML (see Exhibit 2). The tags surrounding a chunk of text describe its meaning and thus support presentation and retrieval. For example, the pair of tags <airline> and </airline> surrounding “Delta” identify the airline making the flight.

Exhibit 2: SGML example

   <flight>
       <airline>Delta</airline>
       <flightno>22</flightno>
       <origin>Atlanta</origin>
       <destination>Paris</destination>
       <departure>5:40pm</departure>
       <arrival>8:10am</arrival>
   </flight>

The preceding SGML code can be presented in several ways by applying a style sheet to the file. For example, it might appear as

Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am

or as

Airline   Flight   Origin    Destination   Departure   Arrival
Delta     22       Atlanta   Paris         5:40pm      8:10am
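
The idea of one source and several presentations can be sketched in Python. This is an illustration only: a real system would apply a style sheet, whereas here two small hypothetical functions (as_sentence and as_table) stand in for two style sheets, and the flight record from Exhibit 2 is read with the standard library's XML parser, which works because the record happens to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# The flight record from Exhibit 2.
FLIGHT_MARKUP = """
<flight>
    <airline>Delta</airline>
    <flightno>22</flightno>
    <origin>Atlanta</origin>
    <destination>Paris</destination>
    <departure>5:40pm</departure>
    <arrival>8:10am</arrival>
</flight>
"""


def as_sentence(flight):
    """Present the record as running text."""
    f = {child.tag: child.text for child in flight}
    return (
        f"{f['airline']} flight {f['flightno']} flies from {f['origin']} "
        f"to {f['destination']} leaving {f['departure']} "
        f"and arriving {f['arrival']}"
    )


def as_table(flight):
    """Present the same record as a header line and a data line."""
    header = " | ".join(child.tag for child in flight)
    row = " | ".join(child.text for child in flight)
    return header + "\n" + row


flight = ET.fromstring(FLIGHT_MARKUP)
print(as_sentence(flight))
print(as_table(flight))
```

Neither function changes the stored data; only the presentation differs.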


If the data are stored in HTML format and rendered on a Web site (as in Exhibit 3), then the meaning of the data has to be inferred by the reader. This is generally quite easy for humans, but very difficult for machines. Furthermore, the presentation format is fixed and can only be altered by rewriting the HTML. If you are not familiar with HTML, you should read the Wikibooks chapter on XHTML, an XML-based reformulation of HTML, before reading the next chapter.

Exhibit 3: HTML rendering example

    Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am

Meaning and presentation should be independent, and this is an important reason why SGML is more powerful than HTML.

SGML is a markup language that defines the structure of documents and is preferred to HTML as it can be transformed into a variety of media.

XML

Many computer systems contain data in incompatible formats. A time-consuming challenge is to exchange data between such systems. XML is a generic data storage format that comes bundled with a number of tools and technologies that should make it easier to exchange specific XML 'applications' between incompatible systems. Since XML is open and generic, it is expected that as time progresses, more and more organizations and people will jump onto the XML bandwagon, both developers and data users. This should make XML the ultimate viable technology for certain types of data exchange.

XML is used not only for exchanging information, but also for publishing Web pages. XML's very strict syntax allows for smaller and faster Web browsers and as such is well suited for use with Personal Digital Assistants (PDAs) and cellphones. Web browsers that interpret HTML documents, on the other hand, carry a great deal of extra code to compensate for HTML's less strict syntax.

The types of data generally well suited for encoding as XML are those where field lengths are unknown and unpredictable and where field contents are predominantly textual.

An XML schema allows for the exchange of information in a standardized structure. A schema defines custom markup tags that can contain attributes to describe the content that is enclosed by these tags. Information from the tagged data in the XML document can be extracted using an application called a “parser”, and with the use of an XML stylesheet the data can be formatted for a Web page.

XML's power lies in the combination of custom markup tags and content in a defined XML document. The purpose of eXtensible Markup Language (XML) is to make information self-describing. Based on SGML, XML is designed to support electronic commerce. The definition of XML, completed in early 1998 by the World Wide Web Consortium (W3C), describes it as a meta language — a language to generate languages. XML should steadily replace HTML on many Web sites because of some key advantages. The major differences between XML and HTML are captured in the following table.

Exhibit 4: XML vs HTML

XML                          HTML
Information content          Information presentation
Extensible set of tags       Fixed set of tags
Data exchange language       Data presentation language
Greater hypertext linking    Limited hypertext linking


The eXtensible in XML means that a new data exchange language can be created by defining its structure and tags. For example, the OpenGIS Consortium designed a Geography Markup Language (GML) to facilitate the electronic exchange of geographic information. Similarly, the Open Tourism Consortium is working on the definition of TourML to support the exchange of tourism information. The insurance industry uses data conforming to the XML-based standard ACORD for electronic data exchange. Another good example of XML in action is NewsML™.

In this text we will cover all the features of XML, but at this point let us introduce a few of the key features.


Applications of XML:

Before we start learning more about how an XML document is structured, let us point out what XML can be used for. The four major uses of XML are:

Publication: Database content can be converted into XML and afterwards into HTML by using an XSLT stylesheet. With this technique, complex websites as well as print media such as PDF files can be generated. Information no longer has to be stored in different formats (e.g., RTF, DOC, PDF, HTML). Content can be stored in the neutral XML format and then, using appropriate layout stylesheets and transformations, brochures, websites, or data lists can be generated. (See more in Chapter 17.)

An example of the capability of XML and XSLT can be found at http://www.emimusic.de: This website contains approximately 20,000 pages with profiles of the artists, their products, and the titles of the songs. These pages are generated using an XSLT script. Based on the script used, it will also be possible to create a catalog in PDF format. Please see below for more details.

Interaction: XML can be used for accessing and changing data interactively. This human-machine communication usually happens via a web browser (see Chapter 12).

Integration: Using XML, homogeneous and heterogeneous applications can be integrated. In this case, XML is used to describe data, interfaces, and protocols. This machine-machine communication helps integrate relational databases (e.g., by importing and exporting different formats).

Transaction: XML helps to process transactions in applications like online marketplaces, supply chain management, and e-procurement systems.
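
The publication use described above (database content to XML, then XML to HTML) can be sketched as follows. This is a hedged illustration, not the method used by any site mentioned in this chapter: the in-memory SQLite table, its column names, and the sample rows are all hypothetical, and plain Python code takes the place of an XSLT stylesheet because Python's standard library does not include an XSLT processor.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical artist table standing in for a real content database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (name TEXT, album TEXT)")
conn.executemany(
    "INSERT INTO artist VALUES (?, ?)",
    [("Artist A", "First Album"), ("Artist B", "Second Album")],
)

# Step 1: export the rows to a neutral XML format.
catalog = ET.Element("catalog")
for name, album in conn.execute("SELECT name, album FROM artist ORDER BY name"):
    artist = ET.SubElement(catalog, "artist")
    ET.SubElement(artist, "name").text = name
    ET.SubElement(artist, "album").text = album

# Step 2: turn the XML into an HTML list. In the workflow described in the
# text, an XSLT stylesheet would do this job.
ul = ET.Element("ul")
for artist in catalog.findall("artist"):
    li = ET.SubElement(ul, "li")
    li.text = f"{artist.findtext('name')}: {artist.findtext('album')}"

html = ET.tostring(ul, encoding="unicode")
print(html)
```

The XML produced in step 1 is the neutral format; a different step 2 could produce another output format from the same data.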

Key features of XML

  • Elements have both an opening and a closing tag
  • Elements follow a strict hierarchy, with documents containing only one root element
  • Elements cannot overlap other elements
  • Element names must obey XML naming conventions
  • XML is case sensitive
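
The rules above can be checked with any XML parser. The following minimal sketch uses Python's standard library; the helper name is_well_formed and the sample strings are assumptions made for illustration. Each sample after the first breaks exactly one of the listed rules, and the parser is expected to reject it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "properly nested": "<city><name>Athens</name></city>",
    "missing closing tag": "<city><name>Athens</city>",
    "overlapping elements": "<b><i>Athens</b></i>",
    "two root elements": "<city/><city/>",
    "case mismatch": "<City>Athens</city>",
    "name starts with a digit": "<1city>Athens</1city>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

This strictness is what lets XML processors stay simple: a document either follows the rules or is rejected, with no guessing about what the author meant.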

XML will improve the efficiency of data exchange in several important ways, which include:

  • write once and format many times: Once an XML file is created it can be presented in multiple ways by applying different XML stylesheets. For instance, the information might be displayed on a web page or printed in a book.
  • hardware and software independence: XML files are standard text files, which means they can be read by any application.
  • write once and exchange many times: Once an industry agrees on an XML standard for data exchange, data can be readily exchanged between all members using that standard.
  • Faster and more precise web searching: When the meaning of information can be determined by a computer (by reading the tags), web searching will be enhanced. For example, if you are looking for a specific book title, it is far more efficient for a computer to search for text between the pair of tags <booktitle> and </booktitle> than search an entire file looking for the title. Furthermore, spurious results should be eliminated.
  • data validation: XML allows data to be validated against an XSD or a DTD, which serves as a contractual agreement between two interacting parties.
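
The point about more precise searching can be illustrated with a small Python sketch. The catalogue below is hypothetical: only the <booktitle> tag comes from the text, and the other element names and the sample titles are assumptions. A plain text search finds every mention of the phrase, while a tag-aware search finds only the book that actually carries that title.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; the phrase searched for also appears in a review.
CATALOGUE = """
<catalogue>
    <book>
        <booktitle>Data Management</booktitle>
        <review>Covers more ground than XML in Practice.</review>
    </book>
    <book>
        <booktitle>XML in Practice</booktitle>
        <review>A hands-on guide.</review>
    </book>
</catalogue>
"""

wanted = "XML in Practice"
root = ET.fromstring(CATALOGUE)

# Plain text search: every line that mentions the phrase is a hit,
# including the review of a different book.
text_hits = [line.strip() for line in CATALOGUE.splitlines() if wanted in line]

# Tag-aware search: only books whose <booktitle> equals the phrase.
title_hits = [
    book for book in root.iter("book") if book.findtext("booktitle") == wanted
]

print(len(text_hits), "text hits;", len(title_hits), "title hit")
```

The spurious hit disappears because the search is restricted to the element whose meaning is "book title".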

10 reasons to use XML

  1. XML is a widely accepted open standard.
  2. XML clearly separates content from form (appearance).
  3. XML is text-oriented.
  4. XML is extensible.
  5. XML is self-describing.
  6. XML is universal, so internationalization is no problem.
  7. XML is independent from platforms and programming languages.
  8. XML provides a robust and durable format for information storage.
  9. XML is easily transformable.
  10. XML is a future-oriented technology.

The major XML elements

The major XML elements are:

  • XML document: An XML file containing XML code.
  • XML schema: An XML file that describes the structure of a document and its tags.
  • XML stylesheet: An XML file containing formatting instructions for an XML file.

In the next few chapters you will learn how to create and use each of these elements of XML.

Creating a markup file

Any text editor can be used to create a markup file (e.g. an HTML file). In this book, we use the text editor within NetBeans, an open source Integrated Development Environment (IDE) for Java, because NetBeans supports editing and validation of XML files. Before proceeding, you should download and install NetBeans from http://www.NetBeans.org/.

The examples in this book use NetBeans to illustrate proper XML code. For an alternative to NetBeans, see Exchanger XML Lite.

Case Studies in XML Implementation

XML at United Parcel Service (UPS)

“UPS is a service company and it is all about scale and speed,” says Geoff Chalmers, Project Leader at UPS eSolutions Department. In 2003, UPS had $33.5 billion annual revenue and 357,000 employees worldwide. Six percent of the United States' Gross Domestic Product (GDP) on any given day is in the UPS system.

UPS uses technology extensively. The Information Systems department employs 4,000 people. The company's web site has 166 different country home pages and is supported by 44 applications.

UPS delivers around 13 million packages every day, and customers can track these shipments via the UPS Web site, which receives around 200 million hits daily. Nineteen of the applications within ups.com are XML OnLine Tool (Web services) applications.

UPS’s online tools are developed specifically to be integrated with customers’ applications. This makes the customer’s task simpler, easier, and faster. UPS verified the importance of simplicity and speed via “CampusShip,” one of UPS’s most successful products of the last 10 years. UPS CampusShip® is a Web-based, UPS-hosted shipping system. Using an Internet connection, employees can ship their own packages and letters from any desktop, while management maintains overall control of shipping activities. UPS CampusShip® allows simultaneous shipper autonomy and managerial cost control within the organization. This product has been successful because no installation or software maintenance is required and it is quick to implement. XML Online Tools enabled cheap and fast evolution of CampusShip®.

UPS favors XML especially because it is agnostic: platform- and language-independent. These features make XML very flexible and powerful. It is also decoupled and scalable. XML has enabled UPS to target a broader market and reduce customer interaction, and thus the cost of customer service. Another positive feature of XML is that it is backward compatible. The adoption of XML has reduced maintenance, implementation, and usage costs significantly within UPS.

However, these advantages don’t come without a price. “XML is inefficient in so many ways,” says Chalmers. XML unfortunately takes more CPU and bandwidth than other technologies. Yet bandwidth and CPU are cheap and getting cheaper every day, so this is a gradually disappearing problem.

Chalmers also thinks that XML doesn’t work well in databases. He says that it is too wordy and that it is an exchange medium rather than a database medium. There were some early attempts to tightly integrate XML and databases. Because databases, like XML, already supply structure and identification to data, the added value of XML-database integration is limited to applying hierarchical structure. On the other hand, if data is to be stored as a blob, then XML makes sense. Another problem that he points out is that business rules cannot be expressed in XML schemas.

Finally, raw XML programming and debugging can be challenging. Therefore, UPS’s enterprise customers are starting to explore the code generators and embedded facilities to be found in .NET and BEA. However, hand coding by experienced in-house engineers is a must for the high availability, scalability, and performance that UPS requires for the UPS OnLine Tools.

XML at EMI Music

How is it used?

EMI Music Germany GmbH & Co. KG, a famous German record label, displays information about the artists it is affiliated with on its website. Visitors are able to explore all their audio or video productions. The whole website consists of nearly 20,000 pages that contain information about artists and their products (CD, DVD, LP). Everything is properly linked and systematically grouped.

After all, there is data to be provided for every artist, albums, samples, pictures, descriptions or article codes. The site is updated on a daily basis and is subject to change by a web editor whenever it’s necessary. Now this is a fairly complex and large amount of data to be handled.

This is where XML comes into play. The data, which is stored in a database, has been transformed into XML code. Now an XSLT stylesheet converts this data into HTML code, which can be easily read by any web browser (e.g. Internet Explorer or Firefox).

What's the benefit?

The advantage of XML is that the programming effort is considerably lower as compared to other formats. This is because XML lies at the point of intersection of XSLT and HTML.

It’s also no problem for the web editor to update the website. Using XML makes it easy for the person in charge to deal with this large amount of data.

Going beyond… On the basis of the XML scripts thus far produced by EMI Music, the company could easily produce a PDF-formatted catalog or design i-Mode pages for the current mobile phone generation. Thanks to XML, this can be done with little extra effort.

A brief history of XML

In the late 1960s Charles Goldfarb, Raymond Lorie, and Edward Mosher, all working for IBM, started to develop GML (Generalized Markup Language), a text formatting language. The language was successfully applied to internal documentation procedures. As was common at the time, document editing was performed in batch mode. GenCode, another procedure to define generic formatting codes for the typesetting systems of various software producers, was developed by the GCA (Graphic Communications Association) at about the same time. Both of these technologies, GML syntactically and GenCode semantically, served as the basis for the development of SGML (Standard Generalized Markup Language). The process of standardization started at the U.S. standards institute ANSI in the early 1980s, and in 1986 SGML was finally approved as ISO standard ISO 8879:1986.

SGML is reckoned to be a complex and comprehensive language (the specification extends to 500 pages). However, the success of HTML (HyperText Markup Language) proved that the concepts of SGML were appropriate. SGML-based HTML was developed by Tim Berners-Lee in Geneva in the early 1990s in order to present and link documents on the Internet. HTML has since become the most successful format for electronic documents. The Internet was originally designed as a space for human-human and human-machine communication, but lately machine-machine communication has gained tremendous importance, placing a completely new challenge on the computer languages used.

HTML is a descriptive language for the presentation of documents. The main focus is on presentation, meaning that an HTML document mixes the presented data with its formatting instructions. A human being can recognize the meaning of what is displayed from the presentation and the context; a machine (or, better said, software) cannot.

In 1996 a team led by Jon Bosak was established at the W3C to make SGML suitable for the Web. The result was a 30-page specification, which in February 1998 received the status of a "W3C Recommendation" and was named "Extensible Markup Language (XML)".

The most important goals in developing XML were:

  • XML should be compatible with SGML
  • XML should be easy to use on the Internet
  • The number of optional features should be minimized
  • XML documents should be easy to generate and human-readable
  • XML should be supported by a variety of applications
  • It should be easy to write programs for XML
  • The XML design should be completed quickly

In the terminology of markup languages, a description formulated in XML is called an XML document, even if the content has nothing to do with text processing.

Why is this book not an XML document?

If you have accepted the ideas presented in this chapter, the question is very pertinent. The simple answer is that we have been unable to find the technology to support the creation of an open textbook in XML. We need several pieces of technology:

  • An XML language for describing a book. DocBook is such a language, but the structure of a book is quite complex, and DocBook (reflecting this complexity) cannot be quickly mastered
  • A Wiki that works with a language such as DocBook
  • An XML stylesheet that converts XML into HTML for displaying the book's content

There is a project to create WikiMl (Wiki MarkupLanguage), and this might be used at some point.

References

Initiating author Richard T. Watson, University of Georgia



A single entity







Learning objectives


  • introduce XML documents, schemas, and stylesheets
  • describe and create an XML document
  • describe and create an XML schema
  • describe and create an XML stylesheet


Introduction

In this chapter, we start to practice working with XML using XML documents, schemas, and stylesheets. An XML document organizes data and information in a structured, hierarchical format. An XML schema provides standards and rules for the structure of a given XML document. An XML schema also enables data transfer. An XSL (XML stylesheet) allows unique presentations of the material found within an XML document.

In the first chapter, Introduction to XML, you learned what XML is, why it is useful, and how it is used. So, now you want to create your very own XML documents. In this chapter, we will show you the basic components used to create an XML document. This chapter is the foundation for all subsequent chapters--it is a little lengthy, but don't be intimidated. We will take you through the fundamentals of XML documents.


This chapter is divided into three parts:

  • XML Document
  • XML Schema
  • XML Stylesheets (XSL)


As you learned in the previous chapter, the XML Schema and Stylesheet are essentially specialized XML Documents. Within each of these three parts we will examine the layout and components required to create the document. There are links at the end of the XML document, schema, and stylesheet sections that show you how to create the documents using an XML editor. At the bottom of the page there is a link to Exercises for this chapter and a link to the Answers.

The first thing you will need before starting to create XML documents is a problem--something you want to solve by using XML to store and share data or information. You need some entity you can collect information about and then access in a variety of formats. So, we created one for you.

To develop an XML document and schema, start with a data model depicting the reality of the actual data that is exchanged. Once a high fidelity model has been created, the data model can be readily converted to an XML document and schema. In this chapter, we start with a very simple situation and in successive chapters extend the complexity to teach you more features of XML.

Our starting point is a single entity, CITY, which is shown in the following figure. While our focus is on this single entity, to map CITY to an XML schema, we need to have an entity that contains CITY. In this case, we have created TOURGUIDE. Think of a TOURGUIDE as containing many cities, and in this case TOURGUIDE has no attributes nor an identifier. It is just a container for data about cities.


Exhibit 1: Data model - Tourguide

Data Model - Tourguide.png


XML document

An XML document is a file containing XML code and syntax. XML documents have an .xml file extension.

We will examine the features & components of the XML document.


  • Prologue (XML Declaration)
  • Elements
  • Attributes
  • Rules to follow
  • Well-formed & Valid XML documents


Below is a sample XML document using our TourGuide model. We will refer to it as we describe the parts of an XML document.

Exhibit 2: XML document for city entity

  <?xml version="1.0" encoding="UTF-8"?>
  <tourGuide xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:noNamespaceSchemaLocation='city.xsd'>
    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>
        <population>11100</population>
        <area>5</area>
        <elevation>130</elevation>
        <longitude>88.44</longitude>
        <latitude>17.27</latitude>
        <description>Belmopan is the capital of Belize</description>
        <history>Belmopan was established following the devastation of the
           former capital, Belize City, by Hurricane Hattie in 1961. High 
           ground and open space influenced the choice and ground-breaking 
           began in 1966.  By 1970 most government offices and operations had 
           already moved to the new location.
        </history>
    </city>
    <city>
        <cityName>Kuala Lumpur</cityName>
        <adminUnit>Selangor</adminUnit>
        <country>Malaysia</country>
        <population>1448600</population>
        <area>243</area>
        <elevation>111</elevation>
        <longitude>101.71</longitude>
        <latitude>3.16</latitude>
        <description>Kuala Lumpur is the capital of Malaysia and the largest 
            city in the nation</description>
        <history>The city was founded in 1857 by Chinese tin miners and  
            preceded Klang.  In 1880 the British government transferred their 
            headquarters from Klang to Kuala Lumpur, and in 1896 it became the 
            capital of Malaysia. 
        </history>
    </city>
    <city>
        <cityName>Winnipeg</cityName>
        <adminUnit>St. Boniface</adminUnit>
        <country>Canada</country>
        <population>618512</population>
        <area>124</area>
        <elevation>40</elevation>
        <longitude>97.14</longitude>
        <latitude>49.54</latitude>
        <description>Winnipeg has two seasons. Winter and Construction.</description>
        <history>The city was founded by people at the forks (Fort Garry)
         trading in pelts with the Hudson Bay Company. Ironically, 
         The Bay was bought by America.
        </history>
    </city>
  </tourGuide>

Prologue (XML declaration)

The XML document starts off with the prologue. The prologue informs both a reader and the computer of certain specifications that make the document XML compliant. The first line is the XML declaration, which is the only line of the prologue in this basic XML document.

Exhibit 3: XML document - prologue

     <?xml version="1.0" encoding="UTF-8"?>

xml   =   this is an XML document
version="1.0"   =   the XML version (XML 1.0 is the W3C-recommended version)
encoding="UTF-8"   =   the character encoding used in the document. UTF-8 is a variable-length encoding of Unicode that stores each character in one to four 8-bit bytes, and it is the standard way to encode international documents. Unicode provides a unique number for every character.
Another potential attribute of the XML declaration:
standalone="yes"   =   the dependency of the document ('yes' indicates that the document does not require another document to complete content)
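
As a rough illustration of how a program sees these values, the following Python sketch uses the expat parser from the standard library to report what the declaration says. The function name read_declaration is made up for this example, and the one-line document is a shortened stand-in for Exhibit 2.

```python
import xml.parsers.expat

def read_declaration(xml_bytes):
    """Collect version, encoding and standalone from the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        found["version"] = version
        found["encoding"] = encoding
        # expat passes 1 for "yes", 0 for "no", and -1 when standalone is omitted
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return found

declaration = read_declaration(
    b'<?xml version="1.0" encoding="UTF-8"?><tourGuide/>'
)
print(declaration)
```

Because the sample declaration has no standalone attribute, the parser reports it as omitted rather than as 'yes' or 'no'.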

Elements

The majority of what you see in the XML document consists of XML elements. Elements are identified by their tags that open with < or </ and close with > or />. The start tag looks like this: <element attribute="value">, with a left angle bracket (<) followed by the element type name, optional attributes, and finally a right angle bracket (>). The end tag looks like this: </element>, similar to the start tag, but with a slash (/) between the left angle bracket and the element type name, and no attributes.

When there's nothing between a start tag and an end tag, XML allows you to combine them into an empty element tag, which can include everything a start tag can: <img src="Belize.gif" />. This one tag must be closed with a slash and right angle bracket (/>), so that it can be distinguished from a start tag.

The XML document is designed around a major theme, an umbrella concept covering all other items and subjects; this theme is analyzed to determine its component parts, creating categories and subcategories. The major theme and its component parts are described by elements. In our sample XML document, 'tourGuide' is the major theme; 'city' is a category; 'population' is a subcategory of 'city'; and the hierarchy may be carried even further: 'males' and 'females' could be subcategories of 'population'. Elements follow several rules of syntax that will be described in the Rules to Follow section.


In the exhibit below we left out the attributes within the <tourGuide> start tag; that part will be explained in the XML Schema section.

Exhibit 4: Elements of the city entity XML document

  <tourGuide>
    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>
        <population>11100</population>
        <area>5</area>
        <elevation>130</elevation>
        <longitude>88.44</longitude>
        <latitude>17.27</latitude>
        <description>Belmopan is the capital of Belize</description>
        <history>Belmopan was established following the devastation of the
           former capital, Belize City, by Hurricane Hattie in 1961. High 
           ground and open space influenced the choice and ground-breaking 
           began in 1966.  By 1970 most government offices and operations had 
           already moved to the new location.
        </history>
    </city>
  </tourGuide>


Element hierarchy

  • root element  -   This is the XML document's major theme element. Every document must have exactly one root element. All other elements are contained within this one root element. The root element follows the XML declaration. In our example, <tourGuide> is the root element.
  • parent element  -   This is any element that contains other elements, the child elements. In our example, <city> is a parent element.
  • child element  -   This is any element that is contained within another element, the parent element. In our example, <population> is a child element of <city>.
  • sibling element  -   These are elements that share the same parent element. In our example, <cityName>, <adminUnit>, <country>, <population>, <area>, <elevation>, <longitude>, <latitude>, <description>, and <history> are all sibling elements.
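
To see the same hierarchy from a program's point of view, here is a small Python sketch using the standard library's ElementTree module. The SAMPLE string is a shortened copy of the Belmopan entry, and the variable names are chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<tourGuide>
    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>
        <population>11100</population>
    </city>
</tourGuide>"""

root = ET.fromstring(SAMPLE)               # root element: <tourGuide>
city = root.find("city")                   # child of the root, parent of the rest
siblings = [child.tag for child in city]   # the children of <city> are siblings

print(root.tag, city.tag, siblings)
```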


Attributes

Attributes aid in modifying the content of a given element by providing additional or required information. They are contained within the element's opening tag. In our sample XML document code we could have taken advantage of attributes to specify the unit of measure used to determine the area and the elevation (it could be feet, yards, meters, kilometers, etc.); in this case, we could have called the attribute 'measureUnit' and defined it within the opening tag of 'area' and 'elevation'. In the same way, a 'class' attribute on 'adminUnit' can record the kind of administrative unit:


       <adminUnit class="state">Cayo</adminUnit>
       <adminUnit class="region">Selangor</adminUnit>


The above attribute example can also be written as:


1. using child elements

     <adminUnit>
          <class>state</class>
          <name>Cayo</name>
     </adminUnit>
     <adminUnit>
          <class>region</class>
          <name>Selangor</name>
     </adminUnit>

2. using an empty element

    <adminUnit class="state" name="Cayo" />
    <adminUnit class="region" name="Selangor" />


Attributes can be used to:

  • provide more information that is not defined in the data
  • define a characteristic of the element (size, color, style)
  • ensure the inclusion of information about an element in all instances

Attributes can, however, be a bit more difficult to manipulate and they have some constraints. Consider using a child element if you need more freedom.
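
The difference between the two styles also shows up when the data is read back. The Python sketch below (standard library ElementTree, variable names invented for the example) reads the same facts from the attribute form and from the child-element form.

```python
import xml.etree.ElementTree as ET

with_attribute = ET.fromstring('<adminUnit class="state">Cayo</adminUnit>')
with_children = ET.fromstring(
    "<adminUnit><class>state</class><name>Cayo</name></adminUnit>"
)

# An attribute is looked up by name on the element itself.
attr_class = with_attribute.get("class")
attr_name = with_attribute.text

# A child element is found by searching below the element.
child_class = with_children.findtext("class")
child_name = with_children.findtext("name")

print(attr_class, attr_name, child_class, child_name)
```

An attribute can hold only a single text value, whereas a child element could later be given children or attributes of its own, which is one reason child elements offer more freedom.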


Rules to follow

These rules are designed to aid the computer reading your XML document.

  • The XML declaration (the prologue) is optional, but when it is present it must be the first line of the XML document.
  • The main theme of the XML document is established in the root element and all other elements must be contained within the opening and closing tags of this root element.
  • Every element must have an opening tag and a closing tag (e.g. <element>data stuff</element>). The only exception is an element with no content, which may be written as a single empty-element tag (e.g. <element />).
  • Tags must be nested in a particular order: the parent element's opening and closing tags must contain all of its child elements' tags; in this way, you close first the tag that was opened last:

<parentElement>
      <childElement1>data</childElement1>
      <childElement2>
              <subChildElementA>data</subChildElementA>
              <subChildElementB>data</subChildElementB>
      </childElement2>
      <childElement3>data</childElement3>
</parentElement>

  • Attribute values must have quotation marks around them (single or double).
  • Empty elements written as a single tag must have a slash (/) at the end of the tag; a space before the slash is optional.
  • Comments in the XML language begin with "<!--" and end with "-->".
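
A parser enforces these rules for you. As a minimal sketch, the Python function below (is_well_formed is a name invented here) asks the standard library's ElementTree parser whether a piece of text follows the syntax rules. It checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

nested_correctly = "<parent><child>data</child></parent>"
closed_out_of_order = "<parent><child>data</parent></child>"
unquoted_attribute = "<area measureUnit=km>5</area>"
empty_element = '<img src="Belize.gif"/>'

for sample in (nested_correctly, closed_out_of_order,
               unquoted_attribute, empty_element):
    print(is_well_formed(sample), sample)
```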


XML Element Naming Convention

Any name can be used but the idea is to make names meaningful to those who might read the document.

  • XML element names may only start with either a letter or an underscore character.
  • The name must not start with the string "xml" (in any combination of upper and lower case), which is reserved for the XML specification.
  • The name may not contain spaces.
  • The ":" should not be used in element names because it is reserved to be used for namespaces (This will be covered in more detail in a later chapter).
  • After the first character, the name may contain letters, digits, hyphens, underscores, and periods.


XML documents often have a corresponding database. The database will contain fields which correspond to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.
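
If element names are generated from database field names, a quick check can catch names that break the conventions above. The following Python sketch is an ASCII-only approximation: the function name and the pattern are assumptions made for this example, and real XML also permits many non-English letters that the pattern ignores.

```python
import re

# ASCII-only approximation of the naming conventions listed above.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_sensible_element_name(name):
    """Return True if the name follows the chapter's naming conventions."""
    if name.lower().startswith("xml"):
        return False            # reserved for the XML specification
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ("cityName", "_city", "admin-unit",
                  "2ndCity", "city name", "xmlCity"):
    print(candidate, is_sensible_element_name(candidate))
```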

DTD (Document Type Definition) Validation - Simple Example

Simple Internal DTD

 <?xml version="1.0"?>
 <!DOCTYPE cdCollection [
    <!ELEMENT cdCollection (cd)>
    <!ELEMENT cd (title, artist, year)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT artist (#PCDATA)>
    <!ELEMENT year (#PCDATA)>
 ]>
 <cdCollection>
  <cd>
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>
 </cdCollection>

Every element that will be used MUST be included in the DTD. Don't forget to include the root element, even though you have already named it at the beginning of the DTD. You must specify it again, in an <!ELEMENT> declaration.

 <!ELEMENT cdCollection (cd)>

The root element, <cdCollection>, contains all the other elements of the document, but only one direct child element: <cd>. Therefore, you need to specify the child element (only direct child elements need to be specified) in the parentheses.

 <!ELEMENT cd (title, artist, year)>

With this line, we define the <cd> element. Note that this element contains the child elements <title>, <artist>, and <year>. These are spelled out in a particular order. This order must be followed when creating the XML document. If you change the order of the elements (with this particular DTD), the document won't validate.

 <!ELEMENT title (#PCDATA)>

The remaining three elements, <title>, <artist>, and <year>, don't actually contain other tags. They do however contain some text that needs to be parsed. This data is called Parsed Character Data, or #PCDATA. Therefore, #PCDATA is specified in the parentheses.

So this simple DTD outlines exactly what you see here in the XML file. Nothing can be added or taken away, as long as we stick to this DTD. The only thing you can change is the #PCDATA text part between the tags.

Adding complexity

There may be times when you will want to put more than just character data, or more than just child elements into a particular element. This is referred to as mixed content. For example, let’s say you want to be able to put character data OR a child element, such as the <b> tag into a <description> element:

 <!ELEMENT description (#PCDATA | b | i )*>

This particular arrangement allows us to use PCDATA, the <b> tag, and the <i> tag together, in any order and any number of times. One particular caveat though, is that if you are going to mix PCDATA and other elements, #PCDATA must come first in the group and the grouping must be followed by the asterisk (*) suffix. This declaration allows us to now add the following to the XML document (after defining the individual elements of course):

  <cd>
    <title>Love. Angel. Music. Baby</title>
    <artist>Gwen Stefani</artist>
    <year>2004</year>
    <genre>pop</genre>
    <description>
      This is a great album from former  
      <i>No Doubt</i> singer <b>Gwen Stefani</b>.
    </description>
  </cd>
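
Mixed content is also handled differently by software that reads the document. The Python sketch below uses the standard library's ElementTree on a one-line version of the description element (the variable names are invented for the example) to show where the loose text and the inline elements end up.

```python
import xml.etree.ElementTree as ET

description = ET.fromstring(
    "<description>This is a great album from former "
    "<i>No Doubt</i> singer <b>Gwen Stefani</b>.</description>"
)

# Text before the first child element sits in .text ...
leading_text = description.text
# ... and text that follows a child element sits in that child's .tail.
pieces = [(child.tag, child.text, child.tail) for child in description]
# itertext() walks the text and the child elements in document order.
plain_text = "".join(description.itertext())

print(leading_text)
print(pieces)
print(plain_text)
```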

Attributes are declared a little differently than elements. Consider the following example, in which <cd> carries a remaster_date attribute:

  <cd remaster_date="1992">
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>

In order for this to validate, it must be specified in the DTD. Attribute content models are specified with:

 <!ATTLIST element_name attribute_name attribute_type default_value>

Let’s use this to validate our CD example:

 <!ATTLIST cd remaster_date CDATA #IMPLIED>

Choices

An attribute can also be limited to a list of allowed values, with one of them named as the default:

 <!ATTLIST person gender (male|female) "female">

Grouping Attributes for an Element

If a particular element is to have many different attributes, group them together like so:

<!ATTLIST car horn CDATA #REQUIRED
             seats CDATA #REQUIRED
     steeringwheel CDATA #REQUIRED
             price CDATA #IMPLIED>

Adding STATIC validation, for items that must have a certain value

<!ATTLIST classList   classNumber CDATA #IMPLIED
                      building (UWINNIPEG_DCE|UWINNIPEG_MAIN) "UWINNIPEG_MAIN"
                      originalDeveloper CDATA #FIXED "Khal Shariff">
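
To make the meaning of #IMPLIED, #REQUIRED, #FIXED and value lists concrete, here is a hand-written Python sketch of the checks a validator performs for the classList declaration above. It is not a DTD parser: the CLASSLIST_ATTRS table and the attribute_errors function are hypothetical stand-ins written for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of the classList ATTLIST above:
# attribute name -> (allowed values or None, keyword or None, default/fixed value)
CLASSLIST_ATTRS = {
    "classNumber": (None, "#IMPLIED", None),
    "building": (("UWINNIPEG_DCE", "UWINNIPEG_MAIN"), None, "UWINNIPEG_MAIN"),
    "originalDeveloper": (None, "#FIXED", "Khal Shariff"),
}

def attribute_errors(element, rules):
    """List the ways an element's attributes break the declared rules."""
    errors = []
    for name, (allowed, keyword, value) in rules.items():
        actual = element.get(name)
        if actual is None:
            if keyword == "#REQUIRED":
                errors.append(name + " is required")
            continue                      # optional or defaulted: nothing to check
        if allowed is not None and actual not in allowed:
            errors.append(name + " must be one of " + "|".join(allowed))
        if keyword == "#FIXED" and actual != value:
            errors.append(name + " must be " + value)
    for name in element.attrib:
        if name not in rules:
            errors.append(name + " is not declared")
    return errors

good = ET.fromstring(
    '<classList classNumber="303" building="UWINNIPEG_DCE" '
    'originalDeveloper="Khal Shariff"/>'
)
bad = ET.fromstring(
    '<classList building="ELSEWHERE" originalDeveloper="Someone Else" room="2"/>'
)
print(attribute_errors(good, CLASSLIST_ATTRS))
print(attribute_errors(bad, CLASSLIST_ATTRS))
```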

Suffixes

So what happens with our last example with the CD collection, when we want to add more CDs? With the current DTD, we cannot add any more CDs without getting an error. Try it and see. When you specify a child element (or elements) the way we did, exactly one of each child element must be used. Not very suitable for a CD collection, is it? We can use something called suffixes to add functionality to the <!ELEMENT> declaration. Suffixes are added to the end of the specified child element(s). There are three suffixes, in addition to the default of no suffix:

  • ( No suffix ): Exactly one element must be used.
  • ( + ): One or more elements can be used.
  • ( * ): Zero or more elements can be used.
  • ( ? ): Zero or one element may be used.

Validating for multiple children with a DTD

So in the case of our CD collection XML file, we can add more CDs to the list by adding a + suffix:

<!ELEMENT cdCollection (cd+)>
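
The suffixes behave much like the repetition operators in regular expressions. The Python sketch below leans on that similarity to test a list of child element names against a content model. The function children_match is invented for this example and only understands names, commas, |, parentheses and the three suffixes; #PCDATA, EMPTY and ANY are outside its scope.

```python
import re

def children_match(model, child_tags):
    """Check a list of child element names against a DTD content model."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching that name plus a space.
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    pattern = pattern.replace(",", "")      # a comma just means "followed by"
    subject = "".join(tag + " " for tag in child_tags)
    return re.fullmatch(pattern, subject) is not None

print(children_match("(cd)", ["cd", "cd"]))     # no suffix: a second cd is rejected
print(children_match("(cd+)", ["cd", "cd"]))    # +: one or more is accepted
print(children_match("(title, artist, year)", ["artist", "title", "year"]))
```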

Using more internal formatting tags

Inline formatting tags, such as <b> for bold and <i> for italics, are also defined in the DTD as elements. Declaring them in a mixed content model makes them optional:

   <!ELEMENT notes (#PCDATA | b | i)*>
   <!ELEMENT b (#PCDATA)*>
   <!ELEMENT i (#PCDATA)*>
]>

_______________

<classList classNumber="303" building="UWINNIPEG_DCE" originalDeveloper="Khal Shariff">
 <student>
   <firstName>Kenneth
   </firstName>
   <lastName>Branaugh
   </lastName>
   <studentNumber>
   </studentNumber>
   <notes><b>Excellent </b>, Kenneth is doing well.
   </notes>
etc

Case Study on BMEcat

One of the first major national projects to use XML as a B2B exchange format was initiated by the federal association for material management, purchasing and logistics (BME) in cooperation with leading German companies, e.g. Bayer, BMW, SAP and Siemens. Together they created a standard for the exchange of product catalogues. This project was named BMEcat. The result of this initiative is a DTD collection for the description of product catalogues and related transactions (new catalogue, updating of product data and updating of prices).

Companies operating in electronic commerce (suppliers, purchasing companies and market places) exchange increasingly large amounts of data. The variety of data exchange formats quickly pushes them to their limits. The BMEcat solution creates a basis for a straightforward transfer of catalogue data from various data formats. This lays the foundation for advancing the trade of goods over the Internet in Germany. Using BMEcat reduces costs for all parties because standard interfaces can be used.

The XML-based BMEcat standard has been successfully implemented in many projects. A variety of companies now use this established standard to exchange their product catalogs.


A BMEcat catalogue (Version 1.2) consists of the following main elements:

CATALOG  -  This element contains the essential information about a shopping catalog, e.g. language version and validity. BMEcat expects exactly one language per catalog.

SUPPLIER  -  This element includes the identification and address of the catalog supplier. BMEcat expects exactly one supplier per catalog.

BUYER  -  This element contains the name and address of the catalogue recipient. BMEcat expects no more than one recipient per catalog.

AGREEMENT  -  This element contains one or more framework agreement IDs, each with its validity period. BMEcat expects all prices in a catalogue to belong to the agreements named here.

CLASSIFICATION SYSTEM  -  This element allows the full transfer of one or more classification systems, including feature definitions and key words.

CATALOG GROUP SYSTEM  -  This element originates from version 1.0. It is mainly used to transfer tree structures that help a user navigate in the target system (browser).

ARTICLE (since 2005 PRODUCT)  -  This element represents a product. It contains a set of standard attributes.

ARTICLE PRICE (since 2005 PRODUCT PRICE)  -  This element represents a price. The support for different pricing models is very powerful in comparison with other exchange formats. Season prices, country prices, different currencies, different validity periods, etc. are supported.

ARTICLE FEATURE (since 2005 PRODUCT FEATURE)  -  This element allows the transfer of characteristic values. You can either record predefined group characteristics or individual product characteristics.

VARIANT  -  This element allows product variants to be listed without duplicating the product. However, variants in BMEcat cover only individual changes in value that lead to a change of article ID. They cannot depend on other attributes (prices in particular).

MIME  -  This element includes any number of additional documents such as product images, data sheets, or websites.

ARTICLE REFERENCE (since 2005 REFERENCE PRODUCT)  -  This element allows cross-referencing between articles within a catalogue as well as between catalogues. These references can be used, within limits, to map product bundles.

USER DEFINED EXTENSION  -  This element allows data outside the BMEcat standard to be transported. The sender and receiver have to agree on its content.

You can find a typical BMEcat file here.

ONLINE Validator

http://www.stg.brown.edu/service/xmlvalid/

Well-formed and valid XML

Well-formed XML  -  An XML document that correctly abides by the rules of XML syntax.

Valid XML  -  An XML document that adheres to the rules of an XML schema (which we will discuss shortly). To be valid an XML document must first be well-formed.


A Valid XML Document must be Well-formed. But, a Well-formed XML Document might not be valid - in other words, a well-formed XML document, that meets the criteria for XML syntax, might not meet the criteria for the XML schema, and will therefore be invalid.

For example, think of the situation where your XML document contains the following and is validated against our city schema:

  <city>
    <cityName>Boston</cityName>
    <country>United States</country>
    <adminUnit>Massachusetts</adminUnit>
  :
  :
  :
  </city>

Notice that the elements do not appear in the correct sequence according to the schema (cityName, adminUnit, country). The XML document can be validated (using validation software) against its declared schema – the validation software would then catch the out of sequence error.
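
The Python sketch below imitates that one check by hand. It is a stand-in for real validation software, not a replacement for it: the EXPECTED_ORDER list is copied from the sequence named above and the function name is made up for this example. Note that the Boston fragment parses without complaint, which is exactly what well-formed but invalid means.

```python
import xml.etree.ElementTree as ET

# Order taken from the city schema (first three elements only).
EXPECTED_ORDER = ["cityName", "adminUnit", "country"]

def first_out_of_sequence(city):
    """Return the first child tag that breaks the expected order, or None."""
    for expected, child in zip(EXPECTED_ORDER, city):
        if child.tag != expected:
            return child.tag
    return None

# Parsing succeeds, so the fragment is well-formed.
boston = ET.fromstring(
    "<city><cityName>Boston</cityName><country>United States</country>"
    "<adminUnit>Massachusetts</adminUnit></city>"
)
print(first_out_of_sequence(boston))
```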


Using an XML Editor

Check chapter XML Editor for instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML document and paste it into the XML editor. Then check your results. Is the XML document well-formed? Is the XML document valid? (you will need to have copied and pasted the schema in order to validate - we will look at schemas next)


XML schema

An XML schema is an XML document. XML schemas have an .xsd file extension.

An XML schema is used to govern the structure and content of an XML document by providing a template for XML documents to follow in order to be valid. It is a guide for how to structure your XML document as well as indicating your XML document's components (elements and attributes - and their relationships). An XML editor will examine an XML document to ensure that it conforms to the specifications of the XML schema it is written against - to ensure it is valid.

XML schemas engender confidence in data transfer. With schemas, the receiver of data can feel confident that the data conforms to expectations. The sender and the receiver have a mutual understanding of what the data represent.

Because an XML schema is an XML document, you use the same language - standard XML markup syntax - with elements and attributes specific to schemas.


A schema defines:

  • the structure of the document
  • the elements
  • the attributes
  • the child elements
  • the number of child elements
  • the order of elements
  • the names and contents of all elements
  • the data type for each element

For more detailed information on XML schemas and reference lists of: Common XML Schema Primitive Data Types, Summary of XML Schema Elements, Schema Restrictions and Facets for data types, and Instance Document Attributes, click on this wikibook link => http://en.wikibooks.org/wiki/XML_Schema


Schema reference

This is the part of the XML Document that references an XML Schema:

Exhibit 5: XML document's schema reference

  <tourGuide
      xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
      xsi:noNamespaceSchemaLocation='city.xsd'>

This is the part we left out when we described the root element in the basic XML document from the previous section. The additional attributes of the root element <tourGuide> reference the XML schema (through the xsi:noNamespaceSchemaLocation attribute).

xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'  -  references the W3C Schema-instance namespace
xsi:noNamespaceSchemaLocation='city.xsd'  -  references the XML schema document (city.xsd)
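
A program that wants to find the schema for a document reads this attribute. In the Python sketch below (standard library ElementTree, variable names invented for the example), the prefixed attribute is looked up under its full namespace name, because the parser replaces the xsi prefix with the namespace it stands for.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

root = ET.fromstring(
    "<tourGuide xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' "
    "xsi:noNamespaceSchemaLocation='city.xsd'/>"
)

# ElementTree stores a prefixed attribute under {namespace}localName.
schema_file = root.get("{" + XSI + "}noNamespaceSchemaLocation")
print(schema_file)
```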

Schema document

Below is a sample XML schema using our TourGuide model. We will refer to it as we describe the parts of an XML schema.

Exhibit 6: XML schema document for city entity

  <?xml version="1.0" encoding="UTF-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   elementFormDefault="unqualified">  
    <xsd:element name="tourGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="city" type="cityDetails" minOccurs = "1" maxOccurs="unbounded" />
            </xsd:sequence>
        </xsd:complexType>
     </xsd:element>
     <xsd:complexType name="cityDetails">
        <xsd:sequence> 
             <xsd:element name="cityName" type="xsd:string"/>
             <xsd:element name="adminUnit" type="xsd:string"/>
             <xsd:element name="country" type="xsd:string"/>
             <xsd:element name="population" type="xsd:integer"/>
             <xsd:element name="area" type="xsd:integer"/>
             <xsd:element name="elevation" type="xsd:integer"/>
             <xsd:element name="longitude" type="xsd:decimal"/>
             <xsd:element name="latitude" type="xsd:decimal"/>
             <xsd:element name="description" type="xsd:string"/>
             <xsd:element name="history" type="xsd:string"/>
         </xsd:sequence>
     </xsd:complexType>
  </xsd:schema>
  <!--
    Note: Latitude and Longitude are decimal data types.
    The conversion is from the usual form (e.g., 50° 17' 35")
    to a decimal by using the formula degrees+min/60+secs/3600.
  -->
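
The conversion in the note at the end of Exhibit 6 is simple arithmetic. As a worked example in Python (the function name is chosen for this illustration), the value quoted in the note converts as follows:

```python
def to_decimal_degrees(degrees, minutes, seconds):
    """Apply degrees + min/60 + secs/3600 from the schema note."""
    return degrees + minutes / 60 + seconds / 3600

# 50 degrees, 17 minutes, 35 seconds is roughly 50.2931 decimal degrees
example = to_decimal_degrees(50, 17, 35)
print(round(example, 4))
```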


Prolog

Remember that the XML schema is essentially an XML document and therefore must begin with the prolog, which in the case of a schema includes:

  • the XML declaration
  • the schema element declaration


The XML declaration:

  <?xml version="1.0" encoding="UTF-8"?>

The schema element declaration:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

The schema element is similar to a root element - it contains all other elements in the schema.

Attributes of the schema element include:

xmlns  -  XML NameSpace - the URI that identifies the vocabulary of XML elements and data types used in the schema. It serves as a unique name and does not have to point to an actual web page.

You can find more about namespaces here => Namespace.

xmlns:xsd  -  All the elements and attributes with the 'xsd' prefix adhere to the vocabulary designated in the given namespace.

elementFormDefault  -  states whether elements declared locally in the schema must be qualified with a namespace in the XML document ('qualified') or not ('unqualified'). This matters mostly when more than one namespace is referenced: in that case 'elementFormDefault' should be set to 'qualified', because you must indicate which namespace each element belongs to. If you are referencing only one namespace, 'elementFormDefault' can be 'unqualified'. Using 'qualified' as the default is perhaps the most prudent choice, since you then cannot accidentally forget to indicate which namespace you are referencing.

Element declarations

Define the elements in the schema.

Include:

  • the element name
  • the element data type (optional)

Basic element declaration format: <xsd:element name="name" type="type" />

Simple type

declares elements that:

  • do NOT have Child Elements
  • do NOT have Attributes

example: <xsd:element name="cityName" type="xsd:string" />

Default Value

If the element appears in the XML document without a value (that is, it is empty), then the default value is assigned.

example: <xsd:element name="description" type="xsd:string" default="really cool place to visit!" />

Fixed Value

An element that is defined as fixed must be empty or contain the specified fixed value. No other values are allowed.

example: <xsd:element name="description" type="xsd:string" fixed="you must visit this place - it is awesome!" />

Complex type

declares elements that:

  • can have Child Elements
  • can have Attributes

examples:

1. The root element 'tourGuide' contains a child element 'city'. This is shown here:

Nameless complex type

     <xsd:element name="tourGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="city" type="cityDetails" minOccurs = "1" maxOccurs="unbounded" />
            </xsd:sequence>
        </xsd:complexType>
     </xsd:element>

Occurrence Indicators:

  • minOccurs = the minimum number of times an element must occur (here it is 1 time)
  • maxOccurs = the maximum number of times an element can occur (here it is an unlimited number of times, 'unbounded')
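
As a small sketch of how the two indicators work together, the Python function below (occurrence_ok is a name invented for this example) compares a count of child elements with a minOccurs/maxOccurs pair. When either indicator is left out of a schema, its value is 1, which the function's defaults mirror.

```python
import xml.etree.ElementTree as ET

def occurrence_ok(count, min_occurs="1", max_occurs="1"):
    """Compare a count with minOccurs/maxOccurs as written in a schema."""
    if count < int(min_occurs):
        return False
    return max_occurs == "unbounded" or count <= int(max_occurs)

guide = ET.fromstring("<tourGuide><city/><city/><city/></tourGuide>")
city_count = len(guide.findall("city"))

print(occurrence_ok(city_count, "1", "unbounded"))   # three cities are allowed
print(occurrence_ok(0, "1", "unbounded"))            # an empty guide is not
```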


2. The parent element 'city' contains many child elements: 'cityName', 'adminUnit', 'country', 'population', etc. Why does this complex element set not start with the line: <xsd:element name="city" type="cityDetails">? The element 'city' was already defined above within the complex element 'tourGuide' and it was given the type, 'cityDetails'. This data type, 'cityDetails', is utilized here in identifying the sequence of child elements for the parent element 'city'.

Named Complex Type - and therefore can be reused in other parts of the schema

   <xsd:complexType name="cityDetails">
        <xsd:sequence>
             <xsd:element name="cityName" type="xsd:string"/>
             <xsd:element name="adminUnit" type="xsd:string"/>
             <xsd:element name="country" type="xsd:string"/>
             <xsd:element name="population" type="xsd:integer"/>
             <xsd:element name="area" type="xsd:integer"/>
             <xsd:element name="elevation" type="xsd:integer"/>
             <xsd:element name="longitude" type="xsd:decimal"/>
             <xsd:element name="latitude" type="xsd:decimal"/>
             <xsd:element name="description" type="xsd:string"/>
             <xsd:element name="history" type="xsd:string"/>
         </xsd:sequence>
   </xsd:complexType>

The <xsd:sequence> tag indicates that the child elements must appear in the order, the sequence, specified here.
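
Because a schema is itself an XML document, ordinary XML tools can read it. The Python sketch below uses the standard library's ElementTree to pull the declared sequence out of a shortened copy of the cityDetails type and compare it with the order of elements in a city entry. It illustrates the idea only; it is not a schema validator, and the variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="cityDetails">
    <xsd:sequence>
      <xsd:element name="cityName" type="xsd:string"/>
      <xsd:element name="adminUnit" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>
      <xsd:element name="population" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

schema = ET.fromstring(SCHEMA)
declared = schema.findall(
    "xsd:complexType[@name='cityDetails']/xsd:sequence/xsd:element", NS
)
declared_order = [(e.get("name"), e.get("type")) for e in declared]

city = ET.fromstring(
    "<city><cityName>Belmopan</cityName><adminUnit>Cayo</adminUnit>"
    "<country>Belize</country><population>11100</population></city>"
)
actual_order = [child.tag for child in city]

print(declared_order)
print(actual_order == [name for name, _ in declared_order])
```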

Compare the sample XML Schema and the sample XML Document - try to observe patterns in the code and how the XML Schema sets up the XML Document.


3. Elements that have attributes are also designated as complex type.

a. this XML Document line: <adminUnit class="state" name="Cayo" /> would be defined in the XML Schema as:

     <xsd:element name="adminUnit">
          <xsd:complexType>
               <xsd:attribute name="class" type="xsd:string" />
               <xsd:attribute name="name" type="xsd:string" />
          </xsd:complexType>
     </xsd:element>

b. this XML Document line: <adminUnit class="state">Cayo</adminUnit> would be defined in the XML Schema as:

     <xsd:element name="adminUnit">
          <xsd:complexType>
               <xsd:simpleContent>
                    <xsd:extension base="xsd:string">
                         <xsd:attribute name="class" type="xsd:string" />
                    </xsd:extension>
               </xsd:simpleContent>
          </xsd:complexType>
     </xsd:element>

Attribute declarations

Attribute declarations are used in complex type definitions. We saw some attribute declarations in the third example of the Complex Type Element.

<xsd:attribute name="class" type="xsd:string" />


Data type declarations

These are contained within element and attribute declarations as: type=" " .

Common XML Schema Data Types

XML schema has a lot of built-in data types. The most common types are:

string  -  a string of characters
decimal  -  a decimal number
integer  -  an integer
boolean  -  the values true or false, or 1 or 0
date  -  a date, written as YYYY-MM-DD
time  -  a time of day, written as hh:mm:ss
dateTime  -  a date and time combination, written as YYYY-MM-DDThh:mm:ss
anyURI  -  a URI, for example when the element will contain a URL


For an entire list of built-in simple data types see http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
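
Declaring a data type lets the receiving program turn the text in an element into a proper value. The Python sketch below pairs a few of the common types with converters from the standard library. The CONVERTERS table and parse_boolean are hypothetical helpers written for this example, and the converters are approximations: Python's int, for instance, accepts some spellings that xsd:integer does not.

```python
from datetime import date
from decimal import Decimal

def parse_boolean(text):
    """xsd:boolean accepts exactly true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not an xsd:boolean: " + text)

# Hypothetical lookup from schema type name to a Python converter.
CONVERTERS = {
    "xsd:string": str,
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:boolean": parse_boolean,
    "xsd:date": date.fromisoformat,   # expects YYYY-MM-DD
}

population = CONVERTERS["xsd:integer"]("11100")
longitude = CONVERTERS["xsd:decimal"]("88.44")
print(population + 1, longitude)
```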



Using an XML Editor => XML Editor

This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML schema document and paste it into the XML editor. Then check your results. Is the XML schema well-formed? Is the XML schema valid?


XML stylesheet (XSL)

An XML Stylesheet is an XML Document. XML Stylesheets have an .xsl file extension.

The eXtensible Stylesheet Language (XSL) provides a means to transform and format the contents of an XML document for display. Since an XML document does not contain tags a browser understands, such as HTML tags, browsers cannot present the data without a stylesheet that contains the presentation information. By separating the data and the presentation logic, XSL allows people to view the data according to their different needs and preferences.

The XSL Transformation Language (XSLT) is used to transform an XML document from one form to another, such as creating an HTML document to be viewed in a browser. An XSLT stylesheet consists of a set of formatting instructions that dictate how the contents of an XML document will be displayed in a browser, with much the same effect as Cascading Stylesheets (CSS) do for HTML. Multiple views of the same data can be created using different stylesheets. The output of a stylesheet is not restricted to a browser.


During the transformation process, the XSLT processor analyzes the XML document and converts it into a node tree, a hierarchical representation of the entire XML document. Each node represents a piece of the XML document, such as an element, attribute or some text content. The XSL stylesheet contains predefined “templates” that contain instructions on what to do with the nodes. The processor uses the match attribute to relate XML element nodes to the templates, and transforms them into the resulting document.
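
To get a feel for the node tree, the following sketch parses a small document with Python's standard xml.etree.ElementTree module and lists its element nodes by depth. The sample data and the walk helper are assumptions for this illustration, and ElementTree's model is simpler than the one XSLT uses (text is stored on elements rather than in separate text nodes).

```python
import xml.etree.ElementTree as ET

# Sample data assumed for this sketch; the element names follow the city entity.
SOURCE = """<tourGuide>
  <city>
    <cityName>Madrid</cityName>
    <country>Spain</country>
  </city>
</tourGuide>"""

def walk(node, depth=0, lines=None):
    """Collect one line per element node, indented by its depth in the tree."""
    if lines is None:
        lines = []
    text = (node.text or "").strip()
    lines.append("  " * depth + node.tag + (": " + text if text else ""))
    for child in node:
        walk(child, depth + 1, lines)
    return lines

tree_lines = walk(ET.fromstring(SOURCE))
print("\n".join(tree_lines))
```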

Exhibit 7: XML stylesheet document for city entity

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/> 
    <xsl:template match="/">
        <html>
            <head>
                <title>Tour Guide</title>
            </head>
            <body>
                <h2>Cities</h2>
                <xsl:apply-templates select="tourGuide"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="tourGuide">
        <xsl:for-each select="city">
            <br/><xsl:value-of select="continentName"/><br/>
            <xsl:value-of select="cityName"/><br/>
            <xsl:text>Population: </xsl:text>
            <xsl:value-of select='format-number(population, "##,###,###")'/><br/>
            <xsl:value-of select="country"/>
            <br/>
        </xsl:for-each>     
    </xsl:template>
</xsl:stylesheet>


The output of the city.xsl stylesheet in Exhibit 7 will look like the following:

Cities

Europe
Madrid
Population: 3,128,600
Spain

Asia
Shanghai
Population: 18,880,000
China


You will notice that the stylesheet consists of HTML to inform the media tool (a web browser) of the presentation design. If you do not already know HTML this may seem a little confusing. Online resources such as the W3Schools tutorials can help with the basic understanding you will need =>(http://www.w3schools.com/html/default.asp).

Incorporated within the HTML are the XSL elements that pull in the data contained within our XML document. These elements indicate what information will be displayed and how. So, the HTML constructs a display and the XSL elements plug in values within that display. XSL is the tool that transforms the information into presentational form, but at the same time keeps the meaning of the data.

XML at Bertelsmann - a case study

The German Bertelsmann Inc. is a privately owned media conglomerate operating in 56 countries. It has interests in such businesses as TV broadcasting (RTL), magazines (Gruner & Jahr), and books (Random House). In 2005 its 89 000 employees generated 18 billion € of revenue.

A major concern of such a diversified business is utilizing synergies. Management needs to make sure the Random House employees don't spend time and money figuring out what RTL TV journalists have already come up with.

Thus knowledge management based on IT promises huge time savings. Consequently, in 2002 Bertelsmann started a project called BeCom. BeCom's purpose was to enable the different Bertelsmann businesses to use the same data for their different media applications. XML is crucial in this project, because it allows for separating data (document) from presentation (style sheet). Thus data can both be examined statistically and be modified to fit different media like TV and newspapers.

Statistical XML data management, for example, enables employees to benefit from CBR (Case Based Reasoning). CBR allows a Bertelsmann employee who searches for specific content to profit from the previous search findings of other Bertelsmann employees, thus gaining information that is much more contextual than isolated research results. Besides XML data management, Bertelsmann TV and Book units can apply this optimized data in their specific media using a variety of layout applications like 3B2 or QuarkXPress.


Prolog

The prolog of the stylesheet contains:

  • the XML declaration;
  • the stylesheet declaration;
  • the namespace declaration;
  • the output document format.
 <?xml version="1.0" encoding="UTF-8"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html"/>


The XML declaration

 <?xml version="1.0" encoding="UTF-8"?>


The stylesheet & namespace declarations

     <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  • identifies the document as an XSL style sheet;
  • identifies the version number;
  • refers to the W3C XSL namespace - the URI that identifies the elements of the XSL transformation language. You can find more about namespaces here => Namespace. Every time the xsl: prefix is used it references the given namespace.


The output document format

      <xsl:output method="html"/>

This element designates the format of the output document and must be a child element of <xsl:stylesheet>.

Templates

The <xsl:template> element is used to create templates that describe how to display elements and their content. As mentioned in the XSL introduction above, the XSLT processor breaks the XML document up into a node tree (a hierarchical representation of the entire XML document in which each node represents a piece of the document, such as an element, an attribute or some text content) and works on individual nodes. This is done with templates. Each template in a stylesheet describes how to process the nodes it matches. To identify which nodes a given template applies to, use the 'match' attribute. The value given to the 'match' attribute is called a pattern. <xsl:template> defines the start of a template and contains the rules to apply when a matching node is found.


the match attribute

   <xsl:template match="/">

This template match attribute associates the XML document root (/), the whole branch of the XML source document, with the HTML document root. Contained within this template element is the typical HTML markup found at the beginning of any HTML document. This HTML is written to the output. The XSL looks for the root match and then outputs the HTML, which the browser understands.

   <xsl:template match="tourGuide">

This template match attribute associates the element 'tourGuide' with the display rules described within this element.


Elements

Elements specific to XSL:

XSL element (from our sample XSL) and its meaning:

  • <xsl:text>: Writes the literal text found between this element's tags to the output.
  • <xsl:value-of>: Used with a 'select' attribute to look up the value of the selected node and plug it into the output.
  • <xsl:for-each>: Used with a 'select' attribute to handle elements that repeat, by looping through all the nodes in the selected node set.
  • <xsl:apply-templates>: Applies templates to a node or nodes. If it uses a 'select' attribute, templates are applied only to the selected node(s), which also lets the stylesheet control the order in which child nodes are processed. If no 'select' attribute is used, templates are applied to all child nodes of the current node, including text nodes.

For more XSL elements => http://www.w3schools.com/xsl/xsl_w3celementref.asp .
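
For readers more used to procedural code, the sketch below mimics in Python what the 'tourGuide' template of Exhibit 7 does with these elements. It is not XSLT and is not how an XSLT processor works internally; the sample document is an assumption reconstructed from the output listed after Exhibit 7, and the render function is a name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample document assumed for this sketch (values taken from the Exhibit 7 output).
SOURCE = """<tourGuide>
  <city>
    <cityName>Madrid</cityName>
    <population>3128600</population>
    <country>Spain</country>
    <continentName>Europe</continentName>
  </city>
  <city>
    <cityName>Shanghai</cityName>
    <population>18880000</population>
    <country>China</country>
    <continentName>Asia</continentName>
  </city>
</tourGuide>"""

def render(tour_guide):
    """Rough Python counterpart of the 'tourGuide' template."""
    lines = []
    for city in tour_guide.findall("city"):            # like xsl:for-each select="city"
        lines.append(city.findtext("continentName"))   # like xsl:value-of
        lines.append(city.findtext("cityName"))
        population = int(city.findtext("population"))
        # like xsl:text followed by format-number(population, "##,###,###")
        lines.append("Population: " + format(population, ","))
        lines.append(city.findtext("country"))
    return lines

output = render(ET.fromstring(SOURCE))
print("\n".join(output))
```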

Language-Specific Validation and Transformation Methods

PHP Methods of XML DOM Validation

On a server, PHP can use the DOM (Document Object Model) to validate an XML document against a DTD (document type definition). For details and more, see http://wiki.cc/php/Dom_validation

Browser Methods

Place this line of code in your .xml document after the XML declaration (prologue).

 <?xml-stylesheet type="text/xsl" href="tourGuide.xsl"?>

PHP XML Production

 <?php
 // Connect to the DBMS and select the database
 // (mysqli replaces the mysql_* functions, which were removed in PHP 7)
 $db = mysqli_connect('localhost', 'root', '', 'issd')
     or die('Failed to connect to the DBMS');
 $result = mysqli_query($db, 'SELECT * FROM students')
     or die('Query to get the records failed');
 if (mysqli_num_rows($result) < 1) {
     die('');
 }
 // Build the XML string: one <student> element per row
 $xmlString = "<classlist>\n";
 while ($row = mysqli_fetch_assoc($result)) {
     $xmlString .= "\t<student>\n";
     // Escape &, < and > so that the output stays well-formed
     $xmlString .= "\t\t<firstName>" . htmlspecialchars($row['firstName'], ENT_XML1) . "</firstName>\n";
     $xmlString .= "\t\t<lastName>" . htmlspecialchars($row['lastName'], ENT_XML1) . "</lastName>\n";
     $xmlString .= "\t</student>\n";
 }
 $xmlString .= "</classlist>";
 echo $xmlString;
 $myFile = "classList.xml"; //any file
 $fh = fopen($myFile, 'w') or die("can't open file"); //create filehandler
 fwrite($fh, $xmlString); //write the data into the file
 fclose($fh); //ALL DONE!
 ?>

PHP Methods of XSLT Transformation

The following example works with PHP 5 and later, for example under a current WampServer installation. Make sure that the xsl extension is enabled (NOT commented out) in the php.ini file.

 <?php
 // Load the XML source
 $xml = new DOMDocument;
 $xml->load('tourguide.xml');
 $xsl = new DOMDocument;
 $xsl->load('tourguide.xsl');
 // Configure the transformer
 $proc = new XSLTProcessor;
 $proc->importStyleSheet($xsl); // attach the xsl rules
 echo $proc->transformToXML($xml);
 ?>


Example 1: using the xslt_* functions of the older PHP 4 XSLT extension from within PHP (call the phpinfo() function to check that the XSLT extension is available; enable it if needed). This example might produce XHTML, but it could produce anything defined by the XSL.

 <?php
 $xhtmlOutput = xslt_create();
 $args = array();
 $params = array('foo' => 'bar');
 $theResult = xslt_process(
                         $xhtmlOutput,
                         'theContentSource.xml',
                         'theTransformationSource.xsl',
                         null,
                         $args,
                         $params
                        );
 xslt_free($xhtmlOutput); // free that memory
 // echo theResult or save it to a file or continue processing (perhaps instructions)
 ?>

Example 2: emulating the old xslt_* functions in PHP 5 with the XSLTProcessor class:

 <?php
 if (PHP_VERSION >= 5) {
   // Emulate the old xslt library functions
   function xslt_create() {
       return new XsltProcessor();
   }
   function xslt_process($xsltproc,
                         $xml_arg,
                         $xsl_arg,
                          $xslcontainer = null,
                         $args = null,
                         $params = null) {
       // Start with preparing the arguments
       $xml_arg = str_replace('arg:', '', $xml_arg);
       $xsl_arg = str_replace('arg:', '', $xsl_arg);
       // Create instances of the DomDocument class
       $xml = new DomDocument;
       $xsl = new DomDocument;
       // Load the xml document and the xsl template
       $xml->loadXML($args[$xml_arg]);
       $xsl->loadXML($args[$xsl_arg]);
       // Load the xsl template
       $xsltproc->importStyleSheet($xsl);
       // Set parameters when defined
       if ($params) {
           foreach ($params as $param => $value) {
               $xsltproc->setParameter("", $param, $value);
           }
       }
       // Start the transformation
       $processed = $xsltproc->transformToXML($xml);
       // Put the result in a file when specified
       if ($xslcontainer) {
           return @file_put_contents($xslcontainer, $processed);
       } else {
           return $processed;
       }
   }
   function xslt_free($xsltproc) {
       unset($xsltproc);
   }
 }
 $arguments = array(
   '/_xml' => file_get_contents("xml_files/201945.xml"),
   '/_xsl' => file_get_contents("xml_files/convertToSql_new2.xsl")
 );
 $xsltproc = xslt_create();
 $html = xslt_process(
   $xsltproc,
   'arg:/_xml',
   'arg:/_xsl',
   null,
   $arguments
 );
 xslt_free($xsltproc);
 print $html;
 ?>

PHP file writing code

 $myFile = "testFile.xml"; //any file
 $fh = fopen($myFile, 'w') or die("can't open file"); //create filehandler
 $stringData = "<foo>\n\t<bar>\n\thello\n"; // get a string ready to write
 fwrite($fh, $stringData); //write the data into the file
 $stringData2 = "\t</bar>\n</foo>";
 fwrite($fh, $stringData2); //write more data into the file
 fclose($fh); //ALL DONE!

XML Colors

For use in your stylesheet: these colors can be used for both background and font

http://www.w3schools.com/html/html_colors.asp

http://www.w3schools.com/html/html_colorsfull.asp

http://www.w3schools.com/html/html_colornames.asp


Using an XML Editor => XML Editor

This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML stylesheet document and paste it into the XML editor. Then check your results. Is the XML stylesheet well-formed?


XML at Thomas Cook - a case study

As a leading travel company and one of the most widely recognized brands in the world, Thomas Cook works across the travel value chain (airlines, hotels, tour operators, travel and incoming agencies), providing its customers with the right product in all market segments across the globe. Employing over 11,000 staff, the Group has 33 tour operators, around 3,600 travel agencies, a fleet of 80 aircraft and a workforce numbering some 26,000. Thomas Cook operates through a network of 616 locations in Europe and overseas. The company is now the second largest travel group in Europe and the third largest in the world.

As Thomas Cook sells other companies' products, ranging from packaged holidays to car hire, it needs to change its online brochure regularly. Before Thomas Cook started using XML, it put information into HTML format, and it could take up to six weeks to get an online brochure up and running. With XML, the job takes about three days. This provides Thomas Cook's current and potential customers, and its agencies in different geographical locations, with updated information without a six-week wait for new information to be released.

XML allows Thomas Cook to put content information into a single database, which can be re-used as many times as required. "We did not want to keep having to re-do the same content, we wanted the ability to switch it on immediately," said Gwyn Williams, who is content manager at Thomascook.com. "This has brought internal benefits such as being able to re-deploy staff into more value added areas." Thomascook.com currently holds 65,000 pages of brochure and travel guide information and an online magazine in XML format.

Thomas Cook started using XML at a relatively early stage. As Thomas Cook has a large database, the early use of XML will stand it in good stead. At some point, the databases will have to be incorporated into XML, and it is reported that XML databases are quicker than conventional databases, giving Thomas Cook a slight competitive advantage against those who do not use XML.

Thomas Cook has found that this can lead to substantial cost reductions as well as consistency of information across all channels. By implementing a central content management system to facilitate brochure production and web publications, they have centralized the production, maintenance and distribution of content across their brands and channels.

Summary

From the previous chapter Introduction to XML, you have learned the need for data exchange and the usefulness of XML in data exchange. In this chapter, you have learned more about the three major XML files: the XML document, the XML schema, and the XML stylesheet. You learned the correct documentation required for each type of file. You learned basic rules of syntax applicable for all XML documents. You learned how to integrate the three types of XML documents. And you learned the definition and distinction between a well-formed document and a valid document. By following the XML Editor links, you were able to see the results of the sample code and learn how to use an XML Editor.

Below are Exercises and Answers for further practice. Good Luck!


Definitions

XML
SGML
Dan Connolly
RSS
XML Declaration
parent
child
sibling
element
attribute
Well-formed XML
PCDATA

Exercises

Exercise 1.

a) Using "tourGuide" above as an example, create an XML document whose root is "classlist". This CLASSLIST is built from a single entity, STUDENT. Each of any number of students contains the elements firstname, lastname, and emailaddress.

Answers



Basic data structures







Learning objectives
  • introduce the concept and uses of basic data structures
  • describe how XML may be used to represent basic data structures
  • enumerate common technical considerations

Introduction

In reviewing the four central problems in data management (capture, storage, retrieval, and exchange), the typical user of XML encounters recurring fundamental structural patterns that apply to all sorts of data throughout the storage and exchange phases. These patterns recur consistently because their use transcends the particular contexts in which the underlying data are processed. We call these patterns "data structures" (or datatypes).

In this section, we discuss a few of the most fundamental "basic data structures" and explain why they are useful, as well as how to work with them using XML.

We start our introduction with a simple example. Consider an ordinary grocery shopping list for a single-person household.

Introductory Shopping List Example:

   Andy's shopping list:
   * eggs
   * cough syrup (pick up for granny)
   * orange juice  
   * bread
   * laundry detergent **don't forget this**

When analyzing aspects of the information contained in this shopping list, we can make some basic generalizations:

  • Portability: the shopping list can be represented and transferred easily. If necessary, it could be stored in a database and processed by custom-designed software, but it could just as easily be written on a scrap of paper;
  • Comprehensibility: the shopping list is readily understood by its intended audience (in this instance, the sole person who wrote the list) and therefore needs no additional information or structure in order to be immediately usable;
  • Adaptability: if any changes become necessary (such as additions or removals to the list) there is an existing and well-known methodology for accomplishing this (e.g., in the case of a handwritten list, simply write down new entries or cross out unwanted entries).

The fundamental concept of basic data structures

Given that we have the previous example for background, we can now introduce the fundamental concept of "basic data structures".

The concept of "basic data structures" describes the fundamental conventions we use to store our data, so that we can more easily exchange our data. When we follow these fundamental conventions, we help to ensure the portability, comprehensibility and adaptability of information.

Basic data structures defined

Now that we have introduced our concept of data structures, we can start with some concrete definitions, and then review those definitions in the context of our shopping list example.

Overview of "core" data structures

The following terms define some "core" data structures[1] that we use throughout this chapter. This list is ordered in ascending degrees of complexity:

  • SimpleBoolean: Any value capable of being expressed as either "True" or "False".
  • SimpleString: A contiguous sequence of characters, including both alphanumeric and non-alphanumeric.
  • SimpleSequence: An enumeration of items generally accessible by numeric indexing.
  • Name-value pair: An arbitrary singular name attached to a singular value.
  • SimpleDictionary: An enumeration of items generally accessible by alphanumeric indexing.
  • SimpleTable: An ordered arrangement of columns and rows. A SimpleTable can be classified as a "composite" data structure (e.g., SimpleSequence where each item in the sequence is a single SimpleDictionary).

An important point to remember while reviewing these "core" data structures is that they are elemental and complementary. That is, the core structures, when used in combination, can form even more complex structures. Once the reader comes to understand this fact, it will become apparent that there is no conceivable application or data specification that cannot be wholly described in XML using nothing more than these "core" data structures.

Once we understand the "core" data structures, we can use them in combination to represent any conceivable kind of structured information.

Now review the "Introductory Shopping List Example" above. When we compare it with the "core" data structures that we've just defined, we can make some fairly straightforward observations:

  • The entire shopping list cannot be represented using a SimpleBoolean data structure, because the information is more complex than either "True" or "False".
  • The entire shopping list can be represented using a SimpleString.
  • There may be reasons why we would not want to use a SimpleString to represent the entire shopping list. For example, we might want to transfer the list into a database or other software application and then be able to sort, query, duplicate or otherwise process individual items on the list. Treating the entire list as a SimpleString would therefore complicate our processing requirements.

SimpleString

Different ways to represent a SimpleString in XML:

<Example>
    <String note="This XML attribute contains a SimpleString.">
    This XML Text Node represents a SimpleString.
    </String>

    <!-- This XML comment contains a SimpleString -->
    <![CDATA[ This XML CDATA section contains a SimpleString. ]]>
</Example>

SimpleSequence

Different ways to represent a SimpleSequence in XML:

<Example>
    <!-- use a single XML attribute with a space-delimited list of items -->
    <ShoppingList items="bread eggs milk juice" />

    <!-- use a single XML attribute with a semicolon-delimited list of items 
         (this allows us to add items with spaces in them) -->
    <ShoppingList items="bread;cough syrup;milk;juice;laundry detergent"  />

    <!-- yet another way (but not necessarily a good way) 
         using multiple XML attributes -->
    <ShoppingList item00="bread" item01="eggs" item02="cough syrup" />

    <!-- yet another way 
         using XML child elements -->
    <ShoppingList>
        <item>eggs</item><item>milk</item><item>cough syrup</item>
    </ShoppingList>
</Example>
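
Each of the four representations above can be turned back into an ordinary list by a program. The sketch below does so with Python's standard xml.etree.ElementTree module; the variable names are chosen for this illustration. Note how the delimiter-based forms need a splitting convention that the XML itself does not express, while the child-element form needs none.

```python
import xml.etree.ElementTree as ET

space_delimited = ET.fromstring('<ShoppingList items="bread eggs milk juice" />')
semicolon_delimited = ET.fromstring(
    '<ShoppingList items="bread;cough syrup;milk;juice;laundry detergent" />')
numbered_attributes = ET.fromstring(
    '<ShoppingList item00="bread" item01="eggs" item02="cough syrup" />')
child_elements = ET.fromstring(
    '<ShoppingList><item>eggs</item><item>milk</item>'
    '<item>cough syrup</item></ShoppingList>')

# Split on whitespace: items cannot contain spaces.
from_spaces = space_delimited.get("items").split()
# Split on semicolons: items may contain spaces.
from_semicolons = semicolon_delimited.get("items").split(";")
# Attribute order is not significant in XML, so sort by attribute name.
from_attributes = [value for name, value in sorted(numbered_attributes.attrib.items())]
# Child elements keep their document order.
from_children = [item.text for item in child_elements.findall("item")]

for sequence in (from_spaces, from_semicolons, from_attributes, from_children):
    print(sequence)
```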

Name-value pair

SimpleDictionary
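
One way to picture a SimpleDictionary in XML is as a set of name-value pairs in which each name is unique. The sketch below assumes a hypothetical Prices element with made-up values (it is not part of the shopping list example) and shows two possible representations, as attributes and as child elements, that load into the same Python dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: each attribute is one name-value pair.
as_attributes = ET.fromstring('<Prices eggs="2.50" milk="1.20" bread="3.00" />')

# The same pairs as child elements that carry the name in an attribute.
as_elements = ET.fromstring(
    '<Prices>'
    '<entry name="eggs">2.50</entry>'
    '<entry name="milk">1.20</entry>'
    '<entry name="bread">3.00</entry>'
    '</Prices>')

prices_a = dict(as_attributes.attrib)
prices_b = {entry.get("name"): entry.text for entry in as_elements.findall("entry")}

print(prices_a["milk"])       # look up a value by its name
print(prices_a == prices_b)   # both forms carry the same dictionary
```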

SimpleTable

Side-by-side examples

SimpleTable (XML_Elem):

<table>
    <tr><item>eggs</item><getfor>andy</getfor><notes></notes></tr>
    <tr><item>milk</item><getfor>andy</getfor><notes></notes></tr>
    <tr><item>laundry detergent</item><getfor>andy</getfor><notes></notes></tr>
    <tr><item>cough syrup</item><getfor>granny</getfor><notes>try to get grape flavor</notes></tr>
</table>

SimpleTable (XML_Attr):

<table>
    <tr item="eggs"         getfor="andy"   notes=""    />
    <tr item="milk"         getfor="andy"   notes=""    />
    <tr item="laundry detergent"  getfor="andy"   notes=""  />
    <tr item="cough syrup"  getfor="granny" notes="try to get grape flavor"    />
</table>

SimpleTable (XML_Mixed):

<table>
    <tr>
        <item getfor="andy" >eggs</item><notes></notes>
    </tr>
    <tr>
        <item getfor="andy" >milk</item><notes></notes>
    </tr>
    <tr>
        <item getfor="andy" >laundry detergent</item><notes></notes>
    </tr>
    <tr>
        <item getfor="granny">cough syrup</item><notes>try to get grape flavor</notes>
    </tr>
</table>
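
The three layouts carry the same information. As a minimal sketch of that point, the Python code below reads a shortened two-row version of each layout into a sequence of dictionaries (a SimpleSequence of SimpleDictionary items) and compares the results. The function names are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

COLUMNS = ("item", "getfor", "notes")

elem_form = ET.fromstring(
    "<table>"
    "<tr><item>eggs</item><getfor>andy</getfor><notes></notes></tr>"
    "<tr><item>cough syrup</item><getfor>granny</getfor>"
    "<notes>try to get grape flavor</notes></tr>"
    "</table>")
attr_form = ET.fromstring(
    '<table>'
    '<tr item="eggs" getfor="andy" notes="" />'
    '<tr item="cough syrup" getfor="granny" notes="try to get grape flavor" />'
    '</table>')
mixed_form = ET.fromstring(
    '<table>'
    '<tr><item getfor="andy">eggs</item><notes></notes></tr>'
    '<tr><item getfor="granny">cough syrup</item>'
    '<notes>try to get grape flavor</notes></tr>'
    '</table>')

def rows_from_elements(table):
    # "or ''" is a defensive fallback so an empty element always yields ''.
    return [{c: tr.findtext(c) or "" for c in COLUMNS} for tr in table.findall("tr")]

def rows_from_attributes(table):
    return [{c: tr.get(c, "") for c in COLUMNS} for tr in table.findall("tr")]

def rows_from_mixed(table):
    rows = []
    for tr in table.findall("tr"):
        item = tr.find("item")
        rows.append({"item": item.text,
                     "getfor": item.get("getfor"),
                     "notes": tr.findtext("notes") or ""})
    return rows

rows = rows_from_elements(elem_form)
print(rows == rows_from_attributes(attr_form) == rows_from_mixed(mixed_form))
```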

Basic data structures in programming

To further illustrate how basic data structures apply in many different contexts, some of the basic data structures enumerated previously are examined and compared here in the context of computer programming.

For the first part of the comparison, we examine the generic terminology against that used commonly in programming languages:

  • SimpleBoolean: is commonly called a boolean and can usually take the values true or false, 0 or 1, or other values, depending on the language.
  • SimpleString: commonly called a string or stringBuffer.
  • SimpleSequence: numerically indexed variables in programming are commonly represented with an array.
  • Name-value pair: (explained in more detail below)
  • SimpleDictionary: these are commonly represented with a dictionary, or an associative array.
  • SimpleTable: (explained in more detail below)
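
As one concrete instance of this comparison, the sketch below writes each of the core structures in Python, using values from the shopping list example. Python is only one possible choice, and the variable names are assumptions made for this illustration.

```python
# SimpleBoolean
need_detergent = True

# SimpleString
title = "Andy's shopping list"

# SimpleSequence: items reached by numeric index
items = ["eggs", "cough syrup", "orange juice", "bread", "laundry detergent"]

# Name-value pair: a single name attached to a single value
pair = ("getfor", "granny")

# SimpleDictionary: values reached by name
entry = {"item": "cough syrup", "getfor": "granny"}

# SimpleTable: a sequence in which every item is a dictionary
table = [
    {"item": "eggs", "getfor": "andy"},
    {"item": "cough syrup", "getfor": "granny"},
]

print(items[1])
print(entry["getfor"])
print(table[1]["item"])
```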

Technical considerations

Now that we've introduced and discussed specific examples of the basic data structures, a few technical considerations apply to all of them. These are particularly important to those responsible for designing and implementing XML schemas for specific scenarios.

  • Exact terminology depends on context: Although the "basic" structures described here apply to many different scenarios, the terms used to describe them can overlap or conflict. For example, the term "SimpleSequence" as used here closely coincides with what is called an "array" in many programming languages. Similarly, the term "SimpleDictionary" is shorthand for what some programming languages call an "associative array". Although this close correlation is intentional, one must always remember that the specific nuances of an application or programming language will require additional attention. Sometimes minor conflicts or discrepancies arise when one digs into the details for any specific data structure in any given project or technology.
  • Basic structures are flexible concepts: Structures can be defined in terms of one another, and some structures can be applied recursively. For example, one could easily define a SimpleSequence using a SimpleString along with some basic assumptions. (e.g., a SimpleSequence is a string of alphanumeric characters where each item in the sequence is separated by one or more whitespace characters: "eggs bread butter milk").
  • Abstract structures tend to hide tricky details: For example, the term "SimpleString" describes the abstract notion of a sequence of characters (e.g., "ISBN 0-596-00327-7"). The abstract notion is fairly intuitive and uncomplicated. Nevertheless, the precise notation used to implement that abstract notion and represent it in real-life working code is a different matter entirely. Different programming languages and different environments may use different conventions for representing the same "string". Because of this variability, one can also recognize that the abstract notion of a "SimpleString" in XML is also subject to differing representations, based on the needs of any given project.


Notes and references

  1. An important note: the basic terms used here are generalizations. Although they may coincide with terms used in specific software, specific programming languages, or specific applications, these are not intended as technically precise definitions. The concepts described here are presented to help emphasize the context-neutral principle of interoperability in XML.



The one-to-many relationship







Learning objectives

  • learn different techniques for implementing one-to-many relationships in XML
  • create custom data types in an XML schema
  • create empty elements with attributes in an XML document
  • define a presentation layout for an XML document using a table with varying background colors and font characteristics, and display images in an XML stylesheet



Introduction

In a one-to-many relationship, one object can reference several instances of another. A model is mapped into a schema whereby each data model entity becomes a complex element type. Each data model attribute becomes a simple element type, and the one-to-many relationship is recorded as a sequence.

Exhibit 1: Data model for 1:m relationship


In the previous chapter, we introduced a simple XML schema, XML document, and an XML stylesheet for a single entity data model. We now include more features of each of the key aspects of XML.

Implementing a one-to-many relationship

There are three different techniques for implementing a one-to-many relationship:

Containment relationship: A structure is defined where one element is contained within another. The "contained" element ceases to exist when the "container" element is removed. For instance, where a city has many hotels, the hotels are "contained" in the city.

  <cityDetails>
    <cityName>Belmopan</cityName>
    <hotelDetails>
      <hotelName>Bull Frog Inn</hotelName>
    </hotelDetails>
    <hotelDetails>
      <hotelName>Pook's Hill Lodge</hotelName>
    </hotelDetails>
  </cityDetails>
  <cityDetails>
    <cityName>Kuala Lumpur</cityName>
    <hotelDetails>
      <hotelName>Pan Pacific Kuala Lumpur</hotelName>
    </hotelDetails>
    <hotelDetails>
      <hotelName>Mandarin Oriental Kuala Lumpur</hotelName>
    </hotelDetails>
  </cityDetails>
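
The following sketch shows how a program might read a containment structure with Python's standard xml.etree.ElementTree module. The tourGuide root element is an assumption added so that the fragment forms a complete document, and only one city is included to keep the example short.

```python
import xml.etree.ElementTree as ET

# The <tourGuide> root is an assumption; the fragment above shows only its children.
SOURCE = """<tourGuide>
  <cityDetails>
    <cityName>Kuala Lumpur</cityName>
    <hotelDetails><hotelName>Pan Pacific Kuala Lumpur</hotelName></hotelDetails>
    <hotelDetails><hotelName>Mandarin Oriental Kuala Lumpur</hotelName></hotelDetails>
  </cityDetails>
</tourGuide>"""

root = ET.fromstring(SOURCE)

# The hotels of a city are simply the hotelDetails children of that city.
hotels_by_city = {
    city.findtext("cityName"): [h.findtext("hotelName")
                                for h in city.findall("hotelDetails")]
    for city in root.findall("cityDetails")
}
print(hotels_by_city)

# Removing the container removes the contained hotels with it.
root.remove(root.find("cityDetails"))
remaining_hotels = list(root.iter("hotelName"))
print(len(remaining_hotels))
```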

Intra-document relationships: In a case where you have one city with many hotels, rather than a city containing hotels, a hotel will have a "location in" relationship to a city. A city id is used as a reference on the hotel element. Therefore, rather than the hotels being contained in the city, they now just reference the city's id via the cityRef attribute. This is very similar to a foreign key in a relational database.

  <cityDetails>
   <city ID="c1">
    <cityName>Belmopan</cityName>
   </city>
   <city ID="c2">
    <cityName>Kuala Lumpur</cityName>
   </city>
  </cityDetails>
  <hotelDetails>
    <hotel cityRef="c1">
      <hotelName>Bull Frog Inn</hotelName>
    </hotel>
    <hotel cityRef="c2">
      <hotelName>Pan Pacific Kuala Lumpur</hotelName>
    </hotel>
  </hotelDetails>
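
Resolving such references works much like a join on a foreign key. The sketch below is one way to do it with Python's standard xml.etree.ElementTree module; the tourGuide root element is an assumption added so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# The <tourGuide> root is an assumption added to make the fragment a complete document.
SOURCE = """<tourGuide>
  <cityDetails>
    <city ID="c1"><cityName>Belmopan</cityName></city>
    <city ID="c2"><cityName>Kuala Lumpur</cityName></city>
  </cityDetails>
  <hotelDetails>
    <hotel cityRef="c1"><hotelName>Bull Frog Inn</hotelName></hotel>
    <hotel cityRef="c2"><hotelName>Pan Pacific Kuala Lumpur</hotelName></hotel>
  </hotelDetails>
</tourGuide>"""

root = ET.fromstring(SOURCE)

# Index the cities by their ID, much as a database indexes a primary key.
city_by_id = {city.get("ID"): city.findtext("cityName") for city in root.iter("city")}

# Follow each hotel's cityRef to its city, like a join on a foreign key.
hotel_cities = {hotel.findtext("hotelName"): city_by_id[hotel.get("cityRef")]
                for hotel in root.iter("hotel")}
print(hotel_cities)
```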

Inter-document relationships: The inter-document relationship is much like the intra-document relationship. The parent element is again identified by an id attribute, but the child refers to it through a link (here an href attribute of the form file#id) rather than through an idRef. The difference is that the inter-document relationship is used when the data, such as the city and hotel data, are stored in separate documents, much as tables might live in different filesystems or tablespaces.

  <!-- in cityDetails.xml -->
  <city id="c1">
    <cityName>Belmopan</cityName>
  </city>
  <city id="c2">
    <cityName>Kuala Lumpur</cityName>
  </city>
  <!-- in a separate hotel document -->
  <hotel>
    <city href="cityDetails.xml#c1"/>
    <hotelName>Bull Frog Inn</hotelName>
  </hotel>
  <hotel>
    <city href="cityDetails.xml#c2"/>
    <hotelName>Pan Pacific Kuala Lumpur</hotelName>
  </hotel>
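
A program that follows such links has to split each href into a document name and an id, and then look the id up in the other document. The sketch below assumes both documents are held in memory in a DOCUMENTS dictionary; the file name hotelDetails.xml, the root element names cities and hotels, and the resolve helper are all hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Both documents are held in memory here; in practice they would be separate files.
# The root elements <cities> and <hotels> and the name hotelDetails.xml are assumptions.
DOCUMENTS = {
    "cityDetails.xml": """<cities>
  <city id="c1"><cityName>Belmopan</cityName></city>
  <city id="c2"><cityName>Kuala Lumpur</cityName></city>
</cities>""",
    "hotelDetails.xml": """<hotels>
  <hotel><city href="cityDetails.xml#c1"/><hotelName>Bull Frog Inn</hotelName></hotel>
  <hotel><city href="cityDetails.xml#c2"/><hotelName>Pan Pacific Kuala Lumpur</hotelName></hotel>
</hotels>""",
}

def resolve(href):
    """Return the city name that an href of the form 'file#id' points to."""
    filename, _, city_id = href.partition("#")
    target = ET.fromstring(DOCUMENTS[filename])
    for city in target.iter("city"):
        if city.get("id") == city_id:
            return city.findtext("cityName")
    raise KeyError(href)

hotels = ET.fromstring(DOCUMENTS["hotelDetails.xml"])
locations = {h.findtext("hotelName"): resolve(h.find("city").get("href"))
             for h in hotels.iter("hotel")}
print(locations)
```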


Exhibit 2: Checklist for deciding what technique to use:

Technique        Passing Data   Flexibility   Ease of Use
Containment      Excellent      Fair          Excellent
Intra-Document   Good           Good          Good
Inter-Document   Fair           Excellent     Fair

XML schema

Some of the built-in data types for an XML schema were introduced in the previous chapter, but still, there are more that are very useful, such as anyURI, date, time, year, and month. In addition to the built-in data types, a custom data type can be defined by the schema designer to accept specific data input. As we have learned, data are defined in XML documents using markup tags defined in an XML schema. However, some elements might not have values. An empty element tag can be used to address this situation. An empty element tag (and any custom markup tag) can contain attributes that add additional information about the tag without adding extra text to the element. An example will be shown in the chapter, using attributes in an empty element tag.

Empty elements with attributes in XML document

Elements can have different content types depending on how each element is defined in the XML schema. The different types are element content, mixed content, simple content, and empty content. An XML element consists of everything from the start of its opening tag to the end of its closing tag.

  • An element with element content contains only other elements between its opening and closing tags. The root element is a typical example.
Example: <tourGuide>
      :
  </tourGuide>
  • A mixed content element is one that has text as well as other elements between its opening and closing tags.
Example: <restaurant>My favorite restaurant is
  <restaurantName>Provino's Italian Restaurant</restaurantName>
      :
  </restaurant>
  • A simple content element is one that contains only text between its opening and closing tags.
Example: <restaurantName>Provino's Italian Restaurant</restaurantName>
  • An empty content element, or empty element, is one that does not contain anything between its opening and closing tags (or that is opened and closed with a single tag, by placing / before the closing > of the opening tag).
Example: <hotelPicture filename="pan_pacific.jpg" size="80"
          value="Image of Pan Pacific"/>

An empty element is useful when there is no need to specify its content, or when the information describing the element is fixed. Two examples illustrate this concept. First, a picture element references the source of an image through its attributes and has no need for text content. Second, if the owner’s name is fixed for a company, the related information can be specified inside the owner tag using attributes. An attribute is meta-information: information that describes the content of the element.
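As a minimal sketch of how such an empty element could be declared in a schema, the fragment below gives hotelPicture a complex type that has attributes but no child elements or text. The attribute names follow the hotelPicture example above; the attribute types and the choice of which attribute is required are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- A complex type with no content model: the element must be empty,
       but it can carry the attributes declared here -->
  <xsd:element name="hotelPicture">
    <xsd:complexType>
      <!-- Assumption: the file name is the only required attribute -->
      <xsd:attribute name="filename" type="xsd:string" use="required"/>
      <!-- Assumed types: size as a positive integer, value as free text -->
      <xsd:attribute name="size" type="xsd:positiveInteger"/>
      <xsd:attribute name="value" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Under this declaration, a hotelPicture element with text between its tags would be rejected, while the self-closing form with filename, size, and value attributes would be accepted.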

European Central Bank's use of XML

<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" 
xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time="2004-05-28">
            <Cube currency="USD" rate="1.2246"/>
            <Cube currency="JPY" rate="135.77"/>
            <Cube currency="DKK" rate="7.4380"/>
            <Cube currency="GBP" rate="0.66730"/>
            <Cube currency="SEK" rate="9.1150"/>
            <Cube currency="CHF" rate="1.5304"/>
            <Cube currency="ISK" rate="87.72"/>
            <Cube currency="NOK" rate="8.2120"/>
        </Cube>
    </Cube>

<!--For the sake of illustration, some of the currencies are omitted 
in the preceding code. Banks, consultants, currency traders, 
and firms involved in international trade are the major users 
of this information.-->

</gesmes:Envelope>

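XML schema data types

Some of the commonly used data types, such as string, decimal, integer, and boolean, were introduced in chapter 2. The following are a few more data types that are useful.

Exhibit 3: Other data types:

<table width="100%" border="1" cellspacing="0" cellpadding="0">
  <tr>
    <th>Type</th>
    <th>Format</th>
    <th>Example</th>
    <th>Comment</th>
  </tr>
  <tr>
    <td>gYear</td>
    <td>YYYY</td>
    <td>1999</td>
    <td></td>
  </tr>
  <tr>
    <td>gYearMonth</td>
    <td>YYYY-MM</td>
    <td>1999-03</td>
    <td>Used when the day is irrelevant for the data element</td>
  </tr>
  <tr>
    <td>time</td>
    <td>hh:mm:ss.sss with optional time zone indicator</td>
    <td>20:14:05</td>
    <td>The time zone indicator is Z for UTC, or one of -hh:mm or +hh:mm to indicate the difference from UTC. This type is used when the content represents a particular time of day that recurs every day, such as 4:15 pm.</td>
  </tr>
  <tr>
    <td>date</td>
    <td>YYYY-MM-DD</td>
    <td>1999-03-14</td>
    <td></td>
  </tr>
  <tr>
    <td>anyURI</td>
    <td>A URI reference, for example a web address beginning with http://</td>
    <td>http://www.panpacific.com</td>
    <td></td>
  </tr>
</table>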

More data types

Besides the built-in data types, custom data types can be created as required. A custom data type can be a simple type or a complex type. For simplicity, we create a custom data type that is a simple type, which means that the element does not contain other elements or attributes; it contains text only. A custom simple type is created by starting from a built-in simple type and applying restrictions, or facets, to limit the acceptable values of the element. A custom simple type can be nameless or named. If the custom simple type is to be used only once, it makes sense not to name it; that custom type can then be used only where it is defined. A named custom type can be referenced by its name, so it can be used wherever necessary.

A pattern can be used to specify exactly how the content of the element should look. For example, one might want to specify the format of a telephone number, a postal code, or a product code. By having a defined pattern for certain elements, the data exchanged will be uniform and the values will be consistent when stored in a database. A useful way to set patterns is through Regex, which will be discussed in later chapters.
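A short sketch of a pattern facet: the named simple type below, postalCodeType (a hypothetical name that is not part of the tour guide schema), accepts only values made of exactly five digits. The five-digit format is an assumption chosen to match the Malaysian postal codes used later in this chapter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical named custom simple type: a string restricted by a pattern -->
  <xsd:simpleType name="postalCodeType">
    <xsd:restriction base="xsd:string">
      <!-- \d{5} means exactly five digits; a schema pattern must match the whole value -->
      <xsd:pattern value="\d{5}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Because the type is named, any element can reuse it -->
  <xsd:element name="postalCode" type="postalCodeType"/>

</xsd:schema>
```

With this type, a value such as 50746 would be accepted, while 5074 or 5074A would be rejected.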

Schema examples

The following schema extends the one introduced in the previous chapter to include a one-to-many relationship of city to hotels, with two examples of custom data types.

Exhibit 1: Data model for a 1:m relationship

1:m relationship - City Hotel

Important: this is a continuing example, so new code is added to the last chapter's example!

Containment example

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

  <!--Tour Guide-->

  <xsd:element name="tourGuide">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!--This will contain the City details-->

  <xsd:complexType name="cityDetails">
    <xsd:sequence>
      <xsd:element name="cityName" type="xsd:string"/>
      <xsd:element name="adminUnit" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>

      <!--The element Continent uses a Nameless Custom Simple Type-->

      <xsd:element name="continent">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Asia"/>
            <xsd:enumeration value="Africa"/>
            <xsd:enumeration value="Australia"/>
            <xsd:enumeration value="Europe"/>
            <xsd:enumeration value="North America"/>
            <xsd:enumeration value="South America"/>
            <xsd:enumeration value="Antarctica"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="population" type="xsd:integer"/>
      <xsd:element name="area" type="xsd:integer"/>
      <xsd:element name="elevation" type="xsd:integer"/>
      <xsd:element name="longitude" type="xsd:decimal"/>
      <xsd:element name="latitude" type="xsd:decimal"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="history" type="xsd:string"/>
      <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- This will contain the Hotel details-->

  <xsd:complexType name="hotelDetails">
    <xsd:sequence>
      <xsd:element name="hotelName" type="xsd:string"/>
      <xsd:element name="hotelPicture"/>
      <xsd:element name="streetAddress" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>

      <!-- The custom simple type, emailAddressType, defined below as an xsd:simpleType, 
           is used as the type of the emailAddress element. -->

      <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
      <xsd:element name="hotelRating" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- NOTE: Since postalCode, emailAddress, and websiteURL are not standard elements that
          must be provided, the minOccurs="0" indicates that they are optional -->

  <!--This is a Named Custom SimpleType that is called from Hotel whenever someone types in an 
      email address-->

  <xsd:simpleType name="emailAddressType">
    <xsd:restriction base="xsd:string">

      <!--You can learn more about this pattern by reading the Regex section.-->

      <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Intra-document example

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

  <!--Tour Guide: cities and hotels are siblings, linked by cityID and cityRef-->

  <xsd:element name="tourGuide">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!--This will contain the City details-->

  <xsd:complexType name="cityDetails">
    <xsd:sequence>
      <xsd:element name="cityID" type="xsd:ID"/>
      <xsd:element name="cityName" type="xsd:string"/>
      <xsd:element name="adminUnit" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>

      <!--The element Continent uses a Nameless Custom Simple Type-->

      <xsd:element name="continent">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Asia"/>
            <xsd:enumeration value="Africa"/>
            <xsd:enumeration value="Australia"/>
            <xsd:enumeration value="Europe"/>
            <xsd:enumeration value="North America"/>
            <xsd:enumeration value="South America"/>
            <xsd:enumeration value="Antarctica"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="population" type="xsd:integer"/>
      <xsd:element name="area" type="xsd:integer"/>
      <xsd:element name="elevation" type="xsd:integer"/>
      <xsd:element name="longitude" type="xsd:decimal"/>
      <xsd:element name="latitude" type="xsd:decimal"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="history" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- This will contain the Hotel details-->

  <xsd:complexType name="hotelDetails">
    <xsd:sequence>
      <xsd:element name="cityRef" type="xsd:IDREF"/>
      <xsd:element name="hotelName" type="xsd:string"/>
      <xsd:element name="hotelPicture"/>
      <xsd:element name="streetAddress" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>

      <!-- The custom simple type, emailAddressType, defined below as an xsd:simpleType, 
           is used as the type of the emailAddress element. -->

      <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
      <xsd:element name="hotelRating" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- NOTE: Since postalCode, emailAddress, and websiteURL are not standard elements that
          must be provided, the minOccurs="0" indicates that they are optional -->

  <!--This is a Named Custom SimpleType that is called from Hotel whenever someone types in an 
      email address-->

  <xsd:simpleType name="emailAddressType">
    <xsd:restriction base="xsd:string">

      <!--You can learn more about this pattern by reading the Regex section.-->

      <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Inter-document example

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

  <!--Tour Guide-->

  <xsd:element name="tourGuide">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!--This will contain the City details-->

  <xsd:complexType name="cityDetails">
    <xsd:sequence>
      <xsd:element name="cityID" type="xsd:ID"/>
      <xsd:element name="cityName" type="xsd:string"/>
      <xsd:element name="adminUnit" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>

      <!--The element Continent uses a Nameless Custom Simple Type-->

      <xsd:element name="continent">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Asia"/>
            <xsd:enumeration value="Africa"/>
            <xsd:enumeration value="Australia"/>
            <xsd:enumeration value="Europe"/>
            <xsd:enumeration value="North America"/>
            <xsd:enumeration value="South America"/>
            <xsd:enumeration value="Antarctica"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="population" type="xsd:integer"/>
      <xsd:element name="area" type="xsd:integer"/>
      <xsd:element name="elevation" type="xsd:integer"/>
      <xsd:element name="longitude" type="xsd:decimal"/>
      <xsd:element name="latitude" type="xsd:decimal"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="history" type="xsd:string"/>
     </xsd:sequence>
  </xsd:complexType>
  <!-- The Hotel details are defined in a second schema, shown next -->
</xsd:schema>

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

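  <!--Tour Guide 2-->

  <xsd:element name="tourGuide2">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- This will contain the Hotel details-->

  <xsd:complexType name="hotelDetails">
    <xsd:sequence>

      <!-- NOTE: xsd:IDREF values are only resolved within a single document. To point at a
           city in another file, as in href="cityDetails.xml#c1", a URI-valued type such as
           xsd:anyURI would be needed instead. -->

      <xsd:element name="cityRef" type="xsd:IDREF"/>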
      <xsd:element name="hotelName" type="xsd:string"/>
      <xsd:element name="hotelPicture"/>
      <xsd:element name="streetAddress" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>

      <!-- The custom simple type, emailAddressType, defined below as an xsd:simpleType, 
           is used as the type of the emailAddress element. -->

      <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
      <xsd:element name="hotelRating" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- NOTE: Since postalCode, emailAddress, and websiteURL are not standard elements that
          must be provided, the minOccurs="0" indicates that they are optional -->

  <!--This is a Named Custom SimpleType that is called from Hotel whenever someone types in an 
      email address-->

  <xsd:simpleType name="emailAddressType">
    <xsd:restriction base="xsd:string">

      <!--You can learn more about this pattern by reading the Regex section.-->

      <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Refer to Chapter 2 - A single entity for the steps in using NetBeans to create the above XML schema.

XML document

Attributes

  • The naming rules for elements apply to attribute names as well.
  • In a given element, all attribute names must be unique.
  • An attribute value may not contain the symbol '<'. The character string '&lt;' can be used to represent it.
  • Each attribute must have a name and a value. (For example, in <hotelPicture filename="pan_pacific.jpg" />, filename is the name and pan_pacific.jpg is the value.)
  • If the assigned value itself contains a quoted string, the type of quotation marks must differ from those used to enclose the entire value. (For instance, if double quotes are used to enclose the whole value, then use single quotes for the string: <name familiar="'Jack'">John Smith</name>)
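
The rules above can be seen together in one small fragment. The element and attribute names (note, author, condition) are made up for illustration and do not belong to the tour guide example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Two attributes with distinct names; each has a name and a quoted value.
     The less-than sign inside the value is written as &lt; -->
<note author="John Smith" condition="size &lt; 80">
  <!-- Double quotes enclose the value, so the inner string uses single quotes -->
  <name familiar="'Jack'">John Smith</name>
</note>
```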

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="city_hotel.xsl"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="TourGuide3.xsd">

    <!--This is where you define the first city and all its attributes-->

    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>

        <!--The content of the element "continent" must be one of the values specified in the set of 
            acceptable values in the XML schema for the element "continent"-->

        <continent>North America</continent>
        <population>11100</population>
        <area>5</area>
        <elevation>130</elevation>
        <longitude>-88.76</longitude>
        <latitude>17.25</latitude>
        <description>Belmopan is the capital of Belize</description>
        <history>Belmopan was established following devastation of the former capital, Belize City,
            by Hurricane Hattie in 1961. High ground and open space influenced the choice and
            ground-breaking began in 1966. By 1970 most government offices and operations had
            already moved to the new location. </history>

        <!--This is where you would store the name of the Hotel and its attributes-->

        <!--Notice that the hotel elements below do not contain the postalCode element. The document is 
            still valid, because postalCode is optional-->

        <hotel>
            <hotelName>Bull Frog Inn</hotelName>

            <!--The empty element, hotelPicture, contains attributes: "filename", "size", and "value", to 
                indicate the name and location of the image file, the desired size, and 
                the description of the empty element, hotelPicture-->

            <hotelPicture filename="bull_frog_inn.jpg" size="80" value="Image of Bull Frog Inn"
                imageURL="http://www.bullfroginn.com"/>
            <streetAddress>25 Half Moon Avenue</streetAddress>
            <phone>501-822-3425</phone>

            <!--The emailAddress elements must match the pattern specified in the schema to be valid -->

            <emailAddress>bullfrog@btl.net</emailAddress>
            <websiteURL>http://www.bullfroginn.com/</websiteURL>
            <hotelRating>4</hotelRating>
        </hotel>

        <!--This is where you put the information for another Hotel-->

        <hotel>
            <hotelName>Pook's Hill Lodge</hotelName>
            <hotelPicture filename="pook_hill_lodge.jpg" size="80" value="Image of Pook's Hill Lodge"
                imageURL="http://www.global-travel.co.uk/pook1.htm"/>
            <streetAddress>Roaring River</streetAddress>
            <phone>440-126-854-1732</phone>
            <emailAddress>info@global-travel.co.uk</emailAddress>
            <websiteURL>http://www.global-travel.co.uk/pook1.htm</websiteURL>
            <hotelRating>3</hotelRating>
        </hotel>
    </city>

    <!--This is where you define another city and its attributes-->

    <city>
        <cityName>Kuala Lumpur</cityName>
        <adminUnit>Selangor</adminUnit>
        <country>Malaysia</country>
        <continent>Asia</continent>
        <population>1448600</population>
        <area>243</area>
        <elevation>111</elevation>
        <longitude>101.71</longitude>
        <latitude>3.16</latitude>
        <description>Kuala Lumpur is the capital of Malaysia and is the largest city in the nation.
        </description>
        <history>The city was founded in 1857 by Chinese tin miners and superseded Klang. In 1880
            the British government transferred their headquarters from Klang to Kuala Lumpur, and
            in 1896 it became the capital of the Federated Malay States. </history>

        <!--This is where you put the information for a Hotel-->

        <hotel>
            <hotelName>Pan Pacific Kuala Lumpur </hotelName>
            <hotelPicture filename="pan_pacific.jpg" size="80" value="Image of Pan Pacific"
             imageURL="http://www.malaysia-hotels-discount.com/hotels/kualalumpur/pan_pacific_hotel/index.shtml"/>
            <streetAddress>Jalan Putra</streetAddress>
            <postalCode>50746</postalCode>
            <phone>1-866-260-0402</phone>
            <emailAddress>president@panpacific.com</emailAddress>
            <websiteURL>http://www.panpacific.com</websiteURL>
            <hotelRating>5</hotelRating>
        </hotel>

        <!--This is where you put the information for another Hotel-->

        <hotel>
            <hotelName>Mandarin Oriental Kuala Lumpur </hotelName>
            <hotelPicture filename="mandarin_oriental.jpg" size="80" value="Image of Mandarin Oriental"
                imageURL="http://www.mandarinoriental.com/kualalumpur"/>
            <streetAddress>Kuala Lumpur City Centre</streetAddress>
            <postalCode>50088</postalCode>
            <phone>011-603-2380-8888</phone>
            <emailAddress>mokul-sales@mohg.com</emailAddress>
            <websiteURL>http://www.mandarinoriental.com/kualalumpur/</websiteURL>
            <hotelRating>5</hotelRating>
        </hotel>
    </city>
</tourGuide>

Table 3-2: XML Document for a one-to-many relationship – city_hotel.xml

Refer to Chapter 2 - A single entity for the steps in using NetBeans to create the above XML document.

XML style sheet

<?xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>    
        <xsl:template match="/">
            <html>
                <head>
                    <title>Tour Guide</title>
                </head>
                <body>
                    <h2>Cities</h2>
                    <xsl:apply-templates select="tourGuide"/>
                </body>
            </html>
        </xsl:template>
        <xsl:template match="tourGuide">
            <xsl:for-each select="city">
                <xsl:text>City: </xsl:text>
                <xsl:value-of select="cityName"/>
                <br/>
                <xsl:text>Population: </xsl:text>
                <xsl:value-of select="population"/>
                <br/>
                <xsl:text>Country: </xsl:text>
                <xsl:value-of select="country"/>
                <br/>
                
                <xsl:for-each select="hotel">
                    <xsl:text>Hotel: </xsl:text>
                    <xsl:value-of select="hotelName"/>
                    <br/>
                </xsl:for-each>
               
               <br/>
            </xsl:for-each>     
        </xsl:template>    
  </xsl:stylesheet>

Summary

Besides the built-in data types (e.g., gYear, gYearMonth, time, anyURI, and date), schema designers may create custom data types to suit their needs. A simple custom data type can be created from one of the built-in data types by applying restrictions, known as facets, such as an enumeration that specifies a set of acceptable values or a pattern that the value must match.

An empty element does not contain any text; however, it may contain attributes to provide additional information about that element.

The presentation layout for displaying an HTML page can include code for style tags, background color, font size, font weight, and alignment. Table tags can be used to organize the layout of content in an HTML page, and images can also be displayed using an image tag.


Exercises

In order to learn more about the one-to-many relationship, exercises are provided.

Answers

In order to learn more about the one-to-many relationship, answers are provided to go with the exercises above.



The one-to-one relationship






Learning objectives

  • Create a schema for a data model containing a 1:1 relationship
  • Place restrictions on elements or attributes in an XML schema
  • Specify fixed or default values for an element in an XML schema


Introduction

In the previous chapter, some new features of XML schemas, documents, and stylesheets were introduced as well as how to model a one-to-many relationship. In this chapter, we will introduce the modeling of a one-to-one relationship in XML. We will also introduce more features of an XML schema.


A one-to-one (1:1) relationship

The following diagram shows a one-to-one and a one-to-many relationship. The one-to-one relationship records a single top destination for each country.


Xmldm1to1.png

Exhibit 4-1: Data model for a 1:1 relationship

XML schema

A one-to-one (1:1) relationship is represented in the data model in Exhibit 4-1. The addition of country and destination to the data model allows the 1:1 relationship named topDestination. A country has many different destinations, but only one top destination. The XML schema in Exhibit 4-2 shows how to represent a 1:1 relationship in an XML schema.

XML schema example

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified"> 
<!--
Tour Guide
--> 
 <xsd:element name="tourGuide"> 
  <xsd:complexType> 
   <xsd:sequence> 
    <xsd:element name="country" type="countryDetails" minOccurs="1" maxOccurs="unbounded" /> 
   </xsd:sequence> 
  </xsd:complexType> 
 </xsd:element> 
<!--
Country
--> 
 <xsd:complexType name="countryDetails"> 
  <xsd:sequence> 
   <xsd:element name="countryName" type="xsd:string" minOccurs="1" maxOccurs="1"/> 
   <xsd:element name="population" type="xsd:integer" minOccurs="0" maxOccurs="1" default="0"/> 
   <xsd:element name="continent" minOccurs="0" maxOccurs="1"> 
    <xsd:simpleType> 
     <xsd:restriction base="xsd:string"> 
      <xsd:enumeration value="Asia"/> 
      <xsd:enumeration value="Africa"/> 
      <xsd:enumeration value="Australasia"/> 
      <xsd:enumeration value="Europe"/> 
      <xsd:enumeration value="North America"/> 
      <xsd:enumeration value="South America"/> 
      <xsd:enumeration value="Antarctica"/> 
     </xsd:restriction> 
    </xsd:simpleType> 
   </xsd:element> 
   <xsd:element name="topDestination" type="destinationDetails" minOccurs="0" maxOccurs="1"/> 
   <xsd:element name="destination" type="destinationDetails" minOccurs="0" maxOccurs="unbounded"/> 
  </xsd:sequence> 
 </xsd:complexType> 
<!--
Destination
--> 
 <xsd:complexType name="destinationDetails"> 
  <xsd:all> 
   <xsd:element name="destinationName" type="xsd:string"/> 
   <xsd:element name="description" type="xsd:string"/> 
   <xsd:element name="streetAddress" type="xsd:string" minOccurs="0"/> 
   <xsd:element name="telephoneNumber" type="xsd:string" minOccurs="0"/> 
   <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/> 
  </xsd:all> 
 </xsd:complexType> 
</xsd:schema>


Exhibit 4-2: XML Schema for a one-to-one relationship

New elements in schema


Let’s examine the new elements and attributes in the schema in Exhibit 4-2.

  • country is an element of the complex type countryDetails, declared in tourGuide to represent the 1:M relationship between a tour guide and its countries.
  • destination is an element of the complex type destinationDetails, declared in countryDetails to represent the 1:M relationship between a country and its many destinations.
  • topDestination is also an element of the complex type destinationDetails, declared in countryDetails with maxOccurs="1" to represent the 1:1 relationship between a country and its top destination.

Restrictions in schema


Placing restrictions on elements was introduced in the previous chapter; however, there are more potentially useful restrictions that can be placed on an element. Restrictions can be placed on elements and attributes that affect how the processor handles whitespace characters:

<xsd:element name="streetAddress">
 <xsd:simpleType>
  <xsd:restriction base="xsd:string">
   <xsd:whiteSpace value="preserve"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:element>

White space & length constraints

In the example above, the whiteSpace constraint is set to "preserve", which means that the XML processor will not remove any white space characters. Other useful restrictions include the following:

  • Replace – the XML processor will replace each whitespace character (tab, line feed, and carriage return) with a space.
<xsd:whiteSpace value="replace"/>
  • Collapse – the processor will first replace whitespace characters as above, then remove leading and trailing spaces and reduce each run of spaces to a single space.
<xsd:whiteSpace value="collapse"/>
  • Length, maxLength, minLength – the length of the element's value can be fixed or can be limited to a predefined range.
<xsd:length value="8"/>
<xsd:minLength value="5"/>
<xsd:maxLength value="8"/>
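
As an illustrative sketch, the facets above can be combined in one restriction. The element name bookingCode and the chosen limits are assumptions and are not part of the tour guide schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element: a code of 5 to 8 characters -->
  <xsd:element name="bookingCode">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <!-- Whitespace is normalized before the length facets are checked -->
        <xsd:whiteSpace value="collapse"/>
        <xsd:minLength value="5"/>
        <xsd:maxLength value="8"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

</xsd:schema>
```

Because the whitespace is collapsed first, a value written with leading or trailing spaces is measured without them.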

Order indicators

In addition to placing restrictions on elements, order indicators can be used to define in what order elements should occur.

All indicator

The <all> indicator specifies by default that the child elements can appear in any order and that each child element must occur once and only once:

<xsd:element name="person">
 <xsd:complexType>
  <xsd:all>
   <xsd:element name="firstname" type="xsd:string"/>
   <xsd:element name="lastname" type="xsd:string"/>
  </xsd:all>
 </xsd:complexType>
</xsd:element>

Choice indicator

The <choice> indicator specifies that either one child element or another can occur:

<xsd:element name="person">
 <xsd:complexType>
  <xsd:choice>
   <xsd:element name="employee" type="employee"/>
   <xsd:element name="visitor" type="visitor"/>
  </xsd:choice>
 </xsd:complexType>
</xsd:element>

Sequence indicator

The <sequence> indicator specifies that the child elements must appear in a specific order:

<xsd:element name="person">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element name="firstname" type="xsd:string"/>
   <xsd:element name="lastname" type="xsd:string"/>
  </xsd:sequence>
 </xsd:complexType>
</xsd:element>
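
To make the difference concrete, here is a sketch of instance fragments for the person declarations above. The wrapper element examples and the names used as content are invented for illustration, and each comment states which declaration the fragment is meant to be checked against.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<examples>
  <!-- Valid under the all indicator: both children are present, in any order -->
  <person>
    <lastname>Smith</lastname>
    <firstname>John</firstname>
  </person>

  <!-- Valid under the sequence indicator: firstname must come before lastname -->
  <person>
    <firstname>John</firstname>
    <lastname>Smith</lastname>
  </person>
</examples>
```

The first fragment would be rejected by the sequence declaration, because lastname appears before firstname. The second fragment is acceptable under both declarations.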

XML document


The XML document in Exhibit 4-3 shows how the new elements (country and destination) defined in the XML schema found in Exhibit 4-2 are used in an XML document. Note that the child elements of <topDestination> can appear in any order because of the <xsd:all> order indicator used in the schema.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="newXMLSchema.xsl" media="screen"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="XMLSchema.xsd">   
<!--
Malaysia
-->   
<country> 
   <countryName>Malaysia</countryName> 
   <population>22229040</population> 
   <continent>Asia</continent> 
   <topDestination> 
    <description>A popular duty-free island north of Penang.</description> 
    <destinationName>Pulau Langkawi</destinationName> 
   </topDestination> 
   <destination> 
    <destinationName>Muzium Di-Raja</destinationName> 
    <description>The original palace of the Sultan</description>
    <streetAddress>122 Muzium Road</streetAddress>
    <telephoneNumber>48494030</telephoneNumber>
    <websiteURL>www.muziumdiraja.com</websiteURL> 
   </destination> 
   <destination> 
    <destinationName>Kinabalu National Park</destinationName> 
    <description>A national park</description>
    <streetAddress>54 Ocean View Drive</streetAddress>
    <telephoneNumber>4847101</telephoneNumber>
    <websiteURL>www.kinabalu.com</websiteURL> 
   </destination> 
  </country>
<!--
Belize
--> 
  <country> 
   <countryName>Belize</countryName> 
   <population>249183</population> 
   <continent>North America</continent> 
   <topDestination> 
    <destinationName>San Pedro</destinationName> 
    <description>San Pedro is an island off the coast of Belize</description> 
   </topDestination> 
   <destination> 
    <destinationName>Belize City</destinationName> 
    <description>Belize City is the former capital of Belize</description>
    <websiteURL>www.belizecity.com</websiteURL> 
   </destination> 
   <destination> 
    <destinationName>Xunantunich</destinationName> 
    <description>Mayan ruins</description>
    <streetAddress>4 High Street</streetAddress>
    <telephoneNumber>011770801</telephoneNumber> 
   </destination> 
  </country> 
  </tourGuide>

Exhibit 4-3: XML Document for a one-to-one relationship

Summary

Schema designers may place restrictions on the length of elements and on how the processor handles white space. Schema designers may also specify fixed or default values for an element. Order indicators can be used to specify the order in which elements must appear in an XML document.

Exercises


Answers



The many-to-many relationship






Learning objectives
  • Learn different methods to represent a many-to-many relationship using XML
  • Create XML schemas using the "Eliminate" and "ID/IDREF" methods to structure content based on a many-to-many relationship
  • Create the corresponding XML documents for the "Eliminate" and "ID/IDREF" methods
  • Learn to use the key function in an XML stylesheet to format data structured with the "ID/IDREF" method
  • Create a basic XML stylesheet that incorporates the key function


Introduction

In the previous chapters, you learned how to use XML to structure and format data based on one-to-one and one-to-many relationships. Because XML provides the means to model data using hierarchical parent-child relationships, the one-to-one and one-to-many relationships are relatively simple to represent in XML. However, this hierarchical parent-child structure is difficult to use to model the many-to-many relationship, a common relationship between entities in many situations.

In this chapter, we will explore the pros and cons of a few methods that are used to model a many-to-many relationship in XML; these methods offer compromises in overcoming the problems that arise when applying this relationship to XML. In particular, we will see examples of how to model the many-to-many relationship using two different methods, "Eliminate" and "ID/IDREF." Additionally, in the XML stylesheet, we will learn how to implement the key function to display the data that was modeled using the "ID/IDREF" method.

Problems: many-to-many relationship

In XML, relationships are most commonly represented by nesting: a parent element contains its child elements. This maps easily onto a one-to-one or one-to-many relationship. A many-to-many relationship is not supported directly by XML, because each element can have only a single parent element. There are a couple of possible solutions to get around this.

Solutions: many-to-many relationship

Eliminate

Create XML documents that eliminate the need for a many-to-many relationship
By limiting the extent of information that is conveyed, you can get around the need for a many-to-many relationship. Instead of having one XML document encompass all of the information, separate the information so that each document describes only one of the entities that participates in the many-to-many relationship. In our tour guide example, one way to accomplish this is to create a separate XML document for each hotel. The relationship between hotel and amenity then becomes one-to-many. This method suits situations in which the scope of data exchange can be limited to subsets of data. For more broadly scoped data exchange, however, the same data may be repeated several times, especially if an entity has many attributes. To avoid this redundancy, use the ID/IDREF method.

ID/IDREF

Represent the many-to-many relationship using unique identifiers
Although not the most user-friendly way to handle this problem, one way of getting around the many-to-many relationship is to create keys that uniquely identify each entity. To do this, elements or attributes of type ID and IDREF must be declared in the XML schema. To use a data modeling analogy, ID is similar to the primary key, and IDREF is similar to the foreign key.

Many-to-many relationship data model

Exhibit 1: Data model for a m:m relationship

The relationship reads, a hotel can have many amenities, and an amenity can exist at many hotels.

As you will notice, in order to represent a many-to-many relationship, two entities were added. The middle entity is necessary for the data model to represent an associative entity that stores data about the relationship between hotel and amenity. Using our Tour Guide example, "Amenity" was added to represent a list of possible amenities that a hotel can possess.

The following examples illustrate methods to represent a many-to-many relationship in XML.

Eliminate: sample solution

In this example, the many-to-many relationship has been converted to a one-to-many relationship.

XML schema

Exhibit 2: XML schema for "Eliminate" method

<?xml version="1.0" encoding="UTF-8" ?>
<!--
     Document   : amenity1.xsd
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
    <xsd:element name="hotelGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:simpleType name="emailAddressType">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
        </xsd:restriction>
    </xsd:simpleType>
    <xsd:complexType name="hotelDetails">
        <xsd:sequence>
            <xsd:element name="hotelPicture"/>
            <xsd:element name="hotelName" type="xsd:string"/>
            <xsd:element name="streetAddress" type="xsd:string"/>
            <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
            <xsd:element name="telephoneNumber" type="xsd:string"/>
            <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
            <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
            <xsd:element name="hotelRating" type="xsd:integer" default="0"/>
            <xsd:element name="lowerPrice" type="xsd:positiveInteger"/>
            <xsd:element name="upperPrice" type="xsd:positiveInteger"/>
            <xsd:element name="amenity" type="amenityValue" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="amenityValue">
        <xsd:sequence>
            <xsd:element name="amenityType" type="xsd:string"/>
            <xsd:element name="amenityOpenHour" type="xsd:time"/>
            <xsd:element name="amenityCloseHour" type="xsd:time"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

XML document

Exhibit 3: XML document for "Eliminate" method

<?xml version="1.0" encoding="UTF-8"?>
<!--
     Document   : amenity1.xml
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<hotelGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="amenity1.xsd">
    <hotel>
        <hotelPicture/>
        <hotelName>Narembeen Hotel</hotelName>
        <streetAddress>Churchill Street</streetAddress>
        <telephoneNumber>+61 (08) 9064 7272</telephoneNumber>
        <emailAddress>narempub@oz.com.au</emailAddress>
        <hotelRating>1</hotelRating>
        <lowerPrice>50</lowerPrice>
        <upperPrice>100</upperPrice>
        <amenity>
            <amenityType>Restaurant</amenityType>
            <amenityOpenHour>06:00:00</amenityOpenHour>
            <amenityCloseHour>22:00:00 </amenityCloseHour>
        </amenity>
        <amenity>
            <amenityType>Pool</amenityType>
            <amenityOpenHour>06:00:00</amenityOpenHour>
            <amenityCloseHour>18:00:00 </amenityCloseHour>
        </amenity>
        <amenity>
            <amenityType>Complimentary Breakfast</amenityType>
            <amenityOpenHour>07:00:00</amenityOpenHour>
            <amenityCloseHour>10:00:00 </amenityCloseHour>
        </amenity>
    </hotel>
    <hotel>
        <hotelPicture/>
        <hotelName>Narembeen Caravan Park</hotelName>
        <streetAddress>Currall Street</streetAddress>
        <telephoneNumber>+61 (08) 9064 7308</telephoneNumber>
        <emailAddress>naremcaravan@oz.com.au</emailAddress>
        <hotelRating>1</hotelRating>
        <lowerPrice>20</lowerPrice>
        <upperPrice>30</upperPrice>
        <amenity>
            <amenityType>Pool</amenityType>
            <amenityOpenHour>10:00:00</amenityOpenHour>
            <amenityCloseHour>22:00:00 </amenityCloseHour>
        </amenity>
    </hotel>
</hotelGuide>

ID/IDREF: sample solution

To avoid redundancy, we create a separate element, "amenity," which is declared as a child of the root element alongside "hotel." Remember, the data types ID and IDREF are analogous to the primary key and foreign key, respectively. For every foreign key (IDREF), there must be a matching primary key (ID). Note that ID and IDREF values must be valid XML names: they must begin with a letter or an underscore rather than a digit, which is why the example uses values such as "k1."

The following example illustrates the ID/IDREF approach. Notice that the ID for the amenity pool is defined as "k1," and every hotel with a pool as an amenity references "k1," using IDREF. If the IDREF does not match any ID, then the document will not validate.

XML schema

Exhibit 4: XML schema for "ID/IDREF" method

<?xml version="1.0" encoding="UTF-8" ?>
<!--
     Document   : amenity2.xsd
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
    <xsd:element name="hotelGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
                <xsd:element name="amenity" type="amenityList" minOccurs="1" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:simpleType name="emailAddressType">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
        </xsd:restriction>
    </xsd:simpleType>
    <xsd:complexType name="hotelDetails">
        <xsd:sequence>
            <xsd:element name="hotelPicture"/>
            <xsd:element name="hotelName" type="xsd:string"/>
            <xsd:element name="streetAddress" type="xsd:string"/>
            <xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
            <xsd:element name="telephoneNumber" type="xsd:string"/>
            <xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
            <xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
            <xsd:element name="hotelRating" type="xsd:integer" default="0"/>
            <xsd:element name="lowerPrice" type="xsd:positiveInteger"/>
            <xsd:element name="upperPrice" type="xsd:positiveInteger"/>
            <xsd:element name="amenities" type="amenityDesc" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="amenityDesc">
        <xsd:sequence>
            <xsd:element name="amenityIDREF" type="xsd:IDREF"/>
            <xsd:element name="amenityOpenHour" type="xsd:time"/>
            <xsd:element name="amenityCloseHour" type="xsd:time"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="amenityList">
        <xsd:sequence>
            <xsd:element name="amenityID" type="xsd:ID"/>
            <xsd:element name="amenityType" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

XML document

Exhibit 5: XML document for "ID/IDREF" method

<?xml version="1.0" encoding="UTF-8"?>
<!--
     Document   : amenity2.xml
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<?xml-stylesheet href="amenity2.xsl" type="text/xsl" media="screen"?>
<hotelGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="amenity2.xsd">
    <hotel>
        <hotelPicture/>
        <hotelName>Narembeen Hotel</hotelName>
        <streetAddress>Churchill Street</streetAddress>
        <telephoneNumber>+61 (08) 9064 7272</telephoneNumber>
        <emailAddress>narempub@oz.com.au</emailAddress>
        <hotelRating>1</hotelRating>
        <lowerPrice>50</lowerPrice>
        <upperPrice>100</upperPrice>
        <amenities>
            <amenityIDREF>k2</amenityIDREF>
            <amenityOpenHour>06:00:00</amenityOpenHour>
            <amenityCloseHour>22:00:00 </amenityCloseHour>
        </amenities>
        <amenities>
            <amenityIDREF>k1</amenityIDREF>
            <amenityOpenHour>06:00:00</amenityOpenHour>
            <amenityCloseHour>18:00:00 </amenityCloseHour>
        </amenities>
        <amenities>
            <amenityIDREF>k4</amenityIDREF>
            <amenityOpenHour>07:00:00</amenityOpenHour>
            <amenityCloseHour>10:00:00 </amenityCloseHour>
        </amenities>
    </hotel>
    <hotel>
        <hotelPicture/>
        <hotelName>Narembeen Caravan Park</hotelName>
        <streetAddress>Currall Street</streetAddress>
        <telephoneNumber>+61 (08) 9064 7308</telephoneNumber>
        <emailAddress>naremcaravan@oz.com.au</emailAddress>
        <hotelRating>1</hotelRating>
        <lowerPrice>20</lowerPrice>
        <upperPrice>30</upperPrice>
        <amenities>
            <amenityIDREF>k1</amenityIDREF>
            <amenityOpenHour>10:00:00</amenityOpenHour>
            <amenityCloseHour>22:00:00 </amenityCloseHour>
        </amenities>
    </hotel>
    <amenity>
        <amenityID>k1</amenityID>
        <amenityType>Pool</amenityType>
    </amenity>
    <amenity>
        <amenityID>k2</amenityID>
        <amenityType>Restaurant</amenityType>
    </amenity>
    <amenity>
        <amenityID>k3</amenityID>
        <amenityType>Fitness room</amenityType>
    </amenity>
    <amenity>
        <amenityID>k4</amenityID>
        <amenityType>Complimentary breakfast</amenityType>
    </amenity>
    <amenity>
        <amenityID>k5</amenityID>
        <amenityType>in-room data port</amenityType>
    </amenity>
    <amenity>
        <amenityID>k6</amenityID>
        <amenityType>Water slide</amenityType>
    </amenity>
</hotelGuide>
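
The rule that every IDREF must match an ID can also be checked by hand, which is a useful way to see what a validating parser enforces. The following is a minimal sketch in Python using only the standard library; the sample data is an abbreviated, hypothetical stand-in for Exhibit 5, and the function name dangling_references is an assumption made for illustration. It does not replace schema validation.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical stand-in for a hotel guide document.
# "k9" is deliberately left without a matching amenityID.
HOTEL_GUIDE = """
<hotelGuide>
    <hotel>
        <hotelName>Narembeen Hotel</hotelName>
        <amenities><amenityIDREF>k2</amenityIDREF></amenities>
        <amenities><amenityIDREF>k1</amenityIDREF></amenities>
    </hotel>
    <hotel>
        <hotelName>Narembeen Caravan Park</hotelName>
        <amenities><amenityIDREF>k9</amenityIDREF></amenities>
    </hotel>
    <amenity><amenityID>k1</amenityID><amenityType>Pool</amenityType></amenity>
    <amenity><amenityID>k2</amenityID><amenityType>Restaurant</amenityType></amenity>
</hotelGuide>
"""


def dangling_references(xml_text):
    """Return every amenityIDREF value that has no matching amenityID."""
    root = ET.fromstring(xml_text)
    # Collect the "primary keys" first.
    known_ids = {node.text.strip() for node in root.iter("amenityID")}
    # Then report each "foreign key" that points at nothing.
    return [
        node.text.strip()
        for node in root.iter("amenityIDREF")
        if node.text.strip() not in known_ids
    ]


print(dangling_references(HOTEL_GUIDE))
```

An empty list means every reference resolves; any value in the list corresponds to a reference that would cause validation against the schema to fail.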

Key function: XML stylesheet

To display data modeled with the ID/IDREF method for a many-to-many relationship, the stylesheet should use the key function. In the stylesheet, the <xsl:key> element declares an index over the XML document, and the key() function uses that index to return the node-set that matches a given value.

A key consists of the following:

1. the node that has the key (the match attribute)
2. the name of the key (the name attribute)
3. the value of the key (the use attribute)

The following XML stylesheet illustrates how to use the key function to present content that is structured in a many-to-many relationship.

XML stylesheet

Exhibit 6: XML stylesheet for "ID/IDREF" method

<?xml version="1.0" encoding="UTF-8"?>
<!--
     Document   : amenity2.xsl
     Created on : February 4, 2006
     Author     : Dr. Rick Watson
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="amList" match="amenity" use="amenityID"/>
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Hotel Guide</title>
            </head>
            <body>
                <h2>Hotels</h2>
                <xsl:apply-templates select="hotelGuide"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="hotelGuide">
        <xsl:for-each select="hotel">
            <xsl:value-of select="hotelName"/>
            <br/>
            <xsl:for-each select="amenities">
                <xsl:value-of select="key('amList',amenityIDREF)/amenityType"/>
                <xsl:text>   </xsl:text>
                <xsl:value-of select="amenityOpenHour"/> - 
                <xsl:value-of select="amenityCloseHour"/>
                <br/>
            </xsl:for-each>
            <br/>
            <br/>
        </xsl:for-each>
        <br/>
    </xsl:template>
</xsl:stylesheet>
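
The lookup that the key function performs in Exhibit 6 is essentially an index from key value to node. As a rough illustration, the same idea can be sketched in Python with a dictionary. This is an analogy under stated assumptions, not how an XSLT processor is implemented; the sample data is abbreviated and the function name amenities_by_hotel is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical version of the ID/IDREF hotel guide.
GUIDE = """
<hotelGuide>
    <hotel>
        <hotelName>Narembeen Hotel</hotelName>
        <amenities><amenityIDREF>k2</amenityIDREF></amenities>
        <amenities><amenityIDREF>k1</amenityIDREF></amenities>
    </hotel>
    <hotel>
        <hotelName>Narembeen Caravan Park</hotelName>
        <amenities><amenityIDREF>k1</amenityIDREF></amenities>
    </hotel>
    <amenity><amenityID>k1</amenityID><amenityType>Pool</amenityType></amenity>
    <amenity><amenityID>k2</amenityID><amenityType>Restaurant</amenityType></amenity>
</hotelGuide>
"""


def amenities_by_hotel(xml_text):
    """Map each hotel name to the list of amenity types it references."""
    root = ET.fromstring(xml_text)
    # Plays the role of <xsl:key name="amList" match="amenity" use="amenityID"/>:
    # an index from amenityID to the amenity's type.
    am_list = {
        amenity.findtext("amenityID"): amenity.findtext("amenityType")
        for amenity in root.findall("amenity")
    }
    result = {}
    for hotel in root.findall("hotel"):
        # Plays the role of key('amList', amenityIDREF)/amenityType.
        result[hotel.findtext("hotelName")] = [
            am_list[ref.findtext("amenityIDREF")]
            for ref in hotel.findall("amenities")
        ]
    return result


for name, amenity_types in amenities_by_hotel(GUIDE).items():
    print(name, amenity_types)
```

Building the index once and reusing it for every hotel is the same design choice that makes a key preferable to searching the whole document for each reference.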

Expedia.de: XML and affiliate marketing

Expedia.de is the German subsidiary of expedia.com, the internet-based travel agency headquartered in Bellevue, Washington, USA. It offers its customers the booking of airline tickets, car rentals, vacation packages, and various other attractions and services via its website and by phone. Its websites attract more than 70 million visitors each month. Currently expedia.com has 4,600 employees serving customers in the United States, Canada, the UK, France, Germany, Italy, and Australia.

For marketing purposes, expedia.de set up an affiliate marketing program. Affiliate marketing is a way to reach potential customers without any financial risk for the company intending to advertise (the merchant). The merchant gives website owners, who are called affiliates, the opportunity to refer visitors to the merchant's site, offering commission-based monetary rewards as incentives. In the case of Expedia.de, the affiliate partners receive a commission every time users from their websites book travel on expedia.de. The affiliates can thus concentrate on selling while the merchant takes care of handling the transactions.

To ease the business of the affiliate partners, and of course to make the program more attractive, Expedia.de offers its partners a service called xmlAdEd, which provides current product information using XML. Affiliates using this service can request more than 8 million travel offerings in XML format via HTTP request. The data is updated several times a day. In the HTTP request you can set parameters such as location, price, and airport code, among others.

The use of XML in this case gives the affiliates several advantages:
- Efficient and flexible processing of the data because of separation of structure, content and style.
- Platform-independent processing of the data.
- Lossless conversion into other file formats.
- Easy integration in their websites.
- The possibility of creating their own web shop with an individual design.

By providing their affiliates product information in XML, expedia.de not only eases the business of their partners, but also ensures that customers receive consistent, up-to-date information on their services.

Summary

When describing a many-to-many relationship in XML, there are a few solutions available for designers to use. In choosing how to represent the many-to-many relationship, the designer not only must consider the most efficient way to represent the information, but also the audience for which the document is intended and how the document will be used.

References

http://www-128.ibm.com/developerworks/xml/library/x-xdm2m.html

http://www.w3.org/TR/xslt#key

Exercises

Answers



Recursive relationships






Learning objectives

  • Understand the concept of a recursive relationship
  • Create a schema for a one-to-one recursive relationship
  • Create a schema for a one-to-many recursive relationship
  • Create a schema for a many-to-many recursive relationship
  • Define a unique identifier in a schema
  • Create a primary key/foreign key relationship




Introduction

Recursive relationships are an interesting and more complex concept than the relationships you have seen in the previous chapters. A recursive relationship occurs when there is a relationship between an entity and itself. For example, a one-to-many recursive relationship occurs when an employee is the manager of other employees. The employee entity is related to itself, and there is a one-to-many relationship between one employee (the manager) and many other employees (the people who report to the manager). Because of the more complex nature of these relationships, we will need slightly more complex methods of mapping them to a schema and displaying them in a style sheet.

The one-to-one recursive relationship

Continuing with the tour guide model, we will develop a schema that shows cities that have hosted the Olympics and, for each, the previous host city. Since the previous host is another city, and only one city can be the previous host, this is a one-to-one recursive relationship.

host.xsd (XML schema for a one-to-one recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">

<xsd:element name="cities">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<xsd:complexType name="cityType">
  <xsd:sequence>
    <xsd:element name="cityID" type="xsd:ID"/>
     <xsd:element name="cityName" type="xsd:string"/>
     <xsd:element name="cityCountry" type="xsd:string"/>
     <xsd:element name="cityPop" type="xsd:integer"/>
     <xsd:element name="cityHostYr" type="xsd:integer"/>
     <xsd:element name="cityPreviousHost" type="xsd:IDREF" minOccurs="0" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>

Exhibit 1: XML schema for Host City Entity

host.xml (XML document for a one-to-one recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<cities xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='host.xsd'>
  <city>
    <cityID>c1</cityID>
    <cityName>Atlanta</cityName>
    <cityCountry>USA</cityCountry>
    <cityPop>4000000</cityPop>
    <cityHostYr>1996</cityHostYr>    
  </city>
  <city>
    <cityID>c2</cityID>
    <cityName>Sydney</cityName>
    <cityCountry>Australia</cityCountry>
    <cityPop>4000000</cityPop>
    <cityHostYr>2000</cityHostYr>  
    <cityPreviousHost>c1</cityPreviousHost>   
  </city>
  <city>
    <cityID>c3</cityID>
    <cityName>Athens</cityName>
    <cityCountry>Greece</cityCountry>
    <cityPop>3500000</cityPop>
    <cityHostYr>2004</cityHostYr>  
    <cityPreviousHost>c2</cityPreviousHost>   
  </city>
</cities>

Exhibit 2: XML Document for Olympic Host City
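
To see how the recursive reference in Exhibit 2 can be followed, here is a minimal Python sketch that walks from one city back through its previous hosts. It assumes an abbreviated copy of the host document (population and country omitted); the function name host_chain is hypothetical, and the guard against revisiting a city is an added precaution rather than something the schema enforces.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the host city data; only the fields used below.
HOSTS = """<cities>
  <city>
    <cityID>c1</cityID><cityName>Atlanta</cityName><cityHostYr>1996</cityHostYr>
  </city>
  <city>
    <cityID>c2</cityID><cityName>Sydney</cityName><cityHostYr>2000</cityHostYr>
    <cityPreviousHost>c1</cityPreviousHost>
  </city>
  <city>
    <cityID>c3</cityID><cityName>Athens</cityName><cityHostYr>2004</cityHostYr>
    <cityPreviousHost>c2</cityPreviousHost>
  </city>
</cities>"""


def host_chain(xml_text, city_id):
    """Return city names from the given city back through previous hosts."""
    root = ET.fromstring(xml_text)
    by_id = {city.findtext("cityID"): city for city in root.findall("city")}
    chain, seen = [], set()
    current_id = city_id
    # Stop when there is no previous host, or if the data loops back on itself.
    while current_id in by_id and current_id not in seen:
        seen.add(current_id)
        city = by_id[current_id]
        chain.append(city.findtext("cityName"))
        current_id = city.findtext("cityPreviousHost")  # None when absent
    return chain


print(host_chain(HOSTS, "c3"))
```

Because cityPreviousHost is optional (minOccurs="0"), the first city in the chain simply has no reference to follow, which is what ends the loop.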

The one-to-many recursive relationship

A hypothetical sports team is divided into squads with each squad having a captain. Every person on the team is a player, regardless of whether they are a squad captain. Since a squad captain is a player, this situation meets the definition of a recursive relationship—a squad captain is also a player and has a one-to-many relationship with the other players. This is a one-to-many recursive relationship because one captain has many players under him/her. See the example below for how to model the relationship.

team.xsd (XML schema for a one-to-many recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<xsd:element name="team">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="player" type="playerType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
  <xsd:complexType name="playerType">
    <xsd:sequence>
      <xsd:element name="playerID" type="xsd:ID"/>
      <xsd:element name="playerName" type="xsd:string"/>
      <xsd:element name="playerCap" type="playerC" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="playerC">
    <xsd:sequence>
      <xsd:element name="memberOf" type="xsd:IDREF"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Exhibit 3: XML schema for Team Entity

team.xml (XML document for a one-to-many recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<team xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='team.xsd'>
<player>
   <playerID>c1</playerID>
   <playerName>Tommy Jones</playerName>
   <playerCap>
      <memberOf>c3</memberOf>
   </playerCap>
</player>
<player>
   <playerID>c2</playerID>
   <playerName>Eddie Thomas</playerName>
   <playerCap>
      <memberOf>c3</memberOf>
   </playerCap>
</player>
<player>
   <playerID>c3</playerID>
   <playerName>Sean McCombs</playerName>
</player>
<player>
   <playerID>c4</playerID>
   <playerName>Patrick O’Shea</playerName>
   <playerCap>
      <memberOf>c3</memberOf>
   </playerCap>
</player>
</team>

Exhibit 4: XML Document for Team Entity
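
In the one-to-many case the reference is usually followed in the opposite direction: given a captain, find everyone who points at that captain. The sketch below illustrates this in Python under some assumptions: the sample is an abbreviated team document that uses the memberOf spelling declared in team.xsd, and the function name squad_members is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated team document; element names follow team.xsd.
TEAM = """<team>
  <player>
    <playerID>c1</playerID><playerName>Tommy Jones</playerName>
    <playerCap><memberOf>c3</memberOf></playerCap>
  </player>
  <player>
    <playerID>c2</playerID><playerName>Eddie Thomas</playerName>
    <playerCap><memberOf>c3</memberOf></playerCap>
  </player>
  <player>
    <playerID>c3</playerID><playerName>Sean McCombs</playerName>
  </player>
</team>"""


def squad_members(xml_text, captain_id):
    """Return the names of players whose memberOf points at captain_id."""
    root = ET.fromstring(xml_text)
    members = []
    for player in root.findall("player"):
        # A player may carry zero or more playerCap/memberOf references.
        refs = [ref.text for ref in player.findall("playerCap/memberOf")]
        if captain_id in refs:
            members.append(player.findtext("playerName"))
    return members


print(squad_members(TEAM, "c3"))
```

The captain (c3) has no playerCap element, so the captain does not appear in the captain's own squad list.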

Natural one-to-many recursive structure

A more natural approach for most one-to-many recursive relationships is to use XML's hierarchical nature to represent the hierarchy directly. Consider locations:

<?xml version="1.0" encoding="UTF-8"?>
<location type="country">
  <name>USA</name>
  <sub-locations>
    <location type="state">
      <name>Ohio</name>
      <sub-locations>
        <location type="city"><name>Akron</name></location>
        <location type="city"><name>Columbus</name></location>
      </sub-locations>
    </location>
  </sub-locations>
</location>
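
One practical benefit of the nested form is that a recursive function can process it directly, with no ID lookups. The following Python sketch assumes the location structure shown above, with well-formed sub-locations elements; the function name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

LOCATIONS = """<location type="country">
  <name>USA</name>
  <sub-locations>
    <location type="state">
      <name>Ohio</name>
      <sub-locations>
        <location type="city"><name>Akron</name></location>
        <location type="city"><name>Columbus</name></location>
      </sub-locations>
    </location>
  </sub-locations>
</location>"""


def outline(location, depth=0):
    """Return one indented line per location, parents before children."""
    label = f"{location.findtext('name')} ({location.get('type')})"
    lines = ["  " * depth + label]
    # Recurse into the direct child locations only.
    for child in location.findall("sub-locations/location"):
        lines.extend(outline(child, depth + 1))
    return lines


for line in outline(ET.fromstring(LOCATIONS)):
    print(line)
```

The function calls itself once per nested location, so the depth of the recursion mirrors the depth of the XML hierarchy.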

The many-to-many recursive relationship

Think you're getting a feel for recursive relationships yet? Well, there is still the third and final relationship to add to your repertoire — the many-to-many recursive. A common example of a many-to-many recursive relationship is when one item can be comprised of many items of the same data type as itself, and each of those sub-items may belong to another parent item of the same data type. Sound confusing? Let's look at the example of a product that can consist of a single item or multiple items (i.e., a packaged product). The example below describes tourist products that can be packaged together to create a new product.

product.xsd (XML schema for a many-to-many recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
    <xsd:element name="products">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="product" type="prodType" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:complexType name="prodType">
        <xsd:sequence>
            <xsd:element name="prodID" type="xsd:ID"/>
            <xsd:element name="prodName" type="xsd:string"/>
            <xsd:element name="prodCost" type="xsd:decimal" minOccurs="0"/>
            <xsd:element name="prodPrice" type="xsd:decimal"/>
            <xsd:element name="components" type="componentsType" minOccurs="0" maxOccurs="1"/>
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="componentsType">
        <xsd:sequence>
            <xsd:element name="component" type="xsd:IDREF"/>
            <xsd:element name="componentqty" type="xsd:integer"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

Exhibit 5: XML schema for Product Entity

product.xml (XML document for a many-to-many recursive model)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="product.xsl"?>
<products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="product.xsd">
    <product>
        <prodID>p1000</prodID>
        <prodName>Animal photography kit</prodName>
        <prodPrice>725</prodPrice>
        <components>
            <component>p101</component>
            <componentqty>1</componentqty>
        </components>
    </product>
    <product>
        <prodID>p101</prodID>
        <prodName>Camera case</prodName>
        <prodCost>150</prodCost>
        <prodPrice>300</prodPrice>
    </product>
</products>

Exhibit 6: XML Document for Product Entity

Summary

When the child has the same type of data as its parent in a parent-child data relationship, this is a sign of a recursive relationship. The xsd:ID and xsd:IDREF data types can be used in a schema to create primary key-foreign key values in an XML document.


Exercises


Answers

External Links



Data schemas







Learning objectives

  • Overview of Data Schemas
  • Starting your schema the right way
  • Entities in general
  • The Parent Child Structure
  • Attributes and Restrictions
  • Ending your schema the right way

Initiated by:

The University of Georgia

Terry College of Business

Department of Management Information Systems


Introduction

Data schemas are the foundation of validated XML documents. They define objects, their relationships, their attributes, and the structure of the data model. Without a schema, a document can still be well-formed XML, but nothing defines or enforces its structure. In this chapter, you will come to understand the purpose of XML data schemas, their component parts, and how to use them. Examples are included for you to copy when creating your own data schema, making your job a lot easier. At the bottom of this Web page a whole Schema has been included, parts of which appear in the different sections throughout this chapter. Refer to it if you would like to see how the whole Schema works as one.

Overview of Data Schemas

The data schema, all technicalities aside, is the data model through which all the XML information is conveyed. It has a hierarchical structure that starts with a root element (to be explained later) and works down to even the most minute detail of the model. Data schemas have two main parts: the entities and their relationships. The entities contained in a data schema represent objects from the model. They have unique identifiers, attributes, and names for what kind of object they are. The relationships in the schema represent the relationships between the objects. Relationships can be one-to-one, one-to-many, many-to-many, recursive, or any other kind you could find in a data model. Now we will begin to create our own data schema.

Starting your schema the right way

All schemas begin the same way, no matter what type of objects they represent. The first line in every Schema is this declaration:

<?xml version="1.0" encoding="UTF-8"?>

Exhibit 1: XML Declaration

Exhibit 1 simply tells the browser, or whatever file or program is accessing this schema, that it is an XML file and that it uses the character encoding "UTF-8". You can copy this to start your own XML file. Next comes the Namespace declaration:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

Exhibit 2: Namespace Declaration

Namespaces are basically dictionaries containing definitions of most of the coding in the schema. For example, when creating a schema, if you declare an object to be of type "xsd:string", the definition of the type "string" is contained in the Namespace along with all of its attributes. This is true for most of the code you write. If you have made or seen other schemas, most of the code is prefaced by "xsd:". A good example is something like "xsd:sequence" or "xsd:complexType". sequence and complexType are both objects defined in the Namespace that has been linked to the prefix "xsd". In fact, you could theoretically use any prefix in place of "xsd", as long as you used it the same way throughout the Schema. The Namespace that defines the XML Schema vocabulary is http://www.w3.org/2001/XMLSchema. Now onto Exhibit 2.

The first part, xsd:schema, lets any file or program know that this file is a schema. Like the XML declaration, this is universal to XML schemas and you can use it in yours. The second part is the actual Namespace declaration; xmlns stands for XML NameSpace. It binds the prefix "xsd" to the XML Schema Namespace, which is usually the one given in the code. Again, I would recommend using this code to start your Schemas. The last part, elementFormDefault, is harder to understand. With "unqualified", elements declared locally (inside a complexType) are not Namespace-qualified in the XML document. Using "unqualified" is most applicable until you get to some really complicated code.
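
The point that the prefix is only a local label for the Namespace can be checked with a small Python sketch. It parses two hypothetical one-element schemas that differ only in prefix ("xsd" versus "xs") and compares the expanded element names that the parser reports. The schema fragments and the function name expanded_tags are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical schemas that differ only in the prefix they bind.
WITH_XSD_PREFIX = (
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:element name="hotelName" type="xsd:string"/>'
    "</xsd:schema>"
)
WITH_XS_PREFIX = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="hotelName" type="xs:string"/>'
    "</xs:schema>"
)


def expanded_tags(xml_text):
    """Return each element's name in {namespace}local-name form."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


print(expanded_tags(WITH_XSD_PREFIX))
print(expanded_tags(WITH_XSD_PREFIX) == expanded_tags(WITH_XS_PREFIX))
```

The parser replaces the prefix with the Namespace it is bound to, so both documents yield the same element names.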

Entities in general

Entities are basically the objects a Schema is created to represent. As stated before, they have attributes and relationships. We will now go much further into explaining exactly what they are and how to write code for them.

There are two types of Entities: simpleType and complexType. A simpleType object has one value associated with it. A string is a perfect example of a simpleType object, as it contains only the value of the string. Most of the simpleTypes you use will be defined in the XML Schema Namespace; however, you can define your own simpleType at the bottom of the Schema (this is covered in the restrictions section). Because of this, the objects you will most often need to define in your Schema are complexTypes. A complexType is an object with more than one attribute associated with it, and it may or may not have child elements attached to it. Here is an example of a complexType object:

<xsd:complexType name="GenreType">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="description" type="xsd:string"/>
    <xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

Exhibit 3: The complexType Element

This code begins with the declaration of a complexType and its name. When other entities, such as a parent element, refer to it, they refer to this name. The second line begins the sequence of attributes and child elements, each of which is declared as an "element". The first part of each line declares the element, and the second part, "name", gives the name that XML documents will use for it. After these two parts comes the "type" declaration. Note that the type of the name and description elements is "xsd:string", showing that the type string is defined in the Namespace bound to "xsd". For the movie element, the type is "MovieType"; because there is no prefix before "MovieType", it is assumed that this type is defined in this Schema. (It could also refer to a type defined in another Schema that is included at the top of this Schema, but don't worry about that now.) "minOccurs" and "maxOccurs" represent the relationship between Genres and Movies. "minOccurs" can be either 0 or an arbitrary number, depending only on the data model. "maxOccurs" can be 1 (a one to one relationship), an arbitrary number (a one to many relationship), or "unbounded" (a one to many relationship with no upper limit).
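To make the meaning of "minOccurs" and "maxOccurs" concrete, here is a minimal Python sketch of the check a validator performs on the number of occurrences. The function name occurs_ok is hypothetical; it only illustrates the rule and is not how any particular validator is implemented.

```python
def occurs_ok(count, min_occurs=1, max_occurs=1):
    """Return True if `count` occurrences satisfy minOccurs/maxOccurs.

    Both attributes default to 1 in XML Schema; max_occurs may be the
    string "unbounded".
    """
    if count < min_occurs:
        return False
    return max_occurs == "unbounded" or count <= max_occurs

# The movie element in Exhibit 3: minOccurs="1" maxOccurs="unbounded"
print(occurs_ok(0, 1, "unbounded"))   # False: a genre needs at least one movie
print(occurs_ok(25, 1, "unbounded"))  # True
print(occurs_ok(2))                   # False: the defaults allow exactly one
```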

For each schema, there must be one root element. This entity contains every other entity underneath it in the hierarchy. For instance, in a schema for a list of movies, the root element would be something like MovieDatabase or MovieCollection: something that logically contains all the other objects (genre, movie, actor, director, plotline, etc.). It always starts with the line <xsd:element name="xxx">, showing that it is the root element, and then continues as a normal complexType. All other objects begin with either simpleType or complexType. Here is sample code for a MovieDatabase root element:

<xsd:element name="MovieDatabase">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Genre" type="GenreType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Exhibit 4: The Root Element

This represents a MovieDatabase where the child element of MovieDatabase is a Genre. From there it goes on to movie, and so on. We will continue to use this example to help you better understand.

The Parent / Child Relationship

The Parent / Child Relationship is a key topic in Data Schemas. It represents the basic structure of the data model's hierarchy by clearly laying out the top down configuration. Look at this piece of code which shows how movies have actors associated with them:

<xsd:complexType name="MovieType">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>       
    <xsd:element name="actor" type="ActorType" minOccurs="1" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
     
<xsd:complexType name="ActorType">
  <xsd:sequence>
    <xsd:element name="lname" type="xsd:string"/>
    <xsd:element name="fname" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

Exhibit 5: The Parent/Child Relationship

Within each MovieType, there is an element named "actor" which is of "ActorType". When the XML document is populated with information, the surrounding tags for actor will be <actor></actor> and not <ActorType></ActorType>. To keep your Schema flowing smoothly and without error, the type field in the Parent Element will always equal the name field in the declaration of the complexType Child Element.
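The following Python sketch shows what a populated fragment might look like and confirms that the tags carry the element name ("actor"), not the type name ("ActorType"). The movie title and actor names are made up for illustration, and ElementTree is used only to read the fragment; it does not validate against the Schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical instance fragment shaped like MovieType in Exhibit 5.
movie_xml = """
<movie>
  <name>Example Movie</name>
  <actor><lname>Doe</lname><fname>Jane</fname></actor>
  <actor><lname>Roe</lname><fname>Richard</fname></actor>
</movie>
"""

movie = ET.fromstring(movie_xml)

# The tags come from the name field of each element declaration.
child_tags = [child.tag for child in movie]
actor_names = [f"{a.findtext('fname')} {a.findtext('lname')}"
               for a in movie.findall("actor")]

print(child_tags)   # ['name', 'actor', 'actor']
print(actor_names)  # ['Jane Doe', 'Richard Roe']
```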

Attributes and Restrictions

An attribute of an entity is a simpleType object in that it only contains one value. <xsd:element name="lname" type="xsd:string"/> is a good example of an attribute. It is declared as an element, has a name associated with it, and has a type declaration. Located in the appendix of this chapter is a long list of simpleTypes built into the default Namespace. Attributes are incredibly simple to use, until you try and restrict them.

In some cases, certain data must abide by a standard to maintain data integrity. Examples are a Social Security number or an email address. If you have a database of email addresses to which you send mass emails, you need all of them to be valid addresses, or else you'd get tons of error messages each time you send out that mass email. To avoid this problem, you can take a known simpleType and add a restriction to it to better suit your needs. You can do this in two ways, but one is simpler and better suited to Data Schemas. You could restrict the simpleType within its declaration in the Parent Element, but this gets messy, and if the restriction is needed elsewhere, the code must be written again. The better way is to define a new, named type at the bottom of the Schema that restricts a previously known simpleType. Here is an example of this with an email address and a Social Security number:

<xsd:simpleType name="emailaddressType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="[^@]+@[^\.]+\..+"/>
  </xsd:restriction>
</xsd:simpleType>
   
<xsd:simpleType name="ssnType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>

Exhibit 6: Restriction on a simpleType

These definitions were included in the Schema below the last Child Element and before the closing </xsd:schema>. Take the second definition as an example. Its first line declares the simpleType and gives it a name, "ssnType". You could name yours anything you want, as long as you reference it correctly throughout the Schema. Because the type is named, you can use it anywhere in the Schema, or in another Schema, provided the references are correct. The second line lets the Schema know that this is a restricted type whose base is the string type defined in the "xsd" Namespace. In other words, this type is a string with a restriction on it, and the third line is the actual restriction. It can be one of many kinds of restriction, which are listed in the Appendix of this chapter. This one is a "pattern". A "pattern" means that only a certain sequence of characters, defined in the value field, is allowed in the XML document. This particular one means three digits, a hyphen, two digits, a hyphen, and four digits. To learn more about how to use restrictions, see the W3Schools section on restrictions.
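You can get a feel for the two patterns in Exhibit 6 by trying them on sample values. The sketch below uses Python's re module as a stand-in, which is an assumption: Python's regex dialect is close to, but not identical with, the one in XML Schema. Because a Schema pattern must match the whole value, re.fullmatch is used. The helper name matches and the sample values are hypothetical.

```python
import re

# Pattern values copied from Exhibit 6.
EMAIL_PATTERN = r"[^@]+@[^\.]+\..+"
SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"

def matches(pattern, value):
    # XML Schema patterns must match the whole value, so fullmatch is used.
    return re.fullmatch(pattern, value) is not None

print(matches(SSN_PATTERN, "123-45-6789"))         # True
print(matches(SSN_PATTERN, "123456789"))           # False: hyphens are required
print(matches(EMAIL_PATTERN, "jane@example.org"))  # True
print(matches(EMAIL_PATTERN, "jane.example.org"))  # False: no at-sign
```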

Not of little import: Introducing the <xsd:import> tag

The <xsd:import> tag is used to import a schema document and the namespace associated with the data types defined within the schema document. This allows an XML schema document to reference a type library using namespace names (prefixes). Let's take a closer look at a simple XML instance document for a store that uses these multiple namespace names:

<?xml version="1.0" encoding="UTF-8"?>
<store:SimpleStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.opentourism.org/xmltext/Store.xml http://www.opentourism.org/xmltext/SimpleStore.xsd"
  xmlns:store="http://www.opentourism.org/xmltext/Store.xml"
  xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema">
  <!-- Note the explicit namespace declarations: the prefix store
       represents data types defined in the
       http://www.opentourism.org/xmltext/Store.xml namespace and the
       prefix MGR represents data types defined in the
       http://www.opentourism.org/xmltext/CoreSchema namespace.
       Also, notice that there is no default namespace declaration; every
       element must be associated with a namespace through a prefix (we will
       see why this is necessary when we examine the schema document).
  -->
  <store:Store>
    <MGR:Name>
      <MGR:FirstName>Michael</MGR:FirstName>
      <MGR:MiddleNames>Jay</MGR:MiddleNames>
      <MGR:LastName>Fox</MGR:LastName>
    </MGR:Name>
    <store:StoreName>The Gap</store:StoreName>
    <store:StoreAddress>
      <store:Street>86 Nowhere Ave.</store:Street>
      <store:City>Los Angeles</store:City>
      <store:State>CA</store:State>
      <store:ZipCode>75309</store:ZipCode>
    </store:StoreAddress>
    <!-- More store information would go here. -->
  </store:Store>
  <!-- More stores would go here. -->
</store:SimpleStore>

Exhibit 7: XML Instance Document [1]


Let's look at the schema document and see how the <xsd:import> tag was used to import data types from a type library (external schema document).

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns="http://www.opentourism.org/xmltext/Store.xml"
  xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema"
  targetNamespace="http://www.opentourism.org/xmltext/Store.xml" elementFormDefault="qualified">
  <!-- The prefix MGR is bound to the following namespace name:
       http://www.opentourism.org/xmltext/CoreSchema
       The ManagerTypeLib.xsd schema document is imported by associating the
       schema with the http://www.opentourism.org/xmltext/CoreSchema
       namespace name, which was bound to the MGR prefix.
       The elementFormDefault attribute has the value 'qualified', indicating that
       an XML instance document must use namespace-qualified names for every element.
  -->
  <!-- The target namespace and default namespace are the same -->
  <xsd:import namespace="http://www.opentourism.org/xmltext/CoreSchema"
    schemaLocation="ManagerTypeLib.xsd"/>
  <xsd:element name="SimpleStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Store" type="StoreType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="StoreType">
    <xsd:sequence>
      <xsd:element ref="MGR:Name"/>
      <xsd:element name="StoreName" type="xsd:string"/>
      <xsd:element name="StoreAddress" type="StoreAddressType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="StoreAddressType">
    <xsd:sequence>
      <xsd:element name="Street" type="xsd:string"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="State" type="xsd:string"/>
      <xsd:element name="ZipCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Exhibit 8: XML Schema (http://www.opentourism.org/xmltext/SimpleStore.xsd)


Like the include tag and the redefine tag, the import tag is another means of incorporating any data types from an external schema document into another schema document and must occur before any element or attribute declarations. These mechanisms are important when XML schemas are modularized and type libraries are being maintained and used in multiple schema documents.
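To see how a program tells the two vocabularies apart, here is a minimal Python sketch that reads a shortened instance document. The Namespace names are the ones used in Exhibit 8; the shortened document and the NS mapping are assumptions made for illustration, and ElementTree only parses the document without validating it against either schema.

```python
import xml.etree.ElementTree as ET

# Namespace names taken from Exhibit 8; the instance is a shortened sketch.
NS = {
    "store": "http://www.opentourism.org/xmltext/Store.xml",
    "MGR": "http://www.opentourism.org/xmltext/CoreSchema",
}

instance = """
<store:SimpleStore xmlns:store="http://www.opentourism.org/xmltext/Store.xml"
                   xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema">
  <store:Store>
    <MGR:Name><MGR:LastName>Fox</MGR:LastName></MGR:Name>
    <store:StoreName>The Gap</store:StoreName>
  </store:Store>
</store:SimpleStore>
"""

root = ET.fromstring(instance)

# Each step of the path names both the Namespace (via its prefix) and the element.
store_name = root.findtext("store:Store/store:StoreName", namespaces=NS)
last_name = root.findtext("store:Store/MGR:Name/MGR:LastName", namespaces=NS)

print(store_name, last_name)  # The Gap Fox
```

The Name element is found only through the MGR Namespace, which mirrors the way the schema reaches it through the imported type library.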

When the whole is greater than the sum of its parts: Schema Modularization

Now that we have covered all three methods of incorporating external XML schemas, let’s consider the importance of these mechanisms. As is typical with most programming code, redundancy is frowned upon; this is true for custom data type definitions as well. If a custom data type already exists that can be applied to an element in your schema document, does it not make sense to use this data type rather than create it again within your new schema document? Moreover, if you know that a single data type can be reused for several applications, should you not have a method for referencing that data type when you need it?

The idea behind modular schemas is to examine what your schema does, determine what data types are frequently used in one form or another and develop a type library. As your needs for more complex schemas increase you can continue to add to your library, reuse data types in your type library, and redefine those data types as needed. An example of this reuse would be a schema for customer information – different departments would use different schemas as they would need only partial customer information. However most, if not all, departments would need some specific customer information, like name and contact information, which could be incorporated in the individual departmental schema documents.

Schema modularization is a “best practice”. By maintaining a type library and reusing and redefining types in the type library, you can help ensure that your XML schema documents don't become overwhelming and difficult to read. Readability is important, because you may not be the only one using these schemas, and it is important that others can easily understand your schema documents.

“Choose, but choose wisely…”: Schema alternatives

Thus far in this book we have only discussed XML schemas as defined by the World Wide Web Consortium (W3C). There are other methods of defining the data contained within an XML instance document, but we will mention only the two most popular and best-known alternatives: Document Type Definition (DTD) and RELAX NG Schema.

We will cover DTDs in the next chapter. RELAX NG is newer and has many of the same features as W3C XML Schema; RELAX NG also claims to be simpler and easier to learn, but this is very subjective. For more about RELAX NG, visit: http://www.relaxng.org/

Appendix

First is the full Schema used in the examples throughout this chapter:

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="unqualified">
    
   <xsd:element name="MovieDatabase">
     <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="Genre" type="GenreType" minOccurs="1" maxOccurs="unbounded"/>            
       </xsd:sequence>
     </xsd:complexType>
   </xsd:element>
         
     <xsd:complexType name="GenreType">
       <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="description" type="xsd:string"/>
         <xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="unbounded"/>
       </xsd:sequence>
     </xsd:complexType>
     
     <xsd:complexType name="MovieType">
       <xsd:sequence>
         <xsd:element name="name" type="xsd:string"/>
         <xsd:element name="rating" type="xsd:string"/>
         <xsd:element name="director" type="xsd:string"/>
         <xsd:element name="writer" type="xsd:string"/>
         <xsd:element name="year" type="xsd:int"/>
         <xsd:element name="tagline" type="xsd:string"/>       
         <xsd:element name="actor" type="ActorType" minOccurs="1" maxOccurs="unbounded"/>
       </xsd:sequence>
     </xsd:complexType>
     
     <xsd:complexType name="ActorType">
       <xsd:sequence>
         <xsd:element name="lname" type="xsd:string"/>
         <xsd:element name="fname" type="xsd:string"/>
         <xsd:element name="gender" type="xsd:string"/>
         <xsd:element name="bday" type="xsd:string"/>
         <xsd:element name="birthplace" type="xsd:string"/>
         <xsd:element name="ssn" type="ssnType"/>
       </xsd:sequence>
     </xsd:complexType>
     
     <xsd:simpleType name="ssnType">
       <xsd:restriction base="xsd:string">
         <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
       </xsd:restriction>
     </xsd:simpleType>
   
</xsd:schema>

It’s time to go back to the beginning…and review all of the schema data types, elements, and attributes that we have covered thus far (and maybe a few that we have not). The following tables will detail the XML data types, elements and attributes that can be used in an XML Schema.

Primitive Types

This table lists the built-in types that the elements and attributes in your schema can use.

Type | Syntax | Legal value example | Constraining facets
xsd:anyURI | <xsd:element name="url" type="xsd:anyURI"/> | http://www.w3.com | length, minLength, maxLength, pattern, enumeration, whiteSpace
xsd:boolean | <xsd:element name="hasChildren" type="xsd:boolean"/> | true or false or 1 or 0 | pattern, whiteSpace
xsd:byte | <xsd:element name="stdDev" type="xsd:byte"/> | -128 through 127 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits
xsd:date | <xsd:element name="dateEst" type="xsd:date"/> | 2004-03-15 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:dateTime | <xsd:element name="xMas" type="xsd:dateTime"/> | 2003-12-25T08:30:00 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:decimal | <xsd:element name="pi" type="xsd:decimal"/> | 3.1415927 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, fractionDigits, totalDigits
xsd:double | <xsd:element name="pi" type="xsd:double"/> | 3.1415927 or INF or NaN | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:duration | <xsd:element name="MITDuration" type="xsd:duration"/> | P8M3DT7H33M2S | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:float | <xsd:element name="pi" type="xsd:float"/> | 3.1415927 or INF or NaN | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:gDay | <xsd:element name="dayOfMonth" type="xsd:gDay"/> | ---11 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:gMonth | <xsd:element name="monthOfYear" type="xsd:gMonth"/> | --02 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:gMonthDay | <xsd:element name="valentine" type="xsd:gMonthDay"/> | --02-14 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:gYear | <xsd:element name="year" type="xsd:gYear"/> | 1999 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:gYearMonth | <xsd:element name="birthday" type="xsd:gYearMonth"/> | 1972-08 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:ID | <xsd:attribute name="id" type="xsd:ID"/> | id-102 | length, minLength, maxLength, pattern, enumeration, whiteSpace
xsd:IDREF | <xsd:attribute name="version" type="xsd:IDREF"/> | id-102 | length, minLength, maxLength, pattern, enumeration, whiteSpace
xsd:IDREFS | <xsd:attribute name="versionList" type="xsd:IDREFS"/> | id-102 id-103 id-100 | length, minLength, maxLength, pattern, enumeration, whiteSpace
xsd:int | <xsd:element name="age" type="xsd:int"/> | 77 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits
xsd:integer | <xsd:element name="age" type="xsd:integer"/> | 77 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:long | <xsd:element name="channelNumber" type="xsd:long"/> | 214 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace
xsd:negativeInteger | <xsd:element name="belowZero" type="xsd:negativeInteger"/> | -123 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits
xsd:nonNegativeInteger | <xsd:element name="numOfchildren" type="xsd:nonNegativeInteger"/> | 2 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits
xsd:nonPositiveInteger | <xsd:element name="debit" type="xsd:nonPositiveInteger"/> | 0 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits
xsd:positiveInteger | <xsd:element name="credit" type="xsd:positiveInteger"/> | 500 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits
xsd:short | <xsd:element name="numOfpages" type="xsd:short"/> | 476 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits
xsd:string | <xsd:element name="name" type="xsd:string"/> | Joseph | length, minLength, maxLength, pattern, enumeration, whiteSpace
xsd:time | <xsd:element name="credit" type="xsd:time"/> | 13:02:00 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace

Schema Elements
( from http://www.w3schools.com/schema/schema_elements_ref.asp )

Here is a list of all the elements which can be included in your schemas.

Element Explanation
all Specifies that the child elements can appear in any order. Each child element can occur 0 or 1 time
annotation Specifies the top-level element for schema comments
any Enables the author to extend the XML document with elements not specified by the schema
anyAttribute Enables the author to extend the XML document with attributes not specified by the schema
appinfo Specifies information to be used by the application (must go inside annotation)
attribute Defines an attribute
attributeGroup Defines an attribute group to be used in complex type definitions
choice Allows only one of the elements contained in the <choice> declaration to be present within the containing element
complexContent Defines extensions or restrictions on a complex type that contains mixed content or elements only
complexType Defines a complex type element
documentation Defines text comments in a schema (must go inside annotation)
element Defines an element
extension Extends an existing simpleType or complexType element
field Specifies an XPath expression that specifies the value used to define an identity constraint
group Defines a group of elements to be used in complex type definitions
import Adds multiple schemas with different target namespace to a document
include Adds multiple schemas with the same target namespace to a document
key Specifies an attribute or element value as a key (unique, non-nullable, and always present) within the containing element in an instance document
keyref Specifies that an attribute or element value correspond to those of the specified key or unique element
list Defines a simple type element as a list of values
notation Describes the format of non-XML data within an XML document
redefine Redefines simple and complex types, groups, and attribute groups from an external schema
restriction Defines restrictions on a simpleType, simpleContent, or a complexContent
schema Defines the root element of a schema
selector Specifies an XPath expression that selects a set of elements for an identity constraint
sequence Specifies that the child elements must appear in a sequence. Each child element can occur from 0 to any number of times
simpleContent Contains extensions or restrictions on a text-only complex type or on a simple type as content and contains no elements
simpleType Defines a simple type and specifies the constraints and information about the values of attributes or text-only elements
union Defines a simple type as a collection (union) of values from specified simple data types
unique Defines that an element or an attribute value must be unique within the scope

Schema Restrictions and Facets for data types
( from http://www.w3schools.com/schema/schema_elements_ref.asp )

Here is a list of all the types of restrictions which can be included in your schema.

Constraint Description
enumeration Defines a list of acceptable values
fractionDigits Specifies the maximum number of decimal places allowed. Must be equal to or greater than zero
length Specifies the exact number of characters or list items allowed. Must be equal to or greater than zero
maxExclusive Specifies the upper bounds for numeric values (the value must be less than this value)
maxInclusive Specifies the upper bounds for numeric values (the value must be less than or equal to this value)
maxLength Specifies the maximum number of characters or list items allowed. Must be equal to or greater than zero
minExclusive Specifies the lower bounds for numeric values (the value must be greater than this value)
minInclusive Specifies the lower bounds for numeric values (the value must be greater than or equal to this value)
minLength Specifies the minimum number of characters or list items allowed. Must be equal to or greater than zero
pattern Defines the exact sequence of characters that are acceptable
totalDigits Specifies the maximum number of digits allowed. Must be greater than zero
whiteSpace Specifies how white space (line feeds, tabs, spaces, and carriage returns) is handled

Regex

A special regular expression (regex) language can be used to construct a pattern. The regex language in XML Schema is based on Perl's regular expression language. The following are some common notations:

. (the period) for any character at all
\d for any digit
\D for any non-digit
\w for any word (alphanumeric) character
\W for any non-word character (e.g. -, +, =)
\s for any white space (including space, tab, newline, and return)
\S for any character that is not white space
x* to have zero or more x's
(xy)* to have zero or more xy's
x+ repetition of the x, at least once
x? to have one or zero x's
(xy)? to have one or no xy's
[abc] to include one of a group of values
[0-9] to include the range of values from 0 to 9
x{5} to have exactly 5 x's (in a row)
x{5,} to have at least 5 x's (in a row)
x{5,8} at least 5 but at most 8 x's (in a row)
(xyz){2} to have exactly 2 xyz's (in a row)
For example, the pattern for validating a Social Security Number is \d{3}-\d{2}-\d{4}
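The notations above can be tried out with a few sample strings. The sketch below uses Python's re module as an approximation of the XML Schema regex language (an assumption; the two dialects differ in some details), and re.fullmatch because a Schema pattern has to match the whole value. The sample values are made up.

```python
import re

# (pattern, value that should match, value that should not)
samples = [
    (r"x{5,8}", "xxxxxx", "xxxx"),               # between 5 and 8 x's
    (r"(xy)*", "xyxy", "xyx"),                   # zero or more repetitions of xy
    (r"[abc]\d?", "b7", "d7"),                   # one of a, b, c, then an optional digit
    (r"\w+\s\w+", "hello world", "helloworld"),  # two words separated by white space
]

results = [
    (re.fullmatch(p, good) is not None, re.fullmatch(p, bad) is not None)
    for p, good, bad in samples
]

print(results)  # [(True, False), (True, False), (True, False), (True, False)]
```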

The pattern for emailAddressType is \w+\W*\w*@{1}\w+\W*\w+\.(\w+\.)*\w* (a literal period must be escaped as \. because an unescaped period matches any character). It breaks down as follows:

\w+ at least one word (alphanumeric) character, e.g. answer
\W* followed by none, one or many non-word character(s), e.g. -
\w*@{1} followed by any number of word characters (or none) and exactly one at-sign, e.g. my@
\w+ followed by at least one word character, e.g. mail
\W* followed by none, one or many non-word character(s), e.g. _
\w+\. followed by at least one word character and a period, e.g. please.
(\w+\.)* followed by the previous string repeated zero to infinite times, e.g. opentourism.
\w* finally followed by none, one or many word character(s), e.g. org
Example email address: answer-my@mail_please.opentourism.org

Instance Document Attributes
These attributes do NOT need to be declared within the schemas.

Attribute | Explanation | Example
xsi:nil | Indicates that a certain element does not have a value or that the value is unknown. The element must be set to nillable inside the schema document: <xsd:element name="last_name" type="xsd:string" nillable="true"/> | <full_name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <first_name>Madonna</first_name> <last_name xsi:nil="true"/> </full_name>
xsi:noNamespaceSchemaLocation | Locates the schema for elements that are not in any namespace | <radio xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.opentourism.org/xmtext/radio.xsd"> <!-- radio stuff goes here --> </radio>
xsi:schemaLocation | Locates schemas for elements and attributes that are in a specified namespace. The value is a pair: the namespace name followed by the location of its schema | <radio xmlns="http://www.opentourism.org/xmtext/NS/radio" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opentourism.org/xmtext/NS/radio http://www.opentourism.org/xmtext/radio.xsd"> <!-- radio stuff goes here --> </radio>
xsi:type | Can be used in instance documents to indicate the type of an element | <height xsi:type="xsd:decimal">78.9</height>
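As a small illustration of how a program might read one of these attributes, the Python sketch below checks for xsi:nil in the full_name example. The helper name is_nil is hypothetical, and ElementTree is used only as a plain parser, so the sketch does not check that the element was declared nillable in a schema.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

doc = f"""
<full_name xmlns:xsi="{XSI_NS}">
  <first_name>Madonna</first_name>
  <last_name xsi:nil="true"/>
</full_name>
"""

def is_nil(element):
    # Attributes in a namespace are keyed as {namespace-URI}local-name.
    # xsi:nil is a boolean, so both "true" and "1" count as set.
    return element.get(f"{{{XSI_NS}}}nil") in ("true", "1")

root = ET.fromstring(doc)
print(is_nil(root.find("last_name")))   # True
print(is_nil(root.find("first_name")))  # False
```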


For more information on XML Schema structures, data types, and tools you can visit http://www.w3.org/XML/Schema.



DTD






A Document Type Definition (DTD) is a file that is linked to an XML document. It controls which elements must or may appear, which attributes they have and which values those attributes may take, and how the XML file as a whole should be structured. XHTML, HTML and other markup languages use DTDs to validate their documents. Note: Web browsers accept bad markup in HTML.

Uses of DTDs

DTDs are used to define a custom markup language, for a specific program or organization, in which large amounts of data can be stored. Like schemas, they can declare elements, attributes and entities. The main difference is the syntax in which these are written.

Prolog

Like a schema, an XML document that uses a DTD begins with a prolog. It is one line of text.

<?xml version="1.0" encoding="UTF-8"?>

The question mark tells the computer that you are giving it an instruction. The word xml tells it that you are using XML, the version attribute tells it which version of XML you are using, and the encoding attribute tells it how the data is encoded (you might use a different encoding if you wanted to use, say, Chinese text).
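The effect of the encoding attribute can be seen in a short Python sketch. It assumes the standard library's ElementTree parser and an ISO-8859-1 document made up for this example; the parser reads the declaration and decodes the bytes that follow accordingly.

```python
import xml.etree.ElementTree as ET

text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>José</name>'

# Encode the document with the encoding that its declaration announces.
data = text.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
name = ET.fromstring(data).text

print(name)  # José
```

If the bytes were written in one encoding while the declaration announced another, the parser could misread the accented character or reject the document, which is why the two must agree.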

<!ELEMENT> tag

The element tag is used to declare an element of the document. Depending on how you declare it, the element may appear only in a specific part of the document or anywhere in it.

The first element you declare is the root element (in HTML it's html). Let's pretend that there was an organization that wanted a bunch of XML files containing info about each person. They would probably have a root element named "person". The standard form for declaring an element with child elements is

<!ELEMENT elementName (childElement, childElement2, childElement3)>

So the organization's root element declaration would be

<!ELEMENT person (firstName, lastName, postalCode, cellNumber, homeNumber, email)>

Note: A child element must be declared in a separate element tag to be valid.

Note: The comma between the child elements indicates that they must occur in the order listed. There are also occurrence indicators (symbols that tell the computer how often an element may occur). We will cover them later in this chapter.

Note: The parentheses enclose the content model, which defines what kind of content is found inside the element. Different content types are covered later in this chapter.

Some elements should not be tied to a specific parent (for example, a formatting tag used to highlight important info). You declare such an element in the same way, but you do not list it as a child element of any particular element. Depending on your needs, you may use the ANY content type, which allows character data or other tags inside your tag; the EMPTY content type, for tags that look like "<exampleXmlTag />"; or #PCDATA for text.

Note: In an element declaration you can combine child elements with #PCDATA inside the parentheses. It looks like this: <!ELEMENT elementName (#PCDATA | childName)*>. The pipe bar means that you can use text or other tags.
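What the comma-separated list in the person declaration demands of a document can be emulated by hand. The Python sketch below is not a DTD validator (Python's standard library does not include one); it only checks that the children of person appear exactly in the declared order. The function name follows_sequence and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Child order taken from the <!ELEMENT person (...)> declaration above.
PERSON_CHILDREN = ["firstName", "lastName", "postalCode",
                   "cellNumber", "homeNumber", "email"]

def follows_sequence(person):
    # A comma-separated content model requires exactly these children, in order.
    return [child.tag for child in person] == PERSON_CHILDREN

good = ET.fromstring(
    "<person><firstName>Ann</firstName><lastName>Lee</lastName>"
    "<postalCode>00000</postalCode><cellNumber>555-0100</cellNumber>"
    "<homeNumber>555-0101</homeNumber><email>ann@example.org</email></person>"
)
bad = ET.fromstring(
    "<person><lastName>Lee</lastName><firstName>Ann</firstName></person>"
)

print(follows_sequence(good))  # True
print(follows_sequence(bad))   # False: wrong order and missing children
```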



XHTML







Learning objectives

  • List the differences between XHTML and HTML
  • Create a valid, well-formed XHTML document
  • Convert an existing HTML document to XHTML
  • Decide when XHTML is more appropriate than HTML


In previous chapters, we have learned how to generate HTML documents from XML documents and XSL stylesheets. In this chapter, we will learn how to convert those HTML documents into valid XHTML. We will discuss why XHTML has evolved as a standard and when it should be used.

The Evolution of XHTML

Originally, Web pages were designed in HTML. Unfortunately most implementations of this markup language allow all sorts of mistakes and bad formatting. Major browsers were designed to be forgiving, and poor code would display with few problems in most cases. This poor code was often not portable between browsers, e.g. a page would render in Netscape but not Internet Explorer or vice versa. The accounting for human error and bad formatting takes an amount of processing power that small handheld devices might not have. Thus when displaying data on handhelds, a tiny mistake can crash the device.

XHTML partially mitigates these problems. The processing burden is reduced by requiring XHTML documents to conform to the much stricter rules defined in XML. Aside from the stricter rules, HTML 4.01 and XHTML 1.0 are functionally equivalent. If a document breaks XML's well-formedness rules, an XHTML-compliant browser must not render the page. If a document is well-formed but invalid, an XHTML-compliant browser may render the page, so a significant number of mistakes still slip through.

In this chapter, we will examine in detail how to create an XHTML document.

The biggest problem with HTML from a design standpoint is that it was never meant to be a graphical design language. The original version of HTML was intended to structure human readable content (e.g. marking a section of text as a paragraph), not to format it (e.g. this paragraph should be displayed in 14pt Arial). HTML has evolved far past its original purpose and is being stretched and manipulated to cover cases that the original HTML designers never imagined.

The recommended solution is to use a separate language to describe the presentation of a group of documents. Cascading Style Sheets (CSS) is a language used for describing presentation. From version 1.1 of XHTML upwards, Web pages must be formatted using CSS or a language with equivalent capabilities such as XSLT (XSL Transformations). The use of CSS or XSLT is optional in XHTML 1.0 unless the Strict variant is used. HTML 4.01 supports CSS but not XSLT.

So What is XHTML?

As you might have guessed, XHTML stands for eXtensible HyperText Markup Language. It is a cross between HTML and XML. It fulfills two major purposes that were ignored by HTML:

  1. XHTML is a stricter standard than HTML. XHTML documents must be well-formed just like regular XML. This reduces vagaries and inconsistency between browsers, because browsers do not have to decide how to display a badly-formed page. Malformed XHTML is not allowed.
    Note 1: Browsers only enforce well-formedness if the MIME type is set to application/xhtml+xml. If the MIME type is set to text/html, the browser will allow badly-formed documents. There are a large number of 'XHTML' documents on the web that are badly-formed and get away with it because their MIME type is text/html.
    Note 2: Browsers are not required to check for validity. See Invalid XHTML below for an example.
  2. XHTML allows for modularization (m12n). For different environments different element and attribute subsets can be defined.

The best thing about XHTML is that it is almost the same as HTML! If you know how to write an HTML document, you will be able to create an XHTML document without much trouble. The main thing to keep in mind is that, unlike HTML, where simple errors such as a missing closing tag are ignored by the browser, XHTML code must be written according to an exact specification. We will see later that adhering to these strict specifications actually allows XHTML to be more flexible than HTML.

XHTML Document Structure

At a minimum, an XHTML document must contain a DOCTYPE declaration and four elements: html, head, title, and body:

<!DOCTYPE ... >
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="...">
   <head>
      <title></title>
   </head>
   <body></body>
</html>

The opening html tag of an XHTML document must include a namespace declaration for the XHTML namespace.
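
As a rough check of the skeleton above, the following Python sketch parses a minimal document with the standard library's xml.etree.ElementTree module and confirms the namespace and the four required elements. The Strict DOCTYPE, the sample title, and the names MINIMAL_XHTML and local_name are placeholders chosen for this illustration. The parser checks well-formedness only; it does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative document; the DOCTYPE and title are placeholders.
MINIMAL_XHTML = """<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
   <head>
      <title>Example</title>
   </head>
   <body></body>
</html>"""

root = ET.fromstring(MINIMAL_XHTML)


def local_name(element):
    """Strip the "{namespace}" prefix that ElementTree adds to tag names."""
    return element.tag.split("}", 1)[-1]


# True when the root element is html in the XHTML namespace.
in_xhtml_namespace = root.tag == "{%s}html" % XHTML_NS

# Element names in document order.
found = [local_name(e) for e in root.iter()]
```

Here in_xhtml_namespace would become False if the xmlns declaration were left off the html tag, which is the situation the namespace requirement guards against.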

The DOCTYPE declaration should appear immediately before the html tag in an XHTML document. It can follow one of three formats.

XHTML 1.0 Strict

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The Strict declaration is the least forgiving. This is the preferred DOCTYPE for new documents. Strict documents tend to be streamlined and clean, because all formatting appears in Cascading Style Sheets rather than in the document itself. Presentational markup that belongs in the style sheet rather than in the document includes, but is not limited to:
<body text="blue">, <u>nderline</u>, <b>old</b>, <i>talics</i>, and <font color="#9900FF" face="Arial" size="+2">
(The b and i elements remain valid in Strict, but CSS is still preferred for purely visual styling.)

In Strict documents, text and inline elements must also be nested within block-level elements; they may not appear directly inside the body element.

Incorrect Example:
<p>I hope that you enjoy</p> your stay.
Correct Example:
<p>I hope that you enjoy your stay.</p>

XHTML 1.0 Transitional

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This declaration is intended as a halfway house for migrating legacy HTML documents to XHTML 1.0 Strict. The W3C encourages authors to use the Strict DOCTYPE for new documents. (The XHTML 1.0 Transitional DTD refers readers to the relevant note in the HTML4.01 Transitional DTD.)

This DOCTYPE does not require CSS for formatting, although CSS is recommended. It generally tolerates inline elements where block-level elements are expected.

There are a couple of reasons why you might choose this DOCTYPE for new documents.

  • You require backwards compatibility with browsers that support the formatting elements of XHTML but do not support CSS. This is a very small fraction of general users (less than 1%). Many browsers that don't support CSS don't support HTML 4.0 or XHTML either. However, it may be useful on a corporate intranet that has a larger than normal fraction of very old (pre-2000) browsers.
  • You need links that open in a particular frame. The target attribute is available in Transitional but not in Strict. Using frames is discouraged as they work badly in many browsers.

XHTML 1.0 Frameset

<!DOCTYPE html
 PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

If you are creating a page with frames, this declaration is appropriate. However, since frames are generally discouraged when designing Web pages, this declaration should be used rarely.

XML Prolog

Additionally, XHTML authors are encouraged by the W3C to include the following XML declaration as the first line of each document:

<?xml version="1.0" encoding="UTF-8"?>

Although it is recommended by the standard, this declaration may cause errors in older Web browsers, including Internet Explorer version 6. It is up to the individual author to decide whether to include the prolog.

Language

It is good practice to include the optional xml:lang attribute [2] on the html element to describe the document's primary language. For compatibility with HTML the lang attribute should also be specified with the same value. For an English language document use:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

The xml:lang and lang attributes can also be specified on other elements to indicate changes of language within the document, e.g. a French quotation in an English document.
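
To see how the two attributes look to an XML parser, here is a small Python sketch using the standard library's xml.etree.ElementTree module. The document fragment (an English sentence with a French quotation) and the helper name languages are invented for this illustration, and the head element is omitted for brevity. The xml prefix is always bound to the namespace shown in XML_NS, which is why the parser reports xml:lang under that qualified name while lang stays unqualified.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Illustrative fragment: head is left out to keep the example short.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    '<body><p>She said <q xml:lang="fr" lang="fr">bonjour</q>.</p></body>'
    '</html>'
)


def languages(element):
    """Return the (xml:lang, lang) pair declared on one element."""
    return element.get("{%s}lang" % XML_NS), element.get("lang")


root_langs = languages(doc)
quote = doc.find(".//{http://www.w3.org/1999/xhtml}q")
quote_langs = languages(quote)
```

A check like this could be used to confirm that both attributes carry the same value on every element that declares a language.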

Converting HTML to XHTML

In this section, we will discover how to transform an HTML document into an XHTML document. We will examine each of the following rules:

  • Documents must be well-formed
    • Tags must be properly nested
    • Elements must be closed
  • Tags must be lowercase
  • Attribute names must be lowercase
  • Attribute values must be quoted
  • Attributes cannot be minimized
  • The name attribute is replaced with the id attribute (in XHTML 1.0 both name and id should be used with the same value to maintain backwards-compatibility).
  • Plain ampersands are not allowed
  • Scripts and CSS must be escaped (enclose them within <![CDATA[ and ]]>) or, preferably, moved into external files.

Documents must be well-formed

Because XHTML conforms to all XML standards, an XHTML document must be well-formed according to the W3C's recommendations for an XML document. Several of the rules here reemphasize this point. We will consider both incorrect and correct examples.

Tags must be properly nested

Browsers widely tolerate badly nested tags in HTML documents.

<b><u>
This text is probably bold and underlined, but inside incorrectly nested tags.
</b></u>

The text above would display as bold and underlined, even though the end tags are not in the proper order. An XHTML page will not display if the tags are improperly nested, because it would not be a well-formed XML document. The problem can be easily fixed.

<b><u>
This text is bold and underlined and inside properly nested tags.
</u></b>
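
The nesting rule can be tested with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the function name is_well_formed and the two sample strings are made up for this illustration, and the surrounding p element is only there to give each fragment a single root.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


badly_nested = "<p><b><u>bold and underlined</b></u></p>"
properly_nested = "<p><b><u>bold and underlined</u></b></p>"

print(is_well_formed(badly_nested))     # the end tags are in the wrong order
print(is_well_formed(properly_nested))  # the end tags mirror the start tags
```

An XML parser stops at the first mismatched end tag, which is the same behavior this chapter describes for a browser handling XHTML served as XML.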

Elements must be closed

Again, XHTML documents must be well-formed XML documents. For this reason, all elements must be closed. The HTML specifications listed some tags as having "optional" end tags, such as the <p> and <li> tags.

<p>Here is a list:
<ul>
   <li>Item 1
   <li>Item 2
   <li>Item 3
</ul>

In XHTML, the end tags must be included.

<p>Here is a list: </p>
<ul>
   <li>Item 1</li>
   <li>Item 2</li>
   <li>Item 3</li>
</ul>

What should we do about HTML tags that do not have a closing tag? Some special tags do not require or imply a closing tag.

<img src="titlebar.gif" alt="Title">
<hr>
<br>
<p>Welcome to my web page!</p>

In XHTML, the XML rule of including a closing slash within the tag must be followed.

<img src="titlebar.gif" alt="title" />
<hr />
<br />
<p>Welcome to my Web page!</p>
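
When converting a long page, it helps to know where the first unclosed element is. The following Python sketch, which assumes only the standard xml.etree.ElementTree module, reports the line on which the parser gives up. The helper name first_error_line and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def first_error_line(markup):
    """Return the line number of the first well-formedness error, or None."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, _column = err.position
        return line
    return None


# HTML style: the li end tags are omitted.
html_style = "<ul>\n   <li>Item 1\n   <li>Item 2\n</ul>"
# XHTML style: every element is closed.
xhtml_style = "<ul>\n   <li>Item 1</li>\n   <li>Item 2</li>\n</ul>"

# An empty element without and with the closing slash.
unclosed_br = "<p>line one<br>line two</p>"
closed_br = "<p>line one<br />line two</p>"
```

Note that the parser reports the place where it noticed the problem (the end tag that does not match), which is usually a little after the element that was left open.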

Note that some of today's browsers will incorrectly render a page if the closing slash does not have a space before it (<br/>). Although it is not part of the official recommendation, you should always include the space (<br />) for compatibility purposes.

Here are the common empty tags in HTML:

  • area
  • base
  • basefont
  • br
  • hr
  • img
  • input
  • link
  • meta
  • param

Tags must be lowercase

In HTML, tags could be written in either lowercase or uppercase. In fact, some Web authors preferred to write tags in uppercase to make them easier to read. XHTML requires that all tags be lowercase.

<H1>This is an example of bad case.</h1>

This difference matters because XML is case-sensitive. XML would read <H1> and <h1> as different tags, causing problems in the above example. The problem can be easily fixed by changing all tags to lowercase.

<h1>This is an example of good case.</h1>

Attribute names must be lowercase

Following the pattern of writing all tags in lowercase, all attribute names must also be in lowercase.

<p CLASS="specialText">Important Notice</p>

The correct tags are easy to create.

<p class="specialText">Important Notice</p>

Attribute values must be quoted

Some HTML values do not require quotation marks around them. They are understood by browsers.

<table border=1 width=100%>
</table>

XHTML requires all attribute values to be quoted. Even numeric, percentage, and hexadecimal values must appear in quotation marks for them to be considered part of a proper XHTML document.

<table border="1"  width="100%">
</table>

Attributes cannot be minimized

HTML allowed some attributes to be written in shorthand, such as selected or noresize.

<form>
   <input checked ...>
   <input disabled ...>
</form>

When using XHTML, attribute minimization is forbidden. Instead, use the syntax x="x", where x is the attribute that was formerly minimized.

<form>
   <input checked="checked"  .../>
   <input disabled="disabled"  .../>
</form>

A complete list of minimized attributes follows:

  • checked
  • compact
  • declare
  • defer
  • disabled
  • ismap
  • nohref
  • noresize
  • noshade
  • nowrap
  • readonly
  • selected
  • multiple
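
Several of the rules covered so far (lowercase names, quoted values, expanded minimized attributes, closed empty elements, and escaped ampersands in text) are mechanical enough to automate. The sketch below is one possible approach using Python's standard html.parser module; it is not how HTML Tidy or any particular tool works. The class name XhtmlRewriter, the function to_xhtml_fragment, and the sample inputs are invented here, and the sketch does not repair missing end tags or bad nesting.

```python
from html import escape
from html.parser import HTMLParser

# Empty elements, taken from the list given earlier in this chapter.
EMPTY_ELEMENTS = {"area", "base", "basefont", "br", "hr", "img",
                  "input", "link", "meta", "param"}


class XhtmlRewriter(HTMLParser):
    """Re-emit HTML markup using XHTML tag and attribute syntax."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:      # minimized attribute such as "checked"
                value = name
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        closing = " />" if tag in EMPTY_ELEMENTS else ">"
        self.out.append("<" + " ".join(parts) + closing)

    def handle_endtag(self, tag):
        # Empty elements were already closed by the " />" above.
        if tag not in EMPTY_ELEMENTS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Re-escape text so that plain ampersands become &amp;.
        self.out.append(escape(data, quote=False))


def to_xhtml_fragment(html_text):
    parser = XhtmlRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, to_xhtml_fragment('<INPUT CHECKED TYPE=checkbox>') yields <input checked="checked" type="checkbox" />. The result of such a rewrite should still be checked with an XML parser or a validator, because the remaining rules are not handled.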

The name attribute is replaced with the id attribute

The HTML 4.01 standard defines a name attribute for the tags a, applet, form, frame, iframe, img, and map.

<a name="anchor">
<img src="banner.gif" name="mybanner" />
</a>

XHTML 1.0 deprecates the name attribute on these elements. Instead, the id attribute is used. However, to ensure backwards compatibility with today's browsers, it is best to use both the name and id attributes.

<a name="anchor" id="anchor" >
<img src="banner.gif" name="mybanner" id="mybanner"  />
</a>

As technology advances, it will eventually be unnecessary to use both attributes. XHTML 1.1 has already removed the name attribute in favor of id.

Ampersands are not supported

Plain (unescaped) ampersands are illegal in XHTML.

<a href="home.aspx?status=done&itWorked=false">Home & Garden</a>

Each one must instead be replaced with the equivalent entity reference, &amp;.

<a href="home.aspx?status=done&amp;itWorked=false">Home &amp; Garden</a>

Image alt attributes are mandatory

Because XHTML is designed to be viewed on different types of devices, some of which are not image-capable, alt attributes must be included for all images.

<img src="titlebar.gif">

Remember that the img tag must include a closing slash in XHTML!

<img src="titlebar.gif" alt="title"  />

Scripts and CSS must be escaped

Internal scripts and CSS often include characters like the ampersand and less-than characters.

<script language="JavaScript">
   <!--
      document.write('Hello World!'); 
   //-->
</script>

If you are using internal scripts or CSS, enclose them within the markers <![CDATA[ and ]]>. This marks them as character data that should not be parsed. If you do not use these markers, characters like & and < will be treated as the start of character entities (like &nbsp;) and tags (like <b>) respectively. This will cause your page to behave unpredictably, and it may invalidate your code.

Additionally, the type attribute is mandatory for scripts. The comment tags <!-- and --> that have traditionally been used to hide JavaScript from noncompliant browsers should not be included. The XML standard states that text enclosed in comment tags may be completely excluded from rendered documents, which would lose all script enclosed in the tags.

<script type="text/javascript" language="javascript">
/*<![CDATA[*/
   document.write('Hello World!');
/*]]>*/
</script>
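
The effect of the CDATA markers can be observed with an XML parser. This Python sketch uses the standard xml.etree.ElementTree module; the script body, which contains both < and &&, and the helper name parse_or_none are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative script body containing "<" and "&&", not wrapped in CDATA.
unescaped = (
    '<script type="text/javascript">'
    "if (a < b && b < c) { run(); }"
    "</script>"
)

# The same script wrapped in the CDATA markers shown above.
in_cdata = (
    '<script type="text/javascript">'
    "/*<![CDATA[*/ if (a < b && b < c) { run(); } /*]]>*/"
    "</script>"
)


def parse_or_none(markup):
    """Return the parsed element, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


rejected = parse_or_none(unescaped)
script = parse_or_none(in_cdata)
```

The first string is rejected because the parser tries to read "< b" as the start of a tag; the second parses, and the script text reaches the application unchanged.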

Also, document.write() is not permitted in XHTML documents. You must use node creation methods such as document.createElementNS() instead. Confusingly, document.write() will appear to work as expected if the document is incorrectly served with a MIME type of text/html (the type for HTML documents) instead of application/xhtml+xml (the type for XHTML documents). If the MIME type is text/html, the document is parsed as HTML, which allows document.write(). Parsing the document as HTML defeats the purpose of writing it in XHTML.

Similar changes must be made for internal stylesheets.

<style>
<!--
   .SpecialClass {
      color: #000000;
   }
-->
</style>

The type attribute must be included, and the CDATA tags should be used.

<style type="text/css">
/*<![CDATA[*/
   .SpecialClass {
      color: #000000;
   }
/*]]>*/
</style>

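Because scripts and CSS may complicate an XHTML document, it is strongly recommended that they be placed in external .js and .css files, respectively. They can then be linked to from your XHTML document. Write the script element with an explicit end tag, because browsers that parse the page as text/html do not treat <script ... /> as closed.

<script src="myscript.js" type="text/javascript"></script>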

<link href="styles.css" type="text/css" rel="stylesheet" />

Some elements may not be nested

The W3C recommendations state that certain elements may not be contained within others in an XHTML document, even when no XML rules are violated by the inclusion. Elements affected are listed below.

Element   Cannot contain ...
a         a
pre       big, img, object, small, sub, sup
button    button, fieldset, form, iframe, input, isindex, label, select, textarea
label     label
form      form

When to convert

By now, it probably sounds as though converting an HTML document into XHTML is easy, but tedious. When would you want to convert your existing pages into XHTML? Before deciding to change your entire Web site, consider these questions.

  • Do you want your pages to be easily viewed over a nontraditional Internet-capable device, such as a PDA or Web-enabled telephone? Will this be a goal of your site in the future? XHTML is the language of choice for Web-enabled portable devices. Now may be a good time for you to commit to creating an all-XHTML site.
  • Do you plan to work with XML in the future? If so, XHTML may be a logical place to begin. If you head up a team of designers who are accustomed to using HTML, XHTML is a small step away. It may be less intimidating for beginners to learn XHTML than it is to try teaching them all about XML from scratch.
  • Is it important that your site be current with the most recent W3C standards? Staying on top of current standards will make your site more stable and help you stay updated in the future, as you will only have to make small changes to upgrade your site to the newest versions of XHTML as they are approved by the W3C.
  • Will you need to convert your documents to another format? As a valid XML document, XHTML can utilize XSL to be converted into text, plain HTML, another XHTML document, or another XML document. HTML cannot be used for this purpose.

If you answered yes to any of the above questions, then you should probably convert your Web site to XHTML.

MIME Types

XHTML 1.0 documents should be served with a MIME type of application/xhtml+xml to Web browsers that can accept this type. XHTML 1.0 may be served with the MIME type text/html to clients that cannot accept application/xhtml+xml, provided that the XHTML complies with the additional constraints in Appendix C of the XHTML 1.0 specification. If you cannot configure your Web server to serve documents as different MIME types, you probably should not convert your Web site to XHTML.
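
The choice between the two MIME types is normally made from the HTTP Accept header that the browser sends. The following Python sketch shows one simple way to make that decision in a server-side script. The function name choose_mime_type is hypothetical, and the policy is an assumption of this sketch: a wildcard such as */* is deliberately not treated as support for XHTML, and a quality value of 0 is treated as a refusal.

```python
def choose_mime_type(accept_header):
    """Return the MIME type to use when serving an XHTML 1.0 page."""
    for item in accept_header.split(","):
        media_type, _, params = item.partition(";")
        if media_type.strip().lower() != "application/xhtml+xml":
            continue
        # A quality value (q) of 0 means "not acceptable".
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0
        if q > 0:
            return "application/xhtml+xml"
    # Fall back to text/html for clients that did not list XHTML explicitly.
    return "text/html"
```

A browser that sends "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" would receive application/xhtml+xml, while one that sends only "text/html, */*" would receive text/html. The returned value would then be placed in the Content-Type response header.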

You should check that your XHTML documents are served correctly to browsers that support application/xhtml+xml, e.g. Mozilla Firefox. Use 'Page Info' to verify that the type is correct.

XHTML 1.1 documents are often not backwards compatible with HTML and should not be served with a MIME type of text/html.[3]

Help Converting

HTML Tidy

When creating HTML, it is very easy to make a mistake by leaving out an end tag or not nesting tags properly. HTML Tidy is an application that can correct a number of errors in poorly formed HTML documents and convert them into XHTML. Tidy can also reformat ugly code to be more readable, including code generated by WYSIWYG editors. HTML Tidy cannot generate clean code when it encounters a problem it is not sure how to fix. In these cases, it reports an error to let you know where the mistake is located in your document.

A few examples of what HTML Tidy can do:

  • Add missing end tags and correct mismatched ones.
  • Correct improperly nested elements.
  • Sort out mixed-up tags.
  • Add a missing "/" to properly close tags.
  • Insert missing tags into lists.
  • Add missing quotes around attribute values.
  • Insert the correct DOCTYPE value based on your code (it can also recognize and report proprietary elements).

HTML Tidy can also be customized at runtime using a wide array of command line arguments. It is capable of indenting code to make it more readable as well as replacing FONT, NOBR, and CENTER tags with style tags and rules using CSS. Tidy can also be taught new tags by declaring them in the configuration file.

You can read more about HTML Tidy at the W3C's HTML Tidy site, as well as download the application as a binary or get the source code. There are several sites that offer HTML Tidy as an online service including the W3C and Site Valet.

You can also validate your page using the validator available at http://validator.w3.org/.

When not to convert

You shouldn't convert your Web pages if they will always be served with a MIME type of text/html. Make sure you know how to configure your server or server-side script to perform HTTP content negotiation so that XHTML capable browsers receive XHTML marked as application/xhtml+xml. If you can't set up content negotiation, stick to HTML 4.01. People viewing your Web pages with mainstream browsers will be unable to tell the difference between a valid HTML 4.01 web page and a valid XHTML 1.0 Web page.

Make sure the automated tests you run on your site simulate connections from both XHTML-compatible browsers, e.g. Mozilla Firefox, and non-XHTML-compatible browsers, e.g. Internet Explorer 6.0. This is particularly important if you use JavaScript on your Web site. If maintaining two copies of your test suite is too time consuming, don't convert.

Bear in mind that valid HTML 4.01 Strict documents generally require less effort to convert to XHTML 1.1 than valid XHTML 1.0 Transitional documents. A valid HTML 4.01 Strict document can only contain elements that are valid in XHTML 1.1, although a few attributes may need changing. XHTML 1.0 Transitional documents, on the other hand, can contain ten element types and more than a dozen attributes that are not valid in XHTML 1.1. The XHTML 1.0 Transitional body element alone has six attributes that are not supported in XHTML 1.1.

Don't be pressured into using XHTML by people talking vaguely about bad practice. Pin them down to what they mean by bad practice. If they start talking about separation of content and presentation, they have confused the differences between HTML and XHTML with the differences between the Transitional and Strict doctypes. Both XHTML 1.0 Transitional and HTML 4.01 Transitional allow you to mix presentation and content in the same document, i.e. they allow this type of bad practice. Both HTML 4.01 Strict and XHTML 1.0 Strict force you to move the bulk of the presentation (but not all of it) into CSS or an equivalent language. All four doctypes allow you to use embedded stylesheets, whereas true separation requires that all CSS and JavaScript be moved to external files.

XHTML 1.1

XHTML 1.0 is a suitable markup language for most purposes. It provides the option to separate content and presentation, which fits the needs of most Web authors. XHTML 1.1 enforces the separation of content and presentation. All deprecated elements and attributes have been removed. It also removes two attributes that were retained in XHTML 1.0 purely for backwards-compatibility. The lang attribute is replaced by xml:lang and name is replaced by id. Finally it adds support for ruby text found in East Asian documents.

DOCTYPE

The DOCTYPE for XHTML 1.1 is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

Modularization

The modularization of XHTML, or XHTML m12n, provides suggestions for customizing XHTML, either by integrating subsets of XHTML into other XML applications or by extending the XHTML element set. The framework defines two processes:

  • How to group elements and attributes into "modules"
  • How to combine modules to create new markup languages

The resulting languages, which the W3C calls "XHTML Host Languages", are based on the familiar XHTML structure but specialized for specific purposes. XHTML 1.1 is an example of a host language. It was created by grouping the different elements available to XHTML.

XHTML variations, while possible in theory, have not been widely adopted. There is continuing work being done to develop host languages, but their details are beyond the scope of this discussion.

Invalid XHTML

XHTML-compliant browsers are allowed to render invalid XHTML documents provided that the documents are well-formed. A simple example is given below:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Invalid XHTML</title>
  </head> 
  <body>
     <p>This sentence contains a <p>nested paragraph.</p></p>
  </body>
</html>

Save the example as invalid.xhtml (the .xhtml extension is important) and open the page with Mozilla Firefox. The page will render even though it is invalid.
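
A non-validating XML parser behaves the same way as the browser in this respect. The Python sketch below parses the document above (minus the XML declaration, for brevity) with the standard xml.etree.ElementTree module, which checks well-formedness but never consults the DTD. It then looks for the offending construct directly; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

INVALID_BUT_WELL_FORMED = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Invalid XHTML</title>
  </head>
  <body>
     <p>This sentence contains a <p>nested paragraph.</p></p>
  </body>
</html>"""

# Parsing succeeds: the parser checks well-formedness, not the DTD.
root = ET.fromstring(INVALID_BUT_WELL_FORMED)

# Find p elements that sit directly inside another p element.
nested = root.findall(".//x:p/x:p", {"x": XHTML_NS})
nested_text = [p.text for p in nested]
```

Catching every kind of invalid markup requires a validating parser or a service such as the W3C validator mentioned earlier; a hand-written check like the one above only finds the single pattern it looks for.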


Summary

XHTML stands for eXtensible HyperText Markup Language. XHTML is very similar to HTML, but it is stricter and easier to parse. XHTML documents must be well-formed, just like regular XML. XHTML allows for modularization. XHTML code must be written according to an exact specification, unlike HTML, where simple errors such as a missing closing tag are ignored by the browser. Adhering to these strict specifications actually allows XHTML to be more flexible than HTML. The benefits described in this summary are only gained if the MIME type of the document is application/xhtml+xml. XHTML documents can be validated, but most browsers do not check validity.

Exercises

Answers



XPath






Learning objectives

  • Be able to conceptualize an XML document as a node tree
  • Refer to groups of elements in an XML document
  • Understand the differences between abbreviated and unabbreviated XPath syntax
  • Understand the differences between absolute and relative paths
  • Be able to use XPath predicates and functions to refine an XPath's node-set


Introduction

Throughout the previous chapters you have learned the basic concepts of XSL and how to refer to nodes in an XML document when performing an XSL transformation. Up to this point you have used a straightforward syntax for referring to nodes. That syntax is XPath, but XPath has many more functions and capabilities, which you will learn in this chapter. As you begin to comprehend how a path language is used to refer to nodes in an XML document, your understanding of XML as a tree structure will fall into place. This chapter contains examples that demonstrate many of the common uses of XPath; for the full XPath specification, see the latest version of the standard at:

http://www.w3.org/TR/xpath

XSL uses XPath heavily.

XPath

When you go to copy a file or ‘cd’ into a directory at a command prompt you often type something along the lines of ‘/home/darnell/’ to refer to folders. This enables you to change into or refer to folders throughout your computer’s file system. XML has a similar way of referring to elements in an XML document. This special syntax is called XPath, which is short for XML Path Language.

XPath is a language for finding information in an XML document. XPath is used to navigate through elements and attributes in an XML document.

XPath, although used for referring to nodes in an XML tree, is not itself written in XML. This was a wise choice on the part of the W3C, because trying to specify path information in XML would be a very cumbersome task. Any characters that form XML syntax would need to be escaped so that they are not confused with XML when being processed. XPath is also very succinct, allowing you to call upon nodes in the XML tree with a great degree of specificity without being unnecessarily verbose.

XML as a tree structure

A great benefit of XML is that the document itself describes the structure of the data. If you have researched your family history, you have probably come across a family tree. At the top of the tree is some early ancestor and at the bottom of the tree are the latest children.

With a tree structure you can see which children belong to which parents, which grandchildren belong to which grandparents and many other relationships.

The neat thing about XML is that it also fits nicely into this tree structure, often referred to as an XML Tree.

Understanding node relationships

We will use the following example to demonstrate the different node relationships.

<bookstore>
	<book> 
		<title>Less Than Zero</title>
		<author>Bret Easton Ellis</author>
		<year>1985</year>
		<price>13.95</price>
	</book> 
</bookstore>

Parent
Each element and attribute has one parent.
The book element is the parent of the title, author, year, and price elements.

Children
Element nodes may have zero, one or more children.
The title, author, year, and price elements are all children of the book element.

Siblings
Nodes that have the same parent.
The title, author, year, and price elements are all siblings.

Ancestors
A node's parent, parent's parent, etc.
The ancestors of the title element are the book element and the bookstore element.

Descendants
A node's children, children's children, etc.
Descendants of the bookstore element are the book, title, author, year, and price elements.
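
These relationships can be explored programmatically. The Python sketch below loads the bookstore example with the standard xml.etree.ElementTree module and computes each relationship for the title and book elements. ElementTree elements do not record their parent, so the sketch builds a lookup table first; the variable names (parent_of, siblings, and so on) are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""
<bookstore>
    <book>
        <title>Less Than Zero</title>
        <author>Bret Easton Ellis</author>
        <year>1985</year>
        <price>13.95</price>
    </book>
</bookstore>
""")

# Map every child element to its parent element.
parent_of = {child: parent for parent in bookstore.iter() for child in parent}

book = bookstore.find("book")
title = book.find("title")

# Children: the elements directly inside book.
children = [child.tag for child in book]

# Siblings: the other elements that share title's parent.
siblings = [e.tag for e in parent_of[title] if e is not title]

# Ancestors: follow the parent links upward until the top is reached.
ancestors = []
node = title
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

# Descendants: everything below bookstore, in document order.
descendants = [e.tag for e in bookstore.iter() if e is not bookstore]
```

The descendants list comes out in document order, which is the same ordering used to define preceding and following nodes.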

It is also useful to keep in mind the serialized view of an XML file, as you would see it in an XML editor, in order to understand the concepts of preceding and following nodes. A node precedes another if it comes before the other in document order. Likewise, a node follows another if it comes after that node in document order. Ancestors and descendants are not considered to be either preceding or following a node. This concept will come in handy later when discussing the concept of an axis.

Abbreviated vs. Unabbreviated XPath syntax

XPath was created so that nodes can be referred to very succinctly, while retaining the ability to search on many options. Most uses of XPath will involve searching for child nodes, parent nodes, or attribute nodes of a particular node. Because these uses are so common, an abbreviated syntax can be used to refer to these commonly-searched nodes. Following is an XML document that simulates a tree (the type that has leaves and branches). It will be used to demonstrate the different types of syntax.

<?xml version="1.0" encoding="UTF-8"?>
<trunk name="the_trunk">
    <bigBranch name="bb1" thickness="thick">
        <smallBranch name="sb1">
            <leaf name="leaf1" color="brown" />
            <leaf name="leaf2" weight="50" />
            <leaf name="leaf3" />
        </smallBranch>
        <smallBranch name="sb2">
            <leaf name="leaf4" weight="90" />
            <leaf name="leaf5" color="purple" />
        </smallBranch>
    </bigBranch>
    <bigBranch name="bb2">
        <smallBranch name="sb3">
            <leaf name="leaf6" />
        </smallBranch>
        <smallBranch name="sb4">
            <leaf name="leaf7" />
            <leaf name="leaf8" />
            <leaf name="leaf9" color="black" />
            <leaf name="leaf10" weight="100" />
        </smallBranch>
    </bigBranch>
</trunk>

Exhibit 9.2: tree.xml – Example XML page

Following are a few examples of XPath location paths in English, Abbreviated XPath, then Unabbreviated XPath.


Selection 1:

English: All <leaf> elements in this document that are children of <smallBranch> elements that are children of <bigBranch> elements, that are children of the trunk, which is a child of the root.
Abbreviated: /trunk/bigBranch/smallBranch/leaf
Unabbreviated: /child::trunk/child::bigBranch/child::smallBranch/child::leaf

Selection 2:

English: The <bigBranch> elements with ‘name’ attribute equal to ‘bb3’ that are children of the trunk element, which is a child of the root.
Abbreviated: /trunk/bigBranch[@name='bb3']
Unabbreviated: /child::trunk/child::bigBranch[attribute::name='bb3']

Notice how we can specify which bigBranch objects we want by using a predicate in the previous example. This narrows the search down to only bigBranch nodes that satisfy the predicate. The predicate is the part of the XPath statement that is in square brackets. In this case, the predicate is asking for bigBranch nodes with their ‘name’ attribute set to ‘bb3’. (As written, tree.xml contains only bb1 and bb2, so this path selects nothing; substituting 'bb2' selects the second bigBranch.)

The last two examples assume we want to specify the path from the root. Let’s now assume that we are specifying the path from a <smallBranch> node.

Selection 3:

English: The parent node of the current <smallBranch>. (Notice that this selection is relative to a <smallBranch>.)
Abbreviated: ..
Unabbreviated: parent::node()
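
The three selections can be tried out in Python. The standard xml.etree.ElementTree module supports only a subset of the abbreviated syntax: it has no unabbreviated axes such as child::, and paths are written relative to an element rather than starting with /. The sketch below therefore starts from the trunk element, so "bigBranch/smallBranch/leaf" stands in for /trunk/bigBranch/smallBranch/leaf. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

TREE_XML = """<trunk name="the_trunk">
    <bigBranch name="bb1" thickness="thick">
        <smallBranch name="sb1">
            <leaf name="leaf1" color="brown" />
            <leaf name="leaf2" weight="50" />
            <leaf name="leaf3" />
        </smallBranch>
        <smallBranch name="sb2">
            <leaf name="leaf4" weight="90" />
            <leaf name="leaf5" color="purple" />
        </smallBranch>
    </bigBranch>
    <bigBranch name="bb2">
        <smallBranch name="sb3">
            <leaf name="leaf6" />
        </smallBranch>
        <smallBranch name="sb4">
            <leaf name="leaf7" />
            <leaf name="leaf8" />
            <leaf name="leaf9" color="black" />
            <leaf name="leaf10" weight="100" />
        </smallBranch>
    </bigBranch>
</trunk>"""

trunk = ET.fromstring(TREE_XML)

# Selection 1, written relative to the trunk element.
leaves = trunk.findall("bigBranch/smallBranch/leaf")

# Selection 2: a predicate on the name attribute.
bb3 = trunk.findall("bigBranch[@name='bb3']")  # tree.xml has no bb3
bb2 = trunk.findall("bigBranch[@name='bb2']")

# Selection 3: ".." steps from a smallBranch up to its parent.
parent_of_sb1 = trunk.find(".//smallBranch[@name='sb1']/..")
```

Selection 1 finds all ten leaf elements, the 'bb3' predicate matches nothing, and the parent of sb1 is the bigBranch named bb1. A full XPath 1.0 implementation would be needed to run the unabbreviated forms as written.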

When using the unabbreviated syntax, you may notice that parent or child is followed by two colons (::). Each of these is called an axis. You will learn more about axes shortly.

Also, this may be a good time to explain the concept of a location path. A location path is the series of location steps taken to reach the node/nodes being selected. Location steps are the parts of XPath statements separated by / characters. They are one step on the way to finding the nodes you would like to select.

Location steps are made up of three parts: an axis (child, parent, descendant, etc.), a node test (name of a node, or a function that retrieves one or more nodes), and zero or more predicates (tests on the retrieved nodes that narrow the results, eliminating nodes that do not pass the predicate’s test).

So, in a location path, each location step returns a node-set. If there are further steps on the path after a location step, the next step is executed once for every node in that node-set, and the results are combined.
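To make the step-by-step evaluation concrete, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module, which understands a small subset of abbreviated XPath. The document is a cut-down, hypothetical stand-in for tree.xml, and the evaluate helper is an illustration only, not how a real XPath engine is written.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for tree.xml (element names follow the chapter).
DOC = """<trunk name="the_trunk">
  <bigBranch name="bb1">
    <smallBranch name="sb1"><leaf name="leaf1"/><leaf name="leaf2"/></smallBranch>
  </bigBranch>
  <bigBranch name="bb3">
    <smallBranch name="sb2"><leaf name="leaf3"/></smallBranch>
  </bigBranch>
</trunk>"""


def evaluate(context_nodes, steps):
    """Apply each location step to every node returned by the previous step."""
    for step in steps:
        next_nodes = []
        for node in context_nodes:
            # findall() with a bare name walks the child axis; a predicate
            # such as [@name='bb3'] filters the nodes it finds.
            next_nodes.extend(node.findall(step))
        context_nodes = next_nodes
    return context_nodes


trunk = ET.fromstring(DOC)  # the <trunk> document element

# Selection 1: /trunk/bigBranch/smallBranch/leaf (starting at <trunk>)
leaves = evaluate([trunk], ["bigBranch", "smallBranch", "leaf"])
leaf_names = [leaf.get("name") for leaf in leaves]

# Selection 2: /trunk/bigBranch[@name='bb3']
bb3 = evaluate([trunk], ["bigBranch[@name='bb3']"])
bb3_names = [branch.get("name") for branch in bb3]

print(leaf_names)
print(bb3_names)
```

Each pass through the outer loop corresponds to one location step: the node-set from the previous step becomes the set of context nodes for the next one.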

Relative vs. Absolute paths

When specifying a path with XPath, there are times when you will already be ‘in’ a node. But other times, you will want to select nodes starting from the root node. XPath lets you do both. If you have ever worked with websites in HTML, it works the same way as referring to other files in HTML hyperlinks. In HTML, you can specify an Absolute Path for the hyperlink, describing where another page is with the server name, folders, and filename all in the URL. Or, if you are referring to another file on the same site, you need not enter the server name or all of the path information. This is called a Relative Path. The concept can be applied similarly in XPath.

You can tell the difference by whether there is a ‘/’ character at the beginning of the XPath expression. If so, the path is being specified from the root, which makes it an Absolute Path. But if there is no ‘/’ at the beginning of the path, you are specifying a Relative Path, which describes where the other nodes are relative to the context node, or the node for which the next step is being taken.

Below is an XSL stylesheet (Exhibit 9.3) for use with our tree.xml file above (Exhibit 9.2).

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>

    <!-- Example of an absolute path. The element '/child::trunk'
         is being specified from the root node. -->
    <xsl:template match="/child::trunk">
        <html>
            <head>
                <title>XPath Tree Tests</title>
            </head>
            <body>
                <!-- Example of a relative path. The <xsl:for-each> statement will
                     execute for every <bigBranch> child of the
                     'current' node, which is the <trunk> node. -->
                <xsl:for-each select="child::bigBranch">
                    <xsl:call-template name="print_out" />
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>

    <!-- Prints the 'name' attribute of the current node, followed by a line break. -->
    <xsl:template name="print_out">
        <xsl:value-of select="attribute::name" /> <br/>
    </xsl:template>
</xsl:stylesheet>

Exhibit 9.3: xsl_tree.xsl – Example of both a relative and absolute path
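The rule about the leading ‘/’ can also be sketched outside XSLT. The Python example below is an illustration under stated assumptions: the document is a hypothetical, cut-down tree, and select is a made-up helper. Because xml.etree.ElementTree has no separate root node, the sketch wraps <trunk> in a synthetic parent so that absolute paths have somewhere to start.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down tree; only the element names come from the chapter.
DOC = """<trunk name="the_trunk">
  <bigBranch name="bb1"><smallBranch name="sb1"/></bigBranch>
  <bigBranch name="bb2"><smallBranch name="sb2"/><smallBranch name="sb3"/></bigBranch>
</trunk>"""

trunk = ET.fromstring(DOC)

# Synthetic stand-in for the XPath root node (the parent of <trunk>).
root_node = ET.Element("root-node")
root_node.append(trunk)


def select(path, context):
    """Absolute paths start at the root node, relative paths at the context node."""
    if path.startswith("/"):
        return root_node.findall("." + path)
    return context.findall(path)


# Pick a context node: the <bigBranch> named bb2.
bb2 = select("/trunk/bigBranch[@name='bb2']", context=trunk)[0]

# The absolute path ignores the context node and finds every <smallBranch>.
absolute = [n.get("name") for n in select("/trunk/bigBranch/smallBranch", context=bb2)]

# The relative path only looks below the context node.
relative = [n.get("name") for n in select("smallBranch", context=bb2)]

print(absolute)
print(relative)
```

The same expression text can therefore select different nodes depending on the context node, unless it begins with ‘/’.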

Four types of XPath location paths

In the last two sections you learned about two distinctions between location paths: Unabbreviated vs. Abbreviated and Relative vs. Absolute. Combining these two concepts could be helpful when talking about XPath location paths. Not to mention, it could make you sound really smart in front of your friends when you say things like:

  1. Abbreviated Relative Location Paths - Use of abbreviated syntax while specifying a relative path.
  2. Abbreviated Absolute Location Paths - Use of abbreviated syntax while specifying an absolute path.
  3. Unabbreviated Relative Location Paths - Use of unabbreviated syntax while specifying a relative path.
  4. Unabbreviated Absolute Location Paths - Use of unabbreviated syntax while specifying an absolute path.

I only mention this four-way distinction now because it could come in handy while reading the specification, or other texts on the subject.

XPath axes

In XPath, some node selections can only be expressed with the Unabbreviated Syntax. In that case, you use an axis to specify each location step on your way through the location path.

From any node in the tree, there are 13 axes along which you can step. They are as follows:


Axis                  Meaning
ancestor::            Parent, grandparent, and so on of the current node, up to the root node
ancestor-or-self::    The current node plus all of its ancestors, up to the root node
attribute::           Attributes of the current node
child::               Immediate children of the current node
descendant::          Children of the current node, their children, and so on
descendant-or-self::  The current node plus all of its descendants
following::           Nodes that come after the current node in document order (excluding its descendants)
following-sibling::   Nodes after the current node that share its parent
namespace::           Namespace nodes in scope for the current node
parent::              Immediate parent of the current node
preceding::           Nodes that come before the current node in document order (excluding its ancestors)
preceding-sibling::   Nodes before the current node that share its parent
self::                The current node
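To get a feel for what a few of these axes return, the following Python sketch computes them by hand on a hypothetical, cut-down tree. The standard-library xml.etree.ElementTree module does not implement these axes itself, so the helper functions (ancestor, descendant, following_sibling, preceding_sibling) are made up for illustration. They also only consider elements, whereas real XPath axes can contain other node types and the ancestor axis reaches the root node above the document element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down tree; only the element names come from the chapter.
DOC = """<trunk name="the_trunk">
  <bigBranch name="bb1">
    <smallBranch name="sb1"><leaf name="leaf1"/></smallBranch>
    <smallBranch name="sb2"/>
    <smallBranch name="sb3"/>
  </bigBranch>
</trunk>"""

trunk = ET.fromstring(DOC)

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in trunk.iter() for child in parent}


def ancestor(node):
    """ancestor:: parent, grandparent, ... up to the document element."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def descendant(node):
    """descendant:: children, grandchildren, ... in document order."""
    return [n for n in node.iter() if n is not node]


def following_sibling(node):
    """following-sibling:: later children of the same parent."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]


def preceding_sibling(node):
    """preceding-sibling:: earlier children of the same parent."""
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)]


def names(nodes):
    return [n.get("name") for n in nodes]


sb2 = trunk.find("bigBranch/smallBranch[@name='sb2']")

print(names(ancestor(sb2)))
print(names(following_sibling(sb2)))
print(names(preceding_sibling(sb2)))
print(names(descendant(trunk)))
```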

XPath predicates and functions

Sometimes, you may want to use a predicate in an XPath Location Path to further filter your selection. Normally, you would get a set of nodes from a location path. A predicate is a small expression that gets evaluated for each node in a set of nodes. If the expression evaluates to ‘false’, then the node is not included in the selection. An example is as follows:

//p[@class='alert']

In the preceding example, every <p> tag in the document is checked to see if its ‘class’ attribute is set to ‘alert’. Only those <p> tags with a ‘class’ attribute with value ‘alert’ are included in the set of nodes for this location path.

The following example uses a function, which can be used in a predicate to get information about the context node.

/book/chapter[position()=3]

This previous example selects only the chapter of the book in the third position. So, for something to be returned, the current <book> element must have at least 3 <chapter> elements.

Also notice that the position function returns a number. There are many functions in the XPath specification. For a complete list, see the W3C specification at http://www.w3.org/TR/xpath#corelib

Here are a few more functions that may be helpful:

number last() – the number of nodes in the current node set, which is also the position of the last node

number position() – position of the context node being tested

number count(node-set) – the number of nodes in a node-set

boolean starts-with(string, string) – returns true if the first argument starts with the second

boolean contains(string, string) – returns true if the first argument contains the second

number sum(node-set) – the sum of the numeric values of the nodes in the node-set

number floor(number) – the number, rounded down to the nearest integer

number ceiling(number) – the number, rounded up to the nearest integer

number round(number) – the number, rounded to the nearest integer
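The sketch below mirrors several of these functions in Python. It assumes a hypothetical <book> document with a made-up 'pages' attribute. The standard-library xml.etree.ElementTree module supports the positional predicates [3] and [last()], while the other functions are imitated with ordinary Python, so treat the right-hand comments as rough XPath equivalents and not as something ElementTree evaluates.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical document; the 'pages' attribute is invented for this example.
DOC = """<book>
  <chapter pages="12.5">Introduction</chapter>
  <chapter pages="30">XPath basics</chapter>
  <chapter pages="18">XPath axes</chapter>
  <chapter pages="7">Summary</chapter>
</book>"""

book = ET.fromstring(DOC)
chapters = book.findall("chapter")

third = book.find("chapter[3]").text       # /book/chapter[position()=3]
last = book.find("chapter[last()]").text   # /book/chapter[last()]
count = len(chapters)                      # count(/book/chapter)
total = sum(float(c.get("pages")) for c in chapters)  # sum(/book/chapter/@pages)

# /book/chapter[starts-with(., 'XPath')]
xpath_chapters = [c.text for c in chapters if c.text.startswith("XPath")]

# /book/chapter[contains(., 'a')]
with_a = [c.text for c in chapters if "a" in c.text]

rounded_down = math.floor(total)           # floor(...)
rounded_up = math.ceil(total)              # ceiling(...)
# XPath round() rounds halves up, unlike Python's built-in round().
rounded = math.floor(total + 0.5)          # round(...)

print(third, last, count, total)
print(xpath_chapters, with_a)
print(rounded_down, rounded_up, rounded)
```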

Example

The following XML document, XSD schemas, and XSL stylesheet examples are to help you put everything you have learned in this chapter together using real life data. As you study this example you will notice how XPath can be used in the stylesheet to call and modify the output of specific information from the document.

Below is an XML document (Exhibit 9.4)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="movies.xsl" type="text/xsl" media="screen"?>
<movieCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="movies.xsd"> 

<movie>
    <movieTitle>Meet the Parents</movieTitle>
    <movieSynopsis>
    Greg Focker is head over heels in love with his girlfriend Pam, and is ready to
    pop the big question. When his attempt to propose is thwarted by a phone call
    with the news that Pam's younger sister is getting married, Greg realizes that
    the key to Pam's hand in marriage lies with her formidable father.
    </movieSynopsis>
    <role>
        <roleIDREF>bs1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>tp1</roleIDREF>
        <roleType>Lead Actress</roleType>
    </role>
    <role>
        <roleIDREF>rd1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>bd1</roleIDREF>
        <roleType>Supporting Actress</roleType>
    </role>      
</movie>

<movie>
    <movieTitle>Elf</movieTitle>
    <movieSynopsis>
    One Christmas Eve, a long time ago, a small baby at an orphanage crawled into
    Santa’s bag of toys, only to go undetected and accidentally carried back to Santa’s 
    workshop in the North Pole. Though he was quickly taken under the wing of a surrogate
    father and raised to be an elf, as he grows to be three sizes larger than everyone else, 
    it becomes clear that Buddy will never truly fit into the elf world. What he needs is
    to find his real family. This holiday season, Buddy decides to find his true place in the
    world and sets off for New York City to track down his roots.
    </movieSynopsis>
    <role>
        <roleIDREF>wf1</roleIDREF>
        <roleType>Lead Actor</roleType>
    </role>
    <role>
        <roleIDREF>jc1</roleIDREF>
        <roleType>Supporting Actor</roleType>
    </role>
    <role>
        <roleIDREF>zd1</roleIDREF>
        <roleType>Lead Actress</roleType>
    </role>
    <role>
        <roleIDREF>ms1</roleIDREF>
        <roleType>Supporting Actress</roleType>
    </role>      
</movie>

<castMember>
    <castMemberID>rd1</castMemberID>
    <castFirstName>Robert</castFirstName>
    <castLastName>De Niro</castLastName>
    <castSSN>489-32-5984</castSSN>
    <castGender>male</castGender>     
</castMember> 

<castMember>
    <castMemberID>bs1</castMemberID>
    <castFirstName>Ben</castFirstName>
    <castLastName>Stiller</castLastName>
    <castSSN>590-59-2774</castSSN>
    <castGender>male</castGender>     
</castMember>

<castMember>
    <castMemberID>tp1</castMemberID>
    <castFirstName>Teri</castFirstName>
    <castLastName>Polo</castLastName>
    <castSSN>099-37-8765</castSSN>
    <castGender>female</castGender>      
</castMember>  

<castMember>
    <castMemberID>bd1</castMemberID>
    <castFirstName>Blythe</castFirstName>
    <castLastName>Danner</castLastName>
    <castSSN>273-44-8690</castSSN>
    <castGender>female</castGender>     
</castMember> 

<castMember>
    <castMemberID>wf1</castMemberID>
    <castFirstName>Will</castFirstName>
    <castLastName>Ferrell</castLastName>
    <castSSN>383-56-2095</castSSN>
    <castGender>male</castGender>     
</castMember>
  
<castMember>
    <castMemberID>jc1</castMemberID>
    <castFirstName>James</castFirstName>
    <castLastName>Caan</castLastName>
    <castSSN>389-49-3029</castSSN>
    <castGender>male</castGender>      
</castMember> 
      
<castMember>
    <castMemberID>zd1</castMemberID>
    <castFirstName>Zooey</castFirstName>
    <castLastName>Deschanel</castLastName>
    <castSSN>309-49-4005</castSSN>
    <castGender>female</castGender>      
</castMember>

<castMember>
    <castMemberID>ms1</castMemberID>
    <castFirstName>Mary</castFirstName>
    <castLastName>Steenburgen</castLastName>
    <castSSN>988-43-4950</castSSN>
    <castGender>female</castGender>      
</castMember>

</movieCollection>

Exhibit 9.4: movies_xpath.xml

Below is the second XML document (Exhibit 9.5)

<?xml version="1.0" encoding="UTF-8"?>

<cities xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="cities.xsd">

<city>
    <cityID>c2</cityID>
    <cityName>Mandal</cityName>
    <cityPopulation>13840</cityPopulation>
    <cityCountry>Norway</cityCountry>
    <tourismDescription>A small town with a big atmosphere.  Mandal provides comfort
away from normal luxuries.
    </tourismDescription>
    <capitalCity>c3</capitalCity>
</city>

<city>
    <cityID>c3</cityID>
    <cityName>Oslo</cityName>
    <cityPopulation>533050</cityPopulation>
    <cityCountry>Norway</cityCountry>
    <tourismDescription>Oslo is the capital of Norway for many reasons.
    It is also the capital location for tourism.  The culture, shopping,
    and attractions can all be experienced in Oslo.  Just remember
    to bring your wallet.
    </tourismDescription>
</city>

</cities>

Exhibit 9.5: cities_xpath.xml

Below is the Movies schema (Exhibit 9.6)

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

  <!--Movie Collection-->

  <xsd:element name="movieCollection">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="movie" type="movieDetails" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element name="castMember" type="castDetails" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!--This contains the movie details.-->

  <xsd:complexType name="movieDetails">
    <xsd:sequence>
      <xsd:element name="movieTitle" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="movieSynopsis" type="xsd:string"/>
      <xsd:element name="role" type="roleDetails" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

 <!--This contains the role details.-->

  <xsd:complexType name="roleDetails">
    <xsd:sequence>
       <xsd:element name="roleIDREF" type="xsd:IDREF"/>
       <xsd:element name="roleType" type="xsd:string"/>     
    </xsd:sequence>
  </xsd:complexType>

  <xsd:simpleType name="ssnType">
       <xsd:restriction base="xsd:string">
           <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
       </xsd:restriction>
   </xsd:simpleType>
  
 <xsd:complexType name="castDetails">
    <xsd:sequence>
       <xsd:element name="castMemberID" type="xsd:ID"/>
       <xsd:element name="castFirstName" type="xsd:string"/>
       <xsd:element name="castLastName" type="xsd:string"/>
       <xsd:element name="castSSN" type="ssnType"/>
       <xsd:element name="castGender" type="xsd:string"/>   
    </xsd:sequence>
  </xsd:complexType>
  
</xsd:schema>

Exhibit 9.6: movies.xsd

Below is the Cities schema (Exhibit 9.7)

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">

<xsd:element name="cities">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="cityType">
  <xsd:sequence>
    <xsd:element name="cityID" type="xsd:ID"/>
     <xsd:element name="cityName" type="xsd:string"/>
     <xsd:element name="cityPopulation" type="xsd:integer"/>
     <xsd:element name="cityCountry" type="xsd:string"/>
     <xsd:element name="tourismDescription" type="xsd:string"/>
     <xsd:element name="capitalCity" type="xsd:IDREF" minOccurs="0" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>

Exhibit 9.7: cities.xsd


Below is the XSL stylesheet (Exhibit 9.8)

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="castList" match="castMember" use="castMemberID"/>
<xsl:output method="html"/>

<!-- example of using an abbreviated absolute path to pull info 
from cities_xpath.xml for the city "Oslo" specifically -->

<!-- specify absolute path to select cityName and assign it the variable "city" -->
<xsl:variable name="city" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/cityName" />

<!-- specify absolute path to select cityCountry and assign it the variable "country" -->
<xsl:variable name="country" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/cityCountry" />

<!-- specify absolute path to select tourismDescription and assign it the variable "description" -->
<xsl:variable name="description" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/tourismDescription" />

<xsl:template match="/">
<html>
    <head>
        <title>Movie Collection</title>
    </head>
    <body>
        <h2>Movie Collection</h2>
    <xsl:apply-templates select="movieCollection"/>
    </body>
</html>
</xsl:template>
<xsl:template match="movieCollection">
    
<!-- let's say we just want to see the actors. -->
<!--
<xsl:for-each select="movie">
<hr />
<br />
<b><xsl:text>Movie Title: </xsl:text></b>
<xsl:value-of select="movieTitle"/>
<br />
<br />
<b><xsl:text>Movie Synopsis: </xsl:text></b>           
<xsl:value-of select="movieSynopsis"/>
<br />
<br />-->

<!-- actor info begins here. -->
<b><xsl:text>Cast: </xsl:text></b>
<br />
<!-- specify an abbreviated relative path here for "role." 
NOTE: there is no predicate in this one; it's just a path. -->

<xsl:for-each select="movie/role"> 
<xsl:sort select="key('castList',roleIDREF)/castLastName"/>
<xsl:number value="position()" format="&#xa; 0. " />
<xsl:value-of select="key('castList',roleIDREF)/castFirstName"/>
<xsl:text>   </xsl:text>               
<xsl:value-of select="key('castList',roleIDREF)/castLastName"/>               
<xsl:text>,   </xsl:text>
<xsl:value-of select="roleType"/>
<br />
<xsl:value-of select="key('castList',roleIDREF)/castGender"/>
<xsl:text>,   </xsl:text>
<xsl:value-of select="key('castList',roleIDREF)/castSSN"/>
<br />
<br />
</xsl:for-each>      
<!--
</xsl:for-each>-->
<hr />

<!--calling the variables -->

<font color="red">
<p><b>Travel Advertisement</b></p>

<!-- reference the city, followed by a comma, and then the country -->
<p><xsl:value-of select="$city" />, <xsl:value-of select="$country" /></p>

<!-- reference the description -->
<xsl:value-of select="$description" />

</font>      
</xsl:template>
</xsl:stylesheet>

Exhibit 9.8: movies.xsl
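The stylesheet's xsl:key and key() calls amount to an index from castMemberID to castMember, which each role then looks up through its roleIDREF. As a rough analogy only (this is not how an XSLT processor is implemented), the Python sketch below does the same with a dictionary, using a cut-down copy of the data in Exhibit 9.4. The variable names cast_list and lines are chosen for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of movies_xpath.xml: one movie, three roles, three cast members.
DOC = """<movieCollection>
  <movie>
    <movieTitle>Meet the Parents</movieTitle>
    <role><roleIDREF>bs1</roleIDREF><roleType>Lead Actor</roleType></role>
    <role><roleIDREF>tp1</roleIDREF><roleType>Lead Actress</roleType></role>
    <role><roleIDREF>rd1</roleIDREF><roleType>Lead Actor</roleType></role>
  </movie>
  <castMember>
    <castMemberID>rd1</castMemberID>
    <castFirstName>Robert</castFirstName>
    <castLastName>De Niro</castLastName>
  </castMember>
  <castMember>
    <castMemberID>bs1</castMemberID>
    <castFirstName>Ben</castFirstName>
    <castLastName>Stiller</castLastName>
  </castMember>
  <castMember>
    <castMemberID>tp1</castMemberID>
    <castFirstName>Teri</castFirstName>
    <castLastName>Polo</castLastName>
  </castMember>
</movieCollection>"""

collection = ET.fromstring(DOC)

# Like <xsl:key name="castList" match="castMember" use="castMemberID"/>
cast_list = {
    member.findtext("castMemberID"): member
    for member in collection.findall("castMember")
}

# Like <xsl:for-each select="movie/role"> with an <xsl:sort> on castLastName
roles = collection.findall("movie/role")
roles.sort(key=lambda role: cast_list[role.findtext("roleIDREF")].findtext("castLastName"))

lines = []
for position, role in enumerate(roles, start=1):
    member = cast_list[role.findtext("roleIDREF")]  # key('castList', roleIDREF)
    lines.append("%d. %s %s, %s" % (
        position,
        member.findtext("castFirstName"),
        member.findtext("castLastName"),
        role.findtext("roleType"),
    ))

print("\n".join(lines))
```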

Summary

Throughout the chapter we have learned many of the features and capabilities of the XML Path Language. You should now have a good understanding of node relationships through the use of the XML tree structure. Location paths can be written in either Abbreviated or Unabbreviated syntax, and a predicate in square brackets narrows a selection down to only the nodes that satisfy it. Relative and Absolute describe where a path starts: a Relative path is evaluated from the context node, while an Absolute path begins with ‘/’ and is evaluated from the root node. Both of these concepts can be combined to come up with four types of XPath location paths: Abbreviated Relative, Abbreviated Absolute, Unabbreviated Relative, and lastly Unabbreviated Absolute. If further filtering is required, XPath predicates and functions can be used, for example to test a node's position or to count the nodes in a node-set. When used correctly, XPath can be a very powerful tool for working with XML.

Exercises

Answers



XLink






Learning objectives

  • Learn different techniques for implementing XLinks in XML
  • Create a custom XLink
  • Learn the functionality behind various XLink parameters

sponsored by:

The University of Georgia

Terry College of Business

Department of Management Information Systems



Introduction

Through the use of Uniform Resource Identifiers (URIs), an XLink allows elements to be inserted into XML documents that create links between resources such as documents, images, files and other pages. An XLink is similar in concept to an HTML hyperlink, but is more powerful and flexible.

This chapter will be a general overview of the XLink syntax. It will also provide exposure to some of XLink's basic concepts. For the full XLink specification, see the latest version of the standard at:

http://www.w3.org/TR/xlink

XLink

XLinks create a linking relationship between two or more resources. They allow for any XML element, image, text or markup files to be specified in the link.

By using a method similar to the centralized formatting of XSL stylesheets, XLinks allow a document's hyperlinks to be isolated and centralized in a separate document. When a linked document's address changes, only that separate document needs to be updated for the links to remain functional.

The use of XLink requires the declaration of the XLink namespace. This namespace provides the global attributes for type, href, role, arcrole, title, show, actuate, label, from and to. The following example would make the prefix xlink available within the tourGuide element.

<tourGuide
  xmlns:xlink="http://www.w3.org/1999/xlink">
  ...
</tourGuide>
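As a rough illustration of what the namespace declaration means to a parser, the Python sketch below uses the standard-library xml.etree.ElementTree module on a made-up fragment (the attractionName element and the example.org address are assumptions for this example). It shows that an attribute written as xlink:href is stored under the namespace URI, not under the prefix.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Made-up fragment; only the namespace declaration comes from the text above.
DOC = """<tourGuide xmlns:xlink="http://www.w3.org/1999/xlink">
  <attractionName xlink:type="simple" xlink:href="http://www.example.org/">Example</attractionName>
</tourGuide>"""

guide = ET.fromstring(DOC)
name = guide.find("attractionName")

# ElementTree keys namespaced attributes as "{namespace-uri}local-name".
href = name.get("{%s}href" % XLINK_NS)
link_type = name.get("{%s}type" % XLINK_NS)

# There is no un-namespaced 'href' attribute, so this lookup returns None.
unqualified = name.get("href")

print(href, link_type, unqualified)
```

The prefix itself is arbitrary; what identifies an attribute as an XLink attribute is the namespace URI it is bound to.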

XLink global attributes

The following table outlines the attributes that can be used with the xlink namespace. The global attributes are type, href, role, arcrole, title, show, actuate, label, from, and to. The table also includes descriptions of how the attributes can be used.


Exhibit 1: Table of global attributes

Attributes

Description and Valid Values

type

Identifies the kind of XLink element

  • simple - basic format similar to html linkage
  • extended - more complex than simple types with multi functional format
  • resource - provides local resources
  • locator - provides remote resources
  • arc - provides the ability to traverse from one resource to another
  • title - readable name or explanation of link

href

Location of resource

  • value is URI

role

Description of XLink's content

  • value is URI
  • describes the element whose role it is

arcrole

Description of an arc's meaning

  • value is URI
  • describe the relationship between the two sides of the arc

title

Name displayed, usually short description of link

show

Describes behavior of the browser once the XLink has been actuated and loaded

  • new - load in a new window or frame
  • replace - load in same window or frame
  • embed - replace the current item
  • other - look for information elsewhere
  • none - not specified

actuate

Specifies when resource is retrieved or link processing occurs

  • onRequest - when user requests the link
  • onLoad - when page is loaded
  • other - looks for information elsewhere
  • none - not specified

label, from & to

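Specifies link direction

  • label - names a resource or locator so that an arc can refer to it
  • from - label of the resource where the arc starts
  • to - label of the resource where the arc ends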
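The enumerated values in the table lend themselves to a simple check. The Python sketch below is a hypothetical helper (invalid_xlink_attributes is not part of any XLink library) that reports xlink:type, xlink:show, and xlink:actuate values that are not among those listed above. The two sample elements are made up.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Allowed values copied from the table above.
ALLOWED = {
    "type": {"simple", "extended", "resource", "locator", "arc", "title"},
    "show": {"new", "replace", "embed", "other", "none"},
    "actuate": {"onRequest", "onLoad", "other", "none"},
}


def invalid_xlink_attributes(element):
    """Return (attribute, value) pairs whose value is not listed in the table."""
    problems = []
    for attr, allowed in ALLOWED.items():
        value = element.get("{%s}%s" % (XLINK_NS, attr))
        if value is not None and value not in allowed:
            problems.append((attr, value))
    return problems


# Made-up sample elements for the check.
good = ET.fromstring(
    '<a xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:show="new" xlink:actuate="onRequest"/>'
)
bad = ET.fromstring(
    '<a xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:show="popup"/>'
)

print(invalid_xlink_attributes(good))
print(invalid_xlink_attributes(bad))
```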



XML schema


The following XML schema defines a tour guide that contains at least one city. Each city contains one or more attractions. The name of each attraction is an XLink.

Exhibit 2: XML schema for TourGuide

<?xml version="1.0" encoding="UTF-8"?>
<!--
      Document   : TourGuide.xsd
      Created on : February 28, 2006
      Author     : Billy Timmins
-->
<!--
      Declaration of usage of xlink Namespace
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified"
            xmlns:xlink="http://www.w3.org/1999/xlink">  
    <xsd:element name="tourGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded" />
            </xsd:sequence>
        </xsd:complexType>
     </xsd:element>
     <!--
     This section will contain the City details
     -->
     <xsd:complexType name="cityDetails">
         <xsd:sequence>
           <xsd:element name="cityName" type="xsd:string"/>
           <xsd:element name="adminUnit" type="xsd:string"/>
           <xsd:element name="country" type="xsd:string"/>
           <xsd:element name="continent">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                  <xsd:enumeration value="Asia"/>
                  <xsd:enumeration value="Africa"/>
                  <xsd:enumeration value="Australia"/>
                  <xsd:enumeration value="Europe"/>
                  <xsd:enumeration value="North America"/>
                  <xsd:enumeration value="South America"/>
                  <xsd:enumeration value="Antarctica"/>
                  </xsd:restriction>
                </xsd:simpleType>
            </xsd:element>
            <xsd:element name="population" type="xsd:integer"/>
            <xsd:element name="description" type="xsd:string"/>
            <xsd:element name="attraction" type="attractionDetails" minOccurs="1" maxOccurs="unbounded"/>
         </xsd:sequence>
     </xsd:complexType>
     <xsd:complexType name="attractionDetails">
         <xsd:sequence>
         <!--   
         Note use of xlink
         -->
            <xsd:element name="attractionName" xlink:type="simple"/>
            <xsd:element name="attractionDescription" type="xsd:string"/>
            <xsd:element name="attractionRating" type="xsd:integer"/>
        </xsd:sequence>
     </xsd:complexType>
</xsd:schema>

XML document


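The following XML document shows how the XLink, attractionName, defined in the XML schema, is used in an XML document. Note that it is necessary to include xlink:href="" in the start tag of the element in order to define the linked website. Under XLink 1.0 the element would also carry xlink:type="simple"; XLink 1.1 treats an element that has xlink:href but no xlink:type as a simple link.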

Exhibit 3: XML document for TourGuide.xsd (using XLink)

<?xml version="1.0" encoding="UTF-8"?>
<!--
      Document   : SomeTourGuide.xml
      Created on : February 28, 2006
      Author     : Billy Timmins
-->
<!--
      Declaration of usage of XLink Namespace
-->
<?xml-stylesheet href="TourGuide.xsl" type="text/xsl"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xlink="http://www.w3.org/1999/xlink" xsi:noNamespaceSchemaLocation="TourGuide.xsd">
    <city>
        <cityName>Atlanta</cityName>
        <adminUnit>Georgia</adminUnit>
        <country>USA</country>
        <continent>North America</continent>
        <population>425000</population>
        <description>Atlanta is the capital of and largest city in the U.S. state of Georgia.</description>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.georgiaaquarium.org/"> Georgia Aquarium </attractionName>
            <attractionDescription>World’s Largest Aquarium</attractionDescription>
            <attractionRating>5</attractionRating>
        </attraction>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.high.org/"> High Museum of Art </attractionName>
            <attractionDescription>The High Museum of Art, founded in 1905 as the Atlanta Art Association, is the leading art museum in the Southeastern United States.</attractionDescription>
            <attractionRating>4</attractionRating>
        </attraction>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.underground-atlanta.com/"> Underground Atlanta </attractionName>
            <attractionDescription> Go beneath the streets of a bustling downtown, to the heart of a great American city.  Underground Atlanta is at the center of it all.</attractionDescription>
            <attractionRating>2</attractionRating>
        </attraction>        
    </city>
    <city>
        <cityName>Tampa</cityName>
        <adminUnit>Florida</adminUnit>
        <country>USA</country>
        <continent>North America</continent>
        <population>303000</population>
        <description>Tampa is a major United States city located in Hillsborough County, on the west coast of Florida.</description>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.buschgardens.com/buschgardens/fla/default.aspx"> Busch Gardens </attractionName>
            <attractionDescription>The nation's fourth largest zoo, Busch Gardens is where you can see African animals roaming free and an exciting amusement park featuring its world-famous rides like Kumba and the new inverted roller-coaster, Montu.</attractionDescription>
            <attractionRating>5</attractionRating>
        </attraction>
        <attraction>
            <!--
            Declaration of XLink and associated link
            -->
            <attractionName xlink:href="http://www.plantmuseum.com/"> Henry B. Plant Museum </attractionName>
            <attractionDescription>Discover a museum which transports you to turn-of-the-century Florida.</attractionDescription>
            <attractionRating>1</attractionRating>
        </attraction>      
    </city>
</tourGuide>

XML stylesheet


The following XSL stylesheet displays the contents of the XML document as an HTML table and turns each attractionName into a hyperlink.

Exhibit 4: XML stylesheet TourGuide

<?xml version="1.0" encoding="UTF-8"?>
<!--
      Document   : TourGuide.xsl
      Created on : February 28, 2006
      Author     : Billy Timmins
-->
<!--
      Declaration of usage of XLink Namespace
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink" exclude-result-prefixes="xlink" version="1.0">
    <xsl:output method="html"/>
    <!--
    Attribute XLink defined as an href of simple type
    -->
    <xsl:template match="*[@xlink:type = 'simple' and @xlink:href]">
        <a href="{@xlink:href}">
            <xsl:apply-templates/>
        </a>
    </xsl:template>
    <xsl:template match="/">
        <html>
            <head>
                <title>Tour Guide XLink Example</title>
            </head>
            <body>
                <h2>Cities</h2>
                <xsl:apply-templates select="tourGuide"/>
            </body>
        </html>
    </xsl:template>
    <!-- 
    template for handling a link 
    -->
    <xsl:template match="attractionName">
        <a href="{@xlink:href}">
            <xsl:value-of select="."/>
        </a>
    </xsl:template>
    <xsl:template match="tourGuide">
        <table border="1" width="100%">
            <xsl:for-each select="city">
                <tr>
                    <td>
                        <br/>
                        <xsl:text>City: </xsl:text>
                        <xsl:value-of select="cityName"/>
                        <br/>
                        <xsl:text>Administrative unit: </xsl:text>
                        <xsl:value-of select="adminUnit"/>
                        <br/>    
                        <xsl:text>Continent: </xsl:text>
                        <xsl:value-of select="continent"/>
                        <br/>
                        <xsl:text>Population: </xsl:text>
                        <xsl:value-of select="population"/>
                        <br/>
                        <xsl:text>Description: </xsl:text>
                        <xsl:value-of select="description"/>
                        <br/>              
                        <br/>
                    </td>
                </tr>
                <tr>
                    <td>
                        <xsl:text>Attraction: </xsl:text>
                    </td>
                    <td>
                        <xsl:text>Attraction Description: </xsl:text>
                    </td>
                    <td>
                        <xsl:text>Attraction Rating: </xsl:text>
                    </td>
                </tr>
                <xsl:for-each select="attraction">
                    <tr>
                        <td>
                            <!--
                            application of the template
                            -->
                            <xsl:apply-templates select="attractionName"/>
                        </td>
                        <td>
                            <xsl:value-of select="attractionDescription"/>
                        </td>
                        <td>
                            <xsl:value-of select="attractionRating"/>
                        </td>
                    </tr>
                </xsl:for-each>
            </xsl:for-each>
        </table>
    </xsl:template>
</xsl:stylesheet>
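What the attractionName template does can be sketched in a few lines of Python as well. This is an illustration under stated assumptions, not a replacement for the stylesheet: the data is a cut-down copy of the tour guide document, render_link is a made-up helper, and unlike xsl:value-of it trims the surrounding whitespace from the link text.

```python
import html
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Cut-down copy of the tour guide document.
DOC = """<tourGuide xmlns:xlink="http://www.w3.org/1999/xlink">
  <city>
    <cityName>Atlanta</cityName>
    <attraction>
      <attractionName xlink:href="http://www.georgiaaquarium.org/"> Georgia Aquarium </attractionName>
      <attractionRating>5</attractionRating>
    </attraction>
    <attraction>
      <attractionName xlink:href="http://www.high.org/"> High Museum of Art </attractionName>
      <attractionRating>4</attractionRating>
    </attraction>
  </city>
</tourGuide>"""


def render_link(attraction_name):
    """Counterpart of the attractionName template: wrap the text in <a href>."""
    href = html.escape(attraction_name.get(XLINK_HREF), quote=True)
    text = html.escape(attraction_name.text.strip())
    return '<a href="%s">%s</a>' % (href, text)


guide = ET.fromstring(DOC)
anchors = [render_link(n) for n in guide.findall("city/attraction/attractionName")]

print("\n".join(anchors))
```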

Summary

XLink is an extremely versatile specification that standardizes the process for linking to other data sources. Not only does XLink support unidirectional linking similar to an anchor tag in HTML, but it can also be used to create bidirectional links. Additionally, XLink allows any XML element to act as a link. This gives great freedom to the developer.

Exercises

Answers



CSS







Learning objectives

Upon completion of this chapter, you will be able to

  • know the benefits of using CSS
  • know the limitations of CSS, so you are able to find the best solution for your document
  • know how to implement and use CSS on an XML document

Introduction

CSS (Cascading Style Sheets) is a language that describes the presentation of a structured document.

An XML or HTML document has no set style of its own; it consists of structured text without style information. How the document looks when it is printed on paper or viewed in a browser or on a cellphone is determined by a style sheet. CSS is a good way of keeping a document's appearance consistent and easy to update; Wikipedia is a good example of a site that uses it.

History of CSS

Style sheets have been around in one form or another since the beginnings of HTML in the early 1990s. Various browsers included their own style language which could be used to customize the appearance of web documents. Originally, style sheets were targeted towards the end-user; early revisions of HTML did not provide many facilities for presentational attributes, so it was often up to the user to decide how web documents would appear.

As the HTML language grew, however, it came to encompass a wider variety of stylistic capabilities to meet the demands of web developers. With these capabilities, style sheets became less important, and an external language for the purposes of defining style attributes was not widely accepted until the development of CSS.

The concept of Cascading Style Sheets was originally proposed in 1994 by Håkon Wium Lie. Bert Bos was at the time working on a browser called Argo which used its own style sheets; the two decided to work together to develop CSS.

A number of other style sheet languages had already been proposed, but CSS was the first to incorporate the idea of "cascading": the capability for a document's style to be inherited from more than one style sheet. This permitted a user's preferred style to override the site author's specified style in some areas, while inheriting, or "cascading," the author's style in other areas. Cascading in this way gives both users and site authors added flexibility and control, because it permits a mixture of stylistic preferences.

Håkon's proposal was presented at the "Mosaic and the Web" conference in Chicago in 1994, and again with Bert Bos in 1995. Around this time, the World Wide Web Consortium was being established; the W3C took an interest in the development of CSS, and organized a workshop toward that end. Håkon and Bert were the primary technical staff on the project, with additional members, including Thomas Reardon of Microsoft, participating as well. By the end of 1996, CSS was nearly ready to become official. The CSS level 1 Recommendation was published in December 1996.

Early in 1997, CSS was assigned its own working group within the W3C. The group began tackling issues that had not been addressed with CSS level 1, resulting in the creation of CSS level 2, which was published as an official Recommendation in May 1998. CSS level 3 is still under development as of 2005.

Why use CSS?

Cleaner Looking Code

A mass of HTML tags that manage design elements generally obscures the content of a page, making the code harder to read and maintain. With CSS, the content of the page is separated from the design, which makes it easier to produce content in formats such as HTML, XHTML, and XML.

Pages Will Load Faster

A non-CSS design typically requires more code than a CSS-based design.

In a non-CSS design, the information about the design is downloaded again every time a visitor opens a new page. In addition, the finer points of design are executed awkwardly. For example, a common way of defining the spacing on a web page is to place blank GIF images inside tables.

Using CSS keeps content and design separated, so much less code is needed. An external CSS file is typically downloaded once and then served from the browser's cache for the pages that follow. All information about dimensions is defined in this stylesheet, which makes awkward constructions like blank GIF images unnecessary.

Although an increasing number of Internet users have broadband, the size of a web page can be important to users who are limited to dial-up connections. Suppose a dial-up user visits a company's website and experiences lengthy loading times. It is quite possible that the visitor will leave the site or form an opinion of the company as "slow." In this way, a seemingly small difference in page size could mean added revenue.

Furthermore, bandwidth is not free, and most web hosting firms limit the amount used. In fact, many hosts charge based on bandwidth usage, so less code could also reduce costs.

Redesign Becomes Trivial

When used properly, CSS is a very powerful tool that gives a web architect complete control over a site's presentation. It is a notation for expressing the rules of a design. This becomes very useful for a large website that requires a consistent appearance for every type of element (such as a title, a subtitle, a piece of code, or a paragraph).

For example, suppose a company has a 1,200-page website that took many months to complete. The company then undergoes a rebranding, so the font, the background, the style of hyperlinks, and so forth need to be updated to match the new corporate design. If the site was engineered properly with an external CSS stylesheet, this change is as simple as editing the appropriate lines of a single CSS file. If CSS is not used, the code that manages the appearance is stored in each of the pages, and every file has to be updated individually.

Graceful Degradation

Accessibility

People with impaired vision and people who use special web browsers, for example those used by people who are blind, will probably prefer a website designed with CSS to one designed without it. Because CSS lets you define the reading order separately from the visual layout, it is easier for these browsers to read the page. Bear in mind that anyone who wears glasses or contact lenses can be considered to have impaired vision.

Many designers lock the font size in pixels, which can prevent the user from changing it. Good CSS design allows the user to increase or decrease the font size at will, making pages more usable. A significant number of web users like to use a magnification of 300% or more.

Letting the user change the font size makes no difference to most users, but it can make a real difference to people with impaired vision. Ask yourself the question: who is the website made for, the visitors or the designer?

Websites designed with CSS tend to display better than table-based designs in the web browsers used in PDAs and cellphones. The use of cellphones for browsing will probably continue to increase, and a table-based design can make web pages hard for these users to access.

Be careful with your CSS designs. Misuse of absolute positioning, and of absolute rather than relative sizes, can make your web pages less accessible rather than more accessible. A good table design is better than a bad CSS design.

Better results in search engines

Extensive use of tables can confuse search engines, which may have trouble separating content from code. Search engine robots start reading at the top of the page, and they want to find out how relevant the web page is as quickly as possible. Again, less code makes it easier for the search engines to find the content that is relevant, which will probably give your web page a better ranking.

Disadvantages of CSS

The use of CSS for styling has few disadvantages. However, some browsers, especially older ones, will sometimes present the page incorrectly. Research for this chapter also made it clear that many experts think that formatting XML with CSS is not the future of the web; the main view is that XSL will be the new standard. So make sure you read through the XSL chapters of this book one more time. The formatting parts of XSL and CSS are quite similar. For example, most CSS1 and CSS2 properties and values can be used in XSL with the same meaning as in CSS.

CSS levels

The first CSS specification to become an official W3C Recommendation was CSS level 1, published in December 1996. Among its capabilities is support for:

  • Font properties such as typeface and emphasis
  • Color of text, backgrounds, and other elements
  • Text attributes such as spacing between words, letters, and lines of text
  • Alignment of text, images, tables and other elements
  • Margin, border, padding, and positioning for most elements
  • Unique identification and generic classification of groups of attributes

The W3C maintains the CSS1 Recommendation.

CSS level 2 was developed by the W3C and published as a Recommendation in May 1998. A superset of CSS1, CSS2 includes a number of new capabilities, among them the absolute, relative, and fixed positioning of elements, the concept of media types, support for aural style sheets and bidirectional text, and new font properties such as shadows. The W3C maintains the CSS2 Recommendation.

CSS level 2 revision 1, or CSS 2.1, fixes errors in CSS2, removes poorly supported features, and adds already-implemented browser extensions to the specification. It was a Candidate Recommendation when this chapter was written and became a W3C Recommendation in June 2011.

CSS level 3 is currently under development. The W3C maintains a CSS3 progress report.

CSS Syntax and Properties

This section introduces the syntax of CSS rules and a few common properties; a complete list of properties can be found in the W3C CSS Recommendations. The syntax of CSS is the same whether it is used with XML or with HTML. The difference lies in how you link the CSS file to the document. For an XML document, you write <?xml-stylesheet href="X.css" type="text/css"?> before the root element, where X.css is the name of the CSS file.
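As a minimal sketch of this linking step, the document below attaches a stylesheet to a small XML file. The file name books.css and the element names are assumptions made only for illustration; they do not come from this book's examples.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The processing instruction below must appear before the root element.
     books.css is a hypothetical file name. -->
<?xml-stylesheet href="books.css" type="text/css"?>
<books>
  <book>
    <title>The Two Towers</title>
    <author>J.R.R. Tolkien</author>
  </book>
</books>
```

A browser that supports CSS for XML would then apply the rules in books.css to the book, title, and author elements by name, in the same way that a rule for h1 applies to headings in HTML.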

As mentioned earlier in this chapter, CSS is a set of rules that determines how elements in a document will be shown. The rule has two parts: a selector and a group of one or more declarations surrounded by braces (curly brackets):

selector { declaration; ...}

The selector is normally the tag you wish to style. Here is an example of a simple rule containing a single declaration:

h1 { color: red; }

Result: All h1-elements in the document are shown with the text color red.

The general syntax

Rules are usually defined like this:

selector { declaration; ...}

The declaration is formed like this:

property: value;

Remember that there can be several declarations in one rule. A common mistake is to mix up colons, which separate the property and value of a declaration, and semicolons, which separate declarations. A selector chooses the elements for which the rule applies and the declaration sets the value for the different properties of the elements that are chosen.

Back to our example:

h1 { color: red; }

In our example:

  • the selector is the element h1
  • the declaration is color: red
  • the property color gets the value red

Multiple declarations can be written either on a single line or over several lines, because extra whitespace between them is ignored:

h1 { color:red; background-color:white; }

or

h1 {
color:red;
background-color:white;
}

Details of the properties defined by CSS can be found in the CSS1 Properties section of the CSS Programming wikibook.

Summary

Cascading Style Sheets (CSS) are used with web pages to define how information stored in HTML or XML is displayed. While XML and HTML create and preserve a document's structure, CSS defines the appearance and placement of the objects within the document. This information is usually saved in a separate file, the .css file, which defines properties such as text size, background color, and fonts, as well as the placement of pictures and other objects. Used correctly, CSS makes a web page much easier to create and, even more important, to maintain, because you only have to change the .css file to change the whole website.

[Figure: CSS Zen Garden without CSS]
[Figure: CSS Zen Garden with CSS]


Exercises

Exercise 1

Using the CSS file provided below, create a price list for books as an XML document. An example of each element is listed first, followed by the CSS rules that style those elements.

Example elements:

<?xml version="1.0"?>
<book> Lord of the rings</book>
<isbn>1.000.56439 </isbn>
<title> The Two Towers </title>
<author> J.R.R. Tolkien </author>
<publisher> Penguin </publisher>
<price> 48 EUR </price>

Exercise1.css:

book {
  display: block;
  background-color: transparent;
  margin: 20px 10px 10px 200px;
}
isbn {
  display: block;
  font: 12pt/15pt georgia, serif;
}
title {
  display: block;
  font: 14pt/18pt verdana, sans-serif;
}
author {
  display: block;
  font: italic 12pt/15pt georgia, serif;
}
publisher {
  display: block;
  font: 12pt/15pt georgia, serif;
}
price {
  display: block;
  font: bold 12pt/15pt georgia, serif;
  color: #ff0000;
  background-color: transparent;
}

Exercise 2

Create a personal homepage, where you introduce yourself.

The page should contain one header, one footer, and navigation as a list of links.

Solutions

CSS Challenges

Copy and paste the HTML, then take up the challenge to create a stylesheet to match the picture!



XSLT and Style Sheets



Previous Chapter: CSS
Next Chapter: Cocoon



Learning objectives

  • Output XML to an XML file
  • Create numbered lists
  • Use parameters in a stylesheet
  • Import multiple stylesheets

In previous chapters, we have introduced the basics of using an XSL stylesheet to convert XML documents into HTML. This chapter will briefly review those concepts and introduce many new ones as well. It is a reference for creating stylesheets.

XML Stylesheets

The eXtensible Stylesheet Language (XSL) provides a means to transform and format the contents of an XML document for display. It includes two parts: XSL Transformations (XSLT), for transforming XML documents, and XSL Formatting Objects (XSL-FO), for formatting or applying styles to XML documents. XSLT is used to transform XML documents from one form to another, including new XML documents, HTML, XHTML, and text documents. XSL-FO can create PDF documents, as well as other output formats, from XML. With XSLT you can effectively recycle content, redesigning it for use in new documents or adapting it to many different uses. For example, from a single XML source file, you could extract a document ready for print, one for the Web, one for a Unix manual page, and another for an online help system. You can also choose to extract only the parts of a document written in a specific language from an XML source that stores text in many languages. The possibilities are endless!

An XSLT stylesheet is an XML document, complete with elements and attributes. It has two kinds of elements, top-level and instruction. Top-level elements fall directly under the stylesheet root element. Instruction elements appear inside templates and dictate how the contents of an XML document will be transformed. During the transformation process, the XSLT processor reads the XML document into a node tree, a hierarchical representation of the entire document known as the source tree. Each node represents a piece of the XML document, such as an element, an attribute, or some text content. The stylesheet contains "templates" that hold instructions on what to do with the nodes. The processor uses the match attribute to relate nodes in the source tree to the templates, and the instructions in the matching templates build a second tree, the result tree, which is then written out as the result document.

Let's review the stylesheet city.xsl from chapter 2 and examine it in a little more detail:

Exhibit 1: XML stylesheet for city entity

<?xml version="1.0" encoding="UTF-8"?>

<!-- 
      Document:  city.xsl 
-->

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html"/>
   <xsl:template match="/">
      <html>
          <head>
             <title>Cities</title>
          </head>
          <body>
             <h2>Cities</h2>
             <xsl:apply-templates select="cities"/>
          </body>
      </html>
  </xsl:template>
  <xsl:template match="cities">
      <!-- the for-each element can be used to loop through each node in a specified node set (in this case city) -->
      <xsl:for-each select="city">
         <xsl:text>City: </xsl:text>
           <xsl:value-of select="cityName"/>
           <br/>
         <xsl:text>Population: </xsl:text>
           <xsl:value-of select="cityPop"/>
           <br/>
          <xsl:text>Country: </xsl:text>
          <xsl:value-of select="cityCountry"/>
          <br/>
          <br/>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
  • Since a stylesheet is an XML document, it begins with the XML declaration. The declaration can carry the pseudo-attributes version, encoding, and standalone. They are called pseudo-attributes because they are not the same as element attributes. The standalone pseudo-attribute states whether the document depends on external markup declarations, such as those in an external DTD.
  • The <xsl:stylesheet> tag declares the start of the stylesheet and identifies the version number and the official W3C namespace. Notice the conventional prefix for the XSLT namespace, xsl. Once a prefix is declared, it must be used for all the XSLT elements.
  • The <xsl:output> tag is an optional element that determines how to output the result tree.
  • The <xsl:template> element defines the start of a template and contains rules to apply when a specified node is matched. The match attribute is used to associate (match) the template with a node in the XML source document, in this case the root node (/), which stands for the whole document.
  • If no output method had been specified, the output would default to HTML in this case, since the first element in the result tree is <html>.
  • The apply-templates element applies template rules to the nodes chosen by its select attribute, which contains a location path; when select is omitted, it processes all the child nodes of the current node. Here it is written as an empty element because it has no content.
  • The instruction element value-of extracts the string value of the selected node, in this case the text inside cityName.

The template element defines the rules that implement a change. This can be any number of things, including a simple plain-text conversion, the addition or removal of XML elements, or simply a conversion to HTML, when the pattern is matched. The pattern, defined in the element’s match attribute, is an abbreviated XPath location path. Often it is simply the name of an element in the source document; in Exhibit 1 the patterns are "/", the root of the document, and "cities", the root element.
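To illustrate how match patterns and apply-templates work together, here is a sketch of the same transformation as Exhibit 1, written with a separate template rule for city instead of a for-each loop. It assumes the city.xml structure used in this chapter (cities, city, cityName, cityPop, cityCountry); the file name cityTemplates.xsl is made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Document: cityTemplates.xsl (hypothetical name) -->
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html"/>
   <!-- "/" matches the root node of the source tree -->
   <xsl:template match="/">
      <html>
         <head><title>Cities</title></head>
         <body>
            <h2>Cities</h2>
            <!-- hand every city element to the rule that matches it -->
            <xsl:apply-templates select="cities/city"/>
         </body>
      </html>
   </xsl:template>
   <!-- "city" matches each city element selected above -->
   <xsl:template match="city">
      <xsl:text>City: </xsl:text>
      <xsl:value-of select="cityName"/>
      <br/>
      <xsl:text>Population: </xsl:text>
      <xsl:value-of select="cityPop"/>
      <br/>
      <xsl:text>Country: </xsl:text>
      <xsl:value-of select="cityCountry"/>
      <br/>
      <br/>
   </xsl:template>
</xsl:stylesheet>
```

Splitting the work this way keeps each rule small, and the city rule can be reused wherever city elements are selected.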

When transforming an XML document into HTML, the processor expects the elements in the stylesheet to be well-formed, just as with XML. This means that every element must have an end-tag. In plain HTML, for example, it is not unusual to see the <p> tag on its own, but the XSLT processor requires that an element with a start-tag close with an end-tag. With the <br> element, this means using either <br></br> or <br />. As mentioned in Chapter 3, the br element is an empty element. That means it carries no content between tags, but it may have attributes. Although no end-tags are written for such elements in the HTML output, they must still be closed in the stylesheet. For instance, in the stylesheet you write <img src="picture.jpg"></img> or, as an empty element, <img src="picture.jpg" />. The HTML output drops the end-tag, so it looks like this: <img src="picture.jpg">. On a side note, the HTML output method recognizes HTML element names no matter what case they are in: BODY, body, and Body are all interpreted the same.
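The well-formedness rule can be seen in a short sketch. The stylesheet below is an illustration only; the paragraph text and the alt text are invented, and picture.jpg is the placeholder file name used in the paragraph above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html"/>
   <xsl:template match="/">
      <html>
         <body>
            <!-- every start-tag is closed, even where HTML would not require it -->
            <p>Welcome to the tour guide.</p>
            <!-- empty elements use the self-closing form -->
            <br/>
            <img src="picture.jpg" alt="A city skyline"/>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>
```

Because the output method is html, the processor writes br and img without end-tags in the result, even though both are closed in the stylesheet.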

Output

XSLT can be used to transform an XML source into many different types of documents. XHTML is also XML, if it is well formed, so it could also be used as the source or the result. However, transforming plain HTML into XML won't work unless it is first turned into XHTML so that it conforms to the XML 1.0 recommendation. Here is a list of all the possible type-to-type transformations performed by XSLT:

Exhibit 2: Type-To-Type Transformations

Source \ Result   XML   XHTML   HTML   text
XML               X     X       X      X
XHTML             X     X       X      X
HTML
text

The output element in the stylesheet determines how to output the result tree. This element is optional, but it gives you more control over the output. If you do not include it, the output method defaults to XML, or to HTML if the first element in the result tree is the <html> element. Exhibit 3 lists its attributes.

Exhibit 3: Element output attributes (from Wiley: XSL Essentials by Michael Fitzgerald)

Attribute                Description
cdata-section-elements   Specifies a whitespace-separated list of element names whose text content is written as CDATA sections in the result. A CDATA section escapes characters that are normally interpreted as markup, such as < or &.
doctype-public           Places a public identifier in the document type declaration of the result.
doctype-system           Places a system identifier in the document type declaration of the result.
encoding                 Sets the preferred encoding, such as UTF-8 or ISO-8859-1. These values are not case sensitive.
indent                   Indicates whether the XSLT processor may indent content in the result. Possible values are "yes" and "no". The default is "no" when method="xml".
media-type               Sets the media type (MIME type) for the content of the result.
method                   Specifies the type of output. Legal values are xml, html, text, or another qualified name.
omit-xml-declaration     Tells the XSLT processor whether to leave out ("yes") or include ("no") the XML declaration.
standalone               Tells the XSLT processor to include the standalone pseudo-attribute in the XML declaration (if not omitted) with a value of either "yes" or "no". This indicates whether the document depends on external markup declarations, such as those in an external DTD.
version                  Sets the version number for the output method, such as the version of XML used for output (the default is 1.0).
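As a sketch of how several of these attributes combine, the stylesheet below sets them on a single output element. The DTD name cities.dtd and the choice of cityName as a CDATA element are assumptions made for this example, not settings taken from the book's files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <!-- cities.dtd and cityName are hypothetical choices for illustration -->
   <xsl:output method="xml"
               version="1.0"
               encoding="UTF-8"
               indent="yes"
               omit-xml-declaration="no"
               standalone="no"
               doctype-system="cities.dtd"
               cdata-section-elements="cityName"/>
   <!-- copy the source document unchanged, so that only the
        serialization settings above affect the output -->
   <xsl:template match="/">
      <xsl:copy-of select="."/>
   </xsl:template>
</xsl:stylesheet>
```

Running this against a document such as city.xml should leave the content untouched while adding a document type declaration and writing the text of each cityName element as a CDATA section.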

XML to XML

Since we have had a lot of practice transforming an XML document to HTML, we are going to transform city.xml, used in chapter 2, into another XML file, using host.xsd as the schema.

Exhibit 4: XML document for city entity

<?xml version="1.0" encoding="UTF-8"?>

<!-- 
    Document:  city.xml
-->

<cities xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
  xsi:noNamespaceSchemaLocation='host.xsd'>
<city>
    <cityID>c1</cityID>
    <cityName>Atlanta</cityName>
    <cityCountry>USA</cityCountry>
    <cityPop>4000000</cityPop>
    <cityHostYr>1996</cityHostYr>   
   
</city>

<city>
    <cityID>c2</cityID>
    <cityName>Sydney</cityName>
    <cityCountry>Australia</cityCountry>
    <cityPop>4000000</cityPop>
    <cityHostYr>2000</cityHostYr>   
    <cityPreviousHost>c1</cityPreviousHost >   
</city>

<city>
    <cityID>c3</cityID>
    <cityName>Athens</cityName>
    <cityCountry>Greece</cityCountry>
    <cityPop>3500000</cityPop>
    <cityHostYr>2004</cityHostYr>   
    <cityPreviousHost>c2</cityPreviousHost >   
</city>

</cities>

Exhibit 5: XSL document for city entity that lists cities by City ID

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="html"/>  
    <xsl:template match="/">
        <xsl:for-each select="//city[count(cityPreviousHost) = 0]">
            <br/><xsl:text>City Name: </xsl:text><xsl:value-of select="cityName"/><br/>
            <xsl:text>            Rank: </xsl:text><xsl:value-of select="cityID"/><br/>
            <xsl:call-template name="output">
                <xsl:with-param name="context" select="."/>
            </xsl:call-template>
        </xsl:for-each>
    </xsl:template>
    <xsl:template name="output">
        <xsl:param name="context" select="."/>
        <xsl:for-each select="//city[cityPreviousHost = $context/cityID]">
            <br/><xsl:text>City Name: </xsl:text> <xsl:value-of select="cityName"/><br/>
            <xsl:text>            Rank: </xsl:text><xsl:value-of select="cityID"/><br/>
            <xsl:call-template name="output">
                <xsl:with-param name="context" select="."/>
            </xsl:call-template>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Exhibit 6: XML schema for host city entity

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">

<xsd:element name="cities">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<xsd:complexType name="cityType">
  <xsd:sequence>
    <xsd:element name="cityID" type="xsd:ID"/>
     <xsd:element name="cityName" type="xsd:string"/>
     <xsd:element name="cityCountry" type="xsd:string"/>
     <xsd:element name="cityPop" type="xsd:integer"/>
     <xsd:element name="cityHostYr" type="xsd:integer"/>
     <xsd:element name="cityPreviousHost" type="xsd:IDREF" minOccurs="0" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>

</xsd:schema>

Exhibit 7: XML stylesheet for city entity

 <?xml version="1.0" encoding="UTF-8"?>

 <!--
     Document:  city2.xsl
 -->

    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" />
    <xsl:attribute-set name="date">
       <xsl:attribute name="year">2004</xsl:attribute>
       <xsl:attribute name="month">03</xsl:attribute>
       <xsl:attribute name="day">19</xsl:attribute>
    </xsl:attribute-set>
        <xsl:template match="tourGuide">
        <xsl:processing-instruction name="xml-stylesheet">href="style.css" type="text/css"</xsl:processing-instruction>
        <xsl:comment>This is a list of the cities we are visiting this week</xsl:comment>
           <xsl:for-each select="city">
               
<!-- element name creates a new element where the value of the attribute name sets name of 
the new element.  Multiple attribute sets can be used in the same element -->
               
<!-- use-attribute-sets attribute adds all the attributes declared in attribute-set from above -->

               <xsl:element name="cityList"  use-attribute-sets="date">
                   <xsl:element name="city">
                         <xsl:attribute name="country">
                         <xsl:apply-templates select="country"/>  </xsl:attribute>
                         <xsl:apply-templates select="cityName"/>
                   </xsl:element>
                   <xsl:element name="details">Will write up a one page report of the trip</xsl:element>
               </xsl:element>
          </xsl:for-each>
       </xsl:template>
   </xsl:stylesheet>
  • The output method is set to "xml" explicitly. Even without this setting the output would default to XML, since the first element in the result tree is not <html>.
  • attribute-set is a top-level element that creates a group of attributes named "date." This attribute set can be reused throughout the stylesheet. The attribute-set element also has the attribute use-attribute-sets, which allows you to chain together several sets of attributes.
  • The processing-instruction element creates a processing instruction in the result tree, in this case one that links a stylesheet to the result document.
  • The comment element creates a comment in the result tree.
  • The attribute element allows you to add an attribute to an element that is created in the result tree.
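Chaining attribute sets can be sketched as follows. The set named "stamp" and its status attribute are made-up names for this illustration; the "date" set and the tourGuide, city, and cityName elements follow the exhibit above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>
   <xsl:attribute-set name="date">
      <xsl:attribute name="year">2004</xsl:attribute>
      <xsl:attribute name="month">03</xsl:attribute>
      <xsl:attribute name="day">19</xsl:attribute>
   </xsl:attribute-set>
   <!-- "stamp" (a hypothetical name) pulls in everything from "date"
        and adds one attribute of its own -->
   <xsl:attribute-set name="stamp" use-attribute-sets="date">
      <xsl:attribute name="status">draft</xsl:attribute>
   </xsl:attribute-set>
   <xsl:template match="tourGuide">
      <!-- on a literal result element the attribute carries the xsl: prefix -->
      <cityList xsl:use-attribute-sets="stamp">
         <xsl:for-each select="city">
            <city><xsl:value-of select="cityName"/></city>
         </xsl:for-each>
      </cityList>
   </xsl:template>
</xsl:stylesheet>
```

Here the cityList element receives the year, month, and day attributes from "date" as well as the status attribute from "stamp". Wrapping the loop in a single cityList element also gives the result one root element.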

The stylesheet produces this result tree:

Exhibit 8: XML result tree for city entity

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="style.css" type="text/css"?>
<!--This is a list of the cities we are visiting this week-->
<cityList year="2004" month="03" day="19">
  <city country="Belize">Belmopan</city>
  <details>Will write up a one page report of the trip</details>
</cityList>
<cityList year="2004" month="03" day="19">
  <city country="Malaysia">Kuala Lumpur</city>
  <details>Will write up a one page report of the trip</details>
</cityList>

The processor automatically inserts the XML declaration at the top of the result tree. The processing instruction, or PI, is an instruction intended for use by a processing application. In this case, the href points to a local stylesheet that will be applied to the XML document when it is processed. We used <xsl:element> to create new content in the result tree and added attributes to it.

There are two other instruction elements for inserting nodes into a result tree: copy and copy-of. Unlike apply-templates, which by default outputs only the text content of the nodes it processes, these elements copy the nodes themselves. The following code shows how the copy element can be used to copy the city element in city.xml:

Exhibit 9: Copy element

       <xsl:template match="city">       
               <xsl:copy />    
       </xsl:template>

The result looks like this:

Exhibit 10: Copy element result

 
       <?xml version="1.0" encoding="utf-8"?>
       <city />
       <city />

The output isn't very interesting, because copy does not pick up the child nodes, only the current node. In our example, it picks up the two city nodes that are in the city.xml file. The copy element has an optional attribute, use-attribute-sets, which allows you to add attributes to the copied element. However, copy leaves behind the attributes of the source element; only its namespace nodes are copied along with it. Here is the result if a namespace is declared in the source document, in this case the namespace bound to the xsi prefix:

Exhibit 11: Namespace result

       <?xml version="1.0" encoding="utf-8"?>
       <city xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" />
       <city xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" />
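Because copy is shallow, it is usually combined with apply-templates so that attributes and children are copied as well. The sketch below shows this common pattern, often called the identity transform, together with one extra rule that leaves out the cityPop element. Dropping cityPop is an arbitrary choice made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml"/>
   <!-- copy the current node, then recurse into its attributes and children -->
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>
   <!-- an empty rule: cityPop elements are left out of the copy -->
   <xsl:template match="cityPop"/>
</xsl:stylesheet>
```

The more specific cityPop rule takes precedence over the general one, so everything in the source document is reproduced except the cityPop elements and their content.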

If you want to copy more from the source file than just one node, use the copy-of element. It copies each selected node together with everything that belongs to it: attribute nodes, namespace nodes, text nodes, and child element nodes. When we apply the copy-of element to city.xml, the result is almost an exact replica of city.xml! You can also copy comments and processing instructions using <xsl:copy-of select="comment()"/> and <xsl:copy-of select="processing-instruction('name')"/>, where name is the target of the processing instruction you wish to retrieve.

Why would this be useful, you ask? Sometimes you want to just grab nodes and go! For example, if you want to place a copy of city.xml into a SOAP envelope, you can easily do it using copy-of. If you don't already know, Simple Object Access Protocol, or SOAP, is a protocol for packaging XML documents for exchange. This is really useful in a B2B environment because it provides a standard way to package XML messages. You can read more about SOAP at www.w3.org/tr/soap.
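The SOAP case can be sketched in a few lines. The stylesheet below assumes a SOAP 1.1 envelope (the namespace shown is the SOAP 1.1 envelope namespace) and a source document whose root element is cities, as in city.xml; a real service would normally add headers and its own message structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- wrap a deep copy of the cities element in an envelope and body -->
    <soap:Envelope>
      <soap:Body>
        <xsl:copy-of select="cities"/>
      </soap:Body>
    </soap:Envelope>
  </xsl:template>
</xsl:stylesheet>
```

The copy-of instruction carries the whole cities element, with all of its children, into the body of the envelope without any per-element rules.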

Use an XML editor to create the above XML Stylesheets, and experiment with the copy and copy-of elements.

Templates

Since templates define the rules for changing nodes, it would make sense to reuse them, either in the same stylesheet or in other stylesheets. This can be accomplished by naming a template, and then calling it with a call-template element. Named templates from other stylesheets can also be included. You can quickly see how this is useful in practical applications. Here is an example using named templates:

Exhibit 110: Named templates

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" />
    <xsl:template match="/">
        <xsl:call-template name="getCity" />
    </xsl:template>

    <xsl:template name="getCity">
        <xsl:copy-of select="city" />
    </xsl:template>
</xsl:stylesheet>

Templates also have a mode attribute. This allows you to process a node more than once, producing a different result each time, depending on the template. Let's create a stylesheet to practice modes.

Exhibit 12: XML template modes

<?xml version="1.0" encoding="UTF-8"?>

<!--
    Document:  cityModes.xsl
-->

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" />
     <xsl:template match="tourGuide">
        <html>
           <head>
              <title>City - Using Modes</title>
           </head>
           <body>
              <xsl:for-each select="city">
                 <xsl:apply-templates select="cityName" mode="title" />
                 <xsl:apply-templates select="cityName" mode="message" />
                 <br />
              </xsl:for-each>
           </body>
        </html>
     </xsl:template>

     <xsl:template match="cityName" mode="title">
        <h2><xsl:value-of select="current()"/></h2>
     </xsl:template>
     <xsl:template match="cityName" mode="message">
        <p>Come visit <b><xsl:value-of select="current()" /></b>!</p>
     </xsl:template>
</xsl:stylesheet>
  • apply-templates select="cityName" mode="title" tells the processor to apply only a template that matches cityName and has the same mode attribute value.
  • value-of select="current()" takes the current node, which value-of converts to a string. Using select="." gives the same result here.

The result isn't very flattering since we didn't do much with the file, but it gets the point across.

Exhibit 13: Result from above stylesheet

<h2>Belmopan</h2>
<p>Come visit <b>Belmopan</b>!</p>

<h2>Kuala Lumpur</h2>
<p>Come visit <b>Kuala Lumpur</b>!</p>

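XSLT processors have built-in template rules. When a node is processed by apply-templates and no template in the stylesheet matches it, the built-in rule for that kind of node is applied automatically. The built-in rules for the root node and for elements simply continue processing the child nodes, and the built-in rule for text nodes copies the text to the output, so a stylesheet with no matching templates outputs the text content of all the elements.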
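To make the default behaviour concrete, here is a sketch of a stylesheet that spells out the main built-in rules as ordinary templates. It is shown for illustration only: a processor behaves as if these rules were present, so you never need to write them yourself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Root node and elements: keep going with the children -->
    <xsl:template match="*|/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Text and attribute nodes: output the string value -->
    <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
    </xsl:template>

    <!-- Comments and processing instructions: output nothing -->
    <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

Attribute values are written out only when a stylesheet explicitly selects attributes, because apply-templates without a select attribute processes child nodes, and attributes are not children.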

Sorting[edit]

Writing well-formed XML is vital, but simply displaying information (the most elementary level of data management) is often not enough. Order matters for interpretation: data presented in a predictable order is quickly readable and therefore quickly usable. Comparing items, or looking up a specific name or item, becomes much easier when the list is sorted.

The basis of sorting in XSLT is the xsl:sort element, which defines a sort key component. A sort key component specifies how a sort key value is to be computed for each item in the sequence being sorted. A sort key value is defined as “the value computed for an item by using the Nth sort key component”. The sort key component is given either by the select attribute of xsl:sort or by the sequence constructor it contains, where a sequence constructor is defined as a “sequence of zero or more sibling nodes in the stylesheet that can be evaluated to return a sequence of nodes and atomic values”. (These definitions come from XSLT 2.0; in XSLT 1.0, xsl:sort is always an empty element and only the select attribute is available.) When neither is present, the default is select=".", which has the effect of sorting on the actual value of the item if it is an atomic value, or on the typed value of the item if it is a node. If a select attribute is present, its value must be an XPath expression.

The <xsl:sort> element is used as follows. Sorting output is quite easy: add an <xsl:sort> element as the first child of the <xsl:for-each> element (or of an <xsl:apply-templates> element) in the XSL file.

Exhibit 14: Stylesheet with sort function

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html>
            <body>
                <h2>TourGuide Example</h2>
                <xsl:apply-templates select="cities"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="cities">
        <xsl:for-each select="city">
            <xsl:sort select="cityName"/>
            <xsl:value-of select="cityName"/>
            <xsl:value-of select="cityCountry"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

This example sorts the cities alphabetically by city name. Note: the select attribute indicates which XML element to sort on. Here the output could be sorted by cityName or by cityCountry, the two elements that the stylesheet displays within the body of the result.

We have used the sort element in earlier examples. It has many other uses as well. Essentially, it instructs the processor to sort nodes based on a criterion known as the sort key. It defaults to sorting in ascending order. Here is a short list of the attributes that sort takes:

Exhibit 15: Sort attributes

Attribute      –    Description
select         –    XPath expression that computes the sort key for each node; defaults to "."
order          –    Specifies the sort order: "ascending" (the default) or "descending"
case-order     –    Determines whether text in uppercase is sorted before lowercase: "upper-first" or "lower-first"
data-type      –    Type of comparison: "text" (the default), "number", or a QName (qualified name)
lang           –    Indicates the language in use, since some languages use different alphabets: "en", "de", "fr", etc. If no value is specified, the language is determined from the system environment.

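The sort element can be used as a child of either apply-templates or for-each. Several sort elements can be placed one after another inside the same apply-templates or for-each to create sub-ordering levels: the first is the primary sort key, the second orders the nodes whose primary keys are equal, and so on.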
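A minimal sketch of sub-ordering with two sort keys is shown below. It assumes the same cities/city/cityName/cityCountry structure as Exhibit 14; the text output layout is an arbitrary choice made for readability.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:for-each select="cities/city">
            <!-- Primary key: country, Z to A -->
            <xsl:sort select="cityCountry" order="descending"/>
            <!-- Secondary key: city name, A to Z within each country -->
            <xsl:sort select="cityName"/>
            <xsl:value-of select="cityCountry"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="cityName"/>
            <xsl:text>&#xa;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For a numeric key, adding data-type="number" to the sort element avoids the text ordering in which 10 sorts before 9.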

Numbering[edit]

The number instruction element allows you to insert numbers into your results. Combined with a sort element, you can easily create numbered lists. When this simple stylesheet, hotelNumbering.xsl, is applied to city_hotel.xml, we get the result listed below:

Exhibit 16: Sorting and numbering lists

  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!--
      Document:  hotelNumbering.xsl
  -->

  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text" omit-xml-declaration="yes"/>
     <xsl:template match="/">
          <xsl:apply-templates select="tourGuide/city/hotel">
             <xsl:sort/>
          </xsl:apply-templates>
     </xsl:template>

     <xsl:template match="hotel">
          <xsl:number value="position()" format="&#xa; 0. "/>
          <xsl:value-of select="hotelName"/>
    </xsl:template>
 </xsl:stylesheet>

Exhibit 17: Result hotelNumbering.xsl

 1. Bull Frog Inn
 2. Mandarin Oriental Kuala Lumpur
 3. Pan Pacific Kuala Lumpur
 4. Pook's Hill Lodge

The expression in value is evaluated, and the value of position() is based on the sorted node list. To improve the appearance we add the format attribute with a linefeed character reference (&#xa;), a digit token to indicate that the number is written as a decimal integer (0 here; 1 is the standard token for the sequence 1, 2, 3, ...), and a period and space to separate the number from the hotel name. The format can be based on the following sequences:

Exhibit 17: Numbering formats

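format=" A. "     –    Uppercase letters (A, B, C, ...)
format=" a. "     –    Lowercase letters (a, b, c, ...)
format=" I. "     –    Uppercase Roman numerals (I, II, III, ...)
format=" i. "     –    Lowercase Roman numerals (i, ii, iii, ...)
format=" 001. "   –    Integers padded with leading zeros (001, 002, 003, ...)
format=" 1- "     –    Integers followed by a hyphen (1-, 2-, 3-, ...)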
    

To specify different levels of numbering, such as sections and subsections of the source document, the level attribute is used, which tells the processor the levels of the source tree that should be considered. By default, it is set to single, as seen in the example above. It also can take values of multiple and any. The count attribute is a pattern that tells the processor which nodes to count (for numbering purposes). If it is not specified, it defaults to a pattern matching the same node type as the current node. The from attribute can also be used to specify the node where the counting should start.

When level is set to single, the processor goes up from the current node to the nearest ancestor (or the current node itself) that matches the count pattern. The number it produces is one plus the number of preceding siblings of that node that also match count. If the from attribute is given, only nodes that come after the nearest ancestor matching from are considered.

When level is multiple, the processor does not produce a single number. It builds a list of all the ancestors of the current node (and the current node itself) that match count, in document order, and for each of them computes one plus the number of preceding siblings that match count. The result is a list of numbers, one per level, such as 2.1. This is where any is different: it numbers all the matching nodes sequentially in document order, regardless of their level in the tree, instead of counting them level by level. As with the other two values, you can use the from attribute to tell the processor where to start counting from, which in effect restarts the numbering within each node that matches from.

This is a modification of the example above using the level="multiple":

Exhibit 18: Sorting and numbering lists

  <!--
      Document:  hotelNumbering2.xsl
  --> 
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" omit-xml-declaration="yes"/>
       <xsl:template match="/">  
            <xsl:apply-templates select="tourGuide//hotelName"/>
       </xsl:template> 
                  
       <xsl:template match="hotelName">
               <xsl:number level="multiple" 
                     count="city|hotel" format="&#xa; 1.1 "/>
               <xsl:apply-templates /> 
       </xsl:template>
  </xsl:stylesheet>

Exhibit 19: Result – hotelNumbering2.xsl

   1.1 Bull Frog Inn 
   1.2 Pook's Hill Lodge  
   2.1 Pan Pacific Kuala Lumpur  
   2.2 Mandarin Oriental Kuala Lumpur

The first template matches the root node and then selects all hotelName nodes that are descendants of tourGuide, creating a node-list. The second template is applied to each node in that list and gives it a two-part number: the first part is the position of the enclosing city element among its city siblings, and the second part is the position of the enclosing hotel element among the hotels of that city. Each part is worked out by counting the number of preceding siblings that match count, plus 1.
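For comparison, here is a sketch of the same listing with level="any". It assumes the tourGuide/city/hotel/hotelName structure used by hotelNumbering2.xsl; the file layout and output format are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:apply-templates select="tourGuide//hotel"/>
    </xsl:template>

    <xsl:template match="hotel">
        <!-- level="any": count every hotel up to and including this one,
             in document order, whichever city it belongs to -->
        <xsl:number level="any" count="hotel" format="1. "/>
        <xsl:value-of select="hotelName"/>
        <xsl:text>&#xa;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

With the document order implied by Exhibit 19, the four hotels would be numbered 1 to 4 straight through rather than 1.1, 1.2, 2.1, 2.2.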

Formatting[edit]

Formatting numbers is a simple process so this section will be a brief overview of what can be done. Placed within the XML stylesheet, functions can be used to manipulate data during the transformation. In order to make numbers a little easier to read, we need to be able to separate the digits into groups, or add commas or decimals. To do this we use the format-number() function. The purpose of this function is to convert a numeric value into a string using specified patterns that control the number of leading zeroes, separator between thousands, etc. The basic syntax of this function is as follows: format-number (number, pattern)

  • number is the numeric value (or an expression that evaluates to one) to be formatted.
  • pattern is a string that lays out the general representation of a number. Each character in the string represents either a digit from number or some special punctuation such as a comma or minus sign.

The following are the characters and their meanings used to represent the number format when using the format-number function within a stylesheet:

Exhibit 20: Format-number function

    Symbol              Meaning       
       0               A digit.
       #               A digit, zero shows as absent.       
       . (period)      Placeholder for decimal separator.       
       ,               Placeholder for grouping separator.       
       ;               Separate formats.       
       -               Default prefix for negative.       
       %               Multiply by 100 and show as a percentage.       
       X               Any other characters can be used in the prefix or suffix.       
       '               Used to quote special characters in a prefix or suffix.
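The following sketch applies a few of these symbols with format-number(). The numbers are arbitrary literals chosen for illustration, and the expected result of each call is given in the comment above it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Grouping separator and two decimal places: 1,234,567.89 -->
        <xsl:value-of select="format-number(1234567.891, '#,##0.00')"/>
        <xsl:text>&#xa;</xsl:text>
        <!-- Leading zeros: 042 -->
        <xsl:value-of select="format-number(42, '000')"/>
        <xsl:text>&#xa;</xsl:text>
        <!-- Percentage: 26% -->
        <xsl:value-of select="format-number(0.256, '0%')"/>
        <xsl:text>&#xa;</xsl:text>
        <!-- Separate pattern for negative numbers: (5.00) -->
        <xsl:value-of select="format-number(-5, '#,##0.00;(#,##0.00)')"/>
        <xsl:text>&#xa;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The stylesheet ignores the content of its input, so it can be run against any XML document.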

Conditional Processing[edit]

There are times when it is necessary to display output based on a condition. There are two instruction elements that let you conditionally determine which template will be used based on certain tests. These are the if and choose elements.

The test condition for an if statement must be contained within the test attribute of the <xsl:if> element. Because the stylesheet is itself an XML document, a less-than operator inside the test attribute must be written as “&lt;”; writing greater-than as “&gt;” is not strictly required but is common practice. The not() function from XPath is a Boolean function and evaluates to true if its argument is false, and vice versa. The and and or operators can be used to combine multiple tests into one expression, but an if element has only one test attribute and no else branch: its content is either instantiated or skipped.

The when element is similar to an if or else if branch in Java. By using several when elements, the choose element can offer many alternatives; only the first when whose test is true is used. A choose element must contain at least one when element, but it can have as many as it needs. The choose element can also contain one otherwise element, which must come last and works like the final else in a Java program. Its content is used if none of the when tests are true.

The for-each element is often discussed alongside the conditional elements, although it repeats its content rather than testing a condition. We have used it in previous chapter exercises, so this will be a quick review. The for-each element is an instruction element, which means it must appear inside a template. Its select attribute is an expression that evaluates to a node-set, and for-each processes each node of that set in document order, or in sorted order if it contains sort elements.
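The three elements can be combined as in the sketch below, which assumes the tourGuide/city/cityName/hotel structure used elsewhere in this chapter. The wording of the output is invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:for-each select="tourGuide/city">
            <xsl:value-of select="cityName"/>
            <xsl:text>: </xsl:text>
            <xsl:choose>
                <xsl:when test="count(hotel) = 0">no hotels listed</xsl:when>
                <!-- the less-than sign must be escaped inside the attribute -->
                <xsl:when test="count(hotel) &lt; 2">one hotel</xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="count(hotel)"/>
                    <xsl:text> hotels</xsl:text>
                </xsl:otherwise>
            </xsl:choose>
            <!-- if has no else branch: add a separator after every city
                 except the last -->
            <xsl:if test="position() != last()">
                <xsl:text>; </xsl:text>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Because the when branches are tried in order, the second test is only reached for cities that have at least one hotel.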

Parameters and Variables[edit]

XSLT offers two similar elements, variable and param. Both have a required name attribute, and an optional select attribute, and you declare them like this:

Exhibit 21: Variable and parameter declaration

     <xsl:variable name="var1" select="''"/>
     <xsl:param name="par1" select="''"/>

The above declarations bind both names to an empty string, which has the same effect as leaving off the select attribute. With parameters, this value is only a default, or initial value, which can be overridden either from outside the stylesheet (for example from the command line) or from another template using the with-param element. A variable, in contrast, keeps the value it was given in its declaration: it cannot be reassigned afterwards. When making declarations, remember that variables can be declared anywhere within a template, but a parameter must be declared at the beginning of the template.

Both elements can have global or local scope, depending on where they are defined. If they are defined at the top level, as children of the <stylesheet> element, they are global in scope and can be used anywhere in the stylesheet. If they are defined in a template, they are local and can only be used in that template. More precisely, a local variable is visible to the elements that follow its declaration inside the same parent element, and to everything nested within them. Visibility has a cascading effect: a global value spills down from the top level into every template, and a local value spills down into the elements nested after it, but neither can go back up! A value needed by another template has to be passed to it explicitly as a parameter.
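The scoping rules can be sketched as follows. The variable names label and hotelCount are made up for this example, and the tourGuide/city/hotel structure is assumed from the other exhibits.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Global: visible in every template -->
    <xsl:variable name="label" select="'City: '"/>

    <xsl:template match="/">
        <xsl:for-each select="tourGuide/city">
            <!-- Local: visible only after this line, inside this for-each -->
            <xsl:variable name="hotelCount" select="count(hotel)"/>
            <xsl:value-of select="$label"/>
            <xsl:value-of select="cityName"/>
            <xsl:text> (</xsl:text>
            <xsl:value-of select="$hotelCount"/>
            <xsl:text> hotels)&#xa;</xsl:text>
        </xsl:for-each>
        <!-- $hotelCount cannot be referenced here: it is out of scope -->
    </xsl:template>
</xsl:stylesheet>
```

Each pass through the for-each creates a fresh hotelCount binding, which is how a value that looks like it changes is reconciled with the rule that variables cannot be reassigned.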

We are going to hard-code a value for the country parameter in its declaration element using the select attribute.

Exhibit 22: Stylesheet with parameters

  <?xml version="1.0" encoding="UTF-8"?>

  <!--
      Document:  countryParam.xsl
  -->

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:param name="country" select="'Belize'"/>
      <xsl:param name="code" />
      <xsl:template match="/">
          <xsl:apply-templates select="country-codes" />
      </xsl:template>
      <xsl:template match="country-codes">
          <xsl:apply-templates select="code" />
      </xsl:template>
      <xsl:template match="code">
          <xsl:choose>
              <xsl:when test="countryName[. = $country]">
                  The country code for
                  <xsl:value-of select="countryName"/> is
                  <xsl:value-of select="countryCode"/>.
              </xsl:when>
              <xsl:when test="countryCode[. = $code]">
                  The country for the code
                  <xsl:value-of select="countryCode"/> is
                  <xsl:value-of select="countryName"/>.
              </xsl:when>
              <xsl:otherwise>
                  Sorry. No matching country name or country code.
              </xsl:otherwise>
          </xsl:choose>
      </xsl:template>
  </xsl:stylesheet>

When you pass a value in from the command line, it does not have to be enclosed in quotes unless it contains more than one word (the exact syntax depends on the XSLT processor). For example, we could have passed either country="United States" or country=Belize without getting an error.

The value of a variable can also be used to set an attribute value. Here is an example that gives the countryName element a countryCode attribute equal to the value of the $code variable:

Exhibit 23: Attribute of countryCode

  <countryName countryCode="{$code}"></countryName> 

This is known as an attribute value template. Notice the use of braces around the parameter. This tells the processor to evaluate the content as an expression, which then converts the result to a string in the result tree. There are attributes which cannot be set with an attribute value template:

  • Attributes that contain expressions or patterns (such as select in apply-templates or match in template)
  • Attributes of top-level elements
  • Attributes that refer to named objects (such as the name attribute of template)

Parameters, though not variables, can be passed between templates using the with-param element. This element has two attributes, name, which is required, and select, which is optional. This next example uses with-param as a child of the call-template element, although it can also be used as a child of apply-templates.

Exhibit 24: XSL With-Param

  <?xml version="1.0" encoding="UTF-8"?>

  <!--
      Document:  withParam.xsl
  -->

      <xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
              <xsl:apply-templates select="tourGuide/city"/>
          </xsl:template>
          <xsl:template match="city">
              <xsl:call-template name="countHotels">
                  <xsl:with-param name="num" select="count(hotel)"/>
              </xsl:call-template>
          </xsl:template>
          <xsl:template name="countHotels">
              <xsl:param name="num" select="''" />
              <xsl:text>City Name: </xsl:text>
              <xsl:value-of select="cityName" />
              <xsl:text>&#xa;</xsl:text>
              <xsl:text>Number of hotels: </xsl:text>
              <xsl:value-of select="$num" />
              <xsl:text>&#xa;&#xa;</xsl:text>
          </xsl:template>
      </xsl:stylesheet>
  • <xsl:template match="city"> Here we match the city nodes that were returned in the apply-templates node set.
  • call-template, as discussed earlier, calls the template named countHotels.
  • The element with-param tells the called template to use the parameter named num, and the select statement sets the expression that will be evaluated.
  • Notice the declaration for the parameter is in the first line of the template. It instantiates num to an empty string, because the value will be replaced by the value of the expression in the with-param element's select attribute.
  • &#xa; outputs a line feed in the result tree to make the output look nicer.

Exhibit 25: Text results – withParam.xsl

     City Name: Belmopan
          Number of hotels: 2
  
     City Name: Kuala Lumpur
          Number of hotels: 2

The Muenchian Method[edit]

The Muenchian Method is a technique developed by Steve Muench for grouping nodes using keys. Keys work by assigning a key value to a node and giving you access to that node through the key value. If there are lots of nodes that have the same key value, then all those nodes are retrieved when you use that key value. Effectively this means that if you want to group a set of nodes according to a particular property of the node, then you can use keys to group them together. To process each group only once, the stylesheet selects just the first node of every group, by comparing the generated ID of each node with that of the first node returned by the key. One of the more common uses for the Muenchian method is grouping items and counting the number of occurrences in that group, such as the number of occurrences of a city:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>      
    <xsl:key name="Count" match="*/city" use="cityName" /> 
    <xsl:template match="cities">         
        <xsl:for-each 
            select="//city[generate-id()=generate-id(key('Count', cityName)[1])]"> 
            <br/><xsl:text>City Name: </xsl:text><xsl:value-of select="cityName"/><br/>
            <xsl:text>Number of Occurrences: </xsl:text>
            <xsl:value-of select="count(key('Count', cityName))"/>
            <br/>
        </xsl:for-each>  
    </xsl:template>   
</xsl:stylesheet>

Text Results – muenchianMethod.xsl

     City Name: Atlanta
          Number of Occurrences: 1
     City Name: Athens
          Number of Occurrences: 1
     City Name: Sydney
          Number of Occurrences: 1

Datatypes[edit]

There are five different datatypes in XSLT: Node-set, String, Number, Boolean, and Result tree fragment. Variables and parameters can be bound to each of these, but the last type is specific to them.

Node-sets are returned everywhere in XSLT. We've seen them returned from apply-templates and for-each elements, and variables. Now we will see how a variable can be bound to a node-set. Examine the following code:

Exhibit 26: Variable bound to a node-set
<xsl:variable name="cityNode"  select="city" />
...
<xsl:template match="/">
     <xsl:apply-templates select="$cityNode/cityName" />
</xsl:template>

Here, we are setting the value of the variable $cityNode to the node-set city from the source tree. The cityName element is a child of city, so the output generated by apply-templates is the text node of cityName. Remember, you can use variable references in expressions but not patterns. This means we cannot use the reference $cityNode as the value of a match attribute.

String types are useful if you are interested only in the text of nodes, rather than in the whole node-set. String types use XPath functions, most notably, string(). This is just a simple example:

Exhibit 27: String types
<xsl:variable name="cityName" select="string('Belmopan')" />

This is, in fact, a longer way of saying:

Exhibit 28: Shorter version of above
<xsl:variable name="cityName" select="'Belmopan'" />

It is also possible to declare a variable that has a number value. You do this by using the XPath function number().

Exhibit 29: Declaration of variable with number value
<xsl:variable name="population" select="number(11100)" />

You can use numeric operators such as + - * / to perform mathematic operations on numbers, as well as some built in XPath functions such as sum() and count().

The Boolean type has only two possible values, true or false. As an example, we are going to use a Boolean variable to test to see if a parameter has been passed into the stylesheet.

Exhibit 30: Boolean variable to test
<xsl:param name="isOk" select="''" />
<xsl:template match="city">
     <xsl:choose>
          <xsl:when test="boolean($isOk)">
               …logic here…
          </xsl:when>
          <xsl:otherwise>
               Error: must use parameter isOk with any value to apply template
          </xsl:otherwise>
    </xsl:choose>
</xsl:template>

We start with an empty-string declaration for the parameter isOk. In the test attribute of when, the boolean() function tests the value of isOk. If the value is an empty string, as we defined by default, boolean() evaluates to false(), and the template is not instantiated. If it does have a value, and it can be any value at all, boolean() evaluates to true().

The final datatype is the result tree fragment. Essentially it is a chunk of text (a string) that can contain markup. Let's look at an example before we dive into the details:

Exhibit 31: Result tree fragment datatype
  <xsl:variable name="fragment">     
  <description>Belmopan is the capital of Belize</description>      
  </xsl:variable>

Notice we didn't use the select attribute to define the variable. We aren't selecting a node and getting its value; rather, we are creating arbitrary markup, so we declared it as the content of the element. Whatever appears between the opening and closing variable tags is the actual fragment of the result tree. In general, if you use the select attribute as we did earlier, the variable element itself is empty and its value is whatever the expression returns. If you don't use select and you do specify content, the value is a result tree fragment. You can perform operations on it as if it were a string, but unlike a node-set, you can't use operators such as / or // to get to the nodes inside it. The way you retrieve the content from the variable and get it into the result tree is by using the copy-of element. Let's see how we would do this:

Exhibit 32: Retrieve and place into result tree
   <xsl:template match="city">
        <xsl:copy-of select="cityName" />      
        <xsl:copy-of select="$fragment" />       
   </xsl:template>

The result tree would now contain two elements: a copy of the cityName element and the added element, description.

EXSLT[edit]

EXSLT is a set of community developed extensions to XSLT. The modules include facilities to handle dates and times, math, and strings.
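As a small sketch of what using an EXSLT module looks like, the stylesheet below calls date:date-time() from the dates-and-times module. Whether the function is available depends on the processor, so the call is guarded with the standard function-available() test; the output wording is invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:date="http://exslt.org/dates-and-times">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:choose>
            <!-- Only call the extension function if the processor has it -->
            <xsl:when test="function-available('date:date-time')">
                <xsl:text>Generated: </xsl:text>
                <xsl:value-of select="date:date-time()"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>EXSLT dates-and-times module not available</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
        <xsl:text>&#xa;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The other modules are used the same way: declare the module's namespace on the stylesheet element and call its functions with the chosen prefix.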

Multiple Stylesheets[edit]

In previous chapters, we have imported and used multiple XML and schema documents. It is also possible to use multiple stylesheets using the import and include elements, which should be familiar. It is also possible to process multiple XML documents at a time, in one stylesheet, by using the XSLT function document().

Including an external stylesheet is very similar to what we have done in earlier chapters with schemas. The include element only has one attribute, which is href. It is required and always contains a URI (Uniform Resource Identifier) reference to the location of the file, which can be local (in the same local directory) or remote. You can include as many stylesheets as you need, as long as they are at the top level. They can be scattered all over the stylesheet if you want, as long as they are children of the <stylesheet> element. When the processor encounters an instance of include, it replaces the instance with all the elements from the included document, including template rules and top-level elements, but not the root <stylesheet> element. All the items just become part of the stylesheet tree itself, and the processor treats them all the same. Here are declarations for including a local and remote stylesheet:

Exhibit 33: Declarations for local and remote stylesheet
<xsl:include href="city.xsl" />
<xsl:include href="http://www.somelocation.com/city.xsl"/>

Since include returns all the elements in the included stylesheet, you need to make sure that the stylesheet you are including does not include your own stylesheet. For example, city.xsl cannot include city_hotel.xsl, if city_hotel.xsl has an include element which includes city.xsl. When including multiple files, you need to make sure that you are not including another stylesheet multiple times. If city_hotel.xsl includes amenity.xsl, and country.xsl includes amenity.xsl, and city.xsl includes both city_hotel.xsl and country.xsl, it has indirectly included amenity.xsl twice. This could cause template rule duplication and errors. These are some confusing rules, but they are easy to avoid if you carefully examine the stylesheets before they are included.

The difference between importing stylesheets and including them is that the template rules imported each have a different import precedence, while included stylesheet templates are merged into one tree and processed normally. Imported templates form an import tree, complete with the root <stylesheet> element so the processor can track the order in which they were imported. Just like include, import has one attribute, href, which is required and should contain the URI reference for the document. It is also a top-level element and can be used as many times as needed. However, import elements must come before all other children of the <stylesheet> element, otherwise there will be errors. This code demonstrates importing a local stylesheet:

Exhibit 34: Importing local stylesheet
<xsl:import href="city.xsl" />

The order of the import elements dictates the precedence that matching templates will have over one another. Templates that are imported last have higher import precedence than those that are imported first, and templates in the importing stylesheet have higher precedence than any imported template. The template element also has a priority attribute: the higher the number, the higher the priority, but it is only used to choose between matching templates that have the same import precedence. Import precedence only comes into effect when templates collide; otherwise importing stylesheets is not that much different from including them. Another way to handle colliding templates is to use the apply-imports element. If a template in the importing document overrides a template in the imported document, apply-imports, used inside the overriding template, causes the overridden imported template to be invoked as well.
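A sketch of apply-imports in use follows. It assumes that city.xsl contains a template matching cityName; the div wrapper and its class name are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- import must come before any other top-level element -->
    <xsl:import href="city.xsl"/>
    <xsl:output method="html"/>

    <!-- Overrides the cityName template assumed to exist in city.xsl -->
    <xsl:template match="cityName">
        <div class="city">
            <!-- Run the imported template for this same node -->
            <xsl:apply-imports/>
        </div>
    </xsl:template>
</xsl:stylesheet>
```

This lets the importing stylesheet add to the behaviour of an imported template rather than replace it outright. If city.xsl had no matching template, the built-in rule would be used instead and the city name would appear as plain text inside the div.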

The document() function allows you to process additional XML documents and their nodes. The function is called from any attribute that uses an expression, such as the select attribute. For example:

Exhibit 35: Document() function
<xsl:template match="hotel">
     <xsl:element name="amenityList">
          <xsl:copy-of select="document('amenity.xml')" />
     </xsl:element>
</xsl:template>

When applied to an xml document that only contains an empty hotel element, such as <hotel></hotel>, the result tree will add a new element called amenityList, and place all the content from amenity.xml (except the XML declaration) in it. The document function can take many other parameters such as a remote URI, and a node-set, just to name a few. For more information on using document(), visit http://www.w3.org/TR/xslt#document
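The nodes of a document loaded with document() can be processed like any other node-set. The sketch below lists the amenity names under each hotel; it assumes that amenity.xml contains amenityName elements somewhere in its tree and that the source document contains hotel elements with a hotelName child.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Load the second document once and keep its root node in a variable -->
    <xsl:variable name="amenities" select="document('amenity.xml')"/>

    <xsl:template match="/">
        <xsl:apply-templates select="//hotel"/>
    </xsl:template>

    <xsl:template match="hotel">
        <xsl:value-of select="hotelName"/>
        <xsl:text>&#xa;</xsl:text>
        <!-- Walk the nodes of the external document -->
        <xsl:for-each select="$amenities//amenityName">
            <xsl:text>  - </xsl:text>
            <xsl:value-of select="."/>
            <xsl:text>&#xa;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

A relative URI given as a string, as here, is resolved relative to the location of the stylesheet, not of the source document.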

XSL-FO[edit]

XSL-FO stands for Extensible Stylesheet Language Formatting Objects and is a language for formatting XML data. When it was created, XSL was originally split into two parts, XSL and XSL-FO. Both parts are now formally named XSL. XSL-FO documents define a number of rectangular areas for displaying output. XSL-FO is used for the formatting of XML data for output to screen, paper or other media, such as PDF format. For more information, visit http://www.w3schools.com/xslfo/default.asp

Summary[edit]

XML stylesheets can output XML, text, HTML or XHTML. When an XSL processor transforms an XML document, it converts it to a result tree of nodes, each of which can be manipulated, extracted, created, or set aside, depending on the rules contained in the stylesheet. The root element of a stylesheet is the <stylesheet> element. Stylesheets contain top-level and instruction elements. Templates use XPath patterns to match nodes in the source tree, and then apply defined rules to the nodes when they find a match. Templates can be named, have a mode, or a priority. Node sets from the source tree can be sorted or formatted. XSLT uses the if and choose elements for conditional processing and the for-each element for iteration. XSLT also supports the use of variables and parameters. There are five basic datatypes: a node-set, a string, a number, a Boolean, and a result tree fragment. A stylesheet can also include or import additional stylesheets or even additional XML documents. XSL-FO is used for formatting data into rectangular objects.

Reference Section

Exhibit 36: XSL Elements (from http://www.w3schools.com/xsl/xsl_w3celementref.asp and http://www.w3.org/TR/xslt#element-syntax-summary)

Element Description Category
apply-imports Applies a template rule from an imported stylesheet instruction
apply-templates Applies a template rule to the current element or to the current element's child nodes instruction
attribute Adds an attribute instruction
attribute-set Defines a named set of attributes top-level-element
call-template Calls a named template instruction
choose Used in conjunction with <when> and <otherwise> to express multiple conditional tests instruction
comment Creates a comment node in the result tree instruction
copy Creates a copy of the current node (without child nodes and attributes) instruction
copy-of Creates a copy of the selected nodes (with child nodes and attributes) instruction
decimal-format Defines the characters and symbols to be used when converting numbers into strings, with the format-number() function top-level-element
element Creates an element node in the output document instruction
fallback Specifies an alternate code to run if  the processor does not support an XSLT element instruction
for-each Loops through each node in a specified node set instruction
if Contains a template that will be applied only if a specified condition is true instruction
import Imports the contents of one stylesheet into another. Note: An imported stylesheet has lower precedence than the importing stylesheet top-level-element
include Includes the contents of one stylesheet into another. Note: An included stylesheet has the same precedence as the including stylesheet top-level-element
key Declares a named key that can be used in the stylesheet with the key() function top-level-element
message Writes a message to the output (used to report errors) instruction
namespace-alias Replaces a namespace in the stylesheet to a different namespace in the output top-level-element
number Determines the integer position of the current node and formats a number instruction
otherwise Specifies a default action for the <choose> element instruction
output Defines the format of the output document top-level-element
param Declares a local or global parameter top-level-element
preserve-space Defines the elements for which white space should be preserved top-level-element
processing-instruction Writes a processing instruction to the output instruction
sort Sorts the output instruction
strip-space Defines the elements for which white space should be removed top-level-element
stylesheet Defines the root element of a stylesheet top-level-element
template Rules to apply when a specified node is matched top-level-element
text Writes literal text to the output instruction
transform Defines the root element of a stylesheet top-level-element
value-of Extracts the value of a selected node instruction
variable Declares a local or global variable top-level-element or instruction
when Specifies an action for the <choose> element instruction
with-param Defines the value of a parameter to be passed into a template instruction

Exhibit 37: XSLT Functions (from http://www.w3schools.com/xsl/xsl_functions.asp)

Name Description
current() Returns the current node
document() Used to access the nodes in an external XML document
element-available() Tests whether the element specified is supported by the XSLT processor
format-number() Converts a number into a string
function-available() Tests whether the function specified is supported by the XSLT processor
generate-id() Returns a string value that uniquely identifies a specified node
key() Returns a node-set using the index specified by an <xsl:key> element
system-property() Returns the value of the system properties
unparsed-entity-uri() Returns the URI of an unparsed entity

Exhibit 38: Inherited XPath Functions (from http://www.w3schools.com/xsl/xsl_functions.asp)
Node Set Functions

Name Description Syntax
count() Returns the number of nodes in a node-set number=count(node-set)
id() Selects elements by their unique ID node-set=id(value)
last() Returns the position number of the last node in the processed node list number=last()
local-name() Returns the local part of a node. A node usually consists of a prefix, a colon, followed by the local name string=local-name(node)
name() Returns the name of a node string=name(node)
namespace-uri() Returns the namespace URI of a specified node uri=namespace-uri(node)
position() Returns the position in the node list of the node that is currently being processed number=position()

String Functions


Name Description Syntax & Example
concat() Returns the concatenation of all its arguments string=concat(val1, val2, ..)

Example:
concat('The',' ','XML')
Result: 'The XML'

contains() Returns true if the second string is contained within the first string, otherwise it returns false bool=contains(val,substr)

Example:
contains('XML','X')
Result: true

normalize-space() Removes leading and trailing white space from a string and collapses each run of internal white space into a single space string=normalize-space(string)

Example:
normalize-space(' The   XML ')
Result: 'The XML'

starts-with() Returns true if the first string starts with the second string, otherwise it returns false bool=starts-with(string,substr)

Example:
starts-with('XML','X')
Result: true

string() Converts the value argument to a string string(value)

Example:
string(314)
Result: '314'

string-length() Returns the number of characters in a string number=string-length(string)

Example:
string-length('Beatles')
Result: 7

substring() Returns a part of the string in the string argument string=substring(string,start,length)

Example:
substring('Beatles',1,4)
Result: 'Beat'

substring-after() Returns the part of the string in the string argument that occurs after the substring in the substr argument string=substring-after(string,substr)

Example:
substring-after('12/10','/')
Result: '10'

substring-before() Returns the part of the string in the string argument that occurs before the substring in the substr argument string=substring-before(string,substr)

Example:
substring-before('12/10','/')
Result: '12'

translate() Takes the value argument, replaces each character that appears in string1 with the character at the same position in string2, and returns the modified string string=translate(value,string1,string2)

Example:
translate('12:30',':','!')
Result: '12!30'

Number Functions


Name Description Syntax & Example
ceiling() Returns the smallest integer that is not less than the number argument number=ceiling(number)

Example:
ceiling(3.14)
Result: 4

floor() Returns the largest integer that is not greater than the number argument number=floor(number)

Example:
floor(3.14)
Result: 3

number() Converts the value argument to a number number=number(value)

Example:
number('100')
Result: 100

round() Rounds the number argument to the nearest integer integer=round(number)

Example:
round(3.14)
Result: 3

sum() Returns the total value of a set of numeric values in a node-set number=sum(nodeset)

Example:
sum(/cd/price)

Boolean Functions


Name Description Syntax & Example
boolean() Converts the value argument to Boolean and returns true or false bool=boolean(value)
false() Returns false false()

Example:
number(false())
Result: 0

lang() Returns true if the language argument matches the language of the context node, as specified by its xml:lang attribute, otherwise it returns false bool=lang(language)
not() Returns true if the condition argument is false, and false if the condition argument is true bool=not(condition)

Example:
not(false())
Result: true

true() Returns true true()

Example:
number(true())
Result: 1
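
The examples in Exhibit 38 can be tried outside a stylesheet. The following is a minimal sketch, assuming the XPath 1.0 engine bundled with the Java runtime (javax.xml.xpath); the class name XPathFunctionDemo and its eval method are made up for this illustration and do not come from the exhibit's sources.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical helper class for experimenting with XPath 1.0 functions.
class XPathFunctionDemo {

    // Evaluates an XPath expression against an empty document and
    // returns the result converted to a string.
    static String eval(String expression) {
        try {
            Document empty = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, empty);
        } catch (ParserConfigurationException | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // A few expressions taken from the tables above.
        String[] expressions = {
            "concat('The',' ','XML')",
            "normalize-space(' The   XML ')",
            "substring('Beatles',1,4)",
            "substring-after('12/10','/')",
            "translate('12:30',':','!')",
            "contains('XML','X')"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + eval(expression));
        }
    }
}
```

Because none of these expressions refers to nodes, an empty document is enough as the evaluation context; functions such as sum() or count() would need a document with actual content.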

Exercises

In order to learn more about XSL and stylesheets, exercises are provided.

Answers

In order to learn more about XSL and stylesheets, answers are provided.



Cocoon







Learning objectives

  • Understand the function of Cocoon
  • Create a working sitemap
  • Make available a stylesheet-formatted XML document
  • Create a simple Cocoon form
  • Create a simple XSP

sponsored by:

The University of Georgia

Terry College of Business

Department of Management Information Systems



Introduction

Cocoon is a product of the Apache Software Foundation. It is a powerful server heavily based on Java and XML technology. While it does have a command line interface, most users will be able to do everything they need to with it simply through careful editing of a few configuration files, formatted as XML documents. If you want to see some examples of what Cocoon can do, go to http://MIST5730.terry.uga.edu:8080/cocoon/.

Assumptions

This tutorial assumes that the user has access to an installation of Cocoon on Terry’s Blaze server. If you do not have this access, simply replace file locations and access methods with those provided by your server administrator. Some of the programs described may be Windows-only; if you are a Macintosh or Linux user you will need to find a suitable replacement, although such utilities are often included with the operating system. jEdit is a free text editor that, with the proper plugins, can read and save files on an FTP or SFTP server as easily as on a hard disk and can handle many different types of files. It is available for Windows, Macintosh, some Linux distributions and as a platform-independent Java application at http://www.jedit.org/.

The Sitemap

The primary Cocoon file to be concerned with is sitemap.xmap, located in the root Cocoon directory. It uses XML tags to define things such as different ways to present data, the location of important files, identification of browsers, and the most important aspect, pipelines. The default xmap will be fine for our purposes, and we will only need to look at the last few lines of it, where pipeline matches are defined. This section begins at the tag <map:pipeline>. A pipeline match looks like this:

<map:match pattern="test">
	<map:generate type="file" src="content/test.xml"/>
	<map:transform type="xslt" src="stylesheets/test.xslt"/>
	<map:serialize type="html"/>
</map:match>

Let’s look at what each line does. The first line tells Cocoon to watch for someone browsing to http://blaze.terry.uga.edu:8080/cocoon/otc/test. When this happens, the actions on the next three lines take place. Cocoon takes the information from the file test.xml within the content directory and applies the stylesheet test.xslt from the stylesheets directory. It formats the result as an HTML page, as specified on the fourth line. Cocoon can use different serializers to format data as an HTML or XHTML page, Flash object, PDF, or even OpenOffice document. Unlike when working with XML for other purposes, no XSD schema is needed – simply create and populate fields in the XML file as necessary.
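
The generate, transform and serialize steps of a pipeline match can be imitated outside Cocoon. The following is a minimal sketch, assuming the standard JAXP transformation API that ships with Java rather than Cocoon's own classes; the class name PipelineSketch and the contents of the two inline documents, which stand in for content/test.xml and stylesheets/test.xslt, are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical stand-in for one pipeline match; this is not Cocoon code.
class PipelineSketch {

    // Invented sample content playing the role of content/test.xml.
    static final String XML = "<greeting>Hello</greeting>";

    // Invented stylesheet playing the role of stylesheets/test.xslt.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<html><body><h1><xsl:value-of select='greeting'/></h1></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run(String xml, String xslt) {
        try {
            // "transform": compile the stylesheet
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter html = new StringWriter();
            // "generate" reads the XML source; "serialize" writes the HTML result
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(XML, XSLT));
    }
}
```

In Cocoon the same three roles are declared in the sitemap instead of being coded, which is why editing sitemap.xmap is usually all that is needed.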

Cocoon Forms

Cocoon forms, or CForms, are a way to use XML structure to create validating form field objects and then arrange them in a template for use. The primary advantage of CForms over using HTML forms is that fields can be validated either with built-in functionality or simple XML attributes. There are several elements required for this. A definition XML file, which holds the fields, called "widgets":

<fd:field id="email" required="true">
  <fd:label>Email address:</fd:label>
  <fd:datatype base="string"/>
  <fd:validation>
    <fd:email/>
  </fd:validation>
</fd:field>


A template XML file calls on these widgets, adding HTML code to help with look and feel:

<br/>
<ft:widget-label id="email"/>
<ft:widget id="email"/>

A JavaScript file that controls the flow of data from one file to the next:

function registration() {
    // build the form from its definition and show it until it validates
    var form = new Form("registration_definition.xml");
    form.showForm("registration-display-pipeline");
    // pass the submitted name on to the success page
    var viewData = { "username" : form.getChild("name").getValue() };
    cocoon.sendPage("registration-success-pipeline", viewData);
}

Pipelines in the sitemap that also control flow:

<map:match pattern="registration">
  <map:call function="registration"/>
</map:match>
...
<map:match pattern="registration-display-pipeline">
  <map:generate type="jx" src="registration_template.xml"/>
  <map:transform type="i18n">
    <map:parameter name="locale" value="en-US"/>
  </map:transform>
  <map:transform src="forms-samples-styling.xsl"/>
  <map:serialize/>
</map:match>
...
<map:match pattern="registration-success-pipeline">
  <map:generate type="jx" src="registration_success.jx"/>
  <map:serialize/>
</map:match>

An XSP can be used in this flow in order to pass submissions to a database.

XSPs

XSPs function similarly to JSPs and servlets - they are server-side applications that can support many users at once. Unlike JSPs and servlets, XSPs can use XML tags to accomplish much of their functionality, although they can also use Java code between <xsp:logic></xsp:logic> tags. One good use for XSPs is passing information to a database or recalling and displaying stored data. While JSPs and servlets have to either call a specific database connector or contain all of the code for connecting within them, Cocoon has a configuration file which holds this information, and XSPs just call the name of the database as specified in WEB-INF/cocoon.xconf:

<esql:pool>dbname</esql:pool>

XSP code to enter data from a form might look like this:

  <esql:execute-query>
  <esql:query>
  INSERT into otc_users (name,email,password,age,spam) values  ('<xsp:expr>esc_name</xsp:expr>','<xsp-request:get-parameter name="email"/>','<xsp-request:get-parameter name="password"/>','<xsp-request:get-parameter name="age"/>','<xsp-request:get-parameter name="spam"/>')
  </esql:query>
  </esql:execute-query>

Exercises

  1. Create a basic XML file and accompanying html stylesheet. Upload them into the proper folders (content and stylesheets respectively) on the Blaze server, and write a pipeline match that would enable you to view the XML content with your stylesheet applied in a browser. Files and match pattern should be named after your own name, for example Bob Jones would use “bjones.” It is not necessary to upload the pipeline code - simply browse to http://blaze.terry.uga.edu:8080/cocoon/otc/yourname and it should be visible.
  2. Follow along with the CForms example located at http://cocoon.apache.org/2.1/userdocs/basics/sample.html. Create and implement at least one widget of your own making. You can view this at work by browsing to http://blaze.terry.uga.edu:8080/cocoon/cforms/registration.
  3. Browse to opt/tomcat5/webapps/cocoon/cforms on Blaze. Examine sitemap-modified.xmap to see how the pipelines could be modified to pass CForm data to an XSP. Test.xsp shows how that data could be inserted into or called from a database.

Appendix - Accessing the Blaze server

When you have an account set up on the Blaze server, there are several steps you will need to take in order to be able to work with files in the Cocoon directory. Generally, new user accounts are set up with the user’s UGA MyId as the username, and social security number as the password. This password must be changed at the user’s first login, which requires an SSH client. UGA students can download Secure Shell Utilities 3.1 at http://sitesoft.uga.edu/. This download installs two programs, Secure Shell Client and Secure File Transfer Client.

Open the Secure Shell client and click the “Quick Connect” button located near the top of the window. In the resulting window, enter “blaze.terry.uga.edu” as the Host Name, and your specified username as User Name. Port Number should be set to “22”, and Authentication Method should be “Passworded”. Click “Connect”. In the resulting window, enter your given password and click “Ok”. You may see a window asking to save the new host key, click “Yes”. You will now be presented with a text box. It will notify you that your password has expired and must be changed. You will need to enter your given password once, hit enter, enter your desired new password, hit enter, and again enter your desired new password and hit enter. Be aware that nothing you type will show up for security purposes, and you will not be able to delete any typos - you'll have to log in and start over if you mess up. This is all we will be using the Secure Shell Client application for; you can click the “Disconnect” button in the row of small buttons at the top of the screen, and then exit the program.

In order to actually access files on the Blaze server, the Secure File Transfer Client is used. Open it and click the “Quick Connect” button located near the top of the window. Enter the same Host Name and User Name as with the Secure Shell Client, make sure the other settings are the same, and click “Connect.” Enter your new password when asked. You will be presented with a Windows Explorer-type screen where you can browse through the files on the Blaze server. To access our Cocoon installation go to the “opt” folder, then the “tomcat5” folder, then the “webapps” folder, then the “cocoon” folder. Most of our work will be done in the “otc” folder within. To download a file for editing, simply highlight it and click the “Download” button in the row of small buttons at the top of the screen. Once you select a download location and click “Download,” you can open the file in your editor of choice. To upload a file to the server, simply do the reverse – click the “Upload” button in the row of small buttons at the top of the screen, select a file to upload, and click “Upload,” which will put the file in the folder you are currently viewing on the Blaze server.



Parsing XML files






Learning objectives

  • Understand the concept of parsing XML files
  • Use different APIs for processing XML files
  • Be aware of the differences between different approaches for parsing XML files
  • Decide when to use a particular technique


Earlier chapters explained in detail how to create XML files. This involved developing XML documents, style sheets and schemas, and validating them. In this chapter, we will focus on different approaches for parsing XML files and when to use them.

But first, it is time to refresh what we have learned about parsing.

The Process of Parsing XML files

One goal of the XML format was to enhance raw data formats like plain text by including detailed descriptions of the meaning of the content. Now, in order to be able to read XML files, we use a parser which basically exposes the document’s content through a so-called API (application programming interface). In other words, a client application accesses the content of the XML document through an interface, instead of having to interpret the XML code on its own!

Simple Text Parsing

One way to extract data from an XML document is simple text parsing, that is, scanning all the characters in the document and checking for a desired pattern:


<house>
<value><int>150,000</int></value>
</house>

Let’s say we are interested in the value of the house. Using straight text parsing, we would scan the file for the character sequence <value><int> and call it the start pattern. Then, we would further scan the document for the end pattern (i.e. </int></value>). Finally, we declare the text string in between these two patterns to be the value of the surrounding <house> tag.
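
The scan described above can be sketched in a few lines. This is only an illustration of the straight text-parsing idea under the assumption that the whole document is already held in a string; the class name NaiveTextParser and its extract method are hypothetical.

```java
// Hypothetical illustration of "straight text parsing"; not a real XML parser.
class NaiveTextParser {

    // Returns the text between the first start pattern and the end pattern
    // that follows it, or null when either pattern cannot be found.
    static String extract(String document, String startPattern, String endPattern) {
        int start = document.indexOf(startPattern);
        if (start < 0) {
            return null;
        }
        int valueStart = start + startPattern.length();
        int end = document.indexOf(endPattern, valueStart);
        if (end < 0) {
            return null;
        }
        return document.substring(valueStart, end);
    }

    public static void main(String[] args) {
        String xml = "<house>\n<value><int>150,000</int></value>\n</house>";
        System.out.println(extract(xml, "<value><int>", "</int></value>"));

        // The same data with a single space between the tags is no longer found.
        String reformatted = "<house>\n<value> <int>150,000</int> </value>\n</house>";
        System.out.println(extract(reformatted, "<value><int>", "</int></value>"));
    }
}
```

The second call returns null even though the document carries the same information, which hints at how fragile pattern matching on raw text is.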

Why it doesn't work that way

Obviously, this approach is not suitable for extracting information from large and complex XML documents, since we would have to know exactly what the file looks like and where the information needed is located. From a more general point of view, the structure and semantics of an XML file are determined by the makeup of the document, its tags and attributes – hence, we need a device that is able to recognize and understand this structure and can point out any errors in it. Moreover, it has to provide the content of the document through an interface, so that other applications can access it without difficulty. This device is known as an XML parser.

What a parser does

Almost all programs that need to process XML documents use an XML parser to extract the information stored in the XML document in order to avoid any of the difficulties that occur when reading and interpreting raw XML data. The parser usually is a class library (e.g. a set of Java class files) that reads a given document and checks if it is well-formed according to the W3C specification. Then, any client software can use methods of the interface provided by the parser API to access the information the parser retrieved from the XML file.

All in all, the parser shields the user from dealing with the complex details of XML like assembling information distributed over several XML files, checking for well-formedness constraints, and so on.
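
The well-formedness check can be observed directly. The following is a small sketch, assuming the JAXP parser that comes with Java; the class WellFormedCheck and its method are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper showing that the parser, not the client, detects broken XML.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // thrown when the document violates a well-formedness rule
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser could not be set up", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<house><value>150,000</value></house>"));
        System.out.println(isWellFormed("<house><value>150,000</house>"));
    }
}
```

The second document is rejected because the value element is never closed; the client code only sees the outcome and never has to inspect the markup itself.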

Parsing: an Example

To illustrate more clearly what parsing an XML file really means, the following example contains information about some cities and keeps track of who is on vacation where. It is used to demonstrate the parsing process with the currently most common parsing methods.

Example: cities.xml

<?xml version="1.0" encoding="UTF-8" ?>
<cities>
<city vacation="Sam">
<cityName>Atlanta</cityName>
<cityCountry>USA</cityCountry> 
</city>
<city vacation="David">
<cityName>Sydney</cityName>
<cityCountry>Australia</cityCountry> 
</city>
<city vacation="Pune">
<cityName>Athens</cityName>
<cityCountry>Greece</cityCountry> 
</city>
</cities>

Based on the information stored in this XML document, we can easily check who is on vacation and where. The parser will read the file using one of the various techniques presented later in this chapter.

This process is very complicated and prone to errors of all kinds. Luckily, we will never have to write code for it, because there are plenty of free, fully-functional parsers on the Web. All we do is download a parser class library and access the XML document through the interface provided by the parser software. With more recent builds of Java, most parsers do not even have to be downloaded. In other words, we use the functions or methods included in the class library for extracting the information.

Basically, a parser reads the XML document and tries to recognize the structure of the file itself while checking for errors. It simply checks for start/end tags, attributes, namespaces, prefixes, and so on. Then, the client software can access the information derived from this structure using methods provided by the parser software (i.e. the interface).

The best way to learn about the functionality of a parser is to actually use one; therefore, the next section demonstrates the different methods of parsing.

Parser APIs (Application Programming Interface)

Overview

There are two “traditional” approaches that dominate the market right now, an event-based push-model as represented by SAX (Simple API for XML) and a tree-based model using the DOM (document object model) approach.

However, there is a movement towards newer approaches and techniques that try to overcome the flaws inherent in these traditional models – an event-based pull-model and a “cursor model”, such as VTD-XML, which allows us to browse the XML document just like in the tree-based approach, but simpler and easier to use.
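
To give an impression of the event-based pull model, here is a minimal sketch that uses StAX (javax.xml.stream), a pull API included with Java. StAX is chosen here only as an example of the pull style and is not one of the parsers discussed in this chapter; the class name PullSketch is hypothetical, and the input is a shortened form of the cities document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example of the pull style: the application asks for each event.
class PullSketch {

    static List<String> cityNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // next() advances only when the application calls it
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("cityName")) {
                    names.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<cities>"
                + "<city vacation=\"Sam\"><cityName>Atlanta</cityName></city>"
                + "<city vacation=\"David\"><cityName>Sydney</cityName></city>"
                + "</cities>";
        System.out.println(cityNames(xml));
    }
}
```

The loop is ordinary application code, so it can stop early or skip ahead, whereas in the push model the parser decides when each callback runs.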

SAX (Simple API for XML)

Description

The push model, typically exemplified by SAX (www.saxproject.org), is the “gold standard” of XML parsing, since it is probably the most complete and accurate method so far. The SAX classes provide an interface between the input streams from which XML documents are read and the client software which receives the data made available by the parser. The parser browses through the whole document and fires events every time it recognizes an XML construct (e.g. it recognizes a start tag and fires an event – the client software is notified and can use this information… or not).

Evaluation

The advantage of such a model is that we don’t need to store the whole XML document in memory, since we are only reading one piece of information at a time. If you recall that the XML structure is a set of nodes of various types (like an element node) – parsing the document with a SAX parser means going through each node one at a time. This makes it possible to read even very large XML documents in a memory-efficient way. However, the fact that the parser only provides information about the node currently read also implies that the programmer of the client software is in charge of saving certain information in a separate data structure (e.g. the parents or children of the currently processed node). Moreover, the SAX approach is pretty much read-only, since it is hard to modify the XML structure when we do not have some sort of global view.

In fact, the parser is in control of what is read when. The user can only wait until a certain event has occurred and then use the information stored in the currently processed node.
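
Since a SAX parser reports only the current node, any context has to be recorded by the handler. The following is a minimal sketch of that bookkeeping, assuming the JAXP SAX parser bundled with Java; the class PathTrackingHandler is a hypothetical example that keeps a stack of open elements so that each start tag can be reported together with its ancestors.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that remembers the ancestors of the current element.
class PathTrackingHandler extends DefaultHandler {

    // elements that have been opened but not yet closed, outermost first
    private final Deque<String> openElements = new ArrayDeque<>();
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.addLast(qName);
        paths.add("/" + String.join("/", openElements));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();
    }

    static List<String> pathsOf(String xml) {
        PathTrackingHandler handler = new PathTrackingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.paths;
    }

    public static void main(String[] args) {
        String xml = "<cities><city><cityName>Atlanta</cityName></city></cities>";
        pathsOf(xml).forEach(System.out::println);
    }
}
```

Only the chain of open elements is kept in memory, not the document itself, so the memory advantage of SAX is preserved.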

Example: TGSAXParser.java

As mentioned before, the best way to fully understand the concept of the parsing process is to actually use it. In the following code sample, the information about the name and country of the cities that people are vacationing in will be displayed. The SAX API that is part of the Xerces parser package was used for the implementation (see the Xerces 2 homepage):


// import the basic SAX API classes
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

public class TGSAXParser extends DefaultHandler
{
    public boolean onVacation = false;

    // what to do when a start-element event was triggered
    public void startElement(String uri, String name, String qName, Attributes atts)
    {
        // "vacation" is an attribute of <city>; it is null on every other element.
        // cityName and cityCountry are child elements, not attributes, so their
        // text arrives later through characters().
        String vacationer = atts.getValue("vacation");

        // if the start tag is "city" and someone is on vacation there,
        // set the onVacation flag and start the output line
        if (qName.equals("city") && (vacationer != null))
        {
            onVacation = true;
            System.out.print("\n" + vacationer + " is on vacation in ");
        }
    }

    /**This method is used to stop printing information once the element has
    *been read.  It will also reset the onVacation variable for the next
    *element.
    */
    public void endElement(String uri, String name, String qName)
    {
        //reset flag
        if (qName.equals("city"))
        {
            onVacation = false;
        }
    }

    /**This method is triggered to store and print the values between
    *the XML tags.  It will only print those values if onVacation == true.
    */
    public void characters(char[] ch, int start, int length)
    {
        if (onVacation)
        {
            for (int i = start; i < start + length; i++)
            System.out.print(ch[i]);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("People on vacation in the following cities:");

        try
        {
            // create a SAX parser from the Xerces package
            XMLReader xml = XMLReaderFactory.createXMLReader();
            TGSAXParser handler = new TGSAXParser();
            xml.setContentHandler(handler);
            xml.setErrorHandler(handler);
            FileReader r = new FileReader("cities.xml");
            xml.parse(new InputSource(r));
        }
        catch (SAXException se)
        {
            System.out.println("XML Parsing Error: " + se);
        } 
        catch (IOException io) 
        {
            System.out.println("File I/O Error: " + io);
        }
    }
}

The DefaultHandler: As mentioned before, SAX is completely event-driven. Therefore, we need a handler that “listens” to the input stream coming from the input file (cities.xml in this case).

The SAX API provides handler interfaces and a convenience class, DefaultHandler, that implements them with empty methods. To plug our own code into the SAX API, we extend the DefaultHandler class and register an instance of our class as the content handler. Our handler overrides three methods: startElement, endElement and characters.

The startElement() and endElement() methods: These methods are invoked whenever the SAX parser finds a start or end tag respectively. The SAX API provides blank stubs for both methods and we have to fill them with code of our own.

In this case, we want our program to do something whenever the vacation attribute is set, so we set a Boolean variable to true whenever we find such an element and process the node by printing out the character sequence in between the start and end tag. The characters() method is called automatically whenever the parser encounters character data between tags, but it prints the character string only if the onVacation flag is set.

DOM (Document Object Model)

Description

The other popular approach is the tree-based model as represented by the DOM (document object model, see W3C Recommendation). This method actually works similarly to a SAX parser, since it reads the XML document from an input stream by browsing through the file and recognizing XML structures.

This time, instead of returning the content of the document in a series of small fragments, the DOM method maps the XML hierarchy to a DOM tree object that contains everything from the original XML document. Everything in the document, including elements, comments, textual information and processing instructions, is stored in the tree object as nodes, starting with the document itself as the root node.

Now that all the information we need is stored in memory, we access the data by using methods provided by the parser software to read or modify objects within the tree. This facilitates random access to the content of the XML document and provides the possibility to modify the data it contains or even create new XML files by transforming a DOM back to an XML document.
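
The read, modify and write-back cycle can be sketched as follows. This is a minimal example that assumes the JAXP DOM parser and identity transformer included with Java rather than a direct Xerces installation; the class DomRoundTrip and its method are hypothetical, and the input is a shortened form of the cities document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example: change an attribute in the tree and write XML back out.
class DomRoundTrip {

    static String renameVacationer(String xml, String oldName, String newName) {
        try {
            // build the in-memory tree
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // random access: visit the city elements and edit them in place
            NodeList cities = doc.getElementsByTagName("city");
            for (int i = 0; i < cities.getLength(); i++) {
                Element city = (Element) cities.item(i);
                if (oldName.equals(city.getAttribute("vacation"))) {
                    city.setAttribute("vacation", newName);
                }
            }

            // turn the modified tree back into an XML document
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cities><city vacation=\"Sam\"><cityName>Atlanta</cityName></city></cities>";
        System.out.println(renameVacationer(xml, "Sam", "Dana"));
    }
}
```

An edit like this is awkward with SAX, because the handler never holds more than the current node.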

Evaluation

The major downside of this approach is that it requires much more memory and is therefore not suitable for situations where large XML files are used. More importantly, it is somewhat more complex than the simple SAX method, even for small and simple problems.

Example: MyDOMParser.java

In the following code sample, a list of cities with people on vacation is again created but this time with the tree-based approach:

// import all necessary DOM API classes
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MyDOMParser {
    public static void main(String[] args) {
        System.out.println("People on vacation in the following cities:");
        try {
            // creates a DOM parser object
            DOMParser parser = new DOMParser();
            parser.parse("cities.xml");

            // stores the tree object in a variable
            Document doc = parser.getDocument();

            // returns a list of all city elements in my city list
            NodeList list = doc.getElementsByTagName("city");

            // now, for every element in the city list, check if the
            // "vacation" attribute is set and if yes, print out the
            // information about the vacationer.
            for (int i = 0, length = list.getLength(); i < length; i++) {
                Element city = (Element) list.item(i);
                Attr vacationer = city.getAttributeNode("vacation");
                if (vacationer != null) {
                    String v = vacationer.getValue();
                    System.out.print(v + " is vacationing in ");

                    // grab the city name and country from the children of
                    // this city element; searching the whole document
                    // would always return the first city
                    Node cityname = city.getElementsByTagName("cityName").item(0);
                    Node country = city.getElementsByTagName("cityCountry").item(0);
                    System.out.println(cityname.getTextContent() + ", "
                            + country.getTextContent());
                }
            }
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

parser.getDocument(): After the XML document has been parsed, the parser object holds the resulting tree. In order to work with the DOM object, we store it in a variable of type org.w3c.dom.Document.

Then we create a list of nodes holding all elements with the tag name city. The parser finds these nodes by browsing through the DOM tree. We then go through each of the city elements, check whether the vacation attribute is set and, if so, display the information about the vacationer.

getTextContent(), a DOM Level 3 method that Xerces implements, lets us directly read the text of an element node, avoiding the difficulties that arise from unneeded white space and the like.
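
The same traversal can also be sketched with the JAXP classes that ship with the JDK, so that no vendor-specific parser class appears in the code. This is an illustration under assumptions: the class VacationList, its helper textOfFirst and the sample data are invented here, and the element names are assumed to be nested as in the chapter's cities.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class VacationList {

    // Returns one line per city element that carries a "vacation" attribute.
    static List<String> vacationers(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList cities = doc.getElementsByTagName("city");
            for (int i = 0; i < cities.getLength(); i++) {
                Element city = (Element) cities.item(i);
                if (!city.hasAttribute("vacation")) {
                    continue;
                }
                // Search below the current city, not below the document root.
                String name = textOfFirst(city, "cityName");
                String country = textOfFirst(city, "cityCountry");
                lines.add(city.getAttribute("vacation") + " is vacationing in "
                        + name + ", " + country);
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return lines;
    }

    // Text content of the first descendant with the given tag name, or "".
    private static String textOfFirst(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        String xml = "<cities>"
                + "<city><cityName>Athens</cityName><cityCountry>Greece</cityCountry></city>"
                + "<city vacation=\"Peter\"><cityName>Lima</cityName><cityCountry>Peru</cityCountry></city>"
                + "</cities>";
        vacationers(xml).forEach(System.out::println);
    }
}
```

Calling getElementsByTagName on the city element restricts the search to that city's subtree, which keeps each vacationer paired with the right city.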

Summary

Choosing an API at the beginning of your XML project is a very important decision. Once you decide which one to use, it is easy to try different vendors without much trouble, but switching to a different API will be a very time-consuming and costly process, since you will have to redesign your whole program code.

The SAX API is widely accepted, works well, is easy to implement and is especially suited to streaming content (e.g. an online XML source). Because it is a read-only API, you cannot modify the underlying XML data source. Since it only reads one node at a time, it is very memory-efficient and fast. However, this implies that your application expects the information to be close together and ordered.

If you want to randomly access the entire document at any point in time, then the DOM approach might be a better choice for you. The DOM API is more complex and harder to implement, but it gives you full control over the whole document and also lets you modify the data. However, it reads the whole XML document into memory, so the DOM API is not suitable for projects with very large XML files.
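
To make the contrast concrete, here is a small sketch of the event-based style using the SAX classes bundled with the JDK. The class name SaxVacationers and the sample data are assumptions made for this illustration; the handler sees each start tag once, in document order, and nothing is kept in memory unless the handler stores it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxVacationers {

    // Collects the value of every "vacation" attribute found on a city element.
    static List<String> vacationers(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Called once for each start tag as the parser streams by.
                if ("city".equals(qName)) {
                    String who = attrs.getValue("vacation");
                    if (who != null) {
                        names.add(who);
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        String xml = "<cities>"
                + "<city><cityName>Athens</cityName></city>"
                + "<city vacation=\"Peter\"><cityName>Lima</cityName></city>"
                + "</cities>";
        System.out.println(vacationers(xml));
    }
}
```

Reading the city name as well would require the handler to remember state between events, which is exactly the bookkeeping that the DOM approach trades for memory.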

Exercise

Recommended optional exercise

Use the code samples for the SAX and DOM parsers from this chapter and play around with them. You may want to print out different nodes or add more constraints. This is entirely optional, but it will give you an idea of the main differences between SAX and DOM.

Now for the exercise

  • Create a SAX parser to parse the file movies.xml. The output only needs to appear in your IDE; it does not need to be sent to a web page.


To help you, download this; it provides a structure for the problem so that you can more easily run the app in NetBeans 5.0.

If you’re interested in using Xerces, just download the following file:

           http://www.apache.org/dist/xml/xerces-j/Xerces-J-bin.2.8.0.zip

If the above link is dead, go to http://www.apache.org/dist/xml/xerces-j/ and download the latest zip binary file. Its name should have the format "Xerces-J-bin.#.#.#.zip".

Then put the content into the \lib\ext subfolder of your NetBeans directory and start the NetBeans IDE. The Xerces package is now installed on your machine.

Useful Links

  1. http://www.cafeconleche.org
  2. http://www.xml.com
  3. http://www.xmlpull.org
  4. http://workshop.bea.com/xmlbeans/reference/com/bea/xml/XmlCursor.html


If this text appears blue, the answers to the examples to this page may be found by clicking here.



XUL






Learning objectives
  • Get a brief overview of what XUL is.
  • Learn about the basic tag/widget library.
  • Create some simple, static XUL web pages.
  • Add event handlers to a XUL page.

Introduction

XUL (pronounced "zool", rhyming with "cool") stands for XML User Interface Language. It is an XML-based user interface language originally developed for use in the Netscape browser and now maintained by Mozilla. It is a part of Mozilla Firefox and many other Mozilla applications, and is available as part of Gecko, the rendering engine developed by Mozilla. In fact, XUL is powerful enough that the entire user interface of the Firefox application is implemented in XUL.

Like HTML, in XUL you can create an interface using a relatively simple markup language, define the appearance with CSS style sheets, and use JavaScript to manipulate behavior. Unlike HTML, however, XUL provides a rich set of user interface widgets to create, for example, menus, toolbars and tabbed panels.

To put it in simple terms, XUL can be used to create lightweight, cross-platform, cross-device user interfaces.

Many applications are built on features of one specific platform, which makes building cross-platform software time-consuming and costly. Some users may also want to use an application on devices other than traditional computers, such as small handhelds. Some cross-platform solutions already exist; Java, for example, was created for just such a purpose. However, creating GUIs with Java is cumbersome at best. XUL, by contrast, was designed for building portable user interfaces easily and quickly. It is available on most versions of Windows, Mac OS X, Linux and Unix. Yahoo! currently uses XUL and related technologies for its Yahoo! tool bar (a Firefox extension) and Photomail application.

To illustrate XUL’s potential, this chapter works through a few examples. Potential is the correct word here: the full capabilities of XUL are beyond the scope of this chapter, which is designed to give the reader a first look at the power of XUL. Note that you’ll need a Gecko-based browser (such as Firefox or the Mozilla Suite) or XULRunner to work with XUL.

The Basics

XUL is XML, and like all good XML files, a good XUL file begins with the standard XML version declaration. XUL currently uses XML version 1.0.

To make your XUL page look good, you must include a global stylesheet in it. The URI of the default stylesheet is chrome://global/skin/. While you can load as many stylesheets as you like, it is best practice to load the global stylesheet first. Look at Fig. 1, the listing below. Notice the reference to “chrome”. ‘The chrome is the part of the application window that lies outside of a window's content area. Toolbars, menu bars, progress bars, and window title bars are all examples of elements that are typically part of the chrome.’(1) Chrome is the descriptive term used to name all of the elements in a XUL application. Think of it like the chrome on the outside of a car: it’s what catches your eye. The elements in a XUL file are what you see in the browser window.

A XUL document must declare the XUL namespace on its root element. The developers of XUL have provided a namespace URI that shows where they came up with the name XUL. (For the uninitiated, the reference is to the movie ‘Ghostbusters’.)

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
<window
   id="window identifier"
   title="XUL page"
   orient="horizontal"
   xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
   . . . (add elements here)
</window>

The next thing to note is the <window> tag. This tag is analogous to the <body> tag in HTML: all the elements live inside it. In Fig. 1 the window tag has three attributes that are very important. The ‘id’ attribute identifies the window so that scripts can refer to it. The title attribute is not required, but it is good practice to provide a descriptive name; its value is displayed in the title bar of the window. The orient attribute tells the browser in what direction to lay out the elements described in the XUL file. Horizontal lays the elements out in succession across the window; vertical stacks them in a column. Vertical is the default value, so if you do not declare this attribute you’ll get vertical orientation.
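
Because a XUL file is ordinary XML, the attributes just described can be read with any XML parser. The following sketch parses a window skeleton with the JDK's namespace-aware DOM parser and reports id, title and orient, falling back to the vertical default described above. The class XulWindowInfo and its output format are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XulWindowInfo {
    static final String XUL_NS =
            "http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul";

    // Returns a one-line summary of the <window> root element.
    static String describe(String xul) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // needed for getNamespaceURI()
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xul)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XUL", e);
        }
        Element root = doc.getDocumentElement();
        if (!XUL_NS.equals(root.getNamespaceURI())
                || !"window".equals(root.getLocalName())) {
            throw new IllegalArgumentException("root element is not a XUL window");
        }
        // getAttribute returns "" when the attribute is absent.
        String orient = root.getAttribute("orient");
        if (orient.isEmpty()) {
            orient = "vertical";   // the default described in the text
        }
        return "id=" + root.getAttribute("id")
                + ", title=" + root.getAttribute("title")
                + ", orient=" + orient;
    }

    public static void main(String[] args) {
        String xul = "<?xml version=\"1.0\"?>"
                + "<?xml-stylesheet href=\"chrome://global/skin/\" type=\"text/css\"?>"
                + "<window id=\"hello\" title=\"XUL page\" xmlns=\"" + XUL_NS + "\"/>";
        System.out.println(describe(xul));
    }
}
```

A root element without the XUL namespace is rejected here, which mirrors why the xmlns declaration belongs on every window.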

As was stated earlier, a XUL document is used to create user interfaces. UIs are generally full of interactive components such as text boxes, buttons and the like. A XUL document accomplishes this with the use of widgets, which are self-contained components with pre-defined behavior. For example, buttons respond to mouse clicks and menu bars can hold buttons. All the normally accepted actions of GUI components are built into the widgets. There is already a rich library of predefined widgets, but because this is open source, anyone can define a widget or a set of widgets for themselves.

The widgets are ‘disconnected’ until they are programmed to work together. This can be done simply with JavaScript, or a more complex application can be made using something like C++ or Java. In this chapter we will use JavaScript to illustrate XUL’s uses and potential.

Also, a XUL file should have a .xul extension. The Mozilla browser will automatically recognize it and know what to do with it when you click on it. Optionally, an .xml extension could be used, but you would have to open the file from within the browser.

One more thing needs to be mentioned. There are a few syntax rules to follow and they are:

  • All events and attributes must be written in lowercase.
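  • All attribute values must be quoted. Double quotes are the usual convention, although XML also accepts single quotes, as some examples in this chapter show.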
  • Every XUL widget must use close tags (either <tag></tag> or <tag/>) to be well-formed.
  • All attributes must have a value.
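
Several of these rules are plain XML well-formedness rules, so a generic XML parser can check them. The sketch below, with an invented class name WellFormedCheck, asks the JDK's DOM parser whether a snippet is well-formed. Note that it cannot check the lowercase rule, which is a XUL convention rather than an XML one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // True if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // the parser rejected the input
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or input reading failed", e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<button label=\"Find\"/>",   // closed tag, quoted value
            "<button label='Find'/>",     // single quotes are also accepted
            "<button label=\"Find\">",    // missing close tag
            "<button label=Find/>",       // unquoted attribute value
            "<button disabled/>",         // attribute without a value
            "<Button></button>"           // tag names are case-sensitive
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```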

A First Example

What better way to start than with the good old ‘Hello World’ example? Open up a text editor (not MS Word) like Notepad or TextPad and type in:


<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>

<window
id="Hello"
title="Hello World Example"
orient="vertical"
persist="screenX screenY width height"
xmlns= "http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">

<description style='font-size:24pt'>Hello World</description>
<description value='Hello World' style='font-size:24pt'/>
<label value = 'Hello World'  style='font-size:24pt'/>
</window>

Save it anywhere, but be sure to give the file the .xul extension. Now just double-click on it and it should open in your Mozilla or Netscape browser. You should see ‘Hello World’ three times, one above the other. Notice the different ways that ‘Hello World’ was printed: twice from a description tag and once from a label tag. Both <description> and <label> are text-related tags. In the first example the text is written as the content of the description element, so it does not have to be assigned to a ‘value’ attribute. In the second and third examples the text is given as the ‘value’ attribute of the description or label tag, respectively. You can see here that the orient attribute of the window is set to ‘vertical’; that is why the text is output in a column. If orient were set to ‘horizontal’, all the text would be on one line. Try it.

Now let’s start adding some more interesting elements.

Adding Widgets

As stated earlier, XUL has an existing rich library of elements fondly called widgets. These include buttons, text boxes, progress bars, sliders and a host of other useful items. One good listing is the XUL Programmer's Reference.

Let us take a look at some simple buttons. Enter the following code in Notepad or another text editor that is not MS Word.

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>

<window
id="findfile-window"
title="Find Files"
orient="horizontal"
xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">

<button id="find-button" label="Find" default="true"/>
<button id="cancel-button" label="Cancel"/>

</window>


Save it and give the file the .xul extension. Open a Mozilla or Netscape browser and open the file from the browser. You should see a "Find" button and a "Cancel" button. From here it is possible to add more functionality and build up elaborate interfaces.

All of these things need a place to live. Much as the <body> tag does in HTML, the <box> tag in XUL houses the widgets; in other words, boxes are containers that encapsulate other elements. There are a number of different <box> types. In this example we’ll use <hbox>, <vbox>, <toolbox> and <tabbox>.

<hbox> and <vbox> correspond to the attributes orient="horizontal" and orient="vertical", respectively, as used on the <window> tag. By using these two boxes, discrete sections of the window can have their own orientation. These two elements can hold all of the other elements and can even be nested.

The tags <toolbox> and <tabbox> serve special purposes. <toolbox> is used to create tool bars at the top or bottom of the window while <tabbox> sets up a series of tabbed sheets in the window.

Take the XUL framework from Fig. 1 and replace ". . .( add elements here)" with a <vbox> tag pair (that's both open and close tags). This will be the outside container for the rest of the elements. Remember, the <vbox> means that elements will be positioned vertically in order of appearance. Add the attribute 'flex="1"'. This will make the menu bar extend all the way across the window.

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>

<window
id="findfile-window"
title="Find Files"
orient="horizontal"
xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">

<vbox flex="1">
    (... add elements here)
</vbox>

</window>

The 'flex' attribute needs some explanation since it is a primary way of sizing and positioning the elements on a page. Flex is a dynamic way of sizing widgets in a window: it controls how the space left over in a box is shared among its children. An element without a flex attribute keeps its natural size, while elements with flex grow to fill the remaining space in proportion to their flex values, so a widget with flex="2" receives twice as much of the extra space as a sibling with flex="1". All elements have size attributes, such as width and/or height, that can be set to an exact number of pixels, but using flex ensures the same relative sizing and positioning when a window is resized.
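
The proportional sharing can be illustrated with a little arithmetic. The following sketch is a deliberately simplified model and not the layout algorithm Gecko actually uses; the class FlexModel, its method and the pixel figures are invented for this example.

```java
import java.util.Arrays;

class FlexModel {

    // Every child first gets its natural size; whatever space is left over
    // is then divided among the children in proportion to their flex values.
    static int[] distribute(int available, int[] naturalSizes, int[] flex) {
        int used = 0;
        int totalFlex = 0;
        for (int i = 0; i < naturalSizes.length; i++) {
            used += naturalSizes[i];
            totalFlex += flex[i];
        }
        int leftover = Math.max(0, available - used);
        int[] result = new int[naturalSizes.length];
        for (int i = 0; i < result.length; i++) {
            int share = totalFlex == 0 ? 0 : leftover * flex[i] / totalFlex;
            result[i] = naturalSizes[i] + share;
        }
        return result;
    }

    public static void main(String[] args) {
        // Three 50-pixel children in a 300-pixel box with flex 0, 1 and 2.
        int[] sizes = distribute(300, new int[] {50, 50, 50}, new int[] {0, 1, 2});
        System.out.println(Arrays.toString(sizes));   // prints [50, 100, 150]
    }
}
```

In this model the 150 leftover pixels are split 50 to 100, so the child with flex="2" grows twice as much as the child with flex="1", while the child without flex stays at its natural size.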

Now put a pair each of <toolbox> and <tabbox> tags inside the <vbox> tags, with <toolbox> first. As was said, <toolbox> is used to create tool bars, so let's add a menu bar similar to the one at the top of the browser.

This is the code so far:

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>

<window
id="findfile-window"
title="Find Files"
orient="horizontal"
xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">

<vbox flex="1">

<toolbox>

<menubar id="MenuBar">
<menu id="File" label="File" accesskey="f">
<menupopup id="FileMenu">
<menuitem label="New" accesskey="n"/>
<menuitem label="Open..." accesskey="o"/>
<menuitem label="Save" accesskey="s"/> 
<menuitem label="Save As..." accesskey="s"/>  
<menuitem label=" ... "/> 
<menuseparator/>
<menuitem label="Close" accesskey="c" />
</menupopup>
</menu>

<menu id="Edit" label="Edit" accesskey="e">
<menupopup id="EditMenu">
<menuitem label="Cut" accesskey="t" acceltext="Ctrl + X"/>
<menuitem label="Copy" accesskey="c"  acceltext="Ctrl + C"/>
<menuitem label="Paste" accesskey="p" disabled="true"/>
</menupopup>
</menu>

<menu id="View" label="View" accesskey="v">
<menupopup id="ViewMenu">
<menuitem id="Tool Bar1" label="Tool Bar1"
type="checkbox" accesskey="1" checked="true"/>
<menuitem id="Tool Bar2" label="Tool Bar2"
type="checkbox" accesskey="2" checked="false"/>
</menupopup>
</menu>
</menubar>

</toolbox>

<tabbox>

</tabbox>

</vbox>

</window>


There should now be a menu bar with “File Edit View” in it and they should each expand when you click on them. Let’s examine the elements and their attributes more closely to see how they work.

First, the <menubar> holds all of the menus (File, Edit, View). Next there are the three menus themselves, each with its own set of elements and attributes. The <menupopup> does just what it says: it creates the popup menu that appears when the menu label is clicked. The popup menu contains the list of menu items. Most of these have an 'accesskey' attribute, which underlines the given letter and makes it a hot key for that menu item. Notice that in the Edit menu both 'Cut' and 'Copy' have accelerator text labels. In the File menu there is a <menuseparator/> tag, which places a line across the menu that acts as a visual separator. In the Edit menu, the menu item labeled 'Paste' has the attribute disabled="true", which causes the Paste label to be grayed out. Finally, in the View menu the menu items are actually checkboxes: the first one is checked by default and the second one is not.
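
In the File menu of the listing above, 'Save' and 'Save As...' both declare accesskey="s", which may make the hot key ambiguous. Since the menu definition is XML, such clashes can be found mechanically. The following sketch, with an invented class name AccessKeyCheck and a shortened copy of the menu as sample input, reports access keys that occur more than once below the same menupopup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AccessKeyCheck {

    // Lists every access key used by more than one menuitem below a menupopup.
    static List<String> sharedAccessKeys(String xul) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xul)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XUL", e);
        }
        List<String> findings = new ArrayList<>();
        NodeList popups = doc.getElementsByTagName("menupopup");
        for (int p = 0; p < popups.getLength(); p++) {
            Element popup = (Element) popups.item(p);
            Map<String, List<String>> labelsByKey = new TreeMap<>();
            // Simplification: descendants in nested submenus are included too.
            NodeList items = popup.getElementsByTagName("menuitem");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String key = item.getAttribute("accesskey").toLowerCase(Locale.ROOT);
                if (!key.isEmpty()) {
                    labelsByKey.computeIfAbsent(key, k -> new ArrayList<>())
                            .add(item.getAttribute("label"));
                }
            }
            for (Map.Entry<String, List<String>> entry : labelsByKey.entrySet()) {
                if (entry.getValue().size() > 1) {
                    findings.add(popup.getAttribute("id") + ": '" + entry.getKey()
                            + "' is shared by " + entry.getValue());
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        String xul = "<menu id=\"File\" label=\"File\" accesskey=\"f\">"
                + "<menupopup id=\"FileMenu\">"
                + "<menuitem label=\"New\" accesskey=\"n\"/>"
                + "<menuitem label=\"Save\" accesskey=\"s\"/>"
                + "<menuitem label=\"Save As...\" accesskey=\"s\"/>"
                + "</menupopup></menu>";
        sharedAccessKeys(xul).forEach(System.out::println);
    }
}
```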

Now on to the <tabbox>. Let's make three different sheets with different elements on them. Replace the empty <tabbox> tag pair with this code:

<tabbox flex="1">
<tabs>
   <tab id="Tab1" label="Sheet1" selected="true"/>
   <tab id="Tab2" label="Sheet2"/>
   <tab id="Tab3" label="Sheet3"/>
</tabs>

<tabpanels flex="1">
   <tabpanel flex="1" id="Tab1Sheet" orient="vertical" >
   <description style="color:teal;">
      This doesn't do much.
      Just shows some of the style attributes.
   </description>
   </tabpanel>

   <tabpanel flex="1" id="Tab2Sheet" orient="vertical">
   <description class="normal">
      Hey, the slider works (for free).
   </description>
   <scrollbar/>
   </tabpanel>

   <tabpanel flex="1" id="Tab3Sheet" orient="vertical">
   <hbox>
      <text value="Progress Meter" id="txt" style="display:visible;"/>
      <progressmeter id="prgmeter" mode="undetermined"
         style="display:visible;" label="Progress Bar"/>		  
   </hbox>
   <description value="Wow, XUL! I mean cool!"/>   
   </tabpanel>
</tabpanels>
</tabbox>


The tabs are first defined with <tab>; each is given an id and a label. Next, a set of associated panels is created, each with different content. The first panel shows that, as in HTML, styles can be applied inline. The other two panels contain component-type elements. See how the slider works and how the progress bar runs on its own.

XUL has several elements for creating list boxes. A list box displays items in the form of a list, and any item in the list can be selected. XUL provides two types of elements to create lists: a listbox element to create multi-row list boxes, and a menulist element to create drop-down list boxes.

The simplest list box uses the listbox element for the box itself, and the listitem element for each item. For example, this list box will have four rows, one for each item.

<listbox>
  <listitem label="Butter Pecan"/>
  <listitem label="Chocolate Chip"/>
  <listitem label="Raspberry Ripple"/>
  <listitem label="Squash Swirl"/>
</listbox>

As with the HTML option element, a value can be assigned using the value attribute. The list box is given a default size, but you can change its height with the rows attribute: set it to the number of rows to display in the list box. If the box is too small to show all items, a scroll bar appears automatically so that the user can reach the rest of the list.

<listbox rows="3">
  <listitem label="Butter Pecan" value="bpecan"/>
  <listitem label="Chocolate Chip" value="chocchip"/>
  <listitem label="Raspberry Ripple" value="raspripple"/>
  <listitem label="Squash Swirl" value="squash"/>
</listbox>

Assigning a value to each listitem lets you refer to the items later from a script. This way, other elements can reference these items and use them for other purposes.

All these elements are very nice and easy to put into a window, but by themselves they don't do anything. Now we have to connect things with some other code.

Adding Event Handlers and Responding to Events

To make things really useful, some type of scripting or application-level coding has to be done. In our example, JavaScript will be used to add functionality to the components. This is done in much the same way as scripting with HTML: an event handler is associated with an element, and some action is initiated when that handler is activated. Most of the handlers used with HTML are also found in XUL, in addition to some unique ones. Scripts can be written inline in the XUL file, but a more efficient way is to create a separate file with the needed scripts inside of it. This allows the page to load faster since the rendering engine doesn’t have to decide what to do with the embedded script tags.

That being said, we’ll first add a simple script, in line, as a first example.

Let’s add an ‘onclick’ event handler to fire an alert box when an element is selected. Inside the <window> tag add the line beginning with onclick:


<window
   onclick="alert(event.target.tagName); return false;"
   id="findfile-window"
   title="Find Files"
   orient="horizontal"
   xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
       (... add elements here)
</window>


Now when you click on any element in the window, an alert box pops up telling you the tag name of the element. One interesting thing to note: when you click on the text enclosed by the description tag the response is undefined, but when you click on the text wrapped by the label tag you get the tag name, label.

This does not mean that description is not an element. A likely explanation is that the click lands on the text node inside the description tag, and text nodes have no tagName property. After playing with the alert box, delete that line and add this inside the opening tag of the ‘Close’ menu item in the ‘File’ menu:

oncommand="window.close()"


Now when you click on ‘Close’ or use ‘C’ as a hot key, the entire window will close. The oncommand event handler is actually preferred over onclick because oncommand can handle hot keys and other non-mouse events.

Let’s try one more thing. Add this right after the opening <window> tag.

<script>
function show()
{
  var meter=document.getElementById('prgmeter');
  meter.setAttribute("style","display: visible;");
  var tx=document.getElementById('txt');
  tx.setAttribute("style","display: visible;");
}

function hide()
{
  var meter=document.getElementById('prgmeter');
  meter.setAttribute("style","display: none;");
  var tx=document.getElementById('txt');
  tx.setAttribute("style","display: none;");
}

</script>


These two functions first retrieve references to the progress meter and the text element using their ids. Then each function sets the style attribute of both elements to a display of 'visible' or ‘none’, which shows or hides the two elements. (Strictly speaking, 'visible' is not a defined value of the CSS display property; the browser ignores the invalid declaration, so the elements fall back to their default display and reappear.) The tabpanel containing the progress meter has to be selected in order to see these actions.

Now add two buttons that will fire these two functions. First, add a new box element to hold the buttons. The width attribute of the box needs to be set; otherwise the buttons will be laid out to extend the length of the window.

<box width="200px">
  <button id="show" label="Show" default="true" oncommand="show();"/>
  <button id="hide" label="Hide" oncommand="hide();"/>
</box>


Style Sheets

Style sheets may be used both for creating themes and for modifying elements to build a more elaborate user interface. XUL uses CSS (Cascading Style Sheets) for this. A style sheet is a file that contains style information for elements; it makes it possible to apply certain fonts, colors, borders, and sizes to the elements of your choice. Mozilla applies a default style sheet to each XUL window. So far, this is the style sheet that has been used for all the XUL documents:

<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>


That line gives the XUL document the default chrome://global/skin/ style sheet. In Mozilla, this is resolved to the file global.css, which contains default style information for XUL elements. The file will still display if this line is left out, but it will not be as aesthetically pleasing. The style sheet applies theme-specific fonts, colors and borders to make the elements look more suitable. Style sheets can make a file look better, but adding styles does not always help: some CSS properties, such as those that change the size or margins, do not affect the appearance of a widget. In XUL, the "flex" attribute should be used instead of specific sizes. There are other cases in which CSS does not apply, but they may be too advanced for this tutorial.

To use a style sheet that you have already made, you just have to insert one extra line of code pointing to your CSS file.

<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
<?xml-stylesheet href="findfile.css" type="text/css"?>


This second line of code references your own style sheet. Since it is loaded after the global one, its rules generally take precedence where both style sheets set the same property with equal specificity. Sometimes the style that comes with the default CSS file is not wanted.
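
The xml-stylesheet lines are XML processing instructions, and a DOM parser exposes them as children of the document node, in the order in which they appear. As a rough sketch, the following code lists the href values of all style sheets referenced by a document. The class name StyleSheetList is invented, and the regular expression is a simplification that only handles quoted href pseudo-attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class StyleSheetList {
    // Matches href="..." or href='...' inside the processing instruction data.
    private static final Pattern HREF = Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1");

    // Returns the href of every xml-stylesheet instruction, in document order.
    static List<String> styleSheetHrefs(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        List<String> hrefs = new ArrayList<>();
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.PROCESSING_INSTRUCTION_NODE) {
                continue;
            }
            ProcessingInstruction pi = (ProcessingInstruction) n;
            if ("xml-stylesheet".equals(pi.getTarget())) {
                Matcher m = HREF.matcher(pi.getData());
                if (m.find()) {
                    hrefs.add(m.group(2));
                }
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String xul = "<?xml version=\"1.0\"?>"
                + "<?xml-stylesheet href=\"chrome://global/skin/\" type=\"text/css\"?>"
                + "<?xml-stylesheet href=\"findfile.css\" type=\"text/css\"?>"
                + "<window/>";
        System.out.println(styleSheetHrefs(xul));
    }
}
```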

Answers

Conclusion

The examples shown in this chapter merely scratch the surface of XUL’s capabilities. Even though these examples are very simple, one can see how easy it would be to create more complex UIs with XUL. With a complete set of the standard components such as buttons and text boxes at the programmer’s disposal, the programmer can code anything in XUL that can be coded in HTML. The cross-platform ability of XUL is another bonus, but the fact that it doesn’t work with Microsoft’s Internet Explorer may suppress XUL’s widespread use. There is some hope that, due to the delay in the development of the next version of IE, XUL may find its way into IE, but don’t hold your breath.

References

  1. 'Configurable Chrome' by Dave Hyatt (hyatt@netscape.com) (Last Modified 4/7/99)
  2. XML User Interface Language (XUL) - The Mozilla Organization
  3. XulPlanet
  4. XUL Programmer's Reference Manual, Fifth Draft: Updated for XUL 1.0



AJAX






AJAX is one of the most commonly used terms of the Web 2.0 era. Its historical origins are not entirely clear (similar logic for manipulating parts of a web page already existed as DHTML long before the term AJAX was coined, and later on it surprisingly even used some type of DOM), but it is now one of the most important technologies used by modern web designers.
But what does AJAX mean? In short, AJAX stands for Asynchronous JavaScript and XML. It describes a concept of asynchronous data transfer (here: data encapsulated in XML) between the client (usually a web browser) and a server in order to exchange or alter only a part of the web page without a full page reload. That means the browser issues an XMLHttpRequest in the background and receives only a part of the page, usually tied to one or more HTML elements identified by unique ids.

The following are the main components of the Ajax programming pattern.

  • JavaScript - The most popular scripting language on the Web and supported by all major browsers. Ajax applications are built in JavaScript.
  • Document Object Model (DOM) - Defines the structure of a web page as a set of programmable objects. In Ajax programming, the DOM allows us to redraw portions of the page.
  • Cascading Style Sheets (CSS) - Provides a way to define the visual appearance of elements on a web page.
  • XMLHttpRequest - Allows a client-side script to perform an HTTP request, effectively eliminating a full-page refresh or postback in Ajax applications.
  • XML - It is sometimes used as the format for transferring data between the server and client, but other text-based formats work as well.

Ajax offers a number of advantages. Some of the most important are listed below.

  • Lower bandwidth demand: Because a page does not have to be reloaded completely when additional information is requested, data transfer can be minimized. Bandwidth demand is also reduced by producing HTML locally within the browser (note: because embedding the JavaScript adds overhead of its own, this only holds when more than one or two pages are requested from the same site).
  • No browser plug-in necessary: Ajax runs in every browser that supports JavaScript; no additional plug-in is needed. This is an advantage over technologies like Shockwave or Flash (note: in some cases, however, browsers may behave differently; IE prior to version 6 in particular is known for odd behaviour).
  • Separation of data and formats: This allows the web application to be more efficient. Programmers can separate the methods and formats for delivering information over the web, so they can use a language they are familiar with (note: the reason for this is CSS and not AJAX).
  • More user-friendly websites: Because of the minimized data transfer, the response to the user's actions is much faster. Furthermore, interfaces built with Ajax can be more user-friendly (note: however, AJAX without a fallback to the plain-vanilla HTML request-response cycle is infamous for being a big obstacle to barrier-free web design).

Classic Request-Response vs. AJAX Cycle

As you can see in the image, the AJAX cycle embeds an additional JavaScript library on the client side. This JS library is used both to communicate with the server (when AJAX is in use) and to manipulate the HTML page it is embedded in. The traditional approach, on the other hand, always requires a full request-response cycle that sends the whole page from the server to the browser. As a small example, we'll now take a look at a so-called AutoComplete (we look at the basic processing and skip the detailed JS-DOM manipulation).

A Simple Example: AutoComplete

This example shows a simple AutoComplete text field from the Wicket examples (Wicket is a component-oriented Java web framework under the umbrella of the ASF - http://wicket.apache.org/). The example is live online here, so you can not only follow the code but also see it in action.

[Figure: Xml-autocomplete.png]


The idea behind an AutoComplete text field is to aid users by showing useful possibilities while they fill in the field. Imagine you are at amazon.com looking for a product "foo" and you type it into the search bar, only to find out after submitting that it doesn't exist; with an AutoComplete-aware field you would have known that after a few letters. To keep the example simple, we will look at a single field where you can enter the names of countries like "England", "Germany" or "Austria".

The HTML behind it is rather simple (the necessary JavaScript is automatically provided by Wicket, similar to what pure JS libraries like Prototype do; you could of course provide your own implementation, even if this would not really make sense):


... header containing CSS + HTML-head left out...
The textfield below will autocomplete country names. It utilizes AutoCompleteTextField in wicket-extensions.<br/><br/>

        <form wicket:id="form">
            Country: <input type="text" wicket:id="ac" size="50"/>
        </form>
...footer left out...

So far we only have a simple form holding a plain <input />. The "wicket:id" only ties it to the Java code and has no impact on AJAX (in fact, in production mode it will be stripped out).

The Java is also not too complicated:

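import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

import org.apache.wicket.extensions.ajax.markup.html.autocomplete.AutoCompleteTextField;
import org.apache.wicket.markup.html.form.Form;
import org.apache.wicket.model.Model;
import org.apache.wicket.util.string.Strings;

// BasePage comes from the Wicket examples project and is not shown here
public class AutoCompletePage extends BasePage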
{
    /**
     * Constructor of the AutoCompletePage
     */
    public AutoCompletePage()
    {
        Form form = new Form("form");
        add(form);

        final AutoCompleteTextField field = new AutoCompleteTextField("ac", new Model(""))
        {
            protected Iterator getChoices(String input)
            {
                if (Strings.isEmpty(input))
                {
                    return Collections.EMPTY_LIST.iterator();
                }

                List choices = new ArrayList(10);

                Locale[] locales = Locale.getAvailableLocales();

                for (int i = 0; i < locales.length; i++)
                {
                    final Locale locale = locales[i];
                    final String country = locale.getDisplayCountry();

                    if (country.toUpperCase().startsWith(input.toUpperCase()))
                    {
                        choices.add(country);
                        if (choices.size() == 10)
                        {
                            break;
                        }
                    }
                }

                return choices.iterator();
            }
        };
        form.add(field);

        ...more Java here, but not needed for this simple example case...

    }
}

So we see here a plain page that gets a Form attached. That Form in turn holds an Ajax-enabled version of a TextField. The method protected Iterator getChoices(String input) is called by an AJAX call (we will see this later) each time a key is pressed in the field, so this method represents the business logic behind the AJAX behaviour. Here we first check whether anything has been entered at all (the user may have deleted the input), and if so, whether any countries start with the letters entered so far (for example, if you enter Aus it will find countries such as Austria and Australia).
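The filtering step inside getChoices() can be tried without Wicket. The following is a minimal sketch of the same prefix match in plain Java; the class name CountryFilter, the method startingWith and the fixed country list are assumptions made for illustration and are not part of wicket-extensions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Wicket: the same case-insensitive
// prefix filter as getChoices(), applied to a list passed in by the caller.
class CountryFilter {

    static List<String> startingWith(String input, List<String> countries, int limit) {
        List<String> choices = new ArrayList<>();
        if (input == null || input.isEmpty()) {
            return choices; // nothing typed yet, so nothing to suggest
        }
        String prefix = input.toUpperCase(Locale.ROOT);
        for (String country : countries) {
            if (country.toUpperCase(Locale.ROOT).startsWith(prefix)) {
                choices.add(country);
                if (choices.size() == limit) {
                    break; // cap the suggestion list, as the page does at 10
                }
            }
        }
        return choices;
    }

    public static void main(String[] args) {
        List<String> countries = Arrays.asList("England", "Germany", "Austria", "Australia");
        System.out.println(startingWith("Aus", countries, 10)); // prints [Austria, Australia]
    }
}
```

Passing the country list in as a parameter, instead of reading Locale.getAvailableLocales() as the page does, keeps the result independent of the locales installed on the machine.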

The resulting WebPage will be this:

<html>
<head>
<script type="text/javascript"><!--/*--><![CDATA[/*><!--*/

var clientTimeVariable = new Date().getTime();

/*-->]]>*/</script>


	...title + css stripped out...
       <script type="text/javascript" src="resources/org.apache.wicket.markup.html.WicketEventReference/wicket-event.js"></script>
       <script type="text/javascript" src="resources/org.apache.wicket.ajax.WicketAjaxReference/wicket-ajax.js"></script>
       <script type="text/javascript" src="resources/org.apache.wicket.ajax.AbstractDefaultAjaxBehavior/wicket-ajax-debug.js"></script>
  
<script type="text/javascript" src="resources/org.apache.wicket.extensions.ajax.markup.html.autocomplete.AutoCompleteBehavior/wicket-autocomplete.js"></script>
<script type="text/javascript" ><!--/*--><![CDATA[/*><!--*/
Wicket.Event.add(window, "domready", function() { new Wicket.AutoComplete('i1','?wicket:interface=:1:form:ac::IActivePageBehaviorListener:1:&amp;wicket:ignoreIfNotActive=true',false);;});
/*-->]]>*/</script>

</head>
<body>
    
   ...head stripped out...
    

		The textfield below will autocomplete country names. It utilizes AutoCompleteTextField in wicket-extensions.<br/><br/>

		<form action="?wicket:interface=:1:form::IFormSubmitListener::" method="post" id="i2"><div style="display:none"><input type="hidden" name="i2_hf_0" id="i2_hf_0" /></div>
			
			Country: <input value="" autocomplete="off" type="text" size="50" name="ac" onchange="var 
                        wcall=wicketSubmitFormById('i2', '?wicket:interface=:1:form:ac::IActivePageBehaviorListener:3:&amp;wicket:ignoreIfNotActive=true', null,null,null, function() 
                        {return Wicket.$$(this)&amp;&amp;Wicket.$$('i2')}.bind(this));;" id="i1"/>
		</form>

		
<script type="text/javascript"><!--/*--><![CDATA[/*><!--*/

window.defaultStatus='Server parsetime: 0.0070s, Client parsetime: ' + (new Date().getTime() - clientTimeVariable)/1000 +  's';

/*-->]]>*/</script>

</body>
</html>

Our HTML is now decorated with a number of JS resources (holding the DOM parser, transformer and so on), and a JS behaviour has been attached to our <input> field through the onchange="..." handler.

If you now start entering some characters such as "au" into the field, the onchange event is triggered and calls the wicketSubmitFormById() method, which issues a call to the server and receives XML:


1  INFO: focus set on i4
2  INFO:
3  INFO: Initiating Ajax GET request on ?wicket:interface=:1:form:ac::IActivePageBehaviorListener:1:&wicket:ignoreIfNotActive=true&q=au&random=0.9530900388300743
4  INFO: Invoking pre-call handler(s)...
5  INFO: Received ajax response (85 characters)
6  INFO:
<ul><li textvalue="Austria">Austria</li><li textvalue="Australia">Australia</li></ul>
7  INFO:

Line 1 shows the focus being set on the field (here with uid i4). After we entered "au" into the field, line 3 shows an AJAX request being issued to the server. Lines 4 and 5 show the pre-call handlers and the receipt of the AJAX response. Line 6 displays the already decoded content of the response: a <ul> holding the two expected countries that start with "au", which the JS libraries loaded in the header then place at the appropriate position on the page.
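In the browser, this response is handled by Wicket's own JavaScript. To show that the payload is ordinary XML, here is a small sketch that reads the same <ul> fragment with the standard Java DOM parser; the class name SuggestionParser and its method are hypothetical and play no part in Wicket.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts the textvalue attribute of every <li>
// in an autocomplete response such as the one shown in the debug log.
class SuggestionParser {

    static List<String> textValues(String responseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            NodeList items = doc.getElementsByTagName("li");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                values.add(((Element) items.item(i)).getAttribute("textvalue"));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String response = "<ul><li textvalue=\"Austria\">Austria</li>"
                + "<li textvalue=\"Australia\">Australia</li></ul>";
        System.out.println(textValues(response)); // prints [Austria, Australia]
    }
}
```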

You have now seen a small, simple example from the big world of AJAX. To really understand it you should watch and use it live (don't forget to click the "wicket AJAX debug" link in the lower right corner, so you can see the communication). Under http://wicketstuff.org/wicket13/ajax/ you'll find plenty more running examples, all with code.



Web Services






Learning objectives

  • To understand web services, and what they can do.
  • To know what SOAP is, and what web services use it for.
  • To know what Web Services Description Language is, and how to read a WSDL file.
  • To know what the Universal Description, Discovery, and Integration (UDDI) standard is.
  • To understand how to use Java to connect to a web service.

sponsored by:

The University of Georgia

Terry College of Business

Department of Management Information Systems

Web Services Overview

Web Services are a new breed of Web application. They are self-contained, self-describing, modular applications that can be published, located, and invoked across the Web. Web Services perform functions, which can be anything from simple requests to complicated business processes. Once a Web Service is deployed, other applications (and other Web Services) can discover and invoke the deployed service. Web Services use XML to describe the request and response, and HTTP as their network transport.

The primary difference between a Web Service and a web application relates to collaboration. Web applications are simply business applications which are located or invoked using web protocols. Similarly, Web Services also perform computing functions remotely over a network. However, Web Services use internet protocols with the specific intent of enabling interoperable machine-to-machine coordination.

Web Services have emerged as a solution to problems associated with distributed computing. Distributed computing is the use of multiple systems to perform a function rather than having a single system perform it. The previous technologies used in distributed computing, primarily Common Object Request Broker Architecture (CORBA) and Distributed Component Object Model (DCOM), had some limitations. For example, neither has achieved complete platform independence or easy transport over firewalls. Additionally, DCOM is not vendor independent, being a Microsoft product.

Some of the primary needs for a distributed computing standard were:

  • Cross-platform support for Business to Business, as well as internal, communication.
  • Concordance with existing Internet infrastructure as much as possible.
  • Scalability, both in number and complexity of nodes.
  • Internationalization.
  • Tolerance of failure.
  • Vendor independence.
  • Suitability for trivial and non-trivial requests.

Over time, business information systems became highly configured and differentiated. This inevitably made system interaction extremely costly and time consuming. Developers began realizing the benefits of standardizing Web Service development. Using web standards seemed to be an intuitive and logical step toward attaining these goals. Web standards already provided a platform independent means for system communication and were readily accepted by information system users.

The end result was the development of Web Services. A Web Service forms a distributed environment, in which objects can be accessed remotely via standardized interfaces. It uses a three-tiered model, defining a service provider, a service consumer, and a service broker. This allows the Web Service to be a loose relationship, so that if a service provider goes down, the broker can always direct consumers to another one. Similarly, there are many brokers, so consumers can always find an available one. For communication, Web Services use open Web standards: TCP/IP, HTTP, and XML based SOAP.

At higher levels, technologies such as XAML and XLANG (transactional support for complex web transactions involving multiple web services) and XKMS (ongoing work by Microsoft and Verisign to support authentication and registration) might be added.

SOAP

SOAP structure

Simple Object Access Protocol (SOAP) is a method for sending information to and from Web Services in an extensible format. SOAP can be used to send information or remote procedure calls encoded as XML. Essentially, SOAP serves as a universally accepted method of communication with web services. Businesses adhere to the SOAP conventions in order to simplify the process of interacting with Web Services.

<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
 <SOAP:Header>

  <!-- SOAP header -->

 </SOAP:Header>
 <SOAP:Body SOAP:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

  <!-- SOAP body -->

 </SOAP:Body>
</SOAP:Envelope>

A SOAP message contains either a request method for invoking a Web Service, or contains response information to a Web Service request.

Adhering to this layout when developing independent Web Services provides notable benefits to the businesses. Due to the fact that Web Applications are designed to be utilized by a myriad of actors, developers want them to be easily adoptable. Using established and familiar standards of communication ultimately reduces the amount of effort it takes users to effectively interact with a Web Service.

The SOAP Envelope is used for defining and organizing the content contained in Web Service messages. Primarily, the SOAP Envelope serves to indicate that the specified document will be used for service interaction. It contains an optional SOAP Header and a SOAP Body. Messages are sent in the SOAP Body, and the SOAP Header is used for sending other information that wouldn't be expected in the body. For example, if the SOAP:actor attribute is present in the SOAP Header, it indicates who the recipient of the message should be.
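The Envelope, Header and Body nesting described above can be reproduced with the standard Java DOM API. The following is a minimal sketch that builds an empty envelope in the SOAP 1.1 namespace used throughout this chapter; the class name EnvelopeSketch and its method are made up for illustration and do not belong to any SOAP toolkit.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds <SOAP:Envelope> containing an empty
// <SOAP:Header> followed by an empty <SOAP:Body>.
class EnvelopeSketch {

    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    static Document emptyEnvelope() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element envelope = doc.createElementNS(SOAP_NS, "SOAP:Envelope");
            Element header = doc.createElementNS(SOAP_NS, "SOAP:Header"); // optional part
            Element body = doc.createElementNS(SOAP_NS, "SOAP:Body");     // carries the message

            doc.appendChild(envelope);
            envelope.appendChild(header); // the header, when present, comes first
            envelope.appendChild(body);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    public static void main(String[] args) {
        Element root = emptyEnvelope().getDocumentElement();
        System.out.println(root.getNodeName());
        System.out.println(root.getFirstChild().getNodeName());
        System.out.println(root.getLastChild().getNodeName());
    }
}
```

A real client would go on to append the request element to the Body and serialize the document before sending it.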

A web service transaction involves a SOAP request and a SOAP response. The example we will be using is a Web Service provided by Weather.gov. The input is a latitude, a longitude, a start date, the number of days of forecast information desired, and the format of the data. The SOAP request will look like this:

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <SOAP-ENV:Envelope
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">

     <SOAP-ENV:Body>
        <m:NDFDgenByDayRequest xmlns:m="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl">
           <latitude xsi:type="xsd:decimal">33.955464</latitude>
           <longitude xsi:type="xsd:decimal">-83.383245</longitude>
           <startDate xsi:type="xsd:date"></startDate>
           <numDays xsi:type="xsd:integer">1</numDays>
           <format>24 hourly</format>
        </m:NDFDgenByDayRequest>
     </SOAP-ENV:Body>

  </SOAP-ENV:Envelope>

The startDate element is left empty so that the service automatically returns the most recent data. No data type is given for the format element because its type is defined in the WSDL document.

The response SOAP looks like this.

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <SOAP-ENV:Envelope
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
     
     <SOAP-ENV:Body>
        <NDFDgenByDayResponse xmlns:SOAPSDK1="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl">
           <dwmlByDayOut xsi:type="xsd:string">.....</dwmlByDayOut>
        </NDFDgenByDayResponse>
     </SOAP-ENV:Body>
     
  </SOAP-ENV:Envelope>

SOAP handles data by encoding it on the sender side and decoding it on the receiver side. The data types handled by SOAP are based on the W3C XML Schema specification. Simple types include strings, integers, floats, and doubles, while compound types are made up of primitive types.

  <element name="name" type="xsd:string" />
  <SOAP:Array SOAP:arrayType="xsd:string[2]">
     <string>Web</string>
     <string>Services</string>
  </SOAP:Array>
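To make the decoding side more concrete, the sketch below maps a handful of XML Schema simple types, as they appear in xsi:type attributes, onto Java values. It assumes the xsd prefix is bound to the XML Schema namespace, as in the messages in this chapter. The class XsdDecoder is hypothetical and covers only a few types; real SOAP toolkits ship their own, far more complete, serializers.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical decoder for a few XML Schema simple types.
// Assumption: the "xsd" prefix is bound to http://www.w3.org/2001/XMLSchema.
class XsdDecoder {

    static Object decode(String xsiType, String text) {
        switch (xsiType) {
            case "xsd:string":
                return text;
            case "xsd:integer":
                return new BigInteger(text);  // xsd:integer has no size limit
            case "xsd:decimal":
                return new BigDecimal(text);  // keeps every digit of the value
            case "xsd:float":
                return Float.valueOf(text);
            case "xsd:double":
                return Double.valueOf(text);
            case "xsd:boolean":
                // XML Schema allows "true"/"false" as well as "1"/"0"
                return "true".equals(text) || "1".equals(text);
            default:
                throw new IllegalArgumentException("Unsupported type: " + xsiType);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("xsd:decimal", "33.955464"));
        System.out.println(decode("xsd:integer", "1"));
        System.out.println(decode("xsd:float", "70.0"));
    }
}
```

BigDecimal and BigInteger are used for xsd:decimal and xsd:integer because neither schema type is bounded the way Java's double and int are.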

Because they are text based and are usually carried over HTTP, SOAP messages generally have no problem getting through firewalls or other barriers. This makes them well suited to passing information to and from web services.

Service Description - WSDL

Web Services Description Language (WSDL) was created to provide information about how to connect to and query a specific Web Service. A WSDL document also adheres to strict formatting and organizational guidelines. However, the methods, parameters, and service information are application specific. Web Services perform different functions and contain independent information, but they are all organized the same way. By creating a standard organizational architecture for these services, developers can effectively invoke and utilize them with little to no familiarization. To use a web service, a developer can follow the design standards of the WSDL to easily determine all the information and procedures associated with its usage.

Essentially, a WSDL document serves as an instruction for interacting with a Web Service. It contains no application logic, giving the service a level of autonomy. This enables users to effectively interact with the service without having to understand its inner workings.

The following is an example of a WSDL file, taken from the Weather.gov forecast web service used in the SOAP examples above, which returns forecast data for a given latitude and longitude.

<?xml version="1.0"?>
<definitions xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" 
xmlns:si="http://soapinterop.org/xsd" xmlns:tns="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
xmlns:typens="http://www.weather.gov/forecasts/xml/DWMLgen/schema/DWML.xsd" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" 
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns="http://schemas.xmlsoap.org/wsdl/" 
targetNamespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl">
 
 <types>
    <xsd:schema targetNamespace="http://www.weather.gov/forecasts/xml/DWMLgen/schema/DWML.xsd">
       <xsd:import namespace="http://schemas.xmlsoap.org/soap/encoding/" />
       <xsd:import namespace="http://schemas.xmlsoap.org/wsdl/" />
       <xsd:simpleType name="formatType">
          <xsd:restriction base="xsd:string">
             <xsd:enumeration value="24 hourly" />
             <xsd:enumeration value="12 hourly" />
          </xsd:restriction>
       </xsd:simpleType>
       <xsd:simpleType name="productType">
          <xsd:restriction base="xsd:string">
             <xsd:enumeration value="time-series" />
             <xsd:enumeration value="glance" />
          </xsd:restriction>
       </xsd:simpleType>
       <xsd:complexType name="weatherParametersType">
          <xsd:all>
             <xsd:element name="maxt" type="xsd:boolean" />
             <xsd:element name="mint" type="xsd:boolean" />
             <xsd:element name="temp" type="xsd:boolean" />
             <xsd:element name="dew" type="xsd:boolean" />
             <xsd:element name="pop12" type="xsd:boolean" />
             <xsd:element name="qpf" type="xsd:boolean" />
             <xsd:element name="sky" type="xsd:boolean" />
             <xsd:element name="snow" type="xsd:boolean" />
             <xsd:element name="wspd" type="xsd:boolean" />
             <xsd:element name="wdir" type="xsd:boolean" />
             <xsd:element name="wx" type="xsd:boolean" />
             <xsd:element name="waveh" type="xsd:boolean" />
             <xsd:element name="icons" type="xsd:boolean" />
             <xsd:element name="rh" type="xsd:boolean" />
             <xsd:element name="appt" type="xsd:boolean" />
          </xsd:all>
       </xsd:complexType>
    </xsd:schema>
</types>

<message name="NDFDgenRequest">  
   <part name="latitude" type="xsd:decimal"/>
   <part name="longitude" type="xsd:decimal" />
   <part name="product" type="typens:productType" />
   <part name="startTime" type="xsd:dateTime" />
   <part name="endTime" type="xsd:dateTime" />
   <part name="weatherParameters" type="typens:weatherParametersType" />
</message>

<message name="NDFDgenResponse">
   <part name="dwmlOut" type="xsd:string" />
</message>

<message name="NDFDgenByDayRequest">  
   <part name="latitude" type="xsd:decimal" />
   <part name="longitude" type="xsd:decimal" />
   <part name="startDate" type="xsd:date" />
   <part name="numDays" type="xsd:integer" />
   <part name="format" type="typens:formatType" />
</message>

<message name="NDFDgenByDayResponse">
   <part name="dwmlByDayOut" type="xsd:string" />
</message>

<portType name="ndfdXMLPortType">
   <operation name="NDFDgen">
      <documentation> Returns National Weather Service digital weather forecast data </documentation>
      <input message="tns:NDFDgenRequest" />
      <output message="tns:NDFDgenResponse" />
   </operation>
   <operation name="NDFDgenByDay">
      <documentation> Returns National Weather Service digital weather forecast data summarized over either 24- or 12-hourly periods </documentation>
      <input message="tns:NDFDgenByDayRequest" />
      <output message="tns:NDFDgenByDayResponse" />
   </operation>
</portType>

<binding name="ndfdXMLBinding" type="tns:ndfdXMLPortType">
   <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http" />
   <operation name="NDFDgen">
      <soap:operation soapAction="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl#NDFDgen" style="rpc" />
      <input>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </input>
      <output>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </output>
   </operation>
   <operation name="NDFDgenByDay">
      <soap:operation soapAction="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl#NDFDgenByDay" style="rpc" />
      <input>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </input>
      <output>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </output>
   </operation>
</binding>

<service name="ndfdXML">
   <documentation>The service has two exposed functions, NDFDgen and NDFDgenByDay. 
                        For the NDFDgen function, the client needs to provide a latitude and 
                        longitude pair and the product type. The client also needs to provide 
                        the start and end time of the period that it wants data for. For the 
                        time-series product, the client needs to provide an array of boolean values 
                        corresponding to which weather values should appear in the time series product. 
                        For the NDFDgenByDay function, the client needs to provide a latitude and longitude 
                        pair, the date it wants to start retrieving data for and the number of days worth 
                        of data. The client also needs to provide the format that is desired.</documentation>
   <port name="ndfdXMLPort" binding="tns:ndfdXMLBinding">
      <soap:address location="http://www.weather.gov/forecasts/xml/SOAP_server/ndfdXMLserver.php" />
   </port>
  </service>
</definitions>

The WSDL file defines a service, made up of different endpoints, called ports. The port is made up of a network address and a binding.

<service name="ndfdXML">
   <documentation>The service has two exposed functions, NDFDgen and NDFDgenByDay. 
                        For the NDFDgen function, the client needs to provide a latitude and 
                        longitude pair and the product type. The client also needs to provide 
                        the start and end time of the period that it wants data for. For the 
                        time-series product, the client needs to provide an array of boolean values 
                        corresponding to which weather values should appear in the time series product. 
                        For the NDFDgenByDay function, the client needs to provide a latitude and longitude 
                        pair, the date it wants to start retrieving data for and the number of days worth 
                        of data. The client also needs to provide the format that is desired.</documentation>
   <port name="ndfdXMLPort" binding="tns:ndfdXMLBinding">
      <soap:address location="http://www.weather.gov/forecasts/xml/SOAP_server/ndfdXMLserver.php" />
   </port>
</service>

The binding identifies the binding style and protocol for each operation. In this case, it uses Remote Procedure Call style binding, using SOAP.

   <binding name="ndfdXMLBinding" type="tns:ndfdXMLPortType">
   <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http" />
   <operation name="NDFDgen">
      <soap:operation soapAction="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl#NDFDgen" style="rpc" />
      <input>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </input>
      <output>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </output>
   </operation>
   <operation name="NDFDgenByDay">
      <soap:operation soapAction="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl#NDFDgenByDay" style="rpc" />
      <input>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </input>
      <output>
         <soap:body use="encoded" namespace="http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl" 
             encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
      </output>
   </operation>
</binding>

Port Types are abstract collections of operations. In this case, the operations are NDFDgen and NDFDgenByDay.

   <portType name="ndfdXMLPortType">
      <operation name="NDFDgen">
         <documentation> Returns National Weather Service digital weather forecast data     </documentation>
         <input message="tns:NDFDgenRequest" />
         <output message="tns:NDFDgenResponse" />
      </operation>
      <operation name="NDFDgenByDay">
         <documentation> Returns National Weather Service digital weather forecast data summarized over either 24- or 12-hourly periods </documentation>
         <input message="tns:NDFDgenByDayRequest" />
         <output message="tns:NDFDgenByDayResponse" />
      </operation>
   </portType>

Finally, messages are used by the operations to communicate - in other words, to pass parameters and return values.

   <message name="NDFDgenByDayRequest">  
      <part name="latitude" type="xsd:decimal" />
      <part name="longitude" type="xsd:decimal" />
      <part name="startDate" type="xsd:date" />
      <part name="numDays" type="xsd:integer" />
      <part name="format" type="typens:formatType" />
   </message>

   <message name="NDFDgenByDayResponse">
      <part name="dwmlByDayOut" type="xsd:string" />
   </message>

From the WSDL file, a consumer should be able to access data in a web service.
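As a rough illustration of how a consumer might read these parts programmatically, the sketch below uses the standard Java DOM parser to list the operations of each portType and to look up the soap:address location. The class WsdlReader, its methods and the trimmed-down SAMPLE document are assumptions made for this example; the SAMPLE keeps only the element names and attribute values shown in the WSDL above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for two facts a consumer needs from a WSDL 1.1 file:
// which operations exist and where the service endpoint is.
class WsdlReader {

    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";
    static final String WSDL_SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/";

    // Heavily trimmed version of the WSDL shown above (illustration only).
    static final String SAMPLE =
          "<definitions xmlns=\"http://schemas.xmlsoap.org/wsdl/\""
        + " xmlns:soap=\"http://schemas.xmlsoap.org/wsdl/soap/\""
        + " xmlns:tns=\"http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl\">"
        + "<portType name=\"ndfdXMLPortType\">"
        + "<operation name=\"NDFDgen\"/>"
        + "<operation name=\"NDFDgenByDay\"/>"
        + "</portType>"
        + "<binding name=\"ndfdXMLBinding\" type=\"tns:ndfdXMLPortType\">"
        + "<operation name=\"NDFDgen\"/>"
        + "</binding>"
        + "<service name=\"ndfdXML\">"
        + "<port name=\"ndfdXMLPort\" binding=\"tns:ndfdXMLBinding\">"
        + "<soap:address location=\"http://www.weather.gov/forecasts/xml/SOAP_server/ndfdXMLserver.php\"/>"
        + "</port></service></definitions>";

    private static Document parse(String wsdl) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // WSDL elements are matched by namespace
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(wsdl)));
        } catch (Exception e) {
            throw new IllegalArgumentException("WSDL is not well-formed XML", e);
        }
    }

    // Names of the operations declared inside portType elements only;
    // the operation elements repeated under binding are ignored.
    static List<String> operationNames(String wsdl) {
        List<String> names = new ArrayList<>();
        NodeList portTypes = parse(wsdl).getElementsByTagNameNS(WSDL_NS, "portType");
        for (int i = 0; i < portTypes.getLength(); i++) {
            NodeList ops = ((Element) portTypes.item(i)).getElementsByTagNameNS(WSDL_NS, "operation");
            for (int j = 0; j < ops.getLength(); j++) {
                names.add(((Element) ops.item(j)).getAttribute("name"));
            }
        }
        return names;
    }

    // Location of the first soap:address element, or null if there is none.
    static String endpoint(String wsdl) {
        NodeList addresses = parse(wsdl).getElementsByTagNameNS(WSDL_SOAP_NS, "address");
        if (addresses.getLength() == 0) {
            return null;
        }
        return ((Element) addresses.item(0)).getAttribute("location");
    }

    public static void main(String[] args) {
        System.out.println(operationNames(SAMPLE));
        System.out.println(endpoint(SAMPLE));
    }
}
```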

For a more detailed analysis of how this particular web service works, please visit Weather.gov.

Service Discovery - UDDI

You've seen how WSDL can be used to share interface definitions for Web Services, but how do you go about finding a Web Service in the first place? There are countless independent Web Services that are developed and maintained by just as many different organizations. Upon adopting Web Service practices and methodologies, developers sought to foster the involvement and creative reuse of their systems. It soon became apparent that there was a need for an enumerated record of these services and their respective locations. This information would empower developers to leverage the best practices and processes of Web Services quickly and easily. Additionally, having a central reference of current Web Service capabilities enables developers to avoid developing redundant applications.

UDDI defines registries in which services can be published and found. The UDDI specification was created by Microsoft, Ariba, and IBM. UDDI defines a data structure and an Application Programming Interface (API).

In the three-tier model mentioned before, UDDI is the service broker. Its function is to enable service consumers to find appropriate service providers.
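The broker role can be pictured with a toy in-memory registry. The sketch below is neither the UDDI API nor JAXR; the class ToyRegistry, its methods and the example.org endpoint URLs are invented purely to show providers publishing under a service name and a consumer looking them up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of a service broker (NOT the UDDI data model or API).
class ToyRegistry {

    // service name -> endpoint URLs of the providers that offer it
    private final Map<String, List<String>> endpointsByService = new HashMap<>();

    // Called by a service provider to announce an endpoint.
    void publish(String serviceName, String endpointUrl) {
        endpointsByService
                .computeIfAbsent(serviceName, name -> new ArrayList<>())
                .add(endpointUrl);
    }

    // Called by a service consumer; returns every known provider so the
    // consumer can fall back to another one if the first is unavailable.
    List<String> find(String serviceName) {
        return Collections.unmodifiableList(
                endpointsByService.getOrDefault(serviceName, Collections.<String>emptyList()));
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        registry.publish("Temperature", "http://provider-a.example.org/temperature"); // hypothetical URL
        registry.publish("Temperature", "http://provider-b.example.org/temperature"); // hypothetical URL
        System.out.println(registry.find("Temperature"));
    }
}
```

Returning all registered endpoints, not just one, mirrors the loose coupling described earlier: the consumer is not tied to a single provider.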

Connecting to UDDI registries using Java can be accomplished through the Java API for XML Registries (JAXR). JAXR creates a layer of abstraction, so that it can be used with UDDI and other types of XML Registries, such as the ebXML Registry and Repository standard.

Using Java With Web Services

To send a SOAP message, an application must be used to communicate with the service provider. Because SOAP is so flexible, almost any programming language can be used to send SOAP messages. For our purposes, however, we will be focusing on using Java to interact with Web Services.

Using Java with web services requires some external libraries.

  • Apache SOAP Toolkit
  • Java Mail Framework
  • JavaBeans Activation Framework
  • Xerces XML parser

Let's go through using Java to query a Temperature Web Service, which returns the current temperature for a given U.S. zip code.

import java.io.*;
import java.net.*;
import java.util.*;
import org.apache.soap.util.xml.*;
import org.apache.soap.*;
import org.apache.soap.rpc.*;

public class TempClient
{

 public static float getTemp (URL url, String zipcode) throws Exception 
 {

  Call call = new Call ();

  // Service uses standard SOAP encoding
  String encodingStyleURI = Constants.NS_URI_SOAP_ENC;
  call.setEncodingStyleURI(encodingStyleURI);

  // Set service locator parameters
  call.setTargetObjectURI ("urn:xmethods-Temperature");
  call.setMethodName ("getTemp");

  // Create input parameter vector
  Vector params = new Vector ();
  params.addElement (new Parameter("zipcode", String.class, zipcode, null));
  call.setParams (params);

  // Invoke the service ....
  Response resp = call.invoke (url,"");

  // ... and evaluate the response
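  if (resp.generatedFault ()) 
  {
   // Report the fault returned by the service instead of an empty exception
   Fault fault = resp.getFault ();
   throw new Exception(fault.getFaultCode() + ": " + fault.getFaultString());
  } 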
  else 
  {
   // Call was successful. Extract response parameter and return result
   Parameter result = resp.getReturnValue ();
   Float rate=(Float) result.getValue();
   return rate.floatValue();
  }
 }

 // Driver to illustrate service invocation
 public static void main(String[] args)
 {
  try
  {
   URL url=new URL("http://services.xmethods.net:80/soap/servlet/rpcrouter");
   String zipcode= "30605";
   float temp = getTemp(url,zipcode);
   System.out.println(temp);
  }
  catch (Exception e) 
  {
   e.printStackTrace();
  }
 }
}

This Java code effectively hides all the SOAP from the user. It invokes the target object by name and URL, and sets the parameter zipcode. But what does the underlying SOAP Request look like?

  <?xml version="1.0" encoding="UTF-8"?>
  <soap:Envelope xmlns:n="urn:xmethods-Temperature"
      xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:xs="http://www.w3.org/2001/XMLSchema" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

     <soap:Body soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
        <n:getTemp>
           <zipcode xsi:type="xs:string">30605</zipcode>
        </n:getTemp>
     </soap:Body>

  </soap:Envelope>

As you see, the SOAP request uses the parameters passed in by the Java Call to fill out the SOAP envelope and direct the message. Similarly, the response comes back into the Java program as '70.0'. The response SOAP is also hidden by the Java program.

  <?xml version='1.0' encoding='UTF-8'?>
  <SOAP-ENV:Envelope 
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">

     <SOAP-ENV:Body>
        <ns1:getTempResponse xmlns:ns1="urn:xmethods-Temperature" 
            SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
           <return xsi:type="xsd:float">70.0</return>
        </ns1:getTempResponse>
     </SOAP-ENV:Body>

  </SOAP-ENV:Envelope>
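If you were not using the Apache SOAP Call and Response classes, you would have to pull the value out of this response yourself. The following is a minimal sketch with the standard Java DOM parser; the class TempResponseParser is hypothetical, and the SAMPLE constant simply repeats the response shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads the <return> element of a getTemp response.
class TempResponseParser {

    // The response shown above, as a single string.
    static final String SAMPLE =
          "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
        + " xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<SOAP-ENV:Body>"
        + "<ns1:getTempResponse xmlns:ns1=\"urn:xmethods-Temperature\""
        + " SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">"
        + "<return xsi:type=\"xsd:float\">70.0</return>"
        + "</ns1:getTempResponse>"
        + "</SOAP-ENV:Body>"
        + "</SOAP-ENV:Envelope>";

    static float temperature(String soapResponse) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapResponse)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        // <return> carries no prefix, so it can be matched by its plain tag name.
        NodeList returns = doc.getElementsByTagName("return");
        if (returns.getLength() == 0) {
            throw new IllegalArgumentException("No <return> element in response");
        }
        return Float.parseFloat(returns.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        System.out.println(temperature(SAMPLE)); // prints 70.0
    }
}
```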

Here's an additional example of using Java and SOAP to interact with Web Services. This particular Web Service is called the "US Zip Validator"; it takes a zip code as a parameter and returns the corresponding latitude and longitude. When developing applications to interact with Web Services, the first step should be to review the WSDL document.

The WSDL document for this service is located here: http://www.webservicemart.com/uszip.asmx?WSDL

This document will contain all the necessary instructions for interacting with the "US Zip Validator" Web Service.

SOAPClient4XG

Modified by - Duncan McAllister From: http://www.ibm.com/developerworks/xml/library/x-soapcl/

import java.io.*;
import java.net.*;
import java.util.*;

public class SOAPClient4XG {
    
    public static void main(String[] args) throws Exception {

        // Use the example endpoint and request file when no arguments
        // are supplied on the command line.
        if (args.length < 2) {
            args = new String[] {
                "http://services.xmethods.net:80/soap/servlet/rpcrouter",
                "SOAPRequest.xml"
            };
        }

        String SOAPUrl      = args[0];
        String xmlFile2Send = args[1];

        // The SOAPAction header value is optional; default to an empty string.
        String SOAPAction = args.length > 2 ? args[2] : "";

				
        // Create the connection where we're going to send the file.
        URL url = new URL(SOAPUrl);
        URLConnection connection = url.openConnection();
        HttpURLConnection httpConn = (HttpURLConnection) connection;

        // Open the input file. After we copy it to a byte array, we can see
        // how big it is so that we can set the HTTP Content-Length
        // property.

        FileInputStream fin = new FileInputStream(xmlFile2Send);

        ByteArrayOutputStream bout = new ByteArrayOutputStream();
    
        // Copy the SOAP file into the in-memory buffer.
        copy(fin,bout);
        fin.close();

        byte[] b = bout.toByteArray();

        // Set the appropriate HTTP parameters.
        httpConn.setRequestProperty( "Content-Length",
                                     String.valueOf( b.length ) );
        httpConn.setRequestProperty("Content-Type","text/xml; charset=utf-8");
		  httpConn.setRequestProperty("SOAPAction",SOAPAction);
        httpConn.setRequestMethod( "POST" );
        httpConn.setDoOutput(true);
        httpConn.setDoInput(true);

        // Everything's set up; send the XML that was read in to b.
        OutputStream out = httpConn.getOutputStream();
        out.write( b );    
        out.close();

        // Read the response and write it to standard out.

        InputStreamReader isr =
            new InputStreamReader(httpConn.getInputStream());
        BufferedReader in = new BufferedReader(isr);

        String inputLine;

        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);

        in.close();
    }

  // copy method from E.R. Harold's book "Java I/O"
  public static void copy(InputStream in, OutputStream out) 
   throws IOException {

    // do not allow other threads to read from the
    // input or write to the output while copying is
    // taking place

    synchronized (in) {
      synchronized (out) {

        byte[] buffer = new byte[256];
        while (true) {
          int bytesRead = in.read(buffer);
          if (bytesRead == -1) break;
          out.write(buffer, 0, bytesRead);
        }
      }
    }
  } 
}

This Java class reads an XML document (SOAPRequest.xml) and sends it as the SOAP message. Place this document in the same project folder as the Java application that invokes the service, and make sure the file name matches the one used in the code exactly, including case.

After reviewing the "US Zip Validator" WSDL document, we can see that the method to invoke is "getTemp". The SOAP body names this method and carries its parameters.

SOAPRequest.xml

<?xml version="1.0" encoding="UTF-8"?>
  <soap:Envelope xmlns:n="urn:xmethods-Temperature"
      xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:xs="http://www.w3.org/2001/XMLSchema" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

     <soap:Body soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
        <n:getTemp>
           <zipcode xsi:type="xs:string">30605</zipcode>
        </n:getTemp>
     </soap:Body>

  </soap:Envelope>
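The request above hard-codes the zip code. As a minimal sketch, the same envelope could be generated in Java for any zip code. The class name SoapRequestBuilder and its methods are hypothetical helpers introduced for illustration; only the envelope layout is taken from SOAPRequest.xml.

```java
class SoapRequestBuilder {

    // Escape the five characters that are special in XML text and attributes.
    // The ampersand must be replaced first so later entities are not re-escaped.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    // Build the same envelope as SOAPRequest.xml with the given zip code.
    static String buildGetTempRequest(String zipcode) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<soap:Envelope xmlns:n=\"urn:xmethods-Temperature\"\n"
             + "    xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"\n"
             + "    xmlns:soapenc=\"http://schemas.xmlsoap.org/soap/encoding/\"\n"
             + "    xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n"
             + "    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n"
             + "  <soap:Body soap:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">\n"
             + "    <n:getTemp>\n"
             + "      <zipcode xsi:type=\"xs:string\">" + escapeXml(zipcode) + "</zipcode>\n"
             + "    </n:getTemp>\n"
             + "  </soap:Body>\n"
             + "</soap:Envelope>\n";
    }

    public static void main(String[] args) {
        // Print the generated request for the zip code used in the example.
        System.out.print(buildGetTempRequest("30605"));
    }
}
```

The generated string could be converted to bytes with getBytes("UTF-8") and written to the connection in place of the bytes read from the file.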

Following a successful interaction, the Web Service provider returns a response whose format is similar to that of the request. When developing in NetBeans, run this project and examine the SOAP response message in the Tomcat output window.
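Rather than reading the raw response by eye, the returned value can be extracted with the DOM parser that ships with Java. The sketch below is illustrative only: the class SoapResponseReader, the element names getTempResponse and return, and the value 72.0 in the sample are assumptions, not the documented output of the service.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SoapResponseReader {

    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Hypothetical response, shaped like the request; the real one may differ.
    static final String SAMPLE_RESPONSE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\""
        + " xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
        + "<soap:Body>"
        + "<n:getTempResponse xmlns:n=\"urn:xmethods-Temperature\">"
        + "<return xsi:type=\"xs:float\">72.0</return>"
        + "</n:getTempResponse>"
        + "</soap:Body>"
        + "</soap:Envelope>";

    // Return the text of the first element with the given local name inside
    // soap:Body, or null if the body or the element is missing.
    static String extractValue(String soapXml, String localName) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match soap:Body by namespace
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(soapXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the SOAP response", e);
        }
        NodeList bodies = doc.getElementsByTagNameNS(SOAP_NS, "Body");
        if (bodies.getLength() == 0) {
            return null;
        }
        Element body = (Element) bodies.item(0);
        // "*" matches the element regardless of its namespace.
        NodeList matches = body.getElementsByTagNameNS("*", localName);
        if (matches.getLength() == 0) {
            return null;
        }
        return matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println(extractValue(SAMPLE_RESPONSE, "return"));
    }
}
```

In the client above, the lines read from the connection could be collected into a single string and passed to extractValue instead of being printed.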

Web Services with NetBeans

This walkthrough uses NetBeans 5.0.

Once NetBeans is open, click the "Runtime" tab in the left pane, then right-click "Web Services" and select "Add Web Service." In the "URL" field, enter the address of the web service's WSDL file (in our example above, "http://www.weather.gov/forecasts/xml/DWMLgen/wsdl/ndfdXML.wsdl") and click "Get Web Service Description." NetBeans then displays the information describing the web service.



Summary

Web services are applications that use XML to communicate with many different systems to perform a task. To make web services flexible and scalable, a set of supporting protocols was developed. SOAP defines the messages used to send information, WSDL describes how to connect to and query a web service, and UDDI describes where web services can be found.

Exercises

Answers[