XML - Managing Data Exchange/Print version
The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/XML_-_Managing_Data_Exchange
To do
Current To-Dos (January 28, 2007 and later)
- Add template to all subpages, using the following code: {{XML-MDE}}
- Come up with a better design for the template.
- Make sure navigation links are added to the top of every chapter. (Navigation fixed in book template.)
- Group chapters by topic -- any suggestions for grouping schemes?
- I was thinking "Principles of XML," "Languages derived from XML," and "XML in Applications" (the last category referencing mainly AJAX) -- Runnerupnj
- Provide links from a chapter to the exercises it covers, and vice-versa.
- Mend links to previous module main page.
- Separate exercise questions and answers.
- Break chapters into shorter sections.
- Create a glossary with links from within the book.
- Create an Ajax page. There is no page here for Ajax help with XML!
- We can link the AJAX book here.
- Create a PDF version available from Wikibooks
To-Dos Previous to January 28, 2007
These to-dos remained from before the project was reinvigorated, and rested for a brief period on the Talk page. -- Runnerupnj 08:41, 29 January 2007 (UTC)
The list is in no particular priority
- Convert all code examples to the format specified in Author guidelines
- A print version to make reading easier
- Breaking chapters into shorter sections
- Hints for common problems (hint box)
- FAQs
- a "Common Errors" section near the exercise section. That way when future students run into problems in the exercises, especially the stylesheets, they can hopefully find a common error and fix their problem quickly
- Glossary with links from within the book
- Chapter 2 on XHTML (move later chapter and make complete)
- Exercises and answers on separate pages (tell people how to open a second copy and use it – end of chapter 1)
- Good XML editor
- Check all answers (also indicate who validated the answer with person’s email)
- Major league baseball exercise
- Develop an XML schema to show the organization of Major League Baseball. There are many teams within MLB and the teams are all composed of different athletes.
- Set up the XML Document with a Division of either the American League or the National League. Enter a representative data into the document to justify your answer.
- Organize the XML stylesheet to nicely display the data.
- Move all Java parsing to a separate chapter
- Write BlueJ as per database access for XML parsing
- Move exercise 4 from chapter 3 (one-to-many relationship) and place it in chapter 5 (many-to-many relationship). My reason for this is as follows:
- This problem asks you to create a personal library. As we learned earlier a library can have many books and books have many copies. There can be many different people who check out books, however, what they actually check out are copies of books making this a many-to-many relationship since a borrower can check out many copies of a book. I feel like this exercise is a little misleading and would be better off in ch. 5. Most people who have had any experience in data modeling and are trying to learn XML from this book would be confused by this exercise (i.e. myself). It's hard to do something that you haven't learned how to do yet.
- Comments in the code and not elsewhere
- Instead of giving a complete explanation for an example of an xml/xsl/xsd file after the problem, explain each piece of the code as you go through it. Or, after a given solution, repeat the entire line of code that you are trying to explain. I've found this layout in other technology-related books, and it has been easier to follow along. Also, when referring to a table or a different section of the book, create a bookmark or link to that section. Could this be in the instructions for authors, and what else could we add?
- Instructions on how to convert XML to HTML with NetBeans and any other editor
- Convert all slides to DocBook slide format
- Chapter on XQuery
- Chapter on Lenya
- Spellcheck the book on a regular basis
Preface
Goals
Book
The goal of this book is to provide comprehensive coverage of eXtensible Markup Language (XML) in a textbook format. This book is written and edited by students for students. Each student who uses the book should improve its quality by correcting errors, adding exercises, adding examples, starting new chapters and so forth.
Chapters 2 through 6 take the perspective that an XML schema is a representation of a data model, and thus these chapters deal with mapping the complete range of relationships that occur between entities. As you learn how to convert each type of relationship into a schema, other aspects of XML are introduced. For example, stylesheets are initially introduced in Chapter 2 and progressively more stylesheet features are added in Chapters 3 through 6.
Consolidation chapters (e.g., Chapter 7 "Data schemas") bring together the material covered across previous chapters; in this case, Chapters 2 through 6. This means students see key skills twice: once in the context of gradually developing their broad understanding of XML and then again in the specific context of one dimension of XML.
Application chapters cover particular uses of XML (e.g., SVG for scalable vector graphics) to give the reader examples of the use of XML to solve particular types of problems. This part of the book is expected to grow as the use of XML extends.
Project
Professors typically throw away their students’ projects at the end of the term. This is a massive waste of intellectual resources that can be harnessed for the betterment of many by creating an appropriate infrastructure. In our case, we use wiki technology as the infrastructure to create a free open content textbook.
University students are an immense untapped global resource. They can be engaged in creating open textbooks if the right infrastructure is in place to sustain renewable student projects. This book is an example of how waste can be avoided.
History
- Graduate students at the University of Georgia started writing this book in January 2004. They were students in an Advanced Data Management class, and most were studying for a Master's in Internet Technology.
- Students at two German universities, the University of Passau and the Martin-Luther University Halle-Wittenberg, added material to the first few chapters in May 2004.
- A Chinese translation was started in mid 2004 by Dr. Xu Zhengchuan of Fudan University in Shanghai.
- An Italian translation was started in late 2004 by Jubjub68.
- Students in Data Management classes at the University of Georgia use the book each semester and continue to improve it.
- In the first semester of 2006, the Advanced Data Management class at the University of Georgia undertook a complete review of the book to improve quality and consistency.
- 2006-Aug-31: "Global Text Project aims to create free, Wiki-based textbooks for developing nations": press release links directly to http://en.wikibooks.org/wiki/XML .
- http://globaltext.org/ links directly to http://en.wikibooks.org/wiki/XML .
Software
To complete the exercises in the book and view the slides, you will need access to the following software (or a suitable alternative):
- Java for NetBeans
- NetBeans for XML editing, validation, and transformation
- MySQL
- OpenOffice
- Firefox
Introduction to XML
Learning Objectives
There are four central problems in data management: capture, storage, retrieval, and exchange of data.
The purpose of this book is to address XML, a technology for managing data exchange. The foundational XML chapters in this book are structured by a 'data model' approach. The first chapter introduces the reader to the XML document, XML schema, and XML stylesheet with a single entity example. Subsequent chapters expand upon the XML basics with multiple-entity examples and a one-to-one relationship, a one-to-many relationship, or a many-to-many relationship.
XML is a tool used for data exchange. Data exchange has long been an issue in information technology, but the Internet has elevated its importance. Electronic data interchange (EDI), the traditional data exchange standard for large organizations, is giving way to XML, which is likely to become the data exchange standard for all organizations, irrespective of size.
EDI supports the electronic exchange of standard business documents and is currently the major data format for electronic commerce. A structured format is used to exchange common business documents (e.g., invoices and shipping orders) between trading partners. In contrast to the free form of e-mail messages, EDI supports the exchange of repetitive, routine business transactions. Standards mean that routine electronic transactions can be concise and precise. The main standard used in the United States and Canada is known as ANSI X12, and the major international standard is UN/EDIFACT. Firms adhering to the same standard can share data electronically.
The Internet is a global network potentially accessible by nearly every firm, with communication costs typically less than those of traditional EDI. Consequently, the Internet has become the electronic transport path of choice between trading partners. The simplest approach is to use the Internet as a means of transporting EDI documents. But because EDI was developed in the 1960s, another approach is to reexamine the technology of data exchange. A result of this rethinking is XML, but before considering XML we need to learn about SGML, the parent of XML.
SGML
For a typical U.S. firm, it is estimated that document management consumes up to 15 percent of its revenue, nearly 25 percent of its labor costs, and anywhere between 10 and 60 percent of an office worker’s time. The Standard Generalized Markup Language (SGML) is designed to reduce the cost and increase the efficiency of document management.
A markup language embeds information about a document within the document's text. In the following example, the markup tags indicate that the text contains details of a city. Note also that the city's name, state, and population are identified by specific tags. Thus, the reader (a person or a computer) is left in no doubt as to the meaning of Athens, Georgia, or 100,000. Note also that the location, latitude, and longitude of the city are explicitly identified with appropriate tags. SGML’s usefulness is based upon recording both text and the meaning of that text.
Exhibit 1: Markup language
<city>
<cityname>Athens</cityname>
<state>GA</state>
<description> Home of the University of Georgia</description>
<population>100,000</population>
<location>Located about 60 miles Northeast of Atlanta</location>
<latitude>33 57' 39" N</latitude>
<longitude>83 22' 42" W</longitude>
</city>
SGML is a vendor-independent International Standard (ISO 8879) that defines the structure of documents. Standardized in 1986 as a meta language, SGML is the parent of both HTML and XML. Because SGML documents are standard text files, SGML provides cross-system portability. When technology is rapidly changing, SGML provides a stable platform for managing data exchange. Furthermore, SGML files can be transformed for publication in a variety of media. The use of SGML preserves textual information independent of how and when it is presented. Organizations reap long-term benefits when they can store documents in a single, independent standard that can then be converted for display in any desired medium.
SGML has three major advantages for data management:
- Reuse: Information can be created once and reused many times.
- Flexibility: SGML documents can be published in any format. The same content can be printed, presented on the Web, or delivered through text-to-speech synthesis. Because SGML is content-oriented, presentation decisions can be delayed until the output format is decided.
- Revision: SGML supports revision and version control. With content version control, a firm can readily track the changes in documents.
A short section of SGML demonstrates clearly the features and strength of SGML (see Exhibit 2). The tags surrounding a chunk of text describe its meaning and thus support presentation and retrieval. For example, the pair of tags <airline> and </airline> surrounding “Delta” identify the airline making the flight.
Exhibit 2: SGML example
<flight>
<airline>Delta</airline>
<flightno>22</flightno>
<origin>Atlanta</origin>
<destination>Paris</destination>
<departure>5:40pm</departure>
<arrival>8:10am</arrival>
</flight>
The preceding SGML code can be presented in several ways by applying a style sheet to the file. For example, it might appear as
Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am
or as
Airline | Flight | Origin | Destination | Departure | Arrival |
Delta | 22 | Atlanta | Paris | 5:40pm | 8:10am |
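If the flight markup in Exhibit 2 is treated as XML, a stylesheet along the following lines could produce the first, sentence-style presentation. This is only a minimal sketch, not a stylesheet from the book: it assumes XSLT 1.0 and HTML output, and the choice of a p element for the sentence is an illustrative assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- Match the flight element from Exhibit 2 and emit one sentence -->
  <xsl:template match="/flight">
    <p>
      <xsl:value-of select="airline"/>
      <xsl:text> flight </xsl:text>
      <xsl:value-of select="flightno"/>
      <xsl:text> flies from </xsl:text>
      <xsl:value-of select="origin"/>
      <xsl:text> to </xsl:text>
      <xsl:value-of select="destination"/>
      <xsl:text> leaving </xsl:text>
      <xsl:value-of select="departure"/>
      <xsl:text> and arriving </xsl:text>
      <xsl:value-of select="arrival"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

The tabular presentation would use the same select expressions, placed inside table cells instead of a sentence; the data file itself would not change.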
If the data are stored in HTML format and rendered on a Web site (as in Exhibit 3), then the meaning of the data has to be inferred by the reader. This is generally quite easy for humans, but very difficult for machines. Furthermore, the presentation format is fixed and can only be altered by rewriting the HTML. If you are not familiar with HTML, you should read the Wikibooks chapter on XHTML, an XML-based reformulation of HTML, before reading the next chapter.
Exhibit 3: HTML rendering example
Delta flight 22 flies from Atlanta to Paris leaving 5:40pm and arriving 8:10am
Meaning and presentation should be independent, and this is an important reason why SGML is more powerful than HTML.
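The markup behind Exhibit 3 is not shown in the text. A plausible version, written here as a well-formed XHTML fragment with bold and italic tags chosen purely for illustration, might look like this:

```xml
<!-- Presentation-only markup: nothing identifies "Delta" as an airline
     or "Paris" as a destination -->
<p><b>Delta</b> flight 22 flies from <i>Atlanta</i> to <i>Paris</i>
leaving 5:40pm and arriving 8:10am</p>
```

Compare this with Exhibit 2, where each value is wrapped in a tag that names what the value means rather than how it should look.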
SGML is a markup language that defines the structure of documents and is preferred to HTML as it can be transformed into a variety of media.
XML
Many computer systems contain data in incompatible formats, and exchanging data between such systems is a time-consuming challenge. XML is a generic data storage format that comes bundled with a number of tools and technologies that should make it easier to exchange specific XML 'applications' between incompatible systems. Because XML is open and generic, more and more organizations and people, both developers and data users, are expected to adopt it over time. This should make XML a viable long-term technology for certain types of data exchange.
XML is used not only for exchanging information, but also for publishing Web pages. XML's very strict syntax allows for smaller and faster Web browsers and as such is well suited for use with Personal Digital Assistants (PDAs) and cellphones. Web browsers that interpret HTML documents, on the other hand, are bloated with programming code to compensate for HTML’s less strict syntax.
The types of data generally well suited for encoding as XML are those where field lengths are unknown and unpredictable and where field contents are predominantly textual.
An XML schema allows for the exchange of information in a standardized structure. A schema defines custom markup tags that can contain attributes to describe the content that is enclosed by these tags. Information from the tagged data in the XML document can be extracted using an application called a “parser”, and with the use of an XML stylesheet the data can be formatted for a Web page.
XML's power lies in the combination of custom markup tags and content in a defined XML document. The purpose of eXtensible Markup Language (XML) is to make information self-describing. Based on SGML, XML is designed to support electronic commerce. The definition of XML, completed in early 1998 by the World Wide Web Consortium (W3C), describes it as a meta language — a language to generate languages. XML should steadily replace HTML on many Web sites because of some key advantages. The major differences between XML and HTML are captured in the following table.
Exhibit 4: XML vs HTML
XML | HTML |
Information content | Information presentation |
Extensible set of tags | Fixed set of tags |
Data exchange language | Data presentation language |
Greater hypertext linking | Limited hypertext linking |
The eXtensible in XML means that a new data exchange language can be created by defining its structure and tags. For example, the OpenGIS Consortium designed a Geography Markup Language (GML) to facilitate the electronic exchange of geographic information. Similarly, the Open Tourism Consortium is working on the definition of TourML to support exchange of tourism information. The insurance industry uses data conforming to the XML-based standard ACORD for electronic data exchange. Another good example of XML in action is NewsML™.
In this text we will cover all the features of XML, but at this point let us introduce a few of the key features.
Applications of XML:
Before we start learning more about how an XML document is structured, let us point out what XML can be used for. The four major uses of XML are:
Publication: Database content can be converted into XML and afterwards into HTML by using an XSLT stylesheet. Making use of this technique, complex websites as well as print media like PDF files can be generated. Information no longer has to be stored in different formats (e.g., RTF, DOC, PDF, HTML). Content can be stored in the neutral XML format and then, using appropriate layout style sheets and transformations, brochures, websites, or data lists can be generated. (See more in Chapter 17.)
An example of the capability of XML and XSLT can be found at http://www.emimusic.de: This website contains approximately 20,000 pages with profiles of the artists, their products and the titles of the songs. These pages are generated using an XSLT script. Based on the script used it will also be possible to create a catalog in PDF format. See the EMI Music case study later in this chapter for more details.
Interaction: XML can be used for accessing and changing data interactively. This human-machine communication usually happens via a web browser (see Chapter 12).
Integration: Using XML, homogeneous and heterogeneous applications can be integrated. In this case, XML is used to describe data, interfaces, and protocols. This machine-machine communication helps integrate relational databases (e.g., by importing and exporting different formats).
Transaction: XML helps to process transactions in applications like online marketplaces, supply chain management, and e-procurement systems.
Key features of XML
- Elements have both an opening and a closing tag
- Elements follow a strict hierarchy, with documents containing only one root element
- Elements cannot overlap other elements
- Element names must obey XML naming conventions
- XML is case sensitive
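The rules above can be seen together in one small document. The following sketch reuses the city data from Exhibit 1; nesting latitude and longitude inside location is a rearrangement made here for illustration, not the structure used in the exhibit.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Exactly one root element encloses everything else -->
<city>
  <!-- Every opening tag has a matching closing tag -->
  <cityname>Athens</cityname>
  <state>GA</state>
  <!-- Elements nest inside one another; they never overlap.
       Not well-formed: <b><i>text</b></i> -->
  <location>
    <latitude>33 57' 39" N</latitude>
    <longitude>83 22' 42" W</longitude>
  </location>
  <!-- Names are case sensitive: <State> and <state> are different elements -->
</city>
```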
XML will improve the efficiency of data exchange in several important ways, which include:
- write once and format many times: Once an XML file is created it can be presented in multiple ways by applying different XML stylesheets. For instance, the information might be displayed on a web page or printed in a book.
- hardware and software independence: XML files are standard text files, which means they can be read by any application.
- write once and exchange many times: Once an industry agrees on an XML standard for data exchange, data can be readily exchanged between all members using that standard.
- Faster and more precise web searching: When the meaning of information can be determined by a computer (by reading the tags), web searching will be enhanced. For example, if you are looking for a specific book title, it is far more efficient for a computer to search for text between the pair of tags <booktitle> and </booktitle> than search an entire file looking for the title. Furthermore, spurious results should be eliminated.
- data validation: XML allows data to be validated against an XSD or a DTD, which acts as a contractual agreement between two interacting parties.
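To make the idea of a schema as a contract more concrete, here is a minimal XSD sketch for the flight data in Exhibit 2. It is an illustration rather than a schema from the book: the choice of types (for instance, treating flightno as a positive integer and the times as plain strings) is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- A flight must contain exactly these six elements, in this order -->
  <xsd:element name="flight">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="airline" type="xsd:string"/>
        <xsd:element name="flightno" type="xsd:positiveInteger"/>
        <xsd:element name="origin" type="xsd:string"/>
        <xsd:element name="destination" type="xsd:string"/>
        <xsd:element name="departure" type="xsd:string"/>
        <xsd:element name="arrival" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A validating parser given this schema would reject a flight document that omits the destination or puts text in flightno, so both trading partners can rely on the same structure.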
10 reasons to use XML
- XML is a widely accepted open standard.
- XML allows content to be clearly separated from form (appearance).
- XML is text-oriented.
- XML is extensible.
- XML is self-describing.
- XML is universal, so internationalization is no problem.
- XML is independent from platforms and programming languages.
- XML provides a robust and durable format for information storage.
- XML is easily transformable.
- XML is a future-oriented technology.
The major XML elements
The major XML elements are:
- XML document: An XML file containing XML code.
- XML schema: An XML file that describes the structure of a document and its tags.
- XML stylesheet: An XML file containing formatting instructions for an XML file.
In the next few chapters you will learn how to create and use each of these elements of XML.
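As a preview of how the three pieces fit together, an XML document can name the schema it should be validated against and the stylesheet that formats it. In the sketch below the file names flight.xsd and flight.xsl are hypothetical placeholders; the data come from Exhibit 2 in the previous chapter's style.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Stylesheet: tells a browser or processor how to format this document -->
<?xml-stylesheet type="text/xsl" href="flight.xsl"?>
<!-- Schema: the xsi attribute points a validator at the structure rules -->
<flight xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="flight.xsd">
  <airline>Delta</airline>
  <flightno>22</flightno>
  <origin>Atlanta</origin>
  <destination>Paris</destination>
  <departure>5:40pm</departure>
  <arrival>8:10am</arrival>
</flight>
```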
Creating a markup file
Any text editor can be used to create a markup file (e.g., an HTML file). In this book, we use the text editor within NetBeans, an open source Integrated Development Environment (IDE) for Java, because NetBeans supports editing and validation of XML files. Before proceeding, you should download and install NetBeans from http://www.NetBeans.org/.
The examples in this book use NetBeans to illustrate proper XML code. For an alternative to NetBeans, see Exchanger XML Lite.
Case Studies in XML Implementation
XML at United Parcel Service (UPS)
“UPS is a service company and it is all about scale and speed,” says Geoff Chalmers, Project Leader at UPS eSolutions Department. In 2003, UPS had $33.5 billion annual revenue and 357,000 employees worldwide. Six percent of the United States' Gross Domestic Product (GDP) on any given day is in the UPS system.
UPS uses technology extensively. The Information Systems department employs 4,000 people. The company's web site has 166 different country home pages and is supported by 44 applications.
UPS delivers around 13 million packages every day, and customers can track these shipments via the UPS Web site, which receives around 200 million hits daily. Nineteen of the applications within ups.com are XML OnLine Tool (Web services) applications.
UPS’s online tools are developed specifically to be integrated with customers’ applications. This makes the customer’s task simpler, easier, and faster. UPS verified the importance of simplicity and speed via “CampusShip,” a product that has been one of UPS’s most successful in the last 10 years. UPS CampusShip® is a Web-based, UPS-hosted shipping system. Using an Internet connection, employees can ship their own packages and letters from any desktop, while management maintains overall control of shipping activities. UPS CampusShip® allows simultaneous shipper autonomy and managerial cost-control within the organization. This product has been successful because no installation or software maintenance is required and it is quick to implement. XML Online Tools enabled cheap and fast evolution of CampusShip®.
UPS favors XML especially because it is agnostic: platform- and language-independent. These features make XML very flexible and powerful. It is also decoupled and scalable. XML has enabled UPS to target a broader market and reduce customer interaction, and thus the cost of customer service. Another positive feature of XML is that it is backward compatible. The adoption of XML has reduced maintenance, implementation, and usage costs significantly within UPS.
However, these advantages don’t come without a price. “XML is inefficient in so many ways,” says Chalmers. XML unfortunately takes more CPU and bandwidth than other technologies. Yet bandwidth and CPU are cheap and getting cheaper every day, so this is a gradually disappearing problem.
Nevertheless, Chalmers also thinks that XML doesn’t work well in databases. He says that it is too wordy and it is an exchange medium rather than a database medium. There were some early attempts to tightly integrate XML and databases. Because databases do supply structure and identification to data as does XML, the value-add of XML-database integration is limited to applying hierarchical structure. On the other hand, if data is to be stored as a blob, then XML makes sense. Another problem that he points out about XML is that business rules cannot be expressed in XML schemas.
Finally, raw XML programming and debugging can be challenging. Therefore, UPS’s enterprise customers are starting to explore the code generators and embedded facilities to be found in .NET and BEA. However, hand coding by experienced in-house engineers is a must for the high availability, scalability, and performance that UPS requires for the UPS OnLine Tools.
XML at EMI Music
How is it used?
EMI Music Germany GmbH & Co. KG, a famous German record label, displays information about the artists it is affiliated with on its website. Visitors are able to explore all their audio or video productions. The whole website consists of nearly 20,000 pages that contain information about artists and their products (CD, DVD, LP). Everything is properly linked and systematically grouped.
After all, there is data to be provided for every artist, albums, samples, pictures, descriptions or article codes. The site is updated on a daily basis and is subject to change by a web editor whenever it’s necessary. Now this is a fairly complex and large amount of data to be handled.
This is where XML comes into play. The data, which is stored in a database, has been transformed into XML code. Now an XSLT stylesheet converts this data into HTML code, which can be easily read by any web browser (e.g. Internet Explorer or Firefox).
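The database-to-XML-to-HTML pipeline described above might be sketched as follows. EMI's actual XML format and stylesheets are not given in the text, so every element and attribute name here (artist, name, product, title, format) is a hypothetical placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input, as it might be exported from the database:
     <artist>
       <name>Example Artist</name>
       <product format="CD"><title>Example Album</title></product>
     </artist>
     None of these names come from EMI; they are placeholders. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/artist">
    <html>
      <head><title><xsl:value-of select="name"/></title></head>
      <body>
        <h1><xsl:value-of select="name"/></h1>
        <ul>
          <!-- One list item per product: the title, then the format in parentheses -->
          <xsl:for-each select="product">
            <li>
              <xsl:value-of select="title"/>
              <xsl:text> (</xsl:text>
              <xsl:value-of select="@format"/>
              <xsl:text>)</xsl:text>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

One stylesheet of this kind can be applied to every artist record, which is why a site with thousands of pages can be regenerated whenever the underlying data changes.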
What's the benefit?
The advantage of XML is that the programming effort is considerably lower as compared to other formats. This is because XML lies at the point of intersection of XSLT and HTML.
It’s also no problem for the web editor to update the website. Using XML makes it easy for the person in charge to deal with this large amount of data.
Going beyond… On the basis of the XML scripts thus far produced by EMI Music, the company could easily produce a PDF-formatted catalog or design i-Mode pages for the current mobile phone generation. Thanks to XML, this can be done with little extra effort.
A brief history of XML
In the late 1960s, Charles Goldfarb, Raymond Lorie, and Edward Mosher, all working for IBM, started to develop GML (Generalized Markup Language), a text formatting language. The language was successfully applied to internal documentation procedures. As was common at the time, document editing was performed in batch mode. GenCode, another procedure for defining generic formatting codes for the typesetting systems of various software producers, was developed by the GCA (Graphic Communications Association) at about the same time. Both of these technologies, GML syntactically and GenCode semantically, served as the basis for the development of SGML (Standard Generalized Markup Language). The standardization process started at the U.S. standards institute ANSI in the early 1980s, and in 1986 SGML was finally adopted as ISO standard ISO 8879:1986.
SGML is reckoned to be a complex and comprehensive language (the specification runs to about 500 pages). However, the success of HTML (HyperText Markup Language) proved that the concepts of SGML were appropriate. SGML-based HTML was developed by Tim Berners-Lee in Geneva in the early 1990s in order to present and link documents on the Internet. HTML has since become the most successful format for electronic documents. The Internet was originally designed as a space for human-human and human-machine communication, but lately machine-machine communication has gained tremendous importance, posing a completely new challenge for the computer languages used.
HTML is a descriptive language for the presentation of documents. The main focus is on presentation, meaning that an HTML document mixes the presented data with its formatting instructions. A human being can recognize the semantics of what is displayed from the presentation and the context; a machine, or more precisely software, cannot.
In 1996, a team led by Jon Bosak was established within the W3C to make SGML suitable for the Web. The result was a 30-page specification, which received the status of a "W3C Recommendation" in February 1998 and was named "Extensible Markup Language (XML)".
The most important goals in developing XML were:
- XML should be compatible with SGML
- XML should be easy to use on the Internet
- The number of optional characteristics should be minimized
- XML documents should be easy to generate and human-readable
- XML should be supported by a variety of applications
- It should be easy to write programs for XML
- XML should be put into practice on time
In the terminology of markup languages, a description formulated in XML is called an XML document, even if the content has nothing to do with text processing.
Why is this book not an XML document?
If you have accepted the ideas presented in this chapter, the question is very pertinent. The simple answer is that we have been unable to find the technology to support the creation of an open textbook in XML. We need several pieces of technology:
- An XML language for describing a book. DocBook is such a language, but the structure of a book is quite complex, and DocBook (reflecting this complexity) cannot be quickly mastered
- A Wiki that works with a language such as DocBook
- An XML stylesheet that converts XML into HTML for displaying the book's content
There is a project to create WikiMl (Wiki Markup Language), and this might be used at some point.
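To give a sense of what describing a book in XML involves, here is a very small DocBook sketch. It assumes DocBook 5 conventions (the docbook.org namespace and version attribute) and borrows this book's title and one sentence from this chapter as sample content; a real book would need many more element types than the handful shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>XML - Managing Data Exchange</title>
  <chapter>
    <title>Introduction to XML</title>
    <section>
      <title>SGML</title>
      <para>A markup language embeds information about a document
      within the document's text.</para>
    </section>
  </chapter>
</book>
```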
References
Initiating author Richard T. Watson, University of Georgia
A single entity
Learning objectives
|
Introduction
In this chapter, we start to practice working with XML using XML documents, schemas, and stylesheets. An XML document organizes data and information in a structured, hierarchical format. An XML schema defines the rules for the structure of a given XML document; because sender and receiver can validate documents against the same schema, it also supports reliable data transfer. An XSL stylesheet (XML stylesheet) controls how the material in an XML document is presented, so the same data can be shown in different ways.
In the first chapter, Introduction to XML, you learned what XML is, why it is useful, and how it is used. So, now you want to create your very own XML documents. In this chapter, we will show you the basic components used to create an XML document. This chapter is the foundation for all subsequent chapters--it is a little lengthy, but don't be intimidated. We will take you through the fundamentals of XML documents.
This chapter is divided into three parts:
- XML Document
- XML Schema
- XML Stylesheets (XSL)
As you learned in the previous chapter, the XML Schema and Stylesheet are essentially specialized XML Documents. Within each of these three parts we will examine the layout and components required to create the document. There are links at the end of the XML document, schema, and stylesheet sections that show you how to create the documents using an XML editor. At the bottom of the page there is a link to Exercises for this chapter and a link to the Answers.
The first thing you will need before starting to create XML documents is a problem--something you want to solve by using XML to store and share data or information. You need some entity you can collect information about and then access in a variety of formats. So, we created one for you.
To develop an XML document and schema, start with a data model depicting the reality of the actual data that is exchanged. Once a high fidelity model has been created, the data model can be readily converted to an XML document and schema. In this chapter, we start with a very simple situation and in successive chapters extend the complexity to teach you more features of XML.
Our starting point is a single entity, CITY, which is shown in the following figure. While our focus is on this single entity, to map CITY to an XML schema, we need to have an entity that contains CITY. In this case, we have created TOURGUIDE. Think of a TOURGUIDE as containing many cities; in this case TOURGUIDE has neither attributes nor an identifier. It is just a container for data about cities.
Exhibit 1: Data model - Tourguide

XML document
An XML document is a file containing XML markup. XML documents conventionally have an .xml file extension.
We will examine the features & components of the XML document.
- Prologue (XML Declaration)
- Elements
- Attributes
- Rules to follow
- Well-formed & Valid XML documents
Below is a sample XML document using our TourGuide model. We will refer to it as we describe the parts of an XML document.
Exhibit 2: XML document for city entity
<?xml version="1.0" encoding="UTF-8"?>
<tourGuide xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='city.xsd'>
<city>
<cityName>Belmopan</cityName>
<adminUnit>Cayo</adminUnit>
<country>Belize</country>
<population>11100</population>
<area>5</area>
<elevation>130</elevation>
<longitude>88.44</longitude>
<latitude>17.27</latitude>
<description>Belmopan is the capital of Belize</description>
<history>Belmopan was established following the devastation of the
former capital, Belize City, by Hurricane Hattie in 1961. High
ground and open space influenced the choice and ground-breaking
began in 1966. By 1970 most government offices and operations had
already moved to the new location.
</history>
</city>
<city>
<cityName>Kuala Lumpur</cityName>
<adminUnit>Selangor</adminUnit>
<country>Malaysia</country>
<population>1448600</population>
<area>243</area>
<elevation>111</elevation>
<longitude>101.71</longitude>
<latitude>3.16</latitude>
<description>Kuala Lumpur is the capital of Malaysia and the largest
city in the nation</description>
<history>The city was founded in 1857 by Chinese tin miners and
preceded Klang. In 1880 the British government transferred their
headquarters from Klang to Kuala Lumpur, and in 1896 it became the
capital of Malaysia.
</history>
</city>
<city>
<cityName>Winnipeg</cityName>
<adminUnit>St. Boniface</adminUnit>
<country>Canada</country>
<population>618512</population>
<area>124</area>
<elevation>40</elevation>
<longitude>97.14</longitude>
<latitude>49.54</latitude>
<description>Winnipeg has two seasons. Winter and Construction.</description>
<history>The city was founded by people at the forks (Fort Garry)
trading in pelts with the Hudson Bay Company. Ironically,
The Bay was bought by America.
</history>
</city>
</tourGuide>
Prologue (XML declaration)
The XML document starts off with the prologue. The prologue informs both a reader and the computer of certain specifications that make the document XML compliant. In this basic XML document the prologue consists of a single line, the XML declaration.
Exhibit 3: XML document - prologue
<?xml version="1.0" encoding="UTF-8"?>
xml = this is an XML document
version="1.0" = the XML version (XML 1.0 is the W3C-recommended version)
encoding="UTF-8" = the character encoding used in the document - UTF-8 is a variable-length encoding of Unicode that uses one to four 8-bit bytes per character (i.e. the standard way to encode international documents) - Unicode provides a unique number for every character.
Another potential attribute of the XML declaration:
standalone="yes" = the dependency of the document ('yes' indicates that the document does not rely on markup declarations in another document, such as an external DTD, to complete its content)
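As a minimal sketch, the declaration below adds the optional standalone attribute to the declaration from Exhibit 3. The attributes must appear in the order version, encoding, standalone. The <note> element is a hypothetical placeholder and is not part of the TourGuide model.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- The declaration above must be the very first thing in the file. -->
<!-- 'note' is a hypothetical root element used only for illustration. -->
<note>This document does not depend on external markup declarations.</note>
```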
Elements
The majority of what you see in the XML document consists of XML elements. Elements are identified by their tags that open with < or </ and close with > or />. The start tag looks like this: <element attribute="value">, with a left angle bracket (<) followed by the element type name, optional attributes, and finally a right angle bracket (>). The end tag looks like this: </element>, similar to the start tag, but with a slash (/) between the left angle bracket and the element type name, and no attributes.
When there's nothing between a start tag and an end tag, XML allows you to combine them into an empty element tag, which can include everything a start tag can: <img src="Belize.gif" />. This one tag must be closed with a slash and right angle bracket (/>), so that it can be distinguished from a start tag.
The XML document is designed around a major theme, an umbrella concept covering all other items and subjects; this theme is analyzed to determine its component parts, creating categories and subcategories. The major theme and its component parts are described by elements. In our sample XML document, 'tourGuide' is the major theme; 'city' is a category; 'population' is a subcategory of 'city'; and the hierarchy may be carried even further: 'males' and 'females' could be subcategories of 'population'. Elements follow several rules of syntax that will be described in the Rules to Follow section.
In Exhibit 4 we left out the attributes within the <tourGuide> start tag; that part will be explained in the XML Schema section.
Exhibit 4: Elements of the city entity XML document
<tourGuide>
<city>
<cityName>Belmopan</cityName>
<adminUnit>Cayo</adminUnit>
<country>Belize</country>
<population>11100</population>
<area>5</area>
<elevation>130</elevation>
<longitude>88.44</longitude>
<latitude>17.27</latitude>
<description>Belmopan is the capital of Belize</description>
<history>Belmopan was established following the devastation of the
former capital, Belize City, by Hurricane Hattie in 1961. High
ground and open space influenced the choice and ground-breaking
began in 1966. By 1970 most government offices and operations had
already moved to the new location.
</history>
</city>
</tourGuide>
Element hierarchy
- root element - This is the XML document's major theme element. Every document must have exactly one root element. All other elements are contained within this one root element. The root element follows the XML declaration. In our example, <tourGuide> is the root element.
- parent element - This is any element that contains other elements, the child elements. In our example, <city> is a parent element.
- child element - This is any element that is contained within another element, the parent element. In our example, <population> is a child element of <city>.
- sibling element - These are elements that share the same parent element. In our example, <cityName>, <adminUnit>, <country>, <population>, <area>, <elevation>, <longitude>, <latitude>, <description>, and <history> are all sibling elements.
Attributes
Attributes aid in modifying the content of a given element by providing additional or required information. They are contained within the element's opening tag. In our sample XML document code we could have taken advantage of attributes to specify the unit of measure used to determine the area and the elevation (it could be feet, yards, meters, kilometers, etc.); in this case, we could have called the attribute 'measureUnit' and defined it within the opening tag of 'area' and 'elevation'. The example below uses a different attribute, 'class', to record the kind of administrative unit:
<adminUnit class="state">Cayo</adminUnit>
<adminUnit class="region">Selangor</adminUnit>
The above attribute example can also be written as:
1. using child elements
<adminUnit>
<class>state</class>
<name>Cayo</name>
</adminUnit>
<adminUnit>
<class>region</class>
<name>Selangor</name>
</adminUnit>
2. using an empty element
<adminUnit class="state" name="Cayo" />
<adminUnit class="region" name="Selangor" />
Attributes can be used to:
- provide more information that is not defined in the data
- define a characteristic of the element (size, color, style)
- ensure the inclusion of information about an element in all instances
Attributes can, however, be a bit more difficult to manipulate and they have some constraints. Consider using a child element if you need more freedom.
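The 'measureUnit' attribute suggested at the start of this section could look like the sketch below. The attribute name comes from the text above; the unit values are assumptions made for illustration, since the sample document does not state which units it uses. Note that this fragment would not validate against the city.xsd schema shown later, which declares no attributes.

```xml
<city>
  <cityName>Belmopan</cityName>
  <!-- The unit values are illustrative assumptions, not facts from the sample data -->
  <area measureUnit="square kilometres">5</area>
  <elevation measureUnit="metres">130</elevation>
</city>
```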
Rules to follow
These rules are designed to aid the computer reading your XML document.
- The XML declaration (the prologue), when present, must be the first line of the XML document; including it is recommended.
- The main theme of the XML document is established in the root element and all other elements must be contained within the opening and closing tags of this root element.
- Every element must have an opening tag and a closing tag
(e.g. <element>data stuff</element>); the only shorthand is the empty-element tag (e.g. <element />).
- Tags must be nested in a particular order
=> the parent element's opening and closing tags must contain all of its child elements' tags; in this way, you close first the tag that was opened last:
<parentElement>
  <childElement1>data</childElement1>
  <childElement2>
    <subChildElementA>data</subChildElementA>
    <subChildElementB>data</subChildElementB>
  </childElement2>
  <childElement3>data</childElement3>
</parentElement>
- Attribute values must be enclosed in quotation marks (single or double).
- Empty-element tags must end with a slash (/) before the closing angle bracket; a space before the slash is optional.
- Comments in the XML language begin with "<!--" and end with "-->".
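The rules above can be seen together in one small document. This is a sketch that reuses names from the earlier exhibits; the img element is borrowed from the empty-element example in the Elements section and is not part of the TourGuide schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A comment: the declaration comes first, then the single root element -->
<tourGuide>
  <city>
    <cityName>Belmopan</cityName>
    <!-- The attribute value is enclosed in quotation marks -->
    <adminUnit class="state">Cayo</adminUnit>
    <!-- An empty element is closed with a slash -->
    <img src="Belize.gif"/>
  </city>
</tourGuide>
<!-- Not well-formed, because the tags overlap instead of nesting:
     <city><cityName>Belmopan</city></cityName> -->
```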
XML Element Naming Convention
Any name can be used but the idea is to make names meaningful to those who might read the document.
- XML element names may only start with either a letter or an underscore character.
- The name must not start with the string "xml" (in any combination of upper and lower case), which is reserved for the XML specification.
- The name may not contain spaces.
- The ":" should not be used in element names because it is reserved to be used for namespaces (This will be covered in more detail in a later chapter).
- After the first character, the name may contain letters, digits, hyphens, underscores, and periods.
XML documents often have a corresponding database. The database will contain fields which correspond to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.
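The naming rules can be illustrated as follows. All element names in this sketch are hypothetical; the invalid names appear only inside a comment, because a parser would reject them.

```xml
<names>
  <cityName>valid: starts with a letter</cityName>
  <_city>valid: starts with an underscore</_city>
  <city-2.name>valid: digits, hyphens and periods may follow the first character</city-2.name>
  <!-- Invalid names, shown only inside a comment:
       <2city>      starts with a digit
       <city name>  contains a space
  -->
</names>
```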
DTD (Document Type Definition) Validation - Simple Example
Simple Internal DTD
<?xml version="1.0"?>
<!DOCTYPE cdCollection [
<!ELEMENT cdCollection (cd)>
<!ELEMENT cd (title, artist, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT artist (#PCDATA)>
<!ELEMENT year (#PCDATA)>
]>
<cdCollection>
<cd>
<title>Dark Side of the Moon</title>
<artist>Pink Floyd</artist>
<year>1973</year>
</cd>
</cdCollection>
Every element that will be used MUST be included in the DTD. Don’t forget to include the root element, even though you have already specified it at the beginning of the DTD. You must specify it again, in an <!ELEMENT> tag.
<!ELEMENT cdCollection (cd)>
The root element, <cdCollection>, contains all the other elements of the document, but only one direct child element: <cd>. Therefore, you need to specify the child element (only direct child elements need to be specified) in the parentheses.
<!ELEMENT cd (title, artist, year)>
With this line, we define the <cd> element. Note that this element contains the child elements <title>, <artist>, and <year>. These are spelled out in a particular order. This order must be followed when creating the XML document. If you change the order of the elements (with this particular DTD), the document won’t validate.
<!ELEMENT title (#PCDATA)>
The remaining three tags, <title>, <artist>, and <year>, don’t actually contain other tags. They do however contain some text that needs to be parsed. You may remember from an earlier lecture that this data is called Parsed Character Data, or #PCDATA. Therefore, #PCDATA is specified in the parentheses.
So this simple DTD outlines exactly what you see here in the XML file. Nothing can be added or taken away, as long as we stick to this DTD. The only thing you can change is the #PCDATA text part between the tags.
Adding complexity
There may be times when you will want to put more than just character data, or more than just child elements, into a particular element. This is referred to as mixed content. For example, let’s say you want to be able to put character data OR child elements, such as the <b> tag, into a <description> element:
<!ELEMENT description (#PCDATA | b | i )*>
This particular arrangement allows us to use PCDATA, the <b> tag, and the <i> tag in any order and any number of times. One particular caveat, though, is that if you are going to mix PCDATA and other elements, the grouping must be followed by the asterisk (*) suffix. This declaration allows us to now add the following to the XML document (after declaring the individual elements, of course):
<cd>
<title>Love. Angel. Music. Baby</title>
<artist>Gwen Stefani</artist>
<year>2004</year>
<genre>pop</genre>
<description>
This is a great album from former
<i>No Doubt</i> singer <b>Gwen Stefani</b>.
</description>
</cd>
Attributes are declared a little differently from elements. Consider the following example:
<cd remaster_date="1992">
<title>Dark Side of the Moon</title>
<artist>Pink Floyd</artist>
<year>1973</year>
</cd>
In order for this to validate, it must be specified in the DTD. Attribute content models are specified with:
<!ATTLIST element_name attribute_name attribute_type default_value>
Let’s use this to validate our CD example:
<!ATTLIST cd remaster_date CDATA #IMPLIED>
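Putting the pieces together, a complete document with the attribute declaration placed in the internal DTD might look like the sketch below. It combines only declarations and data already shown in this section.

```xml
<?xml version="1.0"?>
<!DOCTYPE cdCollection [
  <!ELEMENT cdCollection (cd)>
  <!ELEMENT cd (title, artist, year)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!-- remaster_date holds character data and is optional (#IMPLIED) -->
  <!ATTLIST cd remaster_date CDATA #IMPLIED>
]>
<cdCollection>
  <cd remaster_date="1992">
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>
</cdCollection>
```

Because the attribute is declared #IMPLIED, a <cd> element without remaster_date would validate as well.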
Choices
<!ATTLIST person gender (male|female) "male">
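A sketch of how such an enumerated attribute behaves in a complete document follows. The <people> root element and the text content of the <person> elements are hypothetical placeholders; only the attribute declaration comes from the line above.

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
  <!ELEMENT people (person+)>
  <!ELEMENT person (#PCDATA)>
  <!-- gender must be one of the listed values; "male" applies when it is omitted -->
  <!ATTLIST person gender (male|female) "male">
]>
<people>
  <person gender="female">First person</person>
  <!-- No gender given here, so a validating parser supplies the default -->
  <person>Second person</person>
</people>
```

Any value other than male or female, for example gender="unknown", would make the document invalid.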
Grouping Attributes for an Element
If a particular element is to have many different attributes, group them together like so:
<!ATTLIST car horn CDATA #REQUIRED seats CDATA #REQUIRED steeringwheel CDATA #REQUIRED price CDATA #IMPLIED>
Adding STATIC validation, for items that must have a certain value
<!ATTLIST classList classNumber CDATA #IMPLIED building (UWINNIPEG_DCE|UWINNIPEG_MAIN) "UWINNIPEG_MAIN" originalDeveloper CDATA #FIXED "Khal Shariff">
Suffixes
So what happens with our last example with the CD collection when we want to add more CDs? With the current DTD, we cannot add any more CDs without getting a validation error. Try it and see. When you specify a child element (or elements) the way we did, exactly one of each child element must be used. Not very suitable for a CD collection, is it? We can use suffixes to add functionality to the <!ELEMENT> tag. Suffixes are added to the end of the specified child element(s). Besides leaving the suffix off, there are 3 suffixes that can be used:
- ( No suffix ): Only 1 child can be used.
- ( + ): One or more elements can be used.
- ( * ): Zero or more elements can be used.
- ( ? ): Zero or one element may be used.
Validating for multiple children with a DTD
So in the case of our CD collection XML file, we can add more CDs to the list by adding a + suffix:
<!ELEMENT cdCollection (cd+)>
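The sketch below uses all three suffixes in one internal DTD. The genre element appears in the earlier example; the track element and its placeholder content are hypothetical additions made only to demonstrate the * suffix.

```xml
<?xml version="1.0"?>
<!DOCTYPE cdCollection [
  <!-- cd+ : the collection holds one or more CDs -->
  <!ELEMENT cdCollection (cd+)>
  <!-- artist+ : at least one artist; genre? : optional; track* : any number, including none -->
  <!ELEMENT cd (title, artist+, year, genre?, track*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!-- track is a hypothetical element, not part of the original example -->
  <!ELEMENT track (#PCDATA)>
]>
<cdCollection>
  <cd>
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>
  <cd>
    <title>Love. Angel. Music. Baby</title>
    <artist>Gwen Stefani</artist>
    <year>2004</year>
    <genre>pop</genre>
    <track>placeholder track title</track>
  </cd>
</cdCollection>
```

The first <cd> omits the optional elements and the second includes them; both satisfy the same content model.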
Using more internal formatting tags
Inline formatting tags, such as <b> for bold, are also declared in the DTD as elements. Because they appear in a mixed-content group followed by *, they are optional:
<!ELEMENT notes (#PCDATA | b | i)*>
<!ELEMENT b (#PCDATA)*>
<!ELEMENT i (#PCDATA)*>
]>
_______________
<classList classNumber="303" building="UWINNIPEG_DCE" originalDeveloper="Khal Shariff">
  <student>
    <firstName>Kenneth </firstName>
    <lastName>Branaugh </lastName>
    <studentNumber> </studentNumber>
    <notes><b>Excellent </b>, Kenneth is doing well. </notes>
etc
Case Study on BMEcat
One of the first major national projects to use XML as a B2B exchange format was initiated by the German federal association for materials management, purchasing and logistics (BME) in cooperation with leading German companies, e.g. Bayer, BMW, SAP and Siemens. Together they created a standard for the exchange of product catalogues. This project was named BMEcat. The result of this initiative is a collection of DTDs for the description of product catalogues and related transactions (new catalogue, updating of product data and updating of prices).
Companies operating in electronic commerce (suppliers, purchasing companies and marketplaces) exchange increasingly large amounts of data, and the variety of data exchange formats quickly brings them to their limits. BMEcat creates a basis for the straightforward transfer of catalogue data from various data formats, which lays the foundation for advancing the trade of goods over the Internet in Germany. Using BMEcat reduces costs for all parties because standard interfaces can be used. The XML-based BMEcat standard has been implemented successfully in many projects, and a variety of companies now use it to exchange their product catalogues.
A BMEcat catalogue (Version 1.2) consists of the following main elements:
- CATALOG: This element contains the essential information of a shopping catalogue, e.g. language version and validity. BMEcat expects exactly one language per catalogue.
- SUPPLIER: This element includes the identification and address of the catalogue supplier. BMEcat expects exactly one supplier per catalogue.
- BUYER: This element contains the name and address of the catalogue recipient. BMEcat expects no more than one recipient per catalogue.
- AGREEMENT: This element contains one or more framework agreement IDs together with the associated validity period. BMEcat expects all prices in a catalogue to belong to the agreements named here.
- CLASSIFICATION SYSTEM: This element allows the full transfer of one or more classification systems, including feature definitions and key words.
- CATALOG GROUP SYSTEM: This element originates from version 1.0. It is mainly used for the transfer of tree structures that help a user navigate in the target system (browser).
- ARTICLE (since 2005 PRODUCT): This element represents a product. It contains a set of standard attributes.
- ARTICLE PRICE (since 2005 PRODUCT PRICE): This element represents a price. Compared with other exchange formats, the support for different pricing models is very powerful: seasonal prices, country-specific prices, different currencies, different validity periods, etc. are supported.
- ARTICLE FEATURE (since 2005 PRODUCT FEATURE): This element allows the transfer of characteristic values. You can record either predefined group characteristics or individual product characteristics.
- VARIANT: This element allows product variants to be listed without duplicating the product. However, BMEcat variants only cover changes to individual values that lead to a change of the article ID; a variant cannot introduce dependencies on other attributes (prices in particular).
- MIME: This element includes any number of additional documents such as product images, data sheets, or websites.
- ARTICLE REFERENCE (since 2005 REFERENCE PRODUCT): This element allows cross-referencing between articles within a catalogue as well as between catalogues. These references can be used, to a limited extent, to map product bundles.
- USER DEFINED EXTENSION: This element allows data outside the BMEcat standard to be transported. Sender and receiver have to agree on its use.
ONLINE Validator
Well-formed and valid XML
Well-formed XML - An XML document that correctly abides by the rules of XML syntax.
Valid XML - An XML document that adheres to the rules of an XML schema (which we will discuss shortly). To be valid an XML document must first be well-formed.
A Valid XML Document must be Well-formed. But, a Well-formed XML Document might not be valid - in other words, a well-formed XML document, that meets the criteria for XML syntax, might not meet the criteria for the XML schema, and will therefore be invalid.
For example, think of the situation where your XML document, to be validated against the city schema (city.xsd), contains the following:
<city>
  <cityName>Boston</cityName>
  <country>United States</country>
  <adminUnit>Massachusetts</adminUnit>
  : : :
</city>
Notice that the elements do not appear in the correct sequence according to the schema (cityName, adminUnit, country). The XML document can be validated (using validation software) against its declared schema – the validation software would then catch the out of sequence error.
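For comparison, the same fragment with its first three child elements in the sequence the schema expects is sketched below. The remaining child elements stay elided, as in the fragment above.

```xml
<city>
  <cityName>Boston</cityName>
  <adminUnit>Massachusetts</adminUnit>
  <country>United States</country>
  <!-- remaining child elements elided, as in the fragment above -->
</city>
```

Both versions are well-formed; only this ordering can also be valid once the elided elements are supplied.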
Using an XML Editor
Check chapter XML Editor for instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML document and paste it into the XML editor. Then check your results. Is the XML document well-formed? Is the XML document valid? (you will need to have copied and pasted the schema in order to validate - we will look at schemas next)
XML schema
An XML schema is an XML document. XML schemas have an .xsd file extension.
An XML schema is used to govern the structure and content of an XML document by providing a template for XML documents to follow in order to be valid. It is a guide for how to structure your XML document as well as indicating your XML document's components (elements and attributes - and their relationships). An XML editor will examine an XML document to ensure that it conforms to the specifications of the XML schema it is written against - to ensure it is valid.
XML schemas engender confidence in data transfer. With schemas, the receiver of data can feel confident that the data conforms to expectations. The sender and the receiver have a mutual understanding of what the data represent.
Because an XML schema is an XML document, you use the same language - standard XML markup syntax - with elements and attributes specific to schemas.
A schema defines:
- the structure of the document
- the elements
- the attributes
- the child elements
- the number of child elements
- the order of elements
- the names and contents of all elements
- the data type for each element
For more detailed information on XML schemas and reference lists of: Common XML Schema Primitive Data Types, Summary of XML Schema Elements, Schema Restrictions and Facets for data types, and Instance Document Attributes, click on this wikibook link => http://en.wikibooks.org/wiki/XML_Schema
Schema reference
This is the part of the XML Document that references an XML Schema:
Exhibit 5: XML document's schema reference
<tourGuide
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='city.xsd'>
This is the part we left out when we described the root element in the basic XML document from the previous section. The additional attributes of the root element <tourGuide> reference the XML schema (through the noNamespaceSchemaLocation attribute).
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' - references the W3C Schema-instance namespace
xsi:noNamespaceSchemaLocation='city.xsd' - references the XML schema document (city.xsd)
Schema document
Below is a sample XML schema using our TourGuide model. We will refer to it as we describe the parts of an XML schema.
Exhibit 6: XML schema document for city entity
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="unqualified">
<xsd:element name="tourGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="city" type="cityDetails" minOccurs = "1" maxOccurs="unbounded" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="cityDetails">
<xsd:sequence>
<xsd:element name="cityName" type="xsd:string"/>
<xsd:element name="adminUnit" type="xsd:string"/>
<xsd:element name="country" type="xsd:string"/>
<xsd:element name="population" type="xsd:integer"/>
<xsd:element name="area" type="xsd:integer"/>
<xsd:element name="elevation" type="xsd:integer"/>
<xsd:element name="longitude" type="xsd:decimal"/>
<xsd:element name="latitude" type="xsd:decimal"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="history" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
<!--
Note: Latitude and Longitude are decimal data types.
The conversion is from the usual form (e.g., 50º 17' 35")
to a decimal by using the formula degrees+min/60+secs/3600.
-->
Prolog
Remember that the XML schema is essentially an XML document and therefore must begin with the prolog, which in the case of a schema includes:
- the XML declaration
- the schema element declaration
The XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
The schema element declaration:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
The schema element is similar to a root element - it contains all other elements in the schema.
Attributes of the schema element include:
xmlns - XML NameSpace - a URI that identifies the vocabulary (the XML elements and data types) used in the schema. It serves as a unique name and does not have to point to an actual web page.
You can find more about namespaces here => Namespace.
xmlns:xsd - All the elements and attributes with the 'xsd' prefix adhere to the vocabulary designated in the given namespace.
elementFormDefault - specifies whether locally declared elements (those declared inside a complex type, such as <city> and its child elements in our schema) must be namespace-qualified in the XML document. With 'unqualified', which is the default, only globally declared elements belong to the target namespace. With 'qualified', every element in the document must belong to the target namespace, either through a prefix or through a default namespace declaration. The setting matters only when the schema declares a target namespace; our schema declares none, so 'unqualified' is sufficient. When a target namespace is used, 'qualified' is usually the more prudent choice, because all elements are then treated the same way and you cannot accidentally leave one outside the namespace.
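To make the effect of elementFormDefault concrete, the sketch below gives a cut-down schema a target namespace. The namespace URI http://example.org/tourguide and the tg prefix are hypothetical, and <city> is simplified to a string so that the example stays short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- http://example.org/tourguide is a hypothetical namespace URI chosen for this sketch -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/tourguide"
            elementFormDefault="qualified">
  <xsd:element name="tourGuide">
    <xsd:complexType>
      <xsd:sequence>
        <!-- city is declared locally; "qualified" places it in the target namespace too -->
        <xsd:element name="city" type="xsd:string" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- A matching document qualifies every element:
     <tg:tourGuide xmlns:tg="http://example.org/tourguide">
       <tg:city>Belmopan</tg:city>
     </tg:tourGuide>
     With elementFormDefault="unqualified" the inner element would be written <city>,
     without a prefix and with no default namespace in scope. -->
```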
Element declarations
Define the elements in the schema.
Include:
- the element name
- the element data type (optional)
Basic element declaration format: <xsd:element name="name" type="type">
Simple type
declares elements that:
- do NOT have Child Elements
- do NOT have Attributes
example: <xsd:element name="cityName" type="xsd:string" />
Default Value
If an element appears in the document but is empty, the default value is assigned.
example: <xsd:element name="description" type="xsd:string" default="really cool place to visit!" />
Fixed Value
An element that is declared as fixed must be empty or contain the specified fixed value. No other values are allowed.
example: <xsd:element name="description" type="xsd:string" fixed="you must visit this place - it is awesome!" />
Complex type
declares elements that:
- can have Child Elements
- can have Attributes
examples:
1. The root element 'tourGuide' contains a child element 'city'. This is shown here:
Nameless complex type
<xsd:element name="tourGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="city" type="cityDetails" minOccurs = "1" maxOccurs="unbounded" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Occurrence Indicators:
- minOccurs = the minimum number of times an element can occur (here it is 1 time)
- maxOccurs = the maximum number of times an element can occur (here it is an unlimited number of times, 'unbounded')
2. The parent element 'city' contains many child elements: 'cityName', 'adminUnit', 'country', 'population', etc. Why does this complex element set not start with the line <xsd:element name="city" type="cityDetails">? The element 'city' was already defined above within the complex element 'tourGuide' and it was given the type 'cityDetails'. This data type, 'cityDetails', is utilized here in identifying the sequence of child elements for the parent element 'city'.
Named Complex Type - and therefore can be reused in other parts of the schema
<xsd:complexType name="cityDetails">
<xsd:sequence>
<xsd:element name="cityName" type="xsd:string"/>
<xsd:element name="adminUnit" type="xsd:string"/>
<xsd:element name="country" type="xsd:string"/>
<xsd:element name="population" type="xsd:integer"/>
<xsd:element name="area" type="xsd:integer"/>
<xsd:element name="elevation" type="xsd:integer"/>
<xsd:element name="longitude" type="xsd:decimal"/>
<xsd:element name="latitude" type="xsd:decimal"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="history" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
The <xsd:sequence> tag indicates that the child elements must appear in the order, the sequence, specified here.
Compare the sample XML Schema and the sample XML Document - try to observe patterns in the code and how the XML Schema sets up the XML Document.
3. Elements that have attributes are also designated as complex type.
a. this XML Document line: <adminUnit class="state" name="Cayo" />
would be defined in the XML Schema as:
<xsd:element name="adminUnit">
<xsd:complexType>
<xsd:attribute name="class" type="xsd:string" />
<xsd:attribute name="name" type="xsd:string" />
</xsd:complexType>
</xsd:element>
b. this XML Document line: <adminUnit class="state">Cayo</adminUnit>
would be defined in the XML Schema as:
<xsd:element name="adminUnit">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="class" type="xsd:string" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
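The same pattern works when the element content is not a string. As a sketch, the declaration below would describe a line such as <area measureUnit="square kilometres">5</area>, using the hypothetical 'measureUnit' attribute mentioned in the Attributes section. The use="required" setting is an addition for illustration; without it an attribute is optional. The fragment assumes it sits inside an <xsd:schema> element that binds the xsd prefix, as in Exhibit 6.

```xml
<xsd:element name="area">
  <xsd:complexType>
    <xsd:simpleContent>
      <!-- The element content stays an integer, as in Exhibit 6 -->
      <xsd:extension base="xsd:integer">
        <!-- measureUnit is a hypothetical attribute; use="required" makes it mandatory -->
        <xsd:attribute name="measureUnit" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
```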
Attribute declarations
Attribute declarations are used in complex type definitions. We saw some attribute declarations in the third example of the Complex Type Element.
<xsd:attribute name="class" type="xsd:string" />
Data type declarations
These are contained within element and attribute declarations as: type=" " .
Common XML Schema Data Types
XML schema has a lot of built-in data types. The most common types are:
Data type | Description |
string | a string of characters |
decimal | a decimal number |
integer | an integer |
boolean | the values true or false, or 1 or 0 |
date | a calendar date written as YYYY-MM-DD |
time | a time of day written as hh:mm:ss |
dateTime | a date and time combination written as YYYY-MM-DDThh:mm:ss |
anyURI | a URI, for example when the element will contain a URL |
For an entire list of built-in simple data types see http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
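The sketch below declares one element for each of the common data types. The <visit> element and all of its child element names are hypothetical and not part of the TourGuide model; the comments show sample values in the form each type expects.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- 'visit' and its children are hypothetical names used only for illustration -->
  <xsd:element name="visit">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="cityName" type="xsd:string"/>      <!-- e.g. Belmopan -->
        <xsd:element name="latitude" type="xsd:decimal"/>     <!-- e.g. 17.27 -->
        <xsd:element name="population" type="xsd:integer"/>   <!-- e.g. 11100 -->
        <xsd:element name="isCapital" type="xsd:boolean"/>    <!-- true, false, 1 or 0 -->
        <xsd:element name="visitDate" type="xsd:date"/>       <!-- e.g. 2024-01-15 -->
        <xsd:element name="openingTime" type="xsd:time"/>     <!-- e.g. 09:30:00 -->
        <xsd:element name="lastUpdated" type="xsd:dateTime"/> <!-- e.g. 2024-01-15T09:30:00 -->
        <xsd:element name="website" type="xsd:anyURI"/>       <!-- e.g. http://example.org/ -->
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A validator would reject a document in which, for instance, <visitDate> contained 15/01/2024, because that text does not match the form required by xsd:date.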
Using an XML Editor => XML Editor
This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML schema document and paste it into the XML editor. Then check your results. Is the XML schema well-formed? Is the XML schema valid?
XML stylesheet (XSL)
An XML Stylesheet is an XML Document. XML Stylesheets have an .xsl file extension.
The eXtensible Stylesheet Language (XSL) provides a means to transform and format the contents of an XML document for display. Since an XML document does not contain tags a browser understands, such as HTML tags, browsers cannot present the data without a stylesheet that contains the presentation information. By separating the data and the presentation logic, XSL allows people to view the data according to their different needs and preferences.
The XSL Transformation Language (XSLT) is used to transform an XML document from one form to another, such as creating an HTML document to be viewed in a browser. An XSLT stylesheet consists of a set of formatting instructions that dictate how the contents of an XML document will be displayed in a browser, with much the same effect as Cascading Stylesheets (CSS) do for HTML. Multiple views of the same data can be created using different stylesheets. The output of a stylesheet is not restricted to a browser.
During the transformation process, XSLT analyzes the XML document and converts it into a node tree – a hierarchical representation of the entire XML document. Each node represents a piece of the XML document, such as an element, attribute or some text content. The XSL stylesheet contains predefined “templates” that contain instructions on what to do with the nodes. XSLT will use the match attribute to relate XML element nodes to the templates, and transform them into the resulting document.
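The idea of a node tree can be made concrete with a short Python sketch. It is not an XSLT processor: it only parses a small document and walks its element nodes. The sample document and the `walk` helper are assumptions made for illustration, and Python's ElementTree attaches text and attributes to element objects rather than modelling them as separate nodes.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; only the element names come from this chapter
SAMPLE = """<tourGuide>
  <city>
    <cityName>Madrid</cityName>
    <country>Spain</country>
  </city>
</tourGuide>"""


def walk(node, depth=0):
    """Yield (depth, tag, text) for every element node, parents first."""
    text = (node.text or "").strip()
    yield depth, node.tag, text
    for child in node:
        yield from walk(child, depth + 1)


tree = list(walk(ET.fromstring(SAMPLE)))
for depth, tag, text in tree:
    # Indent each node by its depth to show the hierarchy
    print("  " * depth + tag + (": " + text if text else ""))
```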
Exhibit 7: XML stylesheet document for city entity
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="/">
<html>
<head>
<title>Tour Guide</title>
</head>
<body>
<h2>Cities</h2>
<xsl:apply-templates select="tourGuide"/>
</body>
</html>
</xsl:template>
<xsl:template match="tourGuide">
<xsl:for-each select="city">
<br/><xsl:value-of select="continentName"/><br/>
<xsl:value-of select="cityName"/><br/>
<xsl:text>Population: </xsl:text>
<xsl:value-of select='format-number(population, "##,###,###")'/><br/>
<xsl:value-of select="country"/>
<br/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The output of the city.xsl stylesheet in Table 2-3 will look like the following:
Cities
Europe
Madrid
Population: 3,128,600
Spain
Asia
Shanghai
Population: 18,880,000
You will notice that the stylesheet consists of HTML to inform the media tool (a web browser) of the presentation design. If you do not already know HTML this may seem a little confusing. Online resources such as the W3Schools tutorials can help with the basic understanding you will need =>(http://www.w3schools.com/html/default.asp).
Incorporated within the HTML is the XML that supplies the data, the information, contained within our XML document. The XML of the stylesheet indicates what information will be displayed and how. So, the HTML constructs a display and the XML plugs in values within that display. XSL is the tool that transforms the information into presentational form, but at the same time keeps the meaning of the data.
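To see how the HTML skeleton and the plugged-in values fit together, here is a hand-written Python sketch that mirrors the two templates of Exhibit 7. It is an emulation under stated assumptions, not an XSLT processor: the sample data and the `render` function are hypothetical, the comma formatting stands in for format-number, and no HTML escaping is performed.

```python
import xml.etree.ElementTree as ET

# Assumed sample data; element names follow the select attributes in Exhibit 7
SAMPLE = """<tourGuide>
  <city>
    <continentName>Europe</continentName>
    <cityName>Madrid</cityName>
    <population>3128600</population>
    <country>Spain</country>
  </city>
  <city>
    <continentName>Asia</continentName>
    <cityName>Shanghai</cityName>
    <population>18880000</population>
    <country>China</country>
  </city>
</tourGuide>"""


def render(xml_text):
    """Build the HTML by hand, mirroring the two templates in the stylesheet."""
    tour_guide = ET.fromstring(xml_text)
    # Fixed markup, like the template that matches "/"
    parts = ["<html><head><title>Tour Guide</title></head><body><h2>Cities</h2>"]
    for city in tour_guide.findall("city"):          # plays the role of xsl:for-each
        population = int(city.findtext("population"))
        # Each findtext call plays the role of xsl:value-of
        parts.append("<br/>" + city.findtext("continentName") + "<br/>")
        parts.append(city.findtext("cityName") + "<br/>")
        parts.append("Population: " + format(population, ",") + "<br/>")
        parts.append(city.findtext("country") + "<br/>")
    parts.append("</body></html>")
    return "".join(parts)


html = render(SAMPLE)
print(html)
```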
XML at Bertelsmann - a case study The German Bertelsmann Inc. is a privately owned media conglomerate operating in 56 countries. It has interests in such businesses as TV broadcast (RTL), magazine (Gruner & Jahr), books (Random House) etc. In 2005 its 89 000 employees generated 18 billion € of revenue.
|
Prolog
The prolog of the stylesheet consists of:
- the XML declaration;
- the stylesheet declaration;
- the namespace declaration;
- the output document format.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
The XML declaration
<?xml version="1.0" encoding="UTF-8"?>
The stylesheet & namespace declarations
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
- identifies the document as an XSL style sheet;
- identifies the version number;
- declares the XSLT namespace (http://www.w3.org/1999/XSL/Transform), the W3C-defined identifier for the elements that make up the XSL transformation vocabulary. You can find more about namespaces here => Namespace. Every element that uses the xsl: prefix belongs to this namespace.
The output document format
<xsl:output method="html"/>
This element designates the format of the output document and must be a child element of <xsl:stylesheet>
Templates
The <xsl:template> element is used to create templates that describe how to display elements and their content. Above, in the XSL introduction, we mentioned that XSL breaks up the XML document into nodes and works on individual nodes. This is done with templates. Each template within an XSL stylesheet describes how to process the nodes that it matches. To identify which nodes a given template applies to, use the 'match' attribute. The value given to the 'match' attribute is called a pattern. Remember that the node tree is a hierarchical representation of the entire XML document, in which each node represents a piece of the document, such as an element, an attribute or some text content. Wherever there is branching in the node tree, there is a node. <xsl:template> defines the start of a template and contains rules to apply when a specified node is matched.
the match attribute
<xsl:template match="/">
This template match attribute associates the XML document root (/), the whole branch of the XML source document, with the HTML document root. Contained within this template element is the typical HTML markup found at the beginning of any HTML document. This HTML is written to the output. The XSL looks for the root match and then outputs the HTML, which the browser understands.
<xsl:template match="tourGuide">
This template match attribute associates the element 'tourGuide' with the display rules described within this element.
Elements
Elements specific to XSL (from our sample XSL):
XSL Element | Meaning |
---|---|
<xsl:text> | Writes the literal text found between this element's tags to the output. |
<xsl:value-of> | Used with a 'select' attribute to look up the value of the selected node and insert it into the output. |
<xsl:for-each> | Used with a 'select' attribute to handle elements that repeat, by looping through all the nodes in the selected node set. |
<xsl:apply-templates> | Applies templates to a set of nodes. If it has a 'select' attribute, templates are applied only to the selected nodes, and a nested <xsl:sort> can control their order. If no 'select' attribute is used, templates are applied to all child nodes of the current node, including text nodes. |
For more XSL elements => http://www.w3schools.com/xsl/xsl_w3celementref.asp .
Language-Specific Validation and Transformation Methods
PHP Methods of XML DOM Validation
Using the DOM (Document Object Model) with PHP on a server to validate XML against a DTD (Document Type Definition), and more: http://wiki.cc/php/Dom_validation
Browser Methods
Place this line of code in your .xml document after the XML declaration (prologue).
<?xml-stylesheet type="text/xsl" href="tourGuide.xsl"?>
PHP XML Production
<?php
// Connect to the database (mysqli replaces the mysql_* functions removed in PHP 7)
$db = mysqli_connect('localhost', 'root', '', 'issd')
    or die('Failed to connect to the DBMS');
$result = mysqli_query($db, 'SELECT * FROM students')
    or die('Query to get the records failed');
if (mysqli_num_rows($result) < 1) {
    die('');
}
// Build one <student> element per row, escaping the values for XML
$xmlString = "<classlist>\n";
while ($row = mysqli_fetch_assoc($result)) {
    $xmlString .= "\t<student>\n";
    $xmlString .= "\t\t<firstName>" . htmlspecialchars($row['firstName'], ENT_XML1) . "</firstName>\n";
    $xmlString .= "\t\t<lastName>" . htmlspecialchars($row['lastName'], ENT_XML1) . "</lastName>\n";
    $xmlString .= "\t</student>\n";
}
$xmlString .= "</classlist>";
echo $xmlString;
$myFile = "classList.xml"; // any file
$fh = fopen($myFile, 'w') or die("can't open file"); // create file handle
fwrite($fh, $xmlString); // write the data into the file
fclose($fh); // all done
?>
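For comparison, the same idea (query rows, wrap each one in a student element, serialize) can be sketched in Python with only the standard library. The in-memory SQLite database and its sample rows are assumptions standing in for the students table, and `build_classlist` is a hypothetical helper. Building the tree with ElementTree means special characters such as & are escaped automatically on output.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database with assumed sample rows standing in for the students table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students (firstName TEXT, lastName TEXT)")
db.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Tom", "O'Brien & Sons")],
)


def build_classlist(connection):
    """Return a <classlist> element with one <student> child per row."""
    classlist = ET.Element("classlist")
    rows = connection.execute("SELECT firstName, lastName FROM students")
    for first_name, last_name in rows:
        student = ET.SubElement(classlist, "student")
        ET.SubElement(student, "firstName").text = first_name
        ET.SubElement(student, "lastName").text = last_name
    return classlist


xml_string = ET.tostring(build_classlist(db), encoding="unicode")
print(xml_string)
```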
PHP Methods of XSLT Transformation
This example works with PHP 5 and later (for example, under WampServer). Make sure the xsl extension is enabled, that is, NOT commented out, in the php.ini file.
<?php
// Load the XML source
$xml = new DOMDocument;
$xml->load('tourguide.xml');
$xsl = new DOMDocument;
$xsl->load('tourguide.xsl');
// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); // attach the xsl rules
echo $proc->transformToXML($xml);
?>
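Example 1: using the older xslt_* functions within PHP itself (use the phpinfo() function to check whether the XSLT extension is available; enable it if needed). These functions come from the PHP 4 XSLT extension; Example 2 emulates them for PHP 5 and later.
This example might produce XHTML, but it can produce any output that the XSL defines.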
<?php
$xhtmlOutput = xslt_create();
$args = array();
$params = array('foo' => 'bar');
$theResult = xslt_process(
$xhtmlOutput,
'theContentSource.xml',
'theTransformationSource.xsl',
null,
$args,
$params
);
xslt_free($xhtmlOutput); // free that memory
// echo theResult or save it to a file or continue processing (perhaps instructions)
?>
Example 2:
<?php
if (version_compare(PHP_VERSION, '5.0.0', '>=')) {
// Emulate the old xslt library functions
function xslt_create() {
return new XsltProcessor();
}
function xslt_process($xsltproc,
$xml_arg,
$xsl_arg,
$xslcontainer = null,
$args = null,
$params = null) {
// Start with preparing the arguments
$xml_arg = str_replace('arg:', '', $xml_arg);
$xsl_arg = str_replace('arg:', '', $xsl_arg);
// Create instances of the DomDocument class
$xml = new DomDocument;
$xsl = new DomDocument;
// Load the xml document and the xsl template
$xml->loadXML($args[$xml_arg]);
$xsl->loadXML($args[$xsl_arg]);
// Load the xsl template
$xsltproc->importStyleSheet($xsl);
// Set parameters when defined
if ($params) {
foreach ($params as $param => $value) {
$xsltproc->setParameter("", $param, $value);
}
}
// Start the transformation
$processed = $xsltproc->transformToXML($xml);
// Put the result in a file when specified
if ($xslcontainer) {
return @file_put_contents($xslcontainer, $processed);
} else {
return $processed;
}
}
function xslt_free($xsltproc) {
unset($xsltproc);
}
}
$arguments = array(
'/_xml' => file_get_contents("xml_files/201945.xml"),
'/_xsl' => file_get_contents("xml_files/convertToSql_new2.xsl")
);
$xsltproc = xslt_create();
$html = xslt_process(
$xsltproc,
'arg:/_xml',
'arg:/_xsl',
null,
$arguments
);
xslt_free($xsltproc);
print $html;
?>
PHP file writing code
<?php
$myFile = "testFile.xml"; //any file
$fh = fopen($myFile, 'w') or die("can't open file"); //create filehandler
$stringData = "<foo>\n\t<bar>\n\thello\n"; // get a string ready to write
fwrite($fh, $stringData); //write the data into the file
$stringData2 = "\t</bar>\n</foo>";
fwrite($fh, $stringData2); //write more data into the file
fclose($fh); //ALL DONE!
XML Colors
For use in your stylesheet: these colors can be used for both background and font
http://www.w3schools.com/html/html_colors.asp
http://www.w3schools.com/html/html_colorsfull.asp
http://www.w3schools.com/html/html_colornames.asp
Using an XML Editor => XML Editor
This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML stylesheet document and paste it into the XML editor. Then check your results. Is the XML stylesheet well-formed?
XML at Thomas Cook - a case study
As one of the leading travel companies and most widely recognized brands in the world, Thomas Cook works across the travel value chain (airlines, hotels, tour operators, travel and incoming agencies), providing its customers with the right product in all market segments across the globe. Employing over 11,000 staff, the Group has 33 tour operators, around 3,600 travel agencies, a fleet of 80 aircraft and a workforce numbering some 26,000. Thomas Cook operates through a network of 616 locations in Europe and overseas. The company is now the second largest travel group in Europe and the third largest in the world. As Thomas Cook sells other companies' products, ranging from packaged holidays to car hire, it needs to change its online brochure regularly. Before Thomas Cook started using XML, it put information into HTML format, and it would take up to six weeks to get an online brochure up and running. XML helps do this job in about three days. This provides all of Thomas Cook's current and potential customers and its various agencies in different geographical locations with updated information, instead of having to wait six weeks for new information to be released.
Summary
From the previous chapter Introduction to XML, you have learned the need for data exchange and the usefulness of XML in data exchange. In this chapter, you have learned more about the three major XML files: the XML document, the XML schema, and the XML stylesheet. You learned the correct documentation required for each type of file. You learned basic rules of syntax applicable for all XML documents. You learned how to integrate the three types of XML documents. And you learned the definition and distinction between a well-formed document and a valid document. By following the XML Editor links, you were able to see the results of the sample code and learn how to use an XML Editor.
Below are Exercises and Answers for further practice. Good Luck!
XML SGML Dan Connelly RSS XML Declaration parent child sibling element attribute
*Well-formed XML
PCDATA
Exercise 1.
a) Using "tourguide" above as an example, create an XML document whose root is "classlist". This CLASSLIST is built from a single entity, STUDENT. Each of any number of students contains the elements firstname, lastname, and emailaddress.
Basic data structures
Introduction
In reviewing the four central problems in data management (capture, storage, retrieval, and exchange), the typical user of XML encounters recurring fundamental structural patterns that apply to all sorts of data throughout the storage and exchange phases. These patterns recur consistently because their use transcends the particular contexts in which the underlying data are processed. We call these patterns "data structures" (or datatypes).
In this section, we discuss a few of the most fundamental "basic data structures" and explain why they are useful, as well as how to work with them using XML.
We start our introduction with a simple example. Consider an ordinary grocery shopping list for a single-person household.
Introductory Shopping List Example:
Andy's shopping list:
* eggs
* cough syrup (pick up for granny)
* orange juice
* bread
* laundry detergent **don't forget this**
When analyzing aspects of the information contained in this shopping list, we can make some basic generalizations:
- Portability: the shopping list can be represented and transferred easily. If necessary, it could be stored in a database and processed by custom-designed software, but it could just as easily be written on a scrap of paper;
- Comprehensibility: the shopping list is readily understood by its intended audience (in this instance, the sole person who wrote the list) and therefore needs no additional information or structure in order to be immediately usable;
- Adaptability: if any changes become necessary (such as additions or removals to the list) there is an existing and well-known methodology for accomplishing this (e.g., in the case of a handwritten list, simply write down new entries or cross out unwanted entries).
The fundamental concept of basic data structures
Given that we have the previous example for background, we can now introduce the fundamental concept of "basic data structures".
Basic data structures defined
Now that we have introduced our concept of data structures, we can start with some concrete definitions, and then review those definitions in the context of our shopping list example.
Overview of "core" data structures
The following terms define some "core" data structures[1] that we use throughout this chapter. This list is ordered in ascending degrees of complexity:
- SimpleBoolean: Any value capable of being expressed as either "True" or "False".
- SimpleString: A contiguous sequence of characters, including both alphanumeric and non-alphanumeric.
- SimpleSequence: An enumeration of items generally accessible by numeric indexing.
- Name-value pair: An arbitrary singular name attached to a singular value.
- SimpleDictionary: An enumeration of items generally accessible by alphanumeric indexing.
- SimpleTable: An ordered arrangement of columns and rows. A SimpleTable can be classified as a "composite" data structure (e.g., SimpleSequence where each item in the sequence is a single SimpleDictionary).
An important point to remember while reviewing these "core" data structures is that they are elemental and complementary. That is, the core structures, when used in combination, can form even more complex structures. Once the reader comes to understand this fact, it will become apparent that there is no conceivable application or data specification that cannot be wholly described in XML using nothing more than these "core" data structures.
Once we understand the "core" data structures, we can use them in combination to represent any conceivable kind of structured information.
Now review the "Introductory Shopping List Example" above. When we compare it with the "core" data structures that we've just defined, we can make some fairly straightforward observations:
- The entire shopping list cannot be represented using a SimpleBoolean data structure, because the information is more complex than either "True" or "False".
- The entire shopping list can be represented using a SimpleString.
- There may be reasons why we would not want to use a SimpleString to represent the entire shopping list. For example, we might want to transfer the list into a database or other software application and then be able to sort, query, duplicate or otherwise process individual items on the list. Treating the entire list as a SimpleString would therefore complicate our processing requirements.
SimpleString
Different ways to represent a SimpleString in XML:
<Example>
<String note="This XML attribute contains a SimpleString.">
This XML Text Node represents a SimpleString.
</String>
<!-- This XML comment contains a SimpleString -->
<![CDATA[ This XML CDATA section contains a SimpleString. ]]>
</Example>
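The following Python sketch reads the SimpleString back out of the representations above, using the standard library parser. The variable names are hypothetical. Note one behavior of this particular parser: by default it discards comments, and it merges the CDATA section into the ordinary character data that follows the String element.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """<Example>
  <String note="This XML attribute contains a SimpleString.">
    This XML Text Node represents a SimpleString.
  </String>
  <!-- This XML comment contains a SimpleString -->
  <![CDATA[ This XML CDATA section contains a SimpleString. ]]>
</Example>"""

root = ET.fromstring(EXAMPLE)
string_element = root.find("String")

# SimpleString stored in an attribute
from_attribute = string_element.get("note")
# SimpleString stored in a text node (surrounding whitespace trimmed)
from_text_node = string_element.text.strip()
# The CDATA content arrives as character data trailing the <String> element
from_cdata = string_element.tail.strip()

print(from_attribute)
print(from_text_node)
print(from_cdata)
```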
SimpleSequence
Different ways to represent a SimpleSequence in XML:
<Example>
<!-- use a single XML attribute with a space-delimited list of items -->
<ShoppingList items="bread eggs milk juice" />
<!-- use a single XML attribute with a semicolon-delimited list of items
(this allows us to add items with spaces in them) -->
<ShoppingList items="bread;cough syrup;milk;juice;laundry detergent" />
<!-- yet another way (but not necessarily a good way)
using multiple XML attributes -->
<ShoppingList item00="bread" item01="eggs" item02="cough syrup" />
<!-- yet another way
using XML child elements -->
<ShoppingList>
<item>eggs</item><item>milk</item><item>cough syrup</item>
</ShoppingList>
</Example>
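Each representation above needs slightly different code to turn it back into a sequence. The Python sketch below shows one possible reading of each; the variable names are hypothetical. The numbered-attribute form has to be sorted by attribute name, because attribute order carries no meaning in XML.

```python
import xml.etree.ElementTree as ET

space_delimited = ET.fromstring('<ShoppingList items="bread eggs milk juice" />')
semicolon_delimited = ET.fromstring(
    '<ShoppingList items="bread;cough syrup;milk;juice;laundry detergent" />')
numbered_attributes = ET.fromstring(
    '<ShoppingList item00="bread" item01="eggs" item02="cough syrup" />')
child_elements = ET.fromstring(
    "<ShoppingList><item>eggs</item><item>milk</item>"
    "<item>cough syrup</item></ShoppingList>")

# Space-delimited: simple, but items cannot contain spaces
from_spaces = space_delimited.get("items").split()
# Semicolon-delimited: items may contain spaces, but not semicolons
from_semicolons = semicolon_delimited.get("items").split(";")
# Numbered attributes: sort by name to recover the intended order
from_attributes = [value for _, value in sorted(numbered_attributes.attrib.items())]
# Child elements: document order is the sequence order
from_children = [item.text for item in child_elements.findall("item")]

print(from_spaces, from_semicolons, from_attributes, from_children, sep="\n")
```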
Name-value pair
SimpleDictionary
SimpleTable
Side-by-side examples
SimpleTable (XML_Elem):
<table>
<tr><item>eggs</item><getfor>andy</getfor><notes></notes></tr>
<tr><item>milk</item><getfor>andy</getfor><notes></notes></tr>
<tr><item>laundry detergent</item><getfor>andy</getfor><notes></notes></tr>
<tr><item>cough syrup</item><getfor>granny</getfor><notes>try to get grape flavor</notes></tr>
</table>
SimpleTable (XML_Attr):
<table>
<tr item="eggs" getfor="andy" notes="" />
<tr item="milk" getfor="andy" notes="" />
<tr item="laundry detergent" getfor="andy" notes="" />
<tr item="cough syrup" getfor="granny" notes="try to get grape flavor" />
</table>
SimpleTable (XML_Mixed):
<table>
<tr>
<item getfor="andy" >eggs</item><notes></notes>
</tr>
<tr>
<item getfor="andy" >milk</item><notes></notes>
</tr>
<tr>
<item getfor="andy" >laundry detergent</item><notes></notes>
</tr>
<tr>
<item getfor="granny">cough syrup</item><notes>try to get grape flavor</notes>
</tr>
</table>
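The three layouts carry the same information, which a short Python sketch can confirm by converting each one into a list of dictionaries. The function names are hypothetical and the sample is shortened to two rows.

```python
import xml.etree.ElementTree as ET

ELEM = """<table>
  <tr><item>eggs</item><getfor>andy</getfor><notes></notes></tr>
  <tr><item>cough syrup</item><getfor>granny</getfor><notes>try to get grape flavor</notes></tr>
</table>"""
ATTR = """<table>
  <tr item="eggs" getfor="andy" notes="" />
  <tr item="cough syrup" getfor="granny" notes="try to get grape flavor" />
</table>"""
MIXED = """<table>
  <tr><item getfor="andy">eggs</item><notes></notes></tr>
  <tr><item getfor="granny">cough syrup</item><notes>try to get grape flavor</notes></tr>
</table>"""


def rows_from_elements(xml_text):
    # One dictionary per <tr>; empty elements become empty strings
    return [{child.tag: child.text or "" for child in tr}
            for tr in ET.fromstring(xml_text)]


def rows_from_attributes(xml_text):
    # The attributes of each <tr> already form a dictionary
    return [dict(tr.attrib) for tr in ET.fromstring(xml_text)]


def rows_from_mixed(xml_text):
    rows = []
    for tr in ET.fromstring(xml_text):
        item = tr.find("item")
        rows.append({"item": item.text,
                     "getfor": item.get("getfor"),
                     "notes": tr.findtext("notes") or ""})
    return rows


rows = rows_from_elements(ELEM)
print(rows == rows_from_attributes(ATTR) == rows_from_mixed(MIXED))
```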
Basic data structures in programming
To further illustrate how basic data structures apply in many different contexts, some of the basic data structures enumerated previously are examined and compared here in the context of computer programming.
For the first part of the comparison, we examine the generic terminology against that used commonly in programming languages:
- SimpleBoolean: is commonly called a boolean and can usually take the values true or false, 0 or 1, or other values, depending on the language.
- SimpleString: commonly called a string or stringBuffer.
- SimpleSequence: numerically indexed variables in programming are commonly represented with an array.
- Name-value pair: (explained in more detail below)
- SimpleDictionary: these are commonly represented with a dictionary, or an associative array.
- SimpleTable: (explained in more detail below)
Technical considerations
Now that we've introduced and discussed specific examples of the basic data structures, there are a few technical considerations that apply to all of the data structures, and are particularly important to those who may be responsible for implementing and designing XML schemas to deal with specific implementation scenarios.
- Exact terminology depends on context: Although the "basic" structures described here apply to many different scenarios, the terms used to describe them can overlap or conflict. For example, the term "SimpleSequence" as used here closely coincides with what is called an "array" in many programming languages. Similarly, the term "SimpleDictionary" is shorthand for what some programming languages call an "associative array". Although this close correlation is intentional, one must always remember that the specific nuances of an application or programming language will require additional attention. Sometimes minor conflicts or discrepancies arise when one digs into the details for any specific data structure in any given project or technology.
- Basic structures are flexible concepts: Structures can be defined in terms of one another, and some structures can be applied recursively. For example, one could easily define a SimpleSequence using a SimpleString along with some basic assumptions. (e.g., a SimpleSequence is a string of alphanumeric characters where each item in the sequence is separated by one or more whitespace characters: "eggs bread butter milk").
- Abstract structures tend to hide tricky details: For example, the term "SimpleString" describes the abstract notion of a sequence of characters (e.g., "ISBN 0-596-00327-7"). The abstract notion is fairly intuitive and uncomplicated. Nevertheless, the precise notation used to implement that abstract notion, and to represent it in real-life working code, is a different matter entirely. Different programming languages and different environments may use different conventions for representing the same "string". Because of this variability, one can also recognize that the abstract notion of a "SimpleString" in XML is also subject to differing representations, based on the needs of any given project.
Notes and references
1. ↑ An important note: the basic terms used here are generalizations. Although they may coincide with terms used in specific software, specific programming languages, or specific applications, these are not intended as technically precise definitions. The concepts described here are presented to help emphasize the context-neutral principle of interoperability in XML.
The one-to-many relationship
Introduction
In a one-to-many relationship, one object can reference several instances of another. A model is mapped into a schema whereby each data model entity becomes a complex element type. Each data model attribute becomes a simple element type, and the one-to-many relationship is recorded as a sequence.
Exhibit 1:Data model for 1:m relationship
In the previous chapter, we introduced a simple XML schema, XML document, and an XML stylesheet for a single entity data model. We now include more features of each of the key aspects of XML.
Implementing a one-to-many relationship
There are three different techniques for implementing a one-to-many relationship:
Containment relationship: A structure is defined where one element is contained within another. The "contained" element ceases to exist when the "container" element is removed. For instance, where a city has many hotels, the hotels are "contained" in the city.
<cityDetails>
<cityName>Belmopan</cityName>
<hotelDetails>
<hotelName>Bull Frog Inn</hotelName>
</hotelDetails>
<hotelDetails>
<hotelName>Pook's Hill Lodge</hotelName>
</hotelDetails>
</cityDetails>
<cityDetails>
<cityName>Kuala Lumpur</cityName>
<hotelDetails>
<hotelName>Pan Pacific Kuala Lumpur</hotelName>
</hotelDetails>
<hotelDetails>
<hotelName>Mandarin Oriental Kuala Lumpur</hotelName>
</hotelDetails>
</cityDetails>
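A minimal Python sketch of reading a containment structure follows. The tourGuide wrapper element is an assumption, added because a well-formed document needs exactly one root element; the variable names are hypothetical. The last two statements show the defining property of containment: removing the container removes its hotels with it.

```python
import xml.etree.ElementTree as ET

# The two <cityDetails> fragments wrapped in an assumed <tourGuide> root
DOC = """<tourGuide>
  <cityDetails>
    <cityName>Belmopan</cityName>
    <hotelDetails><hotelName>Bull Frog Inn</hotelName></hotelDetails>
    <hotelDetails><hotelName>Pook's Hill Lodge</hotelName></hotelDetails>
  </cityDetails>
  <cityDetails>
    <cityName>Kuala Lumpur</cityName>
    <hotelDetails><hotelName>Pan Pacific Kuala Lumpur</hotelName></hotelDetails>
    <hotelDetails><hotelName>Mandarin Oriental Kuala Lumpur</hotelName></hotelDetails>
  </cityDetails>
</tourGuide>"""

root = ET.fromstring(DOC)

# The "many" side is found simply by looking inside the "one" side
hotels_by_city = {
    city.findtext("cityName"): [hotel.findtext("hotelName")
                                for hotel in city.findall("hotelDetails")]
    for city in root.findall("cityDetails")
}
print(hotels_by_city)

# Removing the first container removes the hotels it contains
root.remove(root.find("cityDetails"))
remaining_hotels = [name.text for name in root.iter("hotelName")]
print(remaining_hotels)
```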
Intra-document relationships: In a case where you have one city with many hotels, rather than a city containing hotels, a hotel will have a "location in" relationship to a city. A city id is used as a reference on the hotel element. Therefore, rather than the hotels being contained in the city, they now just reference the city's id via the cityRef attribute. This is very similar to a foreign key in a relational database.
<cityDetails>
<city ID="c1">
<cityName>Belmopan</cityName>
</city>
<city ID="c2">
<cityName>Kuala Lumpur</cityName>
</city>
</cityDetails>
<hotelDetails>
<hotel cityRef="c1">
<hotelName>Bull Frog Inn</hotelName>
</hotel>
<hotel cityRef="c2">
<hotelName>Pan Pacific Kuala Lumpur</hotelName>
</hotel>
</hotelDetails>
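The foreign-key analogy can be sketched in Python as a lookup followed by a join. The tourGuide wrapper element is an assumption that places both lists in one well-formed document, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed <tourGuide> wrapper so both lists live in one well-formed document
DOC = """<tourGuide>
  <cityDetails>
    <city ID="c1"><cityName>Belmopan</cityName></city>
    <city ID="c2"><cityName>Kuala Lumpur</cityName></city>
  </cityDetails>
  <hotelDetails>
    <hotel cityRef="c1"><hotelName>Bull Frog Inn</hotelName></hotel>
    <hotel cityRef="c2"><hotelName>Pan Pacific Kuala Lumpur</hotelName></hotel>
  </hotelDetails>
</tourGuide>"""

root = ET.fromstring(DOC)

# Index the cities by ID, much like a primary-key lookup
city_by_id = {city.get("ID"): city.findtext("cityName")
              for city in root.iter("city")}

# Follow each hotel's cityRef, much like a foreign-key join
hotel_locations = {
    hotel.findtext("hotelName"): city_by_id[hotel.get("cityRef")]
    for hotel in root.iter("hotel")
}
print(hotel_locations)
```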
Inter-document relationships: The inter-document relationship is much like the intra-document relationship. A hotel still refers to its city by the city's id, but the reference is written as a link to another document (for example, href="cityDetails.xml#c1") rather than as a reference to an element in the same document. The inter-document relationship is used when the related data, such as the city and hotel data, live in different documents, possibly on different file systems.
<city id="c1">
<cityName>Belmopan</cityName>
</city>
<city id="c2">
<cityName>Kuala Lumpur</cityName>
</city>
<hotel>
<city href="cityDetails.xml#c1"/>
<hotelName>Bull Frog Inn</hotelName>
</hotel>
<hotel>
<city href="cityDetails.xml#c2"/>
<hotelName>Pan Pacific Kuala Lumpur</hotelName>
</hotel>
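Resolving a cross-document link means splitting the href into a file name and a fragment, loading that file, and finding the element with the matching id. The Python sketch below illustrates this with a dictionary standing in for files on disk; the hotelDetails.xml name, the cities and hotels wrapper elements, and the `resolve` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Stand-in for files on disk; the wrapper element names are assumptions
DOCUMENTS = {
    "cityDetails.xml": """<cities>
        <city id="c1"><cityName>Belmopan</cityName></city>
        <city id="c2"><cityName>Kuala Lumpur</cityName></city>
    </cities>""",
    "hotelDetails.xml": """<hotels>
        <hotel><city href="cityDetails.xml#c1"/><hotelName>Bull Frog Inn</hotelName></hotel>
        <hotel><city href="cityDetails.xml#c2"/><hotelName>Pan Pacific Kuala Lumpur</hotelName></hotel>
    </hotels>""",
}


def resolve(href):
    """Split 'file#id' and return the matching <city> element, or None."""
    file_name, fragment = urldefrag(href)
    target = ET.fromstring(DOCUMENTS[file_name])
    for city in target.iter("city"):
        if city.get("id") == fragment:
            return city
    return None


hotels = ET.fromstring(DOCUMENTS["hotelDetails.xml"])
hotel_locations = {
    hotel.findtext("hotelName"):
        resolve(hotel.find("city").get("href")).findtext("cityName")
    for hotel in hotels.iter("hotel")
}
print(hotel_locations)
```

The extra loading step is one reason the checklist rates this technique lower for ease of use, while keeping the documents separate is what makes it flexible.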
Exhibit 2:Checklist for deciding what technique to use:
Technique | Passing Data | Flexibility | Ease of Use |
---|---|---|---|
Containment | Excellent | Fair | Excellent |
Intra-Document | Good | Good | Good |
Inter-Document | Fair | Excellent | Fair |
XML schema
Some of the built-in data types for an XML schema were introduced in the previous chapter, but there are more that are very useful, such as anyURI, date, time, year (gYear), and month (gYearMonth). In addition to the built-in data types, a custom data type can be defined by the schema designer to accept specific data input. As we have learned, data are defined in XML documents using markup tags defined in an XML schema. However, some elements might not have values. An empty element tag can be used to address this situation. An empty element tag (and any custom markup tag) can contain attributes that add additional information about the tag without adding extra text to the element. An example later in the chapter uses attributes in an empty element tag.
Empty elements with attributes in XML document
Elements can have different content types depending on how each element is defined in the XML schema. The different types are element content, mixed content, simple content, and empty content. An XML element consists of everything from the start of the element's opening tag to the end of its closing tag.
- An element with element content, such as the root element, is one in which everything between the opening and closing tags consists of elements only.
Example:
<tourGuide>
:
</tourGuide>
- A mixed content element is one that has text as well as other elements between its opening and closing tags.
Example:
<restaurant>My favorite restaurant is
<restaurantName>Provino's Italian Restaurant</restaurantName>
:
</restaurant>
- A simple content element is one that contains only text between its opening and closing tags.
Example:
<restaurantName>Provino's Italian Restaurant</restaurantName>
- An empty content element, also called an empty element, is one that does not contain anything between its opening and closing tags (or that is opened and closed with a single tag, by placing / before the closing > of the opening tag).
Example:
<hotelPicture filename="pan_pacific.jpg" size="80"
value="Image of Pan Pacific"/>
An empty element is useful when there is no need to specify its content or when the information describing the element is fixed. Two examples illustrate this concept. First, a picture element references the source of an image with its attributes and has no need for text content. Second, when the owner's name is fixed for a company, the related information can be specified inside the owner tag using attributes. An attribute is meta-information: information that describes the content of the element.
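The four content types can be told apart by looking at what an element instance actually contains. The Python sketch below does this with a hypothetical `content_type` function. It inspects instances only, whereas in practice the allowed content type is fixed by the schema, so treat it as an illustration of the definitions rather than as validation.

```python
import xml.etree.ElementTree as ET


def content_type(element):
    """Classify an element as 'empty', 'simple', 'element' or 'mixed' content."""
    has_children = len(element) > 0
    # Text can sit directly inside the element or trail after any child element
    chunks = [element.text] + [child.tail for child in element]
    has_text = any(chunk and chunk.strip() for chunk in chunks)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


samples = {
    "element": "<tourGuide><city/></tourGuide>",
    "mixed": "<restaurant>My favorite restaurant is "
             "<restaurantName>Provino's Italian Restaurant</restaurantName></restaurant>",
    "simple": "<restaurantName>Provino's Italian Restaurant</restaurantName>",
    "empty": '<hotelPicture filename="pan_pacific.jpg" size="80" '
             'value="Image of Pan Pacific"/>',
}
results = {name: content_type(ET.fromstring(xml)) for name, xml in samples.items()}
print(results)
```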
European Central Bank's use of XML
<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01"
xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
<gesmes:subject>Reference rates</gesmes:subject>
<gesmes:Sender>
<gesmes:name>European Central Bank</gesmes:name>
</gesmes:Sender>
<Cube>
<Cube time="2004-05-28">
<Cube currency="USD" rate="1.2246"/>
<Cube currency="JPY" rate="135.77"/>
<Cube currency="DKK" rate="7.4380"/>
<Cube currency="GBP" rate="0.66730"/>
<Cube currency="SEK" rate="9.1150"/>
<Cube currency="CHF" rate="1.5304"/>
<Cube currency="ISK" rate="87.72"/>
<Cube currency="NOK" rate="8.2120"/>
</Cube>
</Cube>
<!--For the sake of illustration, some of the currencies are omitted
in the preceding code.Banks, consultants, currency traders,
and firms involved in international trade are the major users
of this information.-->
</gesmes:Envelope>
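A consumer of this feed mostly needs the currency and rate attributes of the innermost empty Cube elements. The Python sketch below extracts them from a shortened copy of the document. The `ecb` prefix is a local name chosen for this example; it maps to the default namespace declared in the feed, which applies to the unprefixed Cube elements.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Prefix chosen locally; it maps to the feed's default namespace
NS = {"ecb": "http://www.ecb.int/vocabulary/2002-08-01/eurofxref"}

# Shortened copy of the feed shown above
FEED = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01"
    xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
  <gesmes:subject>Reference rates</gesmes:subject>
  <Cube>
    <Cube time="2004-05-28">
      <Cube currency="USD" rate="1.2246"/>
      <Cube currency="JPY" rate="135.77"/>
      <Cube currency="GBP" rate="0.66730"/>
    </Cube>
  </Cube>
</gesmes:Envelope>"""

root = ET.fromstring(FEED)

# Outer Cube -> dated Cube -> one empty Cube per currency
day = root.find("ecb:Cube/ecb:Cube", NS)
rates = {cube.get("currency"): Decimal(cube.get("rate"))
         for cube in day.findall("ecb:Cube", NS)}

print(day.get("time"))
print(rates["USD"])
```

Decimal is used instead of float so that a rate such as 0.66730 keeps the digits published in the feed.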
XML schema data types
Some of the commonly used data types, such as string, decimal, integer, and boolean, are introduced in chapter 2. The following are a few more data types that are useful.
Exhibit 3: Other data types
Type | Format | Example | Comment |
---|---|---|---|
year | YYYY | 1999 | Declared as xsd:gYear in XML Schema |
month | YYYY-MM | 1999-03 | Declared as xsd:gYearMonth in XML Schema. Used when the day is irrelevant for the data element |
time | hh:mm:ss.sss with optional time zone indicator | 20:14:05 | The time zone indicator is Z for UTC, or -hh:mm or +hh:mm to give the difference from UTC. Used when the content represents a time of day that recurs every day, such as 4:15 pm |
date | YYYY-MM-DD | 1999-03-14 | |
anyURI | A URI reference, typically an absolute URL beginning with http:// | http://www.panpacific.com | |
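The data types in the exhibit are used in a schema like any other built-in type. The following is a minimal sketch; the element names (hotelFacts, yearBuilt, and so on) are hypothetical and chosen only for illustration, while the xsd: type names are those defined by XML Schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <!-- Hypothetical element names; only the xsd: type names come from XML Schema -->
  <xsd:element name="hotelFacts">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="yearBuilt" type="xsd:gYear"/>           <!-- e.g. 1999 -->
        <xsd:element name="lastRenovated" type="xsd:gYearMonth"/>  <!-- e.g. 1999-03 -->
        <xsd:element name="checkInTime" type="xsd:time"/>          <!-- e.g. 14:00:00 -->
        <xsd:element name="inspectionDate" type="xsd:date"/>       <!-- e.g. 1999-03-14 -->
        <xsd:element name="websiteURL" type="xsd:anyURI"/>         <!-- e.g. http://www.panpacific.com -->
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A document validated against this sketch would be rejected if, for example, yearBuilt contained 99 or checkInTime contained 2 pm, because neither value matches the format of its type.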
More data types
Besides the built-in data types, custom data types can be created as required. A custom data type can be a simple type or a complex type. For simplicity, we create a custom data type that is a simple type, which means that the element does not contain other elements or attributes; it contains text only. A custom simple type is created by starting from a built-in simple type and applying restrictions, or facets, that limit the acceptable values of the element. A custom simple type can be nameless or named. If the custom simple type is to be used only once, it makes sense not to name it; that custom type can then be used only where it is defined. A named custom type can be referenced by its name, so it can be used wherever necessary.
A pattern can be used to specify exactly how the content of the element should look. For example, one might want to specify the format of a telephone number, a postal code, or a product code. By having a defined pattern for certain elements, the data exchanged will be uniform and the values will be consistent when stored in a database. Patterns are written as regular expressions (regex), which are discussed in later chapters.
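To make the difference between named and nameless custom simple types concrete, here is a small sketch. The names productCodeType, product, and productCode, and the two patterns themselves, are assumptions made for this illustration and are not part of the tour guide example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <!-- Named custom simple type: can be referenced by any element -->
  <xsd:simpleType name="productCodeType">
    <xsd:restriction base="xsd:string">
      <!-- Two upper-case letters, a hyphen, then four digits, e.g. AB-1234 -->
      <xsd:pattern value="[A-Z]{2}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="product">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="productCode" type="productCodeType"/>
        <!-- Nameless custom simple type: usable only by this element -->
        <xsd:element name="postalCode">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <!-- Exactly five digits, e.g. 50746 -->
              <xsd:pattern value="\d{5}"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A pattern in XML Schema must match the whole value, so AB-1234 is accepted by productCodeType while AB-12345 is not. A literal dot in a pattern has to be written as \. because an unescaped dot matches any character.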
Schema examples
The following is a schema that extends the schema introduced in the previous chapter to include a one-to-many relationship of city to hotels with two examples of custom data types.
Exhibit 1: Data model for 1:m relationship
Important: this is a continuing example, so the new code is added to the previous chapter's example.
Containment example
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<!--Tour Guide-->
<xsd:element name="tourGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!--This will contain the City details-->
<xsd:complexType name="cityDetails">
<xsd:sequence>
<xsd:element name="cityName" type="xsd:string"/>
<xsd:element name="adminUnit" type="xsd:string"/>
<xsd:element name="country" type="xsd:string"/>
<!--The element Continent uses a Nameless Custom Simple Type-->
<xsd:element name="continent">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Asia"/>
<xsd:enumeration value="Africa"/>
<xsd:enumeration value="Australia"/>
<xsd:enumeration value="Europe"/>
<xsd:enumeration value="North America"/>
<xsd:enumeration value="South America"/>
<xsd:enumeration value="Antarctica"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="population" type="xsd:integer"/>
<xsd:element name="area" type="xsd:integer"/>
<xsd:element name="elevation" type="xsd:integer"/>
<xsd:element name="longitude" type="xsd:decimal"/>
<xsd:element name="latitude" type="xsd:decimal"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="history" type="xsd:string"/>
<xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<!-- This will contain the Hotel details-->
<xsd:complexType name="hotelDetails">
<xsd:sequence>
<xsd:element name="hotelName" type="xsd:string"/>
<xsd:element name="hotelPicture"/>
<xsd:element name="streetAddress" type="xsd:string"/>
<xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
<xsd:element name="phone" type="xsd:string"/>
<xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
<!-- The custom simple type, emailAddressType, defined as a named xsd:simpleType at the
end of this schema, is used as the type of the emailAddress element. -->
<xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
<xsd:element name="hotelRating" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
<!-- NOTE: postalCode, emailAddress, and websiteURL do not have to be provided;
minOccurs="0" indicates that they are optional -->
<!--This is a named custom simple type that is referenced by the emailAddress element in
hotelDetails; every email address must match its pattern-->
<xsd:simpleType name="emailAddressType">
<xsd:restriction base="xsd:string">
<!--You can learn more about this pattern by reading the Regex section.-->
<xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
Intra-document example
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<!--Tour Guide-->
<xsd:element name="tourGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
<!--Hotels are listed after the cities; each hotel points to its city through cityRef-->
<xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!--This will contain the City details-->
<xsd:complexType name="cityDetails">
<xsd:sequence>
<xsd:element name="cityID" type="xsd:ID"/>
<xsd:element name="cityName" type="xsd:string"/>
<xsd:element name="adminUnit" type="xsd:string"/>
<xsd:element name="country" type="xsd:string"/>
<!--The element Continent uses a Nameless Custom Simple Type-->
<xsd:element name="continent">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Asia"/>
<xsd:enumeration value="Africa"/>
<xsd:enumeration value="Australia"/>
<xsd:enumeration value="Europe"/>
<xsd:enumeration value="North America"/>
<xsd:enumeration value="South America"/>
<xsd:enumeration value="Antarctica"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="population" type="xsd:integer"/>
<xsd:element name="area" type="xsd:integer"/>
<xsd:element name="elevation" type="xsd:integer"/>
<xsd:element name="longitude" type="xsd:decimal"/>
<xsd:element name="latitude" type="xsd:decimal"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="history" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<!-- This will contain the Hotel details-->
<xsd:complexType name="hotelDetails">
<xsd:sequence>
<xsd:element name="cityRef" type="xsd:IDREF"/>
<xsd:element name="hotelName" type="xsd:string"/>
<xsd:element name="hotelPicture"/>
<xsd:element name="streetAddress" type="xsd:string"/>
<xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
<xsd:element name="phone" type="xsd:string"/>
<xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
<!-- The custom simple type, emailAddressType, defined as a named xsd:simpleType at the
end of this schema, is used as the type of the emailAddress element. -->
<xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
<xsd:element name="hotelRating" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
<!-- NOTE: postalCode, emailAddress, and websiteURL do not have to be provided;
minOccurs="0" indicates that they are optional -->
<!--This is a named custom simple type that is referenced by the emailAddress element in
hotelDetails; every email address must match its pattern-->
<xsd:simpleType name="emailAddressType">
<xsd:restriction base="xsd:string">
<!--You can learn more about this pattern by reading the Regex section.-->
<xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
Inter-document example
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<!--Tour Guide-->
<xsd:element name="tourGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!--This will contain the City details-->
<xsd:complexType name="cityDetails">
<xsd:sequence>
<xsd:element name="cityID" type="xsd:ID"/>
<xsd:element name="cityName" type="xsd:string"/>
<xsd:element name="adminUnit" type="xsd:string"/>
<xsd:element name="country" type="xsd:string"/>
<!--The element Continent uses a Nameless Custom Simple Type-->
<xsd:element name="continent">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Asia"/>
<xsd:enumeration value="Africa"/>
<xsd:enumeration value="Australia"/>
<xsd:enumeration value="Europe"/>
<xsd:enumeration value="North America"/>
<xsd:enumeration value="South America"/>
<xsd:enumeration value="Antarctica"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="population" type="xsd:integer"/>
<xsd:element name="area" type="xsd:integer"/>
<xsd:element name="elevation" type="xsd:integer"/>
<xsd:element name="longitude" type="xsd:decimal"/>
<xsd:element name="latitude" type="xsd:decimal"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="history" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<!-- Second schema document: this will contain the Hotel details-->
<!--Tour Guide 2-->
<xsd:element name="tourGuide2">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
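</xsd:complexType>
</xsd:element>
<xsd:complexType name="hotelDetails">
<xsd:sequence>
<!-- xsd:IDREF is only checked against IDs in the same document, so this cross-document
reference to a cityID is declared with xsd:NCName, the base type of ID and IDREF -->
<xsd:element name="cityRef" type="xsd:NCName"/>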
<xsd:element name="hotelName" type="xsd:string"/>
<xsd:element name="hotelPicture"/>
<xsd:element name="streetAddress" type="xsd:string"/>
<xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
<xsd:element name="phone" type="xsd:string"/>
<xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
<!-- The custom simple type, emailAddressType, defined as a named xsd:simpleType at the
end of this schema, is used as the type of the emailAddress element. -->
<xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
<xsd:element name="hotelRating" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
<!-- NOTE: postalCode, emailAddress, and websiteURL do not have to be provided;
minOccurs="0" indicates that they are optional -->
<!--This is a named custom simple type that is referenced by the emailAddress element in
hotelDetails; every email address must match its pattern-->
<xsd:simpleType name="emailAddressType">
<xsd:restriction base="xsd:string">
<!--You can learn more about this pattern by reading the Regex section.-->
<xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
Refer to Chapter 2 - A single entity for the steps to create the above XML schema in NetBeans.
XML document
Attributes
- The naming rules for elements apply to attribute names as well
- Within a given element, all attribute names must be unique
- An attribute value may not contain the symbol '<'; the character string '&lt;' can be used to represent it
- Each attribute must have a name and a value (e.g. in <hotelPicture filename="pan_pacific.jpg" />, filename is the name and pan_pacific.jpg is the value)
- If the assigned value itself contains a quoted string, the quotation marks must differ from those used to enclose the entire value (for instance, if double quotes enclose the whole value, use single quotes for the string: <name familiar="'Jack'">John Smith</name>)
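The attribute rules listed above can be seen together in one short, well-formed fragment. This is only a sketch: the roomRate element and its condition and currency attributes are hypothetical and do not belong to the tour guide schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical element and attribute names, used only to illustrate the rules -->
<hotel>
  <!-- Empty element: all of its information is carried by uniquely named attributes -->
  <hotelPicture filename="pan_pacific.jpg" size="80"/>
  <!-- The less-than sign inside an attribute value is written as an entity reference -->
  <roomRate condition="nights &lt; 3" currency="USD">120</roomRate>
  <!-- Double quotes enclose the value, so the inner quotation uses single quotes -->
  <name familiar="'Jack'">John Smith</name>
</hotel>
```

A parser would report an error if hotelPicture carried two filename attributes or if the condition value contained a literal less-than sign.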
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="city_hotel.xsl"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="TourGuide3.xsd">
<!--This is where you define the first city and all its attributes-->
<city>
<cityName>Belmopan</cityName>
<adminUnit>Cayo</adminUnit>
<country>Belize</country>
<!--The content of the element "continent" must be one of the values specified in the set of
acceptable values in the XML schema for the element "continent"-->
<continent>North America</continent>
<population>11100</population>
<area>5</area>
<elevation>130</elevation>
<longitude>-88.77</longitude>
<latitude>17.25</latitude>
<description>Belmopan is the capital of Belize</description>
<history>Belmopan was established following the devastation of the former capital, Belize City,
by Hurricane Hattie in 1961. High ground and open space influenced the choice and
ground-breaking began in 1966. By 1970 most government offices and operations had
already moved to the new location. </history>
<!--This is where you would store the name of the Hotel and its attributes-->
<!--Notice that the hotel elements below do not contain the postalCode element. The document is
still valid, because postalCode is optional-->
<hotel>
<hotelName>Bull Frog Inn</hotelName>
<!--The empty element, hotelPicture, contains the attributes "filename", "size", "value", and
"imageURL" to indicate the name of the image file, the desired size, a description of
the image, and a related web address-->
<hotelPicture filename="bull_frog_inn.jpg" size="80" value="Image of Bull Frog Inn"
imageURL="http://www.bullfroginn.com"/>
<streetAddress>25 Half Moon Avenue</streetAddress>
<phone>501-822-3425</phone>
<!--The emailAddress elements must match the pattern specified in the schema to be valid -->
<emailAddress>bullfrog@btl.net</emailAddress>
<websiteURL>http://www.bullfroginn.com/</websiteURL>
<hotelRating>4</hotelRating>
</hotel>
<!--This is where you put the information for another Hotel-->
<hotel>
<hotelName>Pook's Hill Lodge</hotelName>
<hotelPicture filename="pook_hill_lodge.jpg" size="80" value="Image of Pook's Hill Lodge"
imageURL="http://www.global-travel.co.uk/pook1.htm"/>
<streetAddress>Roaring River</streetAddress>
<phone>440-126-854-1732</phone>
<emailAddress>info@global-travel.co.uk</emailAddress>
<websiteURL>http://www.global-travel.co.uk/pook1.htm</websiteURL>
<hotelRating>3</hotelRating>
</hotel>
</city>
<!--This is where you define another city and its attributes-->
<city>
<cityName>Kuala Lumpur</cityName>
<adminUnit>Selangor</adminUnit>
<country>Malaysia</country>
<continent>Asia</continent>
<population>1448600</population>
<area>243</area>
<elevation>111</elevation>
<longitude>101.71</longitude>
<latitude>3.16</latitude>
<description>Kuala Lumpur is the capital of Malaysia and is the largest city in the nation.
</description>
<history>The city was founded in 1857 by Chinese tin miners and superseded Klang. In 1880
the British government transferred their headquarters from Klang to Kuala Lumpur, and
in 1896 it became the capital of the Federated Malay States. </history>
<!--This is where you put the information for a Hotel-->
<hotel>
<hotelName>Pan Pacific Kuala Lumpur </hotelName>
<hotelPicture filename="pan_pacific.jpg" size="80" value="Image of Pan Pacific"
imageURL="http://www.malaysia-hotels-discount.com/hotels/kualalumpur/pan_pacific_hotel/index.shtml"/>
<streetAddress>Jalan Putra</streetAddress>
<postalCode>50746</postalCode>
<phone>1-866-260-0402</phone>
<emailAddress>president@panpacific.com</emailAddress>
<websiteURL>http://www.panpacific.com</websiteURL>
<hotelRating>5</hotelRating>
</hotel>
<!--This is where you put the information for another Hotel-->
<hotel>
<hotelName>Mandarin Oriental Kuala Lumpur </hotelName>
<hotelPicture filename="mandarin_oriental.jpg" size="80" value="Image of Mandarin Oriental"
imageURL="http://www.mandarinoriental.com/kualalumpur"/>
<streetAddress>Kuala Lumpur City Centre</streetAddress>
<postalCode>50088</postalCode>
<phone>011-603-2380-8888</phone>
<emailAddress>mokul-sales@mohg.com</emailAddress>
<websiteURL>http://www.mandarinoriental.com/kualalumpur/</websiteURL>
<hotelRating>5</hotelRating>
</hotel>
</city>
</tourGuide>
Table 3-2: XML Document for a one-to-many relationship – city_hotel.xml
Refer to Chapter 2 - A single entity for the steps to create the above XML document in NetBeans.
XML style sheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"/>
<xsl:template match="/">
<html>
<head>
<title>Tour Guide</title>
</head>
<body>
<h2>Cities</h2>
<xsl:apply-templates select="tourGuide"/>
</body>
</html>
</xsl:template>
<xsl:template match="tourGuide">
<xsl:for-each select="city">
<xsl:text>City: </xsl:text>
<xsl:value-of select="cityName"/>
<br/>
<xsl:text>Population: </xsl:text>
<xsl:value-of select="population"/>
<br/>
<xsl:text>Country: </xsl:text>
<xsl:value-of select="country"/>
<br/>
<xsl:for-each select="hotel">
<xsl:text>Hotel: </xsl:text>
<xsl:value-of select="hotelName"/>
<br/>
</xsl:for-each>
<br/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
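The stylesheet above displays only text. As a hedged sketch of how the attributes of the empty hotelPicture element could also be used, the following variant outputs an HTML image tag for each hotel. It assumes the attribute names filename, size, and value from the XML document above, and assumes that the image files sit in the same folder as the generated page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Tour Guide</title>
      </head>
      <body>
        <xsl:apply-templates select="tourGuide/city/hotel"/>
      </body>
    </html>
  </xsl:template>
  <!-- One block per hotel: the name followed by its picture -->
  <xsl:template match="hotel">
    <p>
      <xsl:value-of select="hotelName"/>
      <br/>
      <!-- Curly braces are attribute value templates: the XPath inside is evaluated -->
      <img src="{hotelPicture/@filename}" width="{hotelPicture/@size}" alt="{hotelPicture/@value}"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

Attribute value templates are used here because xsl:value-of cannot be placed inside an attribute of a literal result element.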
Summary
Besides the simple built-in data types (e.g., year, month, time, anyURI, and date), schema designers may create custom data types to suit their needs. A custom simple type is created from one of the built-in data types by applying restrictions, or facets, such as an enumeration that specifies a set of acceptable values or a pattern that the value must match.
An empty element does not contain any text; however, it may contain attributes that provide additional information about that element. The presentation layout for displaying an HTML page can include code for style tags, background color, font size, font weight, and alignment. Table tags can be used to organize the layout of content in an HTML page, and images can be displayed using an image tag.
Exercises
In order to learn more about the one-to-many relationship, exercises are provided.
Answers
In order to learn more about the one-to-many relationship, answers are provided to go with the exercises above.
The one-to-one relationship
Introduction
In the previous chapter, some new features of XML schemas, documents, and stylesheets were introduced as well as how to model a one-to-many relationship. In this chapter, we will introduce the modeling of a one-to-one relationship in XML. We will also introduce more features of an XML schema.
A one-to-one (1:1) relationship
The following diagram shows a one-to-one and a one-to-many relationship. The one-to-one relationship records a single top destination for each country.
Exhibit 4-1: Data model for a 1:1 relationship
XML schema
A one-to-one (1:1) relationship is represented in the data model in Exhibit 4-1. The addition of country and destination to the data model makes it possible to model the 1:1 relationship named topDestination. A country has many different destinations, but only one top destination. The XML schema in Exhibit 4-2 shows how to represent a 1:1 relationship in an XML schema.
XML schema example
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<!--
Tour Guide
-->
<xsd:element name="tourGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="country" type="countryDetails" minOccurs="1" maxOccurs="unbounded" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!--
Country
-->
<xsd:complexType name="countryDetails">
<xsd:sequence>
<xsd:element name="countryName" type="xsd:string" minOccurs="1" maxOccurs="1"/>
<xsd:element name="population" type="xsd:integer" minOccurs="0" maxOccurs="1" default="0"/>
<xsd:element name="continent" minOccurs="0" maxOccurs="1">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Asia"/>
<xsd:enumeration value="Africa"/>
<xsd:enumeration value="Australasia"/>
<xsd:enumeration value="Europe"/>
<xsd:enumeration value="North America"/>
<xsd:enumeration value="South America"/>
<xsd:enumeration value="Antarctica"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="topDestination" type="destinationDetails" minOccurs="0" maxOccurs="1"/>
<xsd:element name="destination" type="destinationDetails" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<!--
Destination
-->
<xsd:complexType name="destinationDetails">
<xsd:all>
<xsd:element name="destinationName" type="xsd:string"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="streetAddress" type="xsd:string" minOccurs="0"/>
<xsd:element name="telephoneNumber" type="xsd:string" minOccurs="0"/>
<xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
</xsd:all>
</xsd:complexType>
</xsd:schema>
Exhibit 4-2: XML Schema for a one-to-one relationship
New elements in schema
Let's examine the new elements and attributes in the schema in Exhibit 4-2.
- country is an element of the complex type countryDetails, declared in tourGuide; a tour guide can describe one or more countries.
- destination is an element of the complex type destinationDetails, declared in countryDetails to represent the 1:M relationship between a country and its many destinations.
- topDestination is an element of the same complex type, declared in countryDetails with maxOccurs="1" to represent the 1:1 relationship between a country and its top destination.
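The schema in Exhibit 4-2 also gives the population element a default value. The sketch below isolates that attribute and contrasts it with fixed. The areaUnit element and its value km2 are hypothetical additions made only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <xsd:element name="country">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="countryName" type="xsd:string"/>
        <!-- default: an empty population element is treated as if it contained 0 -->
        <xsd:element name="population" type="xsd:integer" minOccurs="0" default="0"/>
        <!-- fixed (hypothetical element): if it has content, the content must be km2 -->
        <xsd:element name="areaUnit" type="xsd:string" minOccurs="0" fixed="km2"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

For elements, a default or fixed value is supplied only when the element is present but empty. If the element is left out of the document altogether, no value is supplied.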
Restrictions in schema
Placing restrictions on elements was introduced in the previous chapter; however, other useful restrictions can also be placed on an element. For example, a restriction can control how the processor handles whitespace characters in the content of an element or attribute:
<xsd:element name="streetAddress">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:whiteSpace value="preserve"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
White space & length constraints
The whiteSpace constraint is set to "preserve", which means that the XML processor will not remove any white space characters. Other useful restrictions include the following:
- Replace – the XML processor will replace every tab, line feed, and carriage return with a space.
- <xsd:whiteSpace value="replace"/>
- Collapse – the processor will first apply the replace rule, then remove leading and trailing spaces and reduce each run of consecutive spaces to a single space.
- <xsd:whiteSpace value="collapse"/>
- Length, maxLength, minLength – the length of the element content can be fixed or can be limited to a predefined range.
- <xsd:length value="8"/>
- <xsd:minLength value="5"/>
- <xsd:maxLength value="8"/>
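The restrictions listed above can be combined in a single custom simple type. The following sketch assumes a hypothetical booking reference of 5 to 8 characters; the names bookingRefType and bookingRef are not part of the tour guide example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <!-- Hypothetical type: a booking reference of 5 to 8 characters -->
  <xsd:simpleType name="bookingRefType">
    <xsd:restriction base="xsd:string">
      <!-- Whitespace is normalized first; the length facets are checked afterwards -->
      <xsd:whiteSpace value="collapse"/>
      <xsd:minLength value="5"/>
      <xsd:maxLength value="8"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="bookingRef" type="bookingRefType"/>
</xsd:schema>
```

With this type, content written as two spaces, AB 1234, and two more spaces is read as the seven-character value AB 1234 and is valid, whereas AB12 is too short. Note that length is not combined with minLength or maxLength in the same restriction, since a fixed length and a range would contradict each other.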
Order indicators
In addition to placing restrictions on elements, order indicators can be used to define the order in which elements should occur.
All indicator
The <all> indicator specifies by default that the child elements can appear in any order and that each child element must occur once and only once:
<xsd:element name="person">
<xsd:complexType>
<xsd:all>
<xsd:element name="firstname" type="xsd:string"/>
<xsd:element name="lastname" type="xsd:string"/>
</xsd:all>
</xsd:complexType>
</xsd:element>
Choice indicator
The <choice> indicator specifies that either one child element or another can occur:
<xsd:element name="person">
<xsd:complexType>
<xsd:choice>
<xsd:element name="employee" type="employee"/>
<xsd:element name="visitor" type="visitor"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
Sequence indicator
The <sequence> indicator specifies that the child elements must appear in a specific order:
<xsd:element name="person">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="firstname" type="xsd:string"/>
<xsd:element name="lastname" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
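The effect of the order indicators is easiest to see on instance data. The fragment below is a sketch: the people wrapper element is hypothetical and only keeps the three person instances in one well-formed document, and each comment states which of the person declarations above would accept the instance that follows it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper element that groups three person instances -->
<people>
  <!-- Accepted by both the all and the sequence declaration -->
  <person>
    <firstname>John</firstname>
    <lastname>Smith</lastname>
  </person>
  <!-- Accepted by all (any order) but rejected by sequence (firstname must come first) -->
  <person>
    <lastname>Smith</lastname>
    <firstname>John</firstname>
  </person>
  <!-- Rejected by both: lastname is missing and neither declaration makes it optional -->
  <person>
    <firstname>John</firstname>
  </person>
</people>
```

Under the choice declaration, a person would instead contain exactly one child, either employee or visitor, but not both.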
XML document
The XML document in Exhibit 4-3 shows how the new elements (country and destination) defined in the XML schema found in Exhibit 4-2 are used in an XML document. Note that the child elements of <topDestination> can appear in any order because of the <xsd:all> order indicator used in the schema.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="newXMLSchema.xsl" media="screen"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="XMLSchema.xsd">
<!--
Malaysia
-->
<country>
<countryName>Malaysia</countryName>
<population>22229040</population>
<continent>Asia</continent>
<topDestination>
<description>A popular duty-free island north of Penang.</description>
<destinationName>Pulau Langkawi</destinationName>
</topDestination>
<destination>
<destinationName>Muzium Di-Raja</destinationName>
<description>The original palace of the Sultan</description>
<streetAddress>122 Muzium Road</streetAddress>
<telephoneNumber>48494030</telephoneNumber>
<websiteURL>www.muziumdiraja.com</websiteURL>
</destination>
<destination>
<destinationName>Kinabalu National Park</destinationName>
<description>A national park</description>
<streetAddress>54 Ocean View Drive</streetAddress>
<telephoneNumber>4847101</telephoneNumber>
<websiteURL>www.kinabalu.com</websiteURL>
</destination>
</country>
<!--
Belize
-->
<country>
<countryName>Belize</countryName>
<population>249183</population>
<continent>North America</continent>
<topDestination>
<destinationName>San Pedro</destinationName>
<description>San Pedro is an island off the coast of Belize</description>
</topDestination>
<destination>
<destinationName>Belize City</destinationName>
<description>Belize City is the former capital of Belize</description>
<websiteURL>www.belizecity.com</websiteURL>
</destination>
<destination>
<destinationName>Xunantunich</destinationName>
<description>Mayan ruins</description>
<streetAddress>4 High Street</streetAddress>
<telephoneNumber>011770801</telephoneNumber>
</destination>
</country>
</tourGuide>
Exhibit 4-3: XML Document for a one-to-one relationship
Summary
Schema designers may place restrictions on the length of elements and on how the processor handles white space. Schema designers may also specify fixed or default values for an element. Order indicators can be used to specify the order in which elements must appear in an XML document.
The many-to-many relationship
Introduction
In the previous chapters, you learned how to use XML to structure and format data based on one-to-one and one-to-many relationships. Because XML provides the means to model data using hierarchical parent-child relationships, the one-to-one and one-to-many relationships are relatively simple to represent in XML. However, this hierarchical parent-child structure is difficult to use to model the many-to-many relationship, a common relationship between entities in many situations.
In this chapter, we will explore the pros and cons of a few methods that are used to model a many-to-many relationship in XML; these methods offer compromises in overcoming the problems that arise when applying this relationship to XML. In particular, we will see examples of how to model the many-to-many relationship using two different methods, "Eliminate" and "ID/IDREF." Additionally, in the XML stylesheet, we will learn how to implement the key function to display the data that was modeled using the "ID/IDREF" method.
Problems: many-to-many relationship
In XML, the parent-child relationship is most commonly used to represent a relationship. This can easily be applied to a one-to-one or one-to-many relationship. A many-to-many relationship is not supported directly by XML; the parent-child relationship will not work because each element may have only a single parent element. There are a couple of possible solutions to get around this.
Solutions: many-to-many relationship
Eliminate
Create XML documents that eliminate the need for a many-to-many relationship
By limiting the extent of the information that is conveyed, you can avoid the need for a many-to-many relationship. Instead of having one XML document encompass all of the information, separate the information so that each document describes only one of the entities that participates in the many-to-many relationship. In our tourGuide example, one way to accomplish this is to create a separate XML document for each hotel. The relationship with amenity then becomes one-to-many. This method is best suited to situations in which the scope of data exchange can be limited to subsets of data. For more broadly scoped data exchange, however, this method may repeat data several times, especially if there are many attributes. To avoid this redundancy, use the ID/IDREF method.
ID/IDREF
Represent the many-to-many relationship using unique identifiers
Although not the most user-friendly way to handle this problem, one way of getting around the many-to-many relationship is to create keys that uniquely identify each entity. To do this, an element or attribute of type ID or IDREF must be declared in the XML schema. To use a data modeling analogy, ID is similar to the primary key, and IDREF is similar to the foreign key.
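As a minimal sketch of the idea, the following schema declares an ID on one entity and IDREF elements on the other. All names here (guide, amenityID, amenityRef, and so on) are assumptions made for illustration and are not necessarily the ones used in the exhibits of this chapter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <xsd:element name="guide">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Each amenity is described once and carries a unique key -->
        <xsd:element name="amenity" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="amenityName" type="xsd:string"/>
            </xsd:sequence>
            <xsd:attribute name="amenityID" type="xsd:ID" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <!-- Each hotel lists the keys of the amenities it offers -->
        <xsd:element name="hotel" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="hotelName" type="xsd:string"/>
              <xsd:element name="amenityRef" type="xsd:IDREF" minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

In a document that follows this sketch, several hotels can contain the same amenityRef value, and one hotel can contain several amenityRef elements, which together give the many-to-many relationship without repeating the amenity data.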
Many-to-many relationship data model
Exhibit 1: Data model for a m:m relationship
The relationship reads: a hotel can have many amenities, and an amenity can exist at many hotels.
As you will notice, two entities were added in order to represent the many-to-many relationship. The middle entity is an associative entity that stores data about the relationship between hotel and amenity. In our Tour Guide example, "Amenity" was added to represent the list of possible amenities that a hotel can offer.
The following examples illustrate methods to represent a many-to-many relationship in XML.
Eliminate: sample solution
In this example, the many-to-many relationship has been converted to a one-to-many relationship.
XML schema
Exhibit 2: XML schema for "Eliminate" method
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Document : amenity1.xsd
Created on : February 4, 2006
Author : Dr. Rick Watson
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<xsd:element name="hotelGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:simpleType name="emailAddressType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="hotelDetails">
<xsd:sequence>
<xsd:element name="hotelPicture"/>
<xsd:element name="hotelName" type="xsd:string"/>
<xsd:element name="streetAddress" type="xsd:string"/>
<xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
<xsd:element name="telephoneNumber" type="xsd:string"/>
<xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
<xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
<xsd:element name="hotelRating" type="xsd:integer" default="0"/>
<xsd:element name="lowerPrice" type="xsd:positiveInteger"/>
<xsd:element name="upperPrice" type="xsd:positiveInteger"/>
<xsd:element name="amenity" type="amenityValue" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="amenityValue">
<xsd:sequence>
<xsd:element name="amenityType" type="xsd:string"/>
<xsd:element name="amenityOpenHour" type="xsd:time"/>
<xsd:element name="amenityCloseHour" type="xsd:time"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
XML document
Exhibit 3: XML document for "Eliminate" method
<?xml version="1.0" encoding="UTF-8"?>
<!--
Document : amenity1.xml
Created on : February 4, 2006
Author : Dr. Rick Watson
-->
<hotelGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="amenity1.xsd">
<hotel>
<hotelPicture/>
<hotelName>Narembeen Hotel</hotelName>
<streetAddress>Churchill Street</streetAddress>
<telephoneNumber>+61 (08) 9064 7272</telephoneNumber>
<emailAddress>narempub@oz.com.au</emailAddress>
<hotelRating>1</hotelRating>
<lowerPrice>50</lowerPrice>
<upperPrice>100</upperPrice>
<amenity>
<amenityType>Restaurant</amenityType>
<amenityOpenHour>06:00:00</amenityOpenHour>
<amenityCloseHour>22:00:00 </amenityCloseHour>
</amenity>
<amenity>
<amenityType>Pool</amenityType>
<amenityOpenHour>06:00:00</amenityOpenHour>
<amenityCloseHour>18:00:00 </amenityCloseHour>
</amenity>
<amenity>
<amenityType>Complimentary Breakfast</amenityType>
<amenityOpenHour>07:00:00</amenityOpenHour>
<amenityCloseHour>10:00:00 </amenityCloseHour>
</amenity>
</hotel>
<hotel>
<hotelPicture/>
<hotelName>Narembeen Caravan Park</hotelName>
<streetAddress>Currall Street</streetAddress>
<telephoneNumber>+61 (08) 9064 7308</telephoneNumber>
<emailAddress>naremcaravan@oz.com.au</emailAddress>
<hotelRating>1</hotelRating>
<lowerPrice>20</lowerPrice>
<upperPrice>30</upperPrice>
<amenity>
<amenityType>Pool</amenityType>
<amenityOpenHour>10:00:00</amenityOpenHour>
<amenityCloseHour>22:00:00 </amenityCloseHour>
</amenity>
</hotel>
</hotelGuide>
ID/IDREF: sample solution
To avoid redundancy, we create a separate element, "amenity," which is declared as a child of the root element alongside "hotel." Remember, the data types ID and IDREF are analogous to the primary key and foreign key, respectively. For every foreign key (IDREF), there must be a matching primary key (ID). Note that ID and IDREF values must be valid XML names: they must begin with a letter or an underscore rather than a digit, which is why the keys in this example are written as "k1" and not simply "1."
The following example illustrates the ID/IDREF approach. Notice that the ID for the amenity pool is defined as "k1," and every hotel with a pool as an amenity references "k1," using IDREF. If the IDREF does not match any ID, then the document will not validate.
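The matching rule can be seen in a deliberately broken document. The following is a minimal sketch, assuming it is validated against the amenity2.xsd schema of Exhibit 4; the hotel details and the key "k9" are hypothetical values chosen for illustration. A validating parser would be expected to reject it, because no amenityID with the value "k9" exists.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical document: all values are placeholders for illustration -->
<hotelGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="amenity2.xsd">
  <hotel>
    <hotelPicture/>
    <hotelName>Example Hotel</hotelName>
    <streetAddress>1 Example Street</streetAddress>
    <telephoneNumber>000 0000</telephoneNumber>
    <hotelRating>1</hotelRating>
    <lowerPrice>50</lowerPrice>
    <upperPrice>100</upperPrice>
    <amenities>
      <!-- Dangling reference: no amenityID in this document has the value k9 -->
      <amenityIDREF>k9</amenityIDREF>
      <amenityOpenHour>06:00:00</amenityOpenHour>
      <amenityCloseHour>18:00:00</amenityCloseHour>
    </amenities>
  </hotel>
  <amenity>
    <!-- The only ID defined in this document -->
    <amenityID>k1</amenityID>
    <amenityType>Pool</amenityType>
  </amenity>
</hotelGuide>
```

Changing the reference to "k1" would make the foreign key match a primary key again.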
XML schema
Exhibit 4: XML schema for "ID/IDREF" method
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Document : amenity2.xsd
Created on : February 4, 2006
Author : Dr. Rick Watson
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<xsd:element name="hotelGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="hotel" type="hotelDetails" minOccurs="1" maxOccurs="unbounded"/>
<xsd:element name="amenity" type="amenityList" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:simpleType name="emailAddressType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="hotelDetails">
<xsd:sequence>
<xsd:element name="hotelPicture"/>
<xsd:element name="hotelName" type="xsd:string"/>
<xsd:element name="streetAddress" type="xsd:string"/>
<xsd:element name="postalCode" type="xsd:string" minOccurs="0"/>
<xsd:element name="telephoneNumber" type="xsd:string"/>
<xsd:element name="emailAddress" type="emailAddressType" minOccurs="0"/>
<xsd:element name="websiteURL" type="xsd:anyURI" minOccurs="0"/>
<xsd:element name="hotelRating" type="xsd:integer" default="0"/>
<xsd:element name="lowerPrice" type="xsd:positiveInteger"/>
<xsd:element name="upperPrice" type="xsd:positiveInteger"/>
<xsd:element name="amenities" type="amenityDesc" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="amenityDesc">
<xsd:sequence>
<xsd:element name="amenityIDREF" type="xsd:IDREF"/>
<xsd:element name="amenityOpenHour" type="xsd:time"/>
<xsd:element name="amenityCloseHour" type="xsd:time"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="amenityList">
<xsd:sequence>
<xsd:element name="amenityID" type="xsd:ID"/>
<xsd:element name="amenityType" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
XML document
Exhibit 5: XML document for "ID/IDREF" method
<?xml version="1.0" encoding="UTF-8"?>
<!--
Document : amenity2.xml
Created on : February 4, 2006
Author : Dr. Rick Watson
-->
<?xml-stylesheet href="amenity2.xsl" type="text/xsl" media="screen"?>
<hotelGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="amenity2.xsd">
<hotel>
<hotelPicture/>
<hotelName>Narembeen Hotel</hotelName>
<streetAddress>Churchill Street</streetAddress>
<telephoneNumber>+61 (08) 9064 7272</telephoneNumber>
<emailAddress>narempub@oz.com.au</emailAddress>
<hotelRating>1</hotelRating>
<lowerPrice>50</lowerPrice>
<upperPrice>100</upperPrice>
<amenities>
<amenityIDREF>k2</amenityIDREF>
<amenityOpenHour>06:00:00</amenityOpenHour>
<amenityCloseHour>22:00:00 </amenityCloseHour>
</amenities>
<amenities>
<amenityIDREF>k1</amenityIDREF>
<amenityOpenHour>06:00:00</amenityOpenHour>
<amenityCloseHour>18:00:00 </amenityCloseHour>
</amenities>
<amenities>
<amenityIDREF>k4</amenityIDREF>
<amenityOpenHour>07:00:00</amenityOpenHour>
<amenityCloseHour>10:00:00 </amenityCloseHour>
</amenities>
</hotel>
<hotel>
<hotelPicture/>
<hotelName>Narembeen Caravan Park</hotelName>
<streetAddress>Currall Street</streetAddress>
<telephoneNumber>+61 (08) 9064 7308</telephoneNumber>
<emailAddress>naremcaravan@oz.com.au</emailAddress>
<hotelRating>1</hotelRating>
<lowerPrice>20</lowerPrice>
<upperPrice>30</upperPrice>
<amenities>
<amenityIDREF>k1</amenityIDREF>
<amenityOpenHour>10:00:00</amenityOpenHour>
<amenityCloseHour>22:00:00 </amenityCloseHour>
</amenities>
</hotel>
<amenity>
<amenityID>k1</amenityID>
<amenityType>Pool</amenityType>
</amenity>
<amenity>
<amenityID>k2</amenityID>
<amenityType>Restaurant</amenityType>
</amenity>
<amenity>
<amenityID>k3</amenityID>
<amenityType>Fitness room</amenityType>
</amenity>
<amenity>
<amenityID>k4</amenityID>
<amenityType>Complimentary breakfast</amenityType>
</amenity>
<amenity>
<amenityID>k5</amenityID>
<amenityType>in-room data port</amenityType>
</amenity>
<amenity>
<amenityID>k6</amenityID>
<amenityType>Water slide</amenityType>
</amenity>
</hotelGuide>
Key function: XML stylesheet
To present a many-to-many relationship that uses the ID/IDREF method in an XML stylesheet, use the key function. In the stylesheet, the <xsl:key> element declares an index, and the key() function uses that index to return a node-set from the XML document.
A key consists of the following:
1. the node that has the key (the match attribute)
2. the name of the key (the name attribute)
3. the value of the key (the use attribute)
The following XML stylesheet illustrates how to use the key function to present content that is structured in a many-to-many relationship.
XML stylesheet
Exhibit 6: XML stylesheet for "ID/IDREF" method
<?xml version="1.0" encoding="UTF-8"?>
<!--
Document : amenity2.xsl
Created on : February 4, 2006
Author : Dr. Rick Watson
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="amList" match="amenity" use="amenityID"/>
<xsl:output method="html"/>
<xsl:template match="/">
<html>
<head>
<title>Hotel Guide</title>
</head>
<body>
<h2>Hotels</h2>
<xsl:apply-templates select="hotelGuide"/>
</body>
</html>
</xsl:template>
<xsl:template match="hotelGuide">
<xsl:for-each select="hotel">
<xsl:value-of select="hotelName"/>
<br/>
<xsl:for-each select="amenities">
<xsl:value-of select="key('amList',amenityIDREF)/amenityType"/>
<xsl:text> </xsl:text>
<xsl:value-of select="amenityOpenHour"/> -
<xsl:value-of select="amenityCloseHour"/>
<br/>
</xsl:for-each>
<br/>
<br/>
</xsl:for-each>
<br/>
</xsl:template>
</xsl:stylesheet>
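Because the relationship is many-to-many, a key can also be used in the opposite direction, to list the hotels that offer each amenity. The stylesheet below is a sketch under stated assumptions rather than part of the original exhibits: the key name "hotelsByAmenity" and the page title are hypothetical, and the input is assumed to have the structure of amenity2.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical second index: each hotel is filed under every amenity ID it references -->
  <xsl:key name="hotelsByAmenity" match="hotel" use="amenities/amenityIDREF"/>
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Amenity Guide</title>
      </head>
      <body>
        <h2>Amenities</h2>
        <xsl:for-each select="hotelGuide/amenity">
          <b><xsl:value-of select="amenityType"/></b>
          <br/>
          <!-- Look up every hotel that has an amenityIDREF equal to this amenityID -->
          <xsl:for-each select="key('hotelsByAmenity', amenityID)">
            <xsl:value-of select="hotelName"/>
            <br/>
          </xsl:for-each>
          <br/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Here the use attribute selects several nodes per hotel, so one hotel is indexed under each amenity it references. An amenity that no hotel references simply produces an empty list.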
Expedia.de: XML and affiliate marketing
Expedia.de is the German subsidiary of expedia.com, the internet-based travel agency headquartered in Bellevue, Washington, USA. It offers its customers the booking of airline tickets, car rentals, vacation packages and various other attractions and services via its website and by phone. Its websites attract more than 70 million visitors each month. Currently expedia.com has 4,600 employees serving customers in the United States, Canada, the UK, France, Germany, Italy, and Australia.
For marketing purposes, expedia.de set up an affiliate marketing program. Affiliate marketing is a way to reach potential customers without any financial risk for the company that wants to advertise (the merchant). The merchant gives website owners, called affiliates, the opportunity to refer visitors to the merchant's page and offers commission-based monetary rewards as an incentive. In the case of Expedia.de, affiliate partners receive a commission every time users from their websites book travel on expedia.de. The affiliates can therefore concentrate on selling while the merchant handles the transactions.
To ease the business of the affiliate partners, and of course to make the program more attractive, Expedia.de offers its partners a service called xmlAdEd, which provides current product information using XML. Affiliates using this service can request more than 8 million travel offerings in XML format via HTTP request. The data is updated several times a day. In the HTTP request you can set parameters such as location, price, and airport code. The use of XML in this case gives the affiliates several advantages:
By providing their affiliates product information in XML, expedia.de not only eases the business of their partners, but also ensures that customers receive consistent, up-to-date information on their services.
Summary
When describing a many-to-many relationship in XML, there are a few solutions available for designers to use. In choosing how to represent the many-to-many relationship, the designer not only must consider the most efficient way to represent the information, but also the audience for which the document is intended and how the document will be used.
References
http://www-128.ibm.com/developerworks/xml/library/x-xdm2m.html
Recursive relationships
Introduction
Recursive relationships are an interesting and more complex concept than the relationships you have seen in the previous chapters. A recursive relationship occurs when there is a relationship between an entity and itself. For example, a one-to-many recursive relationship occurs when an employee is the manager of other employees. The employee entity is related to itself, and there is a one-to-many relationship between one employee (the manager) and many other employees (the people who report to the manager). Because of the more complex nature of these relationships, we will need slightly more complex methods of mapping them to a schema and displaying them in a style sheet.
The one-to-one recursive relationship
Continuing with the tour guide model, we will develop a schema that shows cities that have hosted the Olympics and, for each, the previous host city. Since the previous host is another city and only one city can be the previous host, this is a one-to-one recursive relationship.
host.xsd (XML schema for a one-to-one recursive model)
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">
<xsd:element name="cities">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="cityType">
<xsd:sequence>
<xsd:element name="cityID" type="xsd:ID"/>
<xsd:element name="cityName" type="xsd:string"/>
<xsd:element name="cityCountry" type="xsd:string"/>
<xsd:element name="cityPop" type="xsd:integer"/>
<xsd:element name="cityHostYr" type="xsd:integer"/>
<xsd:element name="cityPreviousHost" type="xsd:IDREF" minOccurs="0" maxOccurs="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Exhibit 1: XML schema for Host City Entity
host.xml (XML document for a one-to-one recursive model)
<?xml version="1.0" encoding="UTF-8"?>
<cities xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='host.xsd'>
<city>
<cityID>c1</cityID>
<cityName>Atlanta</cityName>
<cityCountry>USA</cityCountry>
<cityPop>4000000</cityPop>
<cityHostYr>1996</cityHostYr>
</city>
<city>
<cityID>c2</cityID>
<cityName>Sydney</cityName>
<cityCountry>Australia</cityCountry>
<cityPop>4000000</cityPop>
<cityHostYr>2000</cityHostYr>
<cityPreviousHost>c1</cityPreviousHost>
</city>
<city>
<cityID>c3</cityID>
<cityName>Athens</cityName>
<cityCountry>Greece</cityCountry>
<cityPop>3500000</cityPop>
<cityHostYr>2004</cityHostYr>
<cityPreviousHost>c2</cityPreviousHost>
</city>
</cities>
Exhibit 2: XML Document for Olympic Host City
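To display a recursive relationship such as this one, the key function from the previous chapter can be reused. The stylesheet below is a minimal sketch, not part of the original exhibits: the key name "cityList" and the page layout are assumptions, and the input is assumed to be host.xml as shown in Exhibit 2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical index of cities by their ID, so an IDREF can be resolved to a city -->
  <xsl:key name="cityList" match="city" use="cityID"/>
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Olympic Host Cities</title>
      </head>
      <body>
        <xsl:for-each select="cities/city">
          <xsl:value-of select="cityName"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="cityHostYr"/>
          <!-- cityPreviousHost is optional, so only follow it when it is present -->
          <xsl:if test="cityPreviousHost">
            <xsl:text>, previous host: </xsl:text>
            <xsl:value-of select="key('cityList', cityPreviousHost)/cityName"/>
          </xsl:if>
          <br/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The lookup follows the IDREF in cityPreviousHost back into the same list of city elements, which is what makes the relationship recursive.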
The one-to-many recursive relationship
A hypothetical sports team is divided into squads, with each squad having a captain. Every person on the team is a player, regardless of whether they are a squad captain. Since a squad captain is also a player and has a one-to-many relationship with the other players, this situation meets the definition of a recursive relationship. It is a one-to-many recursive relationship because one captain has many players under them. See the example below for how to model the relationship.
team.xsd (XML schema for a one-to-many recursive model)
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<xsd:element name="team">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="player" type="playerType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="playerType">
<xsd:sequence>
<xsd:element name="playerID" type="xsd:ID"/>
<xsd:element name="playerName" type="xsd:string"/>
<xsd:element name="playerCap" type="playerC" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="playerC">
<xsd:sequence>
<xsd:element name="memberOf" type="xsd:IDREF"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Exhibit 3: XML schema for Team Entity
team.xml (XML document for a one-to-many recursive model)
<?xml version="1.0" encoding="UTF-8"?>
<team xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='team.xsd'>
<player>
<playerID>c1</playerID>
<playerName>Tommy Jones</playerName>
<playerCap>
<memberOf>c3</memberOf>
</playerCap>
</player>
<player>
<playerID>c2</playerID>
<playerName>Eddie Thomas</playerName>
<playerCap>
<memberOf>c3</memberOf>
</playerCap>
</player>
<player>
<playerID>c3</playerID>
<playerName>Sean McCombs</playerName>
</player>
<player>
<playerID>c4</playerID>
<playerName>Patrick O’Shea</playerName>
<playerCap>
<memberOf>c3</memberOf>
</playerCap>
</player>
</team>
Exhibit 4: XML Document for Team Entity
Natural one-to-many recursive structure
A more natural approach for most one-to-many recursive relationships is to use XML's hierarchical nature to directly represent the hierarchy. Consider locations:
<?xml version="1.0" encoding="UTF-8"?>
<location type="country">
<name>USA</name>
<sub-locations>
<location type="state">
<name>Ohio</name>
<sub-locations>
<location type="city"><name>Akron</name></location>
<location type="city"><name>Columbus</name></location>
</sub-locations>
</location>
</sub-locations>
</location>
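A schema for this nested structure can be recursive as well, with a type that refers back to itself. The following is one possible sketch, assuming the element and attribute names used in the location example; the type names "locationType" and "subLocationsType" are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <xsd:element name="location" type="locationType"/>
  <xsd:complexType name="locationType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <!-- Optional, so the lowest level (for example a city) can stop the recursion -->
      <xsd:element name="sub-locations" type="subLocationsType" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="type" type="xsd:string" use="required"/>
  </xsd:complexType>
  <xsd:complexType name="subLocationsType">
    <xsd:sequence>
      <!-- Each child is again a locationType: this is the recursive step -->
      <xsd:element name="location" type="locationType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

No ID or IDREF is needed in this design, because the parent of each location is simply the element that contains it.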
The many-to-many recursive relationship
Think you're getting a feel for recursive relationships yet? Well, there is still the third and final relationship to add to your repertoire: the many-to-many recursive. A common example of a many-to-many recursive relationship is when one item can be composed of many items of the same data type as itself, and each of those sub-items may belong to another parent item of the same data type. Sound confusing? Let's look at the example of a product that can consist of a single item or multiple items (i.e., a packaged product). The example below describes tourist products that can be packaged together to create a new product.
product.xsd (XML schema for a many-to-many recursive model)
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<xsd:element name="products">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="product" type="prodType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="prodType">
<xsd:sequence>
<xsd:element name="prodID" type="xsd:ID"/>
<xsd:element name="prodName" type="xsd:string"/>
<xsd:element name="prodCost" type="xsd:decimal" minOccurs="0"/>
<xsd:element name="prodPrice" type="xsd:decimal"/>
<xsd:element name="components" type="componentsType" minOccurs="0" maxOccurs="1"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="componentsType">
<xsd:sequence>
<xsd:element name="component" type="xsd:IDREF"/>
<xsd:element name="componentqty" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Exhibit 5: XML schema for Product Entity
product.xml (XML document for a many-to-many recursive model)
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="product.xsl"?>
<products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="product.xsd">
<product>
<prodID>p1000</prodID>
<prodName>Animal photography kit</prodName>
<prodPrice>725</prodPrice>
<components>
<component>p101</component>
<componentqty>1</componentqty>
</components>
</product>
<product>
<prodID>p101</prodID>
<prodName>Camera case</prodName>
<prodCost>150</prodCost>
<prodPrice>300</prodPrice>
</product>
</products>
Exhibit 6: XML Document for Product Entity
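As written in Exhibit 5, componentsType allows a single component/componentqty pair, so the kit in Exhibit 6 can list only one component. If a package should be able to contain several components, one possible variant (an assumption, not the original schema) is to let the pair repeat:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <!-- Assumed variant of componentsType: the component/componentqty pair may repeat -->
  <xsd:complexType name="componentsType">
    <xsd:sequence maxOccurs="unbounded">
      <xsd:element name="component" type="xsd:IDREF"/>
      <xsd:element name="componentqty" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

With this variant, a single components element could hold one component and componentqty pair for each product in the package, and the same component product could still be referenced from several packages.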
Summary
When the child has the same type of data as its parent in a parent-child data relationship, this is a sign of a recursive relationship. The xsd:ID and xsd:IDREF data types can be used in a schema to create primary key-foreign key values in an XML document.
Data schemas
Initiated by:
The University of Georgia
Introduction
Data schemas are the foundation of valid XML documents. They define objects, their relationships, their attributes, and the structure of the data model. Without them, there would be no agreed structure against which an XML document could be validated. In this chapter, you will come to understand the purpose of XML data schemas, their intricate parts, and how to utilize them. Examples are also included for you to copy when creating your own data schema, making your job a lot easier. The complete schema is included at the bottom of this page, and the excerpts shown in each section are taken from it. Refer to it if you would like to see how the whole schema works as one.
Overview of Data Schemas
The data schema, all technicalities aside, is the data model with which all the XML information is conveyed. It has a hierarchical structure that starts with a root element (to be explained later) and works down through intermediate levels to the most minute detail of the model. Data schemas have two main parts: the entities and their relationships. The entities contained in a data schema represent objects from the model. They have unique identifiers, attributes, and names for what kind of object they are. The relationships in the schema represent the relationships between the objects. Relationships can be one-to-one, one-to-many, many-to-many, recursive, or any other kind found in a data model. Now we will begin to create our own data schema.
Starting your schema the right way
All schemas begin the same way, no matter what type of objects they represent. The first line in every Schema is this declaration:
<?xml version="1.0" encoding="UTF-8"?>
Exhibit 1: XML Declaration
Exhibit 1 simply tells the browser, or whatever program is reading the schema, that the file is an XML file encoded in UTF-8. You can copy this line to start your own XML file. Next comes the Namespace declaration:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
Exhibit 2: Namespace Declaration
Namespaces are basically dictionaries containing definitions of most of the coding in the schema. For example, when creating a schema, if you declare an object to be of type "xsd:string", the definition of the type "string" is contained in the Namespace along with all of its attributes. This is true for most of the code you write. If you have made or seen other schemas, most of the code is prefaced by "xsd:". Good examples are "xsd:sequence" and "xsd:complexType"; sequence and complexType are both objects defined in the Namespace that has been linked to the prefix "xsd". In fact, you could name the prefix anything, as long as you use it the same way throughout the Schema. The Namespace that contains the XML Schema objects is http://www.w3.org/2001/XMLSchema. Now onto Exhibit 2.
The first part, xsd:schema, lets any program know that this file is a schema. Like the XML declaration, this is universal to XML schemas and you can use it in yours. The second part is the actual Namespace declaration; xmlns stands for XML NameSpace. It binds the prefix "xsd" to the Namespace given in the code. Again, I would recommend using this code to start your Schemas. The last part, elementFormDefault, is harder to understand: it controls whether elements declared inside a complexType must be namespace-qualified in the XML document. Using "unqualified" is most applicable until you get to some really complicated code.
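To illustrate that the prefix is only a label for the Namespace, here is a small sketch that binds the same Namespace to the prefix "xs" instead of "xsd". The element names "movieList" and "title" are hypothetical and exist only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same Namespace as in Exhibit 2, but bound to the prefix "xs" instead of "xsd" -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <xs:element name="movieList">
    <xs:complexType>
      <xs:sequence>
        <!-- Every reference to the Namespace must use the same prefix, here "xs" -->
        <xs:element name="title" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A schema processor treats this exactly as it would the "xsd" version, because it compares the Namespace name and not the prefix.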
Entities in general
Entities are basically the objects a Schema is created to represent. As stated before, they have attributes and relationships. We will now go much further into explaining exactly what they are and how to write code for them.
There are two types of Entities: simpleType and complexType. A simpleType object has one value associated with it. A string is a perfect example of a simpleType object, as it contains only the value of the string. Most simpleTypes you use will be defined in the XML Schema Namespace; however, you can define your own simpleType at the bottom of the Schema (this will be brought up in the restrictions section). Because of this, the objects you will most often need to define in your Schema are complexTypes. A complexType is an object with more than one attribute associated with it, and it may or may not have child elements attached to it. Here is an example of a complexType object:
<xsd:complexType name="GenreType">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
Exhibit 3: The complexType Element
This code begins with the declaration of a complexType and its name. When other entities, such as a parent element, refer to it, they refer to this name. The second line begins the sequence of attributes and child elements, each of which is declared as an "element". Each line first declares an element, then gives the "name" by which XML documents will refer to it, and then gives its "type". Note that for the name and description elements the type is "xsd:string", showing that the type string is defined in the Namespace "xsd". For the movie element, the type is "MovieType", and because there is no Namespace prefix before "MovieType", it is assumed that this type is defined in this Schema. (It could also refer to a type defined in another Schema if that Schema were included at the top of this one, but don't worry about that now.) "minOccurs" and "maxOccurs" represent the relationship between genres and movies. "minOccurs" can be either 0 or an arbitrary number, depending only on the data model. "maxOccurs" can be either 1 (a one-to-one relationship), an arbitrary number (a one-to-many relationship), or "unbounded" (a one-to-many relationship).
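The effect of different minOccurs and maxOccurs values can be sketched with a variation on the type above. The type name "GenreSketchType" and the element "featuredMovie" are hypothetical, and the movie elements are simplified to strings so that the fragment stands on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <xsd:complexType name="GenreSketchType">
    <xsd:sequence>
      <!-- No occurrence attributes: minOccurs and maxOccurs both default to 1 -->
      <xsd:element name="name" type="xsd:string"/>
      <!-- Optional: may be left out, but may not appear more than once -->
      <xsd:element name="description" type="xsd:string" minOccurs="0"/>
      <!-- An arbitrary upper limit: between zero and three occurrences -->
      <xsd:element name="featuredMovie" type="xsd:string" minOccurs="0" maxOccurs="3"/>
      <!-- At least one, with no upper limit -->
      <xsd:element name="movie" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```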
For each schema, there must be one root element. This entity contains every other entity underneath it in the hierarchy. For instance, when creating a schema to include a list of movies, the root element would be something like MovieDatabase, or maybe MovieCollection: something that would logically contain all the other objects (like genre, movie, actor, director, plotline, etc.). It always starts with the line <xsd:element name="xxx">, which shows that it is the root element, and then continues as a normal complexType. All other objects will begin with either simpleType or complexType. Here is sample code for a MovieDatabase root element:
<xsd:element name="MovieDatabase">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Genre" type="GenreType" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Exhibit 4: The Root Element
This represents a MovieDatabase where the child element of MovieDatabase is a Genre. From there it goes on to movie, etc. We will continue to use this example to help you better understand.
The Parent / Child Relationship
The Parent / Child Relationship is a key topic in Data Schemas. It represents the basic structure of the data model's hierarchy by clearly laying out the top down configuration. Look at this piece of code which shows how movies have actors associated with them:
<xsd:complexType name="MovieType">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="actor" type="ActorType" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ActorType">
<xsd:sequence>
<xsd:element name="lname" type="xsd:string"/>
<xsd:element name="fname" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
Exhibit 5: The Parent/Child Relationship
Within each MovieType, there is an element named "actor" which is of "ActorType". When the XML document is populated with information, the surrounding tags for actor will be <actor></actor> and not <ActorType></ActorType>. To keep your Schema free of errors, the type field in the Parent Element must always equal the name field in the declaration of the complexType Child Element.
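A populated document makes the difference between element names and type names visible. The following is a minimal sketch of an XML document that follows the structure of Exhibits 3 to 5; all of the values are illustrative placeholders rather than real data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Tags use the element names from the Schema, never the type names -->
<MovieDatabase>
  <Genre>
    <name>Comedy</name>
    <description>Placeholder description of the genre</description>
    <movie>
      <name>Example Movie</name>
      <!-- Declared with type ActorType, but written as actor -->
      <actor>
        <lname>Doe</lname>
        <fname>Jane</fname>
      </actor>
      <actor>
        <lname>Roe</lname>
        <fname>Richard</fname>
      </actor>
    </movie>
  </Genre>
</MovieDatabase>
```

The names GenreType, MovieType, and ActorType never appear in the document; they exist only inside the Schema to connect each parent to the definition of its child.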
Attributes and Restrictions
An attribute of an entity is a simpleType object in that it only contains one value. <xsd:element name="lname" type="xsd:string"/> is a good example of an attribute. It is declared as an element, has a name associated with it, and has a type declaration. Located in the appendix of this chapter is a long list of simpleTypes built into the XML Schema Namespace. Attributes are incredibly simple to use, until you try to restrict them.
In some cases, certain data must abide by a standard to maintain data integrity. An example of this would be a Social Security number or an email address. If you send mass emails to a database of email addresses, you need all of them to be valid addresses, or else you will get many error messages each time you send out that mass email. To avoid this problem, you can take a known simpleType and add a restriction to it to better suit your needs. You can do this in two ways, but one is simpler and better to use in Data Schemas. You could edit the simpleType within its declaration in the Parent Element, but it gets messy, and if another Schema wants to use it, the code must be written again. The better way is to list a new type at the bottom of the Schema that restricts a previously known simpleType. Here is an example of this with an email address and a Social Security number:
<xsd:simpleType name="emailaddressType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="[^@]+@[^\.]+\..+"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="ssnType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
</xsd:restriction>
</xsd:simpleType>
Exhibit 6: Restriction on a simpleType
This was included in the Schema below the last Child Element and before the closing </xsd:schema>. Take the second definition as an example. The first line declares the simpleType and gives it a name, "ssnType". You could name yours anything you want, as long as you reference it correctly throughout the Schema. By doing this, you can use this type anywhere in the Schema, or anywhere in another Schema, provided the references are correct. The second line lets the Schema know it is a restricted type and that its base is a string defined in the XML Schema Namespace. Basically, this type is a string with a restriction on it, and the third line is the actual restriction. It can be one of many types of restrictions, which are listed in the Appendix of this chapter. This one happens to be of type "pattern". A "pattern" means that only a certain sequence of characters will be allowed in the XML document, and it is defined in the value field. This particular one means three digits, a hyphen, two digits, a hyphen, and four digits. To learn more about how to use restrictions, see the W3Schools section on restrictions.
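Pattern is only one kind of restriction. As a sketch of a few others, the fragment below uses the enumeration, minInclusive, maxInclusive, and maxLength facets of XML Schema; the type names and the permitted values are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <!-- enumeration: only the listed values are allowed -->
  <xsd:simpleType name="ratingType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="G"/>
      <xsd:enumeration value="PG"/>
      <xsd:enumeration value="R"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- minInclusive and maxInclusive: a whole number from 1 to 5 -->
  <xsd:simpleType name="starsType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1"/>
      <xsd:maxInclusive value="5"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- maxLength: a string of at most 100 characters -->
  <xsd:simpleType name="shortTitleType">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

An element would use one of these types in the same way as ssnType, for example with type="starsType" in its declaration.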
Not of little import: Introducing the <xsd:import> tag
The <xsd:import> tag is used to import a schema document and the namespace associated with the data types defined within the schema document. This allows an XML schema document to reference a type library using namespace names (prefixes). Let's take a closer look at a simple XML instance document for a store that uses these multiple namespace names:
<?xml version="1.0" encoding="UTF-8"?>
<store:SimpleStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opentourism.org/xmltext/Store.xml http://www.opentourism.org/xmltext/SimpleStore.xsd"
xmlns:store="http://www.opentourism.org/xmltext/Store.xml"
xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema">
<!--
Note the explicitly defined namespace declarations: the prefix store represents data types defined in the
http://www.opentourism.org/xmltext/Store.xml namespace, and the prefix MGR represents data types defined in the
http://www.opentourism.org/xmltext/CoreSchema namespace. Also, notice that there is no default namespace
declaration; every element must be associated with a namespace through its prefix (we will see why this is
necessary when we examine the schema document).
-->
<store:Store>
<MGR:Name xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema">
<MGR:FirstName>Michael</MGR:FirstName>
<MGR:MiddleNames>Jay</MGR:MiddleNames>
<MGR:LastName>Fox</MGR:LastName>
</MGR:Name>
<store:StoreName>The Gap</store:StoreName>
<store:StoreAddress>
<store:Street>86 Nowhere Ave.</store:Street>
<store:City>Los Angeles</store:City>
<store:State>CA</store:State>
<store:ZipCode>75309</store:ZipCode>
</store:StoreAddress>
<!-- More store information would go here. -->
</store:Store>
<!-- More stores would go here. -->
</store:SimpleStore>
Exhibit 7: XML Instance Document [1]
Let's look at the schema document and see how the <xsd:import> tag was used to import data types from a type library (external schema document).
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.opentourism.org/xmltext/Store.xml"
xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema"
targetNamespace="http://www.opentourism.org/xmltext/Store.xml" elementFormDefault="qualified">
<!--
The prefix MGR is bound to the following namespace name: http://www.opentourism.org/xmltext/CoreSchema
The ManagerTypeLib.xsd schema document is imported by associating the schema with the
http://www.opentourism.org/xmltext/CoreSchema namespace name, which was bound to the MGR prefix.
The elementFormDefault attribute has the value 'qualified', indicating that every element in an
XML instance document must be namespace-qualified.
-->
<!-- The target namespace and default namespace are the same -->
<xsd:import namespace="http://www.opentourism.org/xmltext/CoreSchema" schemaLocation="ManagerTypeLib.xsd"/>
<xsd:element name="SimpleStore">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Store" type="StoreType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="StoreType">
<xsd:sequence>
<xsd:element ref="MGR:Name"/>
<xsd:element name="StoreName" type="xsd:string"/>
<xsd:element name="StoreAddress" type="StoreAddressType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="StoreAddressType">
<xsd:sequence>
<xsd:element name="Street" type="xsd:string"/>
<xsd:element name="City" type="xsd:string"/>
<xsd:element name="State" type="xsd:string"/>
<xsd:element name="ZipCode" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Exhibit 8: XML Schema (http://www.opentourism.org/xmltext/SimpleStore.xsd)
Like the include and redefine tags, the import tag incorporates data types from an external schema document into another schema document; unlike them, it is used when the external schema has a different target namespace. It must occur before any element or attribute declarations. These mechanisms are important when XML schemas are modularized and type libraries are maintained and used in multiple schema documents.
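The type library itself is not reproduced in this chapter. As a rough sketch only, a ManagerTypeLib.xsd that would satisfy the instance document in Exhibit 7 might look like the following. The type name NameType and the optional MiddleNames element are assumptions made for illustration; only the element names and the namespace come from the exhibits.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical contents of ManagerTypeLib.xsd (an assumption, not the
     original file). Its target namespace is the one bound to the MGR prefix. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.opentourism.org/xmltext/CoreSchema"
            targetNamespace="http://www.opentourism.org/xmltext/CoreSchema"
            elementFormDefault="qualified">
  <!-- Global element, so another schema can refer to it with ref="MGR:Name" -->
  <xsd:element name="Name" type="NameType"/>
  <xsd:complexType name="NameType">
    <xsd:sequence>
      <xsd:element name="FirstName" type="xsd:string"/>
      <!-- minOccurs="0" is an illustrative choice: not everyone has a middle name -->
      <xsd:element name="MiddleNames" type="xsd:string" minOccurs="0"/>
      <xsd:element name="LastName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

Because Name is declared globally in the library, the importing schema can reuse it by reference instead of redeclaring it.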
When the whole is greater than the sum of its parts: Schema Modularization
Now that we have covered all three methods of incorporating external XML schemas, let’s consider the importance of these mechanisms. As is typical with most programming code, redundancy is frowned upon; this is true for custom data type definitions as well. If a custom data type already exists that can be applied to an element in your schema document, does it not make sense to use this data type rather than create it again within your new schema document? Moreover, if you know that a single data type can be reused for several applications, should you not have a method for referencing that data type when you need it?
The idea behind modular schemas is to examine what your schema does, determine what data types are frequently used in one form or another and develop a type library. As your needs for more complex schemas increase you can continue to add to your library, reuse data types in your type library, and redefine those data types as needed. An example of this reuse would be a schema for customer information – different departments would use different schemas as they would need only partial customer information. However most, if not all, departments would need some specific customer information, like name and contact information, which could be incorporated in the individual departmental schema documents.
Schema modularization is a “best practice”. By maintaining a type library and reusing and redefining types in the type library, you can help ensure that your XML schema documents don't become overwhelming and difficult to read. Readability is important, because you may not be the only one using these schemas, and it is important that others can easily understand your schema documents.
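As a minimal sketch of the customer example above, a departmental schema could pull shared types from a type library with the include tag. The file name CustomerTypeLib.xsd, the type ContactInfoType, and the element names are all hypothetical; the sketch also assumes that neither schema declares a target namespace, which is the simplest case in which include applies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical billing-department schema that reuses a shared type library -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- CustomerTypeLib.xsd is assumed to define the complex type ContactInfoType -->
  <xsd:include schemaLocation="CustomerTypeLib.xsd"/>
  <xsd:element name="BillingCustomer">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Shared information comes from the type library -->
        <xsd:element name="Contact" type="ContactInfoType"/>
        <!-- Department-specific information is declared locally -->
        <xsd:element name="AccountNumber" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Another department could include the same library and add its own local elements, so the shared definition of contact information lives in one place.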
“Choose, but choose wisely…”: Schema alternatives
Thus far in this book we have only discussed XML schemas as defined by the World Wide Web Consortium (W3C). There are other methods of defining the data contained within an XML instance document; we will mention only the two most popular and well-known alternatives: Document Type Definition (DTD) and RELAX NG.
We will cover DTDs in the next chapter. RELAX NG is a newer schema language that has many of the same features as W3C XML Schema; RELAX NG also claims to be simpler and easier to learn, but this is very subjective. For more about RELAX NG, visit: http://www.relaxng.org/
Appendix
First is the full schema used in the examples throughout this chapter:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="unqualified">
  <xsd:element name="MovieDatabase">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Genre" type="GenreType" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="GenreType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="MovieType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="rating" type="xsd:string"/>
      <xsd:element name="director" type="xsd:string"/>
      <xsd:element name="writer" type="xsd:string"/>
      <xsd:element name="year" type="xsd:int"/>
      <xsd:element name="tagline" type="xsd:string"/>
      <xsd:element name="actor" type="ActorType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ActorType">
    <xsd:sequence>
      <xsd:element name="lname" type="xsd:string"/>
      <xsd:element name="fname" type="xsd:string"/>
      <xsd:element name="gender" type="xsd:string"/>
      <xsd:element name="bday" type="xsd:string"/>
      <xsd:element name="birthplace" type="xsd:string"/>
      <xsd:element name="ssn" type="ssnType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:simpleType name="ssnType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
It’s time to go back to the beginning…and review all of the schema data types, elements, and attributes that we have covered thus far (and maybe a few that we have not). The following tables will detail the XML data types, elements and attributes that can be used in an XML Schema.
Primitive Types
The following table lists built-in simple types that the elements and attributes in your schema can use. Strictly speaking, only some of them (such as string, decimal and boolean) are primitive; types such as int, byte and ID are derived from the primitive types.
Type | Syntax | Legal value example | Constraining facets |
xsd:anyURI | <xsd:element name="url" type="xsd:anyURI"/> | http://www.w3.com | length, minLength, maxLength, pattern, enumeration, and whiteSpace |
xsd:boolean | <xsd:element name="hasChildren" type="xsd:boolean"/> | true or false or 1 or 0 | pattern and whiteSpace |
xsd:byte | <xsd:element name="stdDev" type="xsd:byte"/> | -128 through 127 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:date | <xsd:element name="dateEst" type="xsd:date"/> | 2004-03-15 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:dateTime | <xsd:element name="xMas" type="xsd:dateTime"/> | 2003-12-25T08:30:00 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:decimal | <xsd:element name="pi" type="xsd:decimal"/> | 3.1415927 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, fractionDigits, and totalDigits |
xsd:double | <xsd:element name="pi" type="xsd:double"/> | 3.1415927 or INF or NaN | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:duration | <xsd:element name="MITDuration" type="xsd:duration"/> | P8M3DT7H33M2S | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:float | <xsd:element name="pi" type="xsd:float"/> | 3.1415927 or INF or NaN | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:gDay | <xsd:element name="dayOfMonth" type="xsd:gDay"/> | ---11 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:gMonth | <xsd:element name="monthOfYear" type="xsd:gMonth"/> | --02 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:gMonthDay | <xsd:element name="valentine" type="xsd:gMonthDay"/> | --02-14 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:gYear | <xsd:element name="year" type="xsd:gYear"/> | 1999 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:gYearMonth | <xsd:element name="birthday" type="xsd:gYearMonth"/> | 1972-08 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
xsd:ID | <xsd:attribute name="id" type="xsd:ID"/> | id-102 | length, minLength, maxLength, pattern, enumeration, and whiteSpace |
xsd:IDREF | <xsd:attribute name="version" type="xsd:IDREF"/> | id-102 | length, minLength, maxLength, pattern, enumeration, and whiteSpace |
xsd:IDREFS | <xsd:attribute name="versionList" type="xsd:IDREFS"/> | id-102 id-103 id-100 | length, minLength, maxLength, pattern, enumeration, and whiteSpace |
xsd:int | <xsd:element name="age" type="xsd:int"/> | 77 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:integer | <xsd:element name="age" type="xsd:integer"/> | 77 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:long | <xsd:element name="channelNumber" type="xsd:long"/> | 214 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:negativeInteger | <xsd:element name="belowZero" type="xsd:negativeInteger"/> | -123 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:nonNegativeInteger | <xsd:element name="numOfchildren" type="xsd:nonNegativeInteger"/> | 2 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:nonPositiveInteger | <xsd:element name="debit" type="xsd:nonPositiveInteger"/> | 0 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:positiveInteger | <xsd:element name="credit" type="xsd:positiveInteger"/> | 500 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:short | <xsd:element name="numOfpages" type="xsd:short"/> | 476 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, and totalDigits |
xsd:string | <xsd:element name="name" type="xsd:string"/> | Joseph | length, minLength, maxLength, pattern, enumeration, and whiteSpace |
xsd:time | <xsd:element name="startTime" type="xsd:time"/> | 13:02:00 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, and whiteSpace |
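To show how several of these types fit together, here is a small sketch of a schema that uses them side by side. The Employee element and all of its child names are hypothetical and are not part of the chapter's running examples.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema, used only to illustrate the built-in types above -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Employee">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="hireDate" type="xsd:date"/>
        <xsd:element name="salary" type="xsd:decimal"/>
        <xsd:element name="hasChildren" type="xsd:boolean"/>
        <!-- Optional element: minOccurs="0" allows it to be left out -->
        <xsd:element name="homepage" type="xsd:anyURI" minOccurs="0"/>
      </xsd:sequence>
      <!-- Attribute declarations follow the content model -->
      <xsd:attribute name="id" type="xsd:ID" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Under this sketch, an Employee with id="id-102", a hireDate of 2004-03-15, a salary of 52000.50 and a hasChildren value of true would be accepted, while a hireDate written as 15/03/2004 would be rejected because it is not in the lexical form of xsd:date.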
Schema Elements
( from http://www.w3schools.com/schema/schema_elements_ref.asp )
Here is a list of all the elements which can be included in your schemas.
Element | Explanation |
all | Specifies that the child elements can appear in any order. Each child element can occur 0 or 1 time |
annotation | Specifies the top-level element for schema comments |
any | Enables the author to extend the XML document with elements not specified by the schema |
anyAttribute | Enables the author to extend the XML document with attributes not specified by the schema |
appinfo | Specifies information to be used by the application (must go inside annotation) |
attribute | Defines an attribute |
attributeGroup | Defines an attribute group to be used in complex type definitions |
choice | Allows only one of the elements contained in the <choice> declaration to be present within the containing element |
complexContent | Defines extensions or restrictions on a complex type that contains mixed content or elements only |
complexType | Defines a complex type element |
documentation | Defines text comments in a schema (must go inside annotation) |
element | Defines an element |
extension | Extends an existing simpleType or complexType element |
field | Specifies an XPath expression that specifies the value used to define an identity constraint |
group | Defines a group of elements to be used in complex type definitions |
import | Adds multiple schemas with different target namespace to a document |
include | Adds multiple schemas with the same target namespace to a document |
key | Specifies an attribute or element value as a key (unique, non-nullable, and always present) within the containing element in an instance document |
keyref | Specifies that an attribute or element value correspond to those of the specified key or unique element |
list | Defines a simple type element as a list of values |
notation | Describes the format of non-XML data within an XML document |
redefine | Redefines simple and complex types, groups, and attribute groups from an external schema |
restriction | Defines restrictions on a simpleType, simpleContent, or a complexContent |
schema | Defines the root element of a schema |
selector | Specifies an XPath expression that selects a set of elements for an identity constraint |
sequence | Specifies that the child elements must appear in a sequence. Each child element can occur from 0 to any number of times |
simpleContent | Contains extensions or restrictions on a text-only complex type or on a simple type as content and contains no elements |
simpleType | Defines a simple type and specifies the constraints and information about the values of attributes or text-only elements |
union | Defines a simple type as a collection (union) of values from specified simple data types |
unique | Defines that an element or an attribute value must be unique within the scope |
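The sketch below combines a few of the elements from this table (annotation, documentation, choice, unique, selector and field) in one schema. The Contacts vocabulary is hypothetical and exists only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Contacts">
    <xsd:annotation>
      <xsd:documentation>Hypothetical contact list used for illustration.</xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Contact" maxOccurs="unbounded">
          <xsd:complexType>
            <!-- choice: each Contact holds either a phone or an email, not both -->
            <xsd:choice>
              <xsd:element name="phone" type="xsd:string"/>
              <xsd:element name="email" type="xsd:string"/>
            </xsd:choice>
            <xsd:attribute name="contactId" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- unique: no two Contact elements may share a contactId value -->
    <xsd:unique name="uniqueContactId">
      <xsd:selector xpath="Contact"/>
      <xsd:field xpath="@contactId"/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>
```

The selector picks the set of elements the constraint applies to, and the field names the value within each selected element that must be unique.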
Schema Restrictions and Facets for data types
( from http://www.w3schools.com/schema/schema_elements_ref.asp )
Here is a list of all the types of restrictions which can be included in your schema.
Constraint | Description |
enumeration | Defines a list of acceptable values |
fractionDigits | Specifies the maximum number of decimal places allowed. Must be equal to or greater than zero |
length | Specifies the exact number of characters or list items allowed. Must be equal to or greater than zero |
maxExclusive | Specifies the upper bounds for numeric values (the value must be less than this value) |
maxInclusive | Specifies the upper bounds for numeric values (the value must be less than or equal to this value) |
maxLength | Specifies the maximum number of characters or list items allowed. Must be equal to or greater than zero |
minExclusive | Specifies the lower bounds for numeric values (the value must be greater than this value) |
minInclusive | Specifies the lower bounds for numeric values (the value must be greater than or equal to this value) |
minLength | Specifies the minimum number of characters or list items allowed. Must be equal to or greater than zero |
pattern | Defines the exact sequence of characters that are acceptable |
totalDigits | Specifies the maximum number of digits allowed. Must be greater than zero |
whiteSpace | Specifies how white space (line feeds, tabs, spaces, and carriage returns) is handled |
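As a sketch of how facets are applied, the following simple types restrict values that the movie schema in this appendix declares as plain xsd:string or xsd:int. The type names, the list of ratings and the numeric bounds are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- enumeration: only the listed values are accepted -->
  <xsd:simpleType name="ratingType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="G"/>
      <xsd:enumeration value="PG"/>
      <xsd:enumeration value="PG-13"/>
      <xsd:enumeration value="R"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- minInclusive / maxInclusive: both bounds are themselves allowed -->
  <xsd:simpleType name="yearType">
    <xsd:restriction base="xsd:int">
      <xsd:minInclusive value="1888"/>
      <xsd:maxInclusive value="2100"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- whiteSpace and maxLength: collapse runs of white space, then limit length -->
  <xsd:simpleType name="taglineType">
    <xsd:restriction base="xsd:string">
      <xsd:whiteSpace value="collapse"/>
      <xsd:maxLength value="120"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

An element would then use one of these types in place of the built-in one, for example type="ratingType" instead of type="xsd:string".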
Regex
Special regular expression (regex) language can be used to construct a pattern. The regex language in XML Schema is based on Perl's regular expression language. The following are some common notations:
. (the period) | for any character at all (except a line break) |
\d | for any digit |
\D | for any non-digit |
\w | for any word (alphanumeric) character |
\W | for any non-word character (e.g. -, +, =) |
\s | for any white space (including space, tab, newline, and return) |
\S | for any character that is not white space |
x* | to have zero or more x's |
(xy)* | to have zero or more xy's |
x+ | repetition of the x, at least once |
x? | to have one or zero x's |
(xy)? | to have one or no xy's |
[abc] | to include one of a group of values |
[0-9] | to include the range of values from 0 to 9 |
x{5} | to have exactly 5 x's (in a row) |
x{5,} | to have at least 5 x's (in a row) |
x{5,8} | at least 5 but at most 8 x's (in a row) |
(xyz){2} | to have exactly 2 xyz's (in a row) |
For example, the pattern for validating a Social Security Number is \d{3}-\d{2}-\d{4}
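A pattern is attached to a type through the pattern facet. As a further sketch, the type below would constrain a US ZIP code to five digits with an optional four-digit extension; the name zipCodeType is hypothetical. Note that a pattern in XML Schema must match the entire value, so no start or end anchors are written.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: five digits, optionally followed by a hyphen and four digits -->
  <xsd:simpleType name="zipCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{5}(-\d{4})?"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

With this type, 75309 and 75309-1234 are accepted, while 7530 and 75309-12 are rejected.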
The schema code for emailAddressType is \w+\W*\w*@{1}\w+\W*\w+.\w+.*\w* |
\w+ | at least one word (alphanumeric) character, | e.g. answer |
\W* | followed by zero or more non-word characters, | e.g. - |
\w*@{1} | followed by zero or more word characters and exactly one at-sign, | e.g. my@ |
\w+ | followed by at least one word character, | e.g. mail |
\W* | followed by zero or more non-word characters, | e.g. _ |
\w+. | followed by at least one word character and a period, | e.g. please. |
\w+.* | followed by further parts of the same kind, any number of times, | e.g. opentourism. |
\w* | finally followed by zero or more word characters | e.g. org |
email-address: answer-my@mail_please.opentourism.org |
Note: as written, the unescaped period in this pattern matches any single character, not only a literal period. To require a literal period, write \. instead.
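The following sketch shows one way the emailAddressType pattern could be written so that each period is matched literally. It is an interpretation of the breakdown above, not the original schema code: the escaped periods and the grouped, repeated (\w+\.)* part are assumptions about what the pattern was meant to express.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed reading of emailAddressType with literal periods (\.) -->
  <xsd:simpleType name="emailAddressType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\w+\W*\w*@\w+\W*\w+\.(\w+\.)*\w+"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

The sample address answer-my@mail_please.opentourism.org still matches this version. Like the original, it is a teaching example and is far looser than a real e-mail address check.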
Instance Document Attributes
These attributes do NOT need to be declared within the schemas.
Attribute | Explanation | Example |
xsi:nil | Indicates that a certain element does not have a value or that the value is unknown. The element must be set to nillable inside the schema document: <xsd:element name="last_name" type="xsd:string" nillable="true"/> | <full_name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <first_name>Madonna</first_name> <last_name xsi:nil="true"/> </full_name> |
xsi:noNamespaceSchemaLocation | Locates the schema for elements that are not in any namespace | <radio xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.opentourism.org/xmtext/radio.xsd"> <!-- radio stuff goes here --> </radio> |
xsi:schemaLocation | Locates schemas for elements and attributes that are in a specified namespace. The value is a pair made up of a namespace name and a schema location, separated by white space | <radio xmlns="http://www.opentourism.org/xmtext/NS/radio" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opentourism.org/xmtext/NS/radio http://www.opentourism.org/xmtext/radio.xsd"> <!-- radio stuff goes here --> </radio> |
xsi:type | Can be used in instance documents to indicate the type of an element. | <height xsi:type="xsd:decimal">78.9</height> |
For more information on XML Schema structures, data types, and tools you can visit http://www.w3.org/XML/Schema.
DTD
A Document Type Definition (DTD) is a set of declarations, held in a separate file or inside the document itself, that an XML document refers to. It specifies which elements must or may appear, which attributes those elements must or may have and what values they can take, and how the XML file should be structured. XHTML, HTML and other markup languages use DTDs to validate their documents. Note: Web browsers accept bad markup in HTML.
Uses of DTDs
DTDs are used to define a custom markup language that a specific program or organization can use to store large amounts of data. Like schemas, they can declare elements, attributes and entities. The most visible difference is how the declarations are written.
Prolog
As with a schema, a document that uses a DTD begins with a prolog. Its first line is the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
The question marks tell the parser that the line is an instruction rather than content. The word xml identifies the document as XML, the version attribute states which version of XML you are using, and the encoding attribute states which character encoding the file is saved in (UTF-8 can represent text in any language, including Chinese; the declared encoding must match the one actually used).
<!ELEMENT> tag
The <!ELEMENT> declaration defines an element that may appear in the document. Depending on how you declare it, the element may be allowed only in a specific part of the document or anywhere in it.
The first element you declare is the root element (in HTML it's html). Let's pretend that there was an organization that wanted a set of XML files containing information about each person. They would probably name the root element of each file "person". The general form for declaring an element with child elements is
<!ELEMENT elementName (childElement, childElement2, childElement3)>
So the organization's root element declaration would be
<!ELEMENT person (firstName, lastName, postalCode, cellNumber, homeNumber, email)>
Note: Each child element must itself be declared in a separate <!ELEMENT> declaration for the document to be valid.
Note: The comma that separates the child elements is a sequence indicator: the children must appear in the order listed. There are also occurrence indicators, which tell the parser how often an element may occur. We will cover them later in this chapter.
Note: The parentheses enclose the element's content model, that is, the type of content the element may hold. Different content types are described later in this chapter.
Some elements are not tied to one particular parent (for example, a formatting tag you want to use to highlight important information). You declare such an element in the same way, except that you do not list it as a child of only one specific element. Depending on your needs, you may use the ANY content type, which allows character data or other declared elements inside your element; the EMPTY content type, for an element with no content, which looks like "<exampleXmlTag />"; or #PCDATA for text.
Note: In an element declaration you can combine #PCDATA with child elements inside the parentheses. It looks like this: <!ELEMENT elementName (#PCDATA | childName)*>. The pipe bar means that the element may contain text or the listed child elements; the trailing asterisk is required for such mixed content and allows them in any order and number.
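Putting these declarations together, the sketch below shows a complete document for the organization's person example, with the DTD written inside the document itself (an internal subset). The data values are invented, and declaring every child as #PCDATA is an assumption; the chapter only specifies the declaration of the person element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person [
  <!-- Root element: the six children must appear once each, in this order -->
  <!ELEMENT person (firstName, lastName, postalCode, cellNumber, homeNumber, email)>
  <!-- Each child is declared separately; here they all hold plain text -->
  <!ELEMENT firstName  (#PCDATA)>
  <!ELEMENT lastName   (#PCDATA)>
  <!ELEMENT postalCode (#PCDATA)>
  <!ELEMENT cellNumber (#PCDATA)>
  <!ELEMENT homeNumber (#PCDATA)>
  <!ELEMENT email      (#PCDATA)>
]>
<person>
  <firstName>Jane</firstName>
  <lastName>Doe</lastName>
  <postalCode>30602</postalCode>
  <cellNumber>555-0100</cellNumber>
  <homeNumber>555-0101</homeNumber>
  <email>jane.doe@example.org</email>
</person>
```

Swapping two children, or leaving one out, would make the document invalid against this DTD even though it would remain well-formed XML.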
XHTML
In previous chapters, we have learned how to generate HTML documents from XML documents and XSL stylesheets. In this chapter, we will learn how to convert those HTML documents into valid XHTML. We will discuss why XHTML has evolved as a standard and when it should be used.
The Evolution of XHTML
Originally, Web pages were designed in HTML. Unfortunately, most implementations of this markup language allow all sorts of mistakes and bad formatting. Major browsers were designed to be forgiving, and poor code would display with few problems in most cases. This poor code was often not portable between browsers, e.g. a page would render in Netscape but not Internet Explorer or vice versa. Compensating for human error and bad formatting takes processing power that small handheld devices might not have. Thus when displaying data on handhelds, a tiny mistake can crash the device.
XHTML partially mitigates these problems. The processing burden is reduced by requiring XHTML documents to conform to the much stricter rules defined in XML. Aside from the stricter rules, HTML 4.01 and XHTML 1.0 are functionally equivalent. If a document breaks XML's well-formedness rules, an XHTML-compliant browser must not render the page. If a document is well-formed but invalid, an XHTML-compliant browser may render the page, so a significant number of mistakes still slip through.
In this chapter, we will examine in detail how to create an XHTML document.
The biggest problem with HTML from a design standpoint is that it was never meant to be a graphical design language. The original version of HTML was intended to structure human readable content (e.g. marking a section of text as a paragraph), not to format it (e.g. this paragraph should be displayed in 14pt Arial). HTML has evolved far past its original purpose and is being stretched and manipulated to cover cases that the original HTML designers never imagined.
The recommended solution is to use a separate language to describe the presentation of a group of documents. Cascading Style Sheets (CSS) is a language used for describing presentation. From version 1.1 of XHTML upwards, web pages must be formatted using CSS or a language with equivalent capabilities such as XSLT (XSL Transformations). The use of CSS or XSLT is optional in XHTML 1.0 unless the strict variant is used. HTML 4.01 supports CSS but not XSLT.
So What is XHTML?
As you might have guessed, XHTML stands for eXtensible HyperText Markup Language. It is a cross between HTML and XML. It fulfills two major purposes that were ignored by HTML:
- XHTML is a stricter standard than HTML. XHTML documents must be well-formed just like regular XML. This reduces vagaries and inconsistency between browsers, because browsers do not have to decide how to display a badly-formed page. Malformed XHTML is not allowed.
Note 1: Browsers only enforce well-formedness if the MIME type is set to application/xhtml+xml. If the MIME type is set to text/html, the browser will allow badly-formed documents. There are a large number of 'XHTML' documents on the web that are badly-formed and get away with it because their MIME type is text/html.
Note 2: Browsers are not required to check for validity. See Invalid XHTML below for an example.
- XHTML allows for modularization (m12n). For different environments different element and attribute subsets can be defined.
The best thing about XHTML is that it is almost the same as HTML! If you know how to write an HTML document, it will be simple for you to create an XHTML document. The biggest thing that you must keep in mind is that unlike with HTML, where simple errors like a missing closing tag are ignored by the browser, XHTML code must be written according to an exact specification. We will see later that adhering to these strict specifications actually allows XHTML to be more flexible than HTML.
XHTML Document Structure
At a minimum, an XHTML document must contain a DOCTYPE declaration and four elements: html, head, title, and body:
<!DOCTYPE ... >
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="...">
<head>
<title></title>
</head>
<body></body>
</html>
The opening html tag of an XHTML document must include a namespace declaration for the XHTML namespace.
The DOCTYPE declaration should appear immediately before the html tag in an XHTML document. It can follow one of three formats.
XHTML 1.0 Strict
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The Strict declaration is the least forgiving. This is the preferred DOCTYPE for new documents. Strict documents tend to be streamlined and clean. All formatting will appear in Cascading Style Sheets rather than the document itself. Presentational elements and attributes whose job should be done by the Cascading Style Sheet rather than the document itself include, but are not limited to:
<body text="blue">, <u>nderline</u>, <b>old</b>, <i>talics</i>, and <font color="#9900FF" face="Arial" size="+2">
In a Strict document, text and inline elements placed in the body must also be nested within block-level elements.
Incorrect Example:
<p>I hope that you enjoy</p> your stay.
Correct Example:
<p>I hope that you enjoy your stay.</p>
XHTML 1.0 Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
This declaration is intended as a halfway house for migrating legacy HTML documents to XHTML 1.0 Strict. The W3C encourages authors to use the Strict DOCTYPE for new documents. (The XHTML 1.0 Transitional DTD refers readers to the relevant note in the HTML4.01 Transitional DTD.)
This DOCTYPE does not require CSS for formatting, although CSS is recommended. It generally tolerates inline elements found where block-level elements are expected.
There are a couple of reasons why you might choose this DOCTYPE for new documents.
- You require backwards compatibility with browsers that support the formatting elements of XHTML but do not support CSS. This is a very small fraction of general users (less than 1%). Many browsers that don't support CSS don't support HTML 4.0 or XHTML either. However, it may be useful on a corporate intranet that has a larger than normal fraction of very old (pre-2000) browsers.
- You need to link to frames. Using frames is discouraged as they work badly in many browsers.
XHTML 1.0 Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
If you are creating a page with frames, this declaration is appropriate. However, since frames are generally discouraged when designing Web pages, this declaration should be used rarely.
XML Prolog
Additionally, XHTML authors are encouraged by the W3C to include the following XML declaration as the first line of each document:
<?xml version="1.0" encoding="UTF-8"?>
Although it is recommended by the standard, this declaration may cause errors in older Web browsers including Internet Explorer version 6. It is up to the individual author to decide whether to include the prolog.
Language
It is good practice to include the optional xml:lang attribute [2] on the html element to describe the document's primary language. For compatibility with HTML the lang attribute should also be specified with the same value. For an English language document use:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
The xml:lang and lang attributes can also be specified on other elements to indicate changes of language within the document, e.g. a French quotation in an English document.
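Combining the pieces described in this section (the optional XML prolog, a DOCTYPE, the namespace declaration and the language attributes), a minimal complete XHTML 1.0 Strict page could look like the sketch below. The title and body text are placeholders invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!-- title is required inside head -->
    <title>Welcome</title>
  </head>
  <body>
    <!-- In Strict documents, text in the body sits inside block-level elements -->
    <p>I hope that you enjoy your stay.</p>
    <!-- A change of language is marked on the element that contains it -->
    <p>As the French say, <span xml:lang="fr" lang="fr">bon voyage</span>.</p>
  </body>
</html>
```

The first line can be dropped if compatibility with older browsers matters more than following the recommendation.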
Converting HTML to XHTML
In this section, we will discover how to transform an HTML document into an XHTML document. We will examine each of the following rules:
- Documents must be well-formed
- Tags must be properly nested
- Elements must be closed
- Tags must be lowercase
- Attribute names must be lowercase
- Attribute values must be quoted
- Attributes cannot be minimized
- The name attribute is replaced with the id attribute (in XHTML 1.0 both name and id should be used with the same value to maintain backwards-compatibility).
- Plain ampersands are not allowed
- Image alt attributes are mandatory
- Scripts and CSS must be escaped (enclose them within the tags <![CDATA[ and ]]>) or preferably moved into external files.
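As a preview, the fragment below applies several of these rules at once. It is a sketch only: the file names, the class and id values and the form itself are hypothetical.

```xml
<!-- Lowercase tags and attribute names, quoted attribute values -->
<form action="search.aspx" method="get">
  <p class="intro">
    <!-- The ampersand is escaped and the empty br element is closed -->
    Fish &amp; chips<br />
    <!-- name and id carry the same value; checked is not minimized -->
    <input type="checkbox" name="fresh" id="fresh" checked="checked" />
    <!-- Every image has an alt attribute and a closing slash -->
    <img src="titlebar.gif" alt="title" />
  </p>
</form>
```

Every start tag has a matching end tag or a closing slash, and the elements are closed in the reverse of the order in which they were opened, so the fragment is well-formed.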
Documents must be well-formed
Because XHTML conforms to all XML standards, an XHTML document must be well-formed according to the W3C's recommendations for an XML document. Several of the rules here reemphasize this point. We will consider both incorrect and correct examples.
Tags must be properly nested
Browsers widely tolerate badly nested tags in HTML documents.
<b><u>
This text is probably bold and underlined, but inside incorrectly nested tags.
</b></u>
The text above would display as bold and underlined, even though the end tags are not in the proper order. An XHTML page will not display if the tags are improperly nested, because it would not be considered a well-formed XML document. The problem can be easily fixed.
<b><u>
This text is bold and underlined and inside properly nested tags.
</u></b>
Elements must be closed
Again, XHTML documents must be well-formed XML documents. For this reason, all elements must be closed. HTML specifications listed some tags as having "optional" end tags, such as the <p> and <li> tags.
<p>Here is a list:
<ul>
<li>Item 1
<li>Item 2
<li>Item 3
</ul>
In XHTML, the end tags must be included.
<p>Here is a list: </p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
What should we do about HTML tags that do not have a closing tag? Some special tags do not require or imply a closing tag.
<img src="titlebar.gif" alt="Title">
<hr>
<br>
<p>Welcome to my web page!</p>
In XHTML, the XML rule of including a closing slash within the tag must be followed.
<img src="titlebar.gif" alt="title" />
<hr />
<br />
<p>Welcome to my Web page!</p>
Note that some of today's browsers will incorrectly render a page if the closing slash does not have a space before it (<br/>). Although it is not part of the official recommendation, you should always include the space (<br />) for compatibility purposes.
Here are the common empty tags in HTML:
- area
- base
- basefont
- br
- hr
- img
- input
- link
- meta
- param
Tags must be lowercase
In HTML, tags could be written in either lowercase or uppercase. In fact, some Web authors preferred to write tags in uppercase to make them easier to read. XHTML requires that all tags be lowercase.
<H1>This is an example of bad case.</h1>
This difference is necessary because XML differentiates between cases. XML would read <H1> and <h1> as different tags, causing problems in the above example. The problem can be easily fixed by changing all tags to lowercase.
<h1>This is an example of good case.</h1>
Attribute names must be lowercase
Following the pattern of writing all tags in lowercase, all attribute names must also be in lowercase.
<p CLASS="specialText">Important Notice</p>
The correct tags are easy to create.
<p class="specialText">Important Notice</p>
Attribute values must be quoted
Some HTML values do not require quotation marks around them. They are understood by browsers.
<table border=1 width=100%>
</table>
XHTML requires all attribute values to be quoted. Even numeric, percentage, and hexadecimal values must appear in quotation marks for them to be considered part of a proper XHTML document.
<table border="1" width="100%">
</table>
Attributes cannot be minimized
HTML allowed some attributes to be written in shorthand, such as selected or noresize.
<form>
<input checked ... />
<input disabled ... />
</form>
When using XHTML, attribute minimization is forbidden. Instead, use the syntax x="x", where x is the attribute that was formerly minimized.
<form>
<input checked="checked" .../>
<input disabled="disabled" .../>
</form>
A complete list of minimized attributes follows:
- checked
- compact
- declare
- defer
- disabled
- ismap
- nohref
- noresize
- noshade
- nowrap
- readonly
- selected
- multiple
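The x="x" rewrite, together with the quoting rule from the previous section, can be sketched with Python's standard html.parser module, which reports a minimized attribute with a value of None. The class and function names below are assumptions for this illustration. The sketch only lists the rewritten start tags; it does not produce a complete converted document.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr

class AttributeExpander(HTMLParser):
    """Collect start tags, rewriting minimized attributes as x="x"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:            # minimized, e.g. <input checked>
                value = name
            # quoteattr adds the quotation marks and escapes & and <.
            parts.append("%s=%s" % (name, quoteattr(value)))
        self.tags.append("<%s>" % " ".join(parts))

def expand_start_tags(html):
    """Return the start tags found in html with full, quoted attributes."""
    parser = AttributeExpander()
    parser.feed(html)
    parser.close()
    return parser.tags

for tag in expand_start_tags("<input type=checkbox checked><input disabled>"):
    print(tag)
```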
The name attribute is replaced with the id attribute
HTML 4.01 standards define a name attribute for the tags a, applet, frame, iframe, img, and map.
<a name="anchor">
<img src="banner.gif" name="mybanner" />
</a>
XHTML has deprecated the name attribute. Instead, the id attribute is used. However, to ensure backwards compatibility with today's browsers, it is best to use both the name and id attributes.
<a name="anchor" id="anchor" >
<img src="banner.gif" name="mybanner" id="mybanner" />
</a>
As technology advances, it will eventually be unnecessary to use both attributes; XHTML 1.1 has already removed name altogether.
Ampersands are not supported
Bare ampersands are illegal in XHTML, both in text and in attribute values such as URLs.
<a href="home.aspx?status=done&itWorked=false">Home & Garden</a>
They must instead be replaced with the equivalent character entity &amp;.
<a href="home.aspx?status=done&amp;itWorked=false">Home &amp; Garden</a>
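If your pages are generated by a program, it is safer to let a library do the escaping than to do it by hand. This minimal sketch uses the escape and quoteattr functions from Python's standard xml.sax.saxutils module; the variable names are made up for the example.

```python
from xml.sax.saxutils import escape, quoteattr

url = "home.aspx?status=done&itWorked=false"
label = "Home & Garden"

# quoteattr escapes the value and wraps it in quotation marks;
# escape handles text that appears between tags.
link = "<a href=%s>%s</a>" % (quoteattr(url), escape(label))
print(link)
# <a href="home.aspx?status=done&amp;itWorked=false">Home &amp; Garden</a>
```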
Image alt attributes are mandatory
Because XHTML is designed to be viewed on different types of devices, some of which are not image-capable, alt attributes must be included for all images.
<img src="titlebar.gif">
Remember that the img tag must include a closing slash in XHTML!
<img src="titlebar.gif" alt="title" />
Scripts and CSS must be escaped
Internal scripts and CSS often include characters like the ampersand and less-than characters.
<script language="JavaScript">
<!--
document.write('Hello World!');
//-->
</script>
If you are using internal scripts or CSS, enclose them within the tags <![CDATA[ and ]]>. This will mark them as character data that should not be parsed. If you do not use these tags, characters like & and < will be treated as the start of character entities (like &nbsp;) and tags (like <b>) respectively. This will cause your page to behave unpredictably, and it may invalidate your code.
Additionally, the type attribute is mandatory for scripts. The comment tags <!-- and --> that have traditionally been used to hide JavaScript from noncompliant browsers should not be included. The XML standard states that text enclosed in comment tags may be completely excluded from rendered documents, so any script enclosed in them could be lost.
<script type="text/javascript" language="javascript">
/*<![CDATA[*/
document.write('Hello World!');
/*]]>*/
</script>
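The effect of the CDATA markers can be checked with an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module and a made-up one-line script: with the markers the parser returns the script as plain character data, and without them the bare < and & characters make the document ill-formed.

```python
import xml.etree.ElementTree as ET

wrapped = """<script type="text/javascript">
/*<![CDATA[*/
if (1 < 2 && 2 < 3) { document.title = 'ok'; }
/*]]>*/
</script>"""

unwrapped = """<script type="text/javascript">
if (1 < 2 && 2 < 3) { document.title = 'ok'; }
</script>"""

# With CDATA, the < and && arrive untouched as the element's text.
element = ET.fromstring(wrapped)
print(element.text)

# Without CDATA, the same script is rejected by the parser.
try:
    ET.fromstring(unwrapped)
    unwrapped_ok = True
except ET.ParseError:
    unwrapped_ok = False
print(unwrapped_ok)   # False
```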
Also, document.write() is not permitted in XHTML documents. You must use node creation methods such as document.createElementNS() instead. Confusingly, document.write() will appear to work as expected if the document is incorrectly served with a MIME type of text/html (the type for HTML documents) instead of application/xhtml+xml (the type for XHTML documents). If the MIME type is text/html, the document will be parsed as HTML, which allows document.write(). Parsing the document as HTML defeats the purpose of writing it in XHTML.
Similar changes must be made for internal stylesheets.
<style>
<!--
.SpecialClass {
color: #000000;
}
-->
</style>
The type attribute must be included, and the CDATA tags should be used.
<style type="text/css">
/*<![CDATA[*/
.SpecialClass {
color: #000000;
}
/*]]>*/
</style>
Because scripts and CSS may complicate an XHTML document, it is strongly recommended that they be placed in external .js and .css files, respectively. They can then be linked to from your XHTML document.
<script src="myscript.js" type="text/javascript" />
<link href="styles.css" type="text/css" rel="stylesheet" />
Some elements may not be nested
The W3C recommendations state that certain elements may not be contained within others in an XHTML document, even when no XML rules are violated by the inclusion. Elements affected are listed below.
Element | Cannot contain ... |
---|---|
a | a |
pre | big, img, object, small, sub, sup |
button | button, fieldset, form, iframe, input, isindex, label, select, textarea |
label | label |
form | form |
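Because these prohibitions are not well-formedness rules, an XML parser will not report them, but they are easy to check yourself. The following Python sketch copies the table into a dictionary and walks a parsed document. It assumes the prohibitions apply at every depth of nesting and that element names carry no namespace prefix; the function name find_nesting_errors is made up for the example.

```python
import xml.etree.ElementTree as ET

# Prohibited descendants, copied from the table above.
PROHIBITED = {
    "a": {"a"},
    "pre": {"big", "img", "object", "small", "sub", "sup"},
    "button": {"button", "fieldset", "form", "iframe", "input",
               "isindex", "label", "select", "textarea"},
    "label": {"label"},
    "form": {"form"},
}

def find_nesting_errors(root):
    """Return (outer, inner) tag pairs that break the nesting rules."""
    errors = []
    for outer in root.iter():
        banned = PROHIBITED.get(outer.tag, set())
        for inner in outer.iter():
            # iter() also yields the element itself, so skip it.
            if inner is not outer and inner.tag in banned:
                errors.append((outer.tag, inner.tag))
    return errors

doc = ET.fromstring("<div><a href='x'>outer <a href='y'>inner</a></a></div>")
print(find_nesting_errors(doc))   # [('a', 'a')]
```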
When to convert
By now, it probably sounds as though converting an HTML document into XHTML is easy, but tedious. When would you want to convert your existing pages into XHTML? Before deciding to change your entire Web site, consider these questions.
- Do you want your pages to be easily viewed over a nontraditional Internet-capable device, such as a PDA or Web-enabled telephone? Will this be a goal of your site in the future? XHTML is the language of choice for Web-enabled portable devices. Now may be a good time for you to commit to creating an all-XHTML site.
- Do you plan to work with XML in the future? If so, XHTML may be a logical place to begin. If you head up a team of designers who are accustomed to using HTML, XHTML is a small step away. It may be less intimidating for beginners to learn XHTML than it is to try teaching them all about XML from scratch.
- Is it important that your site be current with the most recent W3C standards? Staying on top of current standards will make your site more stable and help you stay updated in the future, as you will only have to make small changes to upgrade your site to the newest versions of XHTML as they are approved by the W3C.
- Will you need to convert your documents to another format? As a valid XML document, XHTML can utilize XSL to be converted into text, plain HTML, another XHTML document, or another XML document. HTML cannot be used for this purpose.
If you answered yes to any of the above questions, then you should probably convert your Web site to XHTML.
MIME Types
XHTML 1.0 documents should be served with a MIME type of application/xhtml+xml to Web browsers that can accept this type. XHTML 1.0 may be served with the MIME type text/html to clients that cannot accept application/xhtml+xml, provided that the XHTML complies with the additional constraints in [Appendix C] of the XHTML 1.0 specification. If you cannot configure your Web server to serve documents as different MIME types, you probably should not convert your Web site to XHTML.
You should check that your XHTML documents are served correctly to browsers that support application/xhtml+xml, e.g. Mozilla Firefox. Use 'Page Info' to verify that the type is correct.
XHTML 1.1 documents are often not backwards compatible with HTML and should not be served with a MIME type of text/html.[3]
Help Converting
HTML Tidy
When creating HTML, it's very easy to make a mistake by leaving out an end tag or not properly nesting tags. HTML Tidy is a wonderful application that can be used to correct a number of errors in poorly formed HTML documents and convert them into XHTML. Tidy can also format ugly code to be more readable, including code generated by WYSIWYG editors. HTML Tidy can't generate clean code when it encounters problems it isn't sure how to fix. In these cases, it will generate an error to let you know where the mistake is located in your document.
A few examples of problems that HTML Tidy can remedy:
- Missing or mismatched end tags.
- Improperly nested elements.
- Mixed-up tags.
- A missing "/" needed to properly close tags.
- Missing tags in lists.
- Missing quotes around attribute values.
- A missing DOCTYPE: Tidy can insert the correct DOCTYPE value based on your code (and can also recognize and report proprietary elements).
HTML Tidy can also be customized at runtime using a wide array of command line arguments. It is capable of indenting code to make it more readable as well as replacing FONT, NOBR, and CENTER tags with style tags and rules using CSS. Tidy can also be taught new tags by declaring them in the configuration file.
You can read more about HTML Tidy at the W3C's HTML Tidy site, as well as download the application as a binary or get the source code. There are several sites that offer HTML Tidy as an online service including the W3C and Site Valet.
You can also validate your page using the validator available at http://validator.w3.org/.
When not to convert
You shouldn't convert your Web pages if they will always be served with a MIME type of text/html. Make sure you know how to configure your server or server-side script to perform HTTP content negotiation so that XHTML-capable browsers receive XHTML marked as application/xhtml+xml. If you can't set up content negotiation, stick to HTML 4.01. People viewing your Web pages with mainstream browsers will be unable to tell the difference between a valid HTML 4.01 Web page and a valid XHTML 1.0 Web page.
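The core of content negotiation is a decision based on the Accept header that the browser sends with its request. The following is a simplified Python sketch of that decision for an XHTML 1.0 page; the function name choose_mime_type is an assumption, and the sketch ignores wildcards and the relative ranking of q values that a full implementation would honour.

```python
def choose_mime_type(accept_header):
    """Pick a MIME type for an XHTML 1.0 page from an HTTP Accept header."""
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        if media_type.strip().lower() != "application/xhtml+xml":
            continue
        # An explicit q=0 means "not acceptable", so look for a q parameter.
        quality = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            return "application/xhtml+xml"
    # The client did not ask for XHTML: fall back to HTML.
    return "text/html"

print(choose_mime_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_mime_type("text/html, */*"))
```

A server-side script would call a function like this for every request and send the result in the Content-Type response header.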
Make sure the automated tests you run on your site simulate connections from both XHTML-compatible browsers, e.g. Mozilla Firefox, and non-XHTML-compatible browsers, e.g. Internet Explorer 6.0. This is particularly important if you use JavaScript on your Web site. If maintaining two copies of your test suite is too time consuming, don't convert.
Bear in mind that valid HTML 4.01 Strict documents generally require less effort to convert to XHTML 1.1 than valid XHTML 1.0 Transitional documents. A valid HTML 4.01 Strict document can only contain elements that are valid in XHTML 1.1, although a few attributes may need changing. XHTML 1.0 Transitional documents, on the other hand, can contain ten element types and more than a dozen attributes that are not valid in XHTML 1.1. The XHTML 1.0 Transitional body element alone has six attributes that are not supported in XHTML 1.1.
Don't be pressured into using XHTML by people talking vaguely about bad practice. Pin them down to what they mean by bad practice. If they start talking about separation of content and presentation, they have confused the differences between HTML and XHTML with the differences between the Transitional and Strict doctypes. Both XHTML 1.0 Transitional and HTML 4.01 Transitional allow you to mix presentation and content in the same document, i.e. they allow this type of bad practice. Both HTML 4.01 Strict and XHTML 1.0 Strict force you to move the bulk of the presentation (but not all of it) into CSS or an equivalent language. All four doctypes allow you to use embedded stylesheets, whereas true separation requires that all CSS and JavaScript be moved to external files.
XHTML 1.1
XHTML 1.0 is a suitable markup language for most purposes. It provides the option to separate content and presentation, which fits the needs of most Web authors. XHTML 1.1 enforces the separation of content and presentation. All deprecated elements and attributes have been removed. It also removes two attributes that were retained in XHTML 1.0 purely for backwards-compatibility. The lang attribute is replaced by xml:lang and name is replaced by id. Finally, it adds support for ruby text found in East Asian documents.
DOCTYPE
The DOCTYPE for XHTML 1.1 is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
Modularization
The modularization of XHTML, or XHTML m12n, provides suggestions for customizing XHTML, either by integrating subsets of XHTML into other XML applications or extending the XHTML element set. The framework defines two processes:
- How to group elements and attributes into "modules"
- How to combine modules to create new markup languages
The resulting languages, which the W3C calls "XHTML Host Languages", are based on the familiar XHTML structure but specialized for specific purposes. XHTML 1.1 is an example of a host language. It was created by grouping the different elements available to XHTML.
XHTML variations, while possible in theory, have not been widely adopted. There is continuing work being done to develop host languages, but their details are beyond the scope of this discussion.
Invalid XHTML
XHTML-compliant browsers are allowed to render invalid XHTML documents provided that the documents are well-formed. A simple example is given below:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Invalid XHTML</title>
</head>
<body>
<p>This sentence contains a <p>nested paragraph.</p></p>
</body>
</html>
Save the example as invalid.xhtml (the .xhtml extension is important) and open the page with Mozilla Firefox. The page will render even though it is invalid.
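The same distinction between well-formed and valid can be observed outside a browser. Python's standard xml.etree.ElementTree parser is non-validating: it checks well-formedness but does not read the DTD, so it accepts the example without complaint. In this sketch the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

document = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Invalid XHTML</title>
</head>
<body>
<p>This sentence contains a <p>nested paragraph.</p></p>
</body>
</html>"""

# No ParseError is raised: the document is well-formed.
root = ET.fromstring(document)

# The invalid structure is still there: a <p> that is a child of a <p>.
nested = root.findall(".//%sp/%sp" % (XHTML_NS, XHTML_NS))
print(len(nested))
```

Catching the nested paragraph requires a validating tool, such as the W3C validator, that checks the document against the DTD.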
Summary
XHTML stands for eXtensible HyperText Markup Language. XHTML is very similar to HTML, but it is stricter and easier to parse. XHTML documents must be well-formed just like regular XML. XHTML allows for modularization. XHTML code must be written according to an exact specification unlike with HTML, where simple errors like missing a closing tag are ignored by the browser. Adhering to these strict specifications actually allows XHTML to be more flexible than HTML. The benefits described in this summary are only gained if the MIME type of the document is |
XPath
Introduction
Throughout the previous chapters you have learned the basic concepts of XSL and how you must refer to nodes in an XML document when performing an XSL transformation. Up to this point you have been using a straightforward syntax for referring to nodes in an XML document. Although the syntax you have used so far has been XPath, there are many more functions and capabilities that you will learn in this chapter. As you begin to comprehend how the path language is used for referring to nodes in an XML document, your understanding of XML as a tree structure will begin to fall into place. This chapter contains examples that demonstrate many of the common uses of XPath; for the full details, see the latest version of the XPath specification published by the W3C.
XSL uses XPath heavily.
XPath
When you go to copy a file or ‘cd’ into a directory at a command prompt you often type something along the lines of ‘/home/darnell/’ to refer to folders. This enables you to change into or refer to folders throughout your computer’s file system. XML has a similar way of referring to elements in an XML document. This special syntax is called XPath, which is short for XML Path Language.
XPath is a language for finding information in an XML document. XPath is used to navigate through elements and attributes in an XML document.
XPath, although used for referring to nodes in an XML tree, is not itself written in XML. This was a wise choice on the part of the W3C, because trying to specify path information in XML would be a very cumbersome task. Any characters that form XML syntax would need to be escaped so that it is not confused with XML when being processed. XPath is also very succinct, allowing you to call upon nodes in the XML tree with a great degree of specificity without being unnecessarily verbose.
XML as a tree structure
A great benefit of XML is that the document itself describes the structure of the data. If any of you have researched your family history, you have probably come across a family tree. At the top of the tree is some early ancestor and at the bottom of the tree are the latest children.
With a tree structure you can see which children belong to which parents, which grandchildren belong to which grandparents and many other relationships.
The neat thing about XML is that it also fits nicely into this tree structure, often referred to as an XML Tree.
Understanding node relationships
We will use the following example to demonstrate the different node relationships.
<bookstore>
<book>
<title>Less Than Zero</title>
<author>Bret Easton Ellis</author>
<year>1985</year>
<price>13.95</price>
</book>
</bookstore>
- Parent
- Each element and attribute has one parent.
- The book element is the parent of the title, author, year, and price elements.
- Children
- Element nodes may have zero, one or more children.
- The title, author, year, and price elements are all children of the book element.
- Siblings
- Nodes that have the same parent.
- The title, author, year, and price elements are all siblings.
- Ancestors
- A node's parent, parent's parent, etc.
- The ancestors of the title element are the book element and the bookstore element.
- Descendants
- A node's children, children's children, etc.
- Descendants of the bookstore element are the book, title, author, year, and price elements.
Also, it is still useful in some ways to think of an XML file as simultaneously being a serialized file, like you would view it in an XML editor. This is so you can understand the concepts of preceding and following nodes. A node is said to precede another if the original node is before the other in document order. Likewise, a node follows another if it is after that node in document order. Ancestors and descendants are not considered to be either preceding or following a node. This concept will come in handy later when discussing the concept of an axis.
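These relationships can be explored programmatically. The following is a minimal Python sketch using the standard xml.etree.ElementTree module on the bookstore example; the parent_of dictionary is a helper introduced for the example, needed because ElementTree elements do not keep a reference to their parent.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""
<bookstore>
  <book>
    <title>Less Than Zero</title>
    <author>Bret Easton Ellis</author>
    <year>1985</year>
    <price>13.95</price>
  </book>
</bookstore>""")

# Map every element to its parent.
parent_of = {child: parent for parent in bookstore.iter() for child in parent}

book = bookstore.find("book")
title = book.find("title")

children = [child.tag for child in book]
siblings = [node.tag for node in parent_of[title] if node is not title]
descendants = [node.tag for node in bookstore.iter() if node is not bookstore]

# Walk upwards from <title> until there is no parent left.
ancestors = []
node = title
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

print("children of book:", children)
print("siblings of title:", siblings)
print("ancestors of title:", ancestors)
print("descendants of bookstore:", descendants)
```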
Abbreviated vs. Unabbreviated XPath syntax
XPath was created so that nodes can be referred to very succinctly, while retaining the ability to search on many options. Most uses of XPath will involve searching for child nodes, parent nodes, or attribute nodes of a particular node. Because these uses are so common, an abbreviated syntax can be used to refer to these commonly-searched nodes. Following is an XML document that simulates a tree (the type that has leaves and branches). It will be used to demonstrate the different types of syntax.
<?xml version="1.0" encoding="UTF-8"?>
<trunk name="the_trunk">
<bigBranch name="bb1" thickness="thick">
<smallBranch name="sb1">
<leaf name="leaf1" color="brown" />
<leaf name="leaf2" weight="50" />
<leaf name="leaf3" />
</smallBranch>
<smallBranch name="sb2">
<leaf name="leaf4" weight="90" />
<leaf name="leaf5" color="purple" />
</smallBranch>
</bigBranch>
<bigBranch name="bb2">
<smallBranch name="sb3">
<leaf name="leaf6" />
</smallBranch>
<smallBranch name="sb4">
<leaf name="leaf7" />
<leaf name="leaf8" />
<leaf name="leaf9" color="black" />
<leaf name="leaf10" weight="100" />
</smallBranch>
</bigBranch>
</trunk>
Exhibit 9.2: tree.xml – Example XML page
Following are a few examples of XPath location paths in English, Abbreviated XPath, then Unabbreviated XPath.
Selection 1:
English: All <leaf> elements in this document that are children of <smallBranch> elements that are children of <bigBranch> elements, that are children of the trunk, which is a child of the root.
Abbreviated: /trunk/bigBranch/smallBranch/leaf
Unabbreviated: /child::trunk/child::bigBranch/child::smallBranch/child::leaf
Selection 2:
English: The <bigBranch> elements with ‘name’ attribute equal to ‘bb3,’ that are children of the trunk element, which is a child of the root.
Abbreviated: /trunk/bigBranch[@name='bb3']
Unabbreviated: /child::trunk/child::bigBranch[attribute::name='bb3']
Notice how we can specify which bigBranch elements we want by using a predicate in the previous example. This narrows the search down to only bigBranch nodes that satisfy the predicate. The predicate is the part of the XPath statement that is in square brackets. In this case, the predicate is asking for bigBranch nodes with their ‘name’ attribute set to ‘bb3’. Since tree.xml contains only the bigBranch elements bb1 and bb2, this particular path selects nothing; using ‘bb2’ instead would select the second bigBranch.
The last two examples assume we want to specify the path from the root. Let’s now assume that we are specifying the path from a <smallBranch> node.
Selection 3:
English: The parent node of the current <smallBranch>. (Notice that this selection is relative to a <smallBranch>)
Abbreviated: ..
Unabbreviated: parent::node()
When using the Unabbreviated Syntax, you may notice that you are calling a parent or child followed by two colons (::). Each of those is called an axis. You will learn more about axes shortly.
Also, this may be a good time to explain the concept of a location path. A location path is the series of location steps taken to reach the node/nodes being selected. Location steps are the parts of XPath statements separated by / characters. They are one step on the way to finding the nodes you would like to select.
Location steps consist of three parts: an axis (child, parent, descendant, etc.), a node test (the name of a node, or a test such as node() that matches nodes by type), and zero or more predicates (tests on the retrieved nodes that narrow the results, eliminating nodes that do not pass the predicate’s test).
So, in a location path, each location step returns a node-set. If there are further steps on the path after a location step, the next step is evaluated for every node in that set.
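The three selections above can be tried out with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree supports a limited subset of the abbreviated syntax and evaluates paths relative to an element, so the leading /trunk step is replaced by starting from the trunk element itself. The variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

trunk = ET.fromstring("""
<trunk name="the_trunk">
  <bigBranch name="bb1" thickness="thick">
    <smallBranch name="sb1">
      <leaf name="leaf1" color="brown" />
      <leaf name="leaf2" weight="50" />
      <leaf name="leaf3" />
    </smallBranch>
    <smallBranch name="sb2">
      <leaf name="leaf4" weight="90" />
      <leaf name="leaf5" color="purple" />
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb2">
    <smallBranch name="sb3">
      <leaf name="leaf6" />
    </smallBranch>
    <smallBranch name="sb4">
      <leaf name="leaf7" />
      <leaf name="leaf8" />
      <leaf name="leaf9" color="black" />
      <leaf name="leaf10" weight="100" />
    </smallBranch>
  </bigBranch>
</trunk>""")

# Selection 1: /trunk/bigBranch/smallBranch/leaf, relative to <trunk>.
leaves = trunk.findall("bigBranch/smallBranch/leaf")

# Selection 2: a predicate on the name attribute.
bb3 = trunk.findall("bigBranch[@name='bb3']")   # no such branch in tree.xml
bb2 = trunk.findall("bigBranch[@name='bb2']")

# Selection 3: '..' steps from a <smallBranch> up to its parent.
parent = trunk.find("bigBranch/smallBranch[@name='sb3']/..")

print(len(leaves), len(bb3), len(bb2), parent.get("name"))
```

Each step in the path is applied to every node produced by the previous step, which is why the first selection collects the leaves of all four small branches.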
Relative vs. Absolute paths
When specifying a path with XPath, there are times when you will already be ‘in’ a node. But other times, you will want to select nodes starting from the root node. XPath lets you do both. If you have ever worked with websites in HTML, it works the same way as referring to other files in HTML hyperlinks. In HTML, you can specify an Absolute Path for the hyperlink, describing where another page is with the server name, folders, and filename all in the URL. Or, if you are referring to another file on the same site, you need not enter the server name or all of the path information. This is called a Relative Path. The concept can be applied similarly in XPath.
You can tell the difference by whether there is a ‘/’ character at the beginning of the XPath expression. If so, the path is being specified from the root, which makes it an Absolute Path. But if there is no ‘/’ at the beginning of the path, you are specifying a Relative Path, which describes where the other nodes are relative to the context node, or the node for which the next step is being taken.
Below is an XSL stylesheet (Exhibit 9.3) for use with our tree.xml file above (Exhibit 9.2).
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<!-- Example of an absolute path. The element '/child::trunk'
is being specified from the root node. -->
<xsl:template match="/child::trunk">
<html>
<head>
<title>XPath Tree Tests</title>
</head>
<body>
<!-- Example of a relative path. The <xsl:for-each> statement will
execute for every <bigBranch> node in the
'current' node, which is the <trunk> node. -->
<xsl:for-each select="child::bigBranch">
<xsl:call-template name="print_out" />
</xsl:for-each>
</body>
</html>
</xsl:template>
<xsl:template name="print_out">
<xsl:value-of select="attribute::name" /> <br />
</xsl:template>
</xsl:stylesheet>
Exhibit 9.3: xsl_tree.xsl – Example of both a relative and absolute path
Four types of XPath location paths
In the last two sections you learned about two different distinctions to separate out different location paths: Unabbreviated vs. Abbreviated and Relative vs. Absolute. Combining these two concepts could be helpful when talking about XPath location paths. Not to mention, it could make you sound really smart in front of your friends when you say things like:
- Abbreviated Relative Location Paths - Use of abbreviated syntax while specifying a relative path.
- Abbreviated Absolute Location Paths - Use of abbreviated syntax while specifying an absolute path.
- Unabbreviated Relative Location Paths - Use of unabbreviated syntax while specifying a relative path.
- Unabbreviated Absolute Location Paths - Use of unabbreviated syntax while specifying an absolute path.
I only mention this four-way distinction now because it could come in handy while reading the specification, or other texts on the subject.
XPath axes
In XPath, some node selections can only be expressed using the Unabbreviated Syntax. In this case, you will be using an axis to specify each location step on your way through the location path.
From any node in the tree, there are 13 axes along which you can step. They are as follows:
Axis | Meaning |
---|---|
ancestor:: | The parent of the current node, that parent's parent, and so on up to the root node |
ancestor-or-self:: | The current node and all of its ancestors up to the root node |
attribute:: | Attributes of the current node |
child:: | Immediate children of the current node |
descendant:: | Children of the current node, their children, and so on |
descendant-or-self:: | The current node and all of its descendants |
following:: | Nodes after the current node in document order (excluding descendants, attributes, and namespace nodes) |
following-sibling:: | Nodes after the current node that have the same parent |
namespace:: | Namespace nodes of the current node |
parent:: | Immediate parent of the current node |
preceding:: | Nodes before the current node in document order (excluding ancestors, attributes, and namespace nodes) |
preceding-sibling:: | Nodes before the current node that have the same parent |
self:: | The current node |
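To make the axes more concrete, the following Python sketch emulates three of them with plain functions over an ElementTree document. This is an illustration of what the axes select, not an XPath engine; the function names and the shortened sample tree are assumptions made for the example. The results are returned as simple lists, whereas XPath treats ancestor and preceding-sibling as reverse axes when numbering positions.

```python
import xml.etree.ElementTree as ET

trunk = ET.fromstring("""
<trunk name="the_trunk">
  <bigBranch name="bb1">
    <smallBranch name="sb1">
      <leaf name="leaf1" />
      <leaf name="leaf2" />
      <leaf name="leaf3" />
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb2" />
</trunk>""")

# ElementTree has no parent pointers, so build a child-to-parent map.
parent_of = {child: parent for parent in trunk.iter() for child in parent}

def ancestors(node):
    """ancestor:: the parent, its parent, and so on (nearest first)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def following_siblings(node):
    """following-sibling:: later children of the same parent."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """preceding-sibling:: earlier children of the same parent."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)]

def names(nodes):
    return [n.get("name") for n in nodes]

leaf2 = trunk.find(".//leaf[@name='leaf2']")
print(names(ancestors(leaf2)))
print(names(following_siblings(leaf2)))
print(names(preceding_siblings(leaf2)))
```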
XPath predicates and functions
Sometimes, you may want to use a predicate in an XPath Location Path to further filter your selection. Normally, you would get a set of nodes from a location path. A predicate is a small expression that gets evaluated for each node in a set of nodes. If the expression evaluates to ‘false’, then the node is not included in the selection. An example is as follows:
//p[@class='alert']
In the preceding example, every <p> tag in the document is checked to see if its ‘class’ attribute is set to ‘alert’. Only those <p> tags with a ‘class’ attribute with value ‘alert’ are included in the set of nodes for this location path.
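The attribute predicate can be tested with Python's standard xml.etree.ElementTree module, which supports this form of predicate. In ElementTree the search starts from an element, so the path is written .//p rather than //p. The sample page is made up for the example.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<body>
  <p class="alert">Disk almost full.</p>
  <p>Nothing to report.</p>
  <div><p class="alert">Backup failed.</p></div>
  <p class="note">Remember to log out.</p>
</body>""")

# Keep only the <p> elements whose class attribute equals 'alert'.
alerts = page.findall(".//p[@class='alert']")
print([p.text for p in alerts])
```

The paragraph without a class attribute and the one with class="note" both fail the predicate, so only two of the four paragraphs are selected.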
The following example uses a function, which can be used in a predicate to get information about the context node.
/book/chapter[position()=3]
This previous example selects only the chapter of the book in the third position. So, for something to be returned, the <book> element must have at least 3 <chapter> elements.
Also notice that the position function returns a number. There are many functions in the XPath specification. For a complete list, see the W3C specification at http://www.w3.org/TR/xpath#corelib
Here are a few more functions that may be helpful:
number last() – the number of nodes in the current node set, which is also the position of its last node
number position() – position of the context node being tested
number count(node-set) – the number of nodes in a node-set
boolean starts-with(string, string) – returns true if the first argument starts with the second
boolean contains(string, string) – returns true if the first argument contains the second
number sum(node-set) – the sum of the numeric values of the nodes in the node-set
number floor(number) – the number, rounded down to the nearest integer
number ceiling(number) – the number, rounded up to the nearest integer
number round(number) – the number, rounded to the nearest integer
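The sketch below shows what several of these functions compute, using Python equivalents on a made-up book document. Only the chapter[3] and chapter[last()] paths are evaluated as path expressions by ElementTree; the remaining lines are ordinary Python written to mirror the XPath expression named in each comment.

```python
import math
import xml.etree.ElementTree as ET

book = ET.fromstring("""
<book>
  <chapter title="Introduction" pages="12" />
  <chapter title="XML Basics" pages="30" />
  <chapter title="XPath" pages="25" />
  <chapter title="XSLT" pages="40" />
</book>""")

chapters = book.findall("chapter")

third = book.find("chapter[3]")       # short for chapter[position()=3]
last = book.find("chapter[last()]")   # chapter[position()=last()]

chapter_count = len(chapters)                                  # count(chapter)
total_pages = sum(float(c.get("pages")) for c in chapters)     # sum(chapter/@pages)

# chapter[starts-with(@title, 'X')] and chapter[contains(@title, 'Path')]
x_titles = [c.get("title") for c in chapters if c.get("title").startswith("X")]
path_titles = [c.get("title") for c in chapters if "Path" in c.get("title")]

def xpath_round(number):
    """XPath round(): nearest integer, with .5 rounded towards +infinity."""
    return math.floor(number + 0.5)

average = total_pages / chapter_count
print(third.get("title"), last.get("title"), chapter_count, total_pages)
print(x_titles, path_titles)
print(math.floor(average), math.ceil(average), xpath_round(average))
```

Python's built-in round() is not used for the last function because it rounds halves to the nearest even number, which differs from XPath's rule.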
Example
The following XML document, XSD schemas, and XSL stylesheet examples are to help you put everything you have learned in this chapter together using real life data. As you study this example you will notice how XPath can be used in the stylesheet to call and modify the output of specific information from the document.
Below is an XML document (Exhibit 9.4)
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="movies.xsl" type="text/xsl" media="screen"?>
<movieCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="movies.xsd">
<movie>
<movieTitle>Meet the Parents</movieTitle>
<movieSynopsis>
Greg Focker is head over heels in love with his girlfriend Pam, and is ready to
pop the big question. When his attempt to propose is thwarted by a phone call
with the news that Pam's younger sister is getting married, Greg realizes that
the key to Pam's hand in marriage lies with her formidable father.
</movieSynopsis>
<role>
<roleIDREF>bs1</roleIDREF>
<roleType>Lead Actor</roleType>
</role>
<role>
<roleIDREF>tp1</roleIDREF>
<roleType>Lead Actress</roleType>
</role>
<role>
<roleIDREF>rd1</roleIDREF>
<roleType>Lead Actor</roleType>
</role>
<role>
<roleIDREF>bd1</roleIDREF>
<roleType>Supporting Actress</roleType>
</role>
</movie>
<movie>
<movieTitle>Elf</movieTitle>
<movieSynopsis>
One Christmas Eve, a long time ago, a small baby at an orphanage crawled into
Santa’s bag of toys, only to go undetected and accidentally carried back to Santa’s
workshop in the North Pole. Though he was quickly taken under the wing of a surrogate
father and raised to be an elf, as he grows to be three sizes larger than everyone else,
it becomes clear that Buddy will never truly fit into the elf world. What he needs is
to find his real family. This holiday season, Buddy decides to find his true place in the
world and sets off for New York City to track down his roots.
</movieSynopsis>
<role>
<roleIDREF>wf1</roleIDREF>
<roleType>Lead Actor</roleType>
</role>
<role>
<roleIDREF>jc1</roleIDREF>
<roleType>Supporting Actor</roleType>
</role>
<role>
<roleIDREF>zd1</roleIDREF>
<roleType>Lead Actress</roleType>
</role>
<role>
<roleIDREF>ms1</roleIDREF>
<roleType>Supporting Actress</roleType>
</role>
</movie>
<castMember>
<castMemberID>rd1</castMemberID>
<castFirstName>Robert</castFirstName>
<castLastName>De Niro</castLastName>
<castSSN>489-32-5984</castSSN>
<castGender>male</castGender>
</castMember>
<castMember>
<castMemberID>bs1</castMemberID>
<castFirstName>Ben</castFirstName>
<castLastName>Stiller</castLastName>
<castSSN>590-59-2774</castSSN>
<castGender>male</castGender>
</castMember>
<castMember>
<castMemberID>tp1</castMemberID>
<castFirstName>Teri</castFirstName>
<castLastName>Polo</castLastName>
<castSSN>099-37-8765</castSSN>
<castGender>female</castGender>
</castMember>
<castMember>
<castMemberID>bd1</castMemberID>
<castFirstName>Blythe</castFirstName>
<castLastName>Danner</castLastName>
<castSSN>273-44-8690</castSSN>
<castGender>female</castGender>
</castMember>
<castMember>
<castMemberID>wf1</castMemberID>
<castFirstName>Will</castFirstName>
<castLastName>Ferrell</castLastName>
<castSSN>383-56-2095</castSSN>
<castGender>male</castGender>
</castMember>
<castMember>
<castMemberID>jc1</castMemberID>
<castFirstName>James</castFirstName>
<castLastName>Caan</castLastName>
<castSSN>389-49-3029</castSSN>
<castGender>male</castGender>
</castMember>
<castMember>
<castMemberID>zd1</castMemberID>
<castFirstName>Zooey</castFirstName>
<castLastName>Deschanel</castLastName>
<castSSN>309-49-4005</castSSN>
<castGender>female</castGender>
</castMember>
<castMember>
<castMemberID>ms1</castMemberID>
<castFirstName>Mary</castFirstName>
<castLastName>Steenburgen</castLastName>
<castSSN>988-43-4950</castSSN>
<castGender>female</castGender>
</castMember>
</movieCollection>
Exhibit 9.4: movies_xpath.xml
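In this document each role points to a cast member through its roleIDREF value, so a typical task is to follow that reference with a predicate such as castMember[castMemberID='wf1']. The following Python sketch does this with the standard xml.etree.ElementTree module on a shortened copy of the exhibit. The function name cast_list is made up for the example, and the sketch assumes that titles and IDs contain no apostrophes.

```python
import xml.etree.ElementTree as ET

collection = ET.fromstring("""
<movieCollection>
  <movie>
    <movieTitle>Elf</movieTitle>
    <role><roleIDREF>wf1</roleIDREF><roleType>Lead Actor</roleType></role>
    <role><roleIDREF>zd1</roleIDREF><roleType>Lead Actress</roleType></role>
  </movie>
  <castMember>
    <castMemberID>wf1</castMemberID>
    <castFirstName>Will</castFirstName>
    <castLastName>Ferrell</castLastName>
  </castMember>
  <castMember>
    <castMemberID>zd1</castMemberID>
    <castFirstName>Zooey</castFirstName>
    <castLastName>Deschanel</castLastName>
  </castMember>
</movieCollection>""")

def cast_list(collection, title):
    """Return 'role: first last' strings for the movie with this title."""
    movie = collection.find("movie[movieTitle='%s']" % title)
    result = []
    for role in movie.findall("role"):
        member_id = role.findtext("roleIDREF")
        # Follow the reference: pick the castMember whose ID matches.
        member = collection.find("castMember[castMemberID='%s']" % member_id)
        result.append("%s: %s %s" % (role.findtext("roleType"),
                                     member.findtext("castFirstName"),
                                     member.findtext("castLastName")))
    return result

for line in cast_list(collection, "Elf"):
    print(line)
```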
Below is the second XML document (Exhibit 9.5)
<?xml version="1.0" encoding="UTF-8"?>
<cities xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="cities.xsd">
<city>
<cityID>c2</cityID>
<cityName>Mandal</cityName>
<cityPopulation>13840</cityPopulation>
<cityCountry>Norway</cityCountry>
<tourismDescription>A small town with a big atmosphere. Mandal provides comfort
away from normal luxuries.
</tourismDescription>
<capitalCity>c3</capitalCity>
</city>
<city>
<cityID>c3</cityID>
<cityName>Oslo</cityName>
<cityPopulation>533050</cityPopulation>
<cityCountry>Norway</cityCountry>
<tourismDescription>Oslo is the capital of Norway for many reasons.
It is also the capital location for tourism. The culture, shopping,
and attractions can all be experienced in Oslo. Just remember
to bring your wallet.
</tourismDescription>
</city>
</cities>
Exhibit 9.5: cities_xpath.xml
Below is the Movies schema (Exhibit 9.6)
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
<!--Movie Collection-->
<xsd:element name="movieCollection">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="movie" type="movieDetails" minOccurs="1" maxOccurs="unbounded"/>
<xsd:element name="castMember" type="castDetails" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!--This contains the movie details.-->
<xsd:complexType name="movieDetails">
<xsd:sequence>
<xsd:element name="movieTitle" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
<xsd:element name="movieSynopsis" type="xsd:string"/>
<xsd:element name="role" type="roleDetails" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<!--This contains the role details.-->
<xsd:complexType name="roleDetails">
<xsd:sequence>
<xsd:element name="roleIDREF" type="xsd:IDREF"/>
<xsd:element name="roleType" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:simpleType name="ssnType">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="castDetails">
<xsd:sequence>
<xsd:element name="castMemberID" type="xsd:ID"/>
<xsd:element name="castFirstName" type="xsd:string"/>
<xsd:element name="castLastName" type="xsd:string"/>
<xsd:element name="castSSN" type="ssnType"/>
<xsd:element name="castGender" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Exhibit 9.6: movies.xsd
Below is the Cities schema (Exhibit 9.7)
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
attributeFormDefault="unqualified">
<xsd:element name="cities">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="cityType">
<xsd:sequence>
<xsd:element name="cityID" type="xsd:ID"/>
<xsd:element name="cityName" type="xsd:string"/>
<xsd:element name="cityPopulation" type="xsd:integer"/>
<xsd:element name="cityCountry" type="xsd:string"/>
<xsd:element name="tourismDescription" type="xsd:string"/>
<xsd:element name="capitalCity" type="xsd:IDREF" minOccurs="0" maxOccurs="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Exhibit 9.7: cities.xsd
Below is the XSL stylesheet (Exhibit 9.8)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="castList" match="castMember" use="castMemberID"/>
<xsl:output method="html"/>
<!-- example of using an abbreviated absolute path to pull info
from cities_xpath.xml for the city "Oslo" specifically -->
<!-- specify absolute path to select cityName and assign it the variable "city" -->
<xsl:variable name="city" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/cityName" />
<!-- specify absolute path to select cityCountry and assign it the variable "country" -->
<xsl:variable name="country" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/cityCountry" />
<!-- specify absolute path to select tourismDescription and assign it the variable "description" -->
<xsl:variable name="description" select="document('cities_xpath.xml')
/cities/city[cityName='Oslo']/tourismDescription" />
<xsl:template match="/">
<html>
<head>
<title>Movie Collection</title>
</head>
<body>
<h2>Movie Collection</h2>
<xsl:apply-templates select="movieCollection"/>
</body>
</html>
</xsl:template>
<xsl:template match="movieCollection">
<!-- let's say we just want to see the actors. -->
<!--
<xsl:for-each select="movie">
<hr />
<br />
<b><xsl:text>Movie Title: </xsl:text></b>
<xsl:value-of select="movieTitle"/>
<br />
<br />
<b><xsl:text>Movie Synopsis: </xsl:text></b>
<xsl:value-of select="movieSynopsis"/>
<br />
<br />-->
<!-- actor info begins here. -->
<b><xsl:text>Cast: </xsl:text></b>
<br />
<!-- specify an abbreviated relative path here for "role."
NOTE: there is no predicate in this one; it's just a path. -->
<xsl:for-each select="movie/role">
<xsl:sort select="key('castList',roleIDREF)/castLastName"/>
<xsl:number value="position()" format="&#xa; 0. " />
<xsl:value-of select="key('castList',roleIDREF)/castFirstName"/>
<xsl:text> </xsl:text>
<xsl:value-of select="key('castList',roleIDREF)/castLastName"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="roleType"/>
<br />
<xsl:value-of select="key('castList',roleIDREF)/castGender"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="key('castList',roleIDREF)/castSSN"/>
<br />
<br />
</xsl:for-each>
<!--
</xsl:for-each>-->
<hr />
<!--calling the variables -->
<span style="color:red;">
<p><b>Travel Advertisement</b></p>
<!-- reference the city, followed by a comma, and then the country -->
<p><xsl:value-of select="$city" />, <xsl:value-of select="$country" /></p>
<!-- reference the description -->
<xsl:value-of select="$description" />
</span>
</xsl:template>
</xsl:stylesheet>
Exhibit 9.8: movies.xsl
Summary
Throughout the chapter we have learned many of the features and capabilities of the XML Path Language. You should now have a good understanding of node relationships through the use of the XML tree structure. A location path can be written in either Abbreviated or Unabbreviated syntax; the two forms select the same nodes, and the abbreviated form simply omits the explicit axis names. A location path is also either Relative or Absolute. A Relative path starts from the context node, that is, the node currently being processed, while an Absolute path starts from the root of the document. Combining these concepts gives four types of XPath location paths: Abbreviated Relative, Abbreviated Absolute, Unabbreviated Relative, and Unabbreviated Absolute. If further filtering is required, XPath predicates and functions can be used. A predicate, written in square brackets, narrows a search down to only the nodes for which it evaluates to true, and functions such as count() can be used inside it. When used correctly, XPath can be a very powerful tool in the XML language.
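As a minimal sketch, the four kinds of location path can be written against the movie collection document (Exhibit 9.4) as shown here. The stylesheet itself is illustrative and not part of the chapter's exhibits; it assumes only that movieCollection is the root element and that movie and castMember are its children, as in that exhibit.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- illustrative template: movieCollection becomes the context node -->
  <xsl:template match="/movieCollection">
    <!-- abbreviated relative: starts at the context node -->
    <xsl:value-of select="count(movie/role)"/>
    <xsl:text>&#xa;</xsl:text>
    <!-- unabbreviated relative: same path with explicit child axes -->
    <xsl:value-of select="count(child::movie/child::role)"/>
    <xsl:text>&#xa;</xsl:text>
    <!-- abbreviated absolute: starts at the document root, with a predicate -->
    <xsl:value-of select="count(/movieCollection/castMember[castGender='female'])"/>
    <xsl:text>&#xa;</xsl:text>
    <!-- unabbreviated absolute: same path with explicit child axes -->
    <xsl:value-of select="count(/child::movieCollection/child::castMember[child::castGender='female'])"/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first two expressions should report the same count, as should the last two, since abbreviated and unabbreviated syntax differ only in notation.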
XLink
Introduction
Through the use of Uniform Resource Identifiers (URIs), XLink allows elements inserted into XML documents to create links between resources such as documents, images, files and other pages. An XLink is similar in concept to an HTML hyperlink, but is more powerful and flexible.
This chapter gives a general overview of the XLink syntax and introduces some of XLink's basic concepts. For the full specification, see the latest version of the W3C XLink standard.
XLink
XLinks create a linking relationship between two or more resources. Any XML element, image, text file or markup file can be specified as a resource in the link.
By using a method similar to the centralized formatting of XSL stylesheets, XLinks allow a document's hyperlinks to be isolated and centralized in a separate document. When a linked document's address changes, the XLink remains functional.
The use of XLink requires the declaration of the XLink namespace. This namespace provides the global attributes type, href, role, arcrole, title, show, actuate, label, from and to. The following example makes the prefix xlink available within the tourGuide element.
<tourGuide
xmlns:xlink="http://www.w3.org/1999/xlink">
...
</tourGuide>
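Once the prefix is declared, the global attributes can be placed on any element inside tourGuide. The fragment below is a sketch of a simple link only: the attraction and attractionName element names, the example.com address and the link text are assumptions made for illustration, and the fragment is not meant to validate against any schema in this chapter.

```xml
<tourGuide
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <city>
    <cityName>Oslo</cityName>
    <!-- hypothetical element names; only the xlink: attributes come from XLink -->
    <attraction>
      <attractionName xlink:type="simple"
          xlink:href="http://www.example.com/attractions/sample"
          xlink:title="Sample attraction page">Sample Attraction</attractionName>
    </attraction>
  </city>
</tourGuide>
```

Here xlink:type="simple" marks attractionName as a one-way link, much like an HTML hyperlink, and xlink:href gives the URI of the resource it points to.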
XLink global attributes
The following table outlines the global attributes of the XLink namespace (type, href, role, arcrole, title, show, actuate, label, from, and to) and describes how each can be used.
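Exhibit 1: Table of global attributes
Attribute | Description and valid values
type | Identifies the kind of XLink element: simple, extended, locator, arc, resource, title, or none
href | Location of the resource, given as a URI reference
role | Describes the meaning of the link or resource; the value is a URI reference
arcrole | Describes the meaning of an arc, that is, the relationship between its starting and ending resources; the value is a URI reference
title | Name displayed, usually a short description of the link
show | Describes the behavior of the browser once the XLink has been actuated and loaded: new, replace, embed, other, or none
actuate | Specifies when the resource is retrieved or link processing occurs: onLoad, onRequest, other, or none
label, from & to | Specify link direction: label names a resource or locator, and from and to on an arc refer to those labels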
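To show how several of these attributes fit together, here is a sketch of an extended link that connects two remote resources with one arc. The element names (attractionLinks, place, site, go), the file cities.xml and the example.com addresses are hypothetical; only the xlink: attributes and their values come from XLink.

```xml
<attractionLinks xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="extended">
  <!-- locators point at remote resources and give each one a label -->
  <place xlink:type="locator"
      xlink:href="cities.xml#oslo"
      xlink:label="city"/>
  <site xlink:type="locator"
      xlink:href="http://www.example.com/attractions/sample"
      xlink:label="attraction"
      xlink:title="Sample attraction"/>
  <!-- the arc sets the direction: from the city to the attraction -->
  <go xlink:type="arc"
      xlink:from="city"
      xlink:to="attraction"
      xlink:arcrole="http://www.example.com/linkprops/has-attraction"
      xlink:show="new"
      xlink:actuate="onRequest"/>
</attractionLinks>
```

Because the locators and the arc live in their own element, links like this can be kept in a separate document from the resources they connect, which is the centralization described earlier in the chapter.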
XML schema
The following XML schema defines a tour guide that contains at least one city. Each city contains one or more attractions. The name of each attraction is an XLink.
Exhibit 2: XML schema for TourGuide
<?xml version="1.0" encoding="UTF-8"?>
<!--
Document : TourGuide.xsd
Created on : February 28, 2006
Author : Billy Timmins
-->
<!--
Declaration of usage of xlink Namespace
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified"
xmlns:xlink="http://www.w3.org/1999/xlink">
<xsd:element name="tourGuide">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="city" type="cityDetails" minOccurs="1" maxOccurs="unbounded" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!--
This section will contain the City details
-->
<xsd:complexType name="cityDetails">
<xsd:sequence>
<xsd:element name="cityName" type="xsd:string"/>
<xsd:element name="adminUnit" type="xsd:string"/>
<xsd:element name="country" type="xsd:string"/>
<xsd:element name="continent">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Asia"/>
<xsd:enumeration value="Africa"/>
<xsd:enumeration value="Australia"/>
<xsd:enumeration value="Europe"/>
<xsd:enumeration value="North America"/>
<xsd:enumeration value="South America"/>
<xsd:enumeration value="Antarctica"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="population" type="xsd:integer"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element