XML - Managing Data Exchange/Data schemas

From Wikibooks, open books for an open world







Learning objectives

  • Overview of Data Schemas
  • Starting your schema the right way
  • Entities in general
  • The Parent / Child Relationship
  • Attributes and Restrictions
  • Ending your schema the right way

Initiated by:

The University of Georgia

Terry College of Business

Department of Management Information Systems


Introduction

Data schemas are the foundation of well-structured XML data. They define objects, their relationships, their attributes, and the structure of the data model. An XML document can be well-formed without a schema, but without one there is no way to check that it contains the right elements in the right places. In this chapter, you will come to understand the purpose of XML data schemas, their component parts, and how to use them. Examples are included for you to copy when creating your own data schema. The Appendix at the end of this chapter contains a complete Schema, parts of which appear in the sections throughout this chapter. Refer to it if you would like to see how the whole Schema works as one.

Overview of Data Schemas

The data schema, all technicalities aside, is the data model that governs how the XML information is conveyed. It has a hierarchical structure that starts with a root element (explained later) and works down to the smallest detail of the model. Data schemas have two main parts: the entities and their relationships. The entities in a data schema represent objects from the model. They have unique identifiers, attributes, and names for the kind of object they are. The relationships in the schema represent the relationships between those objects. Relationships can be one to one, one to many, many to many, recursive, or any other kind found in a data model. Now we will begin to create our own data schema.

Starting your schema the right way

All schemas begin the same way, no matter what type of objects they represent. The first line in a Schema is normally this declaration:

<?xml version="1.0" encoding="UTF-8"?>

Exhibit 1: XML Declaration

Exhibit 1 tells the browser, or whatever file or program is accessing this schema, that it is an XML file and that it uses the "UTF-8" character encoding. You can copy this line to start your own XML file. Next comes the Namespace declaration:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

Exhibit 2: Namespace Declaration

Namespaces are basically dictionaries containing definitions of most of the coding in the schema. For example, when creating a schema, if you declare an object to be of type "string", the definition of the type "string" is contained in the Namespace along with all of its attributes. This is true for most of the code you write. If you have made or seen other schemas, most of the code is prefaced by "xsd:". Good examples are "xsd:sequence" and "xsd:complexType". sequence and complexType are both objects defined in the Namespace that has been linked to the prefix "xsd". In fact, you could choose any prefix for this Namespace, as long as you use it the same way throughout the Schema. The Namespace that defines the XML Schema vocabulary is http://www.w3.org/2001/XMLSchema. Now on to Exhibit 2.

The first part, xsd:schema, lets any file or program know that this file is a schema. Like the XML declaration, this is universal to XML schemas and you can use it in yours. The second part is the actual Namespace declaration; xmlns stands for XML NameSpace. It binds the prefix "xsd" to the XML Schema Namespace, and it is usually the one given in the code. Again, using this code to start your Schemas is recommended. The last part, elementFormDefault, controls whether elements declared locally (inside a complexType) must be qualified with a Namespace in the XML document. With "unqualified" they are written without one, which is the most applicable setting until you get to some really complicated code.
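
Because the prefix is only a local label for the Namespace, the same opening can be written with a different prefix. The following is a minimal sketch, assuming the prefix "xs" and a hypothetical element named "title" that is not part of this chapter's example schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same Namespace as in Exhibit 2, bound to the prefix "xs" instead of "xsd" -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <!-- "title" is a hypothetical element used only for illustration -->
  <xs:element name="title" type="xs:string"/>
</xs:schema>
```

A schema processor should treat this the same as the "xsd:" version, since only the Namespace name itself identifies the vocabulary.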

Entities in general

Entities are basically the objects a Schema is created to represent. As stated before, they have attributes and relationships. We will now go much further into explaining exactly what they are and how to write code for them.

There are two types of Entities: simpleType and complexType. A simpleType object has one value associated with it. A string is a perfect example of a simpleType object, as it contains only the value of the string. Most simpleTypes you use will be defined in the XML Schema Namespace; however, you can define your own simpleType at the bottom of the Schema (this will be brought up in the restrictions section). Because of this, the objects you will most often need to include in your Schema are complexTypes. A complexType is an object with more than one value associated with it; it can have attributes, child elements, or both. Here is an example of a complexType object:

<xsd:complexType name="GenreType">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="description" type="xsd:string"/>
    <xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

Exhibit 3: The complexType Element

This code begins with the declaration of a complexType and its name. When other entities refer to it, such as a parent element, they refer to this name. The second line begins the sequence of attributes and child elements, which are all declared as an "element". Each line starts by declaring an element, followed by the "name" by which XML documents will refer to it. After the name comes the "type" declaration. Note that for the name and description elements the type is "xsd:string", showing that the type string is defined in the Namespace bound to "xsd". For the movie element, the type is "MovieType", and because there is no prefix before "MovieType", it is assumed that this type is defined in this Schema. (It could also refer to a type defined in another Schema that has been included at the top of this Schema; don't worry about that now.)

"minOccurs" and "maxOccurs" represent the relationship between Genres and Movies. "minOccurs" can be 0 or any positive number, depending only on the data model. "maxOccurs" can be 1 (a one to one relationship), a larger number, or "unbounded" (both one to many relationships). If either is omitted, its value defaults to 1.
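
The occurrence settings described above can be combined in several ways. The fragment below is a sketch only: "ExampleMovieType" and its child elements are hypothetical names chosen for illustration, and the fragment is assumed to sit inside an xsd:schema element that declares the "xsd" prefix as in Exhibit 2.

```xml
<xsd:complexType name="ExampleMovieType">
  <xsd:sequence>
    <!-- No occurrence attributes: exactly one (minOccurs and maxOccurs both default to 1) -->
    <xsd:element name="name" type="xsd:string"/>
    <!-- Optional: zero or one -->
    <xsd:element name="tagline" type="xsd:string" minOccurs="0"/>
    <!-- Bounded: at least one, at most three -->
    <xsd:element name="writer" type="xsd:string" minOccurs="1" maxOccurs="3"/>
    <!-- Optional and repeatable: zero or more -->
    <xsd:element name="award" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
```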

Each schema must have one root element. This entity contains every other entity underneath it in the hierarchy. For instance, when creating a schema to hold a list of movies, the root element would be something like MovieDatabase or MovieCollection: something that would logically contain all the other objects (genre, movie, actor, director, plotline, etc.). It always starts with this line of code: <xsd:element name="xxx">, showing that it is the root element, and then goes on as a normal complexType. All other objects begin with either simpleType or complexType. Here is sample code for a MovieDatabase root element:

<xsd:element name="MovieDatabase">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Genre" type="GenreType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Exhibit 4: The Root Element

This represents a MovieDatabase where the child element of MovieDatabase is a Genre. From there it goes on to movie, and so on. We will continue to use this example to help you better understand.

The Parent / Child Relationship

The Parent / Child Relationship is a key topic in Data Schemas. It represents the basic structure of the data model's hierarchy by clearly laying out the top down configuration. Look at this piece of code which shows how movies have actors associated with them:

<xsd:complexType name="MovieType">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>       
    <xsd:element name="actor" type="ActorType" minOccurs="1" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
     
<xsd:complexType name="ActorType">
  <xsd:sequence>
    <xsd:element name="lname" type="xsd:string"/>
    <xsd:element name="fname" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

Exhibit 5: The Parent/Child Relationship

Within each MovieType, there is an element named "actor" which is of type "ActorType". When the XML document is populated with information, the surrounding tags for actor will be <actor></actor> and not <ActorType></ActorType>. To keep your Schema free of errors, the type field in the Parent Element must always equal the name field in the declaration of the Child Element's complexType.
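
To make the element names versus type names concrete, here is a sketch of an XML document that should validate against a schema made up of only the declarations in Exhibits 3 through 5, wrapped in the opening from Exhibit 2. All data values are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<MovieDatabase>
  <Genre>
    <name>Comedy</name>
    <description>Films intended to make the audience laugh</description>
    <!-- The tag is "movie" (the element name), not "MovieType" (the type name) -->
    <movie>
      <name>Example Movie</name>
      <!-- "actor" may repeat because maxOccurs is "unbounded" -->
      <actor>
        <lname>Doe</lname>
        <fname>Jane</fname>
      </actor>
      <actor>
        <lname>Roe</lname>
        <fname>Richard</fname>
      </actor>
    </movie>
  </Genre>
</MovieDatabase>
```

Note that the full schema in the Appendix requires more child elements (rating, director, and so on), so this document matches only the shortened exhibits.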

Attributes and Restrictions

An attribute of an entity is a simpleType object in that it contains only one value. <xsd:element name="lname" type="xsd:string"/> is a good example of an attribute. It is declared as an element, has a name associated with it, and has a type declaration. (This is an attribute in the data-model sense; XML attributes, which appear inside a start tag, are declared with xsd:attribute instead.) Located in the appendix of this chapter is a long list of simpleTypes built into the XML Schema Namespace. Attributes are incredibly simple to use, until you try to restrict them.

In some cases, certain data must abide by a standard to maintain data integrity. An example of this would be a Social Security number or an email address. If you have a database of email addresses used to send mass emails, you need all of them to be valid addresses, or else you will get a flood of error messages each time you send out that mass email. To avoid this problem, you can take a known simpleType and add a restriction to it to better suit your needs. You can do this in two ways, but one is simpler and better to use in Data Schemas. You could edit the simpleType within its declaration in the Parent Element, but it gets messy, and if another Schema wants to use it, the code must be written again. The better way is to list a new type at the bottom of the Schema that restricts a previously known simpleType. Here is an example of this with an email address and a Social Security number:

<xsd:simpleType name="emailaddressType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="[^@]+@[^\.]+\..+"/>
  </xsd:restriction>
</xsd:simpleType>
   
<xsd:simpleType name="ssnType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>

Exhibit 6: Restriction on a simpleType

This was included in the Schema below the last Child Element and before the closing </xsd:schema>. Take the second definition as the example. Its first line declares the simpleType and gives it a name, "ssnType". You could name yours anything you want, as long as you reference it correctly throughout the Schema. By doing this, you can use this type anywhere in the Schema, or anywhere in another Schema, provided the references are correct. The second line lets the Schema know it is a restricted type and that its base is the string type defined in the XML Schema Namespace. Basically, this type is a string with a restriction on it, and the third line is the actual restriction. It can be one of many types of restrictions, which are listed in the Appendix of this chapter. This one happens to be of type "pattern". A "pattern" means that only a certain sequence of characters will be allowed in the XML document, and that sequence is defined in the value field. This particular one means three digits, a hyphen, two digits, a hyphen, and four digits. To learn more about how to use restrictions, see the W3Schools section on restrictions.
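
For comparison, the first approach mentioned earlier (restricting the type inside the element declaration itself) might look like the sketch below. It assumes the element is declared inside a complexType's sequence in a schema that binds the "xsd" prefix; the restriction has no name, so no other declaration can reuse it.

```xml
<!-- Anonymous restricted type: the element has no type attribute, -->
<!-- and the restriction is nested directly inside the declaration -->
<xsd:element name="ssn">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
```

With this form, <ssn>123-45-6789</ssn> would be accepted and <ssn>123456789</ssn> rejected, just as with the named ssnType, but every element that needs the same rule has to repeat the nested block.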

Not of little import: Introducing the <xsd:import> tag

The <xsd:import> tag is used to import a schema document and the namespace associated with the data types defined within the schema document. This allows an XML schema document to reference a type library using namespace names (prefixes). Let's take a closer look at a simple XML instance document for a store that uses these multiple namespace names:

<?xml version="1.0" encoding="UTF-8"?>
<store:SimpleStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.opentourism.org/xmltext/Store.xml http://www.opentourism.org/xmltext/SimpleStore.xsd"
  xmlns:store="http://www.opentourism.org/xmltext/Store.xml"
  xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema">
  <!-- Note the explicit namespace declarations: the prefix store represents
       data types defined in the http://www.opentourism.org/xmltext/Store.xml
       namespace, and the prefix MGR represents data types defined in the
       http://www.opentourism.org/xmltext/CoreSchema namespace.
       Also notice that there is no default namespace declaration, so every
       element carries a prefix that associates it with a namespace (we will
       see why this is necessary when we examine the schema document). -->
  <store:Store>
    <MGR:Name>
      <MGR:FirstName>Michael</MGR:FirstName>
      <MGR:MiddleNames>Jay</MGR:MiddleNames>
      <MGR:LastName>Fox</MGR:LastName>
    </MGR:Name>
    <store:StoreName>The Gap</store:StoreName>
    <store:StoreAddress>
      <store:Street>86 Nowhere Ave.</store:Street>
      <store:City>Los Angeles</store:City>
      <store:State>CA</store:State>
      <store:ZipCode>75309</store:ZipCode>
    </store:StoreAddress>
    <!-- More store information would go here. -->
  </store:Store>
  <!-- More stores would go here. -->
</store:SimpleStore>

Exhibit 7: XML Instance Document [1]


Let's look at the schema document and see how the <xsd:import> tag was used to import data types from a type library (external schema document).

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns="http://www.opentourism.org/xmltext/Store.xml"
  xmlns:MGR="http://www.opentourism.org/xmltext/CoreSchema"
  targetNamespace="http://www.opentourism.org/xmltext/Store.xml" elementFormDefault="qualified">
  <!-- The prefix MGR is bound to the following namespace name:
       http://www.opentourism.org/xmltext/CoreSchema
       The ManagerTypeLib.xsd schema document is imported by associating the
       schema with the http://www.opentourism.org/xmltext/CoreSchema
       namespace name, which was bound to the MGR prefix.
       The elementFormDefault attribute has the value 'qualified', indicating
       that an XML instance document must use namespace-qualified names for
       every element, including locally declared ones. -->
  <!-- The target namespace and default namespace are the same -->
  <xsd:import namespace="http://www.opentourism.org/xmltext/CoreSchema"
    schemaLocation="ManagerTypeLib.xsd"/>
  <xsd:element name="SimpleStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Store" type="StoreType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="StoreType">
    <xsd:sequence>
      <xsd:element ref="MGR:Name"/>
      <xsd:element name="StoreName" type="xsd:string"/>
      <xsd:element name="StoreAddress" type="StoreAddressType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="StoreAddressType">
    <xsd:sequence>
      <xsd:element name="Street" type="xsd:string"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="State" type="xsd:string"/>
      <xsd:element name="ZipCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Exhibit 8: XML Schema (http://www.opentourism.org/xmltext/SimpleStore.xsd)


Like the include tag and the redefine tag, the import tag is another means of incorporating any data types from an external schema document into another schema document and must occur before any element or attribute declarations. These mechanisms are important when XML schemas are modularized and type libraries are being maintained and used in multiple schema documents.
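
The imported type library itself is not shown in this chapter. As a rough sketch of what ManagerTypeLib.xsd could contain, assuming it declares a global Name element with the three children used in Exhibit 7 (the optional MiddleNames is an assumption, not something the source confirms):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical reconstruction of ManagerTypeLib.xsd; the real file may differ -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.opentourism.org/xmltext/CoreSchema"
  elementFormDefault="qualified">
  <!-- Declared globally so that other schemas can reference it with ref="MGR:Name" -->
  <xsd:element name="Name">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="FirstName" type="xsd:string"/>
        <!-- minOccurs="0" is an assumption made for this sketch -->
        <xsd:element name="MiddleNames" type="xsd:string" minOccurs="0"/>
        <xsd:element name="LastName" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

The targetNamespace here has to match the namespace attribute of the xsd:import element in Exhibit 8, which is how the importing schema ties the MGR prefix to these declarations.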

When the whole is greater than the sum of its parts: Schema Modularization

Now that we have covered all three methods of incorporating external XML schemas, let’s consider the importance of these mechanisms. As is typical with most programming code, redundancy is frowned upon; this is true for custom data type definitions as well. If a custom data type already exists that can be applied to an element in your schema document, does it not make sense to use this data type rather than create it again within your new schema document? Moreover, if you know that a single data type can be reused for several applications, should you not have a method for referencing that data type when you need it?

The idea behind modular schemas is to examine what your schema does, determine what data types are frequently used in one form or another and develop a type library. As your needs for more complex schemas increase you can continue to add to your library, reuse data types in your type library, and redefine those data types as needed. An example of this reuse would be a schema for customer information – different departments would use different schemas as they would need only partial customer information. However most, if not all, departments would need some specific customer information, like name and contact information, which could be incorporated in the individual departmental schema documents.

Schema modularization is a “best practice”. By maintaining a type library and reusing and redefining types in the type library, you can help ensure that your XML schema documents don't become overwhelming and difficult to read. Readability is important, because you may not be the only one using these schemas, and it is important that others can easily understand your schema documents.
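
As an illustration of the customer example, a departmental schema could pull shared types from a type library with xsd:include. This is a sketch under stated assumptions: the file CustomerTypes.xsd, the types CustomerNameType and ContactInfoType, and the BillingCustomer element are all hypothetical, and the library is assumed to have no target namespace, like the including schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">
  <!-- Hypothetical type library assumed to define CustomerNameType and ContactInfoType -->
  <xsd:include schemaLocation="CustomerTypes.xsd"/>

  <!-- Hypothetical billing-department view of a customer -->
  <xsd:element name="BillingCustomer">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Shared types reused from the library -->
        <xsd:element name="name" type="CustomerNameType"/>
        <xsd:element name="contact" type="ContactInfoType"/>
        <!-- Information only this department needs -->
        <xsd:element name="accountNumber" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Another department could include the same library and add its own elements, so the name and contact definitions are written only once.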

“Choose, but choose wisely…”: Schema alternatives

Thus far in this book we have only discussed XML schemas as defined by the World Wide Web Consortium (W3C). There are other methods of defining the data contained within an XML instance document, but we will mention only the two most popular and well-known alternatives: Document Type Definition (DTD) and RELAX NG schema.

We will cover DTDs in the next chapter. RELAX NG is a newer schema language with many of the same features as W3C XML Schema; RELAX NG also claims to be simpler and easier to learn, but this is very subjective. For more about RELAX NG, visit: http://www.relaxng.org/

Appendix

First is the full Schema used in the examples throughout this chapter:

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="unqualified">

  <xsd:element name="MovieDatabase">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Genre" type="GenreType" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="GenreType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="movie" type="MovieType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="MovieType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="rating" type="xsd:string"/>
      <xsd:element name="director" type="xsd:string"/>
      <xsd:element name="writer" type="xsd:string"/>
      <xsd:element name="year" type="xsd:int"/>
      <xsd:element name="tagline" type="xsd:string"/>
      <xsd:element name="actor" type="ActorType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ActorType">
    <xsd:sequence>
      <xsd:element name="lname" type="xsd:string"/>
      <xsd:element name="fname" type="xsd:string"/>
      <xsd:element name="gender" type="xsd:string"/>
      <xsd:element name="bday" type="xsd:string"/>
      <xsd:element name="birthplace" type="xsd:string"/>
      <xsd:element name="ssn" type="ssnType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:simpleType name="ssnType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-\d{2}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>

It’s time to go back to the beginning…and review all of the schema data types, elements, and attributes that we have covered thus far (and maybe a few that we have not). The following tables will detail the XML data types, elements and attributes that can be used in an XML Schema.

Primitive Types

This table lists the built-in simple types (primitive types and types derived from them) that the attributes in your schema can be.

| Type | Syntax | Legal value example | Constraining facets |
| --- | --- | --- | --- |
| xsd:anyURI | <xsd:element name="url" type="xsd:anyURI"/> | http://www.w3.com | length, minLength, maxLength, pattern, enumeration, whiteSpace |
| xsd:boolean | <xsd:element name="hasChildren" type="xsd:boolean"/> | true or false or 1 or 0 | pattern, whiteSpace |
| xsd:byte | <xsd:element name="stdDev" type="xsd:byte"/> | -128 through 127 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits |
| xsd:date | <xsd:element name="dateEst" type="xsd:date"/> | 2004-03-15 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:dateTime | <xsd:element name="xMas" type="xsd:dateTime"/> | 2003-12-25T08:30:00 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:decimal | <xsd:element name="pi" type="xsd:decimal"/> | 3.1415927 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, fractionDigits, totalDigits |
| xsd:double | <xsd:element name="pi" type="xsd:double"/> | 3.1415927 or INF or NaN | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:duration | <xsd:element name="MITDuration" type="xsd:duration"/> | P8M3DT7H33M2S | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:float | <xsd:element name="pi" type="xsd:float"/> | 3.1415927 or INF or NaN | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:gDay | <xsd:element name="dayOfMonth" type="xsd:gDay"/> | ---11 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:gMonth | <xsd:element name="monthOfYear" type="xsd:gMonth"/> | --02 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:gMonthDay | <xsd:element name="valentine" type="xsd:gMonthDay"/> | --02-14 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:gYear | <xsd:element name="year" type="xsd:gYear"/> | 1999 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:gYearMonth | <xsd:element name="birthday" type="xsd:gYearMonth"/> | 1972-08 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:ID | <xsd:attribute name="id" type="xsd:ID"/> | id-102 | length, minLength, maxLength, pattern, enumeration, whiteSpace |
| xsd:IDREF | <xsd:attribute name="version" type="xsd:IDREF"/> | id-102 | length, minLength, maxLength, pattern, enumeration, whiteSpace |
| xsd:IDREFS | <xsd:attribute name="versionList" type="xsd:IDREFS"/> | id-102 id-103 id-100 | length, minLength, maxLength, pattern, enumeration, whiteSpace |
| xsd:int | <xsd:element name="age" type="xsd:int"/> | 77 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits |
| xsd:integer | <xsd:element name="age" type="xsd:integer"/> | 77 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:long | <xsd:element name="channelNumber" type="xsd:long"/> | 214 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
| xsd:negativeInteger | <xsd:element name="belowZero" type="xsd:negativeInteger"/> | -123 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits |
| xsd:nonNegativeInteger | <xsd:element name="numOfchildren" type="xsd:nonNegativeInteger"/> | 2 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits |
| xsd:nonPositiveInteger | <xsd:element name="debit" type="xsd:nonPositiveInteger"/> | 0 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits |
| xsd:positiveInteger | <xsd:element name="credit" type="xsd:positiveInteger"/> | 500 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits |
| xsd:short | <xsd:element name="numOfpages" type="xsd:short"/> | 476 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace, totalDigits |
| xsd:string | <xsd:element name="name" type="xsd:string"/> | Joseph | length, minLength, maxLength, pattern, enumeration, whiteSpace |
| xsd:time | <xsd:element name="credit" type="xsd:time"/> | 13:02:00 | minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, enumeration, whiteSpace |
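
The xsd:ID rows above use xsd:attribute rather than xsd:element, a form the chapter's exhibits do not otherwise show. As a sketch of where such a declaration goes, here is a variation on ActorType (the "id" attribute and the use of xsd:date for bday are assumptions made for this illustration; the Appendix schema declares bday as a string and has no attributes):

```xml
<xsd:complexType name="ActorType">
  <xsd:sequence>
    <xsd:element name="lname" type="xsd:string"/>
    <xsd:element name="fname" type="xsd:string"/>
    <!-- xsd:date expects values such as 1961-06-09 -->
    <xsd:element name="bday" type="xsd:date"/>
  </xsd:sequence>
  <!-- Attribute declarations come after the sequence of child elements -->
  <xsd:attribute name="id" type="xsd:ID" use="required"/>
</xsd:complexType>
```

In an instance document the attribute appears in the start tag, for example <actor id="id-102">, and each xsd:ID value must be unique within the document.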

Schema Elements
( from http://www.w3schools.com/schema/schema_elements_ref.asp )

Here is a list of all the elements which can be included in your schemas.

| Element | Explanation |
| --- | --- |
| all | Specifies that the child elements can appear in any order. Each child element can occur 0 or 1 time |
| annotation | Specifies the top-level element for schema comments |
| any | Enables the author to extend the XML document with elements not specified by the schema |
| anyAttribute | Enables the author to extend the XML document with attributes not specified by the schema |
| appinfo | Specifies information to be used by the application (must go inside annotation) |
| attribute | Defines an attribute |
| attributeGroup | Defines an attribute group to be used in complex type definitions |
| choice | Allows only one of the elements contained in the <choice> declaration to be present within the containing element |
| complexContent | Defines extensions or restrictions on a complex type that contains mixed content or elements only |
| complexType | Defines a complex type element |
| documentation | Defines text comments in a schema (must go inside annotation) |
| element | Defines an element |
| extension | Extends an existing simpleType or complexType element |
| field | Specifies an XPath expression that specifies the value used to define an identity constraint |
| group | Defines a group of elements to be used in complex type definitions |
| import | Adds multiple schemas with different target namespaces to a document |
| include | Adds multiple schemas with the same target namespace to a document |
| key | Specifies an attribute or element value as a key (unique, non-nullable, and always present) within the containing element in an instance document |
| keyref | Specifies that an attribute or element value corresponds to those of the specified key or unique element |
| list | Defines a simple type element as a list of values |
| notation | Describes the format of non-XML data within an XML document |
| redefine | Redefines simple and complex types, groups, and attribute groups from an external schema |
| restriction | Defines restrictions on a simpleType, simpleContent, or a complexContent |
| schema | Defines the root element of a schema |
| selector | Specifies an XPath expression that selects a set of elements for an identity constraint |
| sequence | Specifies that the child elements must appear in a sequence. Each child element can occur from 0 to any number of times |
| simpleContent | Contains extensions or restrictions on a text-only complex type or on a simple type as content and contains no elements |
| simpleType | Defines a simple type and specifies the constraints and information about the values of attributes or text-only elements |
| union | Defines a simple type as a collection (union) of values from specified simple data types |
| unique | Defines that an element or an attribute value must be unique within the scope |

Schema Restrictions and Facets for data types
( from http://www.w3schools.com/schema/schema_elements_ref.asp )

Here is a list of all the types of restrictions which can be included in your schema.

| Constraint | Description |
| --- | --- |
| enumeration | Defines a list of acceptable values |
| fractionDigits | Specifies the maximum number of decimal places allowed. Must be equal to or greater than zero |
| length | Specifies the exact number of characters or list items allowed. Must be equal to or greater than zero |
| maxExclusive | Specifies the upper bounds for numeric values (the value must be less than this value) |
| maxInclusive | Specifies the upper bounds for numeric values (the value must be less than or equal to this value) |
| maxLength | Specifies the maximum number of characters or list items allowed. Must be equal to or greater than zero |
| minExclusive | Specifies the lower bounds for numeric values (the value must be greater than this value) |
| minInclusive | Specifies the lower bounds for numeric values (the value must be greater than or equal to this value) |
| minLength | Specifies the minimum number of characters or list items allowed. Must be equal to or greater than zero |
| pattern | Defines the exact sequence of characters that are acceptable |
| totalDigits | Specifies the maximum number of digits allowed. Must be greater than zero |
| whiteSpace | Specifies how white space (line feeds, tabs, spaces, and carriage returns) is handled |
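
Facets other than pattern follow the same restriction layout as ssnType. The two definitions below are a sketch, assuming hypothetical type names (ratingType, releaseYearType) and arbitrary illustrative values; they are not part of the chapter's example schema.

```xml
<!-- enumeration: only the listed values are accepted -->
<xsd:simpleType name="ratingType">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="G"/>
    <xsd:enumeration value="PG"/>
    <xsd:enumeration value="PG-13"/>
    <xsd:enumeration value="R"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- minInclusive / maxInclusive: a closed numeric range (bounds chosen arbitrarily) -->
<xsd:simpleType name="releaseYearType">
  <xsd:restriction base="xsd:int">
    <xsd:minInclusive value="1888"/>
    <xsd:maxInclusive value="2100"/>
  </xsd:restriction>
</xsd:simpleType>
```

An element would use them the same way the Appendix schema uses ssnType, for example <xsd:element name="rating" type="ratingType"/>.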

Regex

A special regular expression (regex) language can be used to construct a pattern. The regex language in XML Schema is based on Perl's regular expression language. The following are some common notations:

. (the period) for any character at all
\d for any digit
\D for any non-digit
\w for any word (alphanumeric) character
\W for any non-word character (i.e. -, +, =)
\s for any white space (including space, tab, newline, and return)
\S for any character that is not white space
x* to have zero or more x's
(xy)* to have zero or more xy's
x+ repetition of the x, at least once
x? to have one or zero x's
(xy)? to have one or no xy's
[abc] to include one of a group of values
[0-9] to include the range of values from 0 to 9
x{5} to have exactly 5 x's (in a row)
x{5,} to have at least 5 x's (in a row)
x{5,8} at least 5 but at most 8 x's (in a row)
(xyz){2} to have exactly 2 xyz's (in a row)

For example, the pattern for validating a Social Security Number is \d{3}-\d{2}-\d{4}

The schema code for emailAddressType is \w+\W*\w*@{1}\w+\W*\w+.\w+.*\w*

[\w+] at least one word (alphanumeric) character, e.g. answer
[\W*] followed by none, one or many non-word character(s), e.g. -
[\w*@{1}] followed by any number of (or no) word characters and exactly one at-sign, e.g. my@
[\w+] followed by at least one word character, e.g. mail
[\W*] followed by none, one or many non-word character(s), e.g. _
[\w+.] followed by at least one word character and any single character (the unescaped period matches any character, not only a literal period), e.g. please.
[\w+.*] followed by at least one word character and then any number of arbitrary characters, e.g. opentourism.
[\w*] finally followed by none, one or many word character(s), e.g. org

email-address: answer-my@mail_please.opentourism.org
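
The notations above can be combined into other patterns in the same way. As a sketch, assuming a hypothetical zipCodeType for United States ZIP codes (not part of this chapter's schemas):

```xml
<xsd:simpleType name="zipCodeType">
  <xsd:restriction base="xsd:string">
    <!-- \d{5}     exactly five digits -->
    <!-- (-\d{4})? optionally followed by a hyphen and exactly four digits -->
    <xsd:pattern value="\d{5}(-\d{4})?"/>
  </xsd:restriction>
</xsd:simpleType>
```

Because an XML Schema pattern must match the whole value, this should accept 75309 and 75309-1234 but reject 7530 or 75309-12.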

Instance Document Attributes
These attributes do NOT need to be declared within the schemas

| Attribute | Explanation | Example |
| --- | --- | --- |
| xsi:nil | Indicates that a certain element does not have a value or that the value is unknown. The element must be set to nillable inside the schema document: <xsd:element name="last_name" type="xsd:string" nillable="true"/> | <full_name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <first_name>Madonna</first_name> <last_name xsi:nil="true"/> </full_name> |
| xsi:noNamespaceSchemaLocation | Locates the schema for elements that are not in any namespace | <radio xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.opentourism.org/xmtext/radio.xsd"> <!-- radio stuff goes here --> </radio> |
| xsi:schemaLocation | Locates schemas for elements and attributes that are in a specified namespace. The value is a pair: the namespace name, then the location of its schema | <radio xmlns="http://www.opentourism.org/xmtext/NS/radio" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opentourism.org/xmtext/NS/radio http://www.opentourism.org/xmtext/radio.xsd"> <!-- radio stuff goes here --> </radio> |
| xsi:type | Can be used in instance documents to indicate the type of an element. | <height xsi:type="xsd:decimal">78.9</height> |


For more information on XML Schema structures, data types, and tools you can visit http://www.w3.org/XML/Schema.