XML - Managing Data Exchange/Basic data structures

From Wikibooks, open books for an open world

Learning objectives
  • introduce the concept and uses of basic data structures
  • describe how XML may be used to represent basic data structures
  • enumerate common technical considerations

Introduction

In reviewing the four central problems in data management (capture, storage, retrieval, and exchange), the typical user of XML encounters recurring structural patterns that apply to all sorts of data throughout the storage and exchange phases. These patterns recur because their use transcends the particular contexts in which the underlying data are processed. We call these patterns "data structures" (or datatypes).

In this section, we discuss a few of the most fundamental "basic data structures" and explain why they are useful, as well as how to work with them using XML.

We start our introduction with a simple example. Consider an ordinary grocery shopping list for a single-person household.

Introductory Shopping List Example:

   Andy's shopping list:
   * eggs
   * cough syrup (pick up for granny)
   * orange juice
   * bread
   * laundry detergent **don't forget this**

When analyzing aspects of the information contained in this shopping list, we can make some basic generalizations:

  • Portability: the shopping list can be represented and transferred easily. If necessary, it could be stored in a database and processed by custom-designed software, but it could just as easily be written on a scrap of paper;
  • Comprehensibility: the shopping list is readily understood by its intended audience (in this instance, the sole person who wrote the list) and therefore needs no additional information or structure in order to be immediately usable;
  • Adaptability: if any changes become necessary (such as additions to or removals from the list), there is an existing and well-known method for making them (e.g., in the case of a handwritten list, simply write down new entries or cross out unwanted ones).

The fundamental concept of basic data structures

With the previous example as background, we can now introduce the fundamental concept of "basic data structures".

The concept of "basic data structures" describes the fundamental conventions we use to store our data, so that we can more easily exchange our data. When we follow these fundamental conventions, we help to ensure the portability, comprehensibility and adaptability of information.

Basic data structures defined

Now that we have introduced our concept of data structures, we can start with some concrete definitions, and then review those definitions in the context of our shopping list example.

Overview of "core" data structures

The following terms define some "core" data structures[1] that we use throughout this chapter. The list is in ascending order of complexity:

  • SimpleBoolean: Any value capable of being expressed as either "True" or "False".
  • SimpleString: A contiguous sequence of characters, which may be alphanumeric or non-alphanumeric.
  • SimpleSequence: An enumeration of items generally accessible by numeric indexing.
  • Name-value pair: A single arbitrary name attached to a single value.
  • SimpleDictionary: An enumeration of items generally accessible by alphanumeric indexing (that is, by name).
  • SimpleTable: An ordered arrangement of columns and rows. A SimpleTable can be classified as a "composite" data structure (e.g., a SimpleSequence in which each item is a single SimpleDictionary).

An important point to remember while reviewing these "core" data structures is that they are elemental and complementary: used in combination, the core structures can form more complex structures. Once we understand them, we can combine them to represent a very wide range of structured information, and many applications and data specifications can be described in XML using nothing more than these "core" data structures.

Now review the "Introductory Shopping List Example" above. When we compare it with the "core" data structures that we've just defined, we can make some fairly straightforward observations:

  • The entire shopping list cannot be represented using a SimpleBoolean data structure, because the information is more complex than either "True" or "False".
  • The entire shopping list can be represented using a SimpleString.
  • There may be reasons why we would not want to use a SimpleString to represent the entire shopping list. For example, we might want to transfer the list into a database or other software application and then be able to sort, query, duplicate or otherwise process individual items on the list. Treating the entire list as a SimpleString would therefore complicate our processing requirements.
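
The last observation can be made concrete with a minimal sketch. Python is our choice for illustration here, and the variable names are hypothetical. The sketch holds the list first as one SimpleString, then as a SimpleSequence, to show the extra step the string form requires before individual items can be processed.

```python
# The whole list held as one SimpleString (items separated by newlines).
list_as_string = "eggs\ncough syrup\norange juice\nbread\nlaundry detergent"

# Before we can sort or query individual items, we must decide how the
# string is delimited and split it ourselves.
list_as_sequence = list_as_string.split("\n")

# Once the items are separate, ordinary sequence operations apply directly.
sorted_items = sorted(list_as_sequence)
has_bread = "bread" in list_as_sequence
```

The split step is where the complication lies: whoever reads the string must know, or guess, which delimiter the writer used.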

SimpleString

Different ways to represent a SimpleString in XML:

    <String note="This XML attribute contains a SimpleString.">
    This XML Text Node represents a SimpleString.

    <!-- This XML comment contains a SimpleString -->
    <![CDATA[ This XML CDATA section contains a SimpleString. ]]>
    </String>
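
As a rough illustration of how a consumer sees these four forms, the following sketch reads them back with Python's standard xml.etree.ElementTree module. The closing String tag and the variable names are our assumptions, added so that the fragment parses as a complete document.

```python
import xml.etree.ElementTree as ET

# The fragment from the text, with a closing tag added (an assumption)
# so that it forms a complete, well-formed document.
xml_text = (
    '<String note="This XML attribute contains a SimpleString.">'
    "This XML Text Node represents a SimpleString."
    "<!-- This XML comment contains a SimpleString -->"
    "<![CDATA[ This XML CDATA section contains a SimpleString. ]]>"
    "</String>"
)

root = ET.fromstring(xml_text)

# The attribute is read by name.
attribute_value = root.get("note")

# The default parser drops the comment and folds the CDATA section into
# the ordinary character data of the element.
text_value = root.text
```

With this parser, the text node and the CDATA section are indistinguishable to the application, and the comment is not delivered at all. A string that must survive processing is therefore better kept out of comments.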

SimpleSequence

Different ways to represent a SimpleSequence in XML:

    <!-- use a single XML attribute with a space-delimited list of items -->
    <ShoppingList items="bread eggs milk juice" />

    <!-- use a single XML attribute with a semicolon-delimited list of items 
         (this allows us to add items with spaces in them) -->
    <ShoppingList items="bread;cough syrup;milk;juice;laundry detergent"  />

    <!-- yet another way (but not necessarily a good way) 
         using multiple XML attributes -->
    <ShoppingList item00="bread" item01="eggs" item02="cough syrup" />

    <!-- yet another way
         using XML child elements -->
    <ShoppingList>
        <item>eggs</item><item>milk</item><item>cough syrup</item>
    </ShoppingList>
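
To compare what each representation costs the consumer, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The variable names are hypothetical, and the enclosing ShoppingList element around the child items is our assumption, added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

space_delimited = ET.fromstring('<ShoppingList items="bread eggs milk juice" />')
semicolon_delimited = ET.fromstring(
    '<ShoppingList items="bread;cough syrup;milk;juice;laundry detergent" />'
)
child_elements = ET.fromstring(
    "<ShoppingList>"
    "<item>eggs</item><item>milk</item><item>cough syrup</item>"
    "</ShoppingList>"
)

# Delimited attributes: the consumer must know the delimiter and split.
items_from_spaces = space_delimited.get("items").split()
items_from_semicolons = semicolon_delimited.get("items").split(";")

# Child elements: the parser already delivers the items one by one, in order.
items_from_children = [item.text for item in child_elements.findall("item")]
```

The numbered-attribute form is left out of the sketch. XML attaches no meaning to the order of attributes, so a consumer would have to sort the attribute names to recover the sequence.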

Name-value pair
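
As a minimal sketch, assuming only the definition from the overview (a single name attached to a single value), a name-value pair can be written in XML either as an attribute or as an element. The names getfor and granny are borrowed from the side-by-side examples later in this chapter; the use of Python's standard xml.etree.ElementTree module and the variable names are our choices for illustration.

```python
import xml.etree.ElementTree as ET

# Form 1: the attribute name is the name, the attribute value is the value.
as_attribute = ET.fromstring('<item getfor="granny" />')
pair_from_attribute = ("getfor", as_attribute.get("getfor"))

# Form 2: the element tag is the name, the element text is the value.
as_element = ET.fromstring("<getfor>granny</getfor>")
pair_from_element = (as_element.tag, as_element.text)
```

Both forms carry the same pair. The attribute form is more compact, while the element form leaves room for the value to grow into nested structure later.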

SimpleDictionary
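
One way to picture a SimpleDictionary, under the assumption that it is simply a collection of name-value pairs looked up by name, is a single XML element whose attributes or child elements each hold one pair. The sketch below reads both forms into a Python dict using the standard xml.etree.ElementTree module. The row content is taken from the side-by-side examples later in this chapter, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute form: each attribute is one name-value pair.
attr_form = ET.fromstring(
    '<tr item="cough syrup" getfor="granny" notes="try to get grape flavor" />'
)
dict_from_attributes = dict(attr_form.attrib)

# Element form: each child element is one name-value pair.
elem_form = ET.fromstring(
    "<tr><item>cough syrup</item><getfor>granny</getfor>"
    "<notes>try to get grape flavor</notes></tr>"
)
dict_from_elements = {child.tag: child.text for child in elem_form}
```

Either way, a value is retrieved by its name, for example dict_from_attributes["getfor"], rather than by its position.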

SimpleTable

Side-by-side examples

SimpleTable (XML_Elem):

    <tr><item>laundry detergent</item><getfor>andy</getfor><notes></notes></tr>
    <tr><item>cough syrup</item><getfor>granny</getfor><notes>try to get grape flavor</notes></tr>

SimpleTable (XML_Attr):

    <tr item="eggs"              getfor="andy"   notes="" />
    <tr item="milk"              getfor="andy"   notes="" />
    <tr item="laundry detergent" getfor="andy"   notes="" />
    <tr item="cough syrup"       getfor="granny" notes="try to get grape flavor" />

SimpleTable (XML_Mixed):

    <item getfor="andy">eggs</item><notes></notes>
    <item getfor="andy">milk</item><notes></notes>
    <item getfor="andy">laundry detergent</item><notes></notes>
    <item getfor="granny">cough syrup</item><notes>try to get grape flavor</notes>
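
To check that the element and attribute forms really describe the same table, the following sketch loads two rows of each into a sequence of dictionaries, which is the composite structure given in the definition of SimpleTable. It uses Python's standard xml.etree.ElementTree module. The enclosing table element and the variable names are our assumptions; the rows come from the examples above.

```python
import xml.etree.ElementTree as ET

elem_table = ET.fromstring(
    "<table>"
    "<tr><item>laundry detergent</item><getfor>andy</getfor><notes></notes></tr>"
    "<tr><item>cough syrup</item><getfor>granny</getfor>"
    "<notes>try to get grape flavor</notes></tr>"
    "</table>"
)
attr_table = ET.fromstring(
    "<table>"
    '<tr item="laundry detergent" getfor="andy" notes="" />'
    '<tr item="cough syrup" getfor="granny" notes="try to get grape flavor" />'
    "</table>"
)

# An empty element has no text (None), whereas notes="" is an empty string,
# so the element form is normalised to "" before the two are compared.
rows_from_elements = [
    {cell.tag: cell.text or "" for cell in row} for row in elem_table.findall("tr")
]
rows_from_attributes = [dict(row.attrib) for row in attr_table.findall("tr")]

same_table = rows_from_elements == rows_from_attributes
```

The mixed form is not parsed here. As written, it ties each notes element to its item only by position, so a consumer would need an extra pairing rule to rebuild the rows.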

Basic data structures in programming

To further illustrate how basic data structures apply in many different contexts, some of the basic data structures enumerated previously are examined and compared here in the context of computer programming.

First, we compare the generic terminology with the terms commonly used in programming languages:

  • SimpleBoolean: commonly called a boolean; depending on the language, it can take the values true or false, 0 or 1, or other values.
  • SimpleString: commonly called a string (or, in some languages, a string buffer).
  • SimpleSequence: commonly represented with an array, whose items are numerically indexed.
  • Name-value pair: commonly called a key-value pair.
  • SimpleDictionary: commonly represented with a dictionary or an associative array.
  • SimpleTable: commonly represented as a sequence of dictionaries (or records), one per row.
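
As one concrete instance of this comparison, the sketch below writes each core structure with a built-in Python type. Python is only one possible choice, other languages use different names for the same ideas, and the variable names are hypothetical.

```python
# SimpleBoolean: a bool.
forget_detergent = False

# SimpleString: a str.
note = "pick up for granny"

# SimpleSequence: a list, indexed by position starting at 0.
items = ["eggs", "cough syrup", "orange juice"]

# Name-value pair: a two-item tuple of (name, value).
pair = ("getfor", "granny")

# SimpleDictionary: a dict, indexed by name.
entry = {"item": "cough syrup", "getfor": "granny"}

# SimpleTable: a list of dicts, one dict per row.
table = [
    {"item": "eggs", "getfor": "andy"},
    {"item": "cough syrup", "getfor": "granny"},
]

# The structures compose: a dict can be built from a sequence of pairs.
entry_from_pairs = dict([("item", "cough syrup"), pair])
```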

Technical considerations

Now that we've introduced specific examples of the basic data structures, a few technical considerations remain that apply to all of them. These are particularly important to those responsible for designing and implementing XML schemas for specific scenarios.

  • Exact terminology depends on context: Although the "basic" structures described here apply to many different scenarios, the terms used to describe them can overlap or conflict. For example, the term "SimpleSequence" as used here closely coincides with what is called an "array" in many programming languages. Similarly, the term "SimpleDictionary" is shorthand for what some programming languages call an "associative array". Although this close correlation is intentional, one must always remember that the specific nuances of an application or programming language will require additional attention. Sometimes minor conflicts or discrepancies arise when one digs into the details for any specific data structure in any given project or technology.
  • Basic structures are flexible concepts: Structures can be defined in terms of one another, and some structures can be applied recursively. For example, one could easily define a SimpleSequence using a SimpleString along with some basic assumptions (e.g., a SimpleSequence is a string of alphanumeric characters where each item in the sequence is separated by one or more whitespace characters: "eggs bread butter milk").
  • Abstract structures tend to hide tricky details: For example, the term "SimpleString" describes the abstract notion of a sequence of characters (e.g., "ISBN 0-596-00327-7"). The abstract notion is fairly intuitive and uncomplicated. Nevertheless, the precise notation used to implement that notion and represent it in real, working code is a different matter entirely. Different programming languages and different environments may use different conventions for representing the same "string". Because of this variability, the abstract notion of a "SimpleString" in XML is also subject to differing representations, based on the needs of any given project.
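
The last two considerations can be sketched briefly in Python, using the standard xml.sax.saxutils helpers. The variable names are hypothetical, and the string "salt & pepper <coarse>" is an invented value, chosen only because it contains characters that are special in XML.

```python
from xml.sax.saxutils import escape, quoteattr

# A SimpleSequence defined in terms of a SimpleString: items are separated
# by one or more whitespace characters.
sequence = "eggs bread butter milk".split()

# One abstract string, two concrete notations.
isbn = "ISBN 0-596-00327-7"
python_literal = repr(isbn)      # notation for Python source code
xml_attribute = quoteattr(isbn)  # notation for an XML attribute value

# The details matter most when the string contains characters that the
# target notation reserves for itself.
raw = "salt & pepper <coarse>"
xml_text = escape(raw)           # notation for XML character data
```

The abstract string never changes. Only its written form does, and each target notation has its own quoting and escaping rules.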

Notes and references

  1. An important note: the basic terms used here are generalizations. Although they may coincide with terms used in specific software, specific programming languages, or specific applications, these are not intended as technically precise definitions. The concepts described here are presented to help emphasize the context-neutral principle of interoperability in XML.