XML - Managing Data Exchange/Basic data structures
In reviewing the four central problems in data management (capture, storage, retrieval, and exchange), the typical XML user encounters fundamental structural patterns that recur across all sorts of data throughout the storage and exchange phases. These patterns recur consistently because their use transcends the particular contexts in which the underlying data are processed. We call these patterns "data structures" (or datatypes).
In this section, we discuss a few of the most fundamental "basic data structures" and explain why they are useful, as well as how to work with them using XML.
We start our introduction with a simple example. Consider an ordinary grocery shopping list for a single-person household.
Introductory Shopping List Example:
Andy's shopping list:
* eggs
* cough syrup (pick up for granny)
* orange juice
* bread
* laundry detergent **don't forget this**
When analyzing aspects of the information contained in this shopping list, we can make some basic generalizations:
- Portability: the shopping list can be represented and transferred easily. If necessary, it could be stored in a database and processed by custom-designed software, but it could just as easily be written on a scrap of paper;
- Comprehensibility: the shopping list is readily understood by its intended audience (in this instance, the sole person who wrote the list) and therefore needs no additional information or structure in order to be immediately usable;
- Adaptability: if any changes become necessary (such as additions or removals to the list) there is an existing and well-known methodology for accomplishing this (e.g., in the case of a handwritten list, simply write down new entries or cross out unwanted entries).
The fundamental concept of basic data structures
With the previous example as background, we can now introduce the fundamental concept of "basic data structures".
Basic data structures defined
Now that we have introduced our concept of data structures, we can start with some concrete definitions, and then review those definitions in the context of our shopping list example.
Overview of "core" data structures
The following terms define some "core" data structures that we use throughout this chapter. The list is ordered by increasing complexity:
- SimpleBoolean: Any value capable of being expressed as either "True" or "False".
- SimpleString: A contiguous sequence of characters, including both alphanumeric and non-alphanumeric.
- SimpleSequence: An enumeration of items generally accessible by numeric indexing.
- Name-value pair: An arbitrary singular name attached to a singular value.
- SimpleDictionary: An enumeration of items generally accessible by alphanumeric indexing.
- SimpleTable: An ordered arrangement of columns and rows. A SimpleTable can be classified as a "composite" data structure (e.g., a SimpleSequence in which each item is a SimpleDictionary).
An important point to remember while reviewing these "core" data structures is that they are elemental and complementary: used in combination, they can form more complex structures. Once this is understood, it becomes apparent that any application or data specification can be described entirely in XML using nothing more than combinations of these "core" data structures.
Once we understand the "core" data structures, we can use them in combination to represent any conceivable kind of structured information.
Now review the "Introductory Shopping List Example" above. When we compare it with the "core" data structures that we've just defined, we can make some fairly straightforward observations:
- The entire shopping list cannot be represented using a SimpleBoolean data structure, because the information is more complex than either "True" or "False".
- The entire shopping list can be represented using a SimpleString.
- There may be reasons why we would not want to use a SimpleString to represent the entire shopping list. For example, we might want to transfer the list into a database or other software application and then be able to sort, query, duplicate or otherwise process individual items on the list. Treating the entire list as a SimpleString would therefore complicate our processing requirements.
Different ways to represent a SimpleString in XML:
<Example>
  <String note="This XML attribute contains a SimpleString.">
    This XML Text Node represents a SimpleString.
  </String>
  <!-- This XML comment contains a SimpleString -->
  <![CDATA[ This XML CDATA section contains a SimpleString. ]]>
</Example>
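To see how a parser actually treats each of these forms, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (our choice of tool; the chapter does not prescribe one). With the default parser, the attribute value and the text node come back as ordinary strings, the comment is discarded, and the CDATA content is merged into the surrounding character data, which in this document is the `tail` of the `String` element. The helper name `read_simple_strings` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The SimpleString example from above, spread over several lines.
EXAMPLE = """<Example>
  <String note="This XML attribute contains a SimpleString.">
    This XML Text Node represents a SimpleString.
  </String>
  <!-- This XML comment contains a SimpleString -->
  <![CDATA[ This XML CDATA section contains a SimpleString. ]]>
</Example>"""


def read_simple_strings(xml_text: str) -> dict:
    """Hypothetical helper: collect the SimpleStrings a default parse exposes."""
    root = ET.fromstring(xml_text)
    string_el = root.find("String")
    return {
        # Attribute values are returned as plain Python strings.
        "attribute": string_el.get("note"),
        # Text nodes keep their surrounding whitespace, so strip it.
        "text_node": string_el.text.strip(),
        # The default parser drops comments; CDATA content is reported as
        # ordinary character data, here the String element's tail.
        "cdata": (string_el.tail or "").strip(),
    }


for kind, value in read_simple_strings(EXAMPLE).items():
    print(f"{kind}: {value}")
```

Note that the comment's SimpleString is lost entirely under this default configuration, which is one practical reason not to rely on comments for data.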
Different ways to represent a SimpleSequence in XML:
<Example>
  <!-- use a single XML attribute with a space-delimited list of items -->
  <ShoppingList items="bread eggs milk juice" />
  <!-- use a single XML attribute with a semicolon-delimited list of items (this allows us to add items with spaces in them) -->
  <ShoppingList items="bread;cough syrup;milk;juice;laundry detergent" />
  <!-- yet another way (but not necessarily a good way) using multiple XML attributes -->
  <ShoppingList item00="bread" item01="eggs" item02="cough syrup" />
  <!-- yet another way using XML child elements -->
  <ShoppingList>
    <item>eggs</item>
    <item>milk</item>
    <item>cough syrup</item>
  </ShoppingList>
</Example>
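Each of these encodings needs its own parsing rule before the items can be processed as a sequence. The sketch below, assuming Python's standard `xml.etree.ElementTree`, reads all four variants into a Python list. The helper names are hypothetical, and the `itemNN` naming convention for the multiple-attribute form is taken from the example rather than from any standard.

```python
import xml.etree.ElementTree as ET

SPACE_DELIMITED = '<ShoppingList items="bread eggs milk juice" />'
SEMICOLON_DELIMITED = '<ShoppingList items="bread;cough syrup;milk;juice;laundry detergent" />'
NUMBERED_ATTRIBUTES = '<ShoppingList item00="bread" item01="eggs" item02="cough syrup" />'
CHILD_ELEMENTS = ('<ShoppingList><item>eggs</item><item>milk</item>'
                  '<item>cough syrup</item></ShoppingList>')


def from_space_delimited(xml_text):
    # str.split() with no argument splits on any run of whitespace,
    # so items themselves cannot contain spaces.
    return ET.fromstring(xml_text).get("items").split()


def from_semicolon_delimited(xml_text):
    # A semicolon delimiter lets items such as "cough syrup" contain spaces.
    return ET.fromstring(xml_text).get("items").split(";")


def from_numbered_attributes(xml_text):
    # Attribute order is not significant in XML, so recover the sequence
    # order from the zero-padded numeric suffix of each "itemNN" name.
    attrs = ET.fromstring(xml_text).attrib
    names = sorted(name for name in attrs if name.startswith("item"))
    return [attrs[name] for name in names]


def from_child_elements(xml_text):
    # Child elements keep document order, and each one holds one item.
    return [item.text for item in ET.fromstring(xml_text).findall("item")]


print(from_semicolon_delimited(SEMICOLON_DELIMITED))
```

The delimiter-based forms push parsing rules into the application, while the child-element form lets the XML parser itself delimit the items.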
Different ways to represent a SimpleTable in XML:
<table>
  <tr><item>eggs</item><getfor>andy</getfor><notes></notes></tr>
  <tr><item>milk</item><getfor>andy</getfor><notes></notes></tr>
  <tr><item>laundry detergent</item><getfor>andy</getfor><notes></notes></tr>
  <tr><item>cough syrup</item><getfor>granny</getfor><notes>try to get grape flavor</notes></tr>
</table>

<table>
  <tr item="eggs" getfor="andy" notes="" />
  <tr item="milk" getfor="andy" notes="" />
  <tr item="laundry detergent" getfor="andy" notes="" />
  <tr item="cough syrup" getfor="granny" notes="try to get grape flavor" />
</table>

<table>
  <tr> <item getfor="andy">eggs</item><notes></notes> </tr>
  <tr> <item getfor="andy">milk</item><notes></notes> </tr>
  <tr> <item getfor="andy">laundry detergent</item><notes></notes> </tr>
  <tr> <item getfor="granny">cough syrup</item><notes>try to get grape flavor</notes> </tr>
</table>
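Following the definition of a SimpleTable as a SimpleSequence of SimpleDictionary items, each of the three table layouts can be read into a Python list of dicts. This is a minimal sketch assuming Python's standard `xml.etree.ElementTree`, with one row per `tr` element and only two rows per table for brevity; the function names are hypothetical. It shows that all three markups carry the same table.

```python
import xml.etree.ElementTree as ET

CHILD_ELEMENT_TABLE = """<table>
  <tr><item>eggs</item><getfor>andy</getfor><notes></notes></tr>
  <tr><item>cough syrup</item><getfor>granny</getfor><notes>try to get grape flavor</notes></tr>
</table>"""

ATTRIBUTE_TABLE = """<table>
  <tr item="eggs" getfor="andy" notes="" />
  <tr item="cough syrup" getfor="granny" notes="try to get grape flavor" />
</table>"""

MIXED_TABLE = """<table>
  <tr><item getfor="andy">eggs</item><notes></notes></tr>
  <tr><item getfor="granny">cough syrup</item><notes>try to get grape flavor</notes></tr>
</table>"""


def rows_from_child_elements(xml_text):
    # Each column is a child element; an empty element has text None.
    return [{cell.tag: cell.text or "" for cell in tr}
            for tr in ET.fromstring(xml_text).iter("tr")]


def rows_from_attributes(xml_text):
    # Each column is an attribute of the <tr> element.
    return [dict(tr.attrib) for tr in ET.fromstring(xml_text).iter("tr")]


def rows_from_mixed(xml_text):
    # "item" is element text, "getfor" is an attribute on <item>,
    # and "notes" is a sibling element.
    rows = []
    for tr in ET.fromstring(xml_text).iter("tr"):
        item = tr.find("item")
        rows.append({
            "item": item.text or "",
            "getfor": item.get("getfor", ""),
            "notes": tr.findtext("notes", default=""),
        })
    return rows


print(rows_from_attributes(ATTRIBUTE_TABLE))
```

Only the reading code differs between the layouts; the resulting rows are identical, which is the sense in which the three XML documents represent one SimpleTable.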
Basic data structures in programming
To further illustrate how basic data structures apply in many different contexts, some of the basic data structures enumerated previously are examined and compared here in the context of computer programming.
For the first part of the comparison, we match the generic terminology against the terms commonly used in programming languages:
- SimpleBoolean: commonly called a boolean, which can usually take the values true and false, 0 and 1, or other values, depending on the language.
- SimpleString: commonly called a string.
- SimpleSequence: numerically indexed variables in programming are commonly represented with an array.
- Name-value pair: (explained in more detail below)
- SimpleDictionary: these are commonly represented with a dictionary or an associative array.
- SimpleTable: (explained in more detail below)
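As one concrete instance of this terminology, the sketch below expresses parts of the shopping list with Python's built-in types: `bool` for a SimpleBoolean, `str` for a SimpleString, `list` for a SimpleSequence, a two-item `tuple` for a name-value pair, `dict` for a SimpleDictionary, and a list of dicts for a SimpleTable. This mapping is our own illustration, and other languages draw these lines differently.

```python
# One possible mapping of the core data structures onto Python built-ins.
list_is_done: bool = False                                        # SimpleBoolean
whole_list: str = "eggs cough syrup orange juice bread"           # SimpleString
items: list = ["eggs", "cough syrup", "orange juice"]             # SimpleSequence
pair: tuple = ("getfor", "granny")                                # Name-value pair
item_record: dict = {"item": "cough syrup", "getfor": "granny"}   # SimpleDictionary
table: list = [                                                   # SimpleTable
    {"item": "eggs", "getfor": "andy"},
    {"item": "cough syrup", "getfor": "granny"},
]

# Numeric indexing for a sequence, name-based indexing for a dictionary.
print(items[1])               # cough syrup
print(item_record["getfor"])  # granny
# A table row is a dictionary inside a sequence, so both kinds of index apply.
print(table[1]["item"])       # cough syrup
```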
Now that we have introduced and discussed specific examples of the basic data structures, we turn to a few technical considerations that apply to all of them. These are particularly important for anyone responsible for designing and implementing XML schemas for specific scenarios.
- Exact terminology depends on context: Although the "basic" structures described here apply to many different scenarios, the terms used to describe them can overlap or conflict. For example, the term "SimpleSequence" as used here closely coincides with what is called an "array" in many programming languages. Similarly, the term "SimpleDictionary" is shorthand for what some programming languages call an "associative array". Although this close correlation is intentional, one must always remember that the specific nuances of an application or programming language will require additional attention. Sometimes minor conflicts or discrepancies arise when one digs into the details for any specific data structure in any given project or technology.
- Basic structures are flexible concepts: Structures can be defined in terms of one another, and some structures can be applied recursively. For example, one could easily define a SimpleSequence using a SimpleString along with some basic assumptions (e.g., a SimpleSequence is a string of alphanumeric characters in which items are separated by one or more whitespace characters: "eggs bread butter milk").
- Abstract structures tend to hide tricky details: For example, the term "SimpleString" describes the abstract notion of a sequence of characters (e.g., "ISBN 0-596-00327-7"). The abstract notion is fairly intuitive and uncomplicated. The precise notation used to implement that notion and represent it in real-life working code, however, is a different matter entirely. Different programming languages and environments may use different conventions for representing the same "string". Because of this variability, the abstract notion of a "SimpleString" in XML is likewise subject to differing representations, depending on the needs of a given project.
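To illustrate how one abstract SimpleString can have several concrete XML notations, this sketch (assuming Python's standard `xml.etree.ElementTree`) takes a made-up value containing characters that XML treats specially. An escaped text node, an escaped attribute, and a CDATA section all parse back to the same string. The sample value, the element name `s`, and the attribute name `v` are invented for the example.

```python
import xml.etree.ElementTree as ET

# A made-up SimpleString containing characters that are special in XML.
VALUE = "cough syrup <grape> & honey"

# Three hand-written XML notations for that one abstract string.
NOTATIONS = {
    "escaped text node": "<s>cough syrup &lt;grape&gt; &amp; honey</s>",
    "escaped attribute": '<s v="cough syrup &lt;grape&gt; &amp; honey" />',
    "CDATA section": "<s><![CDATA[cough syrup <grape> & honey]]></s>",
}


def parsed_value(xml_text: str) -> str:
    """Return the string carried by <s>, whether in its v attribute or its text."""
    element = ET.fromstring(xml_text)
    return element.get("v", element.text)


for name, xml_text in NOTATIONS.items():
    print(f"{name}: {parsed_value(xml_text) == VALUE}")

# When serializing, ElementTree picks the escaped text-node notation itself.
element = ET.Element("s")
element.text = VALUE
print(ET.tostring(element, encoding="unicode"))
```

A consumer that compares the parsed values sees one string. A tool that compares the raw markup sees three different documents, which is exactly the kind of detail the abstract term hides.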
Notes and references
- An important note: the basic terms used here are generalizations. Although they may coincide with terms used in specific software, specific programming languages, or specific applications, these are not intended as technically precise definitions. The concepts described here are presented to help emphasize the context-neutral principle of interoperability in XML.