Introduction to XML for Web Developers (Part 6 of 8)

Defining Elements and their Children

In our previous example, we explained that we had defined an element named CONTACT that was allowed to include a single NAME element, which in turn contained parsed character data.

    <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
    <!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS ANY>
	<!ELEMENT CONTACT (NAME)>
	<!ELEMENT NAME (#PCDATA)>
	]>

    <CONTACTS>

    <CONTACT>
        <NAME>Roger Kaplan</NAME>
    </CONTACT>

    </CONTACTS>

Well, truthfully, we were "mostly" right in our explanation of the DTD. More correctly, the example defined an element named CONTACT that was REQUIRED to have exactly one child, a NAME element.

Remember that DTDs give you quite a bit of flexibility to specify exactly what elements can contain. Using a notation borrowed from regular expression pattern matching, DTDs allow you to specify very complex logical relationships between elements and their children.

For example, you could specify that an element must contain exactly one child, may contain an optional child, must contain one or more children, or may contain zero or more children. You could also specify more complex relationships, such as: element X is valid if it contains one or more children named Y OR one child named Z.
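As a rough sketch of that last case, the declaration might look like the following. The names X, Y and Z are just the placeholders from the sentence above, and the (#PCDATA) content for Y and Z is an assumption made so the snippet is complete. The "+" and "|" characters are explained in the chart further down.

	<!ELEMENT X (Y+ | Z)>
	<!ELEMENT Y (#PCDATA)>
	<!ELEMENT Z (#PCDATA)>

Under this declaration, an X holding three Y children is valid, and so is an X holding a single Z. An X that holds both a Y and a Z is not.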

Element definitions are described by their Element Content Models (ECM), that is, all the stuff in the parentheses. :)

Thus, as we saw, the ECM of the CONTACT element specified the child element NAME:

	<!ELEMENT CONTACT (NAME)>

The contents of the ECM are governed by a set of rules very similar to the regular expressions used in UNIX. But if you are not familiar with UNIX, don't worry, it is pretty easy. The idea of regular expressions is that certain characters are used to communicate matching logic. Take a look at the possible metacharacters:

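Character   Meaning
+           One or more occurrences
*           Zero or more occurrences
?           Optional (zero or one occurrence)
()          A group of expressions to be matched together
|           OR, as in "this or that"
,           Strictly ordered, like an AND

Note that there is no metacharacter for an unordered list. Writing element names separated only by spaces, such as ELEMENT_A ELEMENT_B ELEMENT_C, is not valid in an ECM.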

Of course, these are best seen by example. Let's consider the simplest case of defining an order of child elements.

Ordering Child Elements

Consider the following DTD snippet....

	<!ELEMENT CONTACT (NAME, EMAIL)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>

In this case, we expect to see XML along the lines of

	<CONTACT>
	<NAME>Jim Sanger</NAME>
	<EMAIL>sanger@sanger.com</EMAIL>
	</CONTACT>

By contrast, the following XML would be invalid, because the EMAIL element is not allowed to precede the NAME element:

	<CONTACT>
	<EMAIL>sanger@sanger.com</EMAIL>
	<NAME>Jim Sanger</NAME>
	</CONTACT>

We used a comma to delimit the list because a comma forces the children to appear in exactly the order given. DTDs have no operator for an unordered list of required children. We could use a pipe to delimit a list of alternatives, however, and by repeating that choice get a list of non-ordered, optional elements. [thanks to Jason Suwala for pointing out our error on unordered children--ed]. Thus if we redefined our DTD to use

	<!ELEMENT CONTACT (NAME | EMAIL)*>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>

then both of the CONTACT examples above would be valid. So would a CONTACT with no children at all, or one with several NAME or EMAIL children in any order.
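The pipe on its own, without any repetition character, expresses a plain choice. Here is a minimal sketch reusing the same elements; it is an illustration only, not a recommended design for a contact list:

	<!ELEMENT CONTACT (NAME | EMAIL)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>

With this declaration a CONTACT must contain exactly one child, either a NAME or an EMAIL, so the following would be valid:

	<CONTACT>
	<EMAIL>sanger@sanger.com</EMAIL>
	</CONTACT>

A CONTACT containing both a NAME and an EMAIL, or containing nothing, would be invalid.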

Repeated Elements

What do you think the following DTD snippet would imply?

	<!ELEMENT CONTACT (NAME, EMAIL+)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>

Take a look at the regular expression character chart above and guess. That is right! It would mean that a CONTACT element could have a NAME element followed by one or more EMAIL elements. Thus, the following XML would be valid

	<CONTACT>
	<NAME>Jim Sanger</NAME>
	<EMAIL>sanger@sanger.com</EMAIL>
	<EMAIL>sanger@yahoo.com</EMAIL>
	<EMAIL>sanger@netscape.com</EMAIL>
	</CONTACT>

What about the following?

	<CONTACT>
	<NAME>Jim Sanger</NAME>
	</CONTACT>

Well, that would be invalid because the "+" sign specifies "one or more". To allow for "zero or more" occurrences, you must use a "*", such as:

	<!ELEMENT CONTACT (NAME, EMAIL*)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>
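The "?" character from the chart works the same way for a child that may appear at most once. A minimal sketch, reusing the same element names:

	<!ELEMENT CONTACT (NAME, EMAIL?)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>

With this declaration, a CONTACT with only a NAME is valid, and so is a CONTACT with a NAME followed by a single EMAIL. A second EMAIL would make it invalid.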

Grouping Elements

Children can be grouped using parentheses. Thus, the following DTD snippet would specify that a CONTACT element could have one or more sets of NAME/EMAIL children such that NAME always precedes EMAIL.

	<!ELEMENT CONTACT (NAME, EMAIL)+>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>

That would look something like the following:

	<CONTACT>

	<NAME>Jim Sanger</NAME>
	<EMAIL>sanger@sanger.com</EMAIL>

	<NAME>James Sanger</NAME>
	<EMAIL>james.sanger@sanger.com</EMAIL>

	<NAME>Kris Kringle</NAME>
	<EMAIL>santa@sanger.com</EMAIL>

	</CONTACT>
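To pull the pieces together, here is a sketch of a complete document that combines ordering, a choice, and repetition of a group. The PHONE element, its sample value, and the (CONTACT*) content model for CONTACTS are assumptions made for illustration; they do not appear in the examples above.

	<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
	<!DOCTYPE CONTACTS [
	<!-- PHONE is a hypothetical element added for this sketch -->
	<!ELEMENT CONTACTS (CONTACT*)>
	<!ELEMENT CONTACT (NAME, (EMAIL | PHONE)+)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>
	<!ELEMENT PHONE (#PCDATA)>
	]>

	<CONTACTS>

	<CONTACT>
	<NAME>Jim Sanger</NAME>
	<EMAIL>sanger@sanger.com</EMAIL>
	<PHONE>555-0100</PHONE>
	</CONTACT>

	</CONTACTS>

Each CONTACT here needs one NAME first, followed by at least one EMAIL or PHONE, in any mix and any order.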

Publication Date: Tuesday 19th August, 2003
Author: Selena Sol