Introduction to XML for Web Developers (Part 7 of 8)


Either/Or

The pipe character is used to specify an "OR" operation. Thus, the following DTD snippet would specify an XML document in which all CONTACT elements would have a NAME child followed by either a PHONE or an EMAIL element (but not both).

	<!ELEMENT CONTACT (NAME, (PHONE | EMAIL))>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT PHONE (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>

Note that the OR in a content model is exclusive, not a short-circuited test as in many programming languages: it means one or the other, but not both and not neither. Is that a tongue twister or what!?! To make the point with examples, here are several invalid XML snippets based on the DTD snippet above....

	<CONTACT>
	<NAME>Jim Sanger</NAME>
	</CONTACT>

That is invalid because the DTD specified that every CONTACT must have either a PHONE or an EMAIL. The above has neither.

	<CONTACT>
	<NAME>Jim Sanger</NAME>
	<EMAIL>Jim Sanger</EMAIL>
	<PHONE>Jim Sanger</PHONE>
	</CONTACT>

This one is invalid because the contact has BOTH EMAIL and PHONE children.

	<CONTACT>
	<EMAIL>Jim Sanger</EMAIL>
	<NAME>Jim Sanger</NAME>
	</CONTACT>

This one is wrong because NAME must appear before EMAIL or PHONE.

NOTE: Within a grouping, you may use only one kind of connector (either "," or "|"). Thus, it is invalid to use:

	<!ELEMENT CONTACT (NAME, PHONE | EMAIL)>

Instead, you must create a subgroup, as we did earlier:

	<!ELEMENT CONTACT (NAME, (PHONE | EMAIL))>
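For contrast with the invalid snippets above, here is a minimal sketch of a complete document that should validate against the either/or content model. The CONTACTS wrapper element, the phone number, and the email address are assumptions made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS (CONTACT*)>
	<!ELEMENT CONTACT (NAME, (PHONE | EMAIL))>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT PHONE (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>
]>
<CONTACTS>
	<!-- NAME followed by PHONE only -->
	<CONTACT>
		<NAME>Jim Sanger</NAME>
		<PHONE>555-0100</PHONE>
	</CONTACT>
	<!-- NAME followed by EMAIL only -->
	<CONTACT>
		<NAME>Jim Sanger</NAME>
		<EMAIL>jim@example.com</EMAIL>
	</CONTACT>
</CONTACTS>
```

Each CONTACT has exactly one NAME, in first position, followed by exactly one of PHONE or EMAIL.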

Optional Children

Using the "?" character specifies that the element named is optional. Thus, in the following code snippet, we specify that every CONTACT must have a NAME and either a PHONE or EMAIL and may have an optional ADDRESS child.

	<!ELEMENT CONTACT (NAME, (PHONE | EMAIL), ADDRESS?)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT PHONE (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>
	<!ELEMENT ADDRESS (STREET+, CITY, STATE, ZIP, COUNTRY?)>
	<!ELEMENT STREET (#PCDATA)>
	<!ELEMENT CITY (#PCDATA)>
	<!ELEMENT STATE (#PCDATA)>
	<!ELEMENT ZIP (#PCDATA)>
	<!ELEMENT COUNTRY (#PCDATA)>
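To see the "?" at work, here is a sketch of a document that uses these declarations. One CONTACT leaves ADDRESS out entirely and the other includes an ADDRESS without a COUNTRY. The CONTACTS wrapper element and all of the sample address data are assumptions for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS (CONTACT*)>
	<!ELEMENT CONTACT (NAME, (PHONE | EMAIL), ADDRESS?)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT PHONE (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>
	<!ELEMENT ADDRESS (STREET+, CITY, STATE, ZIP, COUNTRY?)>
	<!ELEMENT STREET (#PCDATA)>
	<!ELEMENT CITY (#PCDATA)>
	<!ELEMENT STATE (#PCDATA)>
	<!ELEMENT ZIP (#PCDATA)>
	<!ELEMENT COUNTRY (#PCDATA)>
]>
<CONTACTS>
	<!-- No ADDRESS at all: allowed because of "ADDRESS?" -->
	<CONTACT>
		<NAME>Jim Sanger</NAME>
		<EMAIL>jim@example.com</EMAIL>
	</CONTACT>
	<!-- ADDRESS present, COUNTRY omitted: allowed because of "COUNTRY?" -->
	<CONTACT>
		<NAME>Roger Kaplan</NAME>
		<EMAIL>rabbit@kaplan.com</EMAIL>
		<ADDRESS>
			<STREET>123 Example St.</STREET>
			<CITY>Springfield</CITY>
			<STATE>CA</STATE>
			<ZIP>00000</ZIP>
		</ADDRESS>
	</CONTACT>
</CONTACTS>
```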

Mixed Content

In certain, probably rare, circumstances, you will want an element to contain parsed character data mixed in with child elements. Mixed content works as you would expect, with two syntax rules: #PCDATA must come first in the group, and the group must end with "*". Thus, the following XML document would be valid.

    <?xml version = "1.0" encoding="UTF-8" standalone = "yes"?>
    <!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS ANY>
	<!ELEMENT CONTACT (#PCDATA | NAME | EMAIL | PHONE)*>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>
	<!ELEMENT PHONE (#PCDATA)>
	]>

    <CONTACTS>

    <CONTACT>
        <NAME>Roger Kaplan</NAME>
        <EMAIL>rabbit@kaplan.com</EMAIL>
        <PHONE>1800YOMOMMA</PHONE>
	Roger is a swingin' hep cat!
    </CONTACT>

    </CONTACTS>

Empty Elements

Finally, we must mention the syntax for defining an empty tag. There is not much to it: you simply use the EMPTY keyword, as in

<!ELEMENT HR EMPTY>

In your XML code, you will then have an element such as <HR/>.
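As a small sketch, here is an EMPTY declaration used in a complete document. The PAGE and PARA elements are hypothetical names introduced only to give HR somewhere to live.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE PAGE [
	<!ELEMENT PAGE (PARA | HR)*>
	<!ELEMENT PARA (#PCDATA)>
	<!ELEMENT HR EMPTY>
]>
<PAGE>
	<PARA>Above the rule.</PARA>
	<HR/>
	<PARA>Below the rule.</PARA>
	<!-- A start tag immediately followed by its end tag is also empty -->
	<HR></HR>
</PAGE>
```

Anything between the HR tags, even a single space, would make the element invalid.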

Defining Valid Element Attributes

Well, as you might expect, just as you use the DTD to define valid elements, you also use the DTD to define valid element attributes.

We already went over attributes in the last section, but to refresh your memory, we used the following example where STYLE and COLORING were attributes of the SHOE element.

	<SHOE STYLE = "SPECTATOR" COLORING = "BLACK_AND_WHITE">

To declare attributes in the DTD you use the general format of:

	<!ATTLIST ELEMENT_NAME ATTRIBUTE_NAME TYPE DEFAULT_VALUE>

ELEMENT_NAME is the name of the element in which the attribute appears, such as "SHOE" in the example above.

ATTRIBUTE_NAME is the name of the attribute, such as "STYLE" or "COLORING" in the example above.

TYPE specifies what kind of content the attribute may hold.

DEFAULT_VALUE specifies the value that is used if none is specified by the document author. There are several keywords that define standard defaults....
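To make the general format concrete, here is one way the SHOE attributes might be declared. This is only a sketch: the choice of EMPTY content for SHOE, the CDATA type, the #REQUIRED keyword, and the "BLACK" default are all assumptions made for illustration, since the original SHOE example does not show its DTD.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE SHOE [
	<!ELEMENT SHOE EMPTY>
	<!--      ELEMENT_NAME  ATTRIBUTE_NAME  TYPE   DEFAULT_VALUE -->
	<!ATTLIST SHOE          STYLE           CDATA  #REQUIRED>
	<!ATTLIST SHOE          COLORING        CDATA  "BLACK">
]>
<SHOE STYLE="SPECTATOR" COLORING="BLACK_AND_WHITE"/>
```

Under these assumptions, a SHOE written without COLORING would be treated as if it had COLORING="BLACK", while a SHOE written without STYLE would be invalid.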

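NOTE: Since ATTLIST is a list, it can declare several attributes at once (and often does). Consider the following ATTLIST definition. The names wrapped in "%" and ";" are parameter entity references, which are defined elsewhere in that DTD.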

	<!ATTLIST Port
        name            NMTOKEN #REQUIRED
        hostName        NMTOKEN #IMPLIED
        function        %funcType; #REQUIRED
        number          CDATA #REQUIRED
        type            %serverType;
        serverPort      %fbool;
        %basicAttrs;>

Of the four pieces, TYPE and DEFAULT_VALUE require some discussion. Let's look at DEFAULT_VALUE first.

Attribute Defaults

Required

The #REQUIRED keyword specifies that, although the DTD provides no default value, the attribute must be given a value wherever the element appears in an XML document. For example, suppose you wanted to define a standard PAGE_AUTHOR element that could be added to every page on any site that used it. Your intent is to make sure that every author provides contact information for bugs and broken links. However, you won't know in advance what the default values should be because everyone who implements your DTD will have different personal information. Thus, you can make the contact information attributes required, while not providing defaults.
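The PAGE_AUTHOR idea might be sketched as follows. The attribute names NAME and EMAIL, the EMPTY content model, and the sample values are assumptions; the text above only says that the contact information should be required.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE PAGE_AUTHOR [
	<!ELEMENT PAGE_AUTHOR EMPTY>
	<!ATTLIST PAGE_AUTHOR
		NAME   CDATA  #REQUIRED
		EMAIL  CDATA  #REQUIRED>
]>
<!-- Valid: both required attributes are present -->
<PAGE_AUTHOR NAME="Jim Sanger" EMAIL="jim@example.com"/>
```

A validating parser should reject a PAGE_AUTHOR element that leaves out either attribute.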

Implied

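When you use the #IMPLIED keyword, the attribute is optional and the DTD provides no default value. If the document author leaves the attribute out, the parser simply reports that no value was given, and it is up to the application to decide what to do. (To supply a default that the document author may override, give the value in quotes in place of a keyword, as with "CHICKEN" in the Enumerated example later in this section.)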
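In XML 1.0, #IMPLIED marks an attribute as optional without supplying any default value. Here is a minimal sketch; the NICKNAME attribute and the simplified CONTACT content model are hypothetical and used only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS (CONTACT*)>
	<!ELEMENT CONTACT (#PCDATA)>
	<!-- NICKNAME may be left out; the DTD supplies no value for it -->
	<!ATTLIST CONTACT NICKNAME CDATA #IMPLIED>
]>
<CONTACTS>
	<CONTACT NICKNAME="Rabbit">Roger Kaplan</CONTACT>
	<CONTACT>Jim Sanger</CONTACT>
</CONTACTS>
```

Both CONTACT elements are valid. The second one simply has no NICKNAME.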

Fixed

Sometimes you will want to provide a default value that the document author may not modify. In that case, you will use the #FIXED keyword followed by the value in quotes.
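A fixed attribute might look like the following sketch. The REPORT element, the COMPANY attribute, and the value "Acme" are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE REPORT [
	<!ELEMENT REPORT (#PCDATA)>
	<!-- The value after #FIXED is the only value COMPANY may have -->
	<!ATTLIST REPORT COMPANY CDATA #FIXED "Acme">
]>
<!-- COMPANY could also be omitted; the parser would then supply "Acme" -->
<REPORT COMPANY="Acme">Quarterly numbers</REPORT>
```

Writing COMPANY with any other value would make the document invalid.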

Attribute Types

Aside from defaults, there are 10 TYPEs of content for attributes:

  • CDATA
  • Enumerated
  • ID
  • IDREF
  • IDREFS
  • ENTITY
  • ENTITIES
  • NMTOKEN
  • NMTOKENS
  • NOTATION

Let's take a look at each....

CDATA TYPE

CDATA refers to plain old character data that may be any string of characters that does not include ampersands (&), less than signs (<), or the quotation mark (") used to delimit the value. Of course, as we discussed earlier, you may use the escaped forms &amp;, &lt;, or &quot; if you must include those forbidden characters.

<?xml version = "1.0"
         encoding="UTF-8"
         standalone = "yes"?>
<!DOCTYPE SCRIPT [
	<!ELEMENT SCRIPT ANY>
	<!ELEMENT DIALOG (#PCDATA)>
	<!ATTLIST DIALOG ACTOR CDATA #REQUIRED>
	]>
<SCRIPT>
<DIALOG ACTOR = "Hanks">I don't think so!</DIALOG>
<DIALOG ACTOR = "Ryan">Why not?</DIALOG>
</SCRIPT>
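Here is a sketch of the escaped characters at work inside CDATA attribute values. The ACTOR values are made up for illustration, and the ATTLIST uses #REQUIRED as an assumed default keyword.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE SCRIPT [
	<!ELEMENT SCRIPT (DIALOG*)>
	<!ELEMENT DIALOG (#PCDATA)>
	<!ATTLIST DIALOG ACTOR CDATA #REQUIRED>
]>
<SCRIPT>
	<!-- The escaped ampersand is read as a literal ampersand -->
	<DIALOG ACTOR="Hanks &amp; Ryan">I don't think so!</DIALOG>
	<!-- The escaped quotes are read as literal double quotes -->
	<DIALOG ACTOR="The &quot;Narrator&quot;">Why not?</DIALOG>
</SCRIPT>
```

After parsing, an application would see the first ACTOR value as Hanks & Ryan and the second as The "Narrator".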

Enumerated (The keyword is not actually used)

An enumerated type is a pipe-delimited list of acceptable values from which the document author must choose.

Note that in the example below, CHICKEN is assumed by default if the TYPE attribute is omitted.

<?xml version = "1.0"
         encoding="UTF-8"
         standalone = "yes"?>
<!DOCTYPE GROCERY_BASKET [
	<!ELEMENT GROCERY_BASKET ANY>
	<!ELEMENT MEAT EMPTY>
	<!ATTLIST MEAT TYPE (CHICKEN |
			BEEF |
			PORK |
			FISH) "CHICKEN">
	]>
<GROCERY_BASKET>
<MEAT TYPE = "FISH"/>
<MEAT TYPE = "BEEF"/>
<MEAT/>
</GROCERY_BASKET>



Publication Date: Tuesday 19th August, 2003
Author: Selena Sol
