Introduction to XML for Web Developers (Part 7 of 8)
The pipe character is used to specify an "OR" operation. Thus, the following DTD snippet specifies an XML document in which every CONTACT element has a NAME child followed by either a PHONE or an EMAIL child:
<!ELEMENT CONTACT (NAME, (PHONE | EMAIL))>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT PHONE (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
Note that matching against a DTD content model is not a short-circuited system. An OR means one or the other, but not both and not neither. Is that a tongue twister or what!?! To make the point with examples, here are several invalid XML snippets based on the DTD snippet above:
<CONTACT> <NAME>Jim Sanger</NAME> </CONTACT>
That is invalid because the DTD specifies that every CONTACT must have either a PHONE or an EMAIL.
<CONTACT> <NAME>Jim Sanger</NAME> <EMAIL>Jim Sanger</EMAIL> <PHONE>Jim Sanger</PHONE> </CONTACT>
This one is invalid because the contact has BOTH an EMAIL and a PHONE.
<CONTACT> <EMAIL>Jim Sanger</EMAIL> <NAME>Jim Sanger</NAME> </CONTACT>
This one is wrong because NAME must appear before the PHONE or EMAIL.
NOTE: Within a single grouping, you may use only one kind of connector (either , or |, but not both). Thus, it is invalid to use:
<!ELEMENT CONTACT (NAME, PHONE | EMAIL)>
Instead, you must create a subgroup, as shown above:
<!ELEMENT CONTACT (NAME, (PHONE | EMAIL))>
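For contrast with the invalid snippets, here is a minimal sketch of a document whose CONTACT elements should satisfy the content model (NAME, (PHONE | EMAIL)). The CONTACTS wrapper with its (CONTACT+) model, the phone number, and the e-mail address are assumptions made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE CONTACTS [
  <!-- Hypothetical wrapper element so the file has a single root -->
  <!ELEMENT CONTACTS (CONTACT+)>
  <!ELEMENT CONTACT (NAME, (PHONE | EMAIL))>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PHONE (#PCDATA)>
  <!ELEMENT EMAIL (#PCDATA)>
]>
<CONTACTS>
  <!-- Valid: NAME followed by exactly one PHONE -->
  <CONTACT>
    <NAME>Jim Sanger</NAME>
    <PHONE>555-0100</PHONE>
  </CONTACT>
  <!-- Valid: NAME followed by exactly one EMAIL -->
  <CONTACT>
    <NAME>Jim Sanger</NAME>
    <EMAIL>jim@example.org</EMAIL>
  </CONTACT>
</CONTACTS>
```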
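Using the "?" character specifies that the named element is optional. Thus, in the following code snippet, we specify that every CONTACT must have a NAME and either a PHONE or an EMAIL, and may optionally have an ADDRESS. Within ADDRESS, the "+" after STREET means that one or more STREET elements are required, while COUNTRY is optional.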
<!ELEMENT CONTACT (NAME, (PHONE | EMAIL), ADDRESS?)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT PHONE (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT ADDRESS (STREET+, CITY, STATE, ZIP, COUNTRY?)>
<!ELEMENT STREET (#PCDATA)>
<!ELEMENT CITY (#PCDATA)>
<!ELEMENT STATE (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
<!ELEMENT COUNTRY (#PCDATA)>
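As a rough sketch of how the optional and repeating pieces play out, the document below should validate against the declarations above: it includes the optional ADDRESS, repeats STREET, and leaves out COUNTRY. All of the data values are invented placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE CONTACT [
  <!ELEMENT CONTACT (NAME, (PHONE | EMAIL), ADDRESS?)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PHONE (#PCDATA)>
  <!ELEMENT EMAIL (#PCDATA)>
  <!ELEMENT ADDRESS (STREET+, CITY, STATE, ZIP, COUNTRY?)>
  <!ELEMENT STREET (#PCDATA)>
  <!ELEMENT CITY (#PCDATA)>
  <!ELEMENT STATE (#PCDATA)>
  <!ELEMENT ZIP (#PCDATA)>
  <!ELEMENT COUNTRY (#PCDATA)>
]>
<CONTACT>
  <NAME>Jim Sanger</NAME>
  <EMAIL>jim@example.org</EMAIL>
  <!-- ADDRESS? means this whole element could be left out -->
  <ADDRESS>
    <!-- STREET+ allows one or more STREET elements -->
    <STREET>12 Example Road</STREET>
    <STREET>Suite 4</STREET>
    <CITY>Springfield</CITY>
    <STATE>CA</STATE>
    <ZIP>00000</ZIP>
    <!-- COUNTRY? is optional and is omitted here -->
  </ADDRESS>
</CONTACT>
```

Dropping the ADDRESS element entirely would still leave the document valid, while an ADDRESS with no STREET at all would not be.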
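In certain, probably rare, circumstances you will wish to allow parsed character data alongside child elements. This is called mixed content, and it comes with two rules: #PCDATA must be listed first in the group, and the group must end with an asterisk. Thus, the following XML document would be valid: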
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE CONTACTS [
  <!ELEMENT CONTACTS ANY>
  <!ELEMENT CONTACT (#PCDATA | NAME | EMAIL | PHONE)*>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT EMAIL (#PCDATA)>
  <!ELEMENT PHONE (#PCDATA)>
]>
<CONTACTS>
  <CONTACT>
    <NAME>Roger Kaplan</NAME>
    <EMAIL>firstname.lastname@example.org</EMAIL>
    <PHONE>1800YOMOMMA</PHONE>
    Roger is a swingin' hep cat!
  </CONTACT>
</CONTACTS>
Finally, we must mention the syntax for defining an empty tag. There is not much to it: you simply use the EMPTY keyword, such as
<!ELEMENT HR EMPTY>
In your XML code, you will then have an element such as <HR/>.
Defining Valid Element Attributes
Well, as you might expect, just as you use the DTD to define valid elements, you also use the DTD to define valid element attributes.
We already went over attributes in the last section, but to refresh your memory, we used the following example, where STYLE and COLORING were attributes of the SHOE element:
<SHOE STYLE = "SPECTATOR" COLORING = "BLACK_AND_WHITE">
To declare attributes in the DTD you use the general format of:
<!ATTLIST ELEMENT_NAME ATTRIBUTE_NAME TYPE DEFAULT_VALUE>
ELEMENT_NAME is the name of the element in which the attribute appears, such as "SHOE" in the example above.
ATTRIBUTE_NAME is the name of the attribute, such as "STYLE" or "COLORING" in the example above.
DEFAULT_VALUE specifies the value that is used if none is specified by the document author. There are several keywords that define standard defaults...
Because ATTLIST is a list, it can declare several attributes at once (and often does). Consider the following:
<!ATTLIST Port
  name       NMTOKEN      #REQUIRED
  hostName   NMTOKEN      #IMPLIED
  function   %funcType;   #REQUIRED
  number     CDATA        #REQUIRED
  type       %serverType;
  serverPort %fbool;
  %basicAttrs;>
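The Port declaration leans on parameter entities that are defined elsewhere, so here is a smaller, self-contained sketch of the same idea applied to the SHOE example. Declaring SHOE as EMPTY and making both attributes CDATA and #REQUIRED are assumptions chosen only to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SHOE [
  <!-- Assumption: SHOE has no content of its own -->
  <!ELEMENT SHOE EMPTY>
  <!-- One ATTLIST declaring two attributes of SHOE,
       each as ATTRIBUTE_NAME TYPE DEFAULT_VALUE -->
  <!ATTLIST SHOE
    STYLE    CDATA #REQUIRED
    COLORING CDATA #REQUIRED>
]>
<SHOE STYLE="SPECTATOR" COLORING="BLACK_AND_WHITE"/>
```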
Of the four pieces, TYPE and DEFAULT_VALUE require some discussion. Let's look at DEFAULT_VALUE first.
The #REQUIRED flag specifies that, although the DTD provides no default value, the attribute must be given a value whenever the element appears in an XML document. For example, suppose you wanted to define a standard PAGE_AUTHOR element that could be added to every page on any site that used it. Your intent is to make sure that every author provides contact information for bugs and broken links. However, you won't know in advance what the default values should be, because everyone who implements your DTD will have different personal information. Thus, you can make the contact information attributes required, while not providing defaults.
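A minimal sketch of that PAGE_AUTHOR idea might look like the following. The attribute names NAME and EMAIL, the EMPTY content model, and the sample values are all assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PAGE_AUTHOR [
  <!ELEMENT PAGE_AUTHOR EMPTY>
  <!-- No defaults are given; every PAGE_AUTHOR must supply both values -->
  <!ATTLIST PAGE_AUTHOR
    NAME  CDATA #REQUIRED
    EMAIL CDATA #REQUIRED>
]>
<PAGE_AUTHOR NAME="Jim Sanger" EMAIL="jim@example.org"/>
```

Leaving out either attribute would make the document invalid, which is exactly the nudge the DTD author wants to give.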
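When you use the #IMPLIED default, the attribute becomes optional and the DTD supplies no default value. If the document author leaves the attribute out, no value is reported and the application decides what to do. If you do want to provide a default that the document author may override, give a quoted value in place of a keyword. If the document author does not override your default, your default will be used.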
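As a sketch of the difference between an optional attribute and one with a literal default, consider the declarations below. #IMPLIED leaves the attribute optional with no value supplied, while a quoted value acts as an overridable default. The PHONE and LANGUAGE attributes and the value "en" are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PAGE_AUTHOR [
  <!ELEMENT PAGE_AUTHOR EMPTY>
  <!ATTLIST PAGE_AUTHOR
    NAME     CDATA #REQUIRED
    PHONE    CDATA #IMPLIED
    LANGUAGE CDATA "en">
]>
<!-- PHONE is omitted, which #IMPLIED allows; no value is supplied for it.
     LANGUAGE is omitted, so a parser that reads the DTD reports "en". -->
<PAGE_AUTHOR NAME="Jim Sanger"/>
```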
Sometimes you will want to provide a default value that the document author may not modify. In that case, you will use the #FIXED keyword followed by the quoted value.
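A fixed value could be declared with the #FIXED keyword, roughly as sketched here. The COMPANY attribute and its value are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PAGE_AUTHOR [
  <!ELEMENT PAGE_AUTHOR EMPTY>
  <!ATTLIST PAGE_AUTHOR
    NAME    CDATA #REQUIRED
    COMPANY CDATA #FIXED "Example Corp">
]>
<!-- Valid: COMPANY may be omitted or set to exactly "Example Corp" -->
<PAGE_AUTHOR NAME="Jim Sanger" COMPANY="Example Corp"/>
```

Writing any other value for COMPANY would make the document invalid.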
Aside from defaults, there are 10 TYPEs of content for attributes, including CDATA, enumerated, ENTITY, ENTITIES, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, and NOTATION.
Let's take a look at each....
CDATA refers to plain old character data: any string of characters that does not include ampersands (&), less-than signs (<), or quotation marks ("). Of course, as we discussed earlier, you may use the escaped forms &amp;, &lt;, or &quot; if you must include those forbidden characters.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE SCRIPT [
  <!ELEMENT SCRIPT ANY>
  <!ELEMENT DIALOG (#PCDATA)>
  <!ATTLIST DIALOG ACTOR CDATA #REQUIRED>
]>
<SCRIPT>
  <DIALOG ACTOR="Hanks">I don't think so!</DIALOG>
  <DIALOG ACTOR="Ryan">Why not?</DIALOG>
</SCRIPT>
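To show the escaping rule in action, here is a small sketch in which a CDATA attribute value contains an ampersand and quotation marks. The (DIALOG*) content model, the #REQUIRED default, and the dialog text are assumptions. After parsing, the ACTOR value should come out as: Hanks & "Ryan".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SCRIPT [
  <!ELEMENT SCRIPT (DIALOG*)>
  <!ELEMENT DIALOG (#PCDATA)>
  <!ATTLIST DIALOG ACTOR CDATA #REQUIRED>
]>
<SCRIPT>
  <!-- The ampersand and the quotation marks are written as entity references -->
  <DIALOG ACTOR="Hanks &amp; &quot;Ryan&quot;">We agree!</DIALOG>
</SCRIPT>
```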
Enumerated (The keyword is not actually used)
A list of acceptable pipe delimited values from which the document author must choose.
Note that in the example below, CHICKEN is assumed by default if no TYPE is specified.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE GROCERY_BASKET [
  <!ELEMENT GROCERY_BASKET ANY>
  <!ELEMENT MEAT EMPTY>
  <!ATTLIST MEAT TYPE (CHICKEN | BEEF | PORK | FISH) "CHICKEN">
]>
<GROCERY_BASKET>
  <MEAT TYPE="FISH"/>
  <MEAT TYPE="BEEF"/>
  <MEAT/>
</GROCERY_BASKET>
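To illustrate what an enumerated type rules out, the sketch below mixes one acceptable value with two that a validating parser should reject. The values TOFU and pork are hypothetical inputs chosen to trigger the errors. Enumerated values are matched case-sensitively.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE GROCERY_BASKET [
  <!ELEMENT GROCERY_BASKET ANY>
  <!ELEMENT MEAT EMPTY>
  <!ATTLIST MEAT TYPE (CHICKEN | BEEF | PORK | FISH) "CHICKEN">
]>
<GROCERY_BASKET>
  <!-- Valid: PORK is one of the enumerated values -->
  <MEAT TYPE="PORK"/>
  <!-- Invalid: TOFU is not in the enumerated list -->
  <MEAT TYPE="TOFU"/>
  <!-- Invalid: the lowercase spelling does not match PORK -->
  <MEAT TYPE="pork"/>
</GROCERY_BASKET>
```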