Introduction to XML for Web Developers (Part 8 of 8)

ID and IDREF

An attribute of type ID gives an element a unique name within the context of the document. IDs are much like the named anchors used for internal links in plain HTML. For the most part, ID is used by programs or scripting languages that process the document. The value of an ID attribute must be a valid XML name: it begins with a letter or an underscore and may contain letters, digits, underscores, hyphens, and periods, without any whitespace.

NOTE: An ID attribute cannot be declared with the #FIXED keyword or given a default value. It must be either #REQUIRED or #IMPLIED, and it usually appears with #REQUIRED (we'll discuss these later). Of course, while ID is usually #REQUIRED, the reverse is definitely not true: a #REQUIRED attribute need not be an ID.

Also, take care that your ID values are _unique_ within a document!

<?xml version = "1.0"
         encoding="UTF-8"
         standalone = "yes"?>
<!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS ANY>
	<!ELEMENT CONTACT (NAME, EMAIL)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>
	<!ATTLIST CONTACT CONTACT_NUM ID #REQUIRED>
	]>
<CONTACTS>

    <!-- ID values must be XML names, so they cannot begin with a digit -->
    <CONTACT CONTACT_NUM = "c1">
    <NAME>Lok Siu</NAME>
    <EMAIL>siu@lok.com</EMAIL>
    </CONTACT>

    <CONTACT CONTACT_NUM = "c2">
    <NAME>Joseph Misuraca</NAME>
    <EMAIL>joe@misuraca.com</EMAIL>
    </CONTACT>

</CONTACTS>

The IDREF type allows the value of one attribute to refer to an element elsewhere in the document, provided that the value of the IDREF is the ID value of the referenced element.

<?xml version = "1.0"
         encoding="UTF-8"
         standalone = "yes"?>
<!DOCTYPE CONTACTS [
	<!ELEMENT CONTACTS ANY>
	<!ELEMENT CONTACT (NAME, EMAIL)>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT EMAIL (#PCDATA)>
	<!ATTLIST CONTACT CONTACT_NUM ID #REQUIRED>
	<!ATTLIST CONTACT MOTHER IDREF #IMPLIED>
	]>
<CONTACTS>

    <!-- ID and IDREF values must be XML names, so they cannot begin with a digit -->
    <CONTACT CONTACT_NUM = "c2">
    <NAME>Teri Mancuso</NAME>
    <EMAIL>teri@teri.com</EMAIL>
    </CONTACT>

    <CONTACT CONTACT_NUM = "c1" MOTHER = "c2">
    <NAME>Kristin Mancuso</NAME>
    <EMAIL>kristin@kristin.com</EMAIL>
    </CONTACT>

</CONTACTS>
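
As a rough illustration of that rule, the fragment below (meant to sit inside the same CONTACTS document) shows a reference that a validating parser should reject. The ID values "c3" and "c9" are made up for this sketch; the point is only that no CONTACT in the document carries the ID "c9".

    <!-- Not valid: MOTHER points at "c9", but no element has that ID -->
    <CONTACT CONTACT_NUM = "c3" MOTHER = "c9">
    <NAME>Kristin Mancuso</NAME>
    <EMAIL>kristin@kristin.com</EMAIL>
    </CONTACT>

A non-validating parser will typically accept the document anyway, since this check is a validity constraint rather than a well-formedness rule.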

NMTOKEN and NMTOKENS

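These are more of the types that are useful primarily to processing applications. NMTOKEN specifies a single name token and NMTOKENS a whitespace-separated list of them. A name token is built from the same characters as an XML name but, unlike a name, may begin with any of them, including a digit. You might use these types when you are associating some other component with the element, such as a Java class or a security algorithm.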

<!ATTLIST DATA AUTHORIZED_USERS NMTOKENS #IMPLIED>
<DATA SECURITY="ON"
AUTHORIZED_USERS = "IggieeB SelenaS GuntherB">
blah blah blah
</DATA>
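
For the Java class case mentioned above, a sketch using the singular NMTOKEN type might look like the following. The HANDLER attribute and the class name com.example.DataHandler are hypothetical, chosen only to show a value that is a legal name token (letters and periods, no spaces).

    <!ATTLIST DATA HANDLER NMTOKEN #IMPLIED>

    <DATA HANDLER = "com.example.DataHandler">
    blah blah blah
    </DATA>

A value containing a space, such as "Data Handler", would not be valid for NMTOKEN; under NMTOKENS it would be read as two separate tokens.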

Notation Type

This type allows an attribute to take as its value the name of a notation declared in the DTD, in cases in which you want certain consequences to follow from the attribute. These are usually used as triggers, such as when you want to specify a given player for a given file type.

<?xml version = "1.0"
         encoding="UTF-8"
         standalone = "yes"?>
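<!DOCTYPE DOCUMENT [
	<!ELEMENT DOCUMENT ANY>
	<!-- A NOTATION attribute may not be declared on an EMPTY element -->
	<!ELEMENT MOVIE (#PCDATA)>
	<!-- An ENTITY attribute names an unparsed entity, hence NDATA below -->
	<!ATTLIST MOVIE SOURCE ENTITY #REQUIRED>
	<!-- A NOTATION attribute must enumerate the notations it accepts -->
	<!ATTLIST MOVIE PLAYER NOTATION (mp) #REQUIRED>
	<!NOTATION mp SYSTEM "movPlayer.exe">
	<!ENTITY BladeRunner SYSTEM "dvds/BR/br.mov" NDATA mp>
	]>

<DOCUMENT>
<!-- SOURCE holds the entity's name, not an &...; reference -->
<MOVIE SOURCE = "BladeRunner" PLAYER = "mp">Blade Runner</MOVIE>
</DOCUMENT>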

Entity Declarations

In the last section, we touched briefly upon the concept of entities. If you recall, both general and parameter entities were used like macros or aliases in XML.

Essentially, an entity allows you to create an alias to some large bit of text. Elsewhere in the document, you can refer to the large bit of text simply by referring to its alias. As you can imagine, this saves a lot of time that might otherwise be spent retyping the same text. It also means that modifications to data need only happen in one centralized locale to implement global changes.

General Entities

General entities allow you to create document-wide entities that can be referenced in the document's content and attribute values. A declaration looks something like:

    <!ENTITY NAME "text that you want to be represented by the entity">

In the real world, you might have something that looked like the following:

    <!ENTITY full_name "Diego Ramirez Valenzuela Martinez Perez the 5th">

General entities are referenced using

     &ENTITYNAME;

such as

      <NAME>&full_name;</NAME>

Make sure you remember the semi-colon. I forget this all the time :)

NOTE: You can specify an entity that has text defined external to the document by using the SYSTEM keyword such as:

    <!ENTITY license_agreement
       SYSTEM "http://www.mydomain.com/license.xml">

In this case, an XML processor that reads external entities will replace each &license_agreement; reference with the contents of the document specified.

Be careful, when defining entities, that you define them before using them. This matters most for the parameter entities described in the next section, which are declared with a percent sign (%) and referenced inside the DTD itself. Thus, the following would be invalid because the TAG_NAMES alias is defined after it is used.

      <!ELEMENT PERSONAL_CONTACT (%TAG_NAMES; | BIRTHDAY)>
      <!ELEMENT BUSINESS_CONTACT (%TAG_NAMES; | COMPANY_NAME)>
      <!ENTITY % TAG_NAMES "NAME | EMAIL | PHONE | ADDRESS">
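
To tie the general entity pieces together, here is a minimal sketch of a complete document. It assumes the general entity form, declared without a percent sign and referenced with an ampersand; the CONTACT and SIGNATURE elements are invented for the illustration.

    <?xml version = "1.0"
             encoding="UTF-8"
             standalone = "yes"?>
    <!DOCTYPE CONTACT [
    	<!ELEMENT CONTACT (NAME, SIGNATURE)>
    	<!ELEMENT NAME (#PCDATA)>
    	<!ELEMENT SIGNATURE (#PCDATA)>
    	<!-- Declared once here, used twice in the content below -->
    	<!ENTITY full_name "Diego Ramirez Valenzuela Martinez Perez the 5th">
    	]>
    <CONTACT>
        <NAME>&full_name;</NAME>
        <SIGNATURE>Sincerely, &full_name;</SIGNATURE>
    </CONTACT>

A parser that expands entities would hand the application "Sincerely, Diego Ramirez Valenzuela Martinez Perez the 5th" as the text of SIGNATURE, and changing the name means editing only the one declaration.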

Parameter Entities

Parameter entities, which can be either internal or external, are used only within the DTD and look something like the following:

  <!ENTITY % ALIAS "text to be aliased">

For example, you might have something like....

  <!ENTITY % NAME "text that you want to be represented">

Using parameter entities, you can shorten the declarations of other elements and attributes such as:

  <!ENTITY % TAG_NAMES "NAME | EMAIL | PHONE | ADDRESS">
  <!ELEMENT PERSONAL_CONTACT (%TAG_NAMES; | BIRTHDAY)>
  <!ELEMENT BUSINESS_CONTACT (%TAG_NAMES; | COMPANY_NAME)>

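NOTE: Parameter entity declarations must precede any reference to them and must be properly nested. Also, a reference placed inside another declaration, as in the ELEMENT declarations above, is allowed only in an external DTD. Within an internal DTD subset, a parameter entity reference may appear only between declarations.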
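
Since parameter entities can also be external, here is a hedged sketch of that case. The file name contact-elements.dtd and the entity name contact_elements are assumptions made for the example. The internal subset declares the external parameter entity and then references it between declarations, which pulls in the contents of the file.

    <?xml version = "1.0"
             encoding="UTF-8"
             standalone = "no"?>
    <!DOCTYPE CONTACTS [
    	<!ENTITY % contact_elements SYSTEM "contact-elements.dtd">
    	%contact_elements;
    	<!ELEMENT CONTACTS (PERSONAL_CONTACT | BUSINESS_CONTACT)*>
    	]>
    <CONTACTS>
        <BUSINESS_CONTACT>
        <COMPANY_NAME>Example Company</COMPANY_NAME>
        </BUSINESS_CONTACT>
    </CONTACTS>

The hypothetical contact-elements.dtd would then hold the shared declarations:

    <!ENTITY % TAG_NAMES "NAME | EMAIL | PHONE | ADDRESS">
    <!ELEMENT PERSONAL_CONTACT (%TAG_NAMES; | BIRTHDAY)>
    <!ELEMENT BUSINESS_CONTACT (%TAG_NAMES; | COMPANY_NAME)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT EMAIL (#PCDATA)>
    <!ELEMENT PHONE (#PCDATA)>
    <!ELEMENT ADDRESS (#PCDATA)>
    <!ELEMENT BIRTHDAY (#PCDATA)>
    <!ELEMENT COMPANY_NAME (#PCDATA)>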

Internal Versus External DTDs

As we have already alluded to several times throughout this tutorial, a DTD can either be included as part of a well-formed XML document (standalone=yes) or it can be referenced from an external source (standalone=no).

The benefit of using external DTDs is that they can more easily and efficiently be shared by more than one XML document, or in fact by many organizations that need to standardize communications and data. You can write a DTD once and have multiple documents reference it. Not only does this save typing time, but it ensures that as the DTD manager makes changes to the central DTD, all documents that rely on the DTD are updated in one fell swoop. (Of course, DTD changes will not necessarily be backwards compatible, so watch out!)

In order to reference an external DTD, you must change both the XML declaration and the DOCTYPE declaration.

The XML Declaration must be changed to reflect the fact that the XML document will not work on its own. That is, it will not be standalone.

        <?xml version = "1.0"
	             encoding="UTF-8"
	             standalone = "no"?>

You will also need to change the DOCTYPE declaration to add the SYSTEM keyword.

        <!DOCTYPE ROOT_ELEMENT
	             SYSTEM "URL_OF_EXTERNAL_DTD">

such as....

        <!DOCTYPE CONTACTS
	             SYSTEM "http://www.mydomain.com/dtds/contacts.dtd">

Note also, that the URL may be a relative or absolute file location such as....

        <!DOCTYPE CONTACTS
	             SYSTEM "contacts.dtd">

which specifies a DTD file in the same directory as the XML document that references it. Or, similarly, you can reference the same document up one directory and down one into the "dtds" directory.

        <!DOCTYPE CONTACTS
	             SYSTEM "../dtds/contacts.dtd">

Using this method, you can simply cut out the DTD from your XML document and paste it into the separate document called contacts.dtd. Thus, you have one file with the DTD and one file with the well-formed XML document.
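
As a sketch of that cut-and-paste step, using the contacts example from earlier in this article, the two files might end up looking like this. Note that only the declarations move; the <!DOCTYPE CONTACTS [ and ]> wrapper stays out of the DTD file. The ID value "c1" is an assumption, chosen because ID values have to be XML names.

contacts.dtd:

    <!ELEMENT CONTACTS ANY>
    <!ELEMENT CONTACT (NAME, EMAIL)>
    <!ELEMENT NAME (#PCDATA)>
    <!ELEMENT EMAIL (#PCDATA)>
    <!ATTLIST CONTACT CONTACT_NUM ID #REQUIRED>

contacts.xml, in the same directory:

    <?xml version = "1.0"
             encoding="UTF-8"
             standalone = "no"?>
    <!DOCTYPE CONTACTS
                 SYSTEM "contacts.dtd">
    <CONTACTS>

        <CONTACT CONTACT_NUM = "c1">
        <NAME>Lok Siu</NAME>
        <EMAIL>siu@lok.com</EMAIL>
        </CONTACT>

    </CONTACTS>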

Public DTDs

The SYSTEM keyword is not the only way to reference an external DTD. This keyword is primarily used for referencing private DTDs that are shared among the documents of a single author or organization. Alternatively, DTDs may be made available to the public by using the PUBLIC keyword. When using the PUBLIC keyword, an external DTD also gets a name by which it can be recognized. The generic format for referencing a PUBLIC DTD looks something like...

        <!DOCTYPE ROOT_ELEMENT PUBLIC "DTD_NAME"
	             "URL_OF_EXTERNAL_DTD">

In usage, it would look more like the following:

        <!DOCTYPE CONTACTS PUBLIC "CONTACT_DTD"
	             "http://www.mydomain.com/dtds/contacts.dtd">

The names used for DTDs are a little different from XML names. In particular, they may contain only alphanumeric characters, the space and new line characters, and the following punctuation: - ' ( ) + , . / : = ? ; ! * # @ $ _ %. Further, DTD names follow some general standards.

ISO standard DTDs begin with ISO. Approved non-ISO standard DTDs begin with a plus (+) sign. Unapproved non-ISO standard DTDs begin with a dash (-).

Whichever the case, the initial segment is followed by two slashes (//) and the name of the owner of the DTD. Following the name is another pair of slashes and then the type of document the DTD describes. Finally, the string ends with another pair of slashes and a language reference (ISO 639). For example, you might see a reference like

        <!DOCTYPE CONTACTS PUBLIC "-//Selena Sol//Contact Data//EN"
	             "http://www.mydomain.com/dtds/contacts.dtd">

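One widely seen public identifier that follows this pattern is the one the W3C published for XHTML 1.0 Strict. It is shown here only as an illustration of the naming scheme, not as part of the contacts example:

        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	             "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Read against the rules above: the leading dash marks an identifier that is not a registered standard, W3C is the owner, "DTD XHTML 1.0 Strict" names the kind of document described, and EN is the language reference.
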
Publication Date: Tuesday 19th August, 2003
Author: Selena Sol
