Introduction to XML for Web Developers (Part 4 of 8)
As we have already said, it is a pretty good rule of thumb to treat anything outside of tags as character data and anything inside of tags as markup. But alas, in one case this is not true. In the special case of CDATA blocks, the XML processor ignores all tags and entity references and treats them just like any old character data.
CDATA blocks have been provided as a convenience measure for when you want to include large blocks of special characters as character data, but you do not want to have to use entity references all the time. What if you wanted to write about an XML document in XML? Consider the following example in which you would have an example tag in your XML Guide written in XML:
As you can see, you would be forced to use entity references for all the tags. YUCK!
To avoid the inconvenience of translating all special characters, you can use a CDATA block to specify that all character data should be considered character data whether or not it "looks" like a tag or entity reference.
Consider the following example:
As you might have guessed, the character string ]]> is not allowed within a CDATA block as it would signal the end of the CDATA block.
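To make the difference concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The EXAMPLE element name and the sample markup inside it are our own assumptions for illustration, not the listings from the original guide. The same text is written once with entity references and once inside a CDATA block.

```python
import xml.etree.ElementTree as ET

# The same sample markup written twice. EXAMPLE is a hypothetical element name.
# Version 1: every special character is replaced by an entity reference.
escaped = "<EXAMPLE>&lt;NAME&gt;Peter Williams&lt;/NAME&gt;</EXAMPLE>"
# Version 2: the markup is wrapped in a CDATA block and left as typed.
cdata = "<EXAMPLE><![CDATA[<NAME>Peter Williams</NAME>]]></EXAMPLE>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text
child_count = len(ET.fromstring(cdata))  # number of child elements of EXAMPLE

print(escaped_text)                # the literal text, angle brackets included
print(cdata_text == escaped_text)  # both forms carry the same character data
print(child_count)                 # the NAME "tag" never became an element
```

Both versions hand the application the same character data. The CDATA version is simply easier to type and to read.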
Not only will you sometimes want to include tags in your XML document that you want the XML processor to treat as plain text (display as character data), but sometimes you will want to put character data in your document that you want the XML processor to ignore (not display at all). This type of text is called COMMENT text.
You will be familiar with comments from HTML. In HTML, you specified comments using the <!-- --> syntax. Well, I have some good news. In XML, comments are done in just the same way! So the following would be a valid XML comment:
<!-- Begin the Names -->
<!-- End the names -->
When using comments in your XML documents, however, you should keep in mind a couple of rules.
First, you should never have "--" within the text of your comment, and the text should not end with a "-", because the XML specification does not allow either. A single "-" elsewhere in the comment is fine.
Second, never place a comment within a tag. Thus, the following code would not be well-formed XML:
<NAME <!--The name --> >Peter Williams</NAME>
Likewise, never place a comment inside of an entity declaration, and never place a comment before the XML declaration, which must always come first in any XML document.
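If you want to check these comment rules for yourself, the following sketch feeds a few small documents to Python's standard xml.etree.ElementTree parser and reports which ones it accepts. The is_well_formed helper and the sample documents are our own illustrations, and other XML processors may word their error messages differently.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, one per rule.
samples = {
    "plain comment": "<NAMES><!-- Begin the Names --><NAME>Peter Williams</NAME></NAMES>",
    "double hyphen in comment": "<NAMES><!-- Begin -- the Names --></NAMES>",
    "comment inside a tag": "<NAME <!--The name --> >Peter Williams</NAME>",
    "comment before XML declaration": "<!-- first --><?xml version='1.0'?><NAMES/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, accepted in results.items():
    print(label, "->", "accepted" if accepted else "rejected")
```

Only the first sample should be accepted.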
Comments can be used to comment out tag sets. Thus, in the following case, all the names will be ignored except for Barbara Tropp.
<!-- don't show these
However, if you do comment out blocks of tags, make sure that the remaining XML is well-formed.
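Here is a small sketch of commenting out a block of tags, again using Python's xml.etree.ElementTree. The NAMES and NAME element names and the two commented-out names are assumptions made for the illustration. Note that what remains after the comment is still well-formed.

```python
import xml.etree.ElementTree as ET

# Two names are commented out, one is left in place.
doc = """<NAMES>
<!-- don't show these
<NAME>Peter Williams</NAME>
<NAME>Diego Ramirez</NAME>
-->
<NAME>Barbara Tropp</NAME>
</NAMES>"""

# The parser skips the comment, so only the uncommented NAME element survives.
names = [name.text for name in ET.fromstring(doc).iter("NAME")]
print(names)
```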
Processing Instructions
We have already seen something written like a processing instruction: the XML declaration (strictly speaking, the XML specification defines the XML declaration as a separate construct, but it uses the same syntax). And if you recall, when we introduced the XML declaration we promised to return to the concept of processing instructions to explain them as a category.
So here we are. A processing instruction is a bit of information meant for the application using the XML document. That is, processing instructions are not really of interest to the XML parser. Instead, they are passed intact straight to the application using the parser. The application can then pass them on to another application or interpret them itself. All processing instructions follow the generic format of:
<?NAME_OF_APPLICATION_INSTRUCTION_IS_FOR instructions?>
As you might imagine, you cannot use "xml", in any combination of upper- and lowercase letters, as the NAME_OF_APPLICATION_INSTRUCTION_IS_FOR since "xml" is reserved. However, you might have something like:
<?JAVA_OBJECT JAR_FILE = "/java/myjar.jar"?>
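To see a processing instruction arrive at the application untouched, here is a minimal sketch using Python's standard xml.dom.minidom. The DOCUMENT root element is a placeholder we made up so that the sample is a complete document.

```python
from xml.dom import minidom

# A complete document containing the processing instruction shown above.
doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?JAVA_OBJECT JAR_FILE = "/java/myjar.jar"?>'
    '<DOCUMENT/>'
)

# The parser does not interpret the instruction. It only splits it into
# the target (the application name) and the data (everything after it).
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

What to do with the JAR_FILE text is entirely up to the application that reads it.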
Entities
To a large degree, much of the discussion of entities is more relevant in the next section, writing "valid" documents, than in this section, writing "well-formed" documents.
As such, we will discuss entities in greater detail in the next section. Nevertheless, some issues make sense within this section, because entities must be well-formed as well as valid. So, in this section, we will introduce entities in terms of their basic syntax and leave the nitty gritty for a little bit later.
As we said before, entities are essentially aliases that allow you to refer to large sections of text without having to type them out every time you want to use them.
Suppose you have your letterhead saved as an entity in a shared file. Then, every time you write a letter in XML, you might say something like
&letterhead;
blah blah blah
Notice that the letterhead might expand out to
1234 Fifth Ave.
Los Angeles, California 90026
However, instead of typing that out in every letter, you just use &letterhead;.
There are two types of entities, general entities and parameter entities, and each entity has two parts: the declaration and the entity reference.
General Entities
General entities look something like:
<!ENTITY NAME "text that you want to be represented by the entity">
which might look like the following in the real world:
<!ENTITY full_name "Diego Ramirez Valenzuela Martinez Perez the 5th">
NOTE: You can specify an entity whose text is defined external to the document by using the SYSTEM keyword, such as:
<!ENTITY NAME SYSTEM "URI_of_the_external_document">
In this case, the XML processor will replace the entity reference with the contents of the document specified.
Parameter entities, which can also be either internal or external, are only used within the DTD, which we will discuss in the next section, so we will defer a serious discussion until then. However, we will mention that a well-formed parameter entity declaration looks the same as a general entity declaration except that it includes the "%" specifier. Consider the following example:
<!ENTITY % NAME "text that you want to be represented by the entity">
The DOCTYPE Declarations
If you want to declare entities, you MUST do so within the document's DOCTYPE declaration, which follows the XML declaration in the prolog and looks like the following:
<!DOCTYPE myDocument [
...here is where you declare your entities....
]>
<myDocument>
...here is the body of your document....
</myDocument>
Thus, you might have something like the following (Consider how much easier changing office addresses is when you use entities!):
<!DOCTYPE CLIENTS [
<!ENTITY ninthFloorAddress "2345 Broadway St Floor 9">
<!ENTITY eighthFloorAddress "2345 Broadway St Floor 8">
<!ENTITY seventhFloorAddress "2345 Broadway St Floor 7">
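To show the declarations at work in a complete document, here is a sketch in Python using xml.etree.ElementTree. The CLIENT, NAME and ADDRESS elements and the client names are assumptions made for the illustration. Only the three entity declarations come from the listing above.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE CLIENTS [
<!ENTITY ninthFloorAddress "2345 Broadway St Floor 9">
<!ENTITY eighthFloorAddress "2345 Broadway St Floor 8">
<!ENTITY seventhFloorAddress "2345 Broadway St Floor 7">
]>
<CLIENTS>
<CLIENT><NAME>Client A</NAME><ADDRESS>&ninthFloorAddress;</ADDRESS></CLIENT>
<CLIENT><NAME>Client B</NAME><ADDRESS>&ninthFloorAddress;</ADDRESS></CLIENT>
<CLIENT><NAME>Client C</NAME><ADDRESS>&seventhFloorAddress;</ADDRESS></CLIENT>
</CLIENTS>"""

# The parser replaces each entity reference with the declared text
# before the application ever sees the ADDRESS elements.
addresses = [address.text for address in ET.fromstring(doc).iter("ADDRESS")]
for address in addresses:
    print(address)
```

If the ninth floor office moves, you change one declaration and every client on that floor picks up the new address.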
Entity References
Well, we have pretty much let the cat out of the bag already. We have shown several examples of entity references above.
In short, an entity reference is the key that unlocks an entity which has been declared in an entity declaration. Entity references follow the simple syntax of:
&NAME;
- Entities MUST be declared in an XML document before they are referenced.
- Note that there may not be any whitespace embedded in an entity reference. In other words, & letterhead; or &letterhead ; will cause errors.
- Though entities may refer to other entities, they may not be self-referential. So they may not reference other entities that reference them in return.
- References to entities may not appear in the DOCTYPE declaration
- The text that the entity references must be well-formed XML.
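Several of the rules above can be tried out with a short Python sketch built on xml.etree.ElementTree. The LETTER element, the ping and pong entities, and the is_well_formed helper are all made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical DOCTYPE with one ordinary entity and two entities
# that refer to each other in a circle.
DTD = """<!DOCTYPE LETTER [
<!ENTITY letterhead "1234 Fifth Ave.">
<!ENTITY ping "&pong;">
<!ENTITY pong "&ping;">
]>
"""

def is_well_formed(body):
    """Return True if the parser accepts the DOCTYPE plus the given body."""
    try:
        ET.fromstring(DTD + body)
        return True
    except ET.ParseError:
        return False

references = [
    "&letterhead;",   # declared and correctly written
    "& letterhead;",  # whitespace after the ampersand
    "&letterhead ;",  # whitespace before the semicolon
    "&undeclared;",   # never declared
    "&ping;",         # refers to pong, which refers back to ping
]

results = {ref: is_well_formed("<LETTER>" + ref + "</LETTER>") for ref in references}
for ref, accepted in results.items():
    print(ref, "->", "accepted" if accepted else "rejected")
```

Only the first reference should be accepted. The other four break one of the rules in the list.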
As you might expect, parameter entity references work much like general entity references. In this case, we use a "%" sign instead of a "&" sign.
Now, you have already seen that entity references can take the place of regular character data and you have seen how useful that is. Before we leave the subject, I would only mention that you could also use entity references within tag attributes. For example, consider the following:
<INVOICE CLIENT = "&IBM;" PRODUCT = "&PRODUCT_ID_8762;" QUANTITY = "5">
- You may not reference an external entity from within element attributes.
- The referenced text may not contain the < character because it would cause a well-formedness error in the element when replaced.
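The attribute case can be sketched in Python with xml.etree.ElementTree as well. The replacement texts for the IBM and PRODUCT_ID_8762 entities are invented for the example, and so is the second document, which breaks the rule about the < character on purpose.

```python
import xml.etree.ElementTree as ET

# Entity references inside attribute values. The replacement texts are hypothetical.
good = """<!DOCTYPE INVOICE [
<!ENTITY IBM "International Business Machines">
<!ENTITY PRODUCT_ID_8762 "Widget 8762">
]>
<INVOICE CLIENT = "&IBM;" PRODUCT = "&PRODUCT_ID_8762;" QUANTITY = "5"/>"""

invoice = ET.fromstring(good)
client = invoice.get("CLIENT")
product = invoice.get("PRODUCT")
print(client, "/", product)

# The same idea, but the replacement text contains a < character.
bad = """<!DOCTYPE INVOICE [
<!ENTITY IBM "IBM <Armonk>">
]>
<INVOICE CLIENT = "&IBM;"/>"""

try:
    ET.fromstring(bad)
    bad_accepted = True
except ET.ParseError as error:
    bad_accepted = False
    print("rejected:", error)
```

The first document parses and the attribute values come back fully expanded. The second is rejected as soon as the parser expands the reference inside the attribute.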