Webucator's Free XML Tutorial

Lesson: DTDs

Welcome to our free XML tutorial. This tutorial is based on Webucator's Introduction to XML Training course.

In this lesson you will learn about DTDs (Document Type Definitions) and their role in the world of XML.

Lesson Goals

  • Learn the difference between well-formed and valid XML documents.
  • Learn the purpose of DTDs.
  • Learn to create DTDs.
  • Validate an XML document according to a DTD.

Well-formed vs. Valid

An XML document can be well-formed without being valid:

  • A well-formed XML document is one that follows the syntax rules described earlier and repeated below.
  • A valid XML document is one that conforms to a specified structure.

To be validated, an XML document must be checked against a schema, which is a document that defines the structure for a class of XML documents. XML documents that are not intended to conform to a schema can be well-formed, but they cannot be valid.

XML Syntax Rules (revisited)

  1. There must be one and only one document element.
  2. Every open tag must be closed.
  3. If an element is empty, it still must be closed.
    • Not well-formed: <tag>
    • Well-formed: <tag></tag>
    • Also well-formed: <tag />
  4. Elements must be properly nested.
    • Not well-formed: <a><b></a></b>
    • Well-formed: <a><b></b></a>
  5. Tag and attribute names are case sensitive.
  6. Attribute values must be enclosed in single or double quotes.
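
As an illustration of the six rules together, here is a minimal sketch of a well-formed document. The element and attribute names (catalog, course, and so on) are made up for this example and do not come from any file in this lesson.

```xml
<?xml version="1.0"?>
<!-- Rule 1: catalog is the one and only document element -->
<catalog>
	<!-- Rule 6: attribute values may use double or single quotes -->
	<course id="xml101" level='intro'>
		<!-- Rule 2: every open tag is closed -->
		<title>Introduction to XML</title>
		<!-- Rule 3: an empty element is still closed -->
		<prerequisites />
		<!-- Rule 4: em opens and closes inside description -->
		<description><em>Hands-on</em> course.</description>
	<!-- Rule 5: names are case sensitive, so this must be course, not Course -->
	</course>
</catalog>
```

Because no schema is referenced, this document can only be checked for well-formedness, not validity.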

The Purpose of DTDs

A Document Type Definition (DTD) is a type of schema. The purpose of DTDs is to provide a framework for validating XML documents. By defining a structure that XML documents must conform to, DTDs allow different organizations to create shareable data files.

Imagine, for example, a company that creates technical courseware and sells it to technical training companies. Those companies may want to display the outlines for that courseware on their websites, but they do not want to display them in the same way as every other company that buys the courseware. By providing the course outlines in a predefined XML format, the courseware vendor makes it possible for the training companies to write programs that read those XML files and transform them into HTML pages with their own formatting styles (perhaps using XSLT or CSS). If the XML files had no predefined structure, it would be very difficult to write such programs.

Creating DTDs

DTDs are simple text files that can be created with any basic text editor. Although they look a little cryptic at first, they are not terribly complicated once you get used to them.

A DTD outlines what elements can be in an XML document and the attributes and subelements that they can take. Let's start by taking a look at a complete DTD and then dissecting it.

Code Sample:

DTDs/Demos/Beatles.dtd
<!ELEMENT beatles (beatle+)>
<!ELEMENT beatle (name)>
<!ATTLIST beatle
	link CDATA #IMPLIED
	real (yes|no) "yes">
<!ELEMENT name (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>

The Document Element

When creating a DTD, the first step is to define the document element.

<!ELEMENT beatles (beatle+)>

The element declaration above states that the beatles element must contain one or more beatle elements.

Child Elements

When defining child elements in DTDs, you can specify how many times those elements can appear by adding a modifier after the element name. If no modifier is added, the element must appear once and only once. The other options are shown in the table below.

Modifier    Description
?           Zero or one time.
+           One or more times.
*           Zero or more times.

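DTDs have no syntax for specifying a range of times that an element may appear (e.g., 2-4 appearances). A range can only be spelled out by repeating the element name, as in (beatle, beatle, beatle?, beatle?) for two to four beatle elements.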
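
To see the modifiers side by side, consider the sketch below. It assumes a hypothetical playlist document (none of these element names appear in Beatles.dtd) and embeds the DTD inside the DOCTYPE declaration so that the example is self-contained. The commas mean the children must appear in the order listed.

```xml
<?xml version="1.0"?>
<!DOCTYPE playlist [
	<!-- title: exactly once; description: zero or one;
	     song: one or more; note: zero or more -->
	<!ELEMENT playlist (title, description?, song+, note*)>
	<!ELEMENT title (#PCDATA)>
	<!ELEMENT description (#PCDATA)>
	<!ELEMENT song (#PCDATA)>
	<!ELEMENT note (#PCDATA)>
]>
<playlist>
	<title>Abbey Road, side two</title>
	<!-- description is omitted, which ? allows -->
	<song>Here Comes the Sun</song>
	<song>Because</song>
	<!-- there are no note elements, which * allows -->
</playlist>
```

Removing both song elements, or adding a second title, should make the document invalid while leaving it well-formed.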

Other Elements

The other elements are declared in the same way as the document element, with the <!ELEMENT> declaration. The Beatles DTD declares four additional elements.

Each beatle element must contain a child element name, which must appear once and only once.

<!ELEMENT beatle (name)>

Each name element must contain a firstname element followed by a lastname element. Each must appear once and only once, in that order.

<!ELEMENT name (firstname, lastname)>

Some elements contain only text. This is declared in a DTD as #PCDATA. PCDATA stands for parsed character data, meaning that the data will be parsed for XML tags and entities. The firstname and lastname elements contain only text.

<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
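
Because #PCDATA is parsed, characters such as & and < cannot appear literally in the text; they have to be written as entity references. A minimal sketch, assuming a hypothetical firm element and an embedded DTD:

```xml
<?xml version="1.0"?>
<!DOCTYPE firm [
	<!ELEMENT firm (#PCDATA)>
]>
<!-- The parser replaces the entity reference, so the element's
     text is: Lockwood & Lockwood -->
<firm>Lockwood &amp; Lockwood</firm>
```

Writing a bare & in place of &amp; would make the document not well-formed, so it could not be validated at all.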

Choice of Elements

It is also possible to indicate that exactly one of several elements must appear as a child element. For example, the declaration below indicates that an img element must have either a child element name or a child element id, but not both.

<!ELEMENT img (name|id)>
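
A short sketch of a document that satisfies this declaration follows. The declarations for name and id as text-only elements are assumptions added to make the example complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE img [
	<!ELEMENT img (name|id)>
	<!ELEMENT name (#PCDATA)>
	<!ELEMENT id (#PCDATA)>
]>
<!-- Valid: exactly one of the two alternatives is present -->
<img>
	<id>logo-42</id>
</img>
```

An img element containing both name and id, or containing neither, should be rejected by a validating parser.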

Empty Elements

Empty elements are declared as follows.

<!ELEMENT img EMPTY>
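
In a document, an element declared EMPTY can be written with either the short or the long closing form, but it may not contain anything, not even white space. The sketch below assumes a hypothetical gallery document element and a src attribute (attribute declarations are covered later in this lesson).

```xml
<?xml version="1.0"?>
<!DOCTYPE gallery [
	<!ELEMENT gallery (img+)>
	<!ELEMENT img EMPTY>
	<!ATTLIST img src CDATA #REQUIRED>
]>
<gallery>
	<!-- Both forms are valid for an EMPTY element -->
	<img src="paul.jpg" />
	<img src="john.jpg"></img>
</gallery>
```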

Mixed Content

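Sometimes elements can have elements and text intermingled. For example, the following declaration is for a body element that may contain text in addition to any number of link and img elements. In a mixed-content declaration, #PCDATA must be listed first and the group must end with the * modifier.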

<!ELEMENT body (#PCDATA | link | img)*>
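
The sketch below shows what such mixed content could look like in a document. The declarations for link and img are assumptions made for this example only.

```xml
<?xml version="1.0"?>
<!DOCTYPE body [
	<!ELEMENT body (#PCDATA | link | img)*>
	<!ELEMENT link (#PCDATA)>
	<!ELEMENT img EMPTY>
]>
<!-- Text, link elements, and img elements in any order and any number -->
<body>Read the <link>course outline</link> <img /> or <link>sign up</link> today.</body>
```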

Location of Modifier

The location of modifiers in a declaration is important. If the modifier is outside a set of parentheses, it applies to the whole group. If the modifier is immediately next to an element name, it applies only to that element. The following examples illustrate.

In the example below, the body element can have any number of interspersed child link and img elements.

<!ELEMENT body (link | img)*>

In the example below, the body element can have any number of child link elements or any number of child img elements, but it cannot have both link and img elements.

<!ELEMENT body (link* | img*)>

In the example below, the body element can have any number of child link and img elements, but they must come in pairs, with the link element preceding the img element.

<!ELEMENT body (link, img)*>

In the example below, the body element can have any number of child link elements followed by any number of child img elements.

<!ELEMENT body (link*, img*)>
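
To compare the four content models in one place, the sketch below declares them under four hypothetical element names (anyMix, oneKind, pairs, linksFirst) instead of body, since an element can only be declared once per DTD. Treating link and img as EMPTY is also an assumption made to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE examples [
	<!ELEMENT examples (anyMix, oneKind, pairs, linksFirst)>
	<!ELEMENT anyMix (link | img)*>
	<!ELEMENT oneKind (link* | img*)>
	<!ELEMENT pairs (link, img)*>
	<!ELEMENT linksFirst (link*, img*)>
	<!ELEMENT link EMPTY>
	<!ELEMENT img EMPTY>
]>
<examples>
	<!-- (link | img)* : any order, any count -->
	<anyMix><img /><link /><link /><img /></anyMix>
	<!-- (link* | img*) : only links or only imgs -->
	<oneKind><img /><img /></oneKind>
	<!-- (link, img)* : repeated link-then-img pairs -->
	<pairs><link /><img /><link /><img /></pairs>
	<!-- (link*, img*) : all links first, then all imgs -->
	<linksFirst><link /><link /><img /></linksFirst>
</examples>
```

The children shown inside anyMix (img, link, link, img) would be invalid under each of the other three declarations: they mix both kinds, they start with img, and an img precedes a link.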

Using Parentheses for Complex Declarations

Element declarations can be more complex than the examples above. For example, you can specify that a person element contains either a single name element or a firstname element followed by a lastname element. To group elements, wrap them in parentheses as shown below.

<!ELEMENT person (name | (firstname,lastname))>
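
Both alternatives can be seen in the sketch below. The people document element is hypothetical, and name is declared here as a text-only element, unlike in Beatles.dtd.

```xml
<?xml version="1.0"?>
<!DOCTYPE people [
	<!ELEMENT people (person+)>
	<!ELEMENT person (name | (firstname, lastname))>
	<!ELEMENT name (#PCDATA)>
	<!ELEMENT firstname (#PCDATA)>
	<!ELEMENT lastname (#PCDATA)>
]>
<people>
	<!-- First alternative: a single name element -->
	<person><name>Ringo Starr</name></person>
	<!-- Second alternative: firstname followed by lastname -->
	<person><firstname>George</firstname><lastname>Harrison</lastname></person>
</people>
```

A person element with only a firstname, or with a name followed by a lastname, should fail validation.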

Declaring Attributes

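Attributes are declared using the <!ATTLIST> declaration. The syntax is shown below.

<!ATTLIST ElementName
	AttributeName AttributeType State? DefaultValue?
	AttributeName AttributeType State? DefaultValue?>

  • ElementName is the name of the element taking the attributes.
  • AttributeName is the name of the attribute.
  • AttributeType is the type of data that the attribute value may hold. Although there are many types, the most common are CDATA (character data, meaning any text) and ID (a unique identifier). A list of allowed values, separated by | and wrapped in parentheses, can also be given as the attribute type.
  • State can be one of three values: #REQUIRED (the attribute must be present), #IMPLIED (the attribute is optional and has no default), or #FIXED (the attribute always has the value that follows). State may be omitted when a DefaultValue is given on its own.
  • DefaultValue is a quoted value. On its own, it is the value the attribute takes if it is not included in the element. After #FIXED, it is the only value the attribute may have. #REQUIRED and #IMPLIED take no DefaultValue.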

The beatle element has two possible attributes: link, which is optional and may contain any text, and real, which must be either yes or no and defaults to yes if it is not included.

<!ATTLIST beatle
	link CDATA #IMPLIED
	real (yes|no) "yes">
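
The sketch below shows each kind of attribute declaration in use. The courses and course elements and all four attribute names are hypothetical, chosen only to illustrate the syntax.

```xml
<?xml version="1.0"?>
<!DOCTYPE courses [
	<!ELEMENT courses (course+)>
	<!ELEMENT course (#PCDATA)>
	<!ATTLIST course
		id ID #REQUIRED
		vendor CDATA #FIXED "Webucator"
		url CDATA #IMPLIED
		level (intro|advanced) "intro">
]>
<courses>
	<!-- level is omitted, so it defaults to intro -->
	<course id="xml101" url="http://www.webucator.com">Introduction to XML</course>
	<!-- url is omitted, which #IMPLIED allows -->
	<course id="xml201" level="advanced">Advanced XML</course>
</courses>
```

Leaving out id, repeating the same id value on both elements, or setting vendor to anything other than Webucator should cause validation to fail.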

Validating an XML Document with a DTD

The DOCTYPE declaration in an XML document specifies the DTD to which it should conform. In the code sample below, the DOCTYPE declaration indicates the file should be validated against Beatles.dtd in the same directory.

Code Sample:

DTDs/Demos/Beatles.xml
<?xml version="1.0"?>
<!DOCTYPE beatles SYSTEM "Beatles.dtd">
<beatles>
	<beatle link="http://www.paulmccartney.com">
		<name>
			<firstname>Paul</firstname>
			<lastname>McCartney</lastname>
		</name>
	</beatle>
	<beatle link="http://www.johnlennon.com">
		<name>
			<firstname>John</firstname>
			<lastname>Lennon</lastname>
		</name>
	</beatle>
	<beatle link="http://www.georgeharrison.com">
		<name>
			<firstname>George</firstname>
			<lastname>Harrison</lastname>
		</name>
	</beatle>
	<beatle link="http://www.ringostarr.com">
		<name>
			<firstname>Ringo</firstname>
			<lastname>Starr</lastname>
		</name>
	</beatle>
	<beatle link="http://www.webucator.com" real="no">
		<name>
			<firstname>Nat</firstname>
			<lastname>Dunn</lastname>
		</name>
	</beatle>
</beatles>
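
For contrast, here is a sketch of a document that is well-formed but should not validate against Beatles.dtd. The file is not part of the lesson's demo files, and the instrument element is invented for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE beatles SYSTEM "Beatles.dtd">
<beatles>
	<!-- Invalid: maybe is not one of the allowed values (yes|no) -->
	<beatle real="maybe">
		<name>
			<!-- Invalid: the DTD requires firstname before lastname -->
			<lastname>Best</lastname>
			<firstname>Pete</firstname>
		</name>
		<!-- Invalid: instrument is not declared in Beatles.dtd,
		     and beatle may contain only a name element -->
		<instrument>drums</instrument>
	</beatle>
</beatles>
```

A parser that only checks well-formedness would accept this file; a validating parser should report all three problems.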

Writing a DTD

Duration: 60 to 90 minutes.

In this exercise, you will write a DTD for the business letter shown below. You will then mark up the business letter as a valid XML document according to your DTD. Make sure that the XML file contains a DOCTYPE declaration.

Both documents should be saved in the DTDs/Exercises folder. To test whether the XML file is valid, visit www.xmlvalidation.com/ and upload your XML document and DTD.

Code Sample:

DTDs/Exercises/BusinessLetter.txt
November 29, 2011

Joshua Lockwood
Lockwood & Lockwood
291 Broadway Ave.
New York, NY 10007
United States

Dear Mr. Lockwood:

Along with this letter, I have enclosed the following items:

	- two original, execution copies of the Webucator Master Services Agreement
	- two original, execution copies of the Webucator Premier Support for 
		Developers Services Description between 
		Lockwood & Lockwood and Webucator, Inc.
	
Please sign and return all four original, execution copies to me at your
earliest convenience.  Upon receipt of the executed copies, we will 
immediately return a fully executed, original copy of both agreements to you.

Please send all four original, execution copies to my attention as follows:

	Webucator, Inc.
	4933 Jamesville Rd.
	Jamesville, NY 13078  USA
	Attn: Bill Smith
	
If you have any questions, feel free to call me at 800-555-1000 x123 
or e-mail me at bsmith@webucator.com.

Best regards,

Bill Smith
VP, Operations