An XML document is made up of the following parts.
- An optional prolog.
- A document element, usually containing nested elements.
- Optional comments or processing instructions.
Note: we will review an XML document in the next presentation.
The prolog of an XML document can contain the following items.
- An XML declaration
- Processing instructions
- A Document Type Declaration
The XML Declaration
The XML declaration, if it appears at all, must appear on the very first line of the document with no preceding white space. It looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
This declares that the document is an XML document. The version attribute is required, but the encoding and standalone attributes are not. If the XML document relies on external markup declarations that set defaults for attributes or declare entities, then standalone must be set to no.
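The placement and version rules can be checked with any conforming parser. The sketch below is a minimal illustration using Python's standard xml.etree.ElementTree module; the helper name parses and the one-element beatles documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Declaration at the very start of the document: accepted.
good = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><beatles/>'

# A single space before the declaration: rejected.
leading_space = b' <?xml version="1.0" encoding="UTF-8"?><beatles/>'

# The required version attribute is missing: rejected.
no_version = b'<?xml encoding="UTF-8"?><beatles/>'

# No declaration at all: accepted, because the declaration is optional.
no_declaration = b'<beatles/>'

for document in (good, leading_space, no_version, no_declaration):
    print(parses(document), document)
```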
Processing instructions are used to pass parameters to an application. These parameters tell the application how to process the XML document. For example, the following processing instruction tells the application that it should transform the XML document using the XSL stylesheet beatles.xsl.
<?xml-stylesheet href="beatles.xsl" type="text/xsl"?>
As shown above, processing instructions begin with <? and end with ?>.
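To see how an application receives a processing instruction, the following sketch parses the stylesheet instruction from this section with Python's standard xml.dom.minidom module. The empty beatles document element is an assumption added only so that the input is a complete document.

```python
from xml.dom import Node, minidom

source = '<?xml-stylesheet href="beatles.xsl" type="text/xsl"?><beatles/>'
doc = minidom.parseString(source)

# The processing instruction precedes the document element, so it is
# the first child of the document node.
pi = doc.firstChild

print(pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE)
print(pi.target)  # the name that follows "<?"
print(pi.data)    # the parameters between the target and "?>"
```

The parser does not interpret the parameters itself; it hands the target and the data string to the application, which decides what to do with them.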
Comments can appear throughout an XML document. As in HTML, they begin with <!-- and end with -->.
<!--This is a comment-->
A Document Type Declaration
The Document Type Declaration (or DOCTYPE Declaration) has three roles.
- It specifies the name of the document element.
- It may point to an external Document Type Definition (DTD).
- It may contain an internal DTD.
The DOCTYPE Declaration shown below simply states that the document element of the XML document is beatles.
<!DOCTYPE beatles>
If a DOCTYPE Declaration points to an external DTD, it must either specify that the DTD is on the same system as the XML document itself or that it is in some public location. To do so, it uses the keywords SYSTEM and PUBLIC, respectively. It then points to the location of the DTD using a relative Uniform Resource Identifier (URI) or an absolute URI. Here are a couple of examples.
As shown in the second declaration above, public identifiers are divided into three parts:
- An organization (e.g., Webucator)
- A name for the DTD (e.g., Beatles 1.0)
- A language (e.g., EN for English)
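The three parts of a public identifier can be read back from a parsed document. The sketch below uses Python's standard xml.dom.minidom module. The declaration in it is hypothetical: the public identifier is assembled from the three example parts listed above using the conventional "//" separators, and the system identifier beatles.dtd is an assumed relative URI. The parser records both identifiers without fetching the DTD.

```python
from xml.dom import minidom

# Hypothetical declaration: public identifier built from the example
# parts above; "beatles.dtd" is an assumed relative URI.
source = (
    '<!DOCTYPE beatles PUBLIC "-//Webucator//DTD Beatles 1.0//EN" "beatles.dtd">'
    "<beatles/>"
)
doctype = minidom.parseString(source).doctype

print(doctype.name)      # name of the document element
print(doctype.publicId)  # public identifier
print(doctype.systemId)  # URI of the DTD

# The fields of the public identifier are separated by "//".
prefix, organization, description, language = doctype.publicId.split("//")
print(organization, "|", description, "|", language)
```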
Every XML document must have at least one element, called the document element. The document element usually contains other elements, which contain other elements, and so on. Elements are denoted with tags. Let's look again at the
Not all elements contain other elements or text. For example, in XHTML, there is an img element that is used to display an image. It does not contain any text or elements within it, so it is called an empty element. In XML, empty elements must be closed, but they do not require a separate close tag. Instead, they can be closed with a forward slash at the end of the open tag as shown below.
The above code is identical in function to the code below.
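The equivalence of the two ways of closing an empty element can be confirmed with a parser. The following sketch uses Python's standard xml.etree.ElementTree module; the file name paul.jpg and the helper summary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The same empty element, closed in the two permitted ways.
self_closed = ET.fromstring('<img src="paul.jpg"/>')
open_close = ET.fromstring('<img src="paul.jpg"></img>')


def summary(element):
    """Collect tag name, attributes, text, and number of child elements."""
    return (element.tag, element.attrib, element.text, len(element))


print(summary(self_closed) == summary(open_close))

# An empty element that is never closed is rejected by the parser.
try:
    ET.fromstring('<img src="paul.jpg">')
    unclosed_ok = True
except ET.ParseError:
    unclosed_ok = False
print(unclosed_ok)
```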
XML elements can be further defined with attributes, which appear inside of the element's open tag as shown below.
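As a minimal illustration, the sketch below parses an element that carries one attribute and reads the attribute back with Python's standard xml.etree.ElementTree module. The element name beatle and the attribute instrument are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# The attribute appears inside the open tag as name="value".
beatle = ET.fromstring('<beatle instrument="bass">Paul</beatle>')

print(beatle.attrib)             # all attributes as a dictionary
print(beatle.get("instrument"))  # a single attribute value
print(beatle.text)               # the element's text content
```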
Sometimes it is necessary to include sections in an XML document that should not be parsed by the XML parser. These sections might contain content that will confuse the XML parser, perhaps because it contains content that appears to be XML, but is not meant to be interpreted as XML. Such content must be nested in CDATA sections. The syntax for CDATA sections is shown below.
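A CDATA section opens with <![CDATA[ and closes with ]]>. The sketch below, which uses Python's standard xml.etree.ElementTree module, shows that the characters inside the section reach the application as plain text rather than markup. The element name example and its content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, "<", ">" and "&" are ordinary characters.
source = "<example><![CDATA[<b>5 < 6 & 7 > 6</b>]]></example>"
example = ET.fromstring(source)

print(example.text)  # the raw characters, markup-like text included
print(len(example))  # number of child elements

# The same content without the CDATA section is not well-formed.
try:
    ET.fromstring("<example><b>5 < 6 & 7 > 6</b></example>")
    bare_ok = True
except ET.ParseError:
    bare_ok = False
print(bare_ok)
```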
In XML data, there are only four white space characters.
- Tab
- Line feed
- Carriage return
- Single space
There are several important rules to remember with regard to white space in XML.
- White space within the content of an element is significant; that is, the XML processor will pass these characters to the application or user agent.
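- White space in attribute values is normalized; that is, tabs and line breaks are replaced with spaces and, for attributes declared with a type other than CDATA, neighboring spaces are condensed to a single space.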
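- White space between elements is not significant to the document's content, although the XML processor still passes it to the application, which typically ignores it.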
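The first two rules can be observed directly. In the sketch below, which uses Python's standard xml.etree.ElementTree module, the attribute value contains a tab and a line feed while the element content contains runs of spaces. The element name song and its attribute title are hypothetical.

```python
import xml.etree.ElementTree as ET

# "\t" and "\n" place a literal tab and line feed inside the attribute value.
source = '<song title="Hey\tJude\nLive">  Hey   Jude  </song>'
song = ET.fromstring(source)

# Element content: every space is passed through unchanged.
print(repr(song.text))

# Attribute value: the tab and the line feed are each replaced by a space.
print(repr(song.get("title")))
```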
The xml:space attribute is a special attribute in XML. It can only take one of two values: default or preserve. This attribute instructs the application how to treat white space within the content of the element. Note that the application is not required to respect this instruction.
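The parser itself does not act on xml:space; it reports the attribute and leaves the decision to the application. A minimal sketch with Python's standard xml.etree.ElementTree module follows, in which the element name poem and its content are assumptions. ElementTree exposes the attribute under the reserved XML namespace URI.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this reserved namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

poem = ET.fromstring('<poem xml:space="preserve">  Hey\n    Jude</poem>')

# The application can read the hint and decide how to treat the content.
print(poem.get("{%s}space" % XML_NS))

# The content is delivered with its white space intact in any case.
print(repr(poem.text))
```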
XML Syntax Rules
XML has relatively straightforward, but very strict, syntax rules. A document that follows these syntax rules is said to be well-formed.
- There must be one and only one document element.
- Every open tag must be closed.
- If an element is empty, it still must be closed, either with a separate close tag (<tag></tag>) or with a forward slash at the end of the open tag (<tag/>).
- Elements must be properly nested.
- Tag and attribute names are case sensitive.
- Attribute values must be enclosed in single or double quotes.
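Each of these rules can be tested by handing a small document to a parser and checking whether it is accepted. The sketch below uses Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample documents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a complete document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# (rule being exercised, sample document, expected result)
checks = [
    ("one document element", "<a/>", True),
    ("two document elements", "<a/><b/>", False),
    ("open tag never closed", "<a><b></a>", False),
    ("empty element closed", "<a><b/></a>", True),
    ("improper nesting", "<a><b><c></b></c></a>", False),
    ("tag case mismatch", "<a></A>", False),
    ("quoted attribute values", "<a b='1' c=\"2\"/>", True),
    ("unquoted attribute value", "<a b=1/>", False),
]

for rule, text, expected in checks:
    print(f"{rule:26} {text:24} {is_well_formed(text)}")
```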
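There are five special characters that have reserved meanings in XML markup and generally cannot be written literally in element content or attribute values. These characters are replaced with the predefined entity references listed below.
- < is written as &lt;
- > is written as &gt;
- & is written as &amp;
- " is written as &quot;
- ' is written as &apos;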
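The predefined entity references can also be produced programmatically. The sketch below uses escape from Python's standard xml.sax.saxutils module, which always replaces the ampersand and the two angle brackets and accepts extra replacements for the quote characters. The sample string and the wrapper element t are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the quotes are added here.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = "a < b && 'c' > \"d\""
escaped = escape(raw, QUOTES)
print(escaped)

# A parser turns the entity references back into the original characters,
# both in element content and in attribute values.
as_content = ET.fromstring(f"<t>{escaped}</t>").text
as_attribute = ET.fromstring(f'<t v="{escaped}"/>').get("v")
print(as_content == raw, as_attribute == raw)
```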