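An XML document is made up of the following parts: an optional prolog, a document element (which usually contains nested elements), and optional comments or processing instructions after the document element.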
Note: we will review an XML document in the next presentation.
The prolog of an XML document can contain the following items: an XML declaration, processing instructions, comments, and a Document Type Declaration.
The XML declaration, if it appears at all, must appear on the very first line of the document with no preceding white space. It looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
This declares that the document is an XML document. The version attribute is required, but the encoding and standalone attributes are not. If the XML document uses any external markup declarations that set defaults for attributes or declare entities, then standalone must be set to no.
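As a rough illustration of the declaration rules above, the following Python sketch reads the three attributes back with the standard library's minidom parser and checks that leading white space is rejected. The `<beatles/>` document element and the `parses` helper are assumptions made for the example, not part of the original material.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Declaration copied from the text; <beatles/> is a placeholder document element.
SOURCE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><beatles/>'

doc = minidom.parseString(SOURCE)
# minidom exposes the declaration's attributes on the Document node.
declaration = (doc.version, doc.encoding, doc.standalone)
print(declaration)


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the data as well-formed XML."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


# A single space before the declaration is enough to make parsing fail.
leading_space_ok = parses(b" " + SOURCE)
print(leading_space_ok)
```

The declaration itself is optional, so a document that omits it entirely still parses.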
Processing instructions are used to pass parameters to an application. These parameters tell the application how to process the XML document. For example, the following processing instruction tells the application that it should transform the XML document using the XSL stylesheet beatles.xsl.
<?xml-stylesheet href="beatles.xsl" type="text/xsl"?>
As shown above, processing instructions begin with <? and end with ?>.
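To see how an application might receive a processing instruction, here is a minimal Python sketch using the standard library's minidom parser. The `<beatles/>` document element is a placeholder added for the example.

```python
from xml.dom import minidom

# Processing instruction from the text, followed by a placeholder document element.
SOURCE = b'<?xml-stylesheet href="beatles.xsl" type="text/xsl"?><beatles/>'

doc = minidom.parseString(SOURCE)
pi = doc.firstChild  # the processing instruction precedes the document element

is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
target = pi.target  # the name right after "<?"
data = pi.data      # everything between the target and "?>"

print(is_pi, target, data)
```

Note that the parser hands over the data as one uninterpreted string; splitting it into href and type is left to the application.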
Comments can appear throughout an XML document. As in HTML, they begin with <!-- and end with -->.
<!--This is a comment-->
The Document Type Declaration (or DOCTYPE Declaration) has three roles.
The DOCTYPE Declaration shown below simply states that the document element of the XML document is beatles.
<!DOCTYPE beatles>
When a DOCTYPE Declaration points to an external DTD, it must either specify that the DTD is on the same system as the XML document itself or that it is in some public location. To do so, it uses the keywords SYSTEM and PUBLIC. It then points to the location of the DTD using a relative Uniform Resource Identifier (URI) or an absolute URI. Here are a couple of examples.
<!--DTD is on the same system as the XML document-->
<!DOCTYPE beatles SYSTEM "dtds/beatles.dtd">
<!--DTD is publicly available-->
<!DOCTYPE beatles PUBLIC "-//Webucator//DTD Beatles 1.0//EN" "http://www.webucator.com/beatles/DTD/beatles.dtd">
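As shown in the second declaration above, after the leading hyphen, public identifiers are divided into three parts separated by double slashes: an organization (Webucator), a name for the DTD (DTD Beatles 1.0), and a language (EN for English).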
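The pieces of a DOCTYPE Declaration can be inspected programmatically. The sketch below uses Python's minidom parser on the PUBLIC example; the `<beatles/>` document element is a placeholder, and the parser is not expected to fetch the DTD from the network with its default settings.

```python
from xml.dom import minidom

# PUBLIC declaration from the text, followed by a placeholder document element.
SOURCE = (
    b'<!DOCTYPE beatles PUBLIC "-//Webucator//DTD Beatles 1.0//EN" '
    b'"http://www.webucator.com/beatles/DTD/beatles.dtd">'
    b"<beatles/>"
)

doctype = minidom.parseString(SOURCE).doctype

root_name = doctype.name      # name of the document element
public_id = doctype.publicId  # the public identifier
system_id = doctype.systemId  # the URI that locates the DTD

# Splitting on "//" separates the leading hyphen from the remaining parts.
fields = public_id.split("//")
print(root_name, fields, system_id)
```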
Every XML document must have at least one element, called the document element. The document element usually contains other elements, which contain other elements, and so on. Elements are denoted with tags. Consider the following document.
<?xml version="1.0"?>
<person>
  <name>
    <firstname>Paul</firstname>
    <lastname>McCartney</lastname>
  </name>
  <job>Singer</job>
  <gender>Male</gender>
</person>
The document element is person. It contains three elements: name, job, and gender. Further, the name element contains two elements of its own: firstname and lastname. As you can see, XML elements are denoted with tags, just as in HTML. Elements that are nested within another element are said to be children of that element.
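The parent and child relationships described above can be walked in code. This is a minimal Python sketch using the standard library's ElementTree; the `outline` helper is a name made up for the example.

```python
import xml.etree.ElementTree as ET

SOURCE = """<?xml version="1.0"?>
<person>
  <name>
    <firstname>Paul</firstname>
    <lastname>McCartney</lastname>
  </name>
  <job>Singer</job>
  <gender>Male</gender>
</person>"""

root = ET.fromstring(SOURCE)  # the document element


def outline(element, depth=0):
    """Return the tag names of an element and its descendants, indented by depth."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


tree_outline = outline(root)
print("\n".join(tree_outline))
```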
Not all elements contain other elements or text. For example, in XHTML, there is an img element that is used to display an image. It does not contain any text or elements within it, so it is called an empty element. In XML, empty elements must be closed, but they do not require a separate close tag. Instead, they can be closed with a forward slash at the end of the open tag, as in <img/>.
This form is identical in function to an open tag followed immediately by a close tag, as in <img></img>.
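A quick way to convince yourself that the two forms of an empty element are equivalent is to parse both and compare the results. In this Python sketch the `src` value `paul.jpg` is a hypothetical file name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# "paul.jpg" is a made-up file name; only the two closing styles matter here.
short_form = ET.fromstring('<img src="paul.jpg"/>')
long_form = ET.fromstring('<img src="paul.jpg"></img>')

# Both parse to an element with the same attributes, no text, and no children.
same_attributes = short_form.attrib == long_form.attrib
both_empty = (
    short_form.text is None
    and long_form.text is None
    and len(short_form) == 0
    and len(long_form) == 0
)
same_serialization = ET.tostring(short_form) == ET.tostring(long_form)

print(same_attributes, both_empty, same_serialization)
```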
XML elements can be further defined with attributes, which appear inside of the element's open tag as shown below.
<name title="Sir">
  <firstname>Paul</firstname>
  <lastname>McCartney</lastname>
</name>
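Attributes are available to an application as name and value pairs on the element. The following Python sketch reads the title attribute from the example above; the `suffix` attribute it also looks up is a hypothetical name used only to show what happens when an attribute is absent.

```python
import xml.etree.ElementTree as ET

SOURCE = """<name title="Sir">
  <firstname>Paul</firstname>
  <lastname>McCartney</lastname>
</name>"""

name = ET.fromstring(SOURCE)

title = name.get("title")        # value of an attribute that is present
suffix = name.get("suffix")      # hypothetical attribute that is absent
all_attributes = dict(name.attrib)

print(title, suffix, all_attributes)
```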
Sometimes it is necessary to include sections in an XML document that should not be interpreted as markup by the XML parser. These sections might contain content that would confuse the parser, perhaps because it looks like XML but is not meant to be treated as XML. Such content can be nested in CDATA sections, inside which characters such as < and & are treated as literal text. The syntax for CDATA sections is shown below.
<![CDATA[ This section will not get parsed by the XML parser. ]]>
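The effect of a CDATA section is easiest to see by parsing the same content with and without it. In this Python sketch the `example` element name and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Content that looks like markup, wrapped in a CDATA section.
WRAPPED = "<example><![CDATA[<b>5 < 6 & 7 > 2</b>]]></example>"

example = ET.fromstring(WRAPPED)
text = example.text          # delivered as plain character data
child_count = len(example)   # the <b> inside CDATA is not an element

# Without CDATA, the bare "<" and "&" make the document ill-formed.
try:
    ET.fromstring("<example>5 < 6 & 7 > 2</example>")
    unwrapped_parses = True
except ET.ParseError:
    unwrapped_parses = False

print(repr(text), child_count, unwrapped_parses)
```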
In XML data, there are only four white space characters: tab, line feed, carriage return, and space.
There are several important rules to remember with regard to white space in XML.
The xml:space attribute is a special attribute in XML. It can only take one of two values: default and preserve. This attribute instructs the application how to treat white space within the content of the element. Note that the application is not required to respect this instruction.
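Because xml:space is only a hint, it is up to the application to act on it. The Python sketch below shows one way an application might do so; the `lyrics` element, its content, and the `display_text` helper are assumptions made for the example, and collapsing white space in the default case is just one possible policy.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix under its namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

preserved = ET.fromstring('<lyrics xml:space="preserve">  Hey\n    Jude  </lyrics>')
unmarked = ET.fromstring("<lyrics>  Hey\n    Jude  </lyrics>")


def display_text(element):
    """Return the element's text, honoring xml:space as this application sees fit."""
    text = element.text or ""
    if element.get(XML_SPACE, "default") == "preserve":
        return text                # keep white space exactly as written
    return " ".join(text.split())  # this application's choice: collapse it


hint = preserved.get(XML_SPACE)
print(hint, repr(display_text(preserved)), repr(display_text(unmarked)))
```

The parser delivers the same text in both cases; only the application's handling differs.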
XML has relatively straightforward, but very strict, syntax rules. A document that follows these syntax rules is said to be well-formed.
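One practical way to test whether a document is well-formed is to hand it to a parser and see whether it raises an error. The Python sketch below does this with ElementTree; the `is_well_formed` helper and the sample strings are illustrative choices, not an exhaustive list of the syntax rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """Return True if the parser accepts the source without a syntax error."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


samples = {
    "properly nested": "<name><firstname>Paul</firstname></name>",
    "overlapping tags": "<name><firstname>Paul</name></firstname>",
    "missing close tag": "<name>Paul",
    "two document elements": "<name/><job/>",
    "unquoted attribute": "<name title=Sir/>",
    "mismatched case": "<Name>Paul</name>",
}

results = {label: is_well_formed(source) for label, source in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```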
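There are five characters with special meaning in XML markup. Of these, < and & can never appear literally in ordinary text content, and the others must be replaced in certain contexts. All five can be written with the predefined entity references listed below.
<    &lt;
>    &gt;
&    &amp;
"    &quot;
'    &apos;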
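In practice, the replacement is usually done by a library rather than by hand. This Python sketch uses the standard library's `xml.sax.saxutils.escape`, which handles &, < and > by default; the `QUOTES` mapping, the sample sentence, and the `quote` element are assumptions added for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() covers &, < and > on its own; the two quote characters are passed in.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = """It's "2 < 3" & "3 > 2"."""
escaped = escape(raw, QUOTES)

# The parser turns the entity references back into the original characters.
round_trip = ET.fromstring("<quote>" + escaped + "</quote>").text

print(escaped)
print(round_trip)
```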