Writing XML by Kevin Howard Goldberg

Writing XML

XML documents, like HTML documents, are composed of tags and data. One big difference between the two, however, is that the tags used by an XML document are created by the author. Another big difference is that an XML document only stores and describes its data; it doesn't do anything more with the data, such as display it, as an HTML document does.

<?xml version="1.0"?>
<wonder>
 <name>Colossus of Rhodes</name>
 <location>Rhodes, Greece</location>
 <height units="feet">107</height>
</wonder>
Figure 1.1 An XML document describing one of the Seven Wonders of the World: the Colossus of Rhodes. The document contains the name of the wonder, as well as its location and its height in feet.

XML documents should be rather self-explanatory in that the tags should describe the data they contain (Figure 1.1).

The first line of the XML document is the XML declaration, which notes which version of XML you are using. The next line begins the data part of the document and is called the root element. In an XML document, there can be only one root element.

The next three lines are called child elements, and they describe the root element in more detail.

The last child element, height, contains an attribute called units, which stores the specific units of the height measurement. Attributes are used to add information to an element without adding text to the element itself.

Finally, the XML document ends with the closing tag of the root element, </wonder>.

This is a complete and valid XML document. Nothing more needs to be written, added, annotated, or complicated. Period.
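
The structure described above can be inspected with any XML parser. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree module (neither is part of the original text); it reads the document from Figure 1.1 and prints its root element, child elements, and the units attribute.

```python
import xml.etree.ElementTree as ET

# The document from Figure 1.1, including the <wonder> root element.
DOCUMENT = """<?xml version="1.0"?>
<wonder>
  <name>Colossus of Rhodes</name>
  <location>Rhodes, Greece</location>
  <height units="feet">107</height>
</wonder>"""

root = ET.fromstring(DOCUMENT)

# The root element contains every other element.
print(root.tag)  # wonder

# Each child element describes the root element in more detail.
for child in root:
    print(child.tag, "->", child.text)

# The units attribute lives in the opening tag, not in the element's text.
height = root.find("height")
print(height.text, height.attrib)  # 107 {'units': 'feet'}
```

Note that the parser only reports what the document stores; it is up to the program reading the data to decide what to do with it.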

Rules for Writing XML

XML has a structure that is extremely regular and predictable. It is defined by a set of rules, the most important of which are described below. If your document satisfies these rules, it is considered well-formed. Once a document is considered well-formed, it can be used in many, many ways.

<?xml version="1.0"?>
<wonder>
 <name>Colossus of Rhodes</name>
</wonder>
Figure 1.3 In a well-formed XML document, there must be one element (wonder) that contains all other elements. This is called the root element. The first line of an XML document is an exception because it's a processing instruction and not part of the XML data.

A root element is required. Every XML document must contain one, and only one, root element. This root element contains all the other elements in the document. The only pieces of XML allowed outside (preceding) the root element are comments and processing instructions (Figure 1.3).
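
One way to see the root element rule in action is to hand a parser a document with two top-level elements. This sketch assumes Python's standard-library xml.etree.ElementTree; the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# One root element (wonder) that contains everything else.
one_root = '<?xml version="1.0"?><wonder><name>Colossus of Rhodes</name></wonder>'

# Two top-level elements and no single root: rejected by the parser.
two_roots = ('<?xml version="1.0"?><name>Colossus of Rhodes</name>'
             '<location>Rhodes, Greece</location>')

# A comment may precede the root element.
comment_first = '<?xml version="1.0"?><!-- Seven Wonders --><wonder/>'

print(is_well_formed(one_root))       # True
print(is_well_formed(two_roots))      # False
print(is_well_formed(comment_first))  # True
```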

<?xml version="1.0"?>
<wonder>
 <name>Colossus of Rhodes</name>
 <main_image file="colossus.jpg"/>
</wonder>
Figure 1.4 Every element must be enclosed by matching tags such as the name element. Empty elements like main_image can have an all-in-one opening and closing tag with a final slash. Notice that all elements are properly nested; that is, none are overlapping.

Closing tags are required. Every element must have a closing tag. Empty elements can use a separate closing tag, or an all-in-one opening and closing tag with a slash before the final > (Figure 1.4).

Elements must be properly nested. If you start element A, then start element B, you must first close element B before closing element A (Figure 1.4).
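
The closing-tag and nesting rules can be checked the same way. This is a sketch under the same assumption (Python's standard-library xml.etree.ElementTree, with a made-up helper named is_well_formed); the broken documents are variations on Figure 1.4 invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Every element is closed and properly nested.
good = ('<wonder><name>Colossus of Rhodes</name>'
        '<main_image file="colossus.jpg"/></wonder>')

# The name element is never closed.
missing_close = '<wonder><name>Colossus of Rhodes</wonder>'

# wonder is closed before name: the elements overlap.
overlapping = '<wonder><name>Colossus of Rhodes</wonder></name>'

print(is_well_formed(good))           # True
print(is_well_formed(missing_close))  # False
print(is_well_formed(overlapping))    # False

# The two ways of writing an empty element describe the same element.
short_form = ET.fromstring('<main_image file="colossus.jpg"/>')
long_form = ET.fromstring('<main_image file="colossus.jpg"></main_image>')
same_element = (short_form.tag, short_form.attrib) == (long_form.tag, long_form.attrib)
print(same_element)  # True
```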

<name>Colossus of Rhodes</name>
<Name>Colossus of Rhodes</Name>

<name>Colossus of Rhodes</Name>
Figure 1.5 The top example is valid XML, though it may be confusing. The two elements (name and Name) are actually considered completely different and independent. The bottom example is incorrect since the opening and closing tags do not match.

Case matters. XML is case sensitive. Elements named wonder, WONDER, and Wonder are considered entirely separate and unrelated to each other (Figure 1.5).

<main_image file="colossus.jpg"/>
Figure 1.6 The quotation marks are required. They can be single or double, as long as they match each other. Note that the value of the file attribute doesn't necessarily refer to an image; it could just as easily say "The picture from last summer's vacation".

Values must be enclosed in quotation marks. An attribute's value must always be enclosed in either matching single or double quotation marks (Figure 1.6).
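
The case-sensitivity and quotation-mark rules can also be tried against a parser. As before, this is only a sketch that assumes Python's standard-library xml.etree.ElementTree; is_well_formed is a made-up helper, and the wonder wrapper around the name and Name elements is added here so that the example has a single root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# name and Name are two different elements (top of Figure 1.5).
mixed = ET.fromstring('<wonder><name>Colossus of Rhodes</name>'
                      '<Name>Colossus of Rhodes</Name></wonder>')
tags = [child.tag for child in mixed]
print(tags)  # ['name', 'Name']

# An opening tag and a closing tag that differ in case do not match.
print(is_well_formed('<name>Colossus of Rhodes</Name>'))  # False

# Attribute values need matching single or double quotation marks.
double_quotes = is_well_formed('<main_image file="colossus.jpg"/>')
single_quotes = is_well_formed("<main_image file='colossus.jpg'/>")
no_quotes = is_well_formed('<main_image file=colossus.jpg/>')
mismatched = is_well_formed('<main_image file="colossus.jpg\'/>')
print(double_quotes, single_quotes, no_quotes, mismatched)  # True True False False
```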

Elements, Attributes, and Values

Figure 1.7 A typical element is comprised of an opening tag, content, and a closing tag. This height element contains text.

XML uses the same building blocks as HTML: tags that define elements, values of those elements, and attributes. An XML element is the most basic unit of your document. It can contain text, attributes, and other elements. An element has an opening tag with a name written between less-than (<) and greater-than (>) signs (Figure 1.7). The name, which you invent yourself, should describe the element's purpose and, in particular, its contents. An element is generally concluded with a closing tag, composed of the same name preceded by a forward slash and enclosed in the familiar less-than and greater-than signs. The exception is an empty element, which may be "self-closing."

Figure 1.8 The height element now has an attribute called units whose value is feet. Notice that the word feet isn't part of the height element's content. This doesn't make the value of height equal to 107 feet. Rather, the units attribute describes the content of the height element.

Elements may have attributes. Attributes, which are contained within an element's opening tag, have quotation-mark delimited values that further describe the purpose and content (if any) of the particular element (Figure 1.8). Information contained in an attribute is generally considered metadata; that is, information about the data in the element, as opposed to the data itself. An element can have as many attributes as desired, as long as each has a unique name.
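
The difference between an element's content and its attributes shows up clearly when the height element is read by a parser. This sketch assumes Python's standard-library xml.etree.ElementTree; the source attribute in the second example is hypothetical and does not appear in the original figures.

```python
import xml.etree.ElementTree as ET

# The height element from Figure 1.8.
height = ET.fromstring('<height units="feet">107</height>')

# The content is 107; the word feet is metadata about that content.
print(height.text)          # 107
print(height.get("units"))  # feet

# An element may carry several attributes, as long as the names differ.
# (source is a hypothetical attribute, used only for illustration.)
several = ET.fromstring('<height units="feet" source="estimate">107</height>')
print(sorted(several.attrib))  # ['source', 'units']

# Repeating an attribute name in one opening tag is rejected.
try:
    ET.fromstring('<height units="feet" units="meters">107</height>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False
print(duplicate_accepted)  # False
```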

About the Author

Kevin Howard Goldberg has been working with computers since 1976, when he taught himself BASIC on his elementary school's PDP 11/70. Since then, Kevin's career has included management consulting and lead software development; in his current capacity, he runs technology operations for a world-class Internet Strategy, Marketing and Development company. Kevin holds a bachelor's degree in Economics and Entrepreneurial Management from the Wharton School of Business at the University of Pennsylvania, and is a candidate for a master's degree in Computer Science at the University of California, Los Angeles.

Reader J. Hatch says, "This book is a great way to get started if you have not done a lot of HTML and are looking for a way to get your feet wet with XML and its interactions with HTML. It does a great job of step-by-step leading you through most of the basic concepts used in XML. If you don't know much about XML and want to get started and need a quick reference for figuring out what is going on in an XML file, this is a great starting point. It is compact and the approach is to explain something minimally and then show an example. There are example files to download that the book references. To get the most out of this book, you will have to download the examples."
