XML Namespaces Made Simple

XML (Extensible Markup Language) is a very powerful language because it can be used to create databases that are well formed and human readable. But if you're going to use your XML file for more than just your own local purposes and exchange it with other organizations, you need a way to validate the XML.

The Document Type Definition (DTD) language was created for that purpose. However, DTDs have limited support for datatypes. For example, a DTD has no way to specify that a value is numeric; it has only PCDATA (Parsed Character DATA). DTDs also have no notation for namespaces. For these reasons, the W3C released the XML Schema 1.0 specification on 2 May 2001.

Namespaces are a source of much complexity and confusion in XML. Why do we need namespaces? Let's consider the XML tag <name>. Maybe it refers to a person's name, maybe the name of a book, maybe a product name. If you use your XML file for just your own local purposes, you know what it refers to, but if you're going to exchange your XML file with other organizations, you need to be more specific. A namespace is a set of names in which all names are unique. We could use the following system.

employee.name
book.name
product.name

The part before the dot would be the namespace. This clears things up a little, but what if several organizations have XML files using the <employee.name>, <book.name>, or <product.name> XML tag? As you can see, these names are still not unique enough. The W3C Namespaces in XML recommendation requires XML namespace identifiers to conform to a specific syntax: the syntax for Uniform Resource Identifier (URI) references.

A URI is a string of characters for identifying an abstract or physical resource. In most situations, URI references are used to identify physical resources (Web pages, files to download, and so on), but in the case of XML namespaces, URI references identify abstract resources, specifically, namespaces.

There are two forms of URI: Uniform Resource Locators (URL) and Uniform Resource Names (URN). Either type of URI may be used as a namespace identifier.

Shown below are examples of URLs that could be used as namespace identifiers:

http://www.bucarotechelp.com/staff/
http://www.ed.gov/elementary/students

Shown below are examples of URNs that could be used as namespace identifiers:

urn:www-bucarotechelp-com:student
urn:www.ed.gov:elementary.students
urn:uuid:F8F7B313-EF05-E33-4A6C-DB49A6CF88C4

Because of their specificity and length, URLs can be assumed to be unique identifiers. But to guarantee uniqueness you would use a URN. Authors must register their URN namespace identifier with an Internet naming authority.
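As an illustration of the urn:uuid: form listed above, the following minimal Python sketch generates such an identifier with the standard uuid module. The variable name namespace_id is hypothetical, and the value is random, so it differs on every run.

```python
import uuid

# Generate a random (version 4) UUID and render it in URN form.
# "namespace_id" is a hypothetical name used only for this illustration.
namespace_id = uuid.uuid4().urn

# The result has the shape urn:uuid: followed by hex digits in
# 8-4-4-4-12 groups; the digits themselves are different on every run.
print(namespace_id)
```

The generated string can then be pasted into a namespace declaration like any other identifier.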

Using Namespaces

XML processors treat namespace identifiers as opaque strings and never actually attempt to access the resource identified by the URI. So it really doesn't matter what you use as a URI, as long as it is specific enough to be unique. However, most XML namespaces are defined in formal specifications that describe the names of elements and attributes along with their semantics.
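The string-only treatment of namespace identifiers can be seen with a small Python sketch using the standard xml.etree.ElementTree module. The example.org URIs are placeholders chosen for this illustration; nothing is fetched from them.

```python
import xml.etree.ElementTree as ET

# Two documents whose namespace identifiers differ only by a trailing slash.
# The example.org URIs are placeholders and are never accessed.
doc_a = ET.fromstring('<item xmlns="http://example.org/catalog"/>')
doc_b = ET.fromstring('<item xmlns="http://example.org/catalog/"/>')

# ElementTree reports each name as "{namespace identifier}local name".
print(doc_a.tag)  # {http://example.org/catalog}item
print(doc_b.tag)  # {http://example.org/catalog/}item

# The identifiers are compared character by character, so these are
# two different names even though the URLs look equivalent to a browser.
print(doc_a.tag == doc_b.tag)  # False
```

Because the comparison is purely textual, it pays to copy a namespace identifier exactly, including case and trailing slashes.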

The XML Schema working group (http://www.w3.org/XML/Schema) has put together a specification (XML Schema) that defines an XML-based syntax for defining elements, attributes, and types in a namespace.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
   targetNamespace="http://www.bucarotechelp.com/student"
   elementFormDefault="qualified">
   <element name="student">
      <complexType>
         <sequence>
            <element name="id" type="long"/>
            <element name="name" type="string"/>
            <element name="language" type="string"/>
            <element name="rating" type="double"/>
         </sequence>
      </complexType>
   </element>
</schema>

The example schema shown above defines the namespace http://www.bucarotechelp.com/student as containing five named elements: student, id, name, language, and rating. In addition to providing a namespace, this schema also provides metadata, such as the order of student child elements and their datatypes.

<d:student xmlns:d="http://www.bucarotechelp.com/student">
  <d:id>3235329</d:id>
  <d:name>Jeff Smith</d:name>
  <d:language>C#</d:language>
  <d:rating>9.5</d:rating>
</d:student>

A namespace identifier (URI) needs to be quite long to be unique. In the XML document shown above we use the namespace prefix d: to qualify the local names of elements and attributes. A prefix is just an abbreviation for the namespace identifier. The prefix is first mapped to the namespace identifier through a namespace declaration. The syntax for a namespace declaration is:

xmlns:<prefix>="<namespace identifier>"

A namespace declaration looks like an attribute on an element, but namespace-aware processors generally treat it as a declaration rather than as one of the element's ordinary attributes. A namespace prefix is in scope on the element that declares it as well as on that element's descendants.

Once declared, the prefix can be used in front of any element or attribute name (separated by a colon), e.g. d:student. This complete name including the prefix is a QName (qualified name).

QName = <prefix>:<local name>

The names of both elements and attributes are therefore made up of two parts: a namespace name, supplied through the prefix, and a local name. Such a two-part name is known as a qualified name or QName. The prefix associates the element or attribute with the namespace identifier that is mapped to that prefix in the current scope.
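Since a prefix is only an abbreviation, two documents that use different prefixes for the same namespace identifier contain the same names. The Python sketch below illustrates this with the standard xml.etree.ElementTree module; the x: and s: prefixes and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

STUDENT_NS = "http://www.bucarotechelp.com/student"

# The same content written with two different prefixes (d: and x:).
with_d = ET.fromstring(
    f'<d:student xmlns:d="{STUDENT_NS}"><d:id>3235329</d:id></d:student>')
with_x = ET.fromstring(
    f'<x:student xmlns:x="{STUDENT_NS}"><x:id>3235329</x:id></x:student>')

# ElementTree replaces each prefix with the namespace identifier it maps to.
print(with_d.tag)                      # {http://www.bucarotechelp.com/student}student
print(with_d.tag == with_x.tag)        # True
print(with_d[0].tag == with_x[0].tag)  # True

# A query can use any prefix of its own, as long as it maps to the same
# identifier; "s" here is chosen by the caller, not taken from the document.
student_id = with_x.find("s:id", {"s": STUDENT_NS}).text
print(student_id)                      # 3235329
```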

When documents use elements or attributes from more than one namespace, it's common to have multiple namespace declarations, as shown below:

<d:student xmlns:d="http://www.bucarotechelp.com/student"
  xmlns:i="urn:schemas-develop-com:identifiers"
  xmlns:p="urn:schemas-develop-com:programming-languages">
  <i:id>3235329</i:id>
  <name>Jeff Smith</name>
  <p:language>C#</p:language>
  <d:rating>9.5</d:rating>
</d:student>
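One way to check which namespace each element in a document like the one above resolves to is to parse it and split the expanded names. This is a minimal Python sketch with the standard xml.etree.ElementTree module; the helper split_name is hypothetical and written only for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<d:student xmlns:d="http://www.bucarotechelp.com/student"
  xmlns:i="urn:schemas-develop-com:identifiers"
  xmlns:p="urn:schemas-develop-com:programming-languages">
  <i:id>3235329</i:id>
  <name>Jeff Smith</name>
  <p:language>C#</p:language>
  <d:rating>9.5</d:rating>
</d:student>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag  # the element is not in any namespace


root = ET.fromstring(doc)

# iter() visits the root element first, then its descendants in document order.
names = [split_name(element.tag) for element in root.iter()]
for namespace, local in names:
    print(f"{local}: {namespace}")
```

The unprefixed name element is reported with None, because no default namespace is declared in this document.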

Here, student and rating are from the same namespace, while id and language each come from a different namespace of their own, and name doesn't belong to a namespace. Namespace prefixes can be overridden by re-declaring the prefix at a nested scope, as shown below:

<d:student xmlns:d="http://www.bucarotechelp.com/student">
  <d:id>3235329</d:id> 
  <d:name xmlns:d="urn:names-r-us">Jeff Smith</d:name>
  <d:language>C#</d:language>
  <d:rating>35</d:rating>
</d:student>

In the above example, everything is from the http://www.bucarotechelp.com/student namespace except for the name element, which is from the urn:names-r-us namespace.
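The scoping rule can be confirmed by parsing the document. In this minimal Python sketch (standard xml.etree.ElementTree; the variable names are assumptions), the re-declared prefix applies only to the name element, and the outer mapping is back in effect for the elements that follow it.

```python
import xml.etree.ElementTree as ET

doc = """<d:student xmlns:d="http://www.bucarotechelp.com/student">
  <d:id>3235329</d:id>
  <d:name xmlns:d="urn:names-r-us">Jeff Smith</d:name>
  <d:language>C#</d:language>
  <d:rating>35</d:rating>
</d:student>"""

root = ET.fromstring(doc)

# Collect the expanded name of each child of the student element.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
# {http://www.bucarotechelp.com/student}id
# {urn:names-r-us}name
# {http://www.bucarotechelp.com/student}language
# {http://www.bucarotechelp.com/student}rating
```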

Default Namespaces

There is one more type of namespace declaration that can be used to associate namespace identifiers with element names. That is a default namespace declaration.

xmlns="<namespace identifier>"

Note that there is no prefix. When a default namespace declaration is used on an element, all unqualified element names within its scope are automatically associated with the specified namespace identifier. Default namespace declarations, however, have no effect on attributes. The only way to associate an attribute with a namespace identifier is through a prefix.

<d:student xmlns:d="http://www.bucarotechelp.com/student"
     xmlns="urn:foo" id="3235329">
  <name>Jeff Smith</name>
  <language xmlns="">C#</language>
  <rating>35</rating>
</d:student>

In the example shown above, student is from the http://www.bucarotechelp.com/student namespace while name and rating are from the default namespace urn:foo. The id attribute doesn't belong to a namespace since attributes aren't automatically associated with the default namespace identifier.

The example shown above also illustrates that you can un-declare a default namespace by setting the default namespace identifier back to an empty string, as in the language element (in Namespaces in XML 1.0 you can't do this with prefix declarations). As a result, the language element also doesn't belong to a namespace.
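The behavior described for the default namespace example can be checked with a short Python sketch using the standard xml.etree.ElementTree module. The variable names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<d:student xmlns:d="http://www.bucarotechelp.com/student"
     xmlns="urn:foo" id="3235329">
  <name>Jeff Smith</name>
  <language xmlns="">C#</language>
  <rating>35</rating>
</d:student>"""

root = ET.fromstring(doc)

# The prefixed root element takes the namespace mapped to d:.
print(root.tag)     # {http://www.bucarotechelp.com/student}student

# The unprefixed id attribute is not affected by the default namespace.
print(root.attrib)  # {'id': '3235329'}

# name and rating pick up the default namespace urn:foo, while language
# un-declares it with xmlns="" and so has no namespace at all.
child_tags = [child.tag for child in root]
print(child_tags)   # ['{urn:foo}name', 'language', '{urn:foo}rating']
```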

The syntax for default namespaces was designed for convenience, but default namespaces tend to cause more confusion than they are worth. The confusion stems from the fact that elements and attributes are treated differently, and it's not immediately apparent that nested elements are being assigned the default namespace identifier. Choosing between prefixes and default namespaces is mostly a matter of style, except when attributes come into play.

Article Resource: Aaron Skonnard's Understanding XML Namespaces
