We've taken an informal approach to structure in our XML so
far. We've assumed an implicit DTD and enforced it through carefully written
code. In fact, we allowed a violation of an implicit DTD when the error didn't
change the meaning of the document. You may be wondering by now if this is such
a good idea. After all, haven't we said that the teams working on our
applications are only loosely connected? Shouldn't there be some way for them
to learn how a specific vocabulary is put together?
In fact, we're going to see that validation is but one use
for "data about data". We'll introduce metadata, see how we use it in
XML today, and look ahead to its future. We'll see that these future uses of
metadata can be more important to networked applications than validation.
Indeed, these uses will enable us to build applications with fewer assumptions
than we can today. In this chapter, you will learn the following:
What is metadata?
How is metadata used today?
A brief overview of W3C metadata proposals, including RDF, MCF, XML Data and DCD
How one metadata implementation can be used to dynamically generate HTML input forms
The W3C has a number of proposals before it dealing with
metadata in one form or another. All are on or beyond the cutting edge in terms
of use in production applications. For that reason alone, we must approach this
chapter as an experiment. The support for metadata in parsers is also spotty.
Nevertheless, we need to see how far we can take metadata in order to see why
we should care about it. Once we've seen what we can do with metadata, we can
begin to organize our applications so they may readily support whatever
metadata proposals are ultimately adopted.
What is Metadata?
We obey some structural rules whenever we write an XML document.
The XML specification itself provides a syntax, which must be followed in order
for the result to be considered well-formed XML. In addition, the vocabulary in
which the document is written imposes more rules. It tells us the names of
allowed tags and the attributes of those tags. It tells us the structure of our
documents by telling us what elements may contain. Metadata's role is in
telling us the rules of a vocabulary. A well-written vocabulary mirrors the
application domain. A vocabulary about banking will inherently teach a layman
something about the nature of banking. The syntactic rules of XML and the rules
of any particular vocabulary are thus data about
data. Philosophers long ago coined the term metadata to describe this.
The rules themselves say nothing in the vocabulary, but they tell us what may be said in it.
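To make the idea of rules as "data about data" concrete, here is a minimal sketch in Python. It is not how any particular parser validates documents. The banking-style element names, the RULES table, and the check function are all invented for illustration. The sketch only shows that the names of allowed tags, their attributes, and what each element may contain can themselves be written down as data and consulted by a program.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary rules written as plain data. Every name here is
# invented for illustration and belongs to no real vocabulary.
RULES = {
    "account": {"attributes": {"id"}, "children": {"owner", "balance"}},
    "owner": {"attributes": set(), "children": set()},
    "balance": {"attributes": {"currency"}, "children": set()},
}


def check(element, rules=RULES):
    """Return a list of rule violations found in element and its descendants."""
    rule = rules.get(element.tag)
    if rule is None:
        return [f"unknown element <{element.tag}>"]
    problems = []
    # Attributes must be among those the vocabulary allows for this element.
    for name in element.attrib:
        if name not in rule["attributes"]:
            problems.append(f"<{element.tag}> may not carry attribute '{name}'")
    # Child elements must be among those this element may contain.
    for child in element:
        if child.tag not in rule["children"]:
            problems.append(f"<{element.tag}> may not contain <{child.tag}>")
        else:
            problems.extend(check(child, rules))
    return problems


good = ET.fromstring(
    '<account id="42"><owner>Pat</owner>'
    '<balance currency="USD">10</balance></account>'
)
bad = ET.fromstring(
    '<account id="42"><owner nickname="P">Pat</owner><loan/></account>'
)

print(check(good))
print(check(bad))
```

The first document follows the rules and yields an empty list. The second is reported twice: once for the unexpected attribute and once for the unexpected child element. Note that the sketch ignores ordering and cardinality, which a real DTD would also express.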
Obviously, metadata is important if we want to validate an
XML document. We took an introductory look at DTDs in Chapter 3. Our chosen
parser, MSXML, became a validating parser with version 5.0 of Internet
Explorer, and we can use this feature to avoid syntactic errors in the data we
exchange. Simply adopt a resolution in your organization that all documents
must be valid, provide DTDs, and turn on the validation feature of MSXML. Apart
from a small performance penalty, however, we won't see much change in our
applications merely by enforcing the validity of our documents.
Metadata tells us about our vocabularies, so we should be
able to use it to discover how new vocabularies work. While that is a utopian
ideal, we'll see that we can make shrewd use of metadata in the service of our
third principle: services will be provided as self-describing data. XML
documents become truly self-describing when vocabulary metadata is available.
How Metadata is Used Today
The only metadata mechanism in the XML 1.0 recommendation is
the Document Type Definition (DTD). DTDs give us much of what we would
like to know regarding a document. They completely specify the structure of XML
documents. Elements and their attributes are declared, optional items are
noted, and so forth. DTDs are the only formally approved mechanism for
validating XML documents. They suffer from one great flaw, however: DTDs are
written using a syntax other than XML, so you can't use the XML DOM to parse and
traverse a DTD. Writing a parser for DTDs is clearly not impossible, since
every validating XML parser must include one. It is simply
annoying and inconvenient. As a result, there is great interest in replacing
today's DTDs with an XML vocabulary for describing metadata.
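The inconvenience can be seen in a small sketch. It uses Python's standard xml.dom.minidom as a stand-in DOM implementation, which is an assumption made for illustration and not the MSXML parser used in this book. The note vocabulary is likewise invented. The instance document comes back as a tree of nodes that we can walk, while the DTD's declarations come back as one unparsed string.

```python
from xml.dom import minidom

# A hypothetical document with an internal DTD subset.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Pat</to><body>Hello</body></note>"""

doc = minidom.parseString(DOCUMENT)

# The instance document is a tree we can traverse node by node.
element_names = [
    node.tagName
    for node in doc.documentElement.childNodes
    if node.nodeType == node.ELEMENT_NODE
]

# The DTD is reachable only as a single DocumentType node with no children.
doctype = doc.doctype
dtd_children = len(doctype.childNodes)

# The declarations themselves are exposed as one raw string, not as nodes.
dtd_text = doctype.internalSubset

print(element_names)
print(dtd_children)
print(dtd_text)
```

To learn from this DOM what a note element may contain, we would have to parse the text of the declarations ourselves. An XML vocabulary for metadata would let the same traversal code serve both the document and its description.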
Metadata is a broad topic within the W3C and is managed by
the W3C Metadata Activity. The W3C's interest in metadata extends to more than
just XML. One of the earliest efforts was PICS, the Platform for Internet
Content Selection, an initiative to build a mechanism for applying rating
labels to Web sites. Obviously, a rating scheme must be able to describe
content to some degree, so PICS came to be a way to create general rating
systems. Yet PICS was modest in scope; it attempted to describe only what could
be encoded in HTML pages. That restriction certainly simplifies the task, but it makes
PICS unsuitable for use as a general-purpose metadata language.
PICS also inspired other efforts. The broadest is the W3C's
Resource Description Framework (RDF). More recently, the XML community has
advanced more specialized proposals such as XML Data and the Document Content
Description (DCD) for XML. Another activity, XML Namespaces, is not precisely a
metadata activity, but as we shall see, it can provide us with interesting
information. Since XML Namespaces is a W3C recommendation and is
simpler than the proper metadata activities, let's begin there.
W3C documents are
prone to frequent changes in status. You can find a summary of current status