XML is a markup language designed to be both human-readable and machine-readable. No programming training or expertise is required to read or write XML documents. Anyone who can write an article or an essay is already qualified to write XML documents. I remember that in English writing courses, instructors would usually ask us to write an outline first. To a great extent, the outline of an essay is similar to an XML document that describes a resource. In an outline we identify the thesis of the essay, followed by the evidence or underlying assumptions for each argument. Because an outline is hierarchical, it can be converted into an XML document. The resulting XML document is not only human-readable but machine-readable, too.
Each XML document contains a hierarchy of XML “elements.” For example, a book’s table of contents (TOC) can easily be translated into an XML document. For a three-level TOC of chapters, sections, and subsections, each chapter can be represented by an XML element, each “chapter” element can contain a nested list of “section” elements, and so on.
Each XML “element” has a tag and can have several attributes. The tag is the category name of an XML element. For example, all XML elements describing chapters in a TOC can share the tag “CHAP.” An attribute is a name-value pair describing a property of an element. For example, we can assign a title, a start page, and an end page to a “CHAP” element. By reading element tags and attributes, a human being or a machine can quickly grasp the big picture and perform targeted searches.
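To make the TOC example concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. Only the “CHAP” tag and the idea of title, start page, and end page properties come from the discussion above. The “TOC,” “SECT,” and “SUBSECT” tags, the attribute names startPage and endPage, and the sample chapter titles are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# A small three-level table of contents: chapter > section > subsection.
# Tag and attribute names other than CHAP are hypothetical.
TOC_XML = """
<TOC>
  <CHAP title="Introduction" startPage="1" endPage="18">
    <SECT title="Background">
      <SUBSECT title="Early Markup Languages"/>
    </SECT>
    <SECT title="Scope"/>
  </CHAP>
  <CHAP title="Metadata Basics" startPage="19" endPage="42">
    <SECT title="Elements and Attributes"/>
  </CHAP>
</TOC>
""".strip()

root = ET.fromstring(TOC_XML)


def chapter_for_page(toc_root, page):
    """Return the title of the chapter whose page range contains `page`."""
    for chap in toc_root.iter("CHAP"):
        start = int(chap.get("startPage"))
        end = int(chap.get("endPage"))
        if start <= page <= end:
            return chap.get("title")
    return None


# The machine-readable side: list every chapter title, then look up a page.
chapter_titles = [chap.get("title") for chap in root.findall("CHAP")]
found = chapter_for_page(root, 25)
```

The same text that a person can read as an outline is searched here by a few lines of code, which is the point of tags and attributes.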
The next question is where tag names and attribute names come from. Somebody must have clearly defined them before people can fill in the contents of an XML document. Yes, that is exactly why librarians are responsible for designing metadata standards or schemas. Fortunately, XML provides a mechanism, called XML Schema, to specify tags and attributes. For example, the MARC standard can be described with XML Schema, and each “field” of a MARC record can be rendered as an XML element. Once the schema is well defined, a MARC record can be automatically translated into a corresponding XML document. There is no doubt that writing an XML schema requires some training in XML. However, anyone who can write a metadata schema in natural language can translate it into an XML schema in a straightforward way once he or she is adept at XML syntax.
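As a rough illustration of the automatic translation described above, the following Python sketch turns a simplified, made-up MARC-like record into XML. The element and attribute names (record, datafield, subfield, tag, ind1, ind2, code) are loosely modeled on MARCXML, but this is a simplification under my own assumptions, not the official schema. The sample record and the helper fields_to_xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record. Each entry is:
# (field tag, indicator 1, indicator 2, {subfield code: value})
SAMPLE_FIELDS = [
    ("100", "1", " ", {"a": "Doe, Jane"}),
    ("245", "1", "0", {"a": "An example title", "c": "Jane Doe"}),
]


def fields_to_xml(fields):
    """Render each field of the record as one XML element."""
    record = ET.Element("record")
    for tag, ind1, ind2, subfields in fields:
        datafield = ET.SubElement(
            record, "datafield", {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        for code, value in subfields.items():
            subfield = ET.SubElement(datafield, "subfield", {"code": code})
            subfield.text = value
    return record


record = fields_to_xml(SAMPLE_FIELDS)
xml_text = ET.tostring(record, encoding="unicode")

# Look up the title (field 245, subfield a) in the generated document.
title = record.find("datafield[@tag='245']/subfield[@code='a']").text
```

Once the mapping from fields to elements is agreed upon, the translation itself is mechanical, which is why the schema design, not the coding, is the real intellectual work.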
Due to the popularity of XML, many XML editors, parsers, and validators are available, both free and commercial. Life can be a little easier with the help of software tools. Even without such tools, XML is still human-readable, just as one can still write an article in Notepad without Microsoft Word.
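As a small example of what such tools do, the sketch below uses the parser in Python's standard library to check whether a piece of text is well-formed XML. The function name is_well_formed is my own, and this checks only well-formedness (properly nested and closed tags), not validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = is_well_formed("<CHAP title='Introduction'><SECT/></CHAP>")
bad = is_well_formed("<CHAP title='Introduction'><SECT></CHAP>")  # SECT is never closed
```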
Native XML falls short in describing a graph of data, despite its excellence with hierarchical structures. A tree-structured hierarchy is inadequate for describing complicated semantics. Even table structures are not sufficient in many scenarios. To fill this gap, the W3C proposed RDF (Resource Description Framework) to encode metadata and digital information. RDF is commonly written in an XML syntax (RDF/XML), so in practice it can be treated as a layer on top of native XML. It is even easier to understand and use. An RDF document is composed of tuples (nested or not), usually called triples. Each tuple describes an object. The three fields in a tuple must be an object identifier, an object property, and the value of that property. RDF is the foundation of the Semantic Web proposed by the W3C. I believe there will be more adoption by digital library communities. In fact, the Web Ontology Language (OWL) builds on the RDF data model. Other languages and vocabularies that use RDF include RSS 1.0, FOAF, and RDFa.
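The tuple idea can be sketched in plain Python without any RDF library. In the example below, the example.org identifiers, the property names (title, creator, name), and the helper match are all my own assumptions for illustration. Real RDF data would normally use an established vocabulary and a dedicated toolkit. Note that two books point to the same person, which is a graph relationship that a single tree cannot express without duplication.

```python
from collections import namedtuple

# One tuple = (object identifier, property, property value).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

EX = "http://example.org/"  # hypothetical namespace

triples = [
    Triple(EX + "book/1", EX + "title", "Metadata for Beginners"),
    Triple(EX + "book/1", EX + "creator", EX + "person/doe"),
    Triple(EX + "book/2", EX + "title", "Cataloging Today"),
    Triple(EX + "book/2", EX + "creator", EX + "person/doe"),
    Triple(EX + "person/doe", EX + "name", "Jane Doe"),
]


def match(data, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in data
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# Which books were created by person/doe?
books_by_doe = [
    t.subject for t in match(triples, predicate=EX + "creator", obj=EX + "person/doe")
]

# What is that person's name?
creator_name = match(triples, subject=EX + "person/doe", predicate=EX + "name")[0].object
```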
Since XML is not a programming language, librarians need not become coding experts in order to write metadata or design metadata schemas, even in native XML. XML is by design both human-readable and machine-readable. Many XML-based markup languages provide higher-level modeling structures that are even easier to use. This post, written in response to one of this week’s discussion topics, argues that XML is both learner-friendly and user-friendly, and hence is not a hindrance to creating metadata standards or populating metadata.