Specifications for an OLAC metadata display format and an OLAC-to-OAI_DC crosswalk

Date issued: 2003-02-15
Status of document: Draft Informational Note. This is only a preliminary draft that is still under development; it has not yet been presented to the whole community for review.
This version: http://www.language-archives.org/NOTE/olac_display-20030215.html
Latest version: http://www.language-archives.org/NOTE/olac_display.html
Previous version: http://www.language-archives.org/NOTE/olac_display-20020810.html
Abstract:

Specifies OLAC_Display, the OLAC metadata display format implemented by the OLAC Aggregator service. This format is a reader-friendly view of OLAC metadata that incorporates attribute values into the element content and translates coded values into display labels. The document further specifies the transformation from OLAC_Display format to OAI_DC format.

Editors: Gary Simons, SIL International (mailto:gary_simons@sil.org)
Changes since previous version:

Extensively revised to support the changes from the version 0.4 OLAC metadata standard to the 1.0 standard.

Copyright © 2003 Gary Simons (SIL International). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. Display format strategy
  3. Examples of OLAC_Display format
  4. An OLAC-to-OAI_DC crosswalk
References

1. Introduction

In order to improve recall and precision in searching, the OLAC metadata format [OLAC-Metadata] defines an extension method (involving the xsi:type and olac:code attributes) to support resource description using community-defined controlled vocabularies. Service providers may use these attributes to perform precise searches. However, service providers need to be able to display metadata records to users in a manner that shows all available information in an easy-to-read form. This means that information from these attributes must be combined with the element content to produce a display of all information pertaining to a metadata element. It also requires that coded attribute values (such as three-letter language codes) be translated into friendly display forms.

Transforming OLAC metadata records into such a display format is a non-trivial task that each service provider should not have to implement independently. Thus the OLAC Aggregator [OLACA] offers such a translation service. It supports a metadata format named OLAC_Display. When metadata are harvested using this metadata prefix, the content of any metadata element that uses an extension contains a reader-friendly view of the information expressed by means of the extension. For instance,

http://www.language-archives.org/cgi-bin/olaca2.pl?verb=GetRecord&metadataPrefix=olac&identifier=oai:ethnologue.com:AAA

will retrieve the metadata in OLAC format as specified in [OLAC-Metadata], whereas

http://www.language-archives.org/cgi-bin/olaca2.pl?verb=GetRecord&metadataPrefix=olac_display&identifier=oai:ethnologue.com:AAA

retrieves the same metadata record in the reader-friendly format specified in this document.
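The two requests differ only in the metadataPrefix argument. The following Python sketch builds such GetRecord URLs with the standard library. The helper name get_record_url is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode

# Aggregator endpoint used in the example requests above.
AGGREGATOR = "http://www.language-archives.org/cgi-bin/olaca2.pl"


def get_record_url(identifier: str, metadata_prefix: str) -> str:
    """Build an OAI GetRecord request URL for one record in one format."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Leave ':' unescaped so the OAI identifier reads as in the text;
    # other reserved characters are still percent-encoded.
    return AGGREGATOR + "?" + urlencode(query, safe=":")


if __name__ == "__main__":
    for prefix in ("olac", "olac_display"):
        print(get_record_url("oai:ethnologue.com:AAA", prefix))
```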

In order to participate in the wider Open Archives Initiative (OAI) community of service providers, OLAC data providers must also publish their metadata records in the Dublin Core format prescribed by the OAI [OAI_DC]. There is no need, however, for data providers to store their records in both formats: the information in the OAI_DC format is a subset of the information in the OLAC format, so an OAI_DC record can be derived automatically from an OLAC record. A program that transforms a metadata record from one format to another is conventionally called a "crosswalk"; see [Day2001] for other examples of crosswalks and pointers to discussions of crosswalking issues.

It turns out that implementing an OLAC-to-OAI_DC crosswalk involves the same kind of transformation of attribute values that is involved in generating the reader-friendly OLAC_Display format. The final section of this note describes the additional transformations performed by the OLAC Aggregator to achieve an OLAC-to-OAI_DC crosswalk. Besides documenting the transformation made by the community's centralized crosswalk, this note can serve as a specification for anyone who implements an OLAC-to-OAI_DC crosswalk in their own data provider.

2. Display format strategy

The XML schema that implements OLAC metadata uses five devices for recording information:

  1. The element name (whether of a basic DC element or of a refined element, e.g. <dc:date> and <dcterms:issued>)

  2. The value of a metadata element expressed as XML element content

  3. The identification (via the xsi:type attribute) of a metadata extension that more precisely defines the interpretation of the element

  4. A special value (in the olac:code attribute) that is associated with a metadata extension

  5. The language of the element content (expressed in the xml:lang attribute)

A straightforward display of OLAC metadata that shows only the element tag and the element content includes only items 1 and 2. But the attribute values in items 3 and 4 are critical as well, since they qualify and add to the meaning of the element value. Only item 5 can be ignored in producing the display form of a metadata element.
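As an illustration of the five devices, the Python sketch below pulls them out of a single parsed element with xml.etree.ElementTree and then drops xml:lang for display purposes. The OLAC namespace URI and the function names devices and display_inputs are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs; the OLAC 1.0 URI is an assumption, not taken from this note.
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
OLAC = "http://www.language-archives.org/OLAC/1.0/"
XML = "http://www.w3.org/XML/1998/namespace"


def devices(elem: ET.Element) -> dict:
    """Return the five information-bearing devices of one metadata element."""
    return {
        "name": elem.tag,                        # 1. element name ({ns}local form)
        "content": (elem.text or "").strip(),    # 2. element content
        "type": elem.get(f"{{{XSI}}}type"),      # 3. xsi:type extension
        "code": elem.get(f"{{{OLAC}}}code"),     # 4. olac:code value
        "lang": elem.get(f"{{{XML}}}lang"),      # 5. xml:lang
    }


def display_inputs(elem: ET.Element) -> dict:
    """Devices 1-4 only: xml:lang plays no part in the display form."""
    return {k: v for k, v in devices(elem).items() if k != "lang"}


SAMPLE = f"""<dc:creator xmlns:dc="{DC}" xmlns:xsi="{XSI}" xmlns:olac="{OLAC}"
    xsi:type="olac:role" olac:code="editor" xml:lang="en">Sapir, Edward</dc:creator>"""

if __name__ == "__main__":
    print(display_inputs(ET.fromstring(SAMPLE)))
```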

It is not sufficient, however, to copy the attribute values verbatim into the element content. The attribute values are typically coded values, so the display form must translate them into display labels. Furthermore, there should be standard display templates that use punctuation consistently to set off the various pieces of information. The templates given below are based on the following schematic input:

<element xsi:type="T" olac:code="C">Content</element>

There are two display templates, depending on the nature of the olac:code attribute. In some extensions (such as olac:language), the code attribute is primary in that it gives a precise value for the metadata element. The element content, if used, serves as an "escape hatch": it supplies an arbitrary value when no appropriate coded value is available, or it adds further details. An element with this kind of extension uses the following template for its display form:

Label-for-T: Label-for-C, Content

On the other hand, in other extensions (such as olac:role), the element content is the value of the metadata element and the code attribute is secondary since it only qualifies the content in some way. An element with this kind of extension uses the following template for its display form:

Label-for-T: Content (Label-for-C)
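The two templates can be sketched as a single Python function. The function name display_form is hypothetical, as is the rule that a missing code label or missing content is omitted together with its punctuation. The sketch also assumes, following the extension table in section 3, that an extension label carries its own trailing colon, so that a NULL label contributes no prefix.

```python
from typing import Optional


def display_form(type_label: Optional[str], code_label: Optional[str],
                 content: str, primary: bool) -> str:
    """Fill the primary or secondary display template for one element.

    primary:   Label-for-T: Label-for-C, Content
    secondary: Label-for-T: Content (Label-for-C)
    """
    if primary:
        # Code label first; content, if any, follows after a comma.
        value = ", ".join(part for part in (code_label, content) if part)
    elif code_label:
        # Content first; the code label qualifies it in parentheses.
        value = f"{content} ({code_label})".strip()
    else:
        value = content
    # Assumption: the extension label includes its colon (e.g. "Operating
    # system:"), so a NULL label leaves the value with no prefix at all.
    return f"{type_label} {value}" if type_label else value


if __name__ == "__main__":
    print(display_form(None, "Yemba", "Dschang", primary=True))
    print(display_form(None, "editor", "Sapir, Edward", primary=False))
    print(display_form("Operating system:", "Windows 2000", "", primary=True))
```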

When an extension is documented (see section 6 of [OLAC-Metadata]), two pieces of information are defined that allow the OLAC Aggregator to correctly generate the OLAC_Display format:

Label:      The display label for the extension
CodeStatus: Whether the code is primary or secondary

The display labels for the code values are specified in olac:label attributes in the XML schema that enumerates the list of possible code values.
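A sketch of how a service might collect those display labels, assuming a code-list schema in which each xs:enumeration carries an olac:label attribute. The schema fragment, its type name, the second code value, the OLAC namespace URI, and the function name code_labels are all hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
OLAC = "http://www.language-archives.org/OLAC/1.0/"  # assumed namespace URI

# Hypothetical code-list schema for a software:os extension.
SCHEMA = f"""<xs:schema xmlns:xs="{XS}" xmlns:olac="{OLAC}">
  <xs:simpleType name="os-code">
    <xs:restriction base="xs:string">
      <xs:enumeration value="win2k" olac:label="Windows 2000"/>
      <xs:enumeration value="win98" olac:label="Windows 98"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def code_labels(schema_text: str) -> dict:
    """Map each enumerated code value to the display label in olac:label."""
    root = ET.fromstring(schema_text)
    labels = {}
    for enum in root.iter(f"{{{XS}}}enumeration"):
        value = enum.get("value")
        # Assumption: a code with no olac:label is displayed as the raw code.
        labels[value] = enum.get(f"{{{OLAC}}}label", value)
    return labels


if __name__ == "__main__":
    print(code_labels(SCHEMA))
```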

The OLAC_Display format provides the extension information both as attribute values and as display strings incorporated into the element content. Thus, the schema for the olac_display metadata format supported by the OLAC Aggregator is identical to the schema for the olac metadata format. In this way, services that harvest records from the OLAC Aggregator in OLAC_Display format can still use the attribute values to support high recall and precision in queries, while also having all the attribute information available in reader-friendly form within the element content.

3. Examples of OLAC_Display format

This section illustrates a number of OLAC metadata elements and their equivalents in OLAC_Display format. The examples use the following extensions with the two required parameters defined as follows:

olac:language
    Label:      NULL
    CodeStatus: primary
olac:role
    Label:      NULL
    CodeStatus: secondary
software:os
    Label:      Operating system:
    CodeStatus: primary

For instance, consider the following metadata elements in OLAC format:

<dc:language xsi:type="olac:language" olac:code="x-sil-ban"/>
<dc:language xsi:type="olac:language" olac:code="x-sil-ban">Dschang</dc:language>
<dc:creator xsi:type="olac:role" olac:code="editor">Sapir, Edward</dc:creator>
<dcterms:requires xsi:type="software:os">Windows 95 or higher</dcterms:requires>
<dcterms:requires xsi:type="software:os" olac:code="win2k"/>

These have the following equivalents in OLAC_Display format:

<dc:language xsi:type="olac:language" olac:code="x-sil-ban">Yemba</dc:language>
<dc:language xsi:type="olac:language" olac:code="x-sil-ban">Yemba, Dschang</dc:language>
<dc:creator xsi:type="olac:role" olac:code="editor">Sapir, Edward (editor)</dc:creator>
<dcterms:requires xsi:type="software:os">Operating system: Windows 95 or higher</dcterms:requires>
<dcterms:requires xsi:type="software:os" olac:code="win2k">Operating system: Windows 2000</dcterms:requires>
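The conversion in these examples can be reproduced with a short Python sketch that combines the extension parameters listed in this section with the code labels seen in the examples. It rewrites only the element content and leaves the attributes untouched, which is what allows the OLAC_Display schema to stay identical to the OLAC schema. The wrapper element, the OLAC namespace URI, and the function name to_display are assumptions; a real service would read code labels from each extension's schema rather than from a hard-coded table.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
OLAC = "http://www.language-archives.org/OLAC/1.0/"  # assumed namespace URI

# The two display parameters for each extension, as in the table above.
EXTENSIONS = {
    "olac:language": {"label": None, "primary": True},
    "olac:role": {"label": None, "primary": False},
    "software:os": {"label": "Operating system:", "primary": True},
}

# Code labels as they appear in the examples (hard-coded for illustration).
CODE_LABELS = {
    "olac:language": {"x-sil-ban": "Yemba"},
    "olac:role": {"editor": "editor"},
    "software:os": {"win2k": "Windows 2000"},
}


def to_display(elem: ET.Element) -> None:
    """Rewrite elem.text in OLAC_Display form; attributes are left intact."""
    ext = elem.get(f"{{{XSI}}}type")
    if ext not in EXTENSIONS:
        return  # no documented extension, so the content stays as it is
    params = EXTENSIONS[ext]
    code = elem.get(f"{{{OLAC}}}code")
    code_label = CODE_LABELS[ext].get(code, code) if code else None
    content = (elem.text or "").strip()
    if params["primary"]:
        value = ", ".join(p for p in (code_label, content) if p)
    else:
        value = f"{content} ({code_label})".strip() if code_label else content
    label = params["label"]
    elem.text = f"{label} {value}" if label else value


# The five OLAC elements from this section, in a hypothetical wrapper.
RECORD = f"""<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="{XSI}" xmlns:olac="{OLAC}">
  <dc:language xsi:type="olac:language" olac:code="x-sil-ban"/>
  <dc:language xsi:type="olac:language" olac:code="x-sil-ban">Dschang</dc:language>
  <dc:creator xsi:type="olac:role" olac:code="editor">Sapir, Edward</dc:creator>
  <dcterms:requires xsi:type="software:os">Windows 95 or higher</dcterms:requires>
  <dcterms:requires xsi:type="software:os" olac:code="win2k"/>
</metadata>"""

if __name__ == "__main__":
    root = ET.fromstring(RECORD)
    for elem in root:
        to_display(elem)
        print(elem.text)
```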

4. An OLAC-to-OAI_DC crosswalk

The OLAC Aggregator also supports the OAI_DC metadata format. It functions as an OLAC-to-OAI_DC crosswalk since it harvests only OLAC metadata and performs the transformation to OAI_DC upon request. Transforming a metadata record from OLAC format to OLAC_Display format goes most of the way toward implementing the OLAC-to-OAI_DC crosswalk. Three further changes are made to transform an OLAC_Display element to an OAI_DC element:

  1. Remove all the attributes.

    This can be done without loss of information since the information in the attributes is already incorporated into the element content.

  2. "Dumb-down" refined element names to their unqualified equivalents.

    For each element from the dcterms namespace, the tag name is converted to the tag name for the corresponding unqualified element from the dc namespace (as defined in [DC-Q]).

  3. Preserve the refined element names in the element content.

    For each element that has been "dumbed-down", append the local name of the original refined element (for example, requires) in square brackets to the element content.
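The three steps can be sketched in Python as follows. The dumb-down table covers only a subset of the refinements in [DC-Q], and the choice to return None for elements it cannot map (rather than raising an error) is an assumption of this sketch, as are the function name to_oai_dc, the wrapper element, and the OLAC namespace URI in the sample.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Partial table of refinements and the dc elements they refine, after [DC-Q].
DUMB_DOWN = {
    "alternative": "title",
    "abstract": "description", "tableOfContents": "description",
    "created": "date", "issued": "date", "modified": "date",
    "extent": "format", "medium": "format",
    "isPartOf": "relation", "hasPart": "relation", "references": "relation",
    "requires": "relation", "conformsTo": "relation",
    "spatial": "coverage", "temporal": "coverage",
}


def to_oai_dc(elem: ET.Element) -> Optional[ET.Element]:
    """Apply the three crosswalk steps to one OLAC_Display element.

    Returns None for an element this sketch cannot map to a dc element.
    """
    ns, _, local = elem.tag.rpartition("}")
    ns = ns.lstrip("{")
    text = elem.text or ""
    if ns == DCTERMS and local in DUMB_DOWN:
        text = f"{text} [{local}]".strip()  # step 3: keep the refined name
        local = DUMB_DOWN[local]            # step 2: dumb down the tag
    elif ns != DC:
        return None
    out = ET.Element(f"{{{DC}}}{local}")    # step 1: no attributes are copied
    out.text = text
    return out


DISPLAY = f"""<metadata xmlns:dc="{DC}" xmlns:dcterms="{DCTERMS}"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:olac="http://www.language-archives.org/OLAC/1.0/">
  <dc:language xsi:type="olac:language" olac:code="x-sil-ban">Yemba</dc:language>
  <dc:language xsi:type="olac:language" olac:code="x-sil-ban">Yemba, Dschang</dc:language>
  <dc:creator xsi:type="olac:role" olac:code="editor">Sapir, Edward (editor)</dc:creator>
  <dcterms:requires xsi:type="software:os">Operating system: Windows 95 or higher</dcterms:requires>
  <dcterms:requires xsi:type="software:os" olac:code="win2k">Operating system: Windows 2000</dcterms:requires>
</metadata>"""

if __name__ == "__main__":
    ET.register_namespace("dc", DC)
    for elem in ET.fromstring(DISPLAY):
        out = to_oai_dc(elem)
        if out is not None:
            print(ET.tostring(out, encoding="unicode"))
```

Applied to the five OLAC_Display elements of section 3, the sketch should produce the same element names and content as the OAI_DC example that follows.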

For instance, the five sample metadata elements from the preceding section end up as follows when transformed to OAI_DC format:

<dc:language>Yemba</dc:language>
<dc:language>Yemba, Dschang</dc:language>
<dc:creator>Sapir, Edward (editor)</dc:creator>
<dc:relation>Operating system: Windows 95 or higher [requires]</dc:relation>
<dc:relation>Operating system: Windows 2000 [requires]</dc:relation>

To do

Section 6 of [OLAC-Metadata] and the schema for documenting an extension need to add the two pieces of information for controlling the display format: the extension label and whether the code is primary or secondary.

The second-to-last paragraph of section 2 says that olac:label is used in the XML schema for the extension to give the display labels for the codes. This isn't true yet.


References

[DC-Q] Dublin Core Qualifiers.
<http://dublincore.org/documents/dcmes-qualifiers/>
[Day2001] Day, Michael. Mapping between metadata formats. UK Office for Library and Information Networking.
<http://www.ukoln.ac.uk/metadata/interoperability/>
[OAI_DC] Dublin Core Metadata Element Set, Version 1.1: Reference Description.
<http://dublincore.org/documents/1999/07/02/dces/>
XML schema for OAI implementation of Dublin Core metadata.
<http://www.openarchives.org/OAI/1.1/dc.xsd>
[OLAC-Metadata] OLAC Metadata.
<http://www.language-archives.org/OLAC/metadata.html>
[OLACA] OLAC Aggregator Service.
<http://www.language-archives.org/cgi-bin/olaca.pl>