OLAC Protocol for Metadata Harvesting

Date issued: 2002-12-09
Status of document: Former Proposed Standard. This document was withdrawn from the OLAC document process.
Superseded by: http://www.language-archives.org/OLAC/repositories-20021211.html
This version: http://www.language-archives.org/OLAC/protocol-20021209.html
Previous version: http://www.language-archives.org/OLAC/protocol-20011210.html
Abstract:

This document defines the protocol OLAC service providers use to harvest metadata from OLAC data providers. It defines the responses that OLAC data providers must make to the requests of the protocol.

Editors: Gary Simons, SIL International (mailto:gary_simons@sil.org)
Steven Bird, University of Melbourne and University of Pennsylvania (mailto:sb@csse.unimelb.edu.au)
Changes since previous version:

The following changes were introduced to effect the upgrade from the development phase of OLAC to the operational phase:

  1. The base OAI protocol is upgraded from version 1.1 to 2.0.

  2. The OLAC schemas and namespace identifiers are upgraded from version 0.4 to 1.0.

  3. The requirement to support oai_dc metadata format is removed.

  4. A new section on "OAI identifier description" explains, among other things, the new requirement (based on OAI 2.0) that repository identifiers must be based on a registered Internet domain.

  5. The shortLocation element is added to the OLAC archive description.

Copyright © 2002 Gary Simons (SIL International) and Steven Bird (University of Melbourne and University of Pennsylvania). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. OAI identifier description
  3. OLAC archive description
  4. Responses to OAI requests
References

1. Introduction

The OLAC protocol for metadata harvesting is based on the protocol developed by the Open Archives Initiative [OAI-PMH]. This document assumes familiarity with the OAI protocol.

The implementation of an OLAC data provider has all the features of a minimal repository implementation [OAI-GRI], except that an OLAC data provider need not support the oai_dc metadata format (since the OLAC Aggregator [OLACA] provides that service for data providers that comply with this standard). In addition to the requirements of a minimal repository implementation, an OLAC data provider must comply with three additional requirements. It must:

The remaining sections of this standard describe these additional requirements.

2. OAI identifier description

The resource identifiers supplied by an OLAC data provider must comply with the OAI specification for the format of OAI identifiers as defined in [OAI-Ids]. The data provider must also return a description of its identifiers in an <oai-identifier> element in the response to Identify. In addition to being valid with respect to its schema, the <oai-identifier> description must meet these further requirements:

3. OLAC archive description

The Identify request supplies minimal information about an archive, namely, its name, base URL, and administrator email. An OLAC data provider must also return an <olac-archive> element in the response to Identify. This element gives additional information that makes it possible for an OLAC service provider to supply its users with a basic description of a participating archive.

The <olac-archive> element has an obligatory attribute, type, which must have one of two values:

These are the elements within an OLAC archive description:

archiveURL

Optional. The home page of the archive on the Web. This is the home page for human visitors, not the base URL for harvesting.

curator

The name of the person who curates the archive collection. If more than one person has collaborated as personal sponsors of the archive, then this element should contain all the names in the order and format the collaborators want to be cited.

curatorTitle

Optional. The job title of the curator within the sponsoring institution (for an institutional archive) or within the institution of affiliation (for a personal archive).

curatorEmail

Optional. A mailto: URI giving the email address for contacting the curator of the archive. (Note that this is distinct from the <adminEmail> in the Identify response which is the contact address for the maintainer of the data provider.)

institution

The name of the sponsoring institution (for an institutional archive) or the institution of affiliation (for a personal archive). The field is obligatory. If the curator of a personal archive has no affiliation, then a value of Unaffiliated should be given.

institutionURL

Optional. A URL for the home page of the institution.

shortLocation

Obligatory. A brief statement of the location of the institution or the person providing the metadata, following the format "City, Country". Multiple locations may be connected with "and". This information is shown in the location column of the table of participating archives at http://www.language-archives.org/archives.php4.

location

Optional. A single paragraph (not to exceed 1000 characters) describing where an archive that houses a collection of physical holdings is located (for instance, include building name, room number, street address). Other information relevant to visiting the collection, such as opening hours or restrictions on access, may also be described. If the archive is purely an on-line repository, do not use this element.

synopsis

A single paragraph (not to exceed 1000 characters) summarizing the purpose, scope, coverage, and so on of the archive.

access

A single paragraph (not to exceed 1000 characters) summarizing terms of access to the materials described in the published metadata. The statement should mention restrictions on access, licensing requirements, costs, and so on. Individual metadata records should use the Rights element to document such things for particular archive holdings. The purpose of <access> is to broadly characterize the entire archive.

For example,

<olac-archive
      xmlns="http://www.language-archives.org/OLAC/1.0/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.language-archives.org/OLAC/1.0/
                 http://www.language-archives.org/OLAC/1.0/olac-archive.xsd"
      type="institutional">
   <archiveURL>http://www.ethnologue.com</archiveURL>
   <curator>Raymond G. Gordon, Jr.</curator>
   <curatorTitle>Ethnologue Editor</curatorTitle>
   <curatorEmail>mailto:editor_ethnologue@sil.org</curatorEmail>
   <institution>SIL International</institution>
   <institutionURL>http://www.sil.org</institutionURL>
   <shortLocation>Dallas, USA</shortLocation>
   <location>7500 W. Camp Wisdom Rd., Dallas, TX 75236, U.S.A.</location>
   <synopsis>The Ethnologue data provider gives a metadata record for every
   language entry in the Web edition of the Ethnologue.  The latter provides
   basic information about each of the 7,000+ modern languages of the world
   (both living and recently extinct).</synopsis>
   <access>Every resource described by the Ethnologue data provider is a
   public Web page that may be accessed without restriction. Reuse of 
   material on the site is subject to the Terms of Use that are
   posted.</access>
</olac-archive>

The schema for validating an OLAC archive description is at http://www.language-archives.org/OLAC/1.0/olac-archive.xsd.
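
As a rough illustration of the element descriptions above, the following Python sketch checks an <olac-archive> description for the type attribute, for the elements that are not marked Optional, and for the 1000-character limits. It is not a substitute for validation against the schema. Treating every element not marked Optional as required is an assumption drawn from the wording above, and the function name check_archive_description is hypothetical.

```python
import xml.etree.ElementTree as ET

OLAC_NS = "http://www.language-archives.org/OLAC/1.0/"

# Elements the text does not mark "Optional"; treating them as
# required is an assumption based on that wording.
REQUIRED = ("curator", "institution", "shortLocation", "synopsis", "access")
# Elements described as single paragraphs of at most 1000 characters.
LENGTH_LIMITED = ("location", "synopsis", "access")


def check_archive_description(xml_text):
    """Return a list of problems found in an <olac-archive> description."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "{%s}olac-archive" % OLAC_NS:
        problems.append("root element is not <olac-archive> in the OLAC 1.0 namespace")
    if not root.get("type"):
        problems.append("missing obligatory attribute: type")
    for name in REQUIRED:
        element = root.find("{%s}%s" % (OLAC_NS, name))
        if element is None or not (element.text or "").strip():
            problems.append("missing or empty element: " + name)
    for name in LENGTH_LIMITED:
        element = root.find("{%s}%s" % (OLAC_NS, name))
        if element is not None and len(element.text or "") > 1000:
            problems.append("element exceeds 1000 characters: " + name)
    return problems
```

An empty list means that none of these particular checks failed; the schema remains the authority on validity.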

4. Responses to OAI requests

The OAI protocol for metadata harvesting [OAI-PMH] supports six requests. An OLAC-compliant data provider must support the above two descriptions and the OLAC metadata format [OLAC-Metadata] in its responses to these six requests. The additional features of the OLAC protocol for metadata harvesting are described below under the request to which they are relevant.
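
For orientation, a service provider issues each of the six requests as an HTTP query against the data provider's base URL. The Python sketch below shows one way to build such request URLs. The function name build_request and the base URL http://archive.example.org/oai are hypothetical, and sorting the arguments is only a convenience for reproducible output.

```python
from urllib.parse import urlencode

# The six OAI-PMH requests (verbs) covered in this section.
OAI_VERBS = ("GetRecord", "Identify", "ListIdentifiers",
             "ListMetadataFormats", "ListRecords", "ListSets")


def build_request(base_url, verb, **arguments):
    """Build the URL for an OAI-PMH request against base_url."""
    if verb not in OAI_VERBS:
        raise ValueError("not an OAI-PMH verb: " + verb)
    # The verb comes first, followed by the other arguments in sorted order.
    query = urlencode([("verb", verb)] + sorted(arguments.items()))
    return base_url + "?" + query


# Hypothetical base URL; a real harvester would use the registered one.
print(build_request("http://archive.example.org/oai",
                    "ListRecords", metadataPrefix="olac"))
```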

GetRecord

When the metadataPrefix is specified as olac, the <metadata> element of the response must contain an <olac> element that conforms to some version of the XML schema for OLAC metadata [OLAC-Metadata]. The <olac> element must contain an xmlns attribute specifying the URI that identifies the namespace for the version of the OLAC metadata schema that is being used.

Identify

An OLAC data provider must conform to the OAI format for building unique identifiers of records and it must supply an OLAC-specific archive description. These requirements are met in the response to the Identify request. The response must contain at least two <description> elements, one containing an <oai-identifier> element to describe the unique identifier format and another containing an <olac-archive> element to describe the archive. The schema for <oai-identifier> is given in [OAI-Ids]. The schema for <olac-archive> is given above in OLAC archive description.
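
A harvester might test for the two required descriptions along the following lines. This is only a sketch: it assumes the OAI-PMH 2.0 response namespace and the oai-identifier namespace as published by OAI (both should be verified against [OAI-PMH] and [OAI-Ids]), and the function name missing_descriptions is hypothetical. It checks for the presence of the two elements, not their validity.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed from the OAI 2.0 specifications and the OLAC 1.0 schema.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_IDENTIFIER_TAG = "{http://www.openarchives.org/OAI/2.0/oai-identifier}oai-identifier"
OLAC_ARCHIVE_TAG = "{http://www.language-archives.org/OLAC/1.0/}olac-archive"


def missing_descriptions(identify_response):
    """Return the required description elements absent from an Identify response."""
    root = ET.fromstring(identify_response)
    found = set()
    # Collect the tag of every element placed directly inside a <description>.
    for description in root.iter("{%s}description" % OAI_NS):
        for child in description:
            found.add(child.tag)
    return [tag for tag in (OAI_IDENTIFIER_TAG, OLAC_ARCHIVE_TAG)
            if tag not in found]
```

An empty result indicates that both an <oai-identifier> and an <olac-archive> description were found.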

ListIdentifiers

When the metadataPrefix is specified as olac, this request must respond with at least one record identifier.
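
The requirement above can be checked with a few lines of Python. The sketch below assumes an OAI-PMH 2.0 response, in which each record identifier appears in an <identifier> element. It also applies a deliberately coarse test of the three-part shape oai:namespace-identifier:local-identifier, which does not replace the full format defined in [OAI-Ids]. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Response namespace assumed from OAI-PMH 2.0.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def record_identifiers(list_identifiers_response):
    """Return the record identifiers found in a ListIdentifiers response."""
    root = ET.fromstring(list_identifiers_response)
    return [element.text.strip()
            for element in root.iter("{%s}identifier" % OAI_NS)
            if element.text]


def looks_like_oai_identifier(identifier):
    """Coarse check for the shape oai:namespace-identifier:local-identifier."""
    parts = identifier.split(":", 2)
    return len(parts) == 3 and parts[0] == "oai" and all(parts[1:])
```

A data provider meets the requirement stated here when record_identifiers returns a non-empty list for a request made with the olac prefix.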

ListMetadataFormats

This request (when made with no additional parameters) must respond with at least two <metadataFormat> elements, one for the oai_dc prefix required by OAI and another for the olac prefix required by OLAC. The specification for the olac prefix must also contain the URL (at www.language-archives.org) for the canonical version of the OLAC metadata schema that is being used and the URI for its corresponding namespace. For instance,

<metadataFormat>
   <metadataPrefix>olac</metadataPrefix>
   <schema>http://www.language-archives.org/OLAC/1.0/olac.xsd</schema>
   <metadataNamespace>http://www.language-archives.org/OLAC/1.0/</metadataNamespace>
</metadataFormat>
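
A service provider could read the schema URL and namespace URI for the olac prefix from a ListMetadataFormats response roughly as follows. The sketch assumes the OAI-PMH 2.0 response namespace, and the function name olac_format is hypothetical.

```python
import xml.etree.ElementTree as ET

# Response namespace assumed from OAI-PMH 2.0.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def olac_format(list_formats_response):
    """Return (schema URL, namespace URI) for the olac prefix, or None."""
    root = ET.fromstring(list_formats_response)
    for fmt in root.iter("{%s}metadataFormat" % OAI_NS):
        prefix = fmt.findtext("{%s}metadataPrefix" % OAI_NS, "").strip()
        if prefix == "olac":
            schema = fmt.findtext("{%s}schema" % OAI_NS, "").strip()
            namespace = fmt.findtext("{%s}metadataNamespace" % OAI_NS, "").strip()
            return schema, namespace
    return None
```

A result of None means that the data provider did not advertise the olac prefix.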

ListRecords

When the metadataPrefix is specified as olac, every <metadata> element in the response must contain an <olac> element that conforms to some version of the XML schema for OLAC metadata [OLAC-Metadata]. Each <olac> element must contain an xmlns attribute specifying the URI that identifies the namespace for the version of the metadata schema that is being used.
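
The requirement on <metadata> elements, which applies to GetRecord responses as well, can be spot-checked with a sketch like the one below. It reports, for each <metadata> element, the namespace of the <olac> element it contains, or None when the first child is not an <olac> element. The OAI-PMH 2.0 response namespace is assumed, and the function name olac_namespaces is hypothetical. Note that the parser resolves namespaces however they are declared, so the sketch does not test for a literal xmlns attribute, nor does it validate against the schema.

```python
import xml.etree.ElementTree as ET

# Response namespace assumed from OAI-PMH 2.0.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def olac_namespaces(response):
    """For each <metadata> element, return the namespace of its <olac> child.

    An entry is None when the first child is missing or is not <olac>.
    """
    root = ET.fromstring(response)
    result = []
    for metadata in root.iter("{%s}metadata" % OAI_NS):
        children = list(metadata)
        tag = children[0].tag if children else ""
        # ElementTree writes qualified names as "{namespace}localname".
        if tag.startswith("{") and tag.endswith("}olac"):
            result.append(tag[1:-len("}olac")])
        else:
            result.append(None)
    return result
```

A harvester could then compare each returned namespace with the one advertised for the olac prefix in ListMetadataFormats.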

ListSets

The OLAC metadata harvesting protocol places no additional requirements on this request. The data provider may supply any response that is valid with respect to the OAI specification.


To do

The strategy for forming repository identifiers with personal data repositories needs to be addressed.


References

[OAI-GRI] Guidelines for Repository Implementers, Document Version 2002/06/09.
<http://www.openarchives.org/OAI/2.0/guidelines-repository.htm>
[OAI-Ids] Specification and XML Schema for the OAI Identifier Format, Document Version 2002/06/21.
<http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm>
[OAI-PMH] The Open Archives Initiative Protocol for Metadata Harvesting, Version 1.1 (2001-07-02).
<http://www.openarchives.org/OAI/1.1/openarchivesprotocol.htm>
[OLAC-Metadata] OLAC Metadata.
<http://www.language-archives.org/OLAC/metadata.html>
[OLACA] OLAC Aggregator.
<http://www.language-archives.org/cgi-bin/olaca.pl>