Recommended metadata extensions

Date issued: 2008-02-22
Status of document: Recommendation. This document embodies an OLAC consensus concerning best current practice.
This version:
Latest version:
Previous version:

This document lists all the metadata extensions that are recommended by the Open Language Archives Community for use in describing language resources.

Editors: Gary Simons, SIL International and Graduate Institute of Applied Linguistics
Steven Bird, University of Melbourne and University of Pennsylvania
Changes since previous version:

Minor changes to bring the extension descriptions and the references up to date for OLAC metadata version 1.1.

Copyright © 2008 Gary Simons (SIL International and Graduate Institute of Applied Linguistics) and Steven Bird (University of Melbourne and University of Pennsylvania). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. Recommended extensions

1. Introduction

The OLAC metadata standard [OLAC-Metadata] follows the generic resource description standard known as "Qualified Dublin Core" [DCQ]. In order to meet the specific needs of the language resources community, the OLAC metadata standard incorporates an extension mechanism that makes it possible to describe language resources with greater precision. The mechanism uses the xsi:type attribute to override the basic definition of a Dublin Core metadata element with a definition that has a more precise semantics. For instance, the following subject description,

<dc:subject>Dschang language</dc:subject>

can be formally identified as pertaining to a language, and as relating to the specific language identified by the ISO 639-3 code ybb, by employing an OLAC extension as follows:

<dc:subject xsi:type="olac:language" olac:code="ybb"/>

Details on how the extension mechanism works may be found in the last two sections of [OLAC-Metadata].
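As a minimal sketch of the context in which such elements appear, the fragment below places both forms of the subject description inside a record wrapper that declares the namespace prefixes. The olac:olac wrapper element and the three namespace URIs are assumptions taken from OLAC metadata version 1.1 rather than from this document, so they should be checked against [OLAC-Metadata] before use.

```xml
<!-- Assumed wrapper element and namespace URIs (OLAC metadata 1.1);
     verify against [OLAC-Metadata]. -->
<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Plain Dublin Core: free text only, no formal identification -->
  <dc:subject>Dschang language</dc:subject>
  <!-- The same statement using the OLAC language extension -->
  <dc:subject xsi:type="olac:language" olac:code="ybb"/>
</olac:olac>
```

The value of xsi:type is a qualified name, so the olac prefix must be bound to a namespace on the element itself or on one of its ancestors for the override to be resolvable.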

Any party may develop a metadata extension and use it in OLAC metadata records. When an extension is proven to work and is judged to have wide relevance across the language resources community, then it may be put forward to the community as a recommended best practice. If, after following the process described in [OLAC-Process], the community reaches consensus that the extension should indeed be used where applicable by all OLAC member archives, then the extension achieves the status of an OLAC Recommendation.

This document provides the complete list of all extensions that have been adopted as OLAC Recommendations. Each extension is described in terms of a title and the following six descriptors:


Name

The symbolic name that is used as the value of the xsi:type attribute to indicate that the extension is being used in a metadata element.

Version

The latest revision date of the extension or its controlled vocabulary.

Applies to

The Dublin Core elements with which the extension may be used.

Description

A summary description of what the extension is used for.

Documentation

A link to a complete document that defines and exemplifies the extension. If the extension involves a controlled vocabulary, the document should also enumerate and define the terms of the vocabulary.

Schema

A link to the XML schema that formally defines the extension.

2. Recommended extensions

The Open Language Archives Community recommends that all participating data providers use the following metadata extensions for describing language resources whenever they are applicable:

Code for Discourse Types

Applies to: dc:type, dc:subject
Description: Provides a controlled vocabulary for identifying approximately ten discourse types. It is used with Type to identify the genre of a language resource (particularly a primary text). It may also be used with Subject to identify a work as being about a particular genre.

Code for Identifying Languages

Applies to: dc:language, dc:subject
Description: Provides codes for identifying all known languages, both living and extinct, from the ISO 639 family of international standards (Parts 1, 2, and 3). It is used with Language to indicate a language the resource is written or spoken in. It is used with Subject to indicate a language the resource is about.
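The difference between the two uses can be sketched with a hypothetical resource that is written in English and is about Dschang. The code ybb comes from the Introduction; the code eng, the olac:olac wrapper element, and the namespace URIs are assumptions based on ISO 639-3 and OLAC metadata version 1.1 and are not given in this document.

```xml
<!-- Hypothetical record: a work written in English about Dschang.
     Wrapper element and namespace URIs are assumed from OLAC metadata 1.1. -->
<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Language: the language the resource is written or spoken in -->
  <dc:language xsi:type="olac:language" olac:code="eng"/>
  <!-- Subject: the language the resource is about -->
  <dc:subject xsi:type="olac:language" olac:code="ybb"/>
</olac:olac>
```

The same xsi:type value is used in both elements; only the Dublin Core element that carries it determines whether the code names the language of the content or the language under study.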

Code for Linguistic Field

Applies to: dc:subject
Description: Provides a controlled vocabulary for describing the subject matter of a resource as relevant to a particular subfield of linguistics.

Code for Linguistic Data Types

Applies to: dc:type
Description: Provides a broad classification of the nature of the resource from a linguistic point of view (namely, as a lexicon, a primary text, or a language description).

Code for Participant Roles

Applies to: dc:contributor
Description: Provides a controlled vocabulary for identifying the role of a Contributor more precisely. The vocabulary identifies approximately twenty roles that are common in the development of language resources.
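To illustrate how the five extensions might be combined, the following is a sketch of a single record for a hypothetical recording of a Dschang narrative. Apart from olac:language and the code ybb, which appear in the Introduction, everything here is an assumption: the xsi:type names (olac:discourse-type, olac:linguistic-field, olac:linguistic-type, olac:role), the vocabulary codes, the wrapper element, and the namespace URIs are recalled from OLAC metadata version 1.1, and the title and contributor name are invented. Each should be confirmed against the documentation and schema of the relevant extension.

```xml
<!-- Hypothetical record; type names, codes, and namespace URIs are
     assumptions to be verified against each extension's documentation. -->
<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Recording of a Dschang folk tale</dc:title>
  <!-- Participant role: refines how the contributor took part -->
  <dc:contributor xsi:type="olac:role" olac:code="speaker">Example Speaker</dc:contributor>
  <!-- Language the resource is spoken in -->
  <dc:language xsi:type="olac:language" olac:code="ybb"/>
  <!-- Linguistic data type: the resource is a primary text -->
  <dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
  <!-- Discourse type: the genre of that primary text -->
  <dc:type xsi:type="olac:discourse-type" olac:code="narrative"/>
  <!-- Linguistic field: the subfield the resource is relevant to -->
  <dc:subject xsi:type="olac:linguistic-field" olac:code="text_and_corpus_linguistics"/>
</olac:olac>
```

Repeating dc:type with two different extensions reflects the descriptions above: one element classifies the resource linguistically, and the other identifies its genre.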


References

[DCQ] "5. Qualified Dublin Core," in Guidelines for Implementing Dublin Core in XML.
[OLAC-Metadata] OLAC Metadata.
[OLAC-Process] OLAC Process.