OLAC Metadata Set

Date issued:	2001-10-22
Status of document:	Proposed Standard. This document is in the midst of open review by the community.
This version:	http://www.language-archives.org/OLAC/olacms-20011022.html
Latest version:	http://www.language-archives.org/OLAC/olacms.html
Previous version:	http://www.language-archives.org/OLAC/olacms-20010616.html
Abstract:	This document specifies the metadata set used by the Open Language Archives Community [OLAC] for the interchange of metadata within the framework of the Open Archives Initiative [OAI]. This document is superseded by the OLAC Metadata standard: http://www.language-archives.org/OLAC/metadata.html
Editors:	Gary Simons, SIL International (mailto:gary_simons@sil.org) Steven Bird, University of Pennsylvania (mailto:sb@ldc.upenn.edu)
Changes since previous version:	Adds scheme attribute (see sections 1 and 2). The lang attribute on `<olac>` is renamed to langs. Following [DC-Q], adds alternative as a possible refinement for Title so that the original title can be distinguished from translations. Type.data is renamed to Type.linguistic. These changes are reflected in version 0.4 of the OLAC metadata schema, which is being released at the same time. (Another significant change to the schema is that the capitalization of element names now conforms to the usage in the examples in this document.) Significant addition to the discussion of Coverage. Numerous minor editorial revisions.

Copyright © 2001 Gary Simons (SIL International) and Steven Bird (University of Pennsylvania). This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).

Introduction
Attributes
Metadata elements
- Contributor
- Coverage
- Creator
- Date
- Description
- Format
- Format.cpu
- Format.encoding
- Format.markup
- Format.os
- Format.sourcecode
- Identifier
- Language
- Publisher
- Relation
- Rights
- Source
- Subject
- Subject.language
- Title
- Type
- Type.functionality
- Type.linguistic

References

1. Introduction

This document describes the OLAC metadata set and its implementation in XML. The OLAC metadata set is based on the Dublin Core metadata set and uses all fifteen elements defined in that standard [DCMES1.1]. (The rationale for following DC is discussed in the OLAC white paper [OLAC-WP].)

In order to meet the specific needs of the language archiving community, the OLAC metadata set qualifies the fifteen DC elements following principles articulated in [DC-Q] and exemplified in [DCQ-HTML]. These documents describe three kinds of qualification: element refinement, encoding scheme, and content language. The OLAC implementation of qualified DC uses three attributes to support these kinds of qualification: refine, scheme, and lang, respectively.

The OLAC metadata standard prescribes a particular encoding scheme for many elements. In these cases, a code attribute is used to store the encoded element value, while the element content is still available for a value that does not conform to the prescribed scheme. (Note that when applying a so-called "dumb down" procedure to map the qualified OLAC metadata onto unqualified DC, the value of the code attribute would be moved into the content of the element. A more intelligent mapping could also translate the coded value to a more readable descriptive label.)

This standard also prescribes a particular encoding scheme for some element refinements. In these cases the refinement is implemented as a new element rather than via the refine attribute. This has been motivated by a limitation of XML schemas for validating XML documents—namely, the definition of validity for one attribute (e.g. code) cannot be made dependent on the value of another attribute (e.g. refine). Thus, when a particular refinement requires that the element value use a different encoding scheme, then a unique element has been defined. The names for these refined elements have been formed as in [DCQ-HTML] by concatenating the DC element name and the refinement name with an intervening dot (e.g. Subject.language).
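The following Python sketch shows one way a harvester might apply the "dumb down" procedure to a single OLAC element. It assumes the element has already been parsed with the standard library's `xml.etree.ElementTree` and carries no XML namespace. The function name `dumb_down` is hypothetical. The sketch drops every attribute (refine, code, scheme, and lang alike), maps a dotted element name such as subject.language back to its DC element, and moves the code value into the content. Keeping any free-form content after the coded value is an assumption of this sketch; this document does not prescribe how the two should be combined.

```python
import xml.etree.ElementTree as ET


def dumb_down(elem: ET.Element) -> ET.Element:
    """Map one qualified OLAC element onto an unqualified DC element.

    Sketch only: assumes un-namespaced tags; every attribute is dropped.
    """
    # subject.language -> subject, format.cpu -> format, title -> title
    dc_name = elem.tag.split(".", 1)[0]
    plain = ET.Element(dc_name)  # new element, so no attributes carry over

    text = (elem.text or "").strip()
    code = elem.get("code")
    if code is not None:
        # Assumption: keep any free-form elaboration after the coded value.
        text = f"{code} {text}".strip()
    plain.text = text
    return plain


example = ET.fromstring('<subject.language code="x-sil-BAN">Dschang</subject.language>')
print(ET.tostring(dumb_down(example), encoding="unicode"))
```

A more intelligent mapping, as noted above, would replace the raw code with a readable label looked up in the relevant controlled vocabulary.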

The XML implementation of the OLAC metadata elements is based on the Open Archives Initiative's implementation of Dublin Core metadata elements [OAI-DC] in that OLAC defines a proper superset of OAI DC. That is, if the container element of a valid OAI DC record is renamed from <dc> to <olac>, the result is still a valid OLAC metadata record.

The most recent version of the XML schema for the OLAC metadata set (which matches this specification) is as follows:

Schema: http://www.language-archives.org/OLAC/0.4/olac.xsd
Example: http://www.language-archives.org/OLAC/0.4/olac.xml

Section 2 below describes the attributes used in implementing the OLAC metadata set. Section 3 then describes each of the elements that make up the OLAC metadata set.

2. Attributes

Four attributes—refine, code, scheme, and lang—are used throughout the XML implementation of the metadata elements. A fifth attribute, langs, is used on complete metadata records. These are described in turn below.

Some elements in the OLAC metadata set use the refine attribute to identify element refinements. These qualifiers make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope [DC-Q]. When an element does use a refine attribute, the possible values for the attribute are restricted to a specified controlled vocabulary.

Some elements in the OLAC metadata set use the code attribute to hold metadata values that are taken from a specific encoding scheme. The attribute value is always a precise value taken from a controlled vocabulary or a formal notation that is described in another OLAC document. As a result, all service providers are able to interpret uniformly the meaning of any code value. The terms of a controlled vocabulary are also incorporated into the XML schema for OLAC metadata; thus every code value is tested for validity when the OLAC registration server tests a data provider for conformance to OLAC standards [OLAC-Registration]. For elements that have a code attribute, the element content may also be used to specify a freeform elaboration of the coded value.

Every element in the OLAC metadata set may use the scheme attribute. It specifies a standardized name for the scheme that constrains how the text in the content of the element is encoded. A scheme may be a controlled vocabulary or a formal notation. Recommended best practice is to confine the value of scheme to the label for one of the encoding schemes registered with the Dublin Core Metadata Initiative [DCMI-Registry]. Members of the OLAC community who want to use an unregistered scheme should submit it for registration [DCMI-Submit]. When OLAC data providers are tested for conformance to OLAC recommended best practice, only the value of the scheme attribute is tested for inclusion in the list of registered schemes. OLAC does not test the actual element content for conformance to the scheme; a service provider for a subcommunity that has developed a particular scheme could perform such a test.

Every element in the OLAC metadata set may use the lang attribute. It specifies the language in which the text in the content of the element is written. The value for the attribute comes from the controlled vocabulary defined by [OLAC-Language]. By default, the lang attribute has the value "en", for English. Whenever the language of the element content is other than English, recommended best practice is to use the lang attribute to identify the language. By using multiple instances of the metadata elements tagged for different languages, data providers may offer their metadata records in multiple languages.

The langs attribute is used on the <olac> element—the container for all the metadata elements of a given metadata record. It lists the languages in which the metadata record is designed to be read. This attribute holds a space-delimited list of language codes from the [OLAC-Language] controlled vocabulary. By default, this attribute has the value "en", for English, indicating that the record is aimed only at English readers. If an explicit value is given for the attribute, then the record is aimed at readers of all the languages listed.

Service providers should use this information in order to offer localized views of the metadata. When a metadata record lists only one language, then all elements should be displayed (regardless of their individual languages), unless the user has requested to suppress all records in that language. When a metadata record has multiple alternative languages, the user should be able to select one and have display of elements in the other languages suppressed. An element in a language not included in the list of alternatives should always be displayed (for instance, the vernacular title of a work).
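As an illustration of these display rules, here is a minimal Python sketch. It assumes a record parsed with `xml.etree.ElementTree` whose `<olac>` container and child elements carry no XML namespace. The function `visible_elements` and its parameters are hypothetical; both lang and langs fall back to "en" as specified above, and the titles in the usage example are placeholders.

```python
import xml.etree.ElementTree as ET

DEFAULT_LANG = "en"  # default for both the lang and langs attributes


def visible_elements(record_xml, chosen_lang=None, suppressed_langs=frozenset()):
    """Return the child elements of an <olac> record that should be displayed."""
    root = ET.fromstring(record_xml)
    langs = root.get("langs", DEFAULT_LANG).split()

    if len(langs) == 1:
        # Single-language record: show everything, unless the user has
        # asked to suppress all records in that language.
        return [] if langs[0] in suppressed_langs else list(root)

    visible = []
    for elem in root:
        lang = elem.get("lang", DEFAULT_LANG)
        # Hide elements in the other listed alternatives; elements in a
        # language outside the list (e.g. a vernacular title) always show.
        if chosen_lang is not None and lang in langs and lang != chosen_lang:
            continue
        visible.append(elem)
    return visible


record = """<olac langs="en fr">
  <title>English title</title>
  <title lang="fr">French title</title>
  <title lang="x-sil-SKY">Vernacular title</title>
</olac>"""
for elem in visible_elements(record, chosen_lang="fr"):
    print(elem.get("lang", DEFAULT_LANG), elem.text)
```

With French selected, the English title is suppressed, while the vernacular title is still shown because its language is not among the listed alternatives.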

3. Metadata elements

Each element of the OLAC metadata set is described in one of the following subsections. The heading gives the generic identifier of the XML tag used to encode the element. Under the heading, the element is described in five ways. Name gives a descriptive label for the tag. Definition is a one-line summary of what the element is used for. Comments offers details on how to use the element; the first paragraph typically repeats the comment from [DCMES1.1], while the remaining paragraphs give further specification for how OLAC uses the element. Attributes describes the XML attributes used with the element. Examples shows samples of properly encoded elements.

In a given metadata record, every element is optional and every element is repeatable.

Contributor

Name	Contributor
Definition	An entity responsible for making contributions to the content of the resource.
Comments	A Contributor may be a person, an organization, or a service. Contributor is closely related to Creator. The difference has to do with degree of responsibility for the content. The Contributor designation is used for those entities whose role in the creation of the resource is not great enough to merit recognition as a primary source of the intellectual content. This element may be used to identify institutions that sponsored or funded the work. It may be used to identify individuals who played a secondary role in the development of the resource. As a rule of thumb, Creator is used when the contribution is such that it would be cited in the "author" field of a bibliographic entry, and Contributor is used otherwise. Recommended best practice is to identify the entity by means of a name and to give the name in a form that is ready for sorting within an index. For the names of persons, this means that the name should be given in inverted order with the surname first. For the names of organizations, this means that any initial article should be omitted. When a resource has more than one contributor, use a separate Contributor element for each one.
Attributes	The refine attribute is optionally used to specify the role (such as transcriber, sponsor, and so on) played by the named entity in the creation of the resource. The role is expressed by means of a controlled vocabulary; see [OLAC-Role] for the definition of the vocabulary. Also lang and scheme (see section 2).
Examples	A generic contributor: <contributor>Smith, John L.</contributor> A funding agency: <contributor refine="sponsor">National Science Foundation</contributor> The person who performed a role for which there is not a suitable code: <contributor>Smith, John L. (format conversion)</contributor>
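The sort-ready name form recommended above can only be approximated mechanically. The following Python sketch, with hypothetical function names, assumes that the last word of a personal name is the surname and that the only initial articles to omit are the English "The", "A", and "An". It is a rough heuristic, not a substitute for cataloguing rules such as AACR2.

```python
ENGLISH_ARTICLES = ("The ", "A ", "An ")  # assumption: English-language names only


def person_index_form(name: str) -> str:
    """'John L. Smith' -> 'Smith, John L.' (naive: last word taken as surname)."""
    name = " ".join(name.split())  # normalize white space
    if "," in name or " " not in name:
        return name  # already inverted, or a single-word name
    given, surname = name.rsplit(" ", 1)
    return f"{surname}, {given}"


def organization_index_form(name: str) -> str:
    """'The Linguistic Society of America' -> 'Linguistic Society of America'."""
    for article in ENGLISH_ARTICLES:
        if name.startswith(article):
            return name[len(article):]
    return name


print(person_index_form("Leonard Bloomfield"))
print(organization_index_form("The Linguistic Society of America"))
```

Names with suffixes or multi-word surnames defeat this heuristic, so it is no replacement for entering the inverted form directly.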

Coverage

Name	Coverage
Definition	The extent or scope of the content of the resource.
Comments	Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). [DCMES1.1] Recommended best practice is to select a value from a controlled vocabulary that is identified in the scheme attribute (for example, the Thesaurus of Geographic Names [TGN]). Where appropriate, named places or time periods should be used in preference to numeric identifiers such as sets of coordinates or date ranges. In the OLAC context, service providers already have a database that maps languages to the countries in which they are spoken [OLAC-Language]. Coverage should not be used to duplicate this information; rather service providers will support searches concerning languages spoken in a given country by referring to the language database. Coverage should be used geographically only when the language involved has a wide distribution and the resource focuses on its use in a particular region or geopolitical jurisdiction, or conversely, when the resource deals with a region that encompasses multiple languages.
Attributes	Only lang and scheme (see section 2).
Examples	A resource about English in India: <subject.language code="en"/> <coverage>India</coverage> A resource about languages spoken on Guadalcanal: <coverage scheme="TGN">Guadalcanal (island)</coverage> A resource about language use in the 19th century: <coverage>19th century</coverage>

Creator

Name	Creator
Definition	An entity primarily responsible for making the content of the resource.
Comments	A Creator may be a person, an organization, or a service. Creator is closely related to Contributor. In determining whether an entity is a Creator (as opposed to a Contributor), use the same criteria that are followed for deciding that an entity should be listed in the "author" slot of a bibliographic reference as a primary source of the intellectual content. Entities that do not merit that level of recognition should be treated as Contributors. Recommended best practice is to identify a Creator by means of a name and to give the name in a form that is ready for sorting within an index. For the names of persons, this means that the name should be given in inverted order with the surname first. For the names of organizations, this means that any initial article should be omitted. When a resource has more than one creator, use a separate Creator element for each one.
Attributes	The refine attribute is optionally used to specify the role (such as author, editor, translator, and so on) played by the named entity in the creation of the resource. The role is expressed by means of a controlled vocabulary; see [OLAC-Role] for the definition of the vocabulary. Also lang and scheme (see section 2).
Examples	A personal author: <creator>Bloomfield, Leonard</creator> An institutional author: <creator>Linguistic Society of America</creator> An editor: <creator refine="editor">Sapir, Edward</creator>
*To do*	Develop the controlled vocabulary and write the OLAC-Role document. The MARC vocabulary for "relators" (i.e. the relationship between a named entity and a work) may be a good place to start: http://lcweb.loc.gov/marc/relators/re0002r1.html Here, and with Contributor, we recommend a format that puts surname first. We should probably go a step further and recommend that archives follow the standards described in AACR2. We need help from a librarian to give us a synopsis (or a pointer to such) of the rules for formatting personal and corporate names.

Date

Name	Date
Definition	A date associated with an event in the life cycle of the resource.
Comments	Recommended best practice is to use a coded value (see below) when possible, since that guarantees that a service provider can do a straight alphanumeric sort to put collections of metadata descriptions in correct chronological order. The element content may always be used to express date-related values or comments that cannot be fitted into the encoding standard.
Attributes	The code attribute is used to hold a standardized encoding of a date. Following [DCMES1.1], OLAC uses the YYYY-MM-DD format as defined in [W3CDTF]. When a month or day value is less than 10, use a leading 0 to maintain a two-digit length. The encoded value may be for the year alone, for the year and month, or for an exact date. A range of years is also a valid value for this attribute; in this case both years must be four-digit values and the earliest year should come first. Thus, the encoded value may match one of the four following patterns: `YYYY`, `YYYY-MM`, `YYYY-MM-DD`, or `YYYY-YYYY`. The refine attribute is optionally used to refine the meaning of the date using values from a controlled vocabulary (for instance, date of creation versus date of issue versus date of modification, and so on). The vocabulary for refinements to Date is defined in [DC-Q]. A Date with no refinement will be assumed to be the date of issue (i.e. publication). A single resource should not have more than one instance of each date refinement. In selecting a single date to associate with a resource, a service provider will select the most recent of date created, issued, or modified. Also lang and scheme (see section 2).
Examples	A typical year of publication: <date code="1992"/> A resource modified on October 16, 1996: <date refine="modified" code="1996-10-16"/> A resource from approximately 1950: <date code="1950">circa 1950</date>
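The four code patterns and the date-selection rule above lend themselves to a mechanical check. The following Python sketch (function names are hypothetical) validates a code value and picks the date a service provider would associate with a resource. It treats the "earliest year first" recommendation for ranges as a hard requirement, treats an unrefined date as the date of issue, and relies on the straight alphanumeric comparison of zero-padded codes mentioned in the Comments.

```python
import re
from datetime import date
from typing import Iterable, Optional, Tuple

# YYYY, YYYY-MM, YYYY-MM-DD, or YYYY-YYYY
_DATE_CODE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?|-(\d{4}))?")


def is_valid_date_code(code: str) -> bool:
    m = _DATE_CODE.fullmatch(code)
    if m is None:
        return False
    year, month, day, end_year = m.groups()
    if end_year is not None:
        # Range of years: this sketch requires the earlier year first.
        return int(year) <= int(end_year)
    try:
        # Reject impossible months and days such as 1996-13 or 1996-02-30.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def representative_date(dates: Iterable[Tuple[Optional[str], str]]) -> Optional[str]:
    """Pick the most recent of created, issued, or modified.

    Each item is (refine, code); a missing refine counts as 'issued'.
    """
    wanted = {"created", "issued", "modified"}
    candidates = [code for refine, code in dates if (refine or "issued") in wanted]
    # Straight alphanumeric comparison of the zero-padded codes.
    return max(candidates) if candidates else None


print(is_valid_date_code("1996-10-16"), is_valid_date_code("1996-13"))
print(representative_date([(None, "1992"), ("modified", "1996-10-16")]))
```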

Description

Name	Description
Definition	An account of the content of the resource.
Comments	Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content, or a free-text account of the content. [DCMES1.1] No formatting conventions are defined within the text of Description. Service providers may format the entire Description as a single paragraph, collapsing adjacent white space characters into a single space. When there is a URL for a document that describes the resource, use a separate Description element to encode just that URL. A Description that begins with "http:" will be interpreted by service providers as consisting solely of a URL and will be presented as a link in user interfaces. Service providers are not obliged to search other Description text for the occurrence of URLs.
Attributes	Only lang and scheme (see section 2).
Examples	A prose description of a resource: <description>The CALLHOME Japanese corpus of telephone speech consists of 120 unscripted telephone conversations between native speakers of Japanese. All calls, which lasted up to 30 minutes, originated in North America and were placed to locations overseas (typically Japan). Most participants called family members or close friends. This corpus contains speech data files ONLY, along with the minimal amount of documentation needed to describe the contents and format of the speech files and the software packages needed to uncompress the speech data. </description> A reference to an existing on-line description: <description>http://www.ldc.upenn.edu/Catalog/LDC96S37.html </description>
*To do*	[DC-Q] defines two refinements for Description: table of contents and abstract. Do we want to introduce this?
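A service provider following the Description rules above might classify each element along the lines of the Python sketch below. The function name and the returned dictionary are hypothetical. The sketch assumes the "http:" test is applied after surrounding white space is removed, which matters for values such as the LDC example above, whose content ends in a space.

```python
import re


def render_description(text: str) -> dict:
    """Classify one Description value for display."""
    # No internal formatting is defined: collapse runs of white space.
    collapsed = re.sub(r"\s+", " ", text).strip()
    if collapsed.startswith("http:"):
        # A Description beginning with "http:" is solely a URL: show a link.
        return {"kind": "link", "href": collapsed}
    # Otherwise present the whole Description as a single paragraph.
    return {"kind": "paragraph", "text": collapsed}


print(render_description("http://www.ldc.upenn.edu/Catalog/LDC96S37.html "))
print(render_description("120 unscripted\n   telephone conversations"))
```

Like the rule it follows, the sketch does not look for URLs elsewhere in the text.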

Format

Name	Format
Definition	The physical or digital manifestation of the resource.
Comments	Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. [DCMES1.1]
Attributes	The code attribute should be used to make a precise identification of the format using a controlled vocabulary. This is especially important for digital resources since service providers may use this information to match data resources with the software tools that are appropriate for manipulating them. See [OLAC-Format] for the definition of the vocabulary. The element content may be left empty, but it would typically be used to add further details about the format of the resource, such as the sampling rate for digital recordings, the size of computer files, the number of pages and dimensions of a book, the appearance of a physical object, and so on. Also lang and scheme (see section 2).
Examples	For a digitally encoded dictionary: <format code="text/xml">5,237 entries in a 1.2M XML file.</format> For a digitally recorded text: <format code="audio/wav">Duration: 153 seconds. Size: 3.3M. Sampling: 1 channel, 22 kHz, 8 bits.</format>
*To do*	We need to develop the vocabulary for Format. It should be based on the list of Internet Media Types [MIME] but we will probably still want our own vocabulary document at least for the purpose of explaining and exemplifying the use of MIME types. But further than that, we may want to pull out a subset of MIME types. We also may want to add some new categories and subtypes for our purposes in order to cover archive holdings that are not digital, e.g. manuscript, print, microform, and so on. The library or archive world probably has such a controlled vocabulary already, e.g. does MARC have standards for this?

Format.cpu

Name	CPU Requirement
Definition	The CPU required to use a software resource.
Comments	This element is used in the description of executable programs to identify the kind of CPU that is needed to run them.
Attributes	The code attribute is used to make a precise identification of the required CPU using a controlled vocabulary. See [OLAC-CPU] for the definition of the vocabulary. The element content is typically left empty, but it can be used to add further details about the hardware required for running the resource. These could go beyond CPU details to include memory, disk, and so on. Also lang and scheme (see section 2).
Examples	Software that runs on a PowerPC: <format.cpu code="ppc"/> Software that runs on the Intel family of processors but needs at least 64 megabytes of memory: <format.cpu code="x86">At least 64M memory</format.cpu>
*To do*	We need to develop the vocabulary for CPU.

Format.encoding

Name	Character Encoding
Definition	An encoded character set used by a digital resource.
Comments	For a resource that is a digitally encoded text, Format.encoding names the encoded character set it uses. For a resource that is a font, Format.encoding names an encoded character set that it is able to render. For a resource that is a software application, Format.encoding names an encoded character set that it can read as input or write as output. Service providers will use this information to match data files with the software tools that can be applied to them.
Attributes	The code attribute should be used to identify the character set using a controlled vocabulary. See [OLAC-Encoding] for the definition of the vocabulary. Recommended best practice is to specify the encoding via the code attribute while leaving the element content empty. Use the element content only when the controlled vocabulary does not offer an appropriate code, or when further explanation about a custom encoding is needed. Also lang and scheme (see section 2).
Examples
*To do*	We need to develop the controlled vocabulary for Format.encoding. The IANA registry of character set names [IANA-CS] could be used as a starting point, but we will need to innovate beyond this. For instance, we will need to add something about levels of Unicode conformance as defined by our Character Encoding working group. Does MARC have something for this?

Format.markup

Name	Markup Scheme
Definition	A markup scheme used by a digital resource.
Comments	For a resource that is a text file including markup, Format.markup identifies the markup system it uses, such as the SGML DTD, the XML Schema, the set of Standard Format markers, and the like. For a resource that is a stylesheet or a software application, Format.markup names a markup scheme that it can read as input or write as output. Service providers will use this information to match data files with the software tools that can be applied to them. Recommended best practice is to identify the markup scheme by a URI giving an OAI identifier for the markup scheme as a resource in an OLAC archive. Thus, if the DTD, Schema, or markup documentation is not already archived in an OLAC repository, the depositor of a marked-up resource should also deposit the documentation for the markup scheme. A resource identified in Format.markup should not also be listed with the requires refinement of Relation.
Attributes	Only lang and scheme (see section 2).
Examples
*To do*	Do we want to go a step further and have markup schemes be deposited at OLAC so that we can try to avoid duplicate ids for the same DTD? They could have identifiers like `oai:olac:markup:...` and be defined as belonging to a set named markup at the OLAC community data provider so that a single OAI harvesting request would retrieve the complete set of known markup schemes.

Format.os

Name	Operating System Requirement
Definition	An operating system required to use a software resource.
Comments	This element is used in the description of executable programs to identify the operating system environment that is needed to run them. If the resource is a data file in a format that is typically associated with a particular operating system, use Format only (since cross-platform readers are always a possibility).
Attributes	Recommended best practice is to specify the operating system via the code attribute. See [OLAC-OS] for the definition of the controlled vocabulary used for this purpose. The element content would typically be left empty, but it can be used to add further details about the specific operating system version that is required, or about other system software components that are required. Also lang and scheme (see section 2).
Examples	Software that runs under OS/2: <format.os code="OS2"/> Software that runs only under Windows NT 4.0 or higher: <format.os code="MSWindows">NT 4.0 or higher</format.os>

Format.sourcecode

Name	Source Code Language
Definition	A programming language of software distributed in source form.
Comments	Identifies a programming language used by software that is distributed in source code form.
Attributes	Recommended best practice is to specify the programming language via the code attribute. See [OLAC-Sourcecode] for the definition of the controlled vocabulary used for this purpose. The element content would typically be left empty, but it can be used to add further details about the particular version or dialect that was used. Also lang and scheme (see section 2).
Examples	Source code that is written in C: <format.sourcecode code="C"/> Source code that is written in Java using the version 1.2 library: <format.sourcecode code="Java">Version 1.2 library</format.sourcecode>
*To do*	We need to develop the vocabulary for Sourcecode.

Identifier

Name	Resource Identifier
Definition	An unambiguous reference to the resource within a given context.
Comments	Recommended best practice is to identify the resource by means of a string or number conforming to a globally-known formal identification system. Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI), and the International Standard Book Number (ISBN). [DCMES1.1] In the case of a resource that is not electronically encoded, but is housed in a conventional archive, Identifier may be used to give a local shelf or box number, or whatever scheme is used to locate a resource within the collection. Identifiers that begin with "http:" will be interpreted by service providers as URLs and be presented as links in user interfaces. Note that Identifier is to be used only for a URL that retrieves the actual resource; use Description for a URL that retrieves just a description of the resource. Do not specify the "oai:" identifier for the resource itself as a value of Identifier, since it is already given in `<identifier>` in the `<header>` of the `<record>`s returned by the OAI protocol.
Attributes	Only lang and scheme (see section 2).
Examples	A Uniform Resource Locator for retrieval of an electronically encoded resource: <identifier>http://arxiv.org/abs/cs.CL/0010033</identifier> A local identifier for retrieval within a physical collection: <identifier>Shelf 12, Box 7</identifier>

Language

Name	Audience Language
Definition	A language of the intellectual content of the resource.
Comments	Language is used for a language the resource is in, as opposed to the language it describes (see Subject.language). It is related to the audience for the work in that it identifies a language that the creator of the resource assumes that its eventual user will understand. When a resource is in more than one language, use a separate Language element for each language. For a work of literature or other monolingual document aimed at the speakers of a particular language, use Language to identify that language. For a sound recording, use Language for the language being spoken in the recording. For a grammatical description, for instance, use Language for the language the grammar is written in; use Subject.language for the language whose grammar is being described. For an annotated text, use Language for the language in which the annotations are made; use Subject.language for the language of the base text that is being annotated. For a dictionary, use Language for the language in which the definitions are written; use Subject.language for the language whose words are being defined.
Attributes	The code attribute should be used to make a precise identification of the language using a controlled vocabulary. See [OLAC-Language] for the definition of the vocabulary. Recommended best practice is to specify the language via the code attribute while leaving the element content empty. Use the element content only when the controlled vocabulary does not offer an appropriate code, or when further specification is needed, such as to name a specific dialect or to give an alternate name that differs from the default name given by the controlled vocabulary. Service providers should use the code attribute to support searches by language, and may use the element content in searches by keyword. They also may supply the default language name in keyword searching when the element content is missing. Also lang and scheme (see section 2).
Examples	A resource in English about the Sikaiana language: <language code="en"/> <subject.language code="x-sil-SKY"/> A Yemba-French dictionary, where the alternate name Dschang is preferred: <language code="fr"/> <subject.language code="x-sil-BAN">Dschang</subject.language> The American Heritage Dictionary, which is both in and about American English: <language code="en"/> <subject.language code="en"/> <coverage>United States</coverage> A resource about a language for which the controlled vocabulary does not yet provide a code: <subject.language>Ancient Sumerian</subject.language>
*To do*	Add an example of specifying a dialect. Write the OLAC-Language document.
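The search behaviour described for service providers can be sketched in Python as follows. The `LanguageElement` class, the function names, and the two-entry `DEFAULT_NAMES` table are hypothetical stand-ins; the real codes and default names come from the [OLAC-Language] vocabulary.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical excerpt; the real default names are defined by [OLAC-Language].
DEFAULT_NAMES = {"en": "English", "fr": "French"}


@dataclass
class LanguageElement:
    code: Optional[str]  # value of the code attribute, if any
    content: str = ""    # free-form element content, often empty


def matches_language(elements: List[LanguageElement], code: str) -> bool:
    """Search by language: compare code attributes only."""
    return any(e.code == code for e in elements)


def keyword_terms(elements: List[LanguageElement]) -> List[str]:
    """Terms for keyword search: the content, or the default name if empty."""
    terms = []
    for e in elements:
        if e.content:
            terms.append(e.content)
        elif e.code in DEFAULT_NAMES:
            terms.append(DEFAULT_NAMES[e.code])
    return terms


# e.g. the Subject.language elements of one record:
# <subject.language code="fr"/> <subject.language>Ancient Sumerian</subject.language>
elements = [LanguageElement("fr"), LanguageElement(None, "Ancient Sumerian")]
print(matches_language(elements, "fr"), keyword_terms(elements))
```

A real service provider would keep Language and Subject.language in separate indexes, since they answer different questions about the resource.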

Publisher

Name	Publisher
Definition	An entity responsible for making the resource available.
Comments	Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity. [DCMES1.1] Recommended best practice is to identify a Publisher by means of a name and to give the name in a form that is ready for sorting within an index. For the names of persons, this means that the name should be given in inverted order with the surname first. For the names of organizations, this means that any initial article should be omitted. When a resource has more than one publisher, use a separate Publisher element for each one.
Attributes	Only lang and scheme (see section 2).
Examples	A typical publisher: <publisher>Oxford University Press</publisher> The URL for a publisher: <publisher>http://www.oup.com/</publisher>

Relation

Name	Relation
Definition	A reference to a related resource.
Comments	This element is used to document relationships between resources, for instance, part-whole relationships, version relationships, dependency relationships, and so on. When the related resource is also in a participating archive, the reference to the related resource should be by means of an "oai:" identifier. A Relation that begins with "oai:" should be presented by service providers as an active link that retrieves the metadata for that resource. For a required markup definition (like a DTD or Schema), use Format.markup rather than Relation. When the required software or hardware can be inferred from the Format of the resource, do not also use Relation for this purpose. When the present resource is a derived work, the source resource is typically indicated by Source. However, there are two cases in which refinements of Relation are used for derived works. The isVersionOf refinement is used when the present resource is a new version or edition with added intellectual content that has been developed by the same creators. When the new version is nothing more than a rendition in a new format, the isFormatOf refinement is used.
Attributes	The refine attribute should be used to clarify the nature of the relationship using values from a controlled vocabulary (for instance, isReplacedBy, requires, hasPart, isPartOf, and so on). The vocabulary for refinements to Relation is defined in [DC-Q]. Also lang and scheme (see section 2).
Examples	A link to a required font: <relation refine="requires">oai:sil:software/ipafont</relation> Links to the component pieces of a collected work: <relation refine="hasPart">oai:somearchive:holding126</relation> <relation refine="hasPart">oai:somearchive:holding127</relation> <relation refine="hasPart">oai:somearchive:holding128</relation> <relation refine="hasPart">oai:somearchive:holding129</relation> <relation refine="hasPart">oai:somearchive:holding130</relation> A chapter that was published in a book (that does not have an archived metadata record): <relation refine="isPartOf">In Joel Sherzer and Greg Urban (eds.), Native South American discourse, 237-306. Berlin: Mouton.</relation>
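A service provider that presents an "oai:" Relation as an active link needs to split the identifier into its repository and local parts and then issue a request to the right archive. The sketch below assumes the usual "oai:<repository>:<local identifier>" shape and builds an OAI GetRecord request URL. The base URLs, the BASE_URLS table, and the "olac" metadataPrefix are hypothetical placeholders. A real provider would take base URLs from its own list of registered archives.

```python
from urllib.parse import urlencode

# Hypothetical mapping from repository identifiers to OAI base URLs.
BASE_URLS = {
    "sil": "http://archive.example.org/sil/oai",
    "somearchive": "http://archive.example.org/somearchive/oai",
}


def split_oai_identifier(value: str):
    """Split 'oai:<repository>:<local id>' into (repository, local id), else None."""
    if not value.startswith("oai:"):
        return None
    parts = value.split(":", 2)  # keep any further colons in the local id
    if len(parts) != 3 or not parts[1] or not parts[2]:
        return None
    return parts[1], parts[2]


def relation_link(value: str, metadata_prefix: str = "olac"):
    """Return a GetRecord URL for an oai: Relation, or None to show plain text."""
    parsed = split_oai_identifier(value)
    if parsed is None:
        return None  # free-text Relation, e.g. a bibliographic citation
    repository, _local_id = parsed
    base = BASE_URLS.get(repository)
    if base is None:
        return None  # archive not known to this (hypothetical) provider
    query = urlencode({"verb": "GetRecord", "identifier": value,
                       "metadataPrefix": metadata_prefix})
    return f"{base}?{query}"
```

Free-text values such as the isPartOf citation in the examples fall through to None, so a display layer could render them as ordinary text.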

Rights

Name	Rights Management
Definition	Information about rights held in and over the resource.
Comments	Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource. [DCMES1.1]
Attributes	Recommended best practice is to use the code attribute to make a summary statement about rights using a controlled vocabulary. If the coded value adequately documents the rights management issues, leave the element content empty. Otherwise, use the content to add the relevant details. See [OLAC-Rights] for the definition of the vocabulary. Also lang and scheme (see section 2).
Examples
*To do*	Write the OLAC-Rights document. Add examples after we work out the controlled vocabulary.

Source

Name	Source
Definition	A reference to a resource from which the present resource is derived.
Comments	The present resource may be derived from the Source resource in whole or in part. [DCMES1.1] When the Source resource is also available in an on-line archive, recommended best practice is to reference it by means of an "oai:" identifier. In this case, service providers should present the Source as an active link that retrieves the metadata for that resource. In the legal parlance of intellectual property rights, a "derivative work" is one that is based on one or more preexisting works. This includes cases in which a work may be recast, transformed, or adapted in any way, such as by translation, abridgement, dramatization, recording, transcription, or digital encoding. A work consisting of editorial revisions, annotations, elaborations, or other modifications, which, as a whole, represent an original work of authorship is also a derivative work [Copyright]. This legal definition elucidates the sense of "derived" that is intended in the Dublin Core definition of Source. Note, however, that this overlaps with Relation, since the isFormatOf and isVersionOf refinements both refer to derivative works in the technical sense. Recommended best practice for OLAC is to reserve the use of Source for cases when the present resource has a substantially different Type or Creator or Title than the Source resource. When the difference from the original work is only a difference of Date or Format or Publisher or Identifier or edition designator in the Title, then use Relation to encode the relationship to the original work.
Attributes	Only lang and scheme (see section 2).
Examples	Source for a digital encoding of a manuscript in a participating archive: <source>oai:somearchive:holding1023</source> Source for data extracted from a published source: <source>Kwara'ae flora vocabulary extracted from Guide to the Forests of the British Solomon Islands, by T. C. Whitmore. Oxford University Press, 1966.</source>
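The guidance on choosing between Source and the isVersionOf and isFormatOf refinements of Relation amounts to a small decision procedure. The following is a minimal sketch of that procedure. The field names and the `adds_content` flag (meaning the same creators added new intellectual content) are illustrative assumptions. Deciding whether a Title change is "substantial" or only an edition designator is still left to the cataloguer.

```python
# Hypothetical field names. "title" means a substantive title change;
# "edition" means only the edition designator in the Title changed.
SUBSTANTIVE = {"type", "creator", "title"}
MINOR = {"date", "format", "publisher", "identifier", "edition"}


def derived_work_encoding(changed: set, adds_content: bool) -> str:
    """Suggest how to record the link from a derived resource to its original.

    changed: names of the properties that differ from the original work.
    adds_content: True if the same creators added new intellectual content.
    """
    unknown = changed - SUBSTANTIVE - MINOR
    if unknown:
        raise ValueError(f"unrecognized fields: {sorted(unknown)}")
    if changed & SUBSTANTIVE:
        return "source"  # substantially different Type, Creator, or Title
    if adds_content:
        return 'relation refine="isVersionOf"'  # new version or edition
    return 'relation refine="isFormatOf"'  # same work in a new format
```

Under this reading, the digital encoding of a manuscript by a new encoder (a different Creator) is recorded with Source. A PDF rendition of an existing grammar is recorded with isFormatOf.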

Subject

Name	Subject and Keywords
Definition	The topic of the content of the resource.
Comments	Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. [DCMES1.1] Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. The scheme attribute is used to identify the scheme.
Attributes	Only lang and scheme (see section 2).
Examples	A Library of Congress subject heading: <subject scheme="LCSH">African languages</subject>
*To do*	Geographic and temporal aspects of language varieties go in Coverage. Subject is where social dimensions of language varieties would go. Does the community need to develop an encoding scheme for this? Add an example.

Subject.language

Name	Subject Language
Definition	A language which the content of the resource describes or discusses.
Comments	See Language for a complete discussion (with examples) of using the Language and Subject.language elements.
Attributes	As with the Language element, a code attribute is used to identify the language precisely. Also lang and scheme (see section 2).
Examples

Title

Name	Title
Definition	A name given to the resource.
Comments	Typically, a Title will be a name by which the resource is formally known. [DCMES1.1] A translation of the title can be supplied in a second Title element. Use the lang attribute to identify the language of these elements.
Attributes	The refine attribute may optionally be used for a single refinement supported by [DC-Q]. The value alternative indicates a form of the title that is used as a substitute or alternative to the formal title of the resource. This qualifier can include abbreviated titles as well as translations. Recommended best practice is that only one instance of Title, namely the original title, be unqualified. Also lang and scheme (see section 2).
Examples	A typical title: <title>A Dictionary of the Nggela Language</title> A vernacular title with translation: <title lang="x-sil-LLU">Na tala 'uria na idulaa diana</title> <title refine="alternative" lang="en">The road to good reading</title>
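The recommendation that only the original title be unqualified is easy to check in a harvested record. The sketch below assumes a hypothetical record shape: an `<olac>` element whose `<title>` children are unprefixed, as in the examples, with XML namespaces ignored. It reports any record without exactly one unqualified Title, and any refinement other than alternative, which is the only one this document supports for Title.

```python
import xml.etree.ElementTree as ET


def check_titles(record_xml: str) -> list:
    """Return a list of problems with the Title elements of one record."""
    root = ET.fromstring(record_xml)
    titles = root.findall("title")  # direct <title> children only
    problems = []
    unqualified = [t for t in titles if t.get("refine") is None]
    if len(unqualified) != 1:
        problems.append(
            f"expected exactly one unqualified Title, found {len(unqualified)}")
    for t in titles:
        refine = t.get("refine")
        if refine is not None and refine != "alternative":
            problems.append(f"unsupported Title refinement: {refine}")
    return problems
```

The vernacular title with its English translation, as shown in the examples, passes with no problems reported. A record with two unqualified titles would be flagged.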

Type

Name	Resource Type
Definition	The nature or genre of the content of the resource.
Comments	Type includes terms describing general categories, functions, genres, or aggregation levels for content. To describe the physical or digital manifestation of the resource, use the Format element. [DCMES1.1]
Attributes	Recommended best practice is to use the code attribute to identify the type using the Dublin Core Types controlled vocabulary [DC-Type]. The element content is typically left empty. Use the element content only when the controlled vocabulary does not offer an appropriate type, or when further specification is needed. Also lang and scheme (see section 2).
Examples	The resource is a video recording: <type code="Image"/>

Type.functionality

Name	Software Functionality
Definition	The functionality of a software resource.
Comments	This element is used with resources that are software applications to classify what they are used for.
Attributes	Recommended best practice is to use the code attribute to identify the functionality using a controlled vocabulary. See [OLAC-Functionality] for the definition of the vocabulary. Use the element content only when the controlled vocabulary does not offer an appropriate code, or when further explanation about the functionality is needed. Also lang and scheme (see section 2).
Examples
*To do*	Write the OLAC-Functionality document. We may want to base it on the HLT Survey http://cslu.cse.ogi.edu/HLTsurvey/ as advocated by the ACL/DFKI Natural Language Software Registry. Add examples after we figure out the vocabulary.

Type.linguistic

Name	Linguistic Data Type
Definition	The nature or genre of the content of the resource from a linguistic standpoint.
Comments	For a resource that is information in or about a language, Type.linguistic identifies what kind of information it is from a linguistic standpoint. For a resource that is a software tool, Type.linguistic identifies what kind of information it processes. Service providers may use this information to match data files with software tools that might be applied to them.
Attributes	Recommended best practice is to use the code attribute to identify the linguistic data type using a controlled vocabulary. See [OLAC-Linguistic-Type] for the definition of the vocabulary. The vocabulary uses a two-level coding system. There are four primary types (transcription, annotation, description, and lexicon), each of which has a number of subtypes. Use the element content only when the controlled vocabulary does not offer an appropriate code, or when further explanation about the data type is needed. Also lang and scheme (see section 2).
Examples	The resource describes the grammar of a language: <type.linguistic code="description/grammar"/> The resource includes the orthographic transcription of text: <type.linguistic code="transcription/orthographic"/>
*To do*	Write the OLAC-Linguistic-Type document.
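The two-level codes, together with the remark that service providers may match data files with software tools, suggest a simple matching procedure. The following is a minimal sketch under stated assumptions. Only the four primary types come from this document. Subtypes are accepted freely because the OLAC-Linguistic-Type vocabulary is not yet written. The rule that a tool coded with a bare primary type handles every subtype of it is a hypothetical convention, and so are the tool names.

```python
# The four primary types named in this document; subtypes are not validated.
PRIMARY_TYPES = {"transcription", "annotation", "description", "lexicon"}


def parse_linguistic_type(code: str) -> tuple:
    """Split a code such as 'description/grammar' into (primary, subtype)."""
    primary, sep, subtype = code.partition("/")
    if primary not in PRIMARY_TYPES:
        raise ValueError(f"unknown primary type: {primary!r}")
    if sep and not subtype:
        raise ValueError(f"empty subtype in {code!r}")
    return primary, (subtype if sep else None)


def tools_for(data_code: str, tools: dict) -> list:
    """Return the names of tools whose Type.linguistic codes cover data_code.

    tools maps a tool name to the list of codes it processes. Assumed
    convention: a tool coded with only a primary type processes all of
    that type's subtypes.
    """
    primary, subtype = parse_linguistic_type(data_code)
    matches = []
    for name, codes in tools.items():
        for code in codes:
            tool_primary, tool_subtype = parse_linguistic_type(code)
            if tool_primary == primary and tool_subtype in (None, subtype):
                matches.append(name)
                break
    return sorted(matches)
```

For example, with a hypothetical concordancer coded `transcription/orthographic` and a grammar browser coded `description`, a resource coded `description/grammar` is matched only to the grammar browser.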

To do

There is not yet a provision for handling subject categorization by linguistic classification.

Rights.software has been left in limbo. It may be possible to unify it with Rights.

References

[Copyright]	Section 101, Definitions, Copyright Law of the United States of America and Related Laws Contained in Title 17 of the United States Code. <http://www.loc.gov/copyright/title17/92chap1.html#101>
[DC-Q]	Dublin Core Qualifiers. <http://dublincore.org/documents/2000/07/11/dcmes-qualifiers/>
[DC-Type]	DCMI Type Vocabulary. <http://dublincore.org/documents/2000/07/11/dcmi-type-vocabulary/>
[DCMES1.1]	Dublin Core Metadata Element Set, Version 1.1: Reference Description. <http://dublincore.org/documents/1999/07/02/dces/>
[DCMI-Registry]	DCMI Open Metadata Registry. <http://wip.dublincore.org:8080/registry/Registry>
[DCMI-Submit]	DCMI Open Metadata Registry: Submit a Scheme for Registration. <http://wip.dublincore.org:8080/registry/authServlet?req_type=newscheme&lang=enUS>
[DCQ-HTML]	Recording qualified Dublin Core metadata in HTML meta elements. <http://dublincore.org/documents/2000/08/15/dcq-html/>
[IANA-CS]	Internet Character Sets. <http://www.isi.edu/in-notes/iana/assignments/character-sets>
[MIME]	Internet Media Types. <http://www.isi.edu/in-notes/iana/assignments/media-types/media-types>
[OAI]	Open Archives Initiative. <http://www.openarchives.org/>
[OAI-DC]	Schema for OAI implementation of Dublin Core metadata. <http://www.openarchives.org/OAI/1.1/dc.xsd>
[OLAC]	Open Language Archives Community. <http://www.language-archives.org/>
[OLAC-CPU]	OLAC CPU Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-Encoding]	OLAC Encoding Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-Format]	OLAC Format Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-Functionality]	OLAC Functionality Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-Language]	OLAC Language Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-Linguistic-Type]	OLAC Linguistic Data Type Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-OS]	OLAC Operating System Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-Registration]	The OLAC Registration Service. <http://www.language-archives.org/OLAC/???>
[OLAC-Rights]	OLAC Rights Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-Role]	OLAC Role Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-Scheme]	OLAC Vocabulary for Encoding Schemes. <http://www.language-archives.org/OLAC/???>
[OLAC-Sourcecode]	OLAC Source Code Vocabulary. <http://www.language-archives.org/OLAC/???>
[OLAC-WP]	White Paper on Establishing an Infrastructure for Open Language Archiving. <http://www.language-archives.org/docs/white-paper.html>
[TGN]	Getty Thesaurus of Geographic Names. <http://www.getty.edu/research/tools/vocabulary/tgn/index.html>
[W3CDTF]	Date and Time Formats, W3C Note. <http://www.w3.org/TR/NOTE-datetime>
