OLAC Metadata Usage Guidelines

Date issued: 2007-11-15
Status of document: Draft Informational Note. This is only a preliminary draft that is still under development; it has not yet been presented to the whole community for review.
This version: http://www.language-archives.org/NOTE/usage-20071115.html
Latest version: http://www.language-archives.org/NOTE/usage.html
Previous version: None.
Abstract:

This document provides guidelines on the meaning and proper usage of the metadata elements used in the metadata standard of the Open Language Archives Community.

Editors: Gary Simons, SIL International (mailto:gary_simons@sil.org)
Steven Bird, University of Pennsylvania (mailto:sb@ldc.upenn.edu)
Joan Spanne, SIL International (mailto:joan_spanne@sil.org)
Copyright © 2007 Gary Simons (SIL International), Steven Bird (University of Pennsylvania), and Joan Spanne (SIL International). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. All elements
  3. Core elements
  4. Other elements
References

1. Introduction

This document provides guidelines on how to describe language resources using the OLAC metadata standard [OLAC-Metadata]. The standard itself documents the formal syntax of valid metadata records, but does not explain or exemplify the use of the individual metadata elements. That is the purpose of this document.

Section 2 below describes the usage of the attributes that may be used on metadata elements. Section 3 then describes usage in the OLAC context for the fifteen core elements of the Dublin Core metadata set. Section 4 describes other elements that may be used.

2. All elements

OLAC metadata is an application of the Dublin Core metadata element set as defined in [DCMT]. The most fundamental requirement on best practice in OLAC is that each element should be used in a way that conforms to its definition in [DCMT]. This document therefore repeats those definitions and offers usage notes and examples to help the reader understand best use of the elements.

Eight of the elements in the Dublin Core metadata set define refinements that narrow the meaning of the element. A refinement shares the meaning of the generic element, but with a more restricted scope. When the more restricted meaning applies, it is recommended best practice to use the refined version of an element rather than the generic one. This document lists the refinements that are possible for each element.

In theory, every element in the metadata set may use the xsi:type attribute to specify an encoding scheme that defines a controlled vocabulary or a controlled syntax for its values. This document lists possible encoding schemes for eleven of the elements, whether defined by [DCMT] or [OLAC-Extensions]. In the case of the OLAC extensions, the controlled value goes in the olac:code attribute (with the element content being available for an optional freeform elaboration of the coded value), while in the case of DCMI encoding schemes the controlled value goes in the element content. When one of these encoding schemes is applicable, it is recommended best practice to use it since this adds a precision to the value of the element that can be exploited to improve accuracy in searching and to provide domain-specific services. Subcommunities may follow the extension mechanism defined in section 5 of [OLAC-Metadata] to define new encoding schemes.

Every element in the metadata set may use the xml:lang attribute. It specifies the language in which the text in the content of the element is written. The value for the attribute should be a three-letter code from the ISO 639-3 controlled vocabulary as defined in [OLAC-Language]. In the absence of an xml:lang attribute, the element content is assumed to be in English (unless it is from a DCMI encoding scheme). Whenever the language of the element content is other than English, recommended best practice is to use the xml:lang attribute to identify the language. By using multiple instances of a metadata element tagged for different languages, data providers may offer their metadata records in multiple languages.
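
As an illustration, the same description could be offered in English and in French by repeating the element (a sketch only; the wording of both values is invented for this example):

<!-- invented sample text; "fra" is the ISO 639-3 code for French -->
<dc:description>A collection of folk tales.</dc:description>
   <dc:description xml:lang="fra">Une collection de contes populaires.</dc:description>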

In the formal definition of the metadata standard [OLAC-Metadata], every element and refinement is optional and repeatable. However, recommended best practice (as described below for particular elements and in [OLAC-BP]) requires that certain elements be present in order to give a basic breadth of description for a language resource. Recommended best practice also requires that one particular element (namely, Title) not be repeated. Since all other elements and refinements are repeatable, multiple values for an element or refinement should be given in multiple instances of the XML element rather than being listed in a single instance. This makes it possible for service providers to build search indexes on the individual values.

To summarize, the following are recommended best practices that apply to all elements:

Best practice 

Recommended best practice is for the value of each metadata element to conform to the definition of that element as given in [DCMT].

When the meaning of a particular element in a metadata record fits the definition of a refinement, recommended best practice is to use the refinement rather than the generic element.

When possible, recommended best practice is to use the xsi:type attribute to specify an encoding scheme for adding precision to the value of the element.

Whenever the language of the element content is other than English, recommended best practice is to use the xml:lang attribute with a value from [ISO639-3] to identify the language.

When a resource has more than one value for a particular metadata element or refinement, recommended best practice is to use a separate instance of the element or refinement for each value rather than listing all the values in a single instance.
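
For instance, a resource with two contributors would carry one element per person rather than a single element listing both (a minimal sketch; the names are hypothetical):

<!-- hypothetical names, one element per value -->
<dc:contributor>Smith, John L.</dc:contributor>
   <dc:contributor>Jones, Mary K.</dc:contributor>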

3. Core elements

Each of the fifteen core elements of the Dublin Core metadata set is described in one of the following subsections. These elements are defined in the http://purl.org/dc/elements/1.1/ namespace for which the prefix dc: is used in the examples.

Each subsection heading gives the name of an element; the element is then described under six subheadings. Definition is the definition for the element as given by the Dublin Core Metadata Initiative [DCMT]. Refinements lists the refined element names that may be used in place of the generic element name to specify a more precise meaning for the metadata item. The refinement definitions are copied from [DCMT]. All of the refinements are defined within the http://purl.org/dc/terms/ namespace for which the prefix dcterms: is used in the examples. Schemes lists the encoding schemes that may be used with this element (or its refinements) as the value of the xsi:type attribute. Best practice lists the best practices for the use of this element that are recommended by the OLAC community [OLAC-BP]. Usage notes offers additional notes for the OLAC community on how to use the element in the context of describing language resources. Examples shows samples of properly encoded elements and sample refinements.

Most elements may have more than one instance. This would be the case for a work with more than one author, or one for which you want to specify two or three languages, multiple subject terms, etc. In such cases, repeat the element for each person, language, etc., rather than placing all of them in a single instance of the element.

Contributor

Definition

An entity responsible for making contributions to the resource.

Refinements

There are no refinements for this element.

Schemes
olac:role  

The olac:role code is optionally used to specify the role (such as transcriber, sponsor, and so on) played by the named entity in the creation of the resource. The role is expressed by means of a controlled vocabulary; see [OLAC-Role] for the definition of the vocabulary.

Best practice 

Recommended best practice is to identify a Contributor by means of a name in a form that is ready for sorting within an index. For the names of persons, this means that the name should be given with the usual entry element first (i.e., an "inverted" order for most Western name forms where the surname is the usual entry element). For the names of organizations, this means that any initial article should be omitted.

Recommended best practice is to use the olac:role scheme to indicate the role of the Contributor.

Usage notes 

A Contributor may be a person, an organization, or a service. Contributor is closely related to Creator. A case for distinguishing the usage of the two elements can be made based on the degree of responsibility for the content. The Contributor designation can be used for those entities whose role in the creation of the resource is not great enough to merit recognition as a primary source of the intellectual content. Thus, the Contributor element would be used to identify institutions that sponsored or funded the work, or to identify individuals who played a secondary role in the development of the resource (e.g., roles such as consultant, depositor, responder). The Creator element would be used to identify individuals or group entities that are primarily responsible for the intellectual content (e.g., roles such as author, performer).

Applying the olac:role scheme, with its more precise vocabulary, makes the distinction between the less precise concepts of Contributor and Creator much less significant. Thus, an alternative to making the above distinction between Contributor and Creator is to use only the Contributor element, always specifying the more precise nature of the contribution through the use of the olac:role vocabulary (as is recommended best practice in any case).

Contributor and Publisher are also somewhat related elements. In particular, the olac:role "sponsor" can come close to one aspect of participation that a publisher may have in regard to making a given resource available to the public. The olac:role "editor" is frequently one of the individual tasks that are collectively performed by the publisher of a work. A "sponsor" that is not also the formal publisher of a resource may be entered in a Contributor element (with the proper olac:role specified), but the publisher should not be repeated in a Contributor element.

The entry element for a personal name will vary according to conventions used in a given location and with a given language. General practice among librarians is to select as the entry element that part of the name under which the person would normally be listed in an authoritative alphabetical list in his or her own language or country of residence or primary activity. If the entry element is a surname, follow the surname with a comma. If the entry element is the "first name" or "given name," do not follow it with a comma. Consult the Anglo-American Cataloguing Rules, 2nd ed., rev., [AACR2r] chapter 22 for additional guidance in determining entry elements and the normal ordering and forms of names based on language and country, as well as other name characteristics.

Be consistent with corporate name forms. Names that include a primary unit and subsidiary unit should be ordered with the primary unit given first. Location should generally be omitted from the name unless location is an integral part of the proper name of the unit. More guidance on the forms of corporate names may be found in [AACR2r], chapter 24.

Do not use quotation marks to enclose corporate names. If the name is a translation from its usual form or is not usually given in English, use the xml:lang attribute.

Examples

A generic contributor:

<dc:contributor>Smith, John L.</dc:contributor>

A funding agency:

<dc:contributor xsi:type="olac:role" olac:code="sponsor">Smithsonian Institution. 
   Office of Fellowships and Grants.</dc:contributor>

The person who performed a role for which there is not a suitable code:

<dc:contributor>Smith, John L. (format conversion)</dc:contributor>

A corporate name given in a language differing from the majority of the metadata record:

<dc:contributor xml:lang="ces">Česká akademie věd a umění</dc:contributor>
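
An author recorded with Contributor and an olac:role code, following the alternative to the Creator element described in the usage notes (a sketch; the name is hypothetical):

<!-- hypothetical name; "author" is one of the roles mentioned in the usage notes -->
<dc:contributor xsi:type="olac:role" olac:code="author">Jones, Mary K.</dc:contributor>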

Coverage

Definition

The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.

Refinements
spatial  

Spatial characteristics of the intellectual content of the resource.

temporal  

Temporal characteristics of the intellectual content of the resource.

Schemes
dcterms:Box  

The DCMI Box identifies a region of space using its geographic limits [DCMI-Box].

dcterms:ISO3166  

ISO 3166 Codes for the representation of names of countries [ISO3166].

dcterms:Period  

A specification of the limits of a time interval [DCMI-Period].

dcterms:Point  

The DCMI Point identifies a point in space using its geographic coordinates [DCMI-Point].

dcterms:TGN  

The Getty Thesaurus of Geographic Names [TGN].

Best practice 

Recommended best practice is that a metadata record should contain at least one Description (or one of its refinements) or Coverage (or one of its refinements) or Subject element. While these are different in purpose and content, they nevertheless give the prospective user some idea of the content of the resource that goes beyond the informative potential of just a title alone. Using all of these elements is encouraged.

In the case of Spatial coverage, recommended best practice is to use an encoding scheme to give precise geocoding of the resource.

Usage notes 

Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). [DCMT]

Coverage is related to Subject in having a topical orientation. Coverage in its temporal aspect should not be confused with Date or one of its refinements (e.g., created, or available). Thus, a data set that was compiled over an extended period of time should be described via dcterms:created whose content is a date range, rather than with Coverage or dcterms:temporal. A data set (or any kind of resource) that contains or addresses historical information would correctly use Coverage or dcterms:temporal to specify the date, date range or period in focus.

In the OLAC context, service providers already have a database that maps languages to the countries in which they are spoken [OLAC-Language]. Coverage should not be used to duplicate this information; rather, service providers will support searches concerning languages spoken in a given country by referring to the language database. Coverage should be used geographically only when the language involved has a wide distribution and the resource focuses on its use in a particular region or geopolitical jurisdiction, or conversely, when the resource deals with a topic of study in which the region itself is in focus, e.g., multilingualism, language policy, or languages in contact in a given locale.

Examples

A resource about English in India:

<dc:subject xsi:type="olac:ISO639-3" olac:code="eng"/>
   <dcterms:spatial>India</dcterms:spatial>

A resource about languages spoken on Guadalcanal:

<dcterms:spatial xsi:type="dcterms:TGN">Guadalcanal (island)</dcterms:spatial>

A resource about language use in the 19th century:

<dcterms:temporal>19th century</dcterms:temporal>
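
The same kinds of coverage expressed with encoding schemes, as recommended above for spatial coverage (a sketch; the period boundaries chosen for "19th century" are an assumption made for this example):

<!-- "IN" is the ISO 3166 two-letter code for India -->
<dcterms:spatial xsi:type="dcterms:ISO3166">IN</dcterms:spatial>
   <!-- assumed boundaries for the period label -->
   <dcterms:temporal xsi:type="dcterms:Period">name=19th century; start=1801; end=1900;</dcterms:temporal>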

Creator

Definition

An entity primarily responsible for making the resource.

Refinements

There are no refinements for this element.

Schemes
olac:role  

The olac:role code is optionally used to specify the role (such as author, editor, translator, and so on) played by the named entity in the creation of the resource. The role is expressed by means of a controlled vocabulary; see [OLAC-Role] for the definition of the vocabulary.

Best practice 

Recommended best practice is as for the Contributor element above.

Usage notes 

A Creator may be a person, an organization, or a service. Creator is closely related to Contributor. Creator may be differentiated from Contributor based on the degree of involvement as a primary source of the intellectual content. In determining whether an entity is a Creator (as opposed to a Contributor), consider the role in the creation of the resource, using the same criteria that are followed for deciding that an entity should be listed as an "author" in a bibliographic reference of the resource. Entities that do not merit that level of recognition should be treated as Contributors.

Applying the olac:role scheme, with its more precise vocabulary, makes the distinction between the less precise concepts of Contributor and Creator much less significant. Thus, an alternative to making the above distinction between Contributor and Creator is to use only the Contributor element, always specifying the more precise nature of the contribution through the use of the olac:role vocabulary (as is recommended best practice in any case).

Examples

A personal author:

<dc:creator>Bloomfield, Leonard</dc:creator>

An institutional author:

<dc:creator>Linguistic Society of America</dc:creator>

An editor:

<dc:creator xsi:type="olac:role" olac:code="editor">Sapir, Edward</dc:creator>

Date

Definition

A point or period of time associated with an event in the lifecycle of the resource.

Refinements
available  

Date (often a range) that the resource will become or did become available.

created  

Date of creation of the resource.

dateAccepted  

Date of acceptance of the resource (e.g. of thesis by university department, of article by journal, etc.).

dateCopyrighted  

Date of a statement of copyright.

dateSubmitted  

Date of submission of the resource (e.g. thesis, articles, etc.).

issued  

Date of formal issuance (e.g., publication) of the resource.

modified  

Date on which the resource was changed.

valid  

Date (often a range) of validity of a resource.

Schemes
dcterms:W3CDTF  

W3C Encoding rules for dates and times - a profile based on ISO 8601 [W3CDTF].

Best practice 

Recommended best practice is that a record have at least one instance of Date (or one of its refinements).

Recommended best practice is that every instance of Date (or one of its refinements) use the dcterms:W3CDTF scheme, or enclose the element value in square brackets if the value does not conform to the encoding scheme (e.g., if it is supplied by the cataloger, is approximate, or is in some doubt).

Usage notes 

A W3CDTF-conformant value may be for the year alone, for the year and month, or for an exact date. A range of years is also a valid value; in this case both years must be four-digit values and the earlier year should come first. Thus, the element value may match one of the four following patterns: YYYY, YYYY-MM, YYYY-MM-DD, or YYYY-YYYY.

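A value that does not conform to W3CDTF (e.g., one that is supplied by the cataloger, is approximate, or is in some doubt) should be enclosed in square brackets and given without the dcterms:W3CDTF scheme.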

A Date with no refinement will be assumed to be the date of issue (i.e., publication). The date of publication of a resource may differ from its creation date; thus it can be helpful to prospective users to include more than one kind of Date element (refinement). However, a single resource should not have more than one instance of each date refinement, with the possible exception of dcterms:modified. Where the modification history of a resource is significant, it is appropriate to enter this information in a Provenance element, which allows for explanation of modifications in a way that dcterms:modified does not.

‘Available’ as a refinement pertains to Date, not to other aspects of availability.

Do not use additional terms such as “recorded on” or “donated on” in the element text.

Examples

A typical year of publication:

<dc:date xsi:type="dcterms:W3CDTF">1992</dc:date>

A resource modified on October 16, 1996:

<dcterms:modified xsi:type="dcterms:W3CDTF">1996-10-16</dcterms:modified>

A resource from approximately 1950:

<dc:date>[circa 1950]</dc:date>

A resource for which the date has been supplied by the cataloger from external information:

<dcterms:issued>[1932]</dcterms:issued>
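
A resource issued in a known month, and a data set compiled over a range of years (illustrative values only, following the YYYY-MM and YYYY-YYYY patterns as these guidelines describe them in the usage notes):

<!-- illustrative dates only -->
<dcterms:issued xsi:type="dcterms:W3CDTF">2001-06</dcterms:issued>
   <dcterms:created xsi:type="dcterms:W3CDTF">1987-1993</dcterms:created>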

Description

Definition

An account of the resource.

Refinements
abstract  

A summary of the content of the resource.

tableOfContents  

A list of subunits of the content of the resource.

Schemes

There are no encoding schemes for this element.

Best practice 

Recommended best practice is that a metadata record should contain at least one Description (or one of its refinements) or Coverage (or one of its refinements) or Subject element. While these are different in purpose and content, they nevertheless give the prospective user some idea of the content of the resource that goes beyond the informative potential of just a title alone. Using all of these elements is encouraged.

Usage notes 

Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content, or a free-text summary account of the content. [DCMT]

Description may also offer an annotation, or a qualitative or evaluative comment about the resource, such as a statement about suitability for a particular application or context.

The Description element is not a "catch-all" category. The Description element should not be confused with Subject. Description contains free-text (prose) statements, while Subject contains subject headings, faceted terms, or specific subject descriptors, e.g., Library of Congress Subject Headings, or Linguistics and Language Behavior Abstracts thesaurus descriptors. Description should likewise not be used in place of Extent to contain information solely about the length, duration, or size of a work.

No formatting conventions are defined within the text of Description. Service providers may format the entire Description as a single paragraph, collapsing adjacent white space characters into a single space.

When there is a URL for a document that describes the resource, use a separate Description element to encode just that URL. A Description that begins with "http:" will be interpreted by service providers as consisting solely of a URL and will be presented as a link in user interfaces. Service providers are not obliged to search other Description text for the occurrence of URLs.

Examples

A prose description of a resource:

<dc:description>The CALLHOME Japanese corpus of telephone speech consists of 120
   unscripted telephone conversations between native speakers of Japanese. All calls, which lasted
   up to 30 minutes, originated in North America and were placed to locations overseas (typically
   Japan). Most participants called family members or close friends. This corpus contains speech
   data files ONLY, along with the minimal amount of documentation needed to describe the contents
   and format of the speech files and the software packages needed to uncompress the speech data.
   </dc:description>

A contents note for a resource:

<dcterms:tableOfContents>v. 1: Thesaurus of Khmu dialects in Southeast Asia; v. 2:
   Dictionary of Khmu in China; v. 3: Dictionary of Khmu in Laos; v. 4: Dictionary of Khmu in
   Vietnam; v. 5: Dictionary of Khmu in Thailand</dcterms:tableOfContents>

A reference to an existing on-line description:

<dc:description>http://www.ldc.upenn.edu/Catalog/LDC96S37.html</dc:description>

Format

Definition

The file format, physical medium, or dimensions of the resource.

Refinements
extent  

The size or duration of the resource.

medium  

The material or physical carrier of the resource.

Schemes
dcterms:IMT  

The Internet media type of the resource [IMT].

Best practice 

In the case of a digital resource, recommended best practice is to express the Format using a MIME type from the dcterms:IMT scheme.

Usage notes 

Typically, Format may include the media-type, physical material, and/or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. [DCMT]

In the case of a physical object, the refinements medium and extent are most applicable. These may respectively contain the physical material of the object (e.g., parchment; reel-to-reel tape; 35 mm slide; NTSC encoded VHS cassette), and the size/duration (e.g. number of pages and dimensions of a book; length of a recording).

Format (and its refinement 'medium'), which encompasses the notion of physical media categories such as photograph, should not be confused with Type, which categorizes the nature or genre of the content of the resource based on the mode in which the user experiences or interacts with the resource, and as such includes terms that describe general categories, functions, or aggregation levels for content.

Examples

For a digitally encoded dictionary:

<dc:format xsi:type="dcterms:IMT">text/xml</dc:format>
   <dcterms:extent>5,237 entries in a 1.2M XML file.</dcterms:extent>

For a digitally recorded text:

<dc:format xsi:type="dcterms:IMT">audio/wav</dc:format>
   <dc:format>Sampling: 1 channel, 22 kHz, 8 bits.</dc:format>
   <dcterms:extent>Duration: 153 seconds. Size: 3.3M. </dcterms:extent>

For a 16mm film with sound track:

<dcterms:medium>sound, color;  16mm film</dcterms:medium>
   <dcterms:extent>30 min.</dcterms:extent>

Identifier

Definition

An unambiguous reference to the resource within a given context.

Refinements
bibliographicCitation  

A bibliographic reference for the resource.

Schemes
dcterms:URI  

A Uniform Resource Identifier [URI].

Best practice 

When the value of Identifier is a Uniform Resource Locator (URL), recommended best practice is to specify the dcterms:URI scheme.

Usage notes 

In the case of a resource that is not electronically encoded, but is housed in a conventional archive, Identifier may be used to give a local shelf or box number, or whatever scheme is used to locate a resource within the collection.

Identifiers that begin with "http:" will be interpreted by service providers as URLs and be presented as links in user interfaces. Resolve relative link paths against the base server address so that each link remains valid outside the search system of the host archive.

Identifier is of particular value in enabling other resources to unambiguously refer to the resource in focus in terms of a specific relationship. See Relation for best practice and usage notes in creating relationship links between resources.

When using the bibliographicCitation refinement, there is no prescription as to a bibliographic style to use in forming the citation. The only requirement is that sufficient bibliographic detail be included to identify the resource unambiguously.

A unique identifier that has been assigned to a digital resource by the researcher, a research organization or department, or an archives or library for internal reference may continue to be useful in identifying the resource alongside a URI assigned in the context of a digital repository. Such a unique identifier may be significant for continuity of identification for a digital resource throughout its lifecycle, as other resources may have used such an identifier in referencing the object prior to the assignment of the URI later in its lifecycle.

Do not specify the "oai:" identifier for the resource itself as a value of Identifier, since it is already given in <identifier> in the <header> of the <record>s returned by the OAI protocol.

Do not use Identifier for a URL intended to direct the searcher to a description of the resource or to availability, usage, or other rights-related information. A URI may be used in each of those elements or refinements to link to such information. Likewise, Identifier should not be used to refer to a general collection site in which the resource is one of many works available.

Examples

A Uniform Resource Locator for retrieval of an electronically encoded resource:

<dc:identifier xsi:type="dcterms:URI">http://arxiv.org/abs/cs.CL/0010033</dc:identifier>

A local identifier for retrieval within a physical collection:

<dc:identifier>Series: A, Box 7</dc:identifier>
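
A locally assigned identifier retained alongside a repository URI, as discussed in the usage notes (a sketch; both values are hypothetical):

<!-- hypothetical identifier values -->
<dc:identifier>FN-1998-042</dc:identifier>
   <dc:identifier xsi:type="dcterms:URI">http://archive.example.org/items/1998-042</dc:identifier>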

The citation for a chapter that was published in a book:

<dcterms:bibliographicCitation>In Joel Sherzer and Greg Urban (eds.), Native South
American discourse, 237-306. Berlin: Mouton.</dcterms:bibliographicCitation>

Language

Definition

A language of the resource.

Refinements

There are no refinements for this element.

Schemes
olac:ISO639-3  

The ISO 639-3 set of three-letter identifiers for the identification of human languages. See [OLAC-Language] for the definition of the vocabulary.

Best practice 

Recommended best practice is that every record contain at least one Language element. For a resource that does not include language content, include a Language element containing the code zxx for "No linguistic content."

Recommended best practice is to use the olac:ISO639-3 scheme to identify the Language precisely. Specify the language in the olac:code attribute while leaving the element content empty. Use the element content only when the controlled vocabulary does not offer an appropriate code, or when further specification is needed, such as to name a specific dialect or to give an alternate name that differs from the reference name given by the controlled vocabulary.

Usage notes 

Language is used for a language the resource is in, as opposed to the language it describes (see Subject). It is related to the audience for the work in that it identifies a language that the creator of the resource assumes that its eventual user will understand. When a resource is in more than one language, use a separate Language element for each language.

For a work of literature or other monolingual document aimed at the speakers of a particular language, use Language to identify that language. For a sound recording, use Language for the language being spoken in the recording. For a grammatical description, for instance, use Language for the language the grammar is written in; use Subject with the ISO639-3 scheme (hereafter referred to as "Subject.language") to specify the language whose grammar is being described. For an annotated text, use Language for the language in which the annotations are made; use Subject.language for the language of the base text that is being annotated. For a dictionary, use Language for the language in which the definitions are written; use Subject.language for the language whose words are being defined. A bilingual dictionary may have both languages identified in both kinds of elements.

For scholarly works, and works intended for a wider audience of communication, Language will usually be a major language (English, French, Chinese, etc.). For vernacular works in a minority language, Language should contain the code for the minority language, and an appropriate OLAC Linguistic Type (see Type) should be selected.

Service providers should use the olac:code attribute to support searches by language, and may use the element content in searches by keyword. They also may supply the default language name in keyword searching when the element content is missing.

Examples

A resource in English about the Sikaiana language:

<dc:language xsi:type="olac:ISO639-3" olac:code="eng"/> 
   <dc:subject xsi:type="olac:ISO639-3" olac:code="sky"/>

A Yemba-French dictionary, where the alternate name Dschang is preferred:

<dc:language xsi:type="olac:ISO639-3" olac:code="fra"/>
   <dc:subject xsi:type="olac:ISO639-3" olac:code="ybb">Dschang</dc:subject>

The American Heritage Dictionary, which is both in and about American English:

<dc:language xsi:type="olac:ISO639-3" olac:code="eng"/>
   <dc:subject xsi:type="olac:ISO639-3" olac:code="eng"/>
   <dcterms:spatial>United States</dcterms:spatial>

A resource in the Saracatsan dialect of modern Greek:

<dc:language xsi:type="olac:ISO639-3" olac:code="ell">Saracatsan dialect</dc:language>
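
A bilingual English-French dictionary, with both languages identified in both kinds of elements (a sketch of the practice described in the usage notes, not a record for an actual resource):

<!-- illustrative record fragment -->
<dc:language xsi:type="olac:ISO639-3" olac:code="eng"/>
   <dc:language xsi:type="olac:ISO639-3" olac:code="fra"/>
   <dc:subject xsi:type="olac:ISO639-3" olac:code="eng"/>
   <dc:subject xsi:type="olac:ISO639-3" olac:code="fra"/>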

Publisher

Definition: An entity responsible for making the resource available.
Refinements

There are no refinements for this element.

Schemes

There are no encoding schemes for this element.

Best practice 

Recommended best practice is to identify a Publisher by means of a name in a form that is ready for sorting within an index. For the names of persons, this means that the name should be given with the surname first (i.e., an "inverted" order for most Western name forms). For the names of organizations, this means that any initial article should be omitted.

Usage notes 

Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity. [DCMT] The Publisher is generally the business, organization, or sometimes the individual that takes responsibility (and provides financial and production resources) for putting the resource into a form suitable for making it publicly available, whether in multiple hard copies (printing, pressing of a disc, etc.), by broadcasting (radio, television, webcast), or by posting to a public website. A published resource may be offered for sale or freely, but in either case, it is released for public use.

Many materials that are placed in archival repositories are not formally published in that they have not undergone an editorial, composition and production process, involving review, modification, and packaging by persons other than the original creator(s). While such items may be offered publicly, they are not considered "published" in a formal sense and their metadata should not ordinarily specify a publisher.

For an entity that has funded the creation or development of the resource in some way, but has not otherwise participated in its development through editing, composition, production, printing, marketing, posting, etc., use the Contributor element specifying the [OLAC-Role] code="sponsor".

Be consistent with corporate name forms. Names that include a primary unit and subsidiary unit should be ordered with the primary unit given first. Location should generally be omitted from the name unless location is an integral part of the proper name of the unit.

Examples

A typical publisher:

<dc:publisher>Oxford University Press</dc:publisher>

The URL for a publisher:

<dc:publisher xsi:type="dcterms:URI">http://www.oup.com</dc:publisher>
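
An individual acting as publisher, with the name inverted so that it is ready for sorting. This is an illustrative sketch only; the name is hypothetical:

<dc:publisher>Doe, Jane</dc:publisher>

A funding body that did not otherwise take part in producing the resource, recorded with Contributor rather than Publisher as described in the usage notes. The organization name is hypothetical:

<dc:contributor xsi:type="olac:role" olac:code="sponsor">Example Research Foundation</dc:contributor>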

Relation

Definition: A related resource.
Refinements
conformsTo  

A reference to an established standard to which the resource conforms.

hasFormat  

The described resource pre-existed the referenced resource, which is essentially the same intellectual content presented in another format.

hasPart  

The described resource includes the referenced resource either physically or logically.

hasVersion  

The described resource has a version, edition, or adaptation, namely, the referenced resource.

isFormatOf  

The described resource is the same intellectual content of the referenced resource, but presented in another format.

isPartOf  

The described resource is a physical or logical part of the referenced resource.

isReferencedBy  

The described resource is referenced, cited, or otherwise pointed to by the referenced resource.

isReplacedBy

The described resource is supplanted, displaced, or superseded by the referenced resource.

isRequiredBy

The described resource is required by the referenced resource, either physically or logically.

isVersionOf  

The described resource is a version, edition, or adaptation of the referenced resource. Changes in version imply substantive changes in content rather than differences in format.

references  

The described resource references, cites, or otherwise points to the referenced resource.

replaces  

The described resource supplants, displaces, or supersedes the referenced resource.

requires  

The described resource requires the referenced resource to support its function, delivery, or coherence of content.

Schemes
dcterms:URI  

A Uniform Resource Identifier [URI].

Best practice 

When the related resource is also held in a participating archive, recommended best practice is to identify the related resource by means of its OAI identifier. A Relation that begins with "oai:" will typically be presented by service providers as an active link that retrieves the metadata for that resource.

If the related resource is not cataloged in the system of a participating archive, recommended best practice is to identify the related resource through a standard unique identifier. If the related resource is available online, specify the dcterms:URI scheme and give a stable Uniform Resource Locator (URL).

Usage notes 

This element is used to document relationships between resources, for instance, part-whole relationships, version relationships, dependency relationships, and so on.

When the present resource is a derived work, the source resource is typically indicated by Source. However, there are two cases in which refinements of Relation are used for derived works. The isVersionOf refinement is used when the present resource is a new version or edition with added intellectual content that has been developed by the same creators. When the new version is nothing more than a rendition in a new format, the isFormatOf refinement is used.

Metadata that is migrated from the records of a collector, a depositor, or an earlier repository system might use an identifier assigned at that earlier stage to specify relationship information. Relation metadata that uses such internal identifiers becomes more useful if the identifiers are updated to OAI identifiers once the metadata for the related resource has also been migrated or uploaded to the OAI-compliant system.

Examples

A link to a required font:

<dcterms:requires>oai:sil:software/ipafont</dcterms:requires>

Links to the component pieces of a collected work:

<dcterms:hasPart>oai:somearchive:holding126</dcterms:hasPart>
   <dcterms:hasPart>oai:somearchive:holding127</dcterms:hasPart>
   <dcterms:hasPart>oai:somearchive:holding128</dcterms:hasPart>
   <dcterms:hasPart>oai:somearchive:holding129</dcterms:hasPart>
   <dcterms:hasPart>oai:somearchive:holding130</dcterms:hasPart>

Link to a specification of a schema to which the resource conforms:

<dcterms:conformsTo xsi:type="dcterms:URI">http://someschemadef</dcterms:conformsTo>
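
A new edition, by the same creators, of a work held in a participating archive. This sketch illustrates the isVersionOf case described in the usage notes; the OAI identifier is a hypothetical placeholder:

<dcterms:isVersionOf>oai:somearchive:holding131</dcterms:isVersionOf>

A rendition in a new format of the same intellectual content, illustrating the isFormatOf case (again with a hypothetical identifier):

<dcterms:isFormatOf>oai:somearchive:holding132</dcterms:isFormatOf>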

To do

Do we need a scheme for olac:OAI to represent an OLAC OAI identifier? Or should we just use dcterms:URI and match on initial "oai:"?

Rights

Definition: Information about rights held in and over the resource.
Refinements
accessRights  

Information about who can access the resource or an indication of its security status.

license  

A legal document giving official permission to do something with the resource.

Schemes

There are no encoding schemes for this element.

Usage notes 

Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource. [Copyright]

A rights management statement that contains a license or directs a potential user to a license statement that specifies conditions and granted permissions should use the dcterms:license refinement. The dcterms:accessRights refinement should be used to document conditions under which access to the resource is granted. In either case, a statement may be included directly within the metadata, or a URI reference to the statement or a service may be given. Examples of license statements can be found at Creative Commons [CC].

To identify the person or corporate body (organization, legally constituted entity representing a group or community) that holds the rights, use Rights Holder rather than Rights.

Copyright is inherent in primary linguistic and cultural research materials, such as stories, descriptions, songs or art work, which researchers collect in the course of their work. The rights, in most cases, belong to the creator (not to the researcher or the researcher's organization or sponsor) or, in some cases, to the language community (if it may be constituted as a legal entity). Intellectual property law as it applies to personal or community cultural property is beyond the scope of these notes. However, documenting the creators of such works, the conditions under which they came into the hands of the researcher, and any known conditions on use (generally non-commercial use) are critical aspects of archiving. Publishing or otherwise distributing such works requires the approval of the owner. Only with information about the origins and conditions of collecting can the holding repository or interested researchers make use of such resources in ways that respect the rights and sensibilities of those whose languages and communities are documented.

A fundamental right of a creator is the right to be associated with his or her creation (whether verbal, textual, visual, etc.), regardless of the retention or subsequent assignment of additional rights (to display, distribute copies, modify). Upholding this right must be balanced with a person's right to privacy with regard to identifying information (including the use of photographs). Demographic data about a creator, contributor or research participant, along with identifying information, is often included in research data. Identifying information may be removed prior to deposit, or as a condition of release, or a Rights element may document a statement of permission to use a resource or parts of a resource in a particular way that may otherwise have been an infringement of privacy.

Examples
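
A reference to a license statement. This is an illustrative sketch only; the URL is that of the Creative Commons Attribution-ShareAlike 2.5 License, chosen here simply because it is the license under which this document is released:

<dcterms:license>http://creativecommons.org/licenses/by-sa/2.5/</dcterms:license>

A statement of access conditions given directly within the metadata. The wording of the statement is invented for illustration:

<dcterms:accessRights>Access is restricted to registered researchers. Contact the archive for permission.</dcterms:accessRights>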

To do

Mention Creative Commons in the usage notes and give an example of a CC license statement.

Source

Definition: The resource from which the described resource is derived.
Refinements

There are no refinements for this element.

Schemes
dcterms:URI  

A Uniform Resource Identifier [URI].

Best practice 

Recommended best practice is as for the Relation element above.

Usage notes 

The present resource may be derived from the Source resource in whole or in part. [DCMT]

In the legal parlance of intellectual property rights, a "derivative work" is one that is based on one or more preexisting works. This includes cases in which a work may be recast, transformed, or adapted in any way, such as by translation, abridgement, dramatization, recording, transcription, or digital encoding. A work consisting of editorial revisions, annotations, elaborations, or other modifications, which, as a whole, represent an original work of authorship is also a derivative work [Copyright].

This legal definition elucidates the sense of "derived" that is intended in the Dublin Core definition of Source. Note, however, that this overlaps with Relation, since the isFormatOf and isVersionOf refinements both refer to derivative works in the technical sense.

Use Source (as opposed to one of the above two Relation refinements) in cases when the present resource has a substantially different Type or Creator or Title than the Source resource. When the difference from the original work is only a difference of Date or Format or Publisher or Identifier or edition, then use Relation (or the appropriate refinement) to encode the relationship to the original work.

Examples

Source for a digital encoding of a manuscript in a participating archive:

<dc:source>oai:somearchive:holding1023</dc:source>

Source for data extracted from a published source:

<dc:source>Kwara'ae flora vocabulary extracted from Guide to the Forests of the
   British Solomon Islands, by T. C. Whitmore. Oxford University Press, 1966.</dc:source>

Subject

Definition: The topic of the resource.
Refinements

There are no refinements for this element.

Schemes
olac:ISO639-3  

The ISO 639-3 set of three-letter identifiers for the identification of human languages. See Language for a complete discussion (with examples) of using this scheme.

Service providers should use the code attribute to support searches by language, and may use the element content in searches by keyword. They also may supply the default language name in keyword searching when the element content is missing.

olac:linguistic-field  

The OLAC vocabulary for describing the content of a resource as relevant to a particular subfield of linguistic science [OLAC-Linguistic-Field].

Best practice 

Recommended best practice is that a metadata record should contain at least one Description (or one of its refinements) or Coverage (or one of its refinements) or Subject element. While these are different in purpose and content, they nevertheless give the prospective user some idea of the content of the resource that goes beyond the informative potential of just a title alone. Using all of these elements is encouraged.

When the subject is a human language, recommended best practice is to use the olac:ISO639-3 scheme to identify it precisely. Specify the language in the olac:code attribute while leaving the element content empty. Use the element content only when the controlled vocabulary does not offer an appropriate code, or when further specification is needed, such as to name a specific dialect or to give an alternate name that differs from the reference name given by the controlled vocabulary.

When the subject matter falls within the field of linguistics, recommended best practice is to use the olac:linguistic-field scheme to identify the subfield.

Usage notes 

Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. [DCMT]

To describe the spatial or temporal topic of the resource, use the Coverage element.

The Subject element should not be confused with Description. Subject contains subject headings, while Description contains free-text descriptions.

Avoid mapping the DC Subject element to a variety of cataloger-defined natural vocabularies such as 'topic', 'category', and 'keyword.'

Examples

A Library of Congress subject heading:

<dc:subject xsi:type="dcterms:LCSH">African languages</dc:subject>

A resource about a language for which the controlled vocabulary does not yet provide a code:

<dc:subject xsi:type="olac:language">Medieval Greek</dc:subject>

A resource on multilingual education in Uganda involving national, regional and local languages:

<dc:subject>Multilingual education</dc:subject>
   <dc:subject xsi:type="olac:ISO639-3" olac:code="swh"/>
   <dc:subject xsi:type="olac:ISO639-3" olac:code="eng"/>
   <dc:subject xsi:type="olac:ISO639-3" olac:code="xog"/>
   ...
   <dcterms:spatial xsi:type="dcterms:ISO3166">UG</dcterms:spatial>
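
A resource about the phonology of the Sikaiana language. This is a sketch of how the olac:linguistic-field scheme might be combined with a language subject; it assumes that "phonology" is a valid code, which should be checked against [OLAC-Linguistic-Field]:

<dc:subject xsi:type="olac:linguistic-field" olac:code="phonology"/>
   <dc:subject xsi:type="olac:ISO639-3" olac:code="sky"/>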

Title

Definition: A name given to the resource.
Refinements
alternative  

Any form of the title used as a substitute or alternative to the formal title of the resource. Comment: This qualifier can include Title abbreviations as well as translations.

Schemes

There are no encoding schemes for this element.

Best practice 

Recommended best practice is that every record must have an instance of the Title element. When the resource does not have an inherent title, the cataloger should supply a descriptive title and enclose it in square brackets.

Recommended best practice is to use only one instance of the unqualified Title element, namely, for the original title. All others (e.g. translations) should be specified with the Alternative refinement.

Usage notes 

Typically, a Title will be a name by which the resource is formally known. [DCMT]

A translation of the title can be supplied in a dcterms:alternative element. Use the xml:lang attribute to identify the language of the metadata content using an identifier from [ISO639-3].

The Title can be a name given to a work to identify it if a formal title is not used, such as for a data set or collection of texts. A formal title (for catalogers, the title as found on a primary source of information), is generally expected to be in the language in which the work is written or recorded. If the title is in a language different from the language identified in a Language element, use the xml:lang attribute to specify the language in which the Title is actually given. {This guidance should then mean that the second example below should not employ the xsi:type with olac:code (or a lang attribute) in the main title, but should only use it in the alternate}

Examples

A typical title:

<dc:title>A Dictionary of the Nggela Language</dc:title>

A vernacular title with translation:

<dc:title xsi:type="olac:ISO639-3" olac:code="llu">Na tala 'uria na idulaa diana</dc:title>
   <dcterms:alternative xml:lang="eng">The road to good reading</dcterms:alternative>

A parallel title, as typically employed for a diglot work:

<dc:title>Xunivaun jau yohipahonni = Cuentos de nuestros antepasados</dc:title>

Type

Definition: The nature or genre of the resource.
Refinements

There are no refinements for this element.

Schemes
dcterms:DCMIType  

A list of types used to categorize the nature or genre of the content of the resource [DCMI-Type].

olac:linguistic-type  

The nature or genre of the content of the resource from a linguistic standpoint [OLAC-Linguistic-Type]. For a resource that is information in or about a language, use this scheme to identify what kind of information it is from a linguistic standpoint.

olac:discourse-type  

Best practice 

Recommended best practice is that every record should contain at least one Type element that uses the dcterms:DCMIType scheme.

Recommended best practice is that every record should contain at least one Type element that uses the olac:linguistic-type scheme to identify its linguistic data type. When no particular data type is relevant, specify the not_applicable value.

Usage notes 

Type includes terms describing general categories, functions, genres, or aggregation levels for content. The DCMIType vocabulary specifies a controlled vocabulary to be used with Type. Its terms relate to the intended use or mode of perception by a user, e.g., whether the resource is read (as text), viewed (as image), heard, etc. Thus, a computer image file of a text (such as a scanned document) is type 'text', not type 'image', because the common use involves reading text. To describe the physical or digital manifestation of the resource, use the Format element. [DCMT]

The Linguistic Type scheme should be used in a Type element only if a resource represents one of the structural types in the range set. The element may be repeated if a resource represents more than one linguistic type. So, for example, a primary text accompanied by a lexicon of vocabulary items would be described as both a primary text and a lexicon.

Examples

The resource is a video recording, stored as an MPEG:

<dc:type xsi:type="dcterms:DCMIType">MovingImage</dc:type>
   <dc:format xsi:type="dcterms:IMT">video/mpeg</dc:format>

The resource is a video recording of a story teller relating a traditional story, recorded on Super-8 film:

<dc:type xsi:type="dcterms:DCMIType">MovingImage</dc:type>
   <dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
   <dcterms:medium>Super 8mm film</dcterms:medium>
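
The resource is a primary text accompanied by a lexicon of its vocabulary items. This sketch illustrates repeating the Type element for more than one linguistic type, as described in the usage notes; the choice of the DCMIType value Text assumes a written resource:

<dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
   <dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
   <dc:type xsi:type="olac:linguistic-type" olac:code="lexicon"/>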

To do

The following is leftover from 2002; do we still want it? For a resource that is a software tool, Type.linguistic identifies what kind of information it processes. Service providers may use this information to match data files with software tools that might be applied to them.

4. Other elements

The following subsections describe other elements that have been defined by the Dublin Core Metadata Initiative [DCMT]. They are defined within the http://purl.org/dc/terms/ namespace for which the prefix dcterms: is used in the examples.

Provenance

Definition: A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity and interpretation.
Refinements

There are no refinements for this element.

Schemes

There are no encoding schemes for this element.

Usage notes 

The Provenance element is a dcterms element, one of two element-level terms (the other is RightsHolder) not specified in the original core set of Dublin Core elements, but nonetheless an element and not a refinement of some other element. Its correct specification is with the dcterms namespace as shown in the example.

Provenance should be distinguished in usage from Description. Provenance metadata should be used to track and/or convey important information regarding the custody and/or management of the resource throughout its history, which has a bearing upon interpretation of the informational content of the resource, or upon its artifactual integrity and authenticity. {A WORK IN PROGRESS}

Examples

A group of texts and recordings constituting a small collection that has undergone some reorganization by a subsequent handler since original creation:

<dc:contributor xsi:type="olac:role" olac:code="researcher">Gardner, Richard</dc:contributor>
   <dc:contributor xsi:type="olac:role" olac:code="compiler">Anderson, Judi Lynn</dc:contributor>
   <dcterms:created>1962-1976</dcterms:created>
   <dcterms:available>2007</dcterms:available>
   <dc:description>Collection of 23 texts, with original recordings of 17 of the texts. Texts were originally
   transcribed using a variety of orthographic conventions, some with interlinear analytical transcriptions,
   some also with Spanish free translations.</dc:description>
   <dcterms:provenance>Collection of texts recorded by Gardner, transcribed with the assistance of
   a number of local residents. Body of materials was transferred to branch custody in the 1980s,
   subsequently organized, collated and annotated by J.L. Anderson, a researcher in a closely
   related Chinantec language. Annotations particularly relate to changes in orthographic and tone
   marking conventions over the years of collection and as compared to current practices.</dcterms:provenance>

Rights Holder

Definition: A person or organization owning or managing rights over the resource.
Refinements

There are no refinements for this element.

Schemes

There are no encoding schemes for this element.

Usage notes 

Use the URI or name of the Rights Holder to indicate the entity.

Examples
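
A rights holder identified by name. This is an illustrative sketch only; the organization name is hypothetical:

<dcterms:rightsHolder>Example Language Community Association</dcterms:rightsHolder>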

References

[AACR2r] Anglo-American Cataloguing Rules, 2nd ed., 2002 revision. Ottawa and Chicago: Canadian Library Association and American Library Association, 2002-.
[CC] Creative Commons Licenses.
<http://creativecommons.org/licenses/>
[Copyright] Section 101, Definitions, Copyright Law of the United States of America and Related Laws Contained in Title 17 of the United States Code.
<http://www.loc.gov/copyright/title17/92chap1.html#101>
[DCMI-Box] DCMI Box Encoding Scheme: specification of the spatial limits of a place, and methods for encoding this in a text string.
<http://dublincore.org/documents/dcmi-box/>
[DCMI-Period] DCMI Period Encoding Scheme: specification of the limits of a time interval, and methods for encoding this in a text string.
<http://dublincore.org/documents/dcmi-period/>
[DCMI-Point] DCMI Point Encoding Scheme: a point location in space, and methods for encoding this in a text string.
<http://dublincore.org/documents/dcmi-point/>
[DCMI-Type] DCMI Type Vocabulary.
<http://dublincore.org/documents/dcmi-type-vocabulary/>
[DCMT] Dublin Core Metadata Terms.
<http://dublincore.org/documents/dcmi-terms/>
[IMT] MIME Media Types.
<http://www.iana.org/assignments/media-types/>
[ISO3166] ISO 3166: Codes for the representation of names of countries.
<http://www.iso.org/iso/country_codes/iso_3166_code_lists.htm>
[ISO639-3] ISO 639: Codes for the representation of names of languages - Part 3: Alpha-3 code for comprehensive coverage of languages.
<http://www.sil.org/iso639-3/>
[OLAC-BP] Best Practice Recommendations for Language Resource Description.
<http://www.language-archives.org/REC/bpr.html>
[OLAC-Extensions] Recommended metadata extensions.
<http://www.language-archives.org/REC/olac-extensions.html>
[OLAC-Language] OLAC Language Vocabulary.
<http://www.language-archives.org/REC/language.html>
[OLAC-Linguistic-Field] OLAC Linguistic Subject Vocabulary.
<http://www.language-archives.org/REC/field.html>
[OLAC-Linguistic-Type] OLAC Linguistic Data Type Vocabulary.
<http://www.language-archives.org/REC/type.html>
[OLAC-Metadata] OLAC Metadata.
<http://www.language-archives.org/OLAC/metadata.html>
[OLAC-Role] OLAC Role Vocabulary.
<http://www.language-archives.org/REC/role.html>
[TGN] Getty Thesaurus of Geographic Names.
<http://www.getty.edu/research/tools/vocabulary/tgn/index.html>
[URI] Uniform Resource Identifiers (URI): Generic Syntax.
<http://www.ietf.org/rfc/rfc2396.txt>
[W3CDTF] Date and Time Formats, W3C Note.
<http://www.w3.org/TR/NOTE-datetime>