OLAC Metadata Usage Guidelines

Date issued: 2008-07-11
Status of document: Informational Note. This document provides background information related to an OLAC standard, recommendation, or service.
This version: http://www.language-archives.org/NOTE/usage-20080711.html
Latest version: http://www.language-archives.org/NOTE/usage.html
Previous version: http://www.language-archives.org/NOTE/usage-20080531.html
Abstract:

This document provides guidelines on the meaning and proper usage of the metadata elements used in the metadata standard of the Open Language Archives Community.

Editors: Gary Simons, SIL International and Graduate Institute of Applied Linguistics (mailto:gary_simons@sil.org)
Steven Bird, University of Melbourne and University of Pennsylvania (mailto:sb@ldc.upenn.edu)
Joan Spanne, SIL International (mailto:joan_spanne@sil.org)
Changes since previous version:

Minor improvements resulting from review by OLAC Council prior to promotion to Candidate status: in best practice recommendations involving use of an encoding scheme the wording is expanded to say "use a value from", better example with Subject for a language lacking a language code, and discussion of xml:lang under Title brought into line with best practice recommendation in section 2.

Copyright © 2008 Gary Simons (SIL International and Graduate Institute of Applied Linguistics), Steven Bird (University of Melbourne and University of Pennsylvania), and Joan Spanne (SIL International). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. All elements
  3. Core elements
  4. Other elements
  5. Granularity of resources
References

1. Introduction

This document provides guidelines on how to describe language resources using the OLAC metadata standard [OLAC-Metadata]. The standard itself documents the formal syntax of valid metadata records, but does not explain or exemplify the use of the individual metadata elements. That is the purpose of this document.

Section 2 below describes the usage of the attributes that may be used on metadata elements. Section 3 then describes usage in the OLAC context for the fifteen core elements of the Dublin Core metadata set. Section 4 describes other elements that may be used. Section 5 concludes with remarks about the granularity of resources; that is, it addresses the question of what size of thing (from file up to entire corpus) is treated as an item in an OLAC repository.

2. All elements

OLAC metadata is an application of the Dublin Core metadata element set as defined in [DCMT]. The most fundamental requirement on best practice in OLAC is that each element should be used in a way that conforms to its definition in [DCMT]. This document therefore repeats those definitions and offers usage notes and examples to help the reader understand best use of the elements in the context of describing language resources.

Eight of the elements in the Dublin Core metadata set define refinements that narrow the meaning of the element. A refinement shares the meaning of the generic element, but with a more restricted scope. When the more restricted meaning applies, it is recommended best practice to use the refined version of an element rather than the generic one. This document lists the refinements that are possible for each element.

In theory, every element in the metadata set may use the xsi:type attribute to specify an encoding scheme that defines a controlled vocabulary or a controlled syntax for its values. This document lists possible encoding schemes for eleven of the elements, whether defined by [DCMT] or [OLAC-Extensions]. In the case of the OLAC extensions, the controlled value goes in the olac:code attribute (with the element content being available for an optional freeform elaboration of the coded value), while in the case of DCMI encoding schemes the controlled value goes in the element content. When one of these encoding schemes is applicable, it is recommended best practice to use it since this adds precision to the value of the element that can be exploited to improve accuracy in searching and to provide domain-specific services. Subcommunities may follow the extension mechanism defined in section 5 of [OLAC-Metadata] to define new encoding schemes.
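
As a sketch of the difference just described, the first element below uses an OLAC extension scheme, so the controlled value sits in the olac:code attribute, while the second uses a DCMI encoding scheme, so the controlled value is the element content. Both values are taken from examples given later in this document:

```xml
<!-- OLAC extension scheme: coded value in olac:code, element content optional -->
<dc:language xsi:type="olac:language" olac:code="eng"/>

<!-- DCMI encoding scheme: coded value in the element content -->
<dc:date xsi:type="dcterms:W3CDTF">1992</dc:date>
```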

Every element in the metadata set may use the xml:lang attribute. It specifies the language in which the text within the content of the metadata element is written. Though the [XML-Lang] specification permits a wider range of values for this attribute, it is recommended for purposes of interoperation in the OLAC context that only identifiers that conform to the [OLAC-Language] specification be used. The OLAC harvester normalizes values of xml:lang in the way described in [OLAC-Language]. In the absence of an xml:lang attribute, the element content is assumed to be in English (unless it is a coded value from an encoding scheme). Whenever the language of the element content is other than English, recommended best practice is to use the xml:lang attribute to identify the language. By using multiple instances of a metadata element tagged for different languages, data providers may offer their metadata records in multiple languages. The use of the xml:lang attribute as the language of the metadata should not be confused with the use of the olac:language scheme with Language and Subject to specify, respectively, the content language of the resource itself and the subject of the resource. In fact, the xml:lang attribute may be used within these elements, as well, to specify an alternative name of the identified language, given in the language specified in xml:lang (see Language for an example).
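
For illustration, the following sketch offers the same description in English and in Spanish. The description text is invented for this example, and spa is assumed to be the identifier that [OLAC-Language] accepts for Spanish:

```xml
<!-- No xml:lang attribute: the content is assumed to be in English -->
<dc:description>Word list collected during field work.</dc:description>

<!-- The same statement in Spanish, tagged with xml:lang -->
<dc:description xml:lang="spa">Lista de palabras recopilada durante el trabajo de campo.</dc:description>
```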

In the formal definition of the metadata standard [OLAC-Metadata], every element and refinement is optional and repeatable. However, recommended best practice (as described below for particular elements and in [OLAC-BP]) requires that certain elements be present in order to give a basic breadth of description for a language resource. Recommended best practice also requires that one particular element (namely, Title) not be repeated. Since all other elements and refinements are repeatable, multiple values for an element or refinement should be given in multiple instances of the XML element rather than being listed in a single instance. This makes it possible for service providers to build search indexes on the individual values.
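
A minimal sketch of this practice follows. It borrows the two editors named in the bibliographic citation example under Identifier and treats them as editors in a hypothetical record for that book. The combined element inside the comment is the form to avoid:

```xml
<!-- Recommended: one instance per value, so each name can be indexed -->
<dc:contributor xsi:type="olac:role" olac:code="editor">Sherzer, Joel</dc:contributor>
<dc:contributor xsi:type="olac:role" olac:code="editor">Urban, Greg</dc:contributor>

<!-- Not recommended: several values listed in a single instance
<dc:contributor xsi:type="olac:role" olac:code="editor">Sherzer, Joel; Urban, Greg</dc:contributor>
-->
```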

To summarize, the following are recommended best practices that apply to all elements:

Best practice 

Recommended best practice is for the value of each metadata element to conform to the definition of that element as given in [DCMT].

When the meaning of a particular element in a metadata record fits the definition of a refinement, recommended best practice is to use the refinement rather than the generic element.

When applicable, recommended best practice is to use the xsi:type attribute to specify an encoding scheme so as to express the value of the element with precision.

Whenever the language of the element content is other than English, recommended best practice is to use the xml:lang attribute with a value from [OLAC-Language] to identify the language.

When a resource has more than one value for a particular metadata element or refinement, recommended best practice is to use a separate instance of the element or refinement for each value rather than listing all the values in a single instance.

3. Core elements

Each of the fifteen core elements of the Dublin Core metadata set is described in one of the following subsections. These elements are defined in the http://purl.org/dc/elements/1.1/ namespace for which the prefix dc: is used in the examples.

Each subsection heading gives the name of an element; the element is then described under six subheadings. Definition is the definition for the element as given by the Dublin Core Metadata Initiative [DCMT]. Refinements lists the refined element names that may be used in place of the generic element name to specify a more precise meaning for the metadata item. The refinement definitions are copied from [DCMT]. All of the refinements are defined within the http://purl.org/dc/terms/ namespace for which the prefix dcterms: is used in the examples. Schemes lists the encoding schemes that may be used with the element (or its refinements) as the value of the xsi:type attribute. Best practice lists the best practices for the use of the element that are recommended by the OLAC community [OLAC-BP]. Usage notes offers additional notes for the OLAC community on how to use the element in the context of describing language resources. Examples shows samples of properly encoded elements and sample refinements.
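
The fragments in the examples assume that the dc:, dcterms:, olac:, and xsi: prefixes have been declared on an enclosing element. A sketch of such declarations follows. The dc: and dcterms: namespaces are the ones named above; the olac:olac container element and the olac: namespace URI are not stated in this document and are assumed here from the OLAC 1.1 schema, so they should be checked against [OLAC-Metadata]:

```xml
<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Metadata elements go here; the title text is a placeholder -->
  <dc:title>Sample title</dc:title>
  <dc:language xsi:type="olac:language" olac:code="eng"/>
</olac:olac>
```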

Contributor

Definition: An entity responsible for making contributions to the resource.
Refinements

There are no refinements for this element.

Schemes
olac:role  

The olac:role scheme is optionally used to specify the role (such as author, transcriber, sponsor, and so on) played by the named entity in the creation of the resource. The role is expressed by means of a controlled vocabulary; see [OLAC-Role] for the definition of the vocabulary.

Best practice 

Recommended best practice is to identify a Contributor by means of a name in a form that is ready for sorting within an alphabetical index.

Recommended best practice is to use a value from the olac:role scheme to indicate the role of the Contributor.

Usage notes 

A Contributor may be a person, an organization, or a service—any entity that has sufficient involvement in the creation or development of the resource to warrant explicit identification. A term from olac:role should be used to specify the role of the Contributor. Contributor is related to Creator since many of the terms in the role vocabulary (like author or photographer) name the role of the primary creator of a resource. The OLAC recommendation is that Contributor (with the appropriate role designation) be used in these cases, and that Creator be used only when there is not an appropriate term in the olac:role vocabulary for the role played by the entity that created the resource. A participating OLAC repository need not be concerned about the fact that other metadata services based on Dublin Core may prefer and even require the use of the Creator element. The OLAC-to-Simple-DC crosswalk built into the OLAC Aggregator automatically maps certain OLAC Contributor roles like author and photographer to the Creator element. See the discussion of Creator for more on its use.

Contributor and Publisher are also somewhat related elements. In particular, the olac:role “sponsor” can come close to one aspect of participation that a publisher may have in regard to making a given resource available to the public. The olac:role “editor” can encompass a spectrum of tasks that are collectively performed by the publisher of a work. A sponsor or editor that is not also the formal publisher of a resource may be entered in a Contributor element (with the proper olac:role specified), but the publisher should not be repeated in a Contributor element.

The form of a personal name should follow the form in which the person's name would normally be listed in an alphabetical index in his or her own language or country of residence, e.g., “FamilyName, GivenName” for an English speaker. If the first element is the family name, follow the family name with a comma. If the first element is the given name, do not follow it with a comma. Consult the Anglo-American Cataloguing Rules, 2nd ed., rev., [AACR2r] chapter 22 for additional guidance in determining the normal ordering and forms of names based on language and country, as well as other name characteristics.

The form of a corporate name (e.g., name of an institution or organization or any other corporate body) should also be ready for sorting in an alphabetical index. This means that an initial article should be omitted. In cases where a hierarchy is present, list the parts of the hierarchy from largest to smallest, separated by period (full stop) and space. Location should generally be omitted from the name unless location is an integral part of the proper name of the unit. Do not use quotation marks to enclose corporate names. If the name is a translation from its usual form or is not usually given in English, use the xml:lang attribute on the element. More guidance on the forms of corporate names may be found in [AACR2r], chapter 24.

In general, if there is no suitable role in the olac:role scheme, no additional information regarding the particular role of the contributor should be included in the content of the element, as this would have the effect of creating separate indexed forms for the same person based on different kinds of contributory activity.

Examples

A personal author:

<dc:contributor xsi:type="olac:role" olac:code="author">Bloomfield, 
   Leonard</dc:contributor>

An editor:

<dc:contributor xsi:type="olac:role" olac:code="editor">Sapir, 
   Edward</dc:contributor>

A funding agency with a hierarchical corporate name:

<dc:contributor xsi:type="olac:role" olac:code="sponsor">Smithsonian 
   Institution. Office of Fellowships and Grants.</dc:contributor>

The person who performed a role for which there is not a suitable code:

<dc:contributor>Smith, John L.</dc:contributor>

An institutional author whose name is given in the Czech language:

<dc:contributor xsi:type="olac:role" olac:code="author" 
   xml:lang="ces">Česká akademie věd a umění</dc:contributor>

Coverage

Definition: The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
Refinements
dcterms:spatial  

Spatial characteristics of the intellectual content of the resource.

dcterms:temporal  

Temporal characteristics of the intellectual content of the resource.

Schemes
dcterms:Box  

Identifies a region of space using its geographic limits [DCMI-Box].

dcterms:ISO3166  

Codes for the representation of names of countries [ISO3166].

dcterms:Period  

Specifies the limits of a time interval [DCMI-Period].

dcterms:Point  

Identifies a point in space using its geographic coordinates [DCMI-Point].

dcterms:TGN  

The Getty Thesaurus of Geographic Names [TGN].

Best practice 

Recommended best practice is that a metadata record should contain at least one Coverage (or one of its refinements) or Description (or one of its refinements) or Subject element in order to give the prospective user some idea of the content of the resource that goes beyond the informative potential of just a title alone. Using all of these elements is encouraged.

In the case of spatial coverage, recommended best practice is to use a value from an encoding scheme to give precise geocoding of the resource.

Usage notes 

Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). [DCMT]

Coverage is related to Subject in having a topical orientation. Coverage in its temporal aspect should not be confused with Date or one of its refinements (e.g., created, or available). Thus, a data set that was compiled over an extended period of time should be described via a dcterms:created element whose content is a date range, rather than with Coverage or dcterms:temporal. A data set (or any kind of resource) that contains or addresses historical information would correctly use Coverage or dcterms:temporal to specify the date, date range or period in focus.

Coverage should be used geographically when the language variety in focus in the resource has a different or narrower distribution than the language has in general. It should also be used when the resource deals with a topic of study in which the location or region itself is in focus, e.g., multilingualism, language policy, languages in contact, in a given locale.

Geospatial coordinates are appropriate Coverage metadata for a text recording when they identify the location where the exemplified speech variety is spoken. Typically this coincides with the location where the text was collected, but not always. If the text was collected outside a community where it is spoken (such as in a laboratory setting), then information about the circumstances of collection would be included in the Description element.

Examples

A resource about English in India:

<dc:subject xsi:type="olac:language" olac:code="en"/>
<dcterms:spatial xsi:type="dcterms:ISO3166">IN</dcterms:spatial>

A resource about languages spoken on Guadalcanal:

<dcterms:spatial xsi:type="dcterms:TGN">Guadalcanal (island)</dcterms:spatial>

A resource involving a diaspora group from a language community:

<dc:title>“Como Se Hayan”: Zapotec Indigenous Migrant
   Expressions of Belonging</dc:title>
<dc:coverage>Los Angeles, California</dc:coverage>
<dc:subject xsi:type="olac:language" olac:code="zpu"/>

A resource about language use in the 19th century:

<dcterms:temporal>19th century</dcterms:temporal>

A dataset covering a period of time, relating to a particular community, collected over a span of time:

<dc:title>Agta Demographic Database: chronicle of a hunter-gatherer 
   community in transition</dc:title>
<dcterms:temporal xsi:type="dcterms:Period">start=1900; end=2006;
   </dcterms:temporal>
<dcterms:spatial xsi:type="dcterms:TGN">San Ildefonso, Cape (cape)
   </dcterms:spatial>
<dcterms:created xsi:type="dcterms:Period">start=1962; end=2006;
   </dcterms:created>
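
A resource geocoded with coordinates, as a sketch of the dcterms:Point and dcterms:Box schemes. The component labels (east, north, northlimit, and so on) follow [DCMI-Point] and [DCMI-Box]; the coordinate values are rough, illustrative figures for Guadalcanal and should not be treated as authoritative:

```xml
<!-- A single point: longitude (east) and latitude (north) in decimal degrees -->
<dcterms:spatial xsi:type="dcterms:Point">east=160.2; north=-9.6; name=Guadalcanal</dcterms:spatial>

<!-- A bounding box given by its four limits -->
<dcterms:spatial xsi:type="dcterms:Box">northlimit=-9.2; southlimit=-10.0; westlimit=159.5; eastlimit=160.9; name=Guadalcanal</dcterms:spatial>
```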

Creator

Definition: An entity primarily responsible for making the resource.
Refinements

There are no refinements for this element.

Schemes

There are no encoding schemes for this element.

Best practice 

Recommended best practice is to use the Contributor element instead of Creator, except in cases where there is significant creative involvement by the person or organization and there is no suitable refinement term from the olac:role scheme to use with Contributor.

Usage notes 

A Creator may be a person, an organization, or a service. It refers to an entity with a primary role in the creation of the intellectual or artistic content of the resource [DC-Lib]. Creator is closely related to Contributor. The OLAC recommendation is that Creator be viewed semantically as a refinement of Contributor, though it retains its status (and therefore its syntax) as an element for backward compatibility. (This is in line with the latest recommendations of the DCMI; see [DCMT] and [DC-Lib].) Thus, Creator is appropriate to use in cases where the entity to be identified has significant involvement as a primary originator of the content, but in such a way that none of the role terms available in the olac:role scheme is a suitable description of that involvement. Examples are artist (e.g., sculptor, carver, weaver) or modeler.

A participating OLAC repository need not be concerned about the fact that other metadata services based on Dublin Core may prefer and even require the use of the Creator element. The OLAC-to-Simple-DC crosswalk built into the OLAC Aggregator automatically maps certain OLAC Contributor roles like author and photographer to the Creator element.

See the Contributor element for information about the proper form for a personal name or a corporate name.

Examples

A weaver of a basket or textile resource that accompanies a description of its making:

<dc:creator>Datsolalee</dc:creator>

See Contributor for examples of author and other creative roles.

Date

Definition: A point or period of time associated with an event in the lifecycle of the resource.
Refinements
dcterms:available  

Date (often a range) that the resource will become or did become available.

dcterms:created  

Date of creation of the resource.

dcterms:dateAccepted  

Date of acceptance of the resource (e.g., of thesis by university department, of article by journal, etc.).

dcterms:dateCopyrighted  

Date of a statement of copyright.

dcterms:dateSubmitted  

Date of submission of the resource (e.g., thesis, article, etc.).

dcterms:issued  

Date of formal issuance (e.g., publication) of the resource.

dcterms:modified  

Date on which the resource was changed.

dcterms:valid  

Date (often a range) of validity of a resource.

Schemes
dcterms:W3CDTF  

W3C Encoding rules for dates and times - a profile based on ISO 8601 [W3CDTF].

dcterms:Period  

A specification of the limits of a time interval [DCMI-Period].

Best practice 

Recommended best practice is that a record have at least one instance of Date (or one of its refinements).

Recommended best practice is that every instance of Date (or one of its refinements) use a value matching the dcterms:W3CDTF scheme, or enclose the element value in square brackets if the value does not conform to the encoding scheme (e.g., if it is supplied by the cataloger, is approximate, or is in some doubt).

Usage notes 

A W3CDTF-conformant value may be for the year alone, for the year and month, or for an exact date. A range of years is also a valid value; in this case both years must be four-digit values and the earliest year should come first. Thus, the element value may match one of the four following patterns: YYYY, YYYY-MM, YYYY-MM-DD, or YYYY-YYYY.

A Date with no refinement will be assumed to be the date of issue (i.e. publication). Date of publication may differ from its creation date; thus it can be helpful to prospective users to include more than one kind of Date element (refinement). However, a single resource should not have more than one instance of each date refinement, with the possible exception of dcterms:modified. Where the modification history of a resource is significant, it is appropriate to enter this information in a Provenance element, which allows for explanation of modifications in a way that dcterms:modified does not.

‘Available’ as a refinement pertains to Date, not to other aspects of availability.

Do not use additional terms such as “recorded on” or “donated on” in the element text.

Examples

A typical year of publication:

<dc:date xsi:type="dcterms:W3CDTF">1992</dc:date>

A resource modified on October 16, 1996:

<dcterms:modified xsi:type="dcterms:W3CDTF">1996-10-16</dcterms:modified>

A resource from approximately 1950:

<dc:date>[circa 1950]</dc:date>

A resource for which the date has been supplied by the cataloger from external information:

<dcterms:issued>[1932]</dcterms:issued>
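
A resource created in one month and published in a later year, as a sketch of the YYYY-MM pattern and of pairing two Date refinements in one record. The dates are invented for illustration:

```xml
<!-- Year and month of creation -->
<dcterms:created xsi:type="dcterms:W3CDTF">1989-06</dcterms:created>

<!-- Year of formal publication -->
<dcterms:issued xsi:type="dcterms:W3CDTF">1992</dcterms:issued>
```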

Description

Definition: An account of the resource.
Refinements
dcterms:abstract  

A summary of the content of the resource.

dcterms:tableOfContents  

A list of subunits of the content of the resource.

Schemes

There are no encoding schemes for this element.

Best practice 

Recommended best practice is that a metadata record should contain at least one Description (or one of its refinements) or Coverage (or one of its refinements) or Subject element in order to give the prospective user some idea of the content of the resource that goes beyond the informative potential of just a title alone. Using all of these elements is encouraged.

Usage notes 

Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content, or a free-text summary account of the content. [DCMT]

Description may also offer an annotation, or a qualitative or evaluative comment about the resource, such as a statement about suitability for a particular application or context.

The Description element is not a “catch-all” category. It should not be confused with Subject. Description contains free-text (prose) statements, while Subject contains subject headings, faceted terms, or specific subject descriptors, e.g., Library of Congress Subject Headings, or Linguistics and Language Behavior Abstracts thesaurus descriptors. Description should likewise not be used in place of Extent to contain information solely about the length, duration, or size of a work.

No formatting conventions are defined within the text of Description. Service providers may format the entire Description as a single paragraph, collapsing adjacent white space characters into a single space.

When there is a URL for a document that describes the resource, use a separate Description element to encode just that URL. A Description that begins with “http:” will be interpreted by service providers as consisting solely of a URL and will be presented as a link in user interfaces. Service providers are not obliged to search other Description text for the occurrence of URLs.

Examples

A prose description of a resource:

<dc:description>The CALLHOME Japanese corpus of telephone speech consists of 
   120 unscripted telephone conversations between native speakers of Japanese. 
   All calls, which lasted up to 30 minutes, originated in North America and 
   were placed to locations overseas (typically Japan). Most participants called 
   family members or close friends. This corpus contains speech data files 
   ONLY, along with the minimal amount of documentation needed to describe the 
   contents and format of the speech files and the software packages needed 
   to uncompress the speech data.</dc:description>

A contents note for a resource:

<dcterms:tableOfContents>v. 1: Thesaurus of Khmu dialects in Southeast Asia; 
   v. 2: Dictionary of Khmu in China; v. 3: Dictionary of Khmu in Laos; 
   v. 4: Dictionary of Khmu in Vietnam; v. 5: Dictionary of Khmu in Thailand
   </dcterms:tableOfContents>

A reference to an existing on-line description:

<dc:description>http://www.ldc.upenn.edu/Catalog/LDC96S37.html</dc:description>

Format

Definition: The file format, physical medium, or dimensions of the resource.
Refinements
dcterms:extent  

The size or duration of the resource.

dcterms:medium  

The material or physical carrier of the resource.

Schemes
dcterms:IMT  

The Internet media type of the resource [IMT].

Best practice 

In the case of a digital resource, recommended best practice is to express the Format using a MIME type value from the dcterms:IMT scheme.

Usage notes 

Typically, Format may include the media-type, physical material, and/or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. [DCMT]

In the case of a physical object, the refinements medium and extent are most applicable. These may respectively contain the physical material of the object (e.g., parchment; reel-to-reel tape; 35 mm slide; NTSC encoded VHS cassette), and the size/duration (e.g., number of pages and dimensions of a book; length of a recording).

Format (and its refinement 'medium'), which encompasses the notion of physical media categories such as photograph, should not be confused with Type, which categorizes the nature or genre of the content of the resource based on the mode in which the user experiences or interacts with the resource, and as such includes terms that describe general categories, functions, or aggregation levels for content.

Examples

For a digitally encoded dictionary:

<dc:format xsi:type="dcterms:IMT">text/xml</dc:format>
<dcterms:extent>5,237 entries in a 1.2M XML file.</dcterms:extent>

For a digitally recorded text:

<dc:format xsi:type="dcterms:IMT">audio/wav</dc:format>
<dc:format>Sampling: 1 channel, 22 kHz, 8 bits.</dc:format>
<dcterms:extent>Duration: 153 seconds. Size: 3.3M.</dcterms:extent>

For 16mm film with sound track:

<dcterms:medium>sound, color;  16mm film</dcterms:medium>
<dcterms:extent>30 min.</dcterms:extent>
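
For a printed book, as a sketch of the medium and extent refinements applied to a physical object. The values are invented for illustration:

```xml
<!-- Physical carrier of the resource -->
<dcterms:medium>paper</dcterms:medium>

<!-- Number of pages and dimensions -->
<dcterms:extent>237 pages; 23 cm</dcterms:extent>
```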

Identifier

Definition: An unambiguous reference to the resource within a given context.
Refinements
dcterms:bibliographicCitation  

A bibliographic reference for the resource.

Schemes
dcterms:URI  

A Uniform Resource Identifier [URI].

Best practice 

When the value of Identifier is a Uniform Resource Locator (URL), recommended best practice is to specify the dcterms:URI scheme.

Usage notes 

In the case of a resource that is not electronically encoded, but is housed in a conventional archive, Identifier may be used to give a local shelf or box number, or whatever scheme is used to locate a resource within the collection.

Identifiers that begin with “http:” will be interpreted by service providers as URLs and be presented as links in user interfaces. Expand relative link paths into complete URLs that include the base server address, so that the link remains valid outside the search system of the host archives.

Identifier is of particular value in enabling other resources to unambiguously refer to the resource in focus in terms of a specific relationship. See Relation for best practice and usage notes in creating relationship links between resources.

When using the bibliographicCitation refinement, there is no prescription as to a bibliographic style to use in forming the citation. The only requirement is that sufficient bibliographic detail be included to identify the resource unambiguously.

A unique identifier that has been assigned to a digital resource by the researcher, a research organization or department, or an archives or library for internal reference may continue to be useful in identifying the resource alongside a URI assigned in the context of a digital repository. Such a unique identifier may be significant for continuity of identification for a digital resource throughout its lifecycle, as other resources may have used such an identifier in referencing the object prior to the assignment of the URI later in its lifecycle.

Do not specify the “oai:” identifier for the resource itself as a value of Identifier, since it is already given in <identifier> in the <header> of the <record>s returned by the OAI protocol.

Do not use Identifier for a URL intended to direct the searcher to a Description, availability or usage or other Rights-related information. A URI may be used in each of those elements or refinements to link to such information. Likewise, Identifier should not be used to refer to a general collection site in which the resource is one of many works available.

Examples

A Uniform Resource Locator for retrieval of an electronically encoded resource:

<dc:identifier xsi:type="dcterms:URI"
   >http://arxiv.org/abs/cs.CL/0010033</dc:identifier>

A local identifier for retrieval within a physical collection:

<dc:identifier>Series: A, Box 7</dc:identifier>

The citation for a chapter that was published in a book:

<dcterms:bibliographicCitation>In Joel Sherzer and Greg Urban (eds.), 
   Native South American discourse, 237-306. Berlin: Mouton. 
   </dcterms:bibliographicCitation>
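
A locally assigned identifier kept alongside a URI, as a sketch of the case where an internal reference remains useful after a repository assigns a URI. Both values are hypothetical; example.org stands in for a repository's own server:

```xml
<!-- URI assigned by the digital repository -->
<dc:identifier xsi:type="dcterms:URI">http://example.org/archive/items/1234</dc:identifier>

<!-- Earlier identifier assigned by the researcher or department -->
<dc:identifier>FieldTape 1987-03</dc:identifier>
```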

Language

Definition: A language of the resource.
Refinements

There are no refinements for this element.

Schemes
olac:language  

The set of two- and three-letter identifiers from ISO 639 for the identification of individual human languages. See [OLAC-Language] for the definition of the vocabulary.

Best practice 

Recommended best practice is that every record contain at least one Language element. For a resource that does not include language content, include a Language element containing the code zxx for “No linguistic content.”

Recommended best practice is to use a value from the olac:language scheme with the Language element to identify an individual language precisely.

Usage notes 

Language is used for a language the resource is in, as opposed to the language it describes (see Subject). When a resource is in more than one language, use a separate Language element for each language.

Best practice is to specify a language code in the olac:code attribute. The element content is typically left empty. When the controlled vocabulary does not offer an appropriate code, the element content may be used to identify the language. It may be used in addition to the code to supply further specificity, such as to give a note about the particular variety or to name a specific dialect. Element content may also be used to give an alternate name that differs from the reference name given by the controlled vocabulary or to give the name of the language in another language.

For a resource of a particular [OLAC-Linguistic-Type] (see Type), the uses of the Language element and the Subject element with the olac:language scheme (hereafter referred to as “Subject.language”) can be generally described as follows:

  • For linguistic type primary_text: use the Language element to specify the language of the text. In addition, if it is annotated, use a second Language element to specify the language in which the annotations are made, as well as a Subject.language element repeating the language of the text, which is (in effect) the "subject" of the annotations.

  • For linguistic type lexicon: use Language for the language in which the definitions are written; use Subject.language for the language whose words are being defined. A bilingual dictionary may have both languages identified in both kinds of elements.

  • For linguistic type language_description: use Language for the language the description is written in, and use Subject.language to specify the language being described. In addition, if the resource contains fairly extensive examples in the language being described (e.g., many sentence-length examples as might be the case in an extensive grammatical description), it is appropriate to add a Language element for the language being described in order to convey the presence of content in that language.

For a work of literature or other monolingual document aimed at the speakers of a particular language, use Language to identify that language. For a sound recording, use Language for the language being spoken in the recording.

Service providers should use the code attribute to support searches by language, and may use the element content in searches by keyword. They also may supply the default language name in keyword searching when the element content is missing.

Examples

A resource in English about the Sikaiana language:

<dc:language xsi:type="olac:language" olac:code="eng"/> 
<dc:subject xsi:type="olac:language" olac:code="sky"/>

Frönsk-islenzk ordabok ; Dictionnaire français-islandais, a French-Icelandic dictionary, described for an audience where the French names are preferred. (A system could also display the French name of the language based on just the code in the metadata, if a service that maps the codes to French names were available.)

<dc:language xsi:type="olac:language" olac:code="fra" xml:lang="fra">français</dc:language>
<dc:language xsi:type="olac:language" olac:code="isl" xml:lang="fra">islandais</dc:language>
<dc:subject xsi:type="olac:language" olac:code="fra" xml:lang="fra">français</dc:subject>
<dc:subject xsi:type="olac:language" olac:code="isl" xml:lang="fra">islandais</dc:subject>

The American Heritage Dictionary, which is both in and about American English:

<dc:language xsi:type="olac:language" olac:code="eng"/>
<dc:subject xsi:type="olac:language" olac:code="eng"/>
<dcterms:spatial xsi:type="dcterms:ISO3166">US</dcterms:spatial>
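
A resource with no linguistic content, such as a recording of instrumental music (a hypothetical case, given here only to illustrate the zxx recommendation under Best practice):

<dc:language xsi:type="olac:language" olac:code="zxx"/>
<dc:type xsi:type="dcterms:DCMIType">Sound</dc:type>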

A resource in the Saracatsan dialect of modern Greek:

<dc:language xsi:type="olac:language" olac:code="ell">Saracatsan dialect</dc:language>
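
As a sketch of the usage notes on linguistic types, an annotated text in Sikaiana with annotations in English might be described as follows (the pairing of languages is assumed for illustration and does not describe an actual resource):

<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
<dc:language xsi:type="olac:language" olac:code="sky"/>
<dc:language xsi:type="olac:language" olac:code="eng"/>
<dc:subject xsi:type="olac:language" olac:code="sky"/>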

Publisher

Definition: An entity responsible for making the resource available.
Refinements

There are no refinements for this element.

Schemes

There are no encoding schemes for this element.

Best practice 

Recommended best practice is to identify a Publisher by means of a name in a form that is ready for sorting within an index.

Usage notes 

Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity. [DCMT] The Publisher is generally the business, organization, or sometimes individual who takes responsibility (and provides financial and production resources) for putting the resource into a form suitable for making it publicly available, whether in multiple hard copies (printing, pressing of a disc, etc.), by broadcasting (radio, television, webcast), or by posting to a public website. A published resource may be offered for sale or freely, but in either case, it is released for public use.

Many materials that are placed in archival repositories are not formally published in that they have not undergone an editorial, composition and production process, involving review, modification, and packaging by persons other than the original creator(s). While such items may be offered publicly, they are not considered “published” in a formal sense and their metadata should not ordinarily specify a publisher.

For an entity that has funded the creation or development of the resource in some way, but has not otherwise participated in its development through editing, composition, production, printing, marketing, posting, etc., use the Contributor element specifying the [OLAC-Role] as sponsor.

The name of the publisher should be given in a form that is ready for sorting within an index. See the usage notes above on Contributor for a discussion of the form that should be used for personal names and for corporate names.

Examples

A typical publisher:

<dc:publisher>Oxford University Press</dc:publisher>

The URL for a publisher:

<dc:publisher xsi:type="dcterms:URI">http://www.oup.com</dc:publisher>
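
A funding body that did not otherwise take part in producing the resource, recorded as a Contributor with the sponsor role rather than as a Publisher (a sketch; the organization name is a placeholder):

<dc:contributor xsi:type="olac:role" olac:code="sponsor">Example Research Foundation</dc:contributor>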

Relation

Definition: A related resource.
Refinements
dcterms:conformsTo  

An established standard to which the described resource conforms.

dcterms:hasFormat  

A related resource that is substantially the same as the pre-existing described resource, but in another format.

dcterms:hasPart  

A related resource that is included either physically or logically in the described resource.

dcterms:hasVersion  

A related resource that is a version, edition, or adaptation of the described resource.

dcterms:isFormatOf  

A related resource that is substantially the same as the described resource, but in another format.

dcterms:isPartOf  

A related resource in which the described resource is physically or logically included.

dcterms:isReferencedBy  

A related resource that references, cites, or otherwise points to the described resource.

dcterms:isReplacedBy  

A related resource that supplants, displaces, or supersedes the described resource.

dcterms:isRequiredBy  

A related resource that requires the described resource to support its function, delivery, or coherence.

dcterms:isVersionOf  

A related resource of which the described resource is a version, edition, or adaptation. Changes in version imply substantive changes in content rather than differences in format.

dcterms:references  

A related resource that is referenced, cited, or otherwise pointed to by the described resource.

dcterms:replaces  

A related resource that is supplanted, displaced, or superseded by the described resource.

dcterms:requires  

A related resource that is required by the described resource to support its function, delivery, or coherence.

Schemes
dcterms:URI  

A Uniform Resource Identifier [URI].

Best practice 

When the related resource is also held in a participating archive, recommended best practice is to identify the related resource by means of its OAI identifier. A Relation that begins with “oai:” will typically be presented by service providers as an active link that retrieves the metadata for that resource.

If the related resource is not cataloged in the system of a participating archive, recommended best practice is to identify the related resource through a standard unique identifier. If the related resource is available online, specify the dcterms:URI scheme and give a stable Uniform Resource Locator (URL).

Usage notes 

This element is used to document relationships between resources, for instance, part-whole relationships, version relationships, dependency relationships, and so on.

When the present resource is a derived work, the source resource is typically indicated by Source. However, there are two cases in which refinements of Relation are used for derived works. The isVersionOf refinement is used when the present resource is a new version or edition with added intellectual content that has been developed by the same creators. When the new version is nothing more than a rendition in a new format, then the isFormatOf refinement is used.

Metadata that is migrated from the records of a collector or depositor or an earlier repository system might make use of an identifier assigned at that earlier stage in specifying relationship information. Relationship metadata that uses such internal identifiers will be more useful if the identifiers are updated to OAI identifiers once the metadata for the related resource has also been migrated or uploaded to the OAI-compliant system.

Examples

A link to a required font:

<dcterms:requires>oai:sil:software/ipafont</dcterms:requires>

Links to the component pieces of a collected work:

<dcterms:hasPart>oai:somearchive:holding126</dcterms:hasPart>
<dcterms:hasPart>oai:somearchive:holding127</dcterms:hasPart>
<dcterms:hasPart>oai:somearchive:holding128</dcterms:hasPart>
<dcterms:hasPart>oai:somearchive:holding129</dcterms:hasPart>
<dcterms:hasPart>oai:somearchive:holding130</dcterms:hasPart>

An XML document that conforms to the TEI Lite DTD:

<dc:format xsi:type="dcterms:IMT">text/xml</dc:format>
<dcterms:conformsTo xsi:type="dcterms:URI">
  http://www.tei-c.org/Guidelines/Customization/Lite/DTD/teixlite.dtd
  </dcterms:conformsTo>
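
A sketch of the two refinements used for derived works, following the usage notes above (the identifiers are hypothetical). A PDF rendition of a resource held in a participating archive:

<dc:format xsi:type="dcterms:IMT">application/pdf</dc:format>
<dcterms:isFormatOf>oai:somearchive:holding126</dcterms:isFormatOf>

A revised edition prepared by the same creators:

<dcterms:isVersionOf>oai:somearchive:holding126</dcterms:isVersionOf>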

Rights

Definition: Information about rights held in and over the resource.
Refinements
dcterms:accessRights  

Information about who can access the resource or an indication of its security status.

dcterms:license  

A legal document giving official permission to do something with the resource.

Schemes

There are no encoding schemes for this element.

Usage notes 

Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.

A rights management statement that contains a license or directs a potential user to a license statement that specifies conditions and granted permissions should use the dcterms:license refinement. The dcterms:accessRights refinement should be used to document conditions under which access to the resource is granted. In either case, a statement may be included directly within the metadata, or a URI reference to the statement or a service may be given. Examples of license statements can be found at Creative Commons [CC].

Use Rights Holder (rather than Rights) to specify a person or corporate body (organization, legally constituted entity representing a group or community) that holds the rights to the resource.

Copyright is inherent in primary linguistic and cultural research materials, such as stories, descriptions, songs or art work, which researchers collect in the course of their work. The rights, in most cases, belong to the creator (not the researcher or sponsor) or, in some cases, to the language community (particularly if it is constituted as a legal entity). Intellectual property law as it applies to personal or community cultural property is beyond the scope of these notes. However, documenting the creators of such works, the conditions under which they came into the hands of the researcher, and any known conditions on use (generally non-commercial use) are critical aspects of archiving. Such works should be published or otherwise distributed only with the approval of the owner. Only with information about the origins and conditions of collecting can the holding repository or interested researchers make use of such resources in ways that respect the rights and sensibilities of those whose languages and communities are documented.

A fundamental right of a creator is the right to be associated with his or her creation (whether verbal, textual, visual, etc.), regardless of the retention or subsequent assignment of additional rights (to display, distribute copies, modify). Upholding this right must be balanced with a person's right to privacy with regard to identifying information (including the use of photographs). Demographic data about a creator, contributor or research participant, along with identifying information, is often included in research data. Identifying information may be removed prior to deposit, or as a condition of release, or a Rights element may document a statement of permission to use a resource or parts of a resource in a particular way that may otherwise have been an infringement of privacy.

Examples

A description of the terms of use:

<dc:rights>The Surrey Syncretisms Database can be accessed without 
   cost. Users agree not to pass on the Database to third parties and  
   to properly acknowledge the Surrey Syncretism Database as the source 
   of information in publications or manuscripts that make use of its 
   data.</dc:rights>

Instructions on how to get access:

<dcterms:accessRights>Apply in writing for permission to use this 
   resource.</dcterms:accessRights>

A resource made available under the Creative Commons Attribution-ShareAlike license:

<dcterms:license>http://creativecommons.org/licenses/by-sa/3.0/
   </dcterms:license>

Source

Definition: The resource from which the described resource is derived.
Refinements

There are no refinements for this element.

Schemes
dcterms:URI  

A Uniform Resource Identifier [URI].

Best practice 

Recommended best practice is as for the Relation element above.

Usage notes 

The present resource may be derived from the Source resource in whole or in part. [DCMT]

In the legal parlance of intellectual property rights, a “derivative work” is one that is based on one or more preexisting works. This includes cases in which a work may be recast, transformed, or adapted in any way, such as by translation, abridgement, dramatization, recording, transcription, or digital encoding. A work consisting of editorial revisions, annotations, elaborations, or other modifications, which, as a whole, represent an original work of authorship is also a derivative work [Copyright].

This legal definition elucidates the sense of “derived” that is intended in the Dublin Core definition of Source. Note, however, that this overlaps with Relation, since the isFormatOf and isVersionOf refinements both refer to derivative works in the technical sense.

Use Source (as opposed to one of the above two Relation refinements) in cases when the present resource has a substantially different Type or Creator or Title than the Source resource. When the difference from the original work is only a difference of Date or Format or Publisher or Identifier or edition, then use Relation (or the appropriate refinement) to encode the relationship to the original work.

Examples

Source for a digital encoding of a manuscript in a participating archive:

<dc:source>oai:somearchive:holding1023</dc:source>

Source for data extracted from a published source:

<dc:source>Kwara'ae flora vocabulary extracted from Guide to the Forests 
   of the British Solomon Islands, by T. C. Whitmore. Oxford University 
   Press, 1966.</dc:source>
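
A sketch of the choice between Source and Relation described in the usage notes (the identifiers are hypothetical). A written transcription of a sound recording differs in Type from the original, so Source is used:

<dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
<dc:source>oai:somearchive:holding2001</dc:source>

A digitized copy of the same recording differs only in Format, so a refinement of Relation is used instead:

<dc:type xsi:type="dcterms:DCMIType">Sound</dc:type>
<dcterms:isFormatOf>oai:somearchive:holding2001</dcterms:isFormatOf>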

Subject

Definition: The topic of the resource.
Refinements

There are no refinements for this element.

Schemes
dcterms:LCSH  

The set of labeled concepts specified by the Library of Congress Subject Headings.

olac:language  

The set of two- and three-letter identifiers from ISO 639 for the identification of individual human languages. See [OLAC-Language] for the definition of the vocabulary and see the Language element for a complete discussion (with examples) of using this scheme.

olac:linguistic-field  

The OLAC vocabulary for describing the content of a resource as relevant to a particular subfield of linguistic science [OLAC-Linguistic-Field].

Best practice 

Recommended best practice is that a metadata record should contain at least one Subject or Coverage (or one of its refinements) or Description element (or one of its refinements) in order to give the prospective user some idea of the content of the resource that goes beyond the informative potential of just a title alone. Using all of these elements is encouraged.

When the subject is a human language, recommended best practice is to use a value from the olac:language scheme with the Subject element to identify an individual language precisely.

When the subject matter falls within the field of linguistics, recommended best practice is to use a value from the olac:linguistic-field scheme to identify the subfield.

Usage notes 

Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. [DCMT]

The usage notes for Language are also relevant to Subject when a human language is a subject of the resource. In particular, those notes discuss how element content may be used in conjunction with the olac:language scheme and how the olac:language scheme is used with Subject for resources of different linguistic types. It is also appropriate to use the olac:language scheme with Subject for works that are not of any particular olac:linguistic-type, but that do treat a language (or a community of its users) as a topic of study.

To describe the spatial or temporal topic of the resource, use the Coverage element.

The Subject element should not be confused with Description. Subject contains subject headings or concise individual topic terms, while Description contains free-text, prose descriptions.

Examples

A Library of Congress subject heading:

<dc:subject xsi:type="dcterms:LCSH">African languages</dc:subject>

A resource about a language for which the controlled vocabulary does not yet provide a code:

<dc:subject xsi:type="olac:language">Wulguru</dc:subject>

A resource on multilingual education in Uganda involving national, regional and local languages:

<dc:subject>Multilingual education</dc:subject>
<dc:subject xsi:type="olac:language" olac:code="swh"/>
<dc:subject xsi:type="olac:language" olac:code="eng"/>
<dc:subject xsi:type="olac:language" olac:code="xog"/>
...
<dcterms:spatial xsi:type="dcterms:ISO3166">UG</dcterms:spatial>
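
A resource on the phonology of the Sikaiana language (an illustrative combination of the two OLAC schemes; the code phonology is assumed to be a term defined in [OLAC-Linguistic-Field]):

<dc:subject xsi:type="olac:linguistic-field" olac:code="phonology"/>
<dc:subject xsi:type="olac:language" olac:code="sky"/>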

See also the examples for Language.

Title

Definition: A name given to the resource.
Refinements
dcterms:alternative  

Any form of the title used as a substitute or alternative to the formal title of the resource. Comment: This qualifier can include an abbreviated or shortened title, a "cover title" that differs from the formal title, or a translated title.

Schemes

There are no encoding schemes for this element.

Best practice 

Recommended best practice is that every record must have an instance of the Title element. When the resource does not have a formal title, the cataloger should supply a descriptive title and enclose it in square brackets.

Recommended best practice is that there be only one instance of the unqualified Title element, namely, for the original title (except in the case of parallel titles on a diglot work). All other titles (e.g., translations) should be specified as the dcterms:alternative refinement.

Usage notes 

Typically, a Title will be a name by which the resource is formally known. [DCMT]

Normally the title of a resource is given to the resource by its creator, be that an author, artist, photographer, or compiler (as the creator of a collection). For a resource that has no formal title given by its creator, the cataloger or archivist should give the work some brief descriptive label as a title. Such a supplied title is typically enclosed in square brackets.

A related use of square brackets is for supplying additional information where, in the opinion of the cataloger, the title itself needs additional clarifying information, or where the formal title contains a typographical or factual error. In such a case, the formal title is transcribed as given, with additional or corrective information supplied in square brackets.

The section above on All elements says that, "Whenever the language of the element content is other than English, recommended best practice is to use the xml:lang attribute to identify the language." This applies to the Title element as well, even when the title is in the same language as that identified in a lone Language element.

A translation of the title is supplied in a dcterms:alternative element. Use the xml:lang attribute to identify the language of the translated title using an identifier from [OLAC-Language].

The only approved use of repeated instances of the Title element is in the case of a polyglot work, where the text is given in two or more languages in some kind of parallel format (whether facing columns or pages, or roughly equal, separate sections) and the title on the title page (or equivalent source of information on/in the resource itself) is given in each of the text languages. This is what librarians refer to as “parallel titles.” In this case, it is permissible to repeat the Title element. It will be expected that there are also multiple Language elements, and that each of the Title elements carries an xml:lang attribute that specifies a language that is also given in one of the Language elements.

Examples

A typical title:

<dc:title>A grammar of the Nggela language</dc:title>

A vernacular title with a translated title:

<dc:title xml:lang="llu">Na tala 'uria na idulaa diana</dc:title> 
<dcterms:alternative xml:lang="eng">The road to good reading</dcterms:alternative>

A work with parallel titles, as typically employed for a diglot work:

<dc:title xml:lang="amc">Xunivaun jau yohipahonni</dc:title>
<dc:title xml:lang="spa">Cuentos de nuestros antepasados</dc:title>
<dc:language xsi:type="olac:language" olac:code="amc"/>
<dc:language xsi:type="olac:language" olac:code="spa"/>

A work (a photograph) without a formal title, for which the cataloger has supplied a title:

<dc:title>[J.P. Harrington with three Tule Indians, making 
   dictaphone records of language and songs of the Cuna,
   in the Smithsonian Institution]</dc:title>

A title of a work to which the cataloger added some information to clarify the nature of the resource (originally cataloged in a system prior to the addition of linguistic-type):

<dc:title>Subanon [language texts]</dc:title>
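
A title transcribed as given, with a correction supplied by the cataloger in square brackets (a sketch; the misspelled title is invented for illustration, and the wording of the bracketed note is only one possible convention):

<dc:title>A grammer [i.e., grammar] of the Nggela language</dc:title>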

Type

Definition: The nature or genre of the resource.
Refinements

There are no refinements for this element.

Schemes
dcterms:DCMIType  

A list of types used to categorize the nature or genre of the content of the resource [DCMI-Type].

olac:linguistic-type  

The nature or genre of the content of the resource from a linguistic standpoint [OLAC-Linguistic-Type]. For a resource that is information in or about a language, use this scheme to identify what kind of information it is from a linguistic standpoint.

olac:discourse-type  

The genre of the content of the resource as representing a particular type of discourse [OLAC-Discourse].

Best practice 

Recommended best practice is that every record should contain at least one Type element that uses a value from the dcterms:DCMIType scheme to identify the nature or genre of the content of the resource.

Recommended best practice is that every record for which it is applicable should contain at least one Type element that uses a value from the olac:linguistic-type scheme to identify its linguistic data type.

Usage notes 

Type includes terms describing general categories, functions, genres, or aggregation levels for content. The [DCMI-Type] vocabulary specifies a controlled vocabulary to be used with Type. These relate to the intended use or mode of perception by a user, e.g., whether it is read (as text), viewed (as image), heard, etc. Thus, a computer image file of a text (such as a scanned document) is type Text, not type Image, because the common use involves reading text. To describe the physical or digital manifestation of the resource, use the Format element. [DCMT]

The DCMI type Collection should be used in conjunction with one or more other applicable type terms to represent the nature of the resource as an aggregate of specific, but closely related resources, resources which could also stand on their own but were grouped together as a unit by their creator or a subsequent compiler. See Granularity of resources below for more discussion.

Whereas the DCMI Type scheme should be relevant to all resources, include a Type element with the [OLAC-Linguistic-Type] scheme only if a resource represents one of the structural types covered in that vocabulary. The element may be repeated if a resource represents more than one linguistic type. So, for example, a primary text accompanied by a lexicon of vocabulary items would be described as both a primary text and a lexicon.

Examples

The resource is a video recording, stored in MPEG format:

<dc:type xsi:type="dcterms:DCMIType">MovingImage</dc:type>
<dc:format xsi:type="dcterms:IMT">video/mpeg</dc:format>

The resource is a video recording of a story teller relating a traditional story, recorded on Super-8 film:

<dc:type xsi:type="dcterms:DCMIType">MovingImage</dc:type>
<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
<dcterms:medium>Super 8mm film</dcterms:medium>
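
A scanned image of a printed text, which is of type Text because it is meant to be read, while Format records the image encoding (a sketch; the TIFF format is assumed for illustration):

<dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
<dc:format xsi:type="dcterms:IMT">image/tiff</dc:format>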

The resource is a collection of written texts:

<dc:title>Historia araona: Narrada por Bani Huali</dc:title>
<dc:type xsi:type="dcterms:DCMIType">Collection</dc:type>
<dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>

A database resource (offering multiple user formats) that also integrates textual reports and embedded photographs:

<dc:title>Agta Demographic Database: chronicle of a hunter-gatherer 
   community in transition</dc:title>
<dc:type xsi:type="dcterms:DCMIType">Dataset</dc:type>
<dc:format xsi:type="dcterms:IMT">application/msaccess</dc:format>
<dc:format xsi:type="dcterms:IMT">text/xml</dc:format>
<dc:format xsi:type="dcterms:IMT">text/csv</dc:format>
<dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
<dc:type xsi:type="dcterms:DCMIType">StillImage</dc:type>
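
A primary text accompanied by a lexicon of its vocabulary, with the discourse type of the text also identified (a sketch; the code narrative is assumed to be a term defined in [OLAC-Discourse]):

<dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
<dc:type xsi:type="olac:linguistic-type" olac:code="lexicon"/>
<dc:type xsi:type="olac:discourse-type" olac:code="narrative"/>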

4. Other elements

The following subsections describe other elements that have been defined by the Dublin Core Metadata Initiative [DCMT]. They were not specified in the original core set of Dublin Core elements, but have been subsequently defined as elements in their own right (rather than being refinements of other elements). They are defined within the http://purl.org/dc/terms/ namespace for which the prefix dcterms: is used in the examples below.

Provenance

Definition: A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity and interpretation.
Refinements

There are no refinements for this element.

Schemes

There are no encoding schemes for this element.

Usage notes 

Provenance should be distinguished in usage from Description. Provenance metadata should be used to track or convey important information regarding the custody or management of the resource throughout its history, beginning with its creation. Include any information that has a bearing on interpreting content of the resource or upon its artifactual integrity and authenticity. In the case of a collection, provenance has to do with the origin of the collection as a whole and its history as a collection.

Examples

A group of texts and recordings constituting a small collection that has undergone some reorganization by a subsequent handler since original creation:

<dc:contributor xsi:type="olac:role" olac:code="compiler">Gardner, 
   Richard</dc:contributor>
<dc:contributor xsi:type="olac:role" olac:code="annotator">Anderson, Judi 
   Lynn</dc:contributor>
<dcterms:created>1962-1976</dcterms:created>
<dcterms:available>2007</dcterms:available>
<dc:description>Collection of 23 texts, with original recordings of 17 of the 
   texts. Texts were originally transcribed using a variety of orthographic 
   conventions, some with interlinear analytical transcriptions, some also 
   with Spanish free translations.</dc:description>
<dcterms:provenance>Collection of texts recorded by Gardner, transcribed 
   with the assistance of a number of local residents. Body of materials 
   were transferred to custody of SIL Mexico Branch in 1980s, subsequently 
   organized, collated and annotated by J. L. Anderson, a researcher in a 
   closely related Chinantec language. Annotations particularly relate to 
   changes in orthographic and tone marking conventions over the years of 
   collection and as compared to current practices.</dcterms:provenance>

Rights Holder

Definition: A person or organization owning or managing rights over the resource.
Refinements

There are no refinements for this element.

Schemes

There are no encoding schemes for this element.

Usage notes 

Use the URI or name of the Rights Holder to indicate the entity.

Examples

Name of the copyright holder:

<dcterms:rightsHolder>Copyright Regents of the University of California</dcterms:rightsHolder>

5. Granularity of resources

Determining the right level for units to be described as language resources in the OLAC context involves multiple factors. The level of unit appropriate for inclusion in an aggregated catalog like OLAC's may be different (typically higher) than the level desirable for the catalog of a specific institution's holdings, which in turn is typically higher than the level desirable for describing the detailed contents of a resource. Section 6 of the [OLAC-Repositories] standard establishes the following basic guideline regarding the granularity of the records in an OLAC repository:

A metadata repository should treat resources with a single provenance as constituting a single unit with respect to OLAC metadata and should, therefore, describe them within a single record.

The following discussion is aimed at assisting an OLAC participant to find the right level of description.

For a resource that has been published in some form, the appropriate unit of description for the OLAC record is the unit of the publication itself. A collective work (e.g., a festschrift) may warrant separate records for the separate papers contained within it (which should be related to the record for the work as a whole through the isPartOf and hasPart relationships; see Relation). In general, the OLAC record parallels a citable source. Thus, for published works, granularity does not really pose a problem. However, many archival resources have not been formally published. Unpublished papers that present findings of research closely parallel typical publications and can be treated in a comparable way as units for archival description.
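
As a sketch, the record for one paper in a collective work and the record for the work as a whole could point to each other as follows (the identifiers are hypothetical). In the record for the paper:

<dcterms:isPartOf>oai:somearchive:festschrift42</dcterms:isPartOf>

In the record for the collective work:

<dcterms:hasPart>oai:somearchive:festschrift42-paper03</dcterms:hasPart>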

Granularity poses the greatest problem with primary source materials (e.g., recordings, transcriptions, annotations, notes, data sets). The typical practice of archivists is to gather such materials into collections, which in turn become the primary units of archival description (i.e., the result is resources of DCMI type collection; see Type). The collection, for example, of a single field trip may contain a large number of distinct components—the separate pieces of documentation that comprise the records of a number of distinct linguistic events (each with recording, transcription, translation, annotation, etc.). However, these components are not the units for description at the OLAC level. It is the unit of the collection that forms the basic unit for OLAC description.

The foremost factor in determining what materials belong together as a single collection is their Provenance. In the typical case, the collected resources have a common provenance, that is, they have a common origin and history. Common origin includes who was responsible for collecting the materials, as well as when and where they were collected. It could be a single researcher or a research team, either in the context of a single trip or a series of related trips. It could also be a project that draws together materials from disparate sources for a single new research purpose, thus creating a new collection based on a secondary use of the materials. Common history is also relevant; the fact that the set of resources has been moved or changed hands or processed as a whole since it was originally collected helps to establish its identity as a single unit for archival description.

In other cases, materials may have been placed together at some point in time well after their creation, possibly by the archive itself. If this has occurred, there should be some coherent organizing principle or intermediate stage of development that is still relevant for the proper understanding and use of the resources. Materials without a common provenance generally should not be made to constitute a collection and should not be described collectively in an OLAC description.

Situations of shared provenance that constitute a collection for arrangement and description purposes will be distinguished by the high degree of commonality of metadata elements, e.g., same researcher(s), author(s), subject language(s), content language(s), approximate dates, coverage, linguistic type. Other metadata elements that may also be important for resource discovery but that may differ for items in the collection (e.g., format or discourse type) can be repeated at the OLAC description level for as many values as are significantly present in the collection. Alternatively, the collection may be divided into sub-collections by discourse genre, speaker, or other significant feature. The sub-collections can then be treated in distinct OLAC records (and related to the whole through the isPartOf and hasPart relationships; see Relation).
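The first option above, repeating an element for each value significantly present, can be sketched as a single collection-level record. This is only an illustration under stated assumptions: the title is hypothetical, the OLAC 1.1 namespace is assumed, and the discourse type codes and media types are example values, not a recommendation for any particular collection.

```xml
<!-- One OLAC record describing a whole collection (hypothetical content) -->
<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Field materials from a single field trip</dc:title>
  <!-- The unit of description is the collection itself -->
  <dc:type xsi:type="dcterms:DCMIType">Collection</dc:type>
  <!-- Discourse type differs across items, so the element is
       repeated once per type significantly present -->
  <dc:type xsi:type="olac:discourse-type" olac:code="narrative"/>
  <dc:type xsi:type="olac:discourse-type" olac:code="singing"/>
  <!-- Format also differs across items and is repeated likewise -->
  <dc:format xsi:type="dcterms:IMT">audio/mpeg</dc:format>
  <dc:format xsi:type="dcterms:IMT">text/xml</dc:format>
</olac:olac>
```

Elements shared by all items, such as the researcher or subject language, would appear once in the same record.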

A collection is generally described in greater detail through the use of a "finding aid" that gives the details of organization and highlights features of interest. Further characteristics of individual items within a collection (such as topic, additional contributors, event specifics, extent, format) are documented at this finer level of granularity in the finding aid. The finding aid should itself be included as a part of the collection, preferably as a structured document or metadata set. The Object Reuse and Exchange standard currently under development by the Open Archives Initiative [OAI-ORE] offers a means of handling such descriptions as a different kind of harvestable metadata in the OAI-PMH context. The session-level detail of [IMDI] description typically aligns with the finer level of description in OAI-ORE rather than the level of Dublin Core description in OAI-PMH.


References

[AACR2r] Anglo-American Cataloguing Rules, 2nd ed., 2002 revision. Ottawa and Chicago: Canadian Library Association and American Library Association, 2002-.
[CC] Creative Commons Licenses.
<http://creativecommons.org/licenses/>
[DC-Lib] DC-Library Application Profile.
<http://dublincore.org/documents/library-application-profile/index.shtml#creator>
[DCMI-Box] DCMI Box Encoding Scheme: specification of the spatial limits of a place, and methods for encoding this in a text string.
<http://dublincore.org/documents/dcmi-box/>
[DCMI-Period] DCMI Period Encoding Scheme: specification of the limits of a time interval, and methods for encoding this in a text string.
<http://dublincore.org/documents/dcmi-period/>
[DCMI-Point] DCMI Point Encoding Scheme: a point location in space, and methods for encoding this in a text string.
<http://dublincore.org/documents/dcmi-point/>
[DCMI-Type] DCMI Type Vocabulary.
<http://dublincore.org/documents/dcmi-type-vocabulary/>
[DCMT] Dublin Core Metadata Terms.
<http://dublincore.org/documents/dcmi-terms/>
[IMDI] IMDI [ISLE Metadata Initiative] Metadata Elements for Session Descriptions.
<http://www.mpi.nl/ISLE/documents/draft/ISLE_MetaData_2.5.pdf>
[IMT] MIME Media Types.
<http://www.iana.org/assignments/media-types/>
[ISO3166] ISO 3166: Codes for the Representation of Names of Countries.
<http://www.iso.org/iso/country_codes/iso_3166_code_lists.htm>
[OAI-ORE] Open Archives Initiative Object Reuse and Exchange.
<http://www.openarchives.org/ore/>
[OLAC-BP] Best Practice Recommendations for Language Resource Description.
<http://www.language-archives.org/REC/bpr.html>
[OLAC-Discourse] OLAC Discourse Type Vocabulary.
<http://www.language-archives.org/REC/discourse.html>
[OLAC-Extensions] Recommended Metadata Extensions.
<http://www.language-archives.org/REC/olac-extensions.html>
[OLAC-Language] OLAC Language Extension.
<http://www.language-archives.org/REC/language.html>
[OLAC-Linguistic-Field] OLAC Linguistic Subject Vocabulary.
<http://www.language-archives.org/REC/field.html>
[OLAC-Linguistic-Type] OLAC Linguistic Data Type Vocabulary.
<http://www.language-archives.org/REC/type.html>
[OLAC-Metadata] OLAC Metadata.
<http://www.language-archives.org/OLAC/metadata.html>
[OLAC-Repositories] OLAC Repositories.
<http://www.language-archives.org/OLAC/repositories.html>
[OLAC-Role] OLAC Role Vocabulary.
<http://www.language-archives.org/REC/role.html>
[TGN] Getty Thesaurus of Geographic Names.
<http://www.getty.edu/research/tools/vocabulary/tgn/index.html>
[URI] Uniform Resource Identifiers (URI): Generic Syntax.
<http://www.ietf.org/rfc/rfc2396.txt>
[W3CDTF] Date and Time Formats, W3C Note.
<http://www.w3.org/TR/NOTE-datetime>
[XML-Lang] Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation 16 August 2006. Section 2.12, Language Identification.
<http://www.w3.org/TR/REC-xml#sec-lang-tag>