OLAC Linguistic Data Type Vocabulary

Date issued: 2002-12-12
Status of document: Proposed Recommendation. This document is in the midst of open review by the community.
This version: http://www.language-archives.org/REC/type-20021212.html
Latest version: http://www.language-archives.org/REC/type.html
Previous version: http://www.language-archives.org/REC/type-20020628.html

This document specifies the codes, or controlled vocabulary, for the Linguistic Data Type extension of the DCMI Type element. These codes describe the content of a resource from the standpoint of recognized structural types of linguistic information.

Editors: Helen Aristar Dry (mailto:hdry@linguistlist.org), Heidi Johnson (mailto:ailla@ailla.org)
Changes since previous version:

Adds: Language Description, Primary Text

Deletes: Transcription section, Annotation section, Dataset, Text section (changed to Primary Text), Description (changed to Language Description), subtypes of Lexicon.

Copyright © Helen Aristar Dry (Eastern Michigan University), Heidi Johnson (University of Texas at Austin). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. Linguistic Data Type

1. Introduction

Like all elements in the OLAC metadata standard, the Type element is both optional and repeatable. Since it is optional, the Linguistic Type extension of the Type element should only be used if a resource contains a significant amount of primary data and represents one of the structural types described below. Many linguistic papers and analyses thus will not be described using this element. However, the element may be repeated if a resource represents more than one linguistic type. So, for example, a primary text accompanied by a lexicon of vocabulary items would be described as both a primary text and a lexicon.
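The optional and repeatable behaviour described above can be sketched in a few lines of Python. This is only an illustration, not part of the recommendation: the function name `linguistic_types` is hypothetical, and the lowercase, underscore-separated code spellings are assumptions modelled on the `Primary_text` spelling used later in this document.

```python
# Assumed code spellings; only "Primary_text" is spelled out in this document.
LINGUISTIC_TYPES = {"lexicon", "primary_text", "language_description"}


def linguistic_types(codes):
    """Validate the Linguistic Type codes given for one resource.

    Returns the codes in their original order with duplicates removed.
    An empty input is valid, because the element is optional.
    """
    result = []
    for code in codes:
        normalized = code.strip().lower()
        if normalized not in LINGUISTIC_TYPES:
            raise ValueError(f"unknown linguistic type code: {code!r}")
        if normalized not in result:
            result.append(normalized)
    return result


# A primary text accompanied by a lexicon carries two Type values.
text_with_lexicon = linguistic_types(["primary_text", "lexicon"])

# A purely analytical paper carries none.
analysis_paper = linguistic_types([])
```

Here `text_with_lexicon` holds both codes, while `analysis_paper` is an empty list, which mirrors a record that simply omits the element.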

In many cases, adequate description of a linguistic resource will require that the Linguistic Type extension be used in conjunction with the Discourse Type extension of the Type element, or the Linguistic extension of the Subject element. For example, a narrative text might be described as both a primary text (OLAC Linguistic Type) and a narrative (OLAC Discourse Type).
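One way to picture the narrative example is as a list of Type values, each tagged with the vocabulary it comes from. The sketch below is an assumption-laden illustration: the `TypeValue` class, the `by_scheme` helper, and the scheme labels `"linguistic-type"` and `"discourse-type"` are hypothetical names chosen for readability, not identifiers defined by this document.

```python
from typing import NamedTuple


class TypeValue(NamedTuple):
    scheme: str  # hypothetical label for the vocabulary the code belongs to
    code: str    # assumed spelling of the code within that vocabulary


def by_scheme(values):
    """Group a resource's Type values by the vocabulary they come from."""
    grouped = {}
    for value in values:
        grouped.setdefault(value.scheme, []).append(value.code)
    return grouped


# A narrative text: a primary text (Linguistic Type) that is also
# a narrative (Discourse Type).
narrative_text = [
    TypeValue("linguistic-type", "primary_text"),
    TypeValue("discourse-type", "narrative"),
]

grouped = by_scheme(narrative_text)
```

Keeping the scheme alongside each code makes it explicit that the two values answer different questions about the same resource.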

Note that a dataset should be described using the DCMI Type vocabulary, e.g. Dataset, Collection. See: [DC-TYPE].

2. Linguistic Data Type

Each term in the controlled vocabulary is described in one of the following subsections. The heading gives the encoded value for the term that is to be used as the value of the code attribute of the "OLAC-Linguistic-Type" extension of the Type metadata element [OLAC-MS]. Under the heading, the term is described in four ways. Name gives a descriptive label for the term. Definition is a one-line summary of what the term means. Comments offers more details on what the term represents. Examples may also be given to illustrate how the term is meant to be applied.
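The structure of each vocabulary entry (encoded value, name, definition) could be modelled as a small lookup table. The following is a minimal sketch under stated assumptions: the `Term` class and `lookup` function are hypothetical, and the code spellings are assumed from the term names rather than quoted from the headings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    code: str        # encoded value used in the code attribute (assumed spelling)
    name: str        # descriptive label
    definition: str  # one-line summary


TERMS = (
    Term("lexicon", "Lexicon",
         "The resource includes a systematic listing of lexical items."),
    # Definition abbreviated here; see the Primary Text entry for the full text.
    Term("primary_text", "Primary Text",
         "Linguistic material which is itself the object of study."),
    Term("language_description", "Language Description",
         "The resource describes a language or some aspect(s) of a language "
         "via a systematic documentation of linguistic structures."),
)

# Index the terms by their encoded value.
VOCABULARY = {term.code: term for term in TERMS}


def lookup(code):
    """Return the Term for an encoded value, or raise KeyError if unknown."""
    return VOCABULARY[code]
```

For instance, `lookup("lexicon").name` yields the label `Lexicon`, and an unrecognised code raises `KeyError`.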


Name: Lexicon
Definition: The resource includes a systematic listing of lexical items.

Lexicon may be used to describe any resource which includes a systematic listing of lexical items. Each lexical item may, but need not, be accompanied by a definition, a description of the referent (in the case of proper names), or an indication of the item's semantic relationship to other lexical items.


Examples include word lists (including comparative word lists), thesauri, wordnets, framenets, and dictionaries, including specialized dictionaries such as bilingual and multilingual dictionaries, dictionaries of terminology, and dictionaries of proper names. Non-word-based examples include phrasal lexicons and lexicons of intonational tunes.


Name: Primary Text
Definition: Linguistic material which is itself the object of study, typically material in the subject language which is a performance of a speech event, or the written analog of such an event.

Primary Text is used to describe material in the subject language; it reflects a speech event, or is the written analog, and embodies linguistic features which make it an object of analysis or research. Typically, a primary text has temporal structure. Most often in a primary text, the timecourse of some real or fictional linguistic event, e.g., a conversation, elicitation session, or imagined scene, is carried over into the archived artifact.

Note: Primary_text describes the content of the resource, not the physical format. A primary text may be recorded in different physical media, e.g. on videotape, on audiotape, or in writing. Physical format is described using the Format element.


Examples of primary texts include transcribed interlinear texts, letters, (audiotaped) elicitation sessions, (videotaped) rituals or story-telling sessions, and any other recorded speech or writing which exemplifies language structure or use in such a way as to become an object of study. When a corpus is a collection of primary texts, it should be described both as a primary text (OLAC Linguistic Type) and as a collection (DC Type).


Name: Language Description
Definition: The resource describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures.

Not every resource commonly termed 'descriptive' should be described with this code; for example, a description of Zamenhof, the creator of Esperanto, would not be classified using this code. Instead, the code should be reserved for a resource that describes a language or some aspect of a language, typically in prose with interspersed examples.

The similarly named code Language Documentation (in the Linguistic Subject extension) refers to a field of study. This category, Language Description, refers to recognized structural types of linguistic information that are the products of that field of study, e.g., grammars and field notes.


Examples of descriptions include a formal grammar, a sketch grammar, field notes, and a phonological sketch.


[DC-TYPE] Dublin Core Type Vocabulary.
[OLAC-MS] OLAC Metadata Set.