OLAC Linguistic Data Type Vocabulary

Date issued: 2002-06-12
Status of document: Withdrawn Recommendation.
This version: http://www.language-archives.org/REC/type-20020612.html
Latest version: http://www.language-archives.org/REC/type.html
Previous version: http://www.language-archives.org/REC/type-20010320.html
Abstract:

This document specifies the controlled vocabulary of language resource types used by OLAC. The linguistic data type vocabulary describes the nature of the content of a resource from a linguistic standpoint.

Editors: Heidi Johnson (mailto:ailla@ailla.org)
Helen Aristar Dry (mailto:hdry@linguistlist.org)
Changes since previous version:

Adds: transcription/translation, transcription/phonological, transcription/semantic, transcription/eye-gaze, transcription/facial-expression, annotation/translation, annotation/phonological, annotation/semantic, annotation/eye-gaze, annotation/facial-expression, description/phonetic, description/prosodic, description/gestural, description/morphological, description/syntactic, description/part-of-speech, description/semantic, description/discourse, description/musical, description/eye-gaze, description/facial-expression, dataset/orthographic, dataset/phonetic, dataset/prosodic, dataset/gestural, dataset/phonological, dataset/morphological, dataset/syntactic, dataset/part-of-speech, dataset/semantic, dataset/discourse, dataset/musical, dataset/eye-gaze, dataset/facial-expression, text/narrative, text/oratory, text/dialogue, text/singing, text/drama, text/formulaic, text/procedural, text/report, text/ludic, text/unintelligible speech.

Deletes: description/grammatical, description/paradigms, description/pedagogical, description/dialectal, description/comparative, Genre Type section.

Copyright © Heidi Johnson (University of Texas at Austin), Helen Aristar Dry (Eastern Michigan University). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.

Table of contents

  1. Introduction
  2. Linguistic Data Type
References

1. Introduction

Key points: the vocabulary is a two-level system; a single resource may be assigned multiple categories; and the transcription and annotation subcategories are parallel.

2. Linguistic Data Type

Each term of the controlled vocabulary is described in one of the following subsections. The heading gives the encoded value of the term, which is used as the value of the code attribute of the Type.linguistic metadata element [OLAC-MS]. Under the heading, the term is described in up to four ways: Name gives a descriptive label for the term; Definition is a one-line summary of what the term means; Comments offers further detail on what the term represents; and Examples, where given, illustrate how the term is meant to be applied.

A further label, Subterms, appears when the term permits more specific refinements. In such cases, either the generic (top-level) term or one of its more specific refinements may be chosen.
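Because every code value is either a top-level term or a term/subterm pair, values can be checked mechanically. The following is a minimal Python sketch, not part of this recommendation: the table VOCABULARY and the function is_valid_code are hypothetical names, and the table holds only a small subset of the terms defined below.

```python
# Hypothetical subset of the vocabulary; a complete table would list every term below.
VOCABULARY: dict[str, set[str]] = {
    "transcription": {"orthographic", "phonetic", "prosodic", "morphological"},
    "annotation": {"orthographic", "phonetic", "morphological"},
    "lexicon": {"dictionary", "wordlist", "wordnet"},
    "text": {"narrative", "dialogue", "oratory"},
}


def is_valid_code(code: str) -> bool:
    """Return True if code is a top-level term or a known term/subterm refinement."""
    term, sep, subterm = code.partition("/")
    if term not in VOCABULARY:
        return False
    if not sep:
        # The generic (top-level) term was chosen, e.g. "transcription".
        return True
    # A refinement was chosen, e.g. "transcription/phonetic"; only two levels exist,
    # so "transcription/phonetic/narrow" fails because "phonetic/narrow" is not a subterm.
    return subterm in VOCABULARY[term]


# A single resource may carry several codes; each is validated independently.
codes = ["transcription/phonetic", "annotation/morphological", "lexicon"]
print(all(is_valid_code(c) for c in codes))  # True
```

Splitting with str.partition keeps the check to two levels without any regular expressions, mirroring the generic-term-or-refinement rule stated above.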

transcription

Name: Transcription
Definition: The resource includes a transcription of a linguistic event.
Subterms

transcription/translation

Name
Definition

transcription/orthographic

Name: Orthographic transcription
Definition: The resource includes orthographic transcription.

transcription/phonetic

Name: Phonetic transcription
Definition: The resource includes phonetic transcription.
Comments

Phonetic transcription may be narrow or broad, and will typically use the International Phonetic Alphabet [IPA] in a standard encoding (e.g. [Unicode-IPA], [SAMPA]). Phonological transcriptions are also classified here.

transcription/prosodic

Name: Prosodic transcription
Definition: The resource includes prosodic transcription.
Comments

A prosodic transcription is a symbolic record of intonation, stress, tone or other suprasegmental features that is expressed independently of regular phonetic transcription.

transcription/gestural

Name: Gestural transcription
Definition: The resource includes gestural transcription.

transcription/phonological

Name
Definition

transcription/morphological

Name: Morphological transcription
Definition: The resource includes morphological transcription.

transcription/syntactic

Name: Syntactic transcription
Definition: The resource includes syntactic transcription.

transcription/part-of-speech

Name: Part-of-speech transcription
Definition: The resource includes part-of-speech tags.

transcription/semantic

Name: Semantic transcription
Definition

transcription/discourse

Name: Discourse transcription
Definition: The resource includes discourse transcription.

transcription/musical

Name: Musical transcription
Definition: The resource includes musical transcription.

transcription/eye-gaze

Name
Definition

transcription/facial-expression

Name
Definition

annotation

Name: Annotation
Definition: The resource includes information that annotates some other linguistic record.
Comments

A linguistic annotation is defined as structured linguistic information that is explicitly aligned to some spatial and/or temporal extent of some other linguistic record.

Subterms

annotation/translation

Name
Definition

annotation/orthographic

Name: Orthographic annotation
Definition: The resource includes orthographic annotation.

annotation/phonetic

Name: Phonetic annotation
Definition: The resource includes phonetic annotation.
Comments

An example of a phonetic annotation is the TIMIT database, in which each element of phonetic transcription is associated with a range of samples in a digital audio file [TIMIT]. Phonological annotations are also classified here.
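To make the idea of alignment to a range of samples concrete, the Python sketch below reads annotation lines in a simple "start end label" layout, where start and end are sample offsets into an audio file, and converts each range to seconds. The line layout and the 16 kHz sampling rate are assumptions modeled on TIMIT-style files and should be checked against the corpus documentation [TIMIT]; the sample lines are invented for illustration and are not taken from the corpus.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 16_000  # assumed sampling rate; verify against the corpus documentation


@dataclass
class Segment:
    start: int  # first sample of the segment
    end: int    # sample offset at which the segment ends
    label: str  # phonetic symbol aligned to this sample range

    def seconds(self) -> tuple[float, float]:
        """Convert the sample range to a (start, end) time span in seconds."""
        return self.start / SAMPLE_RATE_HZ, self.end / SAMPLE_RATE_HZ


def parse_segments(text: str) -> list[Segment]:
    """Parse lines of the form 'start end label' into aligned segments."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end, label = line.split()
        segments.append(Segment(int(start), int(end), label))
    return segments


# Invented example lines, for illustration only.
sample = """0 3200 sh
3200 4800 iy
4800 8000 p"""

for seg in parse_segments(sample):
    print(seg.label, seg.seconds())
# sh (0.0, 0.2)
# iy (0.2, 0.3)
# p (0.3, 0.5)
```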

annotation/prosodic

Name: Prosodic annotation
Definition: The resource includes prosodic annotation.

annotation/gestural

Name: Gestural annotation
Definition: The resource includes gestural annotation.

annotation/phonological

Name
Definition

annotation/morphological

Name: Morphological annotation
Definition: The resource includes morphological annotation.
Comments

A morphological annotation is a morphological transcription where the component morphemes are aligned with some other linguistic record, such as an orthographic transcription or a speech signal. An example of morphological annotation is interlinear text with aligned morphemic glosses.
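As a rough illustration of morphemes aligned with glosses in interlinear text, the Python sketch below pairs each morpheme of each word with its gloss and rejects records whose tiers do not line up. Using whitespace between words and a hyphen between morphemes follows common interlinear practice, but both separators are assumptions here, and the English example is hypothetical.

```python
def align_word(segmented: str, glossed: str, sep: str = "-") -> list[tuple[str, str]]:
    """Pair each morpheme of one word with its gloss; the tiers must match in length."""
    morphemes = segmented.split(sep)
    glosses = glossed.split(sep)
    if len(morphemes) != len(glosses):
        raise ValueError(f"{segmented!r}: {len(morphemes)} morphemes, {len(glosses)} glosses")
    return list(zip(morphemes, glosses))


def align_line(words: str, glosses: str) -> list[list[tuple[str, str]]]:
    """Align a whitespace-separated morpheme tier with its gloss tier, word by word."""
    word_list, gloss_list = words.split(), glosses.split()
    if len(word_list) != len(gloss_list):
        raise ValueError("word and gloss tiers have different lengths")
    return [align_word(w, g) for w, g in zip(word_list, gloss_list)]


# Hypothetical English example; PL = plural, PST = past tense.
print(align_line("the dog-s walk-ed", "the dog-PL walk-PST"))
# [[('the', 'the')], [('dog', 'dog'), ('s', 'PL')], [('walk', 'walk'), ('ed', 'PST')]]
```

Raising an error on a length mismatch reflects the point of the definition above: an annotation is only useful if each element is explicitly aligned with a unit of the other record.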

annotation/syntactic

Name: Syntactic annotation
Definition: The resource includes aligned syntactic transcription.

annotation/part-of-speech

Name: Part-of-speech annotation
Definition: The resource includes aligned part-of-speech tags.

annotation/semantic

Name
Definition

annotation/discourse

Name: Discourse annotation
Definition: The resource includes aligned discourse transcription.

annotation/musical

Name: Musical annotation
Definition: The resource includes musical annotation.

annotation/eye-gaze

Name
Definition

annotation/facial-expression

Name
Definition

dataset

Name: Dataset
Definition: The resource is a structured set of data items.
Comments

A dataset is a collection of items organized in a structured format for some specific research purpose. Examples of datasets are: a database of sentences illustrating deictic terms; an inflectional affix paradigm; a list of utterance tokens in a uniform context (e.g. "Say [pat] now.").

The subterms parallel those for transcription and annotation.

Subterms

dataset/orthographic

Name: Orthographic dataset
Definition: The dataset includes orthographic data.

dataset/phonetic

Name: Phonetic dataset
Definition: The dataset includes phonetic data.

dataset/prosodic

Name: Prosodic dataset
Definition: The dataset includes prosodic data.

dataset/gestural

Name: Gestural dataset
Definition: The dataset includes gestural data.

dataset/phonological

Name
Definition

dataset/morphological

Name: Morphological dataset
Definition: The dataset includes morphological data.
Comments

A morphological dataset is a collection of morphological data organized in a structured format for some specific research purpose. An example of a morphological dataset is an inflectional affix paradigm.

dataset/syntactic

Name: Syntactic dataset
Definition: The dataset includes syntactic data.

dataset/part-of-speech

Name: Part-of-speech dataset
Definition: The dataset includes part-of-speech data.

dataset/semantic

Name: Semantic dataset
Definition: The dataset includes semantic data.

dataset/discourse

Name: Discourse dataset
Definition: The dataset includes discourse data.

dataset/musical

Name: Musical dataset
Definition: The dataset includes musical data.

dataset/eye-gaze

Name: Eye Gaze dataset
Definition: The dataset includes eye-gaze data.

dataset/facial-expression

Name: Facial Expression dataset
Definition: The dataset includes facial-expression data.

description

Name: Description
Definition: The resource includes linguistic description.
Comments

A description is any description or analysis of a language. Unlike a transcription or an annotation, the structure of a description is independent of the structure of the linguistic events that it describes.

Subterms

description/grammatical

Name: Grammatical description
Definition: The resource includes grammatical description.

description/phonological

Name: Phonological description
Definition: The resource includes phonological description.

description/orthographic

Name: Orthographic description
Definition: The resource includes documentation of a writing system.

description/paradigms

Name: Linguistic paradigms
Definition: The resource includes linguistic paradigms.
Comments

A paradigm is a tabulation of linguistic forms designed to illustrate one or more systematic contrasts.

description/pedagogical

Name: Pedagogical description
Definition: The resource includes pedagogical description.
Comments

A pedagogical description is a style of presentation intended for use in teaching people to use the language.

description/dialectal

Name: Dialectal description
Definition: The resource includes dialectal description.

description/comparative

Name: Comparative description
Definition: The resource includes comparative or typological description.

lexicon

Name: Lexicon
Definition: The resource includes a systematic listing of lexical items.
Subterms

lexicon/dictionary

Name: Dictionary
Definition: The resource includes a dictionary.
Comments

This includes any resource that lists words or morphemes and defines them. It contrasts with a word list in that the definitions are complex (rather than being one-word equivalents) and the entries may include other information like part of speech, related words, and illustrative sentences.

lexicon/wordlist

Name: Word list
Definition: The resource includes a word list.
Comments

A word list is a list of reference words in a major language for which the nearest equivalent word in a target language has been elicited (for instance, the Swadesh 100-word list).

lexicon/wordnet

Name: WordNet
Definition: The resource includes a semantic wordnet.
Comments

Whereas a dictionary documents the meanings of words by means of definitions, a wordnet documents meanings by building a web of semantic relationships [WordNet].
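The contrast between definitions and a web of relationships can be sketched as a small directed graph. The Python below is a toy illustration only: the words, the relation names "hypernym" and "meronym", and the helper functions are hypothetical choices for the example, and a real wordnet such as [WordNet] links synsets (sets of synonyms) rather than bare words.

```python
from collections import defaultdict

# Toy semantic network: each word maps to a list of (relation, target) edges.
# Words and relation names are hypothetical; real WordNet links synsets, not bare words.
relations: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)


def relate(source: str, relation: str, target: str) -> None:
    """Record a directed semantic relation from source to target."""
    relations[source].append((relation, target))


def hypernym_chain(word: str) -> list[str]:
    """Follow 'hypernym' links upward from word until none remain."""
    chain = [word]
    while True:
        parents = [t for r, t in relations.get(chain[-1], []) if r == "hypernym"]
        if not parents or parents[0] in chain:
            return chain  # top reached (a cycle also stops the walk in this toy)
        chain.append(parents[0])  # this toy follows only the first hypernym


relate("poodle", "hypernym", "dog")
relate("dog", "hypernym", "animal")
relate("dog", "meronym", "tail")  # a part-whole relation, ignored by the chain

print(hypernym_chain("poodle"))  # ['poodle', 'dog', 'animal']
```

Here no entry carries a definition; what "poodle" means is approximated by where it sits among its relations, which is the distinction the paragraph above draws.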

lexicon/thesaurus

Name: Thesaurus
Definition: The resource includes a thesaurus.
Comments

A thesaurus is a list of words or concepts arranged according to sense.

lexicon/terminology

Name: Terminology
Definition: The resource includes a terminological lexicon.
Comments

A terminological lexicon is a glossary of domain-specific terms. Examples include technical terminology, kinship terms, color terms, and acronyms.

lexicon/proper-names

Name: Name Dictionary
Definition: The resource includes proper names.

lexicon/bilingual

Name: Bilingual Lexicon
Definition: The resource includes definitions in another language.

lexicon/etymological

Name: Etymological Lexicon
Definition: The lexicon contains etymological information.

lexicon/phonetic

Name: Phonetic Lexicon
Definition: The lexicon contains phonetic information, including pronunciation, phonology, stress, and rhymes.

lexicon/frequency

Name: Frequency Lexicon
Definition: The lexicon contains frequency information.

lexicon/analytical

Name: Analytical Lexicon
Definition: The lexicon contains analytical information.
Comments

Analytical information includes such things as morphological derivation, grammatically related forms, and argument structure.

text

Name: Text
Definition: The resource is a primary resource, that is, the object of study.
Comments

A text is defined as any primary resource or research material, such as a literary work, film, or recording of natural discourse.

Subterms

text/narrative

Name: Narrative
Definition: A monologic discourse, such as a history, tale, story, or recital.
Comments

Types of narratives include historical, traditional, and personal narratives, myths, folktales, fables, and humorous stories.

text/oratory

Name: Oratory
Definition: "The art of the orator or of public speaking; the art of speaking eloquently according to definite rules, so as to please or persuade; rhetoric" (OED).
Comments

Examples of oratory include sermons, lectures, political speeches, and invocations.

text/dialogue

Name: Dialogue
Definition: An interactive discourse with two or more participants.
Comments

Examples of dialogues include conversations, interviews, correspondence, consultations, greetings and leave-takings.

text/singing

Name: Singing
Definition: The resource consists chiefly of language that is sung, rather than spoken.
Comments

Examples of singing include chants, songs, and choruses.

text/drama

Name: Drama
Definition: A planned, creative rendition of discourse with two or more participants.

text/formulaic

Name: Formulaic
Definition: The resource is a ritually or conventionally structured discourse.
Comments

Examples of formulaic discourse are prayers, curses, blessings, charms, curing rituals, marriage vows, and oaths.

text/procedural

Name: Procedural
Definition: An explanation or description of a method, process, or situation.
Comments

Examples of procedural discourses include recipes, instructions, plans, and descriptions.

text/report

Name: Report
Definition: An objective account of some event or circumstance.
Comments

Examples of reports include news reports, essays, and commentaries.

text/ludic

Name: Ludic
Definition: Ludic discourse is language whose primary function is to be part of play, or a style of speech that involves a creative manipulation of the structures of the language.
Comments

Examples of ludic discourse are play languages, jokes, secret languages, and speech disguises.

text/unintelligible speech

Name: Unintelligible speech
Definition: The resource consists of utterances that are not intended to be interpretable as ordinary language.
Comments

Examples of unintelligible speech include sacred languages, speaking in tongues, and singing syllables (fa-la-la).


To do

Write the introduction. Explain that typical resources will contain multiple types.


References

[OLAC-MS] OLAC Metadata Set.
<http://www.language-archives.org/OLAC/olacms.html>
[SAMPA] Speech Assessment Methods Phonetic Alphabet.
<http://www.phon.ucl.ac.uk/home/sampa/home.htm>
[TIMIT] TIMIT Acoustic-Phonetic Continuous Speech Corpus.
<http://www.ldc.upenn.edu/Catalog/LDC93S1.html>
[Unicode-IPA] Unicode IPA Extensions.
<http://www.unicode.org/unicode/uni2book/ch07.pdf>
[WordNet] WordNet: A Lexical Database for English.
<http://www.cogsci.princeton.edu/~wn/>