OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/26113

Metadata
Title:Technology in documentation: TEI and the Nxa'amxcín Dictionary
Bibliographic Citation:Czaykowska-Higgins, Ewa; Holmes, Martin; 2013-03-02; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/26113.
Contributor (speaker):Czaykowska-Higgins, Ewa
Holmes, Martin
Creator:Czaykowska-Higgins, Ewa
Holmes, Martin
Date (W3CDTF):2013-03-02
Description:Expanding use of technology in endangered language documentation has increased interest in the development of digital standards for lexical information. Many digital lexica developed by linguists make use of standards like LIFT/GOLD (e.g., SIL's Toolbox, FLEX), or LMF/DCR (e.g., LEXUS-VICOS; Aristar-Dry et al. 2012), but few are reported to use TEI, even though TEI is a Digital Humanities standard with a dictionary module (TEI, ch. 9; Romary and Wegstein 2012). In this paper, we outline a project for Nxa'amxcín (Salish) that uses TEI structure and markup. We argue that TEI is a useful tool for endangered language lexica. The original Nxa'amxcín print-dictionary project, begun using Lexware (Hsu 1985), WordPerfect, and DOS, was exemplary in 1991, but dependence on customized character sets, obsolete printer fonts, macros, and a Hercules graphics card made the data unusable by 2005. A lengthy process retrieved and converted the data to a modern format (Author and Newton 2008). In the absence of a stable non-proprietary standard (ISO 24613 was released only in 2008), and following guidelines for interoperability, portability (Bird and Simon 2003) and use of open formats (see, e.g., Good 2011), TEI seemed an obvious choice in 2005: it is widely used for born-digital documents and provides a wide range of tags for dictionaries, linguistic analysis and corpus linguistics (chs. 15-18). In our paper we show that, as an open, mature standard, TEI is a useful encoding strategy for our entire project, providing a reliable archival format for Nxa'amxcín data. Its infrastructure is more than a set of schemas and encoding guidelines (ch. 23), and it enables users to tightly constrain schemas to consist only of elements and attributes required by a specific project. It provides flexibility to encode morphological relationships, which is invaluable for the complex Salish morphology of Nxa'amxcín. TEI also generates project-specific documentation embedded directly into a RelaxNG schema, which provides inline help for XML encoders; it incorporates peripheral data into the same digital corpus and links across collections easily. The XML data serves as the basis for an online digital dictionary, for print dictionaries, wordlists, the dictionary website structure and supplementary material, and teaching and practice materials. Finally, editing with a well-documented TEI schema is relatively easy, and not dependent on an externally controlled web application for data entry. Because TEI is not widely used for endangered languages, we conclude by comparing TEI, LMF/DCR and LIFT/GOLD as they might apply in a Nxa'amxcín lexicon.
Identifier (URI):http://hdl.handle.net/10125/26113
Language:English
Language (ISO639):eng
Rights:Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Table Of Contents:26113.mp3
26113.pdf

OLAC Info

Archive:  Language Documentation and Conservation
Description:  http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu

OAI Info

OaiIdentifier:  oai:scholarspace.manoa.hawaii.edu:10125/26113
DateStamp:  2017-05-11

Search Info

Citation: Czaykowska-Higgins, Ewa; Holmes, Martin. 2013. Language Documentation and Conservation.
Terms: area_Europe country_GB iso639_eng


http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/26113
Up-to-date as of: Mon Mar 11 1:36:09 EDT 2024