OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/25264

Metadata
Title:Integrating descriptive and computational approaches in language documentation and resource development
Bibliographic Citation:Morrison, Michelle, Green, Christopher, Adams, Nikki, Smith Crabb, Erin; 2015-02-26; The benefits of interdisciplinary teams as well as the creation of documentary products of a variety of types and media have been discussed widely in the literature (see, for example, Gippert et al. 2006). However, computational resources, such as morphological parsers or automated part-of-speech taggers, are often not part of the suite of materials produced by a language documentation or description project. Moreover, descriptive and computational resources are frequently created by completely different sets of researchers who may have little to no contact with one another. Under such an approach, descriptive resources are considered to be foundational for the creation of computational resources. Thus it is typically the case that computational resources are built on, but do not inform, descriptive resources. We argue that simultaneous creation of both descriptive and computational resources allows each resource not only to inform but also to significantly enhance the creation of the other. The authors of this paper are currently working on a project that focuses on creation of a number of resources for Somali. Objectives of the project include writing a descriptive reference grammar, creating a morphological parser and part-of-speech tagger, enhancing existing lexical resources, and developing computational aids designed to help electronic dictionary users who are unsure how to spell Somali words. The project uses a multi-faceted approach to data collection, including work with native speaker consultants, collection and transcription of narratives and conversations, creation and tagging of pedagogical corpora, and large-scale corpus mining of internet data.
We describe our methodology, workflow, and some of our research outcomes and illustrate the ways in which simultaneous creation of computational and descriptive resources has significantly improved our products. For example, writing of the descriptive grammar and development of the morphological parser have been done in tandem. At many points along the way, problems encountered in the programming of the parser shed light on shortcomings in both our description and understanding of Somali structures. We are currently in the process of validating the output of the parser against a large internet corpus of Somali data. We are using automatic scripts to identify words which cannot be parsed using our tools (either due to errors in the parser or gaps in the dictionary). The output of this process allows us to refine our grammar and parser, as well as provide enhancements and modernizations to a published Somali dictionary (Zorc & Osman 2002).; Kaipuleohone University of Hawai'i Digital Language Archive;http://hdl.handle.net/10125/25264.
Contributor (speaker):Morrison, Michelle
Green, Christopher
Adams, Nikki
Smith Crabb, Erin
Creator:Morrison, Michelle
Green, Christopher
Adams, Nikki
Smith Crabb, Erin
Date (W3CDTF):2015-03-12
Description:The benefits of interdisciplinary teams as well as the creation of documentary products of a variety of types and media have been discussed widely in the literature (see, for example, Gippert et al. 2006). However, computational resources, such as morphological parsers or automated part-of-speech taggers, are often not part of the suite of materials produced by a language documentation or description project. Moreover, descriptive and computational resources are frequently created by completely different sets of researchers who may have little to no contact with one another. Under such an approach, descriptive resources are considered to be foundational for the creation of computational resources. Thus it is typically the case that computational resources are built on, but do not inform, descriptive resources. We argue that simultaneous creation of both descriptive and computational resources allows each resource not only to inform but also to significantly enhance the creation of the other. The authors of this paper are currently working on a project that focuses on creation of a number of resources for Somali. Objectives of the project include writing a descriptive reference grammar, creating a morphological parser and part-of-speech tagger, enhancing existing lexical resources, and developing computational aids designed to help electronic dictionary users who are unsure how to spell Somali words. The project uses a multi-faceted approach to data collection, including work with native speaker consultants, collection and transcription of narratives and conversations, creation and tagging of pedagogical corpora, and large-scale corpus mining of internet data. We describe our methodology, workflow, and some of our research outcomes and illustrate the ways in which simultaneous creation of computational and descriptive resources has significantly improved our products.
For example, writing of the descriptive grammar and development of the morphological parser have been done in tandem. At many points along the way, problems encountered in the programming of the parser shed light on shortcomings in both our description and understanding of Somali structures. We are currently in the process of validating the output of the parser against a large internet corpus of Somali data. We are using automatic scripts to identify words which cannot be parsed using our tools (either due to errors in the parser or gaps in the dictionary). The output of this process allows us to refine our grammar and parser, as well as provide enhancements and modernizations to a published Somali dictionary (Zorc & Osman 2002).
Identifier (URI):http://hdl.handle.net/10125/25264
Rights:Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Table Of Contents:25264.mp3

OLAC Info

Archive:  Language Documentation and Conservation
Description:  http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:scholarspace.manoa.hawaii.edu:10125/25264
DateStamp:  2017-05-11
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Morrison, Michelle; Green, Christopher; Adams, Nikki; Smith Crabb, Erin. 2015. Language Documentation and Conservation.


http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/25264
Up-to-date as of: Mon Mar 11 1:36:16 EDT 2024