OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/26157

Metadata
Title:Software tools for integrated development of the corpus, the lexicon, and community materials
Bibliographic Citation:Nakhimovsky, Alexander; Myers, Tom; 2013-02-28; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/26157.
Contributor (speaker):Nakhimovsky, Alexander
Myers, Tom
Creator:Nakhimovsky, Alexander
Myers, Tom
Date (W3CDTF):2013-02-28
Description:We present an integrated approach to corpus and lexicon development, both for the language archive and for a repository of materials for the local community. We assume that the target audiences of the archive and the repository have different interests in the same underlying body of data, and we seek to construct that body of data in such a way that both sets of interests can be addressed. This involves integration of three pieces of software: ELAN, for work with digital video/audio and its transcription; FLEx, for grammatical analysis and lexicon development; and MannX, a browser-based video player for language learning.
Integrating the corpus and the lexicon means the following functionality:
• Each lexicon entry has links to its tokens in the corpus, which are in turn linked, via time alignment, to the media segments in which the tokens occur.
• Each word in the corpus has a link to its lexicon entry.
To achieve this, the corpus and the lexicon must be integrated throughout their development:
• The lexicon maintains the list of all lexical and grammatical morphemes.
• When a morpheme that is already in the lexicon is encountered in the corpus, interlinear glosses are filled in from the lexicon.
• When a new morpheme is encountered, a new entry is created.
• A change in the lexical entry is automatically propagated through the corpus.
We seek to achieve this kind of integration by building a software "bridge" between ELAN and FLEx that supports the following workflow:
• Starting in ELAN, do transcription and time-alignment at the utterance level ("phrase" in FLEx).
• Export to FLEx for lexical and morphological analysis.
• Export the results back into ELAN as symbolic subdivision or symbolic association tiers.
• Further annotate in ELAN; perhaps time-align at the word level.
As of August 2012, a functioning version of the software has been implemented in JavaScript. It is being reimplemented in Java as a Web application. We expect to put the Java version on the Web for testing in September, and also upgrade the ELAN-MannX conversion. This process results in a corpus of media files, associated annotation files, and a FLEx-created lexicon, with links between them. A subset of these materials, transformed into different formats, will form the basis of a community repository. This will be a Web application, running on a remote or localhost server, that can run on a laptop or on an Android phone.
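The corpus-lexicon integration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software and does not use the ELAN or FLEx file formats; the names `LexEntry`, `Utterance`, and `Bridge` are hypothetical, and keying entries by surface form (which ignores homophones) is a simplifying assumption. It shows the two directions of linking and why a lexicon change can propagate without rewriting the corpus: glosses are looked up in the lexicon whenever an interlinear line is produced.

```python
from dataclasses import dataclass, field


@dataclass
class LexEntry:
    """One lexical or grammatical morpheme (hypothetical structure)."""
    form: str
    gloss: str
    # One (utterance id, word index, morpheme index) triple per corpus token.
    tokens: list = field(default_factory=list)


@dataclass
class Utterance:
    """A time-aligned utterance; each word is a list of morpheme forms."""
    uid: str
    start_ms: int
    end_ms: int
    words: list


class Bridge:
    """Toy in-memory model of a linked corpus and lexicon."""

    def __init__(self):
        self.lexicon = {}     # form -> LexEntry
        self.utterances = {}  # uid -> Utterance

    def add_utterance(self, utt, new_glosses=None):
        """Register an utterance, creating entries for unseen morphemes."""
        new_glosses = new_glosses or {}
        self.utterances[utt.uid] = utt
        for w, word in enumerate(utt.words):
            for m, form in enumerate(word):
                entry = self.lexicon.get(form)
                if entry is None:
                    # New morpheme: create an entry ("???" if no gloss given).
                    entry = LexEntry(form, new_glosses.get(form, "???"))
                    self.lexicon[form] = entry
                # Lexicon -> corpus link.
                entry.tokens.append((utt.uid, w, m))

    def gloss_line(self, uid):
        """Build the interlinear gloss line from the current lexicon."""
        utt = self.utterances[uid]
        return " ".join(
            "-".join(self.lexicon[form].gloss for form in word)
            for word in utt.words
        )

    def media_segments(self, form):
        """Follow token links to the time spans in which a morpheme occurs."""
        return [
            (uid, self.utterances[uid].start_ms, self.utterances[uid].end_ms)
            for uid, _, _ in self.lexicon[form].tokens
        ]
```

With this design, editing `bridge.lexicon["dog"].gloss` changes every gloss line that contains that morpheme the next time `gloss_line` is called, and `media_segments("dog")` returns the time-aligned spans to play back.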
Identifier (URI):http://hdl.handle.net/10125/26157
Language:English
Language (ISO639):eng
Rights:Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Table Of Contents:26157.mp3

OLAC Info

Archive:  Language Documentation and Conservation
Description:  http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:scholarspace.manoa.hawaii.edu:10125/26157
DateStamp:  2017-05-11
GetRecord:  OAI-PMH request for simple DC format
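A GetRecord request of the kind listed above is an HTTP query with three OAI-PMH arguments: `verb`, `identifier`, and `metadataPrefix`. The Python sketch below builds such a URL for this record. The base URL `https://example.org/oai` is a placeholder, since the record does not state the repository endpoint; `oai_dc` is the prefix OAI-PMH defines for simple Dublin Core, while `olac` as the prefix for the OLAC format is an assumption here.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix):
    """Return an OAI-PMH GetRecord URL for one record in one metadata format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint: substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
RECORD_ID = "oai:scholarspace.manoa.hawaii.edu:10125/26157"

dc_url = get_record_url(BASE_URL, RECORD_ID, "oai_dc")
olac_url = get_record_url(BASE_URL, RECORD_ID, "olac")  # assumed prefix
```

`urlencode` percent-encodes the colons and slash in the identifier, which OAI-PMH requires for reserved characters in request arguments.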

Search Info

Citation: Nakhimovsky, Alexander; Myers, Tom. 2013. Language Documentation and Conservation.
Terms: area_Europe country_GB iso639_eng


http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/26157
Up-to-date as of: Mon Mar 11 1:36:05 EDT 2024