OLAC Record: Why documenting different languages necessitates different data

OLAC Record
oai:scholarspace.manoa.hawaii.edu:10125/26138

Metadata

Title: Why documenting different languages necessitates different data

Bibliographic Citation: McDonnell, Bradley; 2013-02-28; Kaipuleohone University of Hawai'i Digital Language Archive; http://hdl.handle.net/10125/26138.

Contributor (speaker): McDonnell, Bradley

Creator: McDonnell, Bradley

Date (W3CDTF): 2013-02-28

Description: Since its inception as a sub-discipline of linguistics, most theoretical issues in Documentary Linguistics have revolved around data collection. A particularly lively discussion has focused on the balance between constructing a corpus of naturally occurring discourse and collecting isolated examples through direct elicitation (cf. Evans 2008, Himmelmann 2012). While some advocate for a strong focus on constructing corpora of naturally occurring discourse (e.g., Himmelmann 2006), others have raised concerns that over-emphasizing the role of discourse data will result in aspects of the language being neglected (e.g., Rhodes et al. 2006, Chelliah & de Reuse 2011). Using languages of North America, Mithun (2001) demonstrates that an optimal balance between discourse data and elicited data will differ depending on the linguistic domain in question (e.g., lexicon, phonology, grammar). The question remains, however, to what extent this balance is similar for languages with different typological profiles (e.g., isolating, agglutinating, or polysynthetic languages) and in various sociolinguistic situations (e.g., monolingual, bilingual, or multilingual speech communities). Drawing on the documentation of Besemah, a Malay language of southwest Sumatra, this paper presents case studies of (1) the periphrastic passive, (2) the headless relative clause construction, and (3) the syncretic causative/applicative construction, in order to show that expanding the role of naturalistic discourse data increases both the depth and insight of the grammatical analyses of the language. Like most Malay languages of Sumatra, Besemah is considered an ‘underspecified’ language (cf. Gil 2001), as it lacks grammatical marking of nominal categories like person, number, and case as well as verbal categories like tense and agreement.
Additionally, Besemah represents a complex sociolinguistic setting, being polyglossic with two other Malay varieties, Standard Indonesian (the language of government, education, and the media) and Palembang Indonesian (the lingua franca of southern Sumatra). The relative similarity of these Malay varieties, coupled with the sociolinguistic complexity with which they are used, makes it difficult to obtain reliable elicited grammatical examples and/or grammaticality judgments. At the same time, the nature of the ‘underspecified’ grammar requires much less direct elicitation of paradigms, allomorphy, etc. Given these considerations, the aim of this paper is to show that striking the balance between discourse data and elicited data is not one-size-fits-all but depends on a number of factors. An important outcome of this study, therefore, is to illustrate that language documentation projects need to calibrate data collection methodologies to accommodate the individual differences that each language possesses.

Identifier (URI): http://hdl.handle.net/10125/26138

Language: English

Language (ISO639): eng

Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported

Table Of Contents: 26138.mp3

OLAC Info

Archive: Language Documentation and Conservation

Description: http://www.language-archives.org/archive/ldc.scholarspace.manoa.hawaii.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:scholarspace.manoa.hawaii.edu:10125/26138

DateStamp: 2017-05-11

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: McDonnell, Bradley. 2013. Language Documentation and Conservation.
Terms: area_Europe country_GB iso639_eng

http://www.language-archives.org/item.php/oai:scholarspace.manoa.hawaii.edu:10125/26138
Up-to-date as of: Thu Sep 25 0:31:38 EDT 2025
