OLAC Record
oai:www.ldc.upenn.edu:LDC2009T10

Metadata
Title:Language Understanding Annotation Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Diab, Mona, et al. Language Understanding Annotation Corpus LDC2009T10. Web Download. Philadelphia: Linguistic Data Consortium, 2009
Contributor:Diab, Mona
Dorr, Bonnie
Levin, Lori
Mitamura, Teruko
Passonneau, Rebecca
Rambow, Owen
Ramshaw, Lance
Date (W3CDTF):2009
Date Issued (W3CDTF):2009-03-17
Description:*Introduction*
The Language Understanding Annotation Corpus, Linguistic Data Consortium (LDC) catalog number LDC2009T10 and ISBN 1-58563-513-8, emerged from a series of interdisciplinary meetings on semantics and pragmatics hosted by the Human Language Technology Center of Excellence at Johns Hopkins University. The participants were researchers from BBN Technologies, Carnegie Mellon University and Columbia University who were developing representations of text semantics, machine translation and summarization systems.
The resulting corpus contains over 9000 words of text, 6949 in English and 2183 in Arabic, annotated for committed belief, event and entity coreference, dialog acts and temporal relations. The source materials were chosen from various genres to represent "informal input," that is, text that contains colloquial forms. The documents in the corpus include excerpts from newswire stories, telephone conversation transcripts, emails, contracts and written instructions.
The problem was modeled as an extended exercise in extracting information elements from a "document" (that is, from discrete language records in written or spoken forms). The goal was to answer two broad questions:
* What are the elements of knowledge that can be derived from a document?
* Can the representation, and hence the annotation, be laid out in terms of iterative layers, the accumulation of which would represent the sum of the knowledge?
The annotations attempted to resolve these questions in the following ways:
* Belief/Opinion/Confidence. Committed belief annotation distinguishes between statements which assert belief or opinion, those which contain speculation, and statements which convey facts or otherwise do not convey belief. The goal is to be able to determine automatically from a given text what beliefs can be ascribed to the author and with what strength the author holds those beliefs.
* Dialog Acts. Dialog act annotation seeks to determine the forward and backward links between pairs of dialog acts.
* Coreference (entities and events). Event coreferences indicate which events are related to other events at the document level. Entity relations within these related events provide further information about, e.g., the main actors, targets and causes of the events.
* Temporal relations. Temporal annotations mark the temporal relationships between the different events and time anchors mentioned in a document; that is, they highlight what the text says about the timeline of its time mentions.
Extent:Corpus size: 5447 KB
Identifier:LDC2009T10
https://catalog.ldc.upenn.edu/LDC2009T10
ISBN: 1-58563-513-8
ISLRN: 775-964-514-342-7
OAI: oai:www.ldc.upenn.edu:LDC2009T10
Language:English
Standard Arabic
Arabic
Language (ISO639):eng
arb
ara
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Rights Holder:Portions © 2000, 2002 Agence France Presse, © 2000 Al Hayat, © 2000 The Associated Press, © 2003, 2005 Cable News Network, LP, LLLP, © 1987-1989 Dow Jones & Company, Inc., © 2003 Indiana Center for Intercultural Communication, © 2000 New York Times, © 2000 Xinhua News Agency, © 1992, 1993, 1997, 2009 Trustees of the University of Pennsylvania
Type (DCMI):Text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2009T10
DateStamp:  2014-07-17
GetRecord:  OAI-PMH request for simple DC format
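
The GetRecord entries above name OAI-PMH requests without showing their form. As a rough sketch, such a request is a URL carrying the verb, the record identifier and a metadata prefix. The base URL below is a hypothetical placeholder, since this record does not state the archive's OAI-PMH endpoint, and the prefixes "olac" and "oai_dc" are assumed to be the ones the archive exposes for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the record does not state the archive's OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
RECORD_ID = "oai:www.ldc.upenn.edu:LDC2009T10"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (no network access is made)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple DC.
olac_url = get_record_url(RECORD_ID, "olac")
dc_url = get_record_url(RECORD_ID, "oai_dc")
print(olac_url)
print(dc_url)
```

Substituting the archive's real base URL for the placeholder would be needed before either URL could be fetched.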

Search Info

Citation: Diab, Mona; Dorr, Bonnie; Levin, Lori; Mitamura, Teruko; Passonneau, Rebecca; Rambow, Owen; Ramshaw, Lance. 2009. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_GB country_SA dcmi_Text iso639_ara iso639_arb iso639_eng


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2009T10
Up-to-date as of: Tue Sep 23 0:17:32 EDT 2014