OLAC Record

Title:Chinese Discourse Treebank 0.5
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Zhou, Yuping, et al. Chinese Discourse Treebank 0.5 LDC2014T21. Web Download. Philadelphia: Linguistic Data Consortium, 2014
Contributor:Zhou, Yuping
Lu, Jill
Zhang, Jennifer
Xue, Nianwen
Date (W3CDTF):2014
Date Issued (W3CDTF):2014-10-15
Description:*Introduction* Chinese Discourse Treebank 0.5 was developed at Brandeis University as part of the Chinese Treebank Project and consists of approximately 73,000 words of Chinese newswire text annotated for discourse relations. It follows the lexically grounded approach of the Penn Discourse Treebank (PDTB) (LDC2008T05), with adaptations based on the linguistic and statistical characteristics of Chinese text. Discourse relations are lexically anchored by discourse connectives (e.g., because, but, therefore), which are viewed as predicates that take abstract objects such as propositions, events and states as their arguments. Along with PDTB-style schemes for English, Turkish, Hindi and Czech, Chinese Discourse Treebank provides an additional perspective on how the PDTB approach can be extended for cross-lingual annotation of discourse relations.
*Data* Data was selected from the newswire material in Chinese Treebank 8.0 (LDC2013T21), specifically from Xinhua News Agency stories. There are approximately 5,500 annotation instances. Following the PDTB format, each annotation instance consists of 27 vertical-bar-delimited fields. The fields specify the attributes of the discourse relation as a whole, as well as the attributes of its two arguments. Not all fields are filled in this release. Filled fields are indicated by a pair of angle brackets; the remaining fields are placeholders for future releases.
*Samples* Please view this annotation sample and raw sample.
*Updates* None at this time.
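As an illustration of the format described under *Data*, the following is a minimal Python sketch of splitting one annotation instance into its 27 vertical-bar-delimited fields and picking out the filled ones. The function names are hypothetical, the sample line is synthetic rather than taken from the corpus, and the reading that a filled field is literally wrapped in `<` and `>` is an assumption; the documentation shipped with the release is the authority on the actual convention.

```python
EXPECTED_FIELDS = 27  # field count stated in the description above


def split_instance(line, expected=EXPECTED_FIELDS):
    """Split one annotation instance into its vertical-bar-delimited fields."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    return fields


def filled_fields(fields):
    """Return {index: value} for the fields treated as filled.

    Assumption: a filled field is enclosed in angle brackets, e.g. "<value>".
    Everything else is treated as a placeholder.
    """
    return {
        i: f[1:-1]
        for i, f in enumerate(fields)
        if len(f) >= 2 and f.startswith("<") and f.endswith(">")
    }


# Synthetic line for illustration only; not real corpus data.
sample = "|".join(["<value-a>"] + [""] * 25 + ["<value-b>"])
fields = split_instance(sample)
filled = filled_fields(fields)
print(len(fields), sorted(filled))
```

Checking the field count up front makes a malformed line fail loudly instead of silently shifting every later field by one position.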
Extent:Corpus size: 4056 KB
ISBN: 1-58563-692-4
ISLRN: 492-150-006-320-6
DOI: 10.35111/njb6-wb02
Language:Mandarin Chinese
Language (ISO639):cmn
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2014T21
Rights Holder:Portions © 1994-1998, 2006 Xinhua News Agency, © 2001, 2004, 2005, 2007, 2009, 2010, 2013, 2014 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text


Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2014T21
DateStamp:  2020-11-30

Search Info

Citation: Zhou, Yuping; Lu, Jill; Zhang, Jennifer; Xue, Nianwen. 2014. Chinese Discourse Treebank 0.5. Linguistic Data Consortium.
Terms: area_Asia country_CN dcmi_Text iso639_cmn iso639_zho olac_primary_text

Up-to-date as of: Sun Nov 26 6:53:33 EST 2023