OLAC Record
oai:www.ldc.upenn.edu:LDC2015T10

Metadata
Title:RST Signalling Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Das, Debopam, Maite Taboada, and Paul McFetridge. RST Signalling Corpus LDC2015T10. Web Download. Philadelphia: Linguistic Data Consortium, 2015
Contributor:Das, Debopam
Taboada, Maite
McFetridge, Paul
Date (W3CDTF):2015
Date Issued (W3CDTF):2015-06-15
Description:*Introduction* RST Signalling Corpus was developed at Simon Fraser University and contains annotations for signalling information added to RST Discourse Treebank (LDC2002T07). RST Discourse Treebank (RST-DT) is a collection of English news texts annotated for rhetorical relations under the RST (Rhetorical Structure Theory) framework. In RST Signalling Corpus, information about textual signals (such as although, because, thus) and about other signals such as tense, lexical chains, or punctuation was added as an annotation layer to examine how rhetorical relations are signalled in discourse. *Data* The source data consists of 385 Wall Street Journal news articles from the Penn Treebank, annotated for rhetorical relations in RST Discourse Treebank. As in RST-DT, the data in this release is divided into a training set (347 articles) and a test set (38 articles). The signalling annotation in this data set was performed using UAM CorpusTool version 2.8.12. Files are presented as UTF-8 encoded XML and plain text. The corpus is divided into three annotation sub-directories: training, test, and full. All sub-directories include source, metadata, signalling annotation, and DTD files. *Samples* Metadata Sample, Signal Sample, Text Sample. *Updates* None at this time.
Extent:Corpus size: 38176 KB
Identifier:LDC2015T10
https://catalog.ldc.upenn.edu/LDC2015T10
ISBN: 1-58563-719-X
ISLRN: 256-234-245-630-4
DOI: 10.35111/5sm9-m096
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2015T10
Rights Holder:Portions © 1987-1989 Dow Jones & Company, Inc., © 2015 Debopam Das, © 2015 Maite Taboada, © 1995, 1999, 2002, 2015 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2015T10
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
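
The two GetRecord entries above (OLAC format and simple DC format) correspond to OAI-PMH requests that differ only in their metadataPrefix argument. A minimal sketch of how such request URLs could be built follows. The base URL is a placeholder, since this record does not state the archive's OAI-PMH endpoint, and the helper name get_record_url is hypothetical. The prefix "oai_dc" is the standard OAI-PMH prefix for simple Dublin Core; "olac" is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the record does not give the real base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2015T10"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for a single record
        "identifier": identifier,           # percent-encoded by urlencode
        "metadataPrefix": metadata_prefix,  # selects the metadata format
    })
    return f"{base_url}?{query}"


# Request for the OLAC format (prefix "olac" is an assumption).
olac_url = get_record_url(OAI_IDENTIFIER, "olac")

# Request for the simple DC format.
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Replacing BASE_URL with the archive's actual endpoint would be required before either URL could be fetched.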

Search Info

Citation: Das, Debopam; Taboada, Maite; McFetridge, Paul. 2015. RST Signalling Corpus. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2015T10
Up-to-date as of: Mon Mar 25 7:20:44 EDT 2024