OLAC Record
oai:www.ldc.upenn.edu:LDC2021T05

Metadata
Title:Penn Discourse Treebank Version 2.0 - German Translation
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Sluyter-Gaethje, Henny, Peter Bourgonje, and Manfred Stede. Penn Discourse Treebank Version 2.0 - German Translation LDC2021T05. Web Download. Philadelphia: Linguistic Data Consortium, 2021
Contributor:Sluyter-Gaethje, Henny
Bourgonje, Peter
Stede, Manfred
Date (W3CDTF):2021
Date Issued (W3CDTF):2021-02-15
Description:*Introduction* Penn Discourse Treebank Version 2.0 - German Translation was developed at the University of Potsdam's Applied Computational Linguistics group. It consists of approximately one million tokens derived from Penn Discourse Treebank Version 2.0 (LDC2008T05), translated into German and annotated for shallow discourse relations in the financial news domain. The aim of the Penn Discourse Treebank (PDTB) project is to annotate the Wall Street Journal text in Treebank-2 with discourse relations. PDTB2-German is based on the subset of PDTB 2.0 used in the 2016 CoNLL Shared Task on Multilingual Shallow Discourse Parsing.
*Data* Data is in CoNLL format. Text was automatically translated into German with DeepL, and the annotations were projected onto the German text using word alignments produced with GIZA++. See the included documentation for more information on the relation annotations. Source text and CoNLL format annotations are each presented in their own tab-separated plain text file, encoded in UTF-8.
*Samples* Please view this source sample (TXT) and annotation sample (TXT).
*Updates* None at this time.
Extent:Corpus size: 27771 KB
Identifier:LDC2021T05
https://catalog.ldc.upenn.edu/LDC2021T05
ISBN: 1-58563-955-9
ISLRN: 142-519-062-218-1
DOI: 10.35111/x7qb-7h47
Language:German
Language (ISO639):deu
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2021T05
Rights Holder:Portions © 1987-1989 Dow Jones & Company, Inc., © 2008, 2012, 2021 The Penn Discourse Treebank Group, © 2021 Manfred Stede, © 1993-1995, 2008, 2012, 2021 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2021T05
DateStamp:  2023-10-31

Search Info

Citation: Sluyter-Gaethje, Henny; Bourgonje, Peter; Stede, Manfred. 2021. Linguistic Data Consortium.
Terms: area_Europe country_DE dcmi_Text iso639_deu olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2021T05
Up-to-date as of: Mon Mar 25 7:21:13 EDT 2024