OLAC Record
oai:www.ldc.upenn.edu:LDC2009T12

Metadata
Title:2008 CoNLL Shared Task Data
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Surdeanu, Mihai, et al. 2008 CoNLL Shared Task Data LDC2009T12. Web Download. Philadelphia: Linguistic Data Consortium, 2009
Contributor:Surdeanu, Mihai
Johansson, Richard
Marquez, Lluis
Meyers, Adam
Nivre, Joakim
Date (W3CDTF):2009
Date Issued (W3CDTF):2009-05-22
Description:*Introduction* 2008 CoNLL Shared Task Data, Linguistic Data Consortium (LDC) catalog number LDC2009T12 and ISBN 1-58563-505-7, contains the trial corpus, training corpus, and development and test data for the 2008 CoNLL (Conference on Computational Natural Language Learning) Shared Task Evaluation. The 2008 Shared Task developed syntactic dependency annotations that include information such as named-entity boundaries, and semantic dependencies that model the roles of both verbal and nominal predicates. The materials in the Shared Task data consist of excerpts from the following corpora: Treebank-3 LDC99T42, BBN Pronoun Coreference and Entity Type Corpus LDC2005T33, Proposition Bank I LDC2004T14 (PropBank) and NomBank v 1.0 LDC2008T23. The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural language processing applications and evaluate them in a standard setting. The 2004 and 2005 CoNLL shared tasks were dedicated to semantic role labeling (SRL) in a monolingual setting (English). In 2006 and 2007, the shared tasks were devoted to the parsing of syntactic dependencies and used corpora from up to thirteen languages. The 2008 shared task employed a unified dependency-based formalism and merged the task of syntactic dependency parsing with the task of identifying semantic arguments and labeling them with semantic roles.
LDC has also released the following CoNLL Shared Task data sets:
* 2006 CoNLL Shared Task - Ten Languages (LDC2015T11)
* 2006 CoNLL Shared Task - Arabic & Czech (LDC2015T12)
* 2009 CoNLL Shared Task Part 1 (LDC2012T03)
* 2009 CoNLL Shared Task Part 2 (LDC2012T04)
* 2015-2016 CoNLL Shared Task (LDC2017T13)
*Data*
The 2008 shared task was divided into three subtasks:
* parsing syntactic dependencies
* identification and disambiguation of semantic predicates
* identification of arguments and assignment of semantic roles for each predicate
Several objectives were addressed in this shared task:
* SRL was performed and evaluated using a dependency-based representation for both syntactic and semantic dependencies. While SRL on top of a dependency treebank has been addressed before, the approach of the 2008 Shared Task was characterized by the following novelties:
  * The constituent-to-dependency conversion strategy transformed all annotated semantic arguments in PropBank and NomBank v 1.0, not just a subset;
  * The annotations addressed propositions centered around both verbal (PropBank) and nominal (NomBank) predicates.
* Based on the observation that a richer set of syntactic dependencies improves semantic processing, the syntactic dependencies modeled are more complex than the ones used in the previous CoNLL shared tasks. For example, the corpus includes apposition links, dependencies derived from named entity (NE) structures, and better modeling of long-distance grammatical relations.
* A practical framework is provided for the joint learning of syntactic and semantic dependencies.
Due to the complexity of the 2008 shared task, only a single language, English, was used.
*Samples*
An example of the shared task annotations is provided below
Extent:Corpus size: 84377 KB
Identifier:LDC2009T12
https://catalog.ldc.upenn.edu/LDC2009T12
ISBN: 1-58563-505-7
ISLRN: 757-340-046-619-2
DOI: 10.35111/mad1-yd84
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2009T12
Rights Holder:Portions © 1987-1989 Dow Jones & Company, Inc., © 2002 BBNT Solutions, LLC, © 1995, 1999, 2005, 2008, 2009 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2009T12
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
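
The GetRecord entries in this record correspond to OAI-PMH requests for the identifier shown above. As a rough sketch of how such a request URL could be assembled: the `verb`, `identifier`, and `metadataPrefix` arguments and the `oai_dc` prefix come from the OAI-PMH protocol, while the base URL below is a hypothetical placeholder (this record does not state the repository endpoint), `get_record_url` is an illustrative name, and `olac` as the prefix for the OLAC format is an assumption.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the record does not state the repository's
# OAI-PMH base URL, so substitute the real one before use.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str,
                   metadata_prefix: str = "oai_dc",
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching a single record
        "identifier": identifier,           # the OaiIdentifier of the record
        "metadataPrefix": metadata_prefix,  # "oai_dc" = simple Dublin Core
    })
    return f"{base_url}?{query}"


# Simple DC request for this record.
dc_url = get_record_url("oai:www.ldc.upenn.edu:LDC2009T12")

# OLAC-format request; the "olac" prefix is assumed, not taken from this record.
olac_url = get_record_url("oai:www.ldc.upenn.edu:LDC2009T12", "olac")

# Parsing the query back recovers the original (percent-decoded) identifier.
params = parse_qs(urlsplit(dc_url).query)
print(params["identifier"][0])
```

`urlencode` percent-encodes the colons in the identifier, which keeps the query string safe to send as-is.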

Search Info

Citation: Surdeanu, Mihai; Johansson, Richard; Marquez, Lluis; Meyers, Adam; Nivre, Joakim. 2009. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2009T12
Up-to-date as of: Mon Mar 25 7:20:22 EDT 2024