OLAC Record
oai:www.ldc.upenn.edu:LDC2005T33

Metadata
Title:BBN Pronoun Coreference and Entity Type Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Weischedel, Ralph, and Ada Brunstein. BBN Pronoun Coreference and Entity Type Corpus LDC2005T33. Web Download. Philadelphia: Linguistic Data Consortium, 2005
Contributor:Weischedel, Ralph
Brunstein, Ada
Date (W3CDTF):2005
Date Issued (W3CDTF):2005-09-20
Description:*Introduction* BBN Pronoun Coreference and Entity Type Corpus was developed by BBN Technologies (BBN) and contains approximately 24,000 pronoun coreferences as well as entity and numeric annotation for approximately 2,300 documents. This publication supplements the one million words of Wall Street Journal texts in Penn's Treebank-2 (LDC95T7). The corpus contains stand-off annotation of pronoun coreference, indicated by sentence and token numbers, as well as annotation of a variety of entity and numeric types. All annotation was done by hand at BBN using proprietary annotation tools. This corpus was developed by BBN to support the ACE and AQUAINT programs.
*Data* The corpus contains two components:
* Pronoun coreference: Stand-off annotation of pronoun coreference for the WSJ corpus is provided in a single file. Pronouns and antecedents are indexed by sentence and token numbers.
* Entity types: The corpus includes annotation of 12 named entity types (Person, Facility, Organization, GPE, Location, Nationality, Product, Event, Work of Art, Law, Language, and Contact-Info), nine nominal entity types (Person, Facility, Organization, GPE, Product, Plant, Animal, Substance, Disease and Game), and seven numeric types (Date, Time, Percent, Money, Quantity, Ordinal and Cardinal). Several of these types are further divided into subtypes. Annotation for a total of 64 subtypes is provided.
*Samples* For an example of the data in this corpus, please examine the following samples:
* Entity and Numeric Annotation (QA)
* Reference Sentence (TXT)
* Pronoun Coreference (TXT)
*Updates* None at this time.
Extent:Corpus size: 32768 KB
Identifier:LDC2005T33
https://catalog.ldc.upenn.edu/LDC2005T33
ISBN: 1-58563-362-3
ISLRN: 375-520-999-436-0
DOI: 10.35111/9fx9-gz10
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2005T33
Rights Holder:Portions © 1989 Wall Street Journal, © 2002 BBNT Solutions LLC., © 2005 Trustees of the University of Pennsylvania.
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2005T33
DateStamp:  2021-07-19
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Weischedel, Ralph; Brunstein, Ada. 2005. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2005T33
Up-to-date as of: Mon Mar 25 7:20:08 EDT 2024