OLAC Record
oai:www.ldc.upenn.edu:LDC2005T09

Metadata
Title:ACE 2004 Multilingual Training Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Mitchell, Alexis, et al. ACE 2004 Multilingual Training Corpus LDC2005T09. Web Download. Philadelphia: Linguistic Data Consortium, 2005
Contributor:Mitchell, Alexis
Strassel, Stephanie
Huang, Shudong
Zakhary, Ramez
Date (W3CDTF):2005
Date Issued (W3CDTF):2005-03-15
Description:*Introduction*
This file contains documentation on the ACE 2004 Multilingual Training Corpus, Linguistic Data Consortium (LDC) catalog number LDC2005T09 and ISBN 1-58563-334-8. This publication contains the complete set of English, Arabic and Chinese training data for the 2004 Automatic Content Extraction (ACE) technology evaluation. The corpus consists of data of various types annotated for entities and relations. It was created by the Linguistic Data Consortium with support from the ACE Program and additional assistance from the DARPA TIDES (Translingual Information Detection, Extraction and Summarization) Program. This data was previously distributed as an e-corpus (LDC2004E17) to participants in the 2004 ACE evaluation.
The objective of the ACE program is to develop automatic content extraction technology to support automatic processing of human language in text form. In September 2004, sites were evaluated on system performance in six areas: Entity Detection and Recognition (EDR), Entity Mention Detection (EMD), EDR Co-reference, Relation Detection and Recognition (RDR), Relation Mention Detection (RMD), and RDR given reference entities. All tasks were evaluated in three languages: English, Chinese and Arabic. The current publication consists of the official training data for these evaluation tasks.
A seventh evaluation area, Timex Detection and Recognition, is supported by the ACE Time Normalization (TERN) 2004 English Training Data Corpus (LDC2005T07). The TERN corpus source data largely overlaps with the English source data contained in the current release.
A complete description of the ACE 2004 Evaluation can be found on the ACE Program website maintained by the National Institute of Standards and Technology (NIST): http://www.nist.gov/speech/tests/ace/
For more information about linguistic resources for the ACE program, including annotation guidelines, task definitions, free annotation tools and other documentation, please visit LDC's ACE website: http://www.ldc.upenn.edu/Projects/ACE
*Samples*
The files listed below are samples from the English data. They should provide a good example of the material in this corpus.
* Chinese Treebank
* Fisher Transcripts
* Broadcast News
The World is a co-production of Public Radio International and the British Broadcasting Corporation and is produced at WGBH Boston.
Extent:Corpus size: 366008 KB
Identifier:LDC2005T09
https://catalog.ldc.upenn.edu/LDC2005T09
ISBN: 1-58563-334-8
ISLRN: 789-870-824-708-5
OAI: oai:www.ldc.upenn.edu:LDC2005T09
Language:English
Standard Arabic
Mandarin Chinese
Language (ISO639):eng
arb
cmn
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2005T09
Rights Holder:Portions (c) 1994-1998, 2000 Xinhua News Agency, (c) 1997 Department of Information Services, Hong Kong Special Administrative Region, (c) 1996-1998 & 2000-2001 Sinorama Magazine, (c) 2000 Agence France-Presse, (c) 2000 New York Times, (c) 2000 Associated Press Newswire, (c) 2000 SPH AsiaOne, Ltd. (Zaobao), (c) 2000 An-Nahar, (c) 2000 Al-Hayat, (c) 2000 Nile TV, (c) 2000 Cable News Network, All Rights Reserved, (c) 2000 American Broadcasting Corporation, (c) 2000 National Broadcasting Company, (c) 2000 China National Radio, (c) 2000 China Television System, (c) 2000 China Central TV, (c) 2000 China Broadcasting System, (c) 2000 Public Radio International. The World is a co-production of Public Radio International and the British Broadcasting Corporation and is produced at WGBH Boston. (c) 2005 Trustees of the University of Pennsylvania.
Type (DCMI):Text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2005T09
DateStamp:  2014-09-05
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Mitchell, Alexis; Strassel, Stephanie; Huang, Shudong; Zakhary, Ramez. 2005. ACE 2004 Multilingual Training Corpus. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_CN country_GB country_SA dcmi_Text iso639_arb iso639_cmn iso639_eng


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2005T09
Up-to-date as of: Tue Sep 23 0:16:36 EDT 2014