OLAC Record
oai:www.ldc.upenn.edu:LDC2005T05

Metadata
Title:Multiple-Translation Arabic (MTA) Part 2
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Ma, Xiaoyi. Multiple-Translation Arabic (MTA) Part 2 LDC2005T05. Web Download. Philadelphia: Linguistic Data Consortium, 2005
Contributor:Ma, Xiaoyi
Date (W3CDTF):2005
Date Issued (W3CDTF):2005-02-15
Description:*Introduction*
Multiple-Translation Arabic (MTA) Part 2, Linguistic Data Consortium (LDC) catalog number LDC2005T05 and ISBN 1-58563-328-3, was produced by LDC. To support the development of automatic methods for evaluating translation quality, LDC was sponsored to solicit four sets of human translations of a single set of Arabic source materials. LDC was also asked to produce translations from various commercial off-the-shelf (COTS) systems, including commercial machine translation (MT) systems as well as MT systems available on the Internet. This corpus contains two sets of COTS output and one set of output from a TIDES 2003 MT Evaluation participant, which is representative of state-of-the-art research systems. To determine whether automatic evaluation metrics such as BLEU track human assessment, LDC also performed human assessment of the two COTS outputs and the TIDES research system output. The corpus includes the assessment results for one of the two COTS systems and for the TIDES research system, together with the specifications used to conduct the assessments.
*Source Data Selection*
* Xinhua News Service (Xinhua): 50 news stories
* Agence France Presse (AFP): 50 news stories
(total: 100 stories)
There are 100 source files and 700 translation files. All source data were drawn from the January and February 2003 collections of Xinhua and AFP Arabic newswire. Stories were selected from the two newswire collections by length: every selected story contains between 700 and 1,500 Arabic characters. The overall count of Arabic words (excluding markup), by source, is shown in the following table:
Source    Arabic words
AFP        7,528
Xinhua     7,551
-----------------------
Total     15,079
*Samples*
For samples from this corpus, see the screen shot of an Arabic source file and its translation. Please contact Xiaoyi Ma with any questions regarding this corpus.
Identifier:LDC2005T05
https://catalog.ldc.upenn.edu/LDC2005T05
ISBN: 1-58563-328-3
ISLRN: 136-463-995-609-6
OAI: oai:www.ldc.upenn.edu:LDC2005T05
Language:English
Standard Arabic
Language (ISO639):eng
arb
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2005T05
Rights Holder:Portions © 2003 Xinhua News Agency, © 2003 Agence France Presse, © 2004-2005 Trustees of the University of Pennsylvania
Type (DCMI):Text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2005T05
DateStamp:  2014-07-17

Search Info

Citation: Ma, Xiaoyi. 2005. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_GB country_SA dcmi_Text iso639_arb iso639_eng


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2005T05
Up-to-date as of: Mon Nov 24 0:32:30 EST 2014