OLAC Record: 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data

OLAC Record
oai:www.ldc.upenn.edu:LDC2009T05

Metadata

Title: 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Przybocki, Mark, Kay Peterson, and Sébastien Bronsart. 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data LDC2009T05. Web Download. Philadelphia: Linguistic Data Consortium, 2009

Contributor: Przybocki, Mark

Peterson, Kay

Bronsart, Sébastien

Date (W3CDTF): 2009

Date Issued (W3CDTF): 2009-03-17

Description: *Introduction*

NIST MetricsMATR is a series of research challenge events for machine translation (MT) metrology, promoting the development of innovative, even revolutionary, MT metrics that correlate highly with human assessments of MT quality. In this program, participants submit their metrics to the National Institute of Standards and Technology (NIST). NIST runs those metrics on held-back test data for which it has human assessments of quality, and then calculates correlations between the automatic metric scores and the human assessments.

This release contains the development data received by participants in the NIST Metrics for Machine Translation 2008 Evaluation (MetricsMATR08). Specifically, this corpus consists of a subset of the materials used in the NIST Open MT06 evaluation and includes human reference translations, system translations, and human assessments of adequacy and preference. The source data consists of twenty-five Arabic-language newswire documents with a total of 249 segments. The data for each segment includes four human reference translations in English and system translations from eight different MT06 machine translation systems.

In addition to the data and reference translations, this release includes software tools for evaluation and reporting, and documentation describing how the human assessments were obtained and how they are represented in the data. The evaluation plan contains further information and rules on the use of this data.

The MetricsMATR program seeks to overcome several drawbacks of the methods employed for the evaluation of MT technology. Automatic metrics have not yet proved able to predict the usefulness and reliability of MT technologies with confidence. Nor have automatic metrics demonstrated that they are meaningful in target languages other than English. Human assessments, however, are expensive, slow, subjective and difficult to standardize. These problems, and the need to overcome them through the development of improved automatic (or even semi-automatic) metrics, have been a constant point of discussion at past NIST MT evaluation events. MetricsMATR aims to provide a platform to address these shortcomings. Specifically, the goals of MetricsMATR are:

* To inform other MT technology evaluation campaigns and conferences with regard to improved metrology.
* To establish an infrastructure that encourages the development of innovative metrics.
* To build a diverse community that will bring new perspectives to MT metrology research.
* To provide a forum for MT metrology discussion and for establishing future directions of MT metrology.

*Data*

The MetricsMATR08 development data set released here is reflective of the test data set only to a degree; the evaluation data set contains more varied data -- from more genres, more source languages, more systems and different evaluations -- than this development data set. There are also more types of human assessments for the test data. The MetricsMATR08 test data remains unseen to allow for repeated use as test data.

The software used for obtaining the human judgments included in this data set is the same software used for the NIST Open MT08 human assessments. It includes a description of the adequacy and preference assessment tasks and the instructions given to the judges. All segments assessed were judged by two independent judges. Adequacy judgments were performed for all segments of each document. Preference judgments were performed for the first four segments of each document such that full pair-wise comparisons between all eight MT systems were obtained. All judgments were performed against only one reference translation. Each score is an adjudicated score over the two individual judgments.

The official results of MetricsMATR08 on the test data for the metrics submitted to MetricsMATR08 are publicly available. NIST performed the same analyses on the MetricsMATR08 development data after the evaluation. These results are not publicly available, but will likely be available on request in the future by contacting mt_poc@nist.gov.

*Samples*

For an example of the data in this release, please examine these sample scores and judgments.
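
The correlation step mentioned in the Description (automatic metric scores compared against human assessments) can be illustrated with a minimal sketch. This is not NIST's evaluation software, and Pearson correlation is only one possible choice of coefficient; the record does not say which were used. The function name `pearson`, the variable names, and all score values below are made up for illustration and are not taken from the corpus. The sketch also counts the unordered system pairs implied by full pair-wise comparison of eight systems.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-segment values (NOT corpus data): one automatic metric
# score and one adjudicated human adequacy rating per segment.
metric_scores = [0.31, 0.45, 0.28, 0.52, 0.40]
human_adequacy = [3, 5, 2, 6, 4]

r = pearson(metric_scores, human_adequacy)
print(f"segment-level Pearson r = {r:.3f}")

# Full pair-wise comparison of eight systems: every unordered pair once.
# The system labels are placeholders, not identifiers from the release.
systems = [f"system{i}" for i in range(1, 9)]
system_pairs = list(combinations(systems, 2))
print(f"pairs per segment = {len(system_pairs)}")
```

A rank-based coefficient could be swapped in for `pearson` without changing the surrounding code, since only the two aligned score lists are required.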

Extent: Corpus size: 2273 KB

Identifier: LDC2009T05

https://catalog.ldc.upenn.edu/LDC2009T05

ISBN: 1-58563-508-1

ISLRN: 415-470-503-471-5

DOI: 10.35111/d2bd-bd55

Language: English

Standard Arabic

Arabic

Language (ISO639): eng

arb

ara

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2009T05

Rights Holder: Portions © 2006 Agence France-Presse, © 2006, 2008, 2009 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2009T05

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Przybocki, Mark; Peterson, Kay; Bronsart, Sébastien. 2009. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_GB country_SA dcmi_Text iso639_ara iso639_arb iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2009T05
Up-to-date as of: Wed Oct 29 7:01:06 EDT 2025
