OLAC Record
oai:catalogue.elra.info:ELRA-E0020

Metadata
Title:CESTA Evaluation Package
Abstract:The CESTA Evaluation Package was produced within the French national project CESTA (Evaluation of MT systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The CESTA project made it possible to carry out a campaign for the evaluation of machine translation technologies. This package contains the material used or produced during the CESTA evaluation campaign: resources, protocols, scoring tools, results of the campaign, etc. The aim of these evaluation packages is to enable external players to evaluate their own systems. The campaign is distributed over two actions: evaluation on a non-restrictive vocabulary, and evaluation on a specialised domain (evaluation after terminology enrichment).
Access Rights:Rights available for: Evaluation Use
Date Available (W3CDTF):2007-06-28
Date Issued (W3CDTF):2007-06-28
Date Modified (W3CDTF):2007-06-28
Description:Written Corpora
The CESTA Evaluation Package was produced within the French national project CESTA (Evaluation of MT systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The CESTA project made it possible to carry out a campaign for the evaluation of machine translation systems, with English and Arabic texts translated into French. This package contains the material used or produced during the CESTA evaluation campaign: resources, protocols, scoring tools, results of the campaign, etc. The aim of these evaluation packages is to enable external players to evaluate their own systems.
The campaign is distributed over two actions:
1) Evaluation on a non-restrictive vocabulary: an evaluation protocol was introduced and was dedicated to two translation directions, English into French and Arabic into French.
2) Evaluation on a specialised domain (evaluation after terminology enrichment): this consists in observing the impact of adapting the systems to the specialised domain.
The CESTA evaluation package contains the following data and tools:
1) Test run data:
- English-French parallel corpus: 21,590 English words and 23,554 French words extracted from the Official Journal of the European Communities, 1993, Written Questions section of the European Parliament, from the MLCC corpus (catalogue ref. ELRA-W0023).
- Arabic-French parallel corpus: 15,603 Arabic words and 18,257 French words extracted from Le Monde Diplomatique 2002 (catalogue ref. ELRA-W0036).
2) First campaign data:
- English-French parallel corpus: test corpus of 20,658 English words and 22,774 French words extracted from the Official Journal of the European Communities, 1993, Written Questions section of the European Parliament, from the MLCC corpus (catalogue ref. ELRA-W0023). Four translations in French are available.
- Arabic-French parallel corpus: test corpus of 23,763 Arabic words and 28,664 French words extracted from Le Monde Diplomatique 2002 and 2003 (catalogue ref. ELRA-W0036). Four translations in French are available.
3) Second campaign data:
- English-French parallel corpus: adaptation corpus of 19,383 English words and 22,741 French words, extracted from the Santé Canada website. A translation in French is available.
- Arabic-French parallel corpus: adaptation corpus of 19,560 Arabic words and 22,533 French words extracted from the UNICEF, WHO and FHI websites. A translation in French is available.
- English-French parallel corpus: test corpus of 18,880 English words and 23,411 French words, extracted from the Santé Canada website. Four translations in French are available.
- Arabic-French parallel corpus: test corpus of 17,305 Arabic words and 20,885 French words extracted from the UNICEF, WHO and FHI websites. Four translations in French are available.
4) Anonymised submissions of systems and human judgments with adequacy and fluency annotations.
5) French corpus of 13,000 words with adequacy and fluency tags.
6) Evaluation infrastructure for human judgments and for automatic evaluation.
7) Project documentation and publications.
A description of the project is available at the following address: http://www.technolangue.net/article.php3?id_article=199 (in French)
Identifier:ELRA-E0020
http://catalog.elra.info/product_info.php?products_id=994
Language:English
French
Arabic
Language (ISO639):eng
fra
ara
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-E0020
DateStamp:  2007-06-28
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2007. ELRA (European Language Resources Association).
Terms: area_Europe country_FR country_GB dcmi_Text iso639_ara iso639_eng iso639_fra olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-E0020
Up-to-date as of: Tue Sep 5 1:34:21 EDT 2017