OLAC Record
oai:lindat.mff.cuni.cz:11234/1-4703

Metadata
Title:Machine Translation Testsuite for Gender-Consistent Translation
Bibliographic Citation:http://hdl.handle.net/11234/1-4703
Creator:Aires, João Paulo
Date (W3CDTF):2022-04-20T11:35:13Z
Date Available:2022-04-20T11:35:13Z
Description:Document-level testsuite for evaluating gender translation consistency. Our document-level test set consists of selected English documents from the WMT21 newstest, annotated with gender information. Unannotated Czech references are also included for convenience. We semi-automatically annotated person names and pronouns to identify their gender and the coreferences between them. Our proposed annotation consists of three elements: (1) an ID, (2) an element class, and (3) gender. The ID identifies a person's name and all of its occurrences (the name and the pronouns referring to it). The element class indicates whether the tag refers to a name or a pronoun. Finally, the gender information specifies whether the element is masculine or feminine. We applied a series of NLP techniques to automatically identify person names and coreferences. This automatic step yielded a set of 45 documents for manual annotation. We then manually annotated these documents to ensure they are correctly tagged. See README.md for more details.
Identifier (URI):http://hdl.handle.net/11234/1-4703
Language:English
Czech
Language (ISO639):eng
ces
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
http://creativecommons.org/licenses/by-nc/4.0/
Subject:machine translation
testsuite
evaluation
gender
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-4703
DateStamp:  2022-04-20
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Aires, João Paulo. 2022. Machine Translation Testsuite for Gender-Consistent Translation. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_CZ country_GB dcmi_Text iso639_ces iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-4703
Up-to-date as of: Thu Oct 5 0:43:10 EDT 2023