OLAC Record oai:lindat.mff.cuni.cz:11234/1-4703 |
| Metadata | ||
| Title: | Machine Translation Testsuite for Gender-Consistent Translation | |
| Bibliographic Citation: | http://hdl.handle.net/11234/1-4703 | |
| Creator: | Aires, João Paulo | |
| Date (W3CDTF): | 2022-04-20T11:35:13Z | |
| Date Available: | 2022-04-20T11:35:13Z | |
| Description: | Document-level testsuite for evaluating the consistency of gender translation. Our document-level test set consists of selected English documents from the WMT21 newstest, annotated with gender information. Unannotated Czech references are also included for convenience. We semi-automatically annotated person names and pronouns to identify their gender as well as their coreferences. Our proposed annotation consists of three elements: (1) an ID, (2) an element class, and (3) gender. The ID identifies a person's name and its occurrences (name and pronouns). The element class identifies whether the tag refers to a name or a pronoun. Finally, the gender information defines whether the element is masculine or feminine. We applied a series of NLP techniques to automatically identify person names and coreferences. This initial process resulted in a set of 45 documents to be manually annotated. We then began manually annotating these documents to make sure they are correctly tagged. See README.md for more details. | |
| Identifier (URI): | http://hdl.handle.net/11234/1-4703 | |
| Language: | English | |
| Czech | ||
| Language (ISO639): | eng | |
| ces | ||
| Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
| Rights: | Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) | |
| http://creativecommons.org/licenses/by-nc/4.0/ | ||
| Subject: | machine translation | |
| testsuite | ||
| evaluation | ||
| gender | ||
| Type: | corpus | |
| Type (DCMI): | Text | |
| Type (OLAC): | primary_text | |
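
The three-element annotation described in the Description field (ID, element class, gender) can be modeled roughly as below. This is a minimal sketch, not the testsuite's actual file format, which is documented in README.md. The `Mention` class, its field names, the `text` field, the `inconsistent_ids` helper, and the sample mentions are all hypothetical and serve only to illustrate what a gender-consistency check over coreferent mentions might look like.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One annotated element (hypothetical model, not the real tag format)."""
    person_id: int      # (1) ID shared by a name and its coreferent pronouns
    element_class: str  # (2) "name" or "pronoun"
    gender: str         # (3) "masculine" or "feminine"
    text: str           # surface form; an illustrative extra field


def inconsistent_ids(mentions):
    """Return the sorted IDs whose mentions carry more than one gender."""
    genders = defaultdict(set)
    for mention in mentions:
        genders[mention.person_id].add(mention.gender)
    return sorted(pid for pid, seen in genders.items() if len(seen) > 1)


# Made-up mentions, not taken from the testsuite.
mentions = [
    Mention(1, "name", "feminine", "Jane Doe"),
    Mention(1, "pronoun", "feminine", "she"),
    Mention(2, "name", "masculine", "John Roe"),
    Mention(2, "pronoun", "feminine", "her"),
]

print(inconsistent_ids(mentions))
```

Grouping by ID mirrors the role the description gives it: the ID ties a name to its pronouns, so a consistency check only needs to compare the genders recorded under the same ID.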
OLAC Info |
||
| Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
| Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:lindat.mff.cuni.cz:11234/1-4703 | |
| DateStamp: | 2022-04-20 | |
| GetRecord: | OAI-PMH request for simple DC format | |
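
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request is an HTTP query with the `verb`, `identifier`, and `metadataPrefix` arguments defined by the OAI-PMH protocol. The base URL below is a placeholder, since the archive's real endpoint is not given in this record, and the `olac` prefix is assumed to be the one this archive uses for the OLAC format (`oai_dc` is the prefix the protocol requires for simple Dublin Core). The helper name `build_get_record_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord URL for one record and one metadata format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint: an assumption, not the archive's actual base URL.
BASE_URL = "https://example.org/oai/request"
# Identifier taken from the OaiIdentifier field of this record.
IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-4703"

olac_url = build_get_record_url(BASE_URL, IDENTIFIER, "olac")
dc_url = build_get_record_url(BASE_URL, IDENTIFIER, "oai_dc")

# Round-trip check: decoding the query string recovers the arguments.
olac_args = parse_qs(urlsplit(olac_url).query)
dc_args = parse_qs(urlsplit(dc_url).query)
print(olac_args["identifier"][0], olac_args["metadataPrefix"][0])
```

The code only constructs the URLs and does not contact any server; fetching the record would require substituting the archive's actual OAI-PMH base URL.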
Search Info | ||
| Citation: | Aires, João Paulo. 2022. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
| Terms: | area_Europe country_CZ country_GB dcmi_Text iso639_ces iso639_eng olac_primary_text | |