OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-1671

Metadata
Title: WMT16 Tuning Shared Task Models (Czech-to-English)
Bibliographic Citation: http://hdl.handle.net/11372/LRT-1671
Creator: Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos
Date (W3CDTF): 2016-03-22T12:05:39Z
Date Available: 2016-03-22T12:05:39Z
Description: This item contains the models to be tuned in the WMT16 Tuning shared task for Czech-to-English. The translation models are trained on the CzEng 1.6pre corpus (http://ufal.mff.cuni.cz/czeng/czeng16pre). Before training, the data is tokenized with the Moses tokenizer and lowercased, and sentences longer than 60 words or shorter than 4 words are removed. Word alignment is done with fast_align (https://github.com/clab/fast_align), and the standard Moses pipeline is used for training. Two 5-gram language models are trained with KenLM: one on the CzEng English data only, the other on all English monolingual data available for WMT except Common Crawl. Also included are two lexicalized bidirectional reordering models, word-based and hierarchical, with msd conditioned on both the source and the target side of the processed CzEng.
Identifier (URI): http://hdl.handle.net/11372/LRT-1671
Language: Czech; English
Language (ISO639): ces; eng
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL); University of Amsterdam, ILLC
Rights: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/
Subject: WMT16; machine translation; tuning; baseline models; shared task
Type: corpus
Type (DCMI): Text
Type (OLAC): primary_text
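
Note on the Description field: the lowercasing and length filtering it mentions can be sketched as below. This is only a minimal illustration, not the script the creators used. It assumes the text is already tokenized (tokens separated by spaces), that the 4-to-60-word limits are inclusive, and that a sentence pair is dropped when either side falls outside them; the record does not state these details. The names MIN_WORDS, MAX_WORDS, keep_sentence and preprocess_pairs are hypothetical.

```python
# Hypothetical limits taken from the Description: sentences shorter than
# 4 words or longer than 60 words are removed (bounds assumed inclusive).
MIN_WORDS = 4
MAX_WORDS = 60


def keep_sentence(tokens):
    """Return True if the token count lies within the assumed limits."""
    return MIN_WORDS <= len(tokens) <= MAX_WORDS


def preprocess_pairs(pairs):
    """Lowercase and length-filter (czech, english) sentence pairs.

    Input strings are assumed to be tokenized already, so splitting on
    whitespace yields the word count. A pair is kept only if both sides
    pass the length check (an assumption, not stated in the record).
    """
    kept = []
    for cs, en in pairs:
        cs_tokens = cs.lower().split()
        en_tokens = en.lower().split()
        if keep_sentence(cs_tokens) and keep_sentence(en_tokens):
            kept.append((" ".join(cs_tokens), " ".join(en_tokens)))
    return kept
```

In the pipeline described above, tokenization is done by the Moses tokenizer before this step, and alignment with fast_align follows it; neither tool is invoked in this sketch.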

OLAC Info

Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-1671
DateStamp:  2017-11-09
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos. 2016. WMT16 Tuning Shared Task Models (Czech-to-English). Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_CZ country_GB dcmi_Text iso639_ces iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-1671
Up-to-date as of: Sun Jul 28 14:40:53 EDT 2019