Title:English-Urdu Religious Parallel Corpus
Bibliographic Citation:http://hdl.handle.net/11234/1-2582
Creator:Jawaid, Bushra
Zeman, Daniel
Date (W3CDTF):2018-01-05T15:38:19Z
Date Available:2018-01-05T15:38:19Z
Description:The English-Urdu parallel corpus is a collection of religious texts (Quran, Bible) in English and Urdu with sentence-level alignment. The corpus can be used for experiments in statistical machine translation. Our modifications of the crawled data include, but are not limited to, the following: (1) manually corrected sentence alignment of the corpora; (2) our data split (training, development, test), so that our published experiments can be reproduced; (3) tokenization (optional, but needed to reproduce our experiments); (4) optional normalization of, for example, European vs. Urdu numerals and European vs. Urdu punctuation, and removal of Urdu diacritics.
Identifier (URI):http://hdl.handle.net/11234/1-2582
Language (ISO639):eng
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Subject:parallel corpus
religious text
machine translation
Type (DCMI):Text
Type (OLAC):primary_text


Citation: Jawaid, Bushra; Zeman, Daniel. 2018. English-Urdu Religious Parallel Corpus. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). http://hdl.handle.net/11234/1-2582
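
Note on normalization: the optional normalization step named in the Description (numerals, punctuation, diacritics) could look roughly like the Python sketch below. This is not the authors' script. The function name normalize_urdu, the direction of the mapping (Urdu to European), and the exact character sets are all assumptions chosen for illustration; the corpus distribution may normalize differently.

```python
import re

# Assumption: "Urdu numerals" means the Extended Arabic-Indic digits
# U+06F0..U+06F9, mapped here to ASCII 0-9.
_DIGITS = {0x06F0 + i: str(i) for i in range(10)}

# Assumption: a small set of Urdu/Arabic punctuation marks and their
# European counterparts (full stop, comma, question mark, semicolon).
_PUNCT = {0x06D4: ".", 0x060C: ",", 0x061F: "?", 0x061B: ";"}

_TABLE = {**_DIGITS, **_PUNCT}

# Assumption: "diacritics" covers the short-vowel and related marks
# U+064B..U+0652 plus the superscript alef U+0670.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def normalize_urdu(text: str) -> str:
    """Strip diacritics, then map Urdu digits and punctuation to European ones."""
    text = _DIACRITICS.sub("", text)
    return text.translate(_TABLE)
```

Applying the same function to training, development, and test data keeps the three parts of the split consistent with one another.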
