Title:Urdu Monolingual Corpus
Bibliographic Citation:http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5
Creator:Jawaid, Bushra
Kamran, Amir
Bojar, Ondřej
Date (W3CDTF):2014-03-27T15:41:35Z
Date Available:2014-03-27T15:41:35Z
Description:We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the tags suggested by each tagger. We run this complex ensemble on a large monolingual corpus and release both the plain and the tagged corpora.
This work is supported by the MosesCore project, sponsored by the European Commission’s Seventh Framework Programme (Grant Number 288487).
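The voting step mentioned in the description can be illustrated with a simple per-token majority vote. This is a minimal sketch under stated assumptions, not the exact scheme of Jawaid and Bojar (2012): it assumes every tagger emits exactly one tag per token over the same tokenization, and it breaks ties in favour of the tagger listed first. The function name `vote` and the tag labels used below are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def vote(tag_sequences: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tags from several taggers by majority vote.

    Assumption: all taggers tag the same token sequence. Ties are broken
    in favour of the tagger listed first (an assumed priority order).
    """
    if not tag_sequences:
        raise ValueError("need at least one tagger output")
    n = len(tag_sequences[0])
    if any(len(seq) != n for seq in tag_sequences):
        raise ValueError("all taggers must tag the same tokens")

    result: List[str] = []
    for i in range(n):
        # Tags proposed for token i, in tagger order.
        candidates = [seq[i] for seq in tag_sequences]
        counts = Counter(candidates)
        best = max(counts.values())
        # The first candidate (in tagger order) reaching the top count wins.
        winner = next(tag for tag in candidates if counts[tag] == best)
        result.append(winner)
    return result


if __name__ == "__main__":
    # Hypothetical outputs of three taggers for a three-token sentence.
    tagger_a = ["NN", "VB", "PSP"]
    tagger_b = ["NN", "NN", "PSP"]
    tagger_c = ["JJ", "VB", "NN"]
    print(vote([tagger_a, tagger_b, tagger_c]))  # ['NN', 'VB', 'PSP']
```

With three voters, any tag proposed by two taggers wins outright; the tie-break only matters when all three disagree, in which case this sketch simply trusts the first tagger. A weighted vote (e.g. by each tagger's accuracy) would be an equally plausible design.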
Identifier (URI):http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5
Language (ISO639):urd
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
monolingual data
annotated data
Urdu language
Subject (ISO639):urd
Type (DCMI):Text
Type (OLAC):lexicon


OaiIdentifier:  oai:lindat.mff.cuni.cz:11858/00-097C-0000-0023-65A9-5
DateStamp:  2018-07-02
Citation: Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej. 2014. Urdu Monolingual Corpus. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5
