OLAC Record

Title:Urdu Monolingual Corpus
Bibliographic Citation:http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5
Creator:Jawaid, Bushra
Kamran, Amir
Bojar, Ondřej
Date (W3CDTF):2014-03-27T15:41:35Z
Date Available:2014-03-27T15:41:35Z
Description:We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the choices suggested by each tagger. We run this complex ensemble on a large monolingual corpus and release both the plain and the tagged corpora.
This work is supported by the MosesCore project sponsored by the European Commission’s Seventh Framework Programme (Grant Number 288487).
Identifier (URI):http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5
Language (ISO639):urd
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Subject:monolingual data
annotated data
Urdu language
Subject (ISO639):urd
Type (DCMI):Text
Type (OLAC):lexicon


Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11858/00-097C-0000-0023-65A9-5
DateStamp:  2018-07-02
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej. 2014. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Asia country_PK dcmi_Text iso639_urd olac_lexicon

Inferred Metadata

Country: Pakistan
Area: Asia

Up-to-date as of: Sun Jul 22 0:22:40 EDT 2018