OLAC Record
oai:lindat.mff.cuni.cz:11858/00-097C-0000-0023-65A9-5

Metadata
Title:Urdu Monolingual Corpus
Bibliographic Citation:http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5
Creator:Jawaid, Bushra
Kamran, Amir
Bojar, Ondřej
Date (W3CDTF):2014-03-27T15:41:35Z
Date Available:2014-03-27T15:41:35Z
Description:We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the choices suggested by the taggers. We run this ensemble on a large monolingual corpus and release both the plain and the tagged corpora.
The work is supported by the MosesCore project sponsored by the European Commission’s Seventh Framework Programme (Grant Number 288487).
Identifier (URI):http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5
Language:Urdu
Language (ISO639):urd
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
http://creativecommons.org/licenses/by-nc-sa/3.0/
Subject:Urdu
monolingual data
annotated data
corpus
Urdu language
Subject (ISO639):urd
Type:lexicalConceptualResource
Type (DCMI):Text
Type (OLAC):lexicon
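
The Description above mentions three taggers combined by a voting scheme. The record does not state the exact voting rule, so the following is only a minimal sketch of one plausible variant: a plain majority vote per token, falling back to one designated tagger when all taggers disagree. The function names, the fallback rule, and the toy tag labels are assumptions made for illustration and are not taken from Jawaid and Bojar (2012).

```python
from collections import Counter


def vote(tags, fallback_index=0):
    """Pick one tag for a single token from the tags proposed by each tagger.

    Assumption: simple majority wins; if there is no unique winner,
    the tag from the tagger at `fallback_index` is used.
    """
    ranked = Counter(tags).most_common()
    # A unique winner exists if only one tag was proposed, or the top
    # tag has strictly more votes than the runner-up.
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    return tags[fallback_index]


def combine(tagger_outputs, fallback_index=0):
    """Combine per-token tag sequences from several taggers into one sequence."""
    return [vote(list(position), fallback_index) for position in zip(*tagger_outputs)]


# Toy example: three hypothetical taggers, three tokens (tag labels are made up).
outputs = [
    ["NN", "VB", "ADJ"],  # tagger 0
    ["NN", "NN", "ADV"],  # tagger 1
    ["PN", "VB", "PP"],   # tagger 2
]
combined = combine(outputs)
```

In the toy example the first two tokens are decided by majority, while the third token, on which all three taggers disagree, takes the tag of the fallback tagger.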

OLAC Info

Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11858/00-097C-0000-0023-65A9-5
DateStamp:  2017-11-09
GetRecord:  OAI-PMH request for simple DC format
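
The GetRecord entries above refer to OAI-PMH requests in two metadata formats. As a rough sketch, such a request is an HTTP GET with the arguments verb, identifier, and metadataPrefix, where "olac" selects the OLAC format and "oai_dc" the simple DC format. The base URL below is a hypothetical placeholder, since this record does not state the repository endpoint, and the helper name get_record_url is likewise an assumption.

```python
from urllib.parse import urlencode

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11858/00-097C-0000-0023-65A9-5"

# Hypothetical endpoint: replace with the real OAI-PMH base URL of the repository.
BASE_URL = "https://example.org/oai"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (no network access is performed)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")   # OLAC format
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")   # simple DC format
```

The sketch only constructs the URLs; fetching and parsing the returned XML is left out.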

Search Info

Citation: Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej. 2014. Urdu Monolingual Corpus. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Asia country_PK dcmi_Text iso639_urd olac_lexicon

Inferred Metadata

Country: Pakistan
Area: Asia


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11858/00-097C-0000-0023-65A9-5
Up-to-date as of: Fri Nov 10 1:47:09 EST 2017