OLAC Record oai:lindat.mff.cuni.cz:11858/00-097C-0000-0023-65A9-5 |
| Metadata | ||
| Title: | Urdu Monolingual Corpus | |
| Bibliographic Citation: | http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5 | |
| Creator: | Jawaid, Bushra | |
| Kamran, Amir | ||
| Bojar, Ondřej | ||
| Date (W3CDTF): | 2014-03-27T15:41:35Z | |
| Date Available: | 2014-03-27T15:41:35Z | |
| Description: | We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the different choices suggested by each tagger. We run this complex ensemble on a large monolingual corpus and release both the plain and tagged corpora. | |
| It is supported by the MosesCore project sponsored by the European Commission’s Seventh Framework Programme (Grant Number 288487). | ||
| Identifier (URI): | http://hdl.handle.net/11858/00-097C-0000-0023-65A9-5 | |
| Language: | Urdu | |
| Language (ISO639): | urd | |
| Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
| Rights: | Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) | |
| http://creativecommons.org/licenses/by-nc-sa/3.0/ | ||
| Subject: | Urdu | |
| monolingual data | ||
| annotated data | ||
| corpus | ||
| Urdu language | ||
| Subject (ISO639): | urd | |
| Type: | lexicalConceptualResource | |
| Type (DCMI): | Text | |
| Type (OLAC): | lexicon | |
OLAC Info |
||
| Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
| Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:lindat.mff.cuni.cz:11858/00-097C-0000-0023-65A9-5 | |
| DateStamp: | 2021-06-29 | |
| GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
| Citation: | Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej. 2014. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
| Terms: | area_Asia country_PK dcmi_Text iso639_urd olac_lexicon | |
Inferred Metadata | ||
| Country: | Pakistan | |
| Area: | Asia | |