OLAC Record oai:lindat.mff.cuni.cz:11858/00-097C-0000-0001-CC1E-B |
| Metadata | ||
| Title: | Hindi Web Texts | |
| Bibliographic Citation: | http://hdl.handle.net/11858/00-097C-0000-0001-CC1E-B | |
| Creator: | Bojar, Ondřej | |
| Straňák, Pavel | ||
| Zeman, Daniel | ||
| Date (W3CDTF): | 2011-11-23T15:47:18Z | |
| Date Available: | 2011-11-23T15:47:18Z | |
| Description: | A Hindi corpus of texts downloaded mostly from news sites. Contains both the original raw texts and an extensively cleaned-up and tokenized version suitable for language modeling. Size: 18M sentences, 308M tokens. | |
| FP7-ICT-2007-3-231720 (EuroMatrix Plus), 7E09003 (Czech part of EM+) | ||
| Identifier (URI): | UMC004 | |
| http://hdl.handle.net/11858/00-097C-0000-0001-CC1E-B | ||
| Is Replaced By (URI): | http://hdl.handle.net/11858/00-097C-0000-0023-6260-A | |
| Language: | Hindi | |
| Language (ISO639): | hin | |
| Publisher: | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) | |
| Rights: | Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) | |
| http://creativecommons.org/licenses/by-nc/3.0/ | ||
| Subject: | news | |
| web texts | ||
| Type: | corpus | |
| Type (DCMI): | Text | |
| Type (OLAC): | primary_text | |
OLAC Info |
||
| Archive: | LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University | |
| Description: | http://www.language-archives.org/archive/lindat.mff.cuni.cz | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:lindat.mff.cuni.cz:11858/00-097C-0000-0001-CC1E-B | |
| DateStamp: | 2021-06-29 | |
| GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
| Citation: | Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel. 2011. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL). | |
| Terms: | area_Asia country_IN dcmi_Text iso639_hin olac_primary_text | |
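
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch of how such a request URL is formed, the snippet below combines the standard OAI-PMH arguments (`verb`, `identifier`, `metadataPrefix`) with the OaiIdentifier listed in this record. The base URL is a hypothetical placeholder, since the record does not state the archive's actual OAI-PMH endpoint, and the `olac` metadata prefix is assumed to be the one the archive uses for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: this record does not state the archive's
# real OAI-PMH endpoint, so substitute the actual base URL before use.
BASE_URL = "https://example.org/oai/request"

# Identifier copied from the OaiIdentifier field of this record.
OAI_ID = "oai:lindat.mff.cuni.cz:11858/00-097C-0000-0001-CC1E-B"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    # urlencode percent-encodes the ':' and '/' inside the identifier.
    return f"{base_url}?{urlencode(params)}"


# Simple Dublin Core (oai_dc is the prefix every OAI-PMH repository supports).
dc_url = get_record_url(OAI_ID, "oai_dc")

# OLAC format (assumes the archive exposes it under the prefix "olac").
olac_url = get_record_url(OAI_ID, "olac")

print(dc_url)
print(olac_url)
```

Fetching either URL from a real endpoint would return an XML response wrapping the record in the requested metadata format.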