OLAC Record
oai:catalogue.elra.info:ELRA-W0092

Metadata
Title:TRAD Pashto Monolingual text Corpus
Abstract:This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different blogs and websites.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2016-04-06
Date Issued (W3CDTF):2016-04-06
Date Modified (W3CDTF):2016-04-06
Description:Written Corpora
This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different blogs and websites. Sources that were either identified and negotiated or freely available were crawled in 2012, then cleaned and XML-formatted. Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan. This corpus was produced by ELDA within the PEA TRAD project supported by the French Ministry of Defence (DGA).
Identifier:ELRA-W0092
http://catalog.elra.info/product_info.php?products_id=1266
Language:Pushto
Language (ISO639):pus
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0092
DateStamp:  2016-04-06
GetRecord:  OAI-PMH request for simple DC format
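
The GetRecord entries in this record correspond to OAI-PMH requests for this item in two metadata formats. As a minimal sketch of how such a request URL might be assembled: the base URL below is a hypothetical placeholder (this record does not state the repository endpoint), the helper name get_record_url is illustrative only, and the prefixes "olac" and "oai_dc" are assumed to be the ones the repository exposes for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the repository's base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0092"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for a single record
        "identifier": identifier,           # the record's OAI identifier
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for OLAC format, "oai_dc" for simple DC.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by urlencode, which is the expected form for a query-string parameter.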

Search Info

Citation: n.a. 2016. ELRA (European Language Resources Association).
Terms: dcmi_Text iso639_pus olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0092
Up-to-date as of: Tue Aug 27 21:15:19 EDT 2019