OLAC Record
oai:catalogue.elra.info:ELRA-W0093

Metadata
Title:TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data
Abstract:This corpus consists of the transcription of 106 hours of recordings in Pashto from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381) translated into French. It contains about 832,000 source words and 747,000 target words.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2016-04-06
Date Issued (W3CDTF):2016-04-06
Date Modified (W3CDTF):2016-04-06
Description:Written Corpora
The corpus consists of the transcription of 106 hours of recordings in Pashto translated into French. The transcriptions are extracted from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). It contains about 832,000 source words and 747,000 target words. No audio file is provided. Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan. This corpus was produced by ELDA within the PEA TRAD project supported by the French Ministry of Defence (DGA). It was used as training data for language modelling in machine translation.
Identifier:ELRA-W0093
http://catalog.elra.info/product_info.php?products_id=1267
Language:Pushto
French
Language (ISO639):pus
fra
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0093
DateStamp:  2016-04-06
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2016. ELRA (European Language Resources Association).
Terms: area_Europe country_FR dcmi_Text iso639_fra iso639_pus olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0093
Up-to-date as of: Tue Aug 27 21:15:19 EDT 2019