OLAC Record

Title: NEMLAR Broadcast News Speech Corpus
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF): 2006-08-11
Date Issued (W3CDTF): 2006-08-11
Date Modified (W3CDTF): 2017-06-01
Description: This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Written Corpus (ELRA-W0042) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220).
The Nemlar Broadcast News Speech Corpus consists of about 40 hours of Standard Arabic news broadcasts. The broadcasts were recorded from four different radio stations: Medi1, Radio Orient, RMC – Radio Monte Carlo, RTM – Radio Television Maroc.
Each broadcast contains between 25 and 30 minutes of news and interviews (259 distinct speakers identified). The recordings were carried out at three different periods between 30 June 2002 and 18 July 2005. All files were recorded in linear PCM format, 16 kHz, 16 bit.
The software used for the transcription is Transcriber with the additional patch for Arabic. Thus the transcriptions were done in Arabic characters and the software automatically generated the transliterations. The following annotation levels are included:
• Orthographic transcription of speech (in news, not in music, commercials, etc.), including Named Entities
• Speakers and speaker turns
• Segment markers (portions of maximum 10 seconds)
• Topic/story boundaries
• Background noises (stationary and instantaneous noise events)
• Change of background
• Music/Noise
• Word boundaries
A lexicon of 62,000 words with transliterations, frequency and SAMPA for Arabic is also included.
The database is distributed in 1 ISO 9660 DVD-ROM volume. It has been validated by an external partner and a validation report is provided.
ISLRN: 479-507-036-103-9
Identifier (URI): http://catalog.elra.info/en-us/repository/browse/ELRA-S0219/
Language (ISO639): ara
Medium: Not specified
Publisher: ELRA (European Language Resources Association)
Type (DCMI): Sound
Type (OLAC): primary_text
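
As a rough illustration of the audio format stated in the description (linear PCM, 16 kHz, 16 bit, about 40 hours), the uncompressed size of the recordings can be estimated as follows. This is only a sketch: the record does not state the channel count, so single-channel (mono) audio is an assumption here, and file headers and the annotation files are ignored.

```python
# Estimate the raw size of linear PCM audio from its format parameters.
SAMPLE_RATE_HZ = 16_000   # 16 kHz, from the description
BYTES_PER_SAMPLE = 2      # 16 bit, from the description
CHANNELS = 1              # ASSUMPTION: mono; the record does not say
HOURS = 40                # "about 40 hours", from the description


def pcm_bytes(hours, sample_rate_hz=SAMPLE_RATE_HZ,
              bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Return the number of bytes of uncompressed PCM audio for the given duration."""
    seconds = hours * 3600
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


total = pcm_bytes(HOURS)
print(f"Estimated raw audio size: {total / 1e9:.2f} GB")
```

The figure is an estimate only, since the stated duration is approximate.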


Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0219
DateStamp:  2006-08-11
GetRecord:  OAI-PMH request for simple DC format
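
The GetRecord entries in this record refer to OAI-PMH requests for the identifier above. A minimal sketch of how such request URLs are formed is given below. The `verb`, `identifier` and `metadataPrefix` parameters come from the OAI-PMH protocol, and `oai_dc` is the prefix the protocol defines for simple Dublin Core. The base URL is a placeholder, because this record does not state the archive's OAI-PMH endpoint, and `olac` is assumed to be the prefix the archive uses for the OLAC format.

```python
from urllib.parse import urlencode

# ASSUMPTION: placeholder endpoint; the record does not give the real base URL.
BASE_URL = "https://example.org/oai"

# Taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-S0219"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record and one format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# ASSUMPTION: "olac" is the metadata prefix for the OLAC format on this archive.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
# "oai_dc" is the simple Dublin Core prefix defined by OAI-PMH.
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Replacing `BASE_URL` with the archive's actual endpoint would yield requests that could be sent with any HTTP client; the response is an XML document containing the record.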

Search Info

Citation: n.a. 2006. ELRA (European Language Resources Association).
Terms: dcmi_Sound iso639_ara olac_primary_text

Up-to-date as of: Wed Nov 17 9:12:42 EST 2021