OLAC Record
oai:catalogue.elra.info:ELRA-S0093

Metadata
Title:IBNC - An Italian Broadcast News Corpus
Abstract:Produced through funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335), the collection consists of 150 broadcast programs from the RAI, for a total time of about 30 hours, issued on 36 different days between 1992 and 1999. The signal was down-sampled to 16kHz with a resolution of 16 bits, and encoded into the NIST Sphere PCM format.
Access Rights:Rights available for: Research Use
Coverage:Between 1992 and 1999
Date Available (W3CDTF):2000-12-15
Date Issued (W3CDTF):2004-09-14
Date Modified (W3CDTF):2007-02-22
Description:Broadcast Resources
The Italian Broadcast News Corpus (IBNC) was produced by the ITC-IRST (Italy) through funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). RAI, the major Italian broadcast company, supplied studio quality recordings of radio news programs sampled from its internal digital archive. The collection consists of 150 programs, for a total time of about 30 hours, issued on 36 different days between 1992 and 1999. Recordings were supplied by RAI on Digital Audio Tapes (DAT), with 44kHz sampling rate and 16 bit resolution. Each DAT was manually processed to transfer each single program issue into a single file. During this operation, the signal was down-sampled to 16kHz with a resolution of 16 bits, and encoded into the NIST Sphere PCM format. Speech recordings present variations of topic, speaker, acoustic channel, speaking mode, etc. The corpus has been segmented, labelled and transcribed manually using the tool developed by DGA (Délégation Générale pour l'Armement, France) and LDC (Linguistic Data Consortium, USA), called "Transcriber", with conventions similar to those adopted by LDC for the DARPA HUB-4 corpora. The transcription text consists of mixed-case ASCII characters of the ISO-8859-1 extended set. Validation was carried out by an external validator. It consisted of checking audio files, documentation and transcriptions.
Identifier:ELRA-S0093
http://catalog.elra.info/product_info.php?products_id=593
Language:Italian
Language (ISO639):ita
Medium:DVD
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0093
DateStamp:  2000-12-15
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2004. ELRA (European Language Resources Association).
Terms: area_Europe country_IT dcmi_Sound iso639_ita olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0093
Up-to-date as of: Fri May 5 1:18:34 EDT 2017