OLAC Record
oai:catalogue.elra.info:ELRA-S0272

Metadata
Title:MEDIA speech database for French
Abstract:The MEDIA speech database for French was produced by ELDA within the French national project MEDIA (Automatic evaluation of man-machine dialogue systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). It contains 1,258 transcribed dialogues from 250 adult speakers. The corpus was built with a "Wizard of Oz" (WoZ) method, in which a human operator simulates the system in a natural-language man-machine dialogue. The scenario is set in the domain of tourism and hotel reservation. The semantic annotation of the corpus is available in this catalogue under reference ELRA-E0024 (MEDIA Evaluation Package).
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2008-03-27
Date Issued (W3CDTF):2008-03-20
Date Modified (W3CDTF):2008-03-27
Description:Telephone
The MEDIA speech database for French was produced by ELDA within the French national project MEDIA (Automatic evaluation of man-machine dialogue systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). It contains 1,258 transcribed dialogues from 250 adult speakers. The corpus was built with a "Wizard of Oz" (WoZ) method, in which a human operator simulates the system in a natural-language man-machine dialogue. The scenario is set in the domain of tourism and hotel reservation. The database follows the SpeechDat conventions and includes the following items:
• 1,258 recorded sessions, totalling 70 hours of speech. The signals are stored in stereo wave file format. Each of the two speech channels is sampled at 8 kHz with 16-bit quantization, stored as signed integers with the least significant byte first ("lohi" or Intel format).
• A manual transcription of each session in XML format. Label files were created with the free transcription tool Transcriber (TRS files).
• A phonetic lexicon containing all the words spoken in the database. Column 1 contains the orthography of the French word, column 2 the frequency of the word, and column 3 the pronunciation in SAMPA format. A sample entry of the lexicon: agitée 3 A/ Z i t e
• Documentation and statistics are also provided with the database.
The semantic annotation of the corpus is available in this catalogue under reference ELRA-E0024 (MEDIA Evaluation Package).
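The audio format described above (two channels, 8 kHz, signed 16-bit, least significant byte first) can be read with Python's standard `wave` and `struct` modules. This is a minimal sketch, assuming each session is a standard RIFF WAV file with interleaved stereo PCM frames; `split_channels` and `raw_audio_bytes` are hypothetical helpers, not part of the distribution. It also works through the size arithmetic: 8,000 samples/s × 2 bytes × 2 channels = 32,000 bytes/s, so 70 hours comes to about 8.06 GB of raw sample data, excluding file headers.

```python
import struct
import wave

# Parameters stated in the catalogue record.
SAMPLE_RATE_HZ = 8000    # per channel
SAMPLE_WIDTH_BYTES = 2   # 16-bit signed integers
N_CHANNELS = 2           # two speech channels (stereo)
BYTES_PER_SECOND = SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * N_CHANNELS  # 32,000


def split_channels(path):
    """Hypothetical helper: return (channel_1, channel_2) sample lists from a stereo WAV file."""
    with wave.open(path, "rb") as w:
        fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        if fmt != (N_CHANNELS, SAMPLE_WIDTH_BYTES, SAMPLE_RATE_HZ):
            raise ValueError(f"unexpected format (channels, width, rate) = {fmt}")
        raw = w.readframes(w.getnframes())
    # "<" = least significant byte first (Intel/"lohi"), "h" = signed 16-bit integer.
    samples = struct.unpack(f"<{len(raw) // SAMPLE_WIDTH_BYTES}h", raw)
    # Stereo frames are interleaved: ch1[0], ch2[0], ch1[1], ch2[1], ...
    return list(samples[0::2]), list(samples[1::2])


def raw_audio_bytes(hours):
    """Size of the raw sample data for a given duration, excluding WAV headers."""
    return int(hours * 3600 * BYTES_PER_SECOND)


print(raw_audio_bytes(70))  # 8064000000 bytes, about 8.06 GB
```

The format check is strict on purpose: a file that does not match the stated parameters raises an error instead of being silently misread.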
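Transcriber TRS files nest `Turn` elements inside `Section` and `Episode`, and mark time points inside a turn with empty `Sync` elements; the text that follows a `Sync` belongs to that time point. A minimal sketch using the standard `xml.etree.ElementTree` module, assuming that layout; `turn_segments` is a hypothetical helper, and the sample XML is invented for illustration, not taken from the corpus.

```python
import xml.etree.ElementTree as ET


def turn_segments(root):
    """Hypothetical helper: yield (speaker, start_time, text) for each Sync-delimited segment."""
    for turn in root.iter("Turn"):
        speaker = turn.get("speaker", "")
        start = float(turn.get("startTime", "0"))
        text = (turn.text or "").strip()
        for child in turn:
            if child.tag == "Sync":
                # A Sync point closes the previous segment and opens a new one.
                if text:
                    yield speaker, start, text
                start = float(child.get("time", start))
                text = ""
            # Text following any child element (Sync, Event, Comment, ...) is in its tail.
            text = f"{text} {(child.tail or '').strip()}".strip()
        if text:
            yield speaker, start, text


# Invented example in the Transcriber layout (not real corpus data).
SAMPLE = """<Trans><Episode><Section type="report" startTime="0" endTime="4.0">
<Turn speaker="spk1" startTime="0" endTime="4.0">
<Sync time="0"/>bonjour
<Sync time="1.5"/>je voudrais réserver une chambre
</Turn></Section></Episode></Trans>"""

for segment in turn_segments(ET.fromstring(SAMPLE)):
    print(segment)
```

For real files, `ET.parse(path).getroot()` reads the file as bytes and honours its declared encoding. Turns with overlapping speakers (e.g. `speaker="spk1 spk2"`) are not split per speaker in this sketch.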
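The three-column lexicon can be loaded into a dictionary keyed by orthography. This sketch assumes tab-separated columns, falling back to whitespace when no tab is present, and treats column 3 as space-separated SAMPA symbols, as in the sample entry. The file encoding is not stated in the record, so open the file with whatever encoding the documentation specifies. `LexEntry`, `parse_lexicon_line` and `load_lexicon` are hypothetical names.

```python
from typing import Dict, Iterable, List, NamedTuple


class LexEntry(NamedTuple):
    word: str            # column 1: orthography of the French word
    frequency: int       # column 2: frequency of the word
    phonemes: List[str]  # column 3: pronunciation as SAMPA symbols


def parse_lexicon_line(line):
    """Parse one lexicon line; assumes tab-separated columns, else whitespace."""
    line = line.rstrip("\r\n")
    if "\t" in line:
        word, freq, pron = line.split("\t", 2)
    else:
        # Fallback: first two whitespace-separated fields, remainder is the pronunciation.
        word, freq, pron = line.split(None, 2)
    # SAMPA symbols can be longer than one character (e.g. "A/"), so split on spaces.
    return LexEntry(word, int(freq), pron.split())


def load_lexicon(lines: Iterable[str]) -> Dict[str, LexEntry]:
    """Index entries by orthography, skipping blank lines."""
    return {e.word: e for e in map(parse_lexicon_line, filter(str.strip, lines))}


entry = parse_lexicon_line("agitée 3 A/ Z i t e")
print(entry.word, entry.frequency, entry.phonemes)  # agitée 3 ['A/', 'Z', 'i', 't', 'e']
```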
Identifier:ELRA-S0272
http://catalog.elra.info/product_info.php?products_id=1057
Language:French
Language (ISO639):fra
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0272
DateStamp:  2008-03-27
GetRecord:  OAI-PMH request for simple DC format
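The GetRecord entries above are OAI-PMH requests. An OAI-PMH GetRecord request passes `verb=GetRecord`, the record's `identifier`, and a `metadataPrefix`: `olac` for the OLAC format and `oai_dc` for simple Dublin Core. The sketch below only builds the request URL; `BASE_URL` is a hypothetical placeholder, since the record does not give the repository's endpoint, and `get_record_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not state the ELRA OAI-PMH endpoint.
BASE_URL = "https://example.org/oai"
IDENTIFIER = "oai:catalogue.elra.info:ELRA-S0272"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (arguments are percent-encoded)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


print(get_record_url(BASE_URL, IDENTIFIER, "olac"))    # OLAC format
print(get_record_url(BASE_URL, IDENTIFIER, "oai_dc"))  # simple Dublin Core
```

The colons in the identifier are encoded as `%3A`, which OAI-PMH repositories decode back to the original identifier.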

Search Info

Citation: n.a. 2008. ELRA (European Language Resources Association).
Terms: area_Europe country_FR dcmi_Sound iso639_fra olac_primary_text
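The search terms appear to be facet values of the form `prefix_value`, several of which mirror fields above (for example `iso639_fra` matches Language (ISO639) and `dcmi_Sound` matches Type (DCMI)). A small sketch that groups them by prefix, splitting at the first underscore only so that values such as `primary_text` stay intact; `parse_terms` is a hypothetical helper.

```python
def parse_terms(terms):
    """Group space-separated 'prefix_value' search terms by prefix."""
    facets = {}
    for term in terms.split():
        # Split at the first underscore only: "olac_primary_text" -> ("olac", "primary_text").
        prefix, _, value = term.partition("_")
        facets.setdefault(prefix, []).append(value)
    return facets


print(parse_terms("area_Europe country_FR dcmi_Sound iso639_fra olac_primary_text"))
```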


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0272
Up-to-date as of: Mon Feb 27 0:31:27 EST 2017