OLAC Record
oai:catalogue.elra.info:ELRA-S0371

Metadata
Title:PortMedia French and Italian corpus
Abstract:This corpus contains 700 transcribed dialogues from about 140 French speakers and 604 transcribed dialogues from about 150 Italian speakers (several dialogues per speaker). The method chosen for the corpus construction process is that of a "Wizard of Oz" (WoZ) system. This consists of simulating a natural language man-machine dialogue. The scenario was built in the domain of touristic information and reservation. A manual transcription and semantic annotation of the corpus are provided with corresponding wave files.
Access Rights:Rights available for: Research Use, Evaluation Use, Commercial Use
Date Available (W3CDTF):2014-07-23
Date Issued (W3CDTF):2014-07-23
Date Modified (W3CDTF):2014-07-23
Description:Telephone
The PortMedia French and Italian corpus was produced by ELDA, with the same paradigm and specifications as the MEDIA speech database (ELRA-S0272) but on a different domain. The method chosen for the corpus construction process is that of a "Wizard of Oz" (WoZ) system. This consists of simulating a natural language man-machine dialogue. The scenario was built in the domain of touristic information and reservation (ticket reservation within the 2010 Festival d'Avignon for French and hotel reservation for Italian). The corpus contains 700 transcribed dialogues from about 140 French speakers and 604 transcribed dialogues from about 150 Italian speakers (several dialogues per speaker). The database is formatted following the SpeechDat conventions and it includes the following items:
- 700 recorded sessions for French and 604 sessions for Italian. The signals are stored in a stereo wave file format. Each of the two speech channels is recorded at 8 kHz with 16-bit quantization with the least significant byte first ("lohi" or Intel format) as signed integers.
- Manual transcription of each session in HTML format. Label files were created with the free transcription tool Transcriber (TRS files).
- A manual semantic annotation of the corpus. It has been produced with Semantizer, which is also provided with the data.
Identifier:ELRA-S0371
http://catalog.elra.info/product_info.php?products_id=1224
Language:French
Italian
Language (ISO639):fra
ita
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text
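
The signal format given in the Description (stereo wave files, two channels at 8 kHz, 16-bit signed integers, least significant byte first) can be read with Python's standard library. The following is a minimal sketch, not part of the ELRA distribution: the function name `split_stereo_channels` is hypothetical, and the record does not say which channel carries which side of the dialogue, so the two channels are simply returned in file order.

```python
import array
import sys
import wave


def split_stereo_channels(wav_source):
    """Return (sample_rate, first_channel, second_channel) for a 16-bit stereo WAV.

    wav_source may be a file path or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        # Check the layout described in the record: 2 channels, 2 bytes per sample.
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # WAV sample data is little-endian ("lohi"); "h" is a signed 16-bit integer.
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()

    # Frames are interleaved: first channel, second channel, first channel, ...
    return rate, samples[0::2], samples[1::2]
```

For a file matching the description, the returned sample rate should be 8000, and the two returned arrays have one entry per frame.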

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0371
DateStamp:  2014-07-23
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2014. ELRA (European Language Resources Association).
Terms: area_Europe country_FR country_IT dcmi_Sound iso639_fra iso639_ita olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0371
Up-to-date as of: Fri Jun 23 1:06:31 EDT 2017