OLAC Record
oai:catalogue.elra.info:ELRA-S0198

Metadata
Title:GlobalPhone German
Abstract:The GlobalPhone corpus was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, multilingual speech and text database for language-independent and language-adaptive speech recognition as well as for language identification tasks. The entire GlobalPhone corpus enables the acquisition of acoustic-phonetic knowledge of the following 20 spoken languages: Arabic, Bulgarian, Chinese-Mandarin, Chinese-Shanghai, Croatian, Czech, French, German, Hausa, Japanese, Korean, Polish, Portuguese (Brazilian), Russian, Spanish (Latin America), Swedish, Tamil, Thai, Turkish, Vietnamese. In each language, about 100 sentences were read by each of the 100 speakers. The read texts were selected from national newspapers available on the Internet to provide a large vocabulary (up to 65,000 words). The read articles cover national and international political news as well as economic news.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2006-01-30
Date Issued (W3CDTF):2006-01-27
Date Modified (W3CDTF):2017-06-26
Description:Desktop/Microphone
The GlobalPhone corpus, developed in collaboration with the Karlsruhe Institute of Technology (KIT), was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, multilingual speech and text database for language-independent and language-adaptive speech recognition as well as for language identification tasks. The entire GlobalPhone corpus enables the acquisition of acoustic-phonetic knowledge of the following 22 spoken languages: German (ELRA-S0198), Arabic (Modern Standard Arabic) (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), Chinese-Shanghai (ELRA-S0194), Korean (ELRA-S0200), Croatian (ELRA-S0195), Spanish (Latin America) (ELRA-S0203), French (ELRA-S0197), Hausa (ELRA-S0347), Japanese (ELRA-S0199), Polish (ELRA-S0320), Portuguese (Brazilian) (ELRA-S0201), Russian (ELRA-S0202), Swedish (ELRA-S0204), Swahili (ELRA-S0375), Tamil (ELRA-S0205), Thai (ELRA-S0321), Czech (ELRA-S0196), Turkish (ELRA-S0206), Ukrainian (ELRA-S0377) and Vietnamese (ELRA-S0322). In each language, about 100 sentences were read by each of the 100 speakers. The read texts were taken from national newspaper articles available on the Internet to provide a large vocabulary. The read articles cover national and international political news as well as economic news. The speech data were recorded at 16 bit, 16 kHz (mono) with a headset microphone (Sennheiser 440-6). The transcriptions were validated internally and annotated with special tags marking spontaneous effects, such as stuttering and false starts, and non-verbal effects, such as laughter and hesitations. The database also contains information about the speakers, such as age, gender, and occupation, as well as information about the recording setup.
The entire GlobalPhone corpus comprises more than 450 hours of speech recorded by more than 2100 native adult speakers. The data are compressed with the shorten program written by Tony Robinson. Alternatively, the data can be delivered uncompressed. The German corpus was produced using the Frankfurter Allgemeine and Sueddeutsche Zeitung newspapers. It contains recordings of 77 speakers (70 male, 7 female) recorded in Karlsruhe, Germany. No age distribution is available.
Identifier:ELRA-S0198
http://catalog.elra.info/product_info.php?products_id=822
Language:German
Language (ISO639):deu
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0198
DateStamp:  2006-01-30
GetRecord:  OAI-PMH request for simple DC format
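
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, the following Python snippet shows how such a request URL could be assembled from the OaiIdentifier. The base URL is a placeholder, since this record does not state the archive's OAI-PMH endpoint, and the metadata prefixes "olac" and "oai_dc" are assumed to correspond to the OLAC and simple DC formats named here. The helper names get_record_url and parse_request are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (assumption): the record does not give the real base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix="oai_dc", base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record identifier."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,  # assumed: "olac" or "oai_dc"
    })
    return f"{base_url}?{query}"


def parse_request(url):
    """Decode the query string of a request URL back into a flat dict."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    # Identifier taken from the OaiIdentifier field of this record.
    print(get_record_url("oai:catalogue.elra.info:ELRA-S0198", "olac"))
```

The identifier is percent-encoded by urlencode, so the colons in it are escaped in the resulting URL; parse_request reverses that encoding for inspection.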

Search Info

Citation: n.a. 2006. ELRA (European Language Resources Association).
Terms: area_Europe country_DE dcmi_Sound iso639_deu olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0198
Up-to-date as of: Thu Aug 30 1:36:19 EDT 2018