OLAC Record
oai:catalogue.elra.info:ELRA-S0145

Metadata
Title:Mandarin-5000 database
Abstract:This speech database contains the recordings of 4,752 speakers of Mandarin as first or second language recorded over the fixed and mobile telephone networks in all provinces of mainland China, including Hong Kong. Each speaker uttered around 54 read and spontaneous items.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2003-06-16
Date Issued (W3CDTF):2004-09-14
Date Modified (W3CDTF):2006-03-13
Description:Telephone
The MANDARIN-5000 database contains the recordings of 4,752 speakers (2,383 males, 2,369 females) of Mandarin as first or second language (3,222 native speakers) recorded over the fixed and mobile telephone networks in all provinces of mainland China, including Hong Kong (fixed network: 513 speakers on cordless handsets and 3,558 speakers on POT (plain old telephone) sets; mobile network: 491 speakers; undetermined (cordless or mobile): 190 speakers).
The database design closely follows the SpeechDat(II) conventions, in particular with respect to the content of the database. The database consists of 1 CD containing all documentation files, including the phonetic lexicon, and 3 DVD-Rs containing the data, i.e. speech files and corresponding transcription files.
Speech samples are stored as sequences of 8-bit 8 kHz A-law, uncompressed. Each prompted utterance is stored in a separate file, and each signal file is accompanied by a transcription file encoded in GB-2312 and ASCII, which contains the orthographic representation (i.e. pictograms) and a phonemic transcription in Pinyin with tones and word boundaries.
Each speaker uttered the following 54 items:
- 6 isolated application words (25 fixed, 5 free)
- 1 additional application command with a parameter (e.g. name dialling)
- 1 sequence of 10 isolated digits (balanced)
- 6 digit strings (in total balanced for digits, letters, dashes and their transitions)
- 3 dates, 1 of them spontaneous
- 2 word spotting phrases using an application word
- 2 handset information items ("mobile phone?", "cordless phone?")
- 2 isolated digits
- 2 spelled words (letter sequences)
- 1 currency money amount
- 1 natural plain number (balanced for words and transitions)
- 1 natural number with measure word
- 8 names (persons, spelling, cities, companies), 3 of them spontaneous
- 1 spontaneous train schedule request (origin, destination, date, time)
- 1 spontaneous correction
- 1 spontaneous answer to a question for time
- 1 spontaneous answer to a question for time or day
- 4 spontaneous answers to questions, including fuzzy yes/no
- 8 sentences: for training, phonetically rich sentences (read newspaper text); alternatively, for test, sentences dictated from a newspaper article
- 1 time of day (spontaneous)
- 1 time phrase (read)
The following age distribution has been obtained: 239 speakers are under 16, 2,391 are between 16 and 30, 1,449 are between 31 and 45, 601 are between 46 and 60, and 32 speakers are over 60. (The age of 40 speakers was not determined.)
A pronunciation lexicon with orthographic representation (i.e. pictograms), phonemic transcription in Pinyin with tones, and frequency of occurrence is also included.
Identifier:ELRA-S0145
http://catalog.elra.info/product_info.php?products_id=15
Language:Chinese
Language (ISO639):zho
Medium:CD-ROM
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text
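
The description states that speech samples are stored as 8-bit, 8 kHz A-law. As a rough illustration of what that encoding means, the sketch below expands A-law bytes to 16-bit linear PCM using the standard G.711 expansion. It assumes the signal files are raw and headerless, which the record does not confirm; the function names and the WAV helper are hypothetical and not part of the database documentation.

```python
import struct
import wave


def alaw_byte_to_pcm16(code: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    code ^= 0x55                      # undo the even-bit inversion applied by the encoder
    magnitude = (code & 0x0F) << 4    # 4-bit mantissa
    segment = (code & 0x70) >> 4      # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        magnitude <<= segment - 1
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list[int]:
    """Decode a raw A-law byte string (assumed headerless) to PCM samples."""
    return [alaw_byte_to_pcm16(b) for b in data]


def write_wav(path: str, samples: list[int], rate: int = 8000) -> None:
    """Write mono 16-bit PCM samples to a WAV file at the given sample rate."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

If the files do carry a header, it would have to be skipped before decoding; the documentation CD is the place to check.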

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0145
DateStamp:  2003-06-16
GetRecord:  OAI-PMH request for simple DC format
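
The GetRecord entries above refer to OAI-PMH requests for this record. A minimal sketch of how such a request URL is formed is given below. The verb and the identifier and metadataPrefix arguments come from the OAI-PMH protocol; the base URL is a placeholder, since this record does not state the repository endpoint, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not given in this record.
BASE_URL = "http://example.org/oai"


def build_getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,  # "oai_dc" for simple DC, "olac" for OLAC format
    })
    return base_url + "?" + query


if __name__ == "__main__":
    print(build_getrecord_url(BASE_URL, "oai:catalogue.elra.info:ELRA-S0145", "oai_dc"))
```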

Search Info

Citation: n.a. 2004. ELRA (European Language Resources Association).
Terms: dcmi_Sound iso639_zho olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0145
Up-to-date as of: Sun Jan 21 1:34:02 EST 2018