Abstract:MICROAES is a Spanish microphone database comprising recordings from 300 different speakers (30 hours of speech in total). The recorded corpus consists of 450 paragraphs, read in a quiet environment. The database includes an orthographic and lexical transcription, with a few details that represent audible acoustic events (speech and non-speech) present in the corresponding waveform files. The lexicon has more than 7400 words with the corresponding pronunciation information in SAMPA.
Access Rights:Rights available for: Commercial Use, Research Use
Date Available (W3CDTF):2004-08-30
Date Issued (W3CDTF):2004-05-12
Date Modified (W3CDTF):2004-05-12
The ATLAS Spanish Microphone Database (MICROAES) was collected in Spain by Applied Technologies on Language and Speech, S.L. (ATLAS). It comprises microphone recordings from 300 different speakers selected from five dialectal areas; sex and age distribution were also considered in speaker selection. The corpus has 30 sets of 15 paragraphs, giving a total of 450 paragraphs. Each 15-paragraph set contains at least two allophones from the extended SAMPA symbols. For this purpose, coarticulation effects between words were considered.

The recording platform is based on a laptop using a PCMCIA slot as the interface to the audio equipment. Up to four microphones are recorded simultaneously:
* Sennheiser ME 104 (close distance)
* Nokia Lavalier HDC-6D (close distance)
* Sennheiser ME 64 (medium distance)
* Haun MBNM-550 E-L (far distance)

All recordings were made in an office, with no discussion or meeting taking place during the recordings. The signals are stored in a raw file format, i.e. without headers in the signal file. Each of the four speech channels is recorded at 16 kHz with 16-bit quantization. The sample rate, quantization, and byte order are described in the SAM label file that corresponds to each speech file. This label file also contains the signal quality value of the speech file.

The transcription included in this database is orthographic and lexical, with a few details that represent audible acoustic events (speech and non-speech) present in the corresponding waveform files. It includes segment markers that divide each paragraph into portions of less than 10 seconds at speaker pauses. The lexicon file has more than 7400 words with the corresponding pronunciation information in SAMPA phonemic notation.

The database contains 30 hours of speech and is distributed on 30 ISO 9660 CD-ROM volumes or 5 ISO 9660 DVD-ROM volumes.
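
Because the signal files carry no header, a reader has to supply the format parameters itself. The following is a minimal Python sketch of decoding such data, using the 16 kHz rate and 16-bit quantization stated in the description. Several points are assumptions rather than facts from this record: that samples are signed integers, that each file holds a single channel, and that the byte order is passed in by the caller after being looked up in the matching SAM label file. The function names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # stated in the description
BYTES_PER_SAMPLE = 2     # 16-bit quantization


def decode_raw_pcm(data, byte_order="little"):
    """Decode headerless 16-bit PCM bytes into a list of integer samples.

    Assumptions (not confirmed by the record): samples are signed,
    the data holds one channel, and byte_order comes from the SAM label file.
    """
    if byte_order not in ("little", "big"):
        raise ValueError("byte_order must be 'little' or 'big'")
    if len(data) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of the sample size")
    count = len(data) // BYTES_PER_SAMPLE
    prefix = "<" if byte_order == "little" else ">"
    # 'h' is a signed 16-bit integer in the struct module
    return list(struct.unpack(f"{prefix}{count}h", data))


def duration_seconds(num_samples):
    """Duration of a single-channel signal at the stated 16 kHz rate."""
    return num_samples / SAMPLE_RATE_HZ
```

A file would be read with `open(path, "rb").read()` and passed to `decode_raw_pcm`. Checking that `duration_seconds` gives less than 10 seconds for a transcription segment is one way to sanity-check the chosen parameters.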
Language:Spanish, Castilian
Language (ISO639):spa
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text


OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0165
DateStamp:  2004-08-30
