OLAC Record: Siemens Synthesis Corpus

OLAC Record
oai:catalogue.elra.info:ELRA-S0082

Metadata

Title: Siemens Synthesis Corpus - SI1000P

Access Rights: Rights available for: nonCommercialUse, commercialUse

Date Available (W3CDTF): 2000-04-06

Date Issued (W3CDTF): 2000-04-06

Date Modified (W3CDTF): 2005-05-09

Description: The SI1000P recordings were made to provide material for high-quality concatenative speech synthesis. The corpus contains 1000 newspaper sentences read by two professional German broadcast announcers in studio quality, together with the laryngographic signal and the glottal pulse stream. Parts of the corpus were labelled and segmented phonemically (SAM-PA) and prosodically (boundaries + accents).

Both speakers are trained and experienced broadcast announcers at the local state broadcasting unit. They were asked to read the texts in a broadcast-announcing style: very correct, but fluent and without pausing between words. The recordings were made in a fully echo-cancelling studio at the Institute of Phonetics at the University of Munich.

Recording channels were:
- speech signal, recorded with a Sennheiser MKH20 omnidirectional microphone, 30 cm from the mouth
- laryngograph signal, LxProc of Laryngograph Ltd., London
- glottal pulse stream from the laryngograph
- start/stop pulse at the beginning and end of each utterance

The recording machine was a high-quality 4-channel DAT (48 kHz, 16 bit). The data were copied to hard disk and cut into separate utterances (one utterance per file) according to the pulse information in the fourth channel. Speech signals were filtered and downsampled from 48 kHz to 16 kHz. Laryngograph signals were filtered and downsampled to 16 kHz. The format of the signal files is PhonDat 2.

The resulting segmentation and all information accompanying the signal are summarized in the corresponding Partitur File. The Partitur File format is an open structure that allows easy description and processing of information aligned to a speech signal.

The database also provides an ordered list of all occurring words together with their standard pronunciation in SAM-PA, and the orthography of all spoken utterances in the corpus.
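The cutting and downsampling steps mentioned in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used to produce SI1000P: the function names, the pulse threshold, the assumption that pulses strictly alternate between start and stop, and the Hamming-windowed sinc filter with 63 taps are all hypothetical choices made for the example.

```python
import math


def cut_by_pulses(marker, threshold=0.5):
    """Return (start, stop) sample index pairs from a start/stop pulse channel.

    Assumption: each pulse is a run of samples above `threshold`, and pulses
    alternate start, stop, start, stop, ...
    """
    onsets = []
    above = False
    for i, value in enumerate(marker):
        if value > threshold and not above:
            onsets.append(i)  # rising edge marks the beginning of a pulse
        above = value > threshold
    # Pair consecutive pulse onsets as (start, stop)
    return list(zip(onsets[0::2], onsets[1::2]))


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass FIR filter with unity DC gain.

    `cutoff` is given in cycles per input sample (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def downsample(signal, factor=3, num_taps=63):
    """Low-pass filter, then keep every `factor`-th sample.

    With factor=3 this maps a 48 kHz signal to 16 kHz.
    """
    # Cutoff slightly below the new Nyquist frequency (0.5 / factor)
    taps = lowpass_taps(num_taps, 0.45 / factor)
    half = num_taps // 2
    out = []
    for center in range(0, len(signal), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = center + k - half
            if 0 <= j < len(signal):  # samples outside the signal count as zero
                acc += tap * signal[j]
        out.append(acc)
    return out
```

In such a sketch, the index pairs returned by `cut_by_pulses` for the fourth channel would be used to slice the other three channels into one segment per utterance, and `downsample(segment, factor=3)` would then be applied to each segment. The filtering step matters because dropping samples without it would fold energy above 8 kHz back into the 16 kHz signal.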

Identifier: ELRA-S0082

ISLRN: 389-408-959-715-2

Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-S0082/

Language: German

Language (ISO639): deu

Medium: Not specified

Publisher: ELRA (European Language Resources Association)

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: ELRA Catalogue of Language Resources

Description: http://www.language-archives.org/archive/catalogue.elra.info

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-S0082

DateStamp: 2000-04-06

GetRecord: OAI-PMH request for simple DC format
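
For illustration, an OAI-PMH GetRecord request for this record could be assembled as sketched below. The `verb`, `identifier` and `metadataPrefix` parameters and the `oai_dc` and `olac` prefixes come from the OAI-PMH and OLAC conventions; the base URL is a placeholder, since this record does not state the repository endpoint, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the record does not give the
# repository's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,  # "oai_dc" = simple DC, "olac" = OLAC
    })
    return f"{base_url}?{query}"


simple_dc_url = get_record_url("oai:catalogue.elra.info:ELRA-S0082", "oai_dc")
olac_url = get_record_url("oai:catalogue.elra.info:ELRA-S0082", "olac")
```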

Search Info
Citation: n.a. 2000. ELRA (European Language Resources Association).
Terms: area_Europe country_DE dcmi_Sound iso639_deu olac_primary_text

http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0082
Up-to-date as of: Wed Oct 1 0:54:49 EDT 2025
