OLAC Record
oai:catalogue.elra.info:ELRA-S0253

Metadata
Title:TC-STAR English Test Corpora for ASR
Abstract:This corpus consists of 70 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English and other European languages. From this corpus, 16 hours of English speeches (native or non-native) were annotated (transcribed). Each speech file contains a single channel with 16-bit resolution at a sample rate of 16 kHz. The transcription files are stored in Transcriber XML file format.
Access Rights:Rights available for: Commercial Use, Research Use
Date Available (W3CDTF):2007-11-15
Date Issued (W3CDTF):2007-11-15
Date Modified (W3CDTF):2007-11-15
Description:Desktop/Microphone
TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text-to-Speech Synthesis (TTS). This corpus consists of 70 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English and other European languages. From this corpus, 16 hours of English speeches (native or non-native) were annotated (transcribed). Transcriptions are included in the present package. The data comprises the test (development and evaluation) data for the TC-STAR project in the years 2005, 2006, and 2007. The recordings were obtained from Europe by Satellite (http://europa.eu.it/comm/ebs) from October to November 2004, June to November 2005, and June to July 2006. The transcription files are stored in Transcriber XML file format. The speech signals were submitted by EbS via the Internet in Real Media format and via satellite in MPEG1-layer2 format. The signals were decoded, resampled, and stored in WAVE RIFF (Resource Interchange File Format) files. Each file contains a single channel with 16-bit resolution at a sample rate of 16 kHz. The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.
Identifier:ELRA-S0253
http://catalog.elra.info/product_info.php?products_id=1037
Language:English
Language (ISO639):eng
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text
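
The audio format stated in the Abstract and Description (WAVE RIFF, single channel, 16-bit, 16 kHz) can be checked against a file with Python's standard `wave` module. The following is a minimal sketch, not part of the ELRA distribution: the function names and the `EXPECTED` table are hypothetical, and the size estimate assumes uncompressed PCM with header bytes ignored.

```python
import wave

# Format stated in the record: mono, 16-bit (2 bytes per sample), 16 kHz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "frame_rate_hz": 16000}


def describe_wav(path):
    """Read the header fields of a WAVE file (hypothetical helper)."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "frame_rate_hz": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }


def matches_record(info):
    """True if the header fields agree with the format stated in the record."""
    return all(info[key] == value for key, value in EXPECTED.items())


def pcm_bytes(hours, frame_rate_hz=16000, sample_width_bytes=2, channels=1):
    """Estimate raw PCM size, assuming uncompressed audio and no headers."""
    return int(hours * 3600 * frame_rate_hz * sample_width_bytes * channels)
```

Under these assumptions, `pcm_bytes(16)` gives 1,843,200,000 bytes (about 1.8 GB) for the 16 transcribed hours; the actual package size may differ.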

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0253
DateStamp:  2007-11-15
GetRecord:  OAI-PMH request for simple DC format
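
The GetRecord entries above refer to OAI-PMH requests whose URLs are not preserved in this record. As a rough sketch, such a request combines the `verb`, `identifier`, and `metadataPrefix` query parameters defined by the OAI-PMH protocol. The base URL below is a placeholder and not the archive's real endpoint, `get_record_url` is a hypothetical helper, and the `olac` prefix for the OLAC format is an assumption (`oai_dc` is the protocol's standard prefix for simple DC).

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real base URL is not given in this record.
BASE_URL = "https://example.org/oai"

IDENTIFIER = "oai:catalogue.elra.info:ELRA-S0253"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Simple Dublin Core request ("oai_dc" is the standard OAI-PMH prefix).
dc_url = get_record_url(BASE_URL, IDENTIFIER, "oai_dc")

# OLAC-format request (the "olac" prefix is assumed here).
olac_url = get_record_url(BASE_URL, IDENTIFIER, "olac")
```

The identifier's colons are percent-encoded by `urlencode`, so the resulting URL is safe to pass to an HTTP client as is.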

Search Info

Citation: n.a. 2007. ELRA (European Language Resources Association).
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0253
Up-to-date as of: Fri Jun 23 1:05:56 EDT 2017