OLAC Record
oai:www.ldc.upenn.edu:LDC2006S35

Metadata
Title:CSLU: Multilanguage Telephone Speech Version 1.2
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Muthusamy, Yeshwant, Ronald Cole, and Beatrice Oshika. CSLU: Multilanguage Telephone Speech Version 1.2 LDC2006S35. Web Download. Philadelphia: Linguistic Data Consortium, 2006
Contributor:Muthusamy, Yeshwant
Cole, Ronald Allan
Oshika, Beatrice
Date (W3CDTF):2006
Date Issued (W3CDTF):2006-06-15
Description:*Introduction* CSLU: Multilanguage Telephone Speech Version 1.2 was developed by the Center for Spoken Language Understanding (CSLU) and consists of approximately 38.5 hours of telephone speech, about eight hours of which have time-aligned phonetic transcripts, from 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. The corpus contains fixed-vocabulary utterances (e.g. days of the week) as well as fluent continuous speech. The current release includes recorded utterances from about 2,052 speakers, for a total of 12,152 speech files and 619 phonetic transcripts. This corpus was collected and developed in 1992.
*Data* Each subject called the CSLU data collection system by dialing a toll-free number. Most subjects were respondents to postings on Usenet newsgroups. Subjects were asked to contribute their voice to science to help with the research. Participating subjects responded to prompts that were designed to elicit vocabulary of three types: fixed and useful (language spoken, days of the week, numbers); domain-specific (short open-ended questions); and unrestricted (monologue on a subject of choice). An analog telephone line was connected to a Gradient Technologies box, which recorded the data from incoming calls. The sampling rate was 8 kHz and the files were stored in 16-bit linear format on a UNIX file system. Each utterance was recorded as a separate file.
*Samples* For an example of the data in this corpus, please listen to these audio samples in Tamil (WAV) and English (WAV).
*Updates* None at this time.
Extent:Corpus size: 2202009 KB
Format:Sampling Rate: 8000
Sampling Format: pcm
Identifier:LDC2006S35
https://catalog.ldc.upenn.edu/LDC2006S35
ISBN: 1-58563-390-9
ISLRN: 871-936-811-171-7
DOI: 10.35111/j0p6-f049
Language:Vietnamese
Tamil
Spanish
Iranian Persian
Korean
Japanese
Hindi
French
English
German
Mandarin Chinese
Language (ISO639):vie
tam
spa
pes
kor
jpn
hin
fra
eng
deu
cmn
License:CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2006S35
Rights Holder:Portions © 1992, 2000, 2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2006 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Text
Type (OLAC):primary_text
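
As a rough consistency check on the Description, Extent, and Format fields above, the sketch below estimates the raw audio size implied by 8 kHz, 16-bit linear PCM over about 38.5 hours. It assumes single-channel audio, no file headers, and that the rounded 38.5-hour figure is exact; none of these is stated in the record, and the function name is illustrative only.

```python
def raw_pcm_bytes(hours: float,
                  sample_rate_hz: int = 8000,
                  bits_per_sample: int = 16,
                  channels: int = 1) -> int:
    """Estimate uncompressed PCM size in bytes.

    Assumptions (not stated in the record): mono audio, no per-file
    headers, and the approximate duration treated as exact.
    """
    seconds = hours * 3600
    return round(seconds * sample_rate_hz * channels * bits_per_sample / 8)


estimate = raw_pcm_bytes(38.5)   # duration taken from the Description field
listed_kb = 2202009              # value of the Extent field

print(f"estimated raw audio: {estimate / 1000:.0f} kB (decimal), "
      f"{estimate / 1024:.0f} KiB (binary)")
print(f"listed corpus size:  {listed_kb} KB")
```

The estimate lands within a few percent of the listed extent under either reading of "KB". An exact match is not expected, since the duration is approximate and the distribution also contains transcripts and documentation.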

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2006S35
DateStamp:  2021-05-10
GetRecord:  OAI-PMH request for simple DC format
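
The GetRecord entries above refer to OAI-PMH requests for this record. A minimal sketch of how such a request URL is typically assembled follows. The base URL is a hypothetical placeholder, since this record does not state the repository endpoint, and the metadata prefixes `olac` and `oai_dc` are assumed to correspond to the "OLAC format" and "simple DC format" links.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real repository base URL is not given in this record.
BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2006S35"  # from the OaiIdentifier field


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,  # assumed: "olac" or "oai_dc"
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

Substituting the actual endpoint for `BASE_URL` should yield a request that returns the record as XML in the chosen format.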

Search Info

Citation: Muthusamy, Yeshwant; Cole, Ronald Allan; Oshika, Beatrice. 2006. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_CN country_DE country_ES country_FR country_GB country_IN country_IR country_JP country_KR country_VN dcmi_Sound dcmi_Text iso639_cmn iso639_deu iso639_eng iso639_fra iso639_hin iso639_jpn iso639_kor iso639_pes iso639_spa iso639_tam iso639_vie olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2006S35
Up-to-date as of: Mon Sep 27 7:49:58 EDT 2021