OLAC Record
oai:www.ldc.upenn.edu:LDC2005S26

Metadata
Title:CSLU: 22 Languages Corpus
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Lander, T. CSLU: 22 Languages Corpus LDC2005S26. Web Download. Philadelphia: Linguistic Data Consortium, 2005
Contributor:Lander, T
Date (W3CDTF):2005
Date Issued (W3CDTF):2005-11-29
Description:*Introduction* This file contains documentation on the CSLU: 22 Languages v 1.2, Linguistic Data Consortium (LDC) catalog number LDC2005S26 and ISBN 1-58563-361-5. Produced by the Center for Spoken Language Understanding and distributed by the Linguistic Data Consortium, the 22 Languages corpus consists of telephone speech from 21 languages: Eastern Arabic, Cantonese, Czech, Farsi, German, Hindi, Hungarian, Japanese, Korean, Malay, Mandarin, Italian, Polish, Portuguese, Russian, Spanish, Swedish, Swahili, Tamil, Vietnamese, and English. The corpus contains fixed-vocabulary utterances (e.g. days of the week) as well as fluent continuous speech. Each of the 50,191 utterances was verified by a native speaker to determine whether the caller followed instructions when answering the prompts. For this release, approximately 19,758 utterances have corresponding orthographic transcriptions in all of the above languages except Eastern Arabic, Farsi, Korean, Russian, and Italian. *Samples* For an example of this corpus, please listen to these Arabic and English audio samples. *Updates and Contact* Questions regarding this corpus and about the Center for Spoken Language Understanding should be directed to Jan van Santen.
Format:Sampling Rate: 8000
Sampling Format: ulaw
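
The Format field above states an 8000 Hz sampling rate and u-law sample encoding. As a rough illustration of what that implies for a consumer of the audio, the sketch below expands u-law bytes to 16-bit linear PCM using the standard G.711 expansion. It assumes raw, headerless, one-byte-per-sample data; the record does not state the container format, so real corpus files may carry a header that would need to be skipped first. The function names are hypothetical and not part of the corpus or of any LDC tool.

```python
SAMPLE_RATE_HZ = 8000  # taken from the Format field of this record
BIAS = 0x84            # standard G.711 u-law bias (132)


def ulaw_byte_to_pcm16(u: int) -> int:
    """Expand one G.711 u-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                  # u-law bytes are stored bit-inverted
    sign = u & 0x80                # top bit: set means negative
    exponent = (u >> 4) & 0x07     # 3-bit segment number
    mantissa = u & 0x0F            # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode raw u-law bytes (assumed headerless) to linear samples."""
    return [ulaw_byte_to_pcm16(b) for b in data]


def duration_seconds(data: bytes, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of raw u-law audio, at one byte per sample."""
    return len(data) / sample_rate_hz
```

Under these assumptions, one second of audio occupies 8000 bytes, so `duration_seconds` is simply the byte count divided by the sampling rate.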
Identifier:LDC2005S26
https://catalog.ldc.upenn.edu/LDC2005S26
ISBN: 1-58563-356-9
Language:Yue Chinese
Vietnamese
Tamil
Swahili (individual language)
Swedish
Russian
Portuguese
Polish
Korean
Japanese
Indonesian
Hindi
English
German
Arabic
Swahili (macrolanguage)
Congo Swahili
Spanish
Mandarin Chinese
Italian
Hungarian
Persian
Dari
Iranian Persian
Czech
Language (ISO639):yue
vie
tam
swh
swe
rus
por
pol
kor
jpn
ind
hin
eng
deu
ara
swa
swc
spa
cmn
ita
hun
fas
prs
pes
ces
License:CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2005S26
Rights Holder:Portions © 1998-2002 Center for Spoken Language Understanding Oregon Health & Science University, © 2005 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2005S26
DateStamp:  2017-08-17
GetRecord:  OAI-PMH request for simple DC format
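
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC formats. A minimal sketch of how such a request URL is typically assembled follows. The base URL is a hypothetical placeholder, since this record does not give the repository endpoint. `oai_dc` is the metadata prefix that OAI-PMH defines for simple Dublin Core; `olac` is assumed here to be the prefix for OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the real OAI-PMH base URL of the repository.
BASE_URL = "https://example.org/oai"

# Identifier copied from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2005S26"


def build_getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# One request per format listed in this record.
dc_url = build_getrecord_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
olac_url = build_getrecord_url(BASE_URL, OAI_IDENTIFIER, "olac")  # prefix is an assumption
```

Fetching either URL with any HTTP client should return an XML response wrapping the record's metadata, provided the endpoint and prefix are valid for the target repository.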

Search Info

Citation: Lander, T. 2005. CSLU: 22 Languages Corpus. Linguistic Data Consortium.
Terms: area_Africa area_Asia area_Europe country_AF country_CD country_CN country_CZ country_DE country_ES country_GB country_HU country_ID country_IN country_IR country_IT country_JP country_KR country_PL country_PT country_RU country_SE country_TZ country_VN dcmi_Sound iso639_ara iso639_ces iso639_cmn iso639_deu iso639_eng iso639_fas iso639_hin iso639_hun iso639_ind iso639_ita iso639_jpn iso639_kor iso639_pes iso639_pol iso639_por iso639_prs iso639_rus iso639_spa iso639_swa iso639_swc iso639_swe iso639_swh iso639_tam iso639_vie iso639_yue olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2005S26
Up-to-date as of: Fri Aug 18 1:38:24 EDT 2017