OLAC Record: CSLU: Kids` Speech Version 1.1

OLAC Record
oai:www.ldc.upenn.edu:LDC2007S18

Metadata

Title: CSLU: Kids` Speech Version 1.1

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Shobaki, Khaldoun, John-Paul Hosom, and Ronald Cole. CSLU: Kids` Speech Version 1.1 LDC2007S18. Web Download. Philadelphia: Linguistic Data Consortium, 2007

Contributor: Shobaki, Khaldoun

Hosom, John-Paul

Cole, Ronald Allan

Date (W3CDTF): 2007

Date Issued (W3CDTF): 2007-11-20

Description: *Introduction*
CSLU: Kids' Speech Version 1.1, Linguistic Data Consortium (LDC) catalog number LDC2007S18 and ISBN 1-58563-395-X, is a collection of spontaneous and prompted speech from 1100 children between Kindergarten and Grade 10 in the Forest Grove School District in Oregon. All children (approximately 100 at each grade level) read approximately 60 items from a total list of 319 phonetically balanced but simple words, sentences, or digit strings. Each utterance of spontaneous speech begins with a recitation of the alphabet and contains a monologue of about one minute in duration. This release consists of 1017 files containing approximately 8-10 minutes of speech per speaker. Corresponding word-level transcriptions are also included. The corpus was developed to facilitate research on the characteristics of children's speech at different ages and to train and evaluate recognizers for language training and other interactive tasks involving children, including recognizers used in language development with deaf children.

*Data*
Data collection was performed using the CSLU Speech Toolkit on two computers running Windows NT 4.0. Each computer was attended by a CSLU staff member who monitored progress and helped the child with any difficulties. The average session at the computer lasted 20 minutes, yielding approximately 8-10 minutes of speech digitized at 16 bits and 16 kHz using Sound Blaster 16 PnP audio cards with head-mounted microphones. The prompted speech, consisting of 200 isolated words and 10 numeric strings, was presented by an animated character that produced accurate visible speech synchronized with recorded prompts, with the prompt text displayed below the character. The child then repeated the prompted word. Once the prompted speech collection was completed, the experimenter asked the child a series of questions designed to elicit spontaneous speech (e.g., "Tell me about your favorite movie").

*Samples*
For an example of the speech in this corpus, please listen to this sample of spontaneous speech.
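To make the recording format concrete, the sketch below computes the raw data rate implied by 16-bit samples at 16 kHz and the storage needed for the stated 8-10 minutes of speech per speaker. It assumes single-channel (mono), uncompressed linear PCM and ignores file-header overhead; the channel count and encoding are assumptions, not details stated in this record, and the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Data rate of uncompressed linear PCM audio, in bytes per second.

    Assumes bits_per_sample is a multiple of 8 (true for 16-bit audio).
    """
    return sample_rate_hz * (bits_per_sample // 8) * channels


def pcm_bytes_for_duration(minutes: float, sample_rate_hz: int = 16000,
                           bits_per_sample: int = 16, channels: int = 1) -> int:
    """Raw PCM payload size for a given duration (file headers not included)."""
    return int(minutes * 60 * pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels))


if __name__ == "__main__":
    rate = pcm_bytes_per_second(16000, 16)   # 16,000 samples/s * 2 bytes/sample
    low = pcm_bytes_for_duration(8)          # 8 minutes of mono speech
    high = pcm_bytes_for_duration(10)        # 10 minutes of mono speech
    print(rate, low, high)                   # 32000 15360000 19200000
```

Under these assumptions each speaker's 8-10 minutes corresponds to roughly 15-19 MB of raw audio; stereo or compressed storage would change these figures.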

Extent: Corpus size: 12582912 KB

Format: Sampling Rate: 16000 Hz

Identifier: LDC2007S18

https://catalog.ldc.upenn.edu/LDC2007S18

ISBN: 1-58563-395-X

ISLRN: 965-489-670-052-2

DOI: 10.35111/q5tn-8096

Language: English

Language (ISO639): eng

License: CSLU Agreement: https://catalog.ldc.upenn.edu/license/cslu-corpora-non-commercial-research-only.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2007S18

Rights Holder: Portions © 2001-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2007 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2007S18

DateStamp: 2021-06-17

GetRecord: OAI-PMH request for simple DC format
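The GetRecord entries above refer to standard OAI-PMH requests. As a minimal sketch, the snippet below builds such a request URL from the three parameters the GetRecord verb takes: `verb`, `identifier`, and `metadataPrefix`. The base URL is a hypothetical placeholder, not the actual OLAC endpoint, and the prefixes `olac` (OLAC format) and `oai_dc` (simple Dublin Core) are assumed to be the ones the archive advertises.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL.

    urlencode percent-encodes reserved characters such as ':' in the identifier.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical base URL; substitute the repository's real OAI-PMH endpoint.
BASE_URL = "https://example.org/oai"
RECORD_ID = "oai:www.ldc.upenn.edu:LDC2007S18"

olac_url = get_record_url(BASE_URL, RECORD_ID, "olac")   # OLAC format (assumed prefix)
dc_url = get_record_url(BASE_URL, RECORD_ID, "oai_dc")   # simple Dublin Core

if __name__ == "__main__":
    print(olac_url)
    print(dc_url)
```

Fetching either URL (for example with `urllib.request.urlopen`) should return the record as OAI-PMH XML, provided the base URL points at a live repository.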

Search Info
Citation: Shobaki, Khaldoun; Hosom, John-Paul; Cole, Ronald Allan. 2007. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2007S18
Up-to-date as of: Wed Oct 29 7:01:01 EDT 2025
