OLAC Record: West Point Brazilian Portuguese Speech

OLAC Record
oai:www.ldc.upenn.edu:LDC2008S04

Metadata

Title: West Point Brazilian Portuguese Speech

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Morgan, John, Sheila Ackerlind, and Sterling Packer. West Point Brazilian Portuguese Speech LDC2008S04. Web Download. Philadelphia: Linguistic Data Consortium, 2008

Contributor: Morgan, John

Contributor: Ackerlind, Sheila

Contributor: Packer, Sterling

Date (W3CDTF): 2008

Date Issued (W3CDTF): 2008-05-19

Description: *Introduction* West Point Brazilian Portuguese Speech is a database of digital recordings containing approximately 1.6 hours of spoken Brazilian Portuguese. It was designed and collected by staff and faculty of the Department of Foreign Languages (DFL) and the Center for Technology Enhanced Language Learning (CTELL) to develop acoustic models for speech recognition systems. The U.S. government uses such systems to provide speech-recognition-enhanced language learning courseware to government linguists and to students enrolled in various government language programs.

The data in this corpus was collected in March 1999 in Brasilia, Brazil, using informants from a Brazilian military academy. The corpus consists of read speech from 60 female and 68 male speakers, both native and non-native. The speech was elicited from a prompt script containing 296 sentences and phrases typically used in language learning situations.

The prompts are listed in the file prompts.txt. Each line of this file has two fields separated by a tab: the first field is the base name of the waveform file, and the second field is the prompt used to record the utterance. A pronouncing dictionary developed by Dr. Sheila Ackerlind with help from cadet Sterling Packer is provided in the file SANTIAGO.txt.

The speech was collected using four laptop computers running MS Windows. Three of the computers recorded 16-bit samples at a sampling rate of 22050 Hz; the fourth recorded 8-bit samples at a sampling rate of 11025 Hz. The recording script displayed the sentence to be recorded. The informant pressed a key and spoke the sentence. The recording was then played back for review, and the utterance could be re-recorded if necessary. A member of the data collection team was present during each recording session to verify recordings and to provide technical assistance in case of malfunctioning equipment.

*Samples* For an example of speech contained in this corpus, please listen to this audio sample (MS Wave format).
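The two-field, tab-separated layout described above for prompts.txt can be read with a few lines of Python. The following is a minimal sketch, not part of the corpus distribution: the function name `parse_prompts` and the sample base names and prompts are hypothetical, and the record does not state the file's character encoding, so any encoding passed to `open` is an assumption.

```python
from typing import Dict, Iterable


def parse_prompts(lines: Iterable[str]) -> Dict[str, str]:
    """Map waveform base name -> prompt text from tab-separated lines."""
    prompts: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, in case a prompt itself contains a tab.
        base_name, sep, prompt = line.partition("\t")
        if not sep:
            raise ValueError(f"expected a tab-separated line, got: {line!r}")
        prompts[base_name.strip()] = prompt.strip()
    return prompts


if __name__ == "__main__":
    # Hypothetical sample lines; the real base names and prompts are in prompts.txt.
    sample = ["s0001\tBom dia.\n", "s0002\tComo vai?\n"]
    for name, text in parse_prompts(sample).items():
        print(name, text)
```

To read the actual file, pass an open file object, for example `parse_prompts(open("prompts.txt", encoding="latin-1"))`. The `latin-1` value here is a guess and should be checked against the corpus documentation.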

Extent: Corpus size: 1363148 KB

Format: Sampling Rate: 22050

Format: Sampling Format: pcm

Identifier: LDC2008S04

https://catalog.ldc.upenn.edu/LDC2008S04

ISBN: 1-58563-471-9

ISLRN: 563-271-238-124-2

DOI: 10.35111/yjwc-zx48

Language: Portuguese

Language (ISO639): por

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2008S04

Rights Holder: Portions © 1999, 2004 United States Military Academy, © 2008 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2008S04

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: Morgan, John; Ackerlind, Sheila; Packer, Sterling. 2008. Linguistic Data Consortium.
Terms: area_Europe country_PT dcmi_Sound iso639_por olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2008S04
Up-to-date as of: Wed Oct 29 7:01:02 EDT 2025
