OLAC Record

Title:American English Spoken Lexicon
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Seidl-Friedman, Amanda Hallie, Masato Kobayashi, and Christopher Cieri. American English Spoken Lexicon LDC99L23. Web Download. Philadelphia: Linguistic Data Consortium, 1999
Contributor:Seidl-Friedman, Amanda Hallie
Contributor:Kobayashi, Masato
Contributor:Cieri, Christopher
Date (W3CDTF):1999
Description:*Introduction* This lexicon contains pronunciations captured in individual audio files for 53,602 of the most common words in English. *Data* 50,892 words were chosen from LDC's CALLHOME American English Lexicon on the basis of their frequency in the data that were used in creating the 1994 CSR Language Model Text Corpus ("CSR-III Text Corpus," LDC95T6). The sources for the language model include Wall Street Journal (1987-1994), Associated Press (1989-1991), and San Jose Mercury News (1991), all taken from the three CD-ROM volumes of TIPSTER (LDC93T3A). To extend the coverage of common words that happen not to occur in the LDC corpora sampled, an additional 2,922 words (i.e., compounds, companies, places, languages, and numerals) were added from other sources. Each word was read by the speaker in a quiet recording studio, using a Sennheiser HMD 410 microphone and a Sony DAT recorder. The recordings were downsampled to 16 kHz for storage on disk, and the individual lexical utterances were segmented into separate waveform files with a consistent margin of silence on both sides of each word. The CD-ROMs were created using the ISO-9660 Level 2 data format, along with Rock Ridge extensions. All common computer operating systems should be able to read the full-length file names. The corpus has since been converted to a web download. *Updates* There are no updates at this time.
ISBN: 1-58563-156-6
ISLRN: 238-033-984-489-7
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC99L23
Subject:English language
Subject (ISO639):eng
Type (DCMI):Text
Type (OLAC):lexicon


Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC99L23
DateStamp:  2018-04-25
GetRecord:  OAI-PMH request for simple DC format
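
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch of how such a request URL is formed under the OAI-PMH protocol, the snippet below combines the `verb`, `identifier`, and `metadataPrefix` parameters. The base URL is a hypothetical placeholder, since this record does not state the repository endpoint, and the helper name `get_record_url` is illustrative only. The prefix `oai_dc` is the protocol's standard name for simple Dublin Core; `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the repository's base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix="oai_dc", base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Simple DC request for the identifier listed under OaiIdentifier.
print(get_record_url("oai:www.ldc.upenn.edu:LDC99L23"))

# OLAC-format request (assumes the repository exposes the "olac" prefix).
print(get_record_url("oai:www.ldc.upenn.edu:LDC99L23", metadata_prefix="olac"))
```

The identifier is percent-encoded by `urlencode`, so the colons in it appear as `%3A` in the resulting URL.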

Search Info

Citation: Seidl-Friedman, Amanda Hallie; Kobayashi, Masato; Cieri, Christopher. 1999. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_lexicon

Inferred Metadata

Country: United Kingdom
Area: Europe

Up-to-date as of: Sat Aug 11 0:39:06 EDT 2018