OLAC Record
oai:www.ldc.upenn.edu:LDC2004S07

Metadata
Title:Switchboard Cellular Part 2 Audio
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Graff, David, Kevin Walker, and David Miller. Switchboard Cellular Part 2 Audio LDC2004S07. Web Download. Philadelphia: Linguistic Data Consortium, 2004
Contributor:Graff, David
Walker, Kevin
Miller, David
Date (W3CDTF):2004
Date Issued (W3CDTF):2004-10-26
Description:*Introduction*
Switchboard Cellular Part 2 Audio was developed by the Linguistic Data Consortium (LDC) and consists of approximately 200 hours of English telephone conversations collected by LDC in 2000. The Switchboard cellular collection focused primarily on cellular phone technology of all service types. The goal was to target 200 subjects balanced by gender to participate in 10 or more five- to six-minute conversations on cellular phones. The speech data was collected for research, development, and evaluation of automatic systems for speech-to-text conversion, speaker identification, language identification, and speech signal detection purposes.
*Data*
During the study period, LDC collected a total of 2,020 calls, or 4,040 sides (2,950 cellular). Here is a gender breakdown of the participant pool and call sides collected:
Gender    Participants    Sides
Female    250             2,405
Male      169             1,635
Total     419             4,040
Each speech file consists of a 1,024-byte ASCII-formatted Sphere header, followed by two-channel interleaved mu-law sample data. The mu-law samples represent the actual digital data transmission from the telephone service provider (MCI), as captured separately for each side of the telephone conversation by LDC's telephone collection platform. The header also indicates the caller_pin, callee_pin, topic_id, cellular service/handset information and speaker demographic information. The data files are not compressed.
This release contains speech data files with documentation describing speaker information (sex, age, education, city and state where raised), call information (date, time, call duration, Personal Identification Numbers, topic), and audit information (channel quality, background noise). The documentation also contains reports on clipped files.
Other releases in this series include:
* Switchboard Cellular Part 1 Audio (LDC2001S13)
* Switchboard Cellular Part 1 Transcribed Audio (LDC2001S15)
* Switchboard Cellular Part 1 Transcription (LDC2001T14)
*Sample*
Please examine this example audio file to review a sample of this corpus.
*Updates*
There are no updates available at this time.
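
The file layout given in the Description (a 1,024-byte ASCII Sphere header followed by two-channel interleaved mu-law samples) can be read with a few lines of Python. The following is a minimal sketch, not LDC's own tooling: the function names are hypothetical, the header syntax assumed is the usual NIST SPHERE form ("NIST_1A", a header-size line, then "name -type value" lines ending at "end_head"), and the mu-law expansion is the standard G.711 one.

```python
def parse_sphere_header(raw: bytes) -> dict:
    """Parse the ASCII SPHERE header found in the first 1,024 bytes of `raw`."""
    lines = raw[:1024].decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a SPHERE file")
    # The second line states the header size in bytes (1024 for this corpus).
    fields = {"_header_size": int(lines[1].strip())}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # name, type flag, value
        if len(parts) < 3 or line.startswith(";"):
            continue  # skip blank lines, comments, and empty values
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if (u & 0x80) else magnitude


def read_sphere_ulaw(raw: bytes):
    """Return (header fields, side A samples, side B samples)."""
    fields = parse_sphere_header(raw)
    body = raw[fields["_header_size"]:]
    # Interleaved layout: even bytes are one call side, odd bytes the other.
    side_a = [ulaw_to_linear(b) for b in body[0::2]]
    side_b = [ulaw_to_linear(b) for b in body[1::2]]
    return fields, side_a, side_b
```

To use it, read a speech file in binary mode and pass the bytes to `read_sphere_ulaw`; fields such as caller_pin, callee_pin, and topic_id then appear as keys in the returned dictionary if they are present in the header. Which channel corresponds to the caller and which to the callee is not stated in this record and should be checked against the corpus documentation.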
Extent:Corpus size: 11534336 KB
Format:Sampling Rate: 8000
Sampling Format: 2-channel ulaw
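
As a rough consistency check, the Extent and Format values can be compared with the "approximately 200 hours" and 2,020 calls given in the Description. The sketch below rests on assumptions that this record does not state: that "KB" means 1,024 bytes, that each mu-law sample occupies one byte, that essentially all of the corpus size is audio data, and that each call is stored as one two-channel file.

```python
KB = 1024                      # assumption: binary kilobytes
corpus_bytes = 11534336 * KB   # Extent field
sample_rate = 8000             # Format: Sampling Rate
channels = 2                   # Format: 2-channel ulaw
bytes_per_sample = 1           # assumption: 8-bit mu-law samples

bytes_per_second = sample_rate * channels * bytes_per_sample
hours = corpus_bytes / bytes_per_second / 3600

calls = 2020                   # from the Description
minutes_per_call = hours * 60 / calls  # assumes one file per call

print(f"{hours:.1f} hours, {minutes_per_call:.1f} minutes per call on average")
```

Under these assumptions the total comes out near 205 hours, or about six minutes per call, which agrees with the figures in the Description.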
Identifier:LDC2004S07
https://catalog.ldc.upenn.edu/LDC2004S07
ISBN: 1-58563-297-x
ISLRN: 047-363-770-147-0
DOI: 10.35111/mgp6-4j96
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2004S07
Rights Holder:Portions © 2000, 2004 Trustees of the University of Pennsylvania
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2004S07
DateStamp:  2024-03-29
GetRecord:  OAI-PMH request for simple DC format
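
The GetRecord entries in this record refer to OAI-PMH requests. A minimal sketch of how such a request URL is assembled follows. The `verb`, `identifier`, and `metadataPrefix` parameters and the `oai_dc` prefix come from the OAI-PMH protocol; the `olac` prefix is the one conventionally used for OLAC metadata; the base URL and the function name are hypothetical placeholders, since the actual endpoint is not given here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


OAI_ID = "oai:www.ldc.upenn.edu:LDC2004S07"
dc_url = get_record_url(OAI_ID, "oai_dc")   # simple DC format
olac_url = get_record_url(OAI_ID, "olac")   # OLAC format
```

Fetching either URL from a live repository would return an XML response containing the record in the requested metadata format.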

Search Info

Citation: Graff, David; Walker, Kevin; Miller, David. 2004. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2004S07
Up-to-date as of: Sat Mar 30 18:24:56 EDT 2024