OLAC Record
oai:www.ldc.upenn.edu:LDC97S44

Metadata
Title:1996 English Broadcast News Speech (HUB4)
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Graff, David, et al. 1996 English Broadcast News Speech (HUB4) LDC97S44. Web Download. Philadelphia: Linguistic Data Consortium, 1997
Contributor:Graff, David
Garofolo, John S.
Fiscus, Jonathan G.
Fisher, William
Pallett, David
Date (W3CDTF):1997
Description:LDC97S44 - Speech data
LDC97S66 - Dev and eval
LDC97T22 - Transcripts

*Introduction*
The 1996 Broadcast News Speech Corpus contains a total of 104 hours of broadcasts from ABC, CNN and CSPAN television networks and NPR and PRI radio networks with corresponding transcripts. The primary motivation for this collection is to provide training data for the DARPA "HUB4" Project on continuous speech recognition in the broadcast domain.

*Data*
The speech files are available as a training data set, development data and evaluation data. The following programs are represented in this corpus:
* ABC Nightline
* ABC World Nightly News
* ABC World News Tonight
* CNN Early Edition
* CNN Early Prime News
* CNN Headline News
* CNN Prime Time News
* CNN The World Today
* CSPAN Washington Journal
* NPR All Things Considered
* NPR Marketplace

Transcripts have been made of all recordings in this publication. They are manually time-aligned to the phrasal level and annotated to identify boundaries between news stories, speaker turn boundaries and gender information about the speakers. The released version of the transcripts is in SGML format, and the transcription release includes accompanying documentation and an SGML DTD file. The transcripts are available via FTP.

*Updates*
There are no updates at this time.

*Samples*
* audio (MS Wave format)

*Additional Licensing Instructions*
This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.
Extent:Corpus size: 11915951 KB
Format:Sampling Rate: 16000
Sampling Format: 1-channel pcm
Identifier:LDC97S44
https://catalog.ldc.upenn.edu/LDC97S44
ISBN: 1-58563-109-4
ISLRN: 876-519-945-577-5
DOI: 10.35111/hvkc-4n95
Language:English
Language (ISO639):eng
License:NPR and USC Archive User Agreement: https://catalog.ldc.upenn.edu/license/npr-and-usc-archive-user-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC97S44
Rights Holder: Portions © 1996 American Broadcasting Company, Inc., Cable News Network, LP, LLLP, National Cable Satellite Corporation, National Public Radio, Inc., The University of Southern California, USC Radio and Marketplace, Trustees of the University of Pennsylvania
Type (DCMI):Sound
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC97S44
DateStamp:  2020-11-30
GetRecord:  OAI-PMH request for simple DC format
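
The two GetRecord entries in this record (OLAC format and simple DC format) correspond to OAI-PMH requests that differ only in their metadataPrefix argument. The following is a minimal sketch of how such request URLs could be built. The base URL is a hypothetical placeholder, because this record does not state the endpoint address; the function name get_record_url is also illustrative only. The prefixes "olac" and "oai_dc" are the ones conventionally used for OLAC and simple Dublin Core metadata.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record above does not give the OAI-PMH base URL,
# so substitute the real address of the repository or aggregator.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",               # OAI-PMH verb for fetching a single record
        "identifier": identifier,          # the OaiIdentifier listed above
        "metadataPrefix": metadata_prefix, # "olac" or "oai_dc"
    })
    return f"{base_url}?{query}"


olac_url = get_record_url("oai:www.ldc.upenn.edu:LDC97S44", "olac")
dc_url = get_record_url("oai:www.ldc.upenn.edu:LDC97S44", "oai_dc")
print(olac_url)
print(dc_url)
```

The colons in the identifier are percent-encoded by urlencode, which is the safe choice for a value passed in a query string.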

Search Info

Citation: Graff, David; Garofolo, John S.; Fiscus, Jonathan G.; Fisher, William; Pallett, David. 1997. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC97S44
Up-to-date as of: Mon Mar 25 7:20:01 EDT 2024