OLAC Record: 2017 NIST OpenSAT Pilot

OLAC Record
oai:www.ldc.upenn.edu:LDC2022S01

Metadata

Title: 2017 NIST OpenSAT Pilot - SSSF

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Byers, Frederick. 2017 NIST OpenSAT Pilot - SSSF LDC2022S01. Web Download. Philadelphia: Linguistic Data Consortium, 2022

Contributor: Byers, Frederick

Date (W3CDTF): 2022

Date Issued (W3CDTF): 2022-01-18

Description: *Introduction* 2017 NIST OpenSAT Pilot - SSSF was developed by NIST (National Institute of Standards and Technology) and contains approximately one hour of operational speech data, transcripts and annotation files used in the speech activity detection, automatic speech recognition (ASR), and keyword search (KWS) tasks of the 2017 OpenSAT Pilot evaluation.

The source audio consists of radio and telephone dispatches during the Sofa Super Store fire (SSSF) in Charleston, South Carolina, in June 2007, which claimed the lives of nine firefighters. These recordings contain content that some may find disturbing.

The NIST Open Speech Analytic Technologies (OpenSAT) Evaluation Series was designed to bring together researchers developing different types of technologies to address speech analytic challenges present in some of the most difficult acoustic conditions, with the end goal of improving the state of the art through objective, large-scale common evaluations. The 2017 pilot focused on the public safety communications domain. The SSSF audio represents real-world operational fire-response data with multiple challenges for system analytics, such as land-mobile-radio transmission effects, significant background noise, speech under stress and variable decibel levels. See the OpenSAT website for more information.

*Data* This dataset was created from the audio and logs of SSSF radio and telephone dispatches and transcripts of those dispatches. The transcripts were re-annotated and transformed by NIST into the formats required to provide a reference key for scoring system output in the pilot OpenSAT evaluation.

The data is divided into a 30-minute development set and a 30-minute evaluation set. Audio is presented as 16-bit, 8 kHz NIST SPHERE files. Accompanying reference files are divided by the analytic tasks used in the OpenSAT Pilot and are UTF-8 encoded text or XML files. ASR and KWS scoring tools are also included.

*Samples* Please view the following samples:
* Audio Sample (SPH)
* Transcript Sample (TXT)
* STM Sample (TXT)
* Annotation Sample (TXT)

*Updates* None at this time.
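
The description above states that the audio is distributed as 16-bit, 8 kHz NIST SPHERE files. As a rough illustration of how those properties could be checked, the sketch below parses the plain-text header that begins a SPHERE file. The function name `parse_sphere_header` and the demo header are hypothetical; the demo values mirror the figures in this record, except the channel count, which is an assumption. It does not reproduce a header from the corpus itself.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header found at the start of a NIST SPHERE file.

    The header begins with the line "NIST_1A", followed by a line giving
    the header size in bytes, then "name -type value" lines, and ends
    with "end_head".
    """
    lines = data.split(b"\n", 2)
    if len(lines) < 3 or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(lines[1].strip())
    header = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in header.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, ftype, value = parts
        if ftype == "-i":        # integer field
            fields[name] = int(value)
        elif ftype == "-r":      # real-valued field
            fields[name] = float(value)
        else:                    # string field, e.g. "-s3"
            fields[name] = value
    return fields


# Hypothetical header built for illustration only; not taken from the corpus.
demo = (
    "NIST_1A\n   1024\n"
    "channel_count -i 1\n"      # assumed, not stated in this record
    "sample_rate -i 8000\n"
    "sample_n_bytes -i 2\n"     # 2 bytes per sample = 16 bit
    "sample_coding -s3 pcm\n"
    "end_head\n"
).encode("ascii").ljust(1024, b" ")

info = parse_sphere_header(demo)
print(info["sample_rate"], info["sample_n_bytes"] * 8, info["sample_coding"])
```

For a real file, the same function could be given the first few kilobytes read in binary mode; the field names present in the corpus headers may differ from the ones assumed here.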

Extent: Corpus size: 60459 KB

Format: Sampling Rate: 8000

Format: Sampling Format: pcm

Identifier: LDC2022S01

https://catalog.ldc.upenn.edu/LDC2022S01

ISBN: 1-58563-983-4

ISLRN: 847-094-281-048-4

DOI: 10.35111/4fw7-wy71

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2022S01

Rights Holder: Portions © 2022 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2022S01

DateStamp: 2023-08-03

GetRecord: OAI-PMH request for simple DC format
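
The GetRecord entries above refer to OAI-PMH requests for this record in the OLAC and simple Dublin Core formats. A minimal sketch of how such a request URL might be assembled follows. `BASE_URL` is a placeholder, since the repository endpoint is not given in this record, and `get_record_url` is a hypothetical helper; the `verb`, `identifier` and `metadataPrefix` parameters are those defined by OAI-PMH, with `olac` and `oai_dc` as the prefixes for the two formats.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not stated in this record.
BASE_URL = "https://example.org/oai"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


oai_id = "oai:www.ldc.upenn.edu:LDC2022S01"
olac_url = get_record_url(BASE_URL, oai_id, "olac")    # OLAC format
dc_url = get_record_url(BASE_URL, oai_id, "oai_dc")    # simple DC format
print(olac_url)
print(dc_url)
```

Fetching either URL from the actual endpoint would be expected to return an XML response containing the record.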

Search Info
Citation: Byers, Frederick. 2022. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound dcmi_Text iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2022S01
Up-to-date as of: Wed Oct 29 7:02:07 EDT 2025
