OLAC Record
oai:www.ldc.upenn.edu:LDC2025S07

Metadata
Title:Mixer 6 - CHiME 8 Transcribed Calls and Interviews
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Wiesner, Matthew, et al. Mixer 6 - CHiME 8 Transcribed Calls and Interviews LDC2025S07. Web Download. Philadelphia: Linguistic Data Consortium, 2025
Contributor:Wiesner, Matthew
Raj, Desh
Maciejewski, Matthew
Haviland, Chloe
Cornell, Samuele
Chodroff, Eleanor
Khudanpur, Sanjeev
Godfrey, Jack
Date (W3CDTF):2025
Date Issued (W3CDTF):2025-08-15
Description:*Introduction* Mixer 6 - CHiME 8 Transcribed Calls and Interviews was developed for the 7th and 8th CHiME (Computational Hearing in Multisource Environments) challenges. It contains 80 hours of English interviews and telephone speech from Mixer 6 Speech (LDC2013S03), with transcripts developed for the CHiME challenges and divided into training, development, and test sets. This data was used in CHiME 7 Task 1 and CHiME 8 Task 1, both of which focused on transcription and segmentation across varied recording conditions such as interviews, meetings, and dinner parties, with an emphasis on generalization across recording device types and array topologies. Mixer 6 Speech was developed by the Linguistic Data Consortium (LDC) and comprises 15,863 hours of audio recordings of interviews, transcript readings, and conversational telephone speech involving 594 distinct native English speakers recorded over 14 channels. This material was collected by LDC in 2009 and 2010 as part of the Mixer project, specifically phase 6, which focused on native American English speakers local to the Philadelphia area.
*Data* The data includes audio from Mixer 6 Speech recorded on 13 microphones, for a total of 1,063 hours of audio corresponding to 80 hours of speech. The development and test splits are speaker-disjoint from the training data and consist of fully transcribed, multi-microphone interviews. The transcripts were developed in three phases: (1) manual transcription, segmentation, and automatic alignment with speech; (2) splitting sessions into sets; and (3) splitting certain sessions from the training set. Each segment is labeled with the speaker, the uttered text, and the start and end times of the segment in seconds. Audio data is provided as 16-bit FLAC files sampled at 16 kHz. Transcripts are released as UTF-8 encoded JSON files.
*Samples* Please view the following samples:
* Speech Audio (FLAC)
* Transcripts (JSON)
*Updates* No updates at this time.
Extent:Corpus size: 108000000 KB
Format:Sampling Rate: 16000
Sampling Format: 16-bit FLAC
Identifier:LDC2025S07
https://catalog.ldc.upenn.edu/LDC2025S07
ISLRN: 017-424-674-662-6
DOI: 10.35111/pk0y-qp29
Language:English
Language (ISO639):eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2025S07
Rights Holder:Portions © 2009-2010, 2013, 2025 Trustees of the University of Pennsylvania
Subject:English language
Subject (ISO639):eng
Subject (OLAC):text_and_corpus_linguistics
Type (DCMI):Sound
Text
Type (OLAC):primary_text
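
The Description field states that transcripts are UTF-8 JSON files in which each segment carries a speaker label, the uttered text, and start and end times in seconds. The record does not give the JSON schema, so the following Python sketch is illustrative only: the key names (speaker, words, start_time, end_time), the string-valued times, and the sample data are all assumptions and should be checked against the corpus documentation. It totals the transcribed speech time per speaker.

```python
import json

# Hypothetical transcript content. The key names and the use of strings for
# the times are assumptions; the record only says that each segment has a
# speaker, the uttered text, and start/end times in seconds.
SAMPLE_TRANSCRIPT = """[
  {"speaker": "spk_a", "words": "hello there", "start_time": "1.50", "end_time": "3.25"},
  {"speaker": "spk_b", "words": "hi", "start_time": "3.50", "end_time": "4.00"},
  {"speaker": "spk_a", "words": "how are you", "start_time": "4.25", "end_time": "6.00"}
]"""


def speech_seconds_per_speaker(transcript_json):
    """Sum segment durations (end minus start, in seconds) for each speaker."""
    segments = json.loads(transcript_json)
    totals = {}
    for segment in segments:
        start = float(segment["start_time"])
        end = float(segment["end_time"])
        if end < start:
            raise ValueError(f"segment ends before it starts: {segment}")
        speaker = segment["speaker"]
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals


totals = speech_seconds_per_speaker(SAMPLE_TRANSCRIPT)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.2f} s")
```

With a real transcript file, the same function could be applied to the text read with open(path, encoding="utf-8"), provided the key names are adjusted to the released schema.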

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2025S07
DateStamp:  2025-08-15
GetRecord:  OAI-PMH request for simple DC format
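
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch of how such a request URL is formed under OAI-PMH 2.0, the Python below builds the query string from the three arguments GetRecord requires (verb, identifier, metadataPrefix). The base URL is a placeholder, not the archive's real endpoint, and the "olac" prefix is assumed to be the one the repository advertises for OLAC format; "oai_dc" is the prefix OAI-PMH defines for simple Dublin Core.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real base URL of the repository is not given in
# this record and must be looked up before sending any request.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


RECORD_ID = "oai:www.ldc.upenn.edu:LDC2025S07"
olac_url = get_record_url(RECORD_ID, "olac")    # OLAC format (assumed prefix)
dc_url = get_record_url(RECORD_ID, "oai_dc")    # simple DC format
print(olac_url)
print(dc_url)
```

The sketch only constructs the URLs; fetching and parsing the XML response is left out because it depends on the actual endpoint.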

Search Info

Citation: Wiesner, Matthew; Raj, Desh; Maciejewski, Matthew; Haviland, Chloe; Cornell, Samuele; Chodroff, Eleanor; Khudanpur, Sanjeev; Godfrey, Jack. 2025. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound dcmi_Text iso639_eng olac_primary_text olac_text_and_corpus_linguistics

Inferred Metadata

Country: United Kingdom
Area: Europe


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2025S07
Up-to-date as of: Sat Aug 16 1:46:09 EDT 2025