OLAC Record: Columbia Games Corpus

OLAC Record
oai:www.ldc.upenn.edu:LDC2021S02

Metadata

Title: Columbia Games Corpus

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Hirschberg, Julia, et al. Columbia Games Corpus LDC2021S02. Web Download. Philadelphia: Linguistic Data Consortium, 2021

Contributor: Hirschberg, Julia

Gravano, Agustin

Benus, Stefan

Ward, Gregory

German, Elisa Sneed

Date (W3CDTF): 2021

Date Issued (W3CDTF): 2021-03-15

Description: *Introduction* Columbia Games Corpus was developed by the Spoken Language Group at Columbia University and the Department of Linguistics at Northwestern University. It consists of approximately 10 hours of spontaneous English conversation, along with corresponding orthographic transcripts and annotation. The recordings capture pairs of subjects playing a series of computer games that required verbal communication to achieve the joint goals of identifying and moving images on the screen to reach a combined number of points. Each player used a separate laptop computer and could not see the other player's screen. Participants played two games: the Cards Game and the Objects Game. In the Cards Game, one participant described a card and, depending on the task in the game, the second participant either searched for the described card or tried to match it against the cards shown on their own screen. In the Objects Game, each player's screen displayed 5 to 7 objects, one of which was the target object. One player described the target object's location on their screen, and the other player tried to move that object to the same position on their own screen.

*Data* Thirteen subjects (six female, seven male) participated in the collection over 12 sessions conducted in 2004. Sessions contained an average of 45 minutes of dialogue. Each recording has corresponding manually time-aligned orthographic transcripts, discourse annotation of affirmative cue words, and turn-taking annotation. Annotation guidelines are included in this release, as are task files for each game in each recording. Audio was recorded at a sample rate of 48 kHz with 16-bit precision and later converted to 16 kHz, single-channel, FLAC-compressed WAV. All text data is encoded in UTF-8.

*Samples* Samples are provided for the following data types:
* Audio (FLAC)
* Turns (TXT)
* Words (TXT)
* Discourse Markers (TXT)

*Updates* None at this time.

Extent: Corpus size: 952980 KB

Format: Sampling Rate: 16000 Hz

Sampling Format: pcm
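As a rough worked example of what these audio specifications imply for data volume, the sketch below computes the uncompressed linear PCM byte rate at the original 48 kHz and at the distributed 16 kHz, both at 16-bit precision. The channel count of the original 48 kHz recordings is not stated in this record, so a single channel is assumed for both. It also estimates the uncompressed size of 10 hours of 16 kHz audio, treating "approximately 10 hours" as exactly 10 hours purely for illustration. Because the release is FLAC-compressed, actual file sizes will differ from these uncompressed figures.

```python
# Worked arithmetic for the audio specifications in this record.
# Assumptions (not stated in the record): one channel for the 48 kHz
# originals, and exactly 10 hours of audio in total.

def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the byte rate of uncompressed linear PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

# As recorded: 48 kHz, 16-bit (one channel assumed).
original_rate = pcm_bytes_per_second(48_000, 16, 1)
# As distributed: 16 kHz, 16-bit, single channel (before FLAC compression).
released_rate = pcm_bytes_per_second(16_000, 16, 1)

# Downsampling factor from 48 kHz to 16 kHz.
downsample_factor = 48_000 // 16_000

# Illustrative total duration: "approximately 10 hours" taken as exactly 10.
HOURS = 10
uncompressed_bytes = released_rate * HOURS * 3600

print(original_rate)       # 96000 (bytes per second)
print(released_rate)       # 32000 (bytes per second)
print(downsample_factor)   # 3
print(uncompressed_bytes)  # 1152000000 (about 1.15 GB before FLAC compression)
```

Converting from 48 kHz to 16 kHz therefore reduces the raw per-channel data rate by a factor of three, before any FLAC compression is applied.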

Identifier: LDC2021S02

https://catalog.ldc.upenn.edu/LDC2021S02

ISBN: 1-58563-960-5

ISLRN: 834-843-130-497-9

DOI: 10.35111/ayn3-sp31

Language: English

Language (ISO639): eng

License: Columbia Games Corpus Agreement: https://catalog.ldc.upenn.edu/license/columbia-games-corpus-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2021S02

Rights Holder: Portions © 2021 The Trustees of Columbia University, © 2021 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2021S02

DateStamp: 2022-01-01

GetRecord: OAI-PMH request for simple DC format
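The GetRecord entries above refer to OAI-PMH requests. A standard OAI-PMH GetRecord request takes three query parameters: `verb`, `identifier`, and `metadataPrefix`. As a minimal sketch, the code below builds such request URLs for this record's OAI identifier, assuming the prefix `olac` for the OLAC format and `oai_dc` for simple Dublin Core. The endpoint URL `https://example.org/oai` and the helper `get_record_url` are hypothetical placeholders, since the record does not give the repository's base URL; substitute the actual OAI-PMH endpoint before use.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical placeholder endpoint: the record does not give the
# OAI-PMH base URL, so replace this with the real repository endpoint.
BASE_URL = "https://example.org/oai"

# OAI identifier taken from this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2021S02"

def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"

# OLAC format ("olac", assumed prefix) and simple Dublin Core ("oai_dc").
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")

# Round-trip check: decoding the query recovers the original parameters.
params = parse_qs(urlparse(olac_url).query)
print(olac_url)
print(params["identifier"][0])  # oai:www.ldc.upenn.edu:LDC2021S02
```

`urlencode` percent-encodes the colons in the identifier as `%3A`; the server decodes the query string, so the request still refers to the same OAI identifier.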

Search Info
Citation: Hirschberg, Julia; Gravano, Agustin; Benus, Stefan; Ward, Gregory; German, Elisa Sneed. 2021. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound dcmi_Text iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2021S02
Up-to-date as of: Wed Oct 29 7:02:04 EDT 2025
