OLAC Record: 2000 Communicator Dialogue Act Tagged

OLAC Record
oai:www.ldc.upenn.edu:LDC2004T15

Metadata

Title: 2000 Communicator Dialogue Act Tagged

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Prasad, Rashmi, and Marilyn Walker. 2000 Communicator Dialogue Act Tagged LDC2004T15. Web Download. Philadelphia: Linguistic Data Consortium, 2004

Contributor: Prasad, Rashmi

Walker, Marilyn

Date (W3CDTF): 2004

Date Issued (W3CDTF): 2004-06-15

Description: *Introduction*
2000 Communicator Dialogue Act Tagged was developed by the Linguistic Data Consortium (LDC) and contains approximately 314,000 words of system and user interactions annotated with named entity and dialogue act tags. This release is an addendum to 2000 Communicator Evaluation (LDC2002S56), developed by LDC in 2002, and contains annotations on the transcriptions of system and user utterances taken from the log files of LDC2002S56. Dialogue act annotations are provided for the system utterances in the dialogues and follow the DATE (Dialogue Act Tagging for Evaluation) scheme. In addition, both system and user utterances are tagged for named entities. For further information on the 2000 Communicator Evaluation corpus, please refer to the 2002 release, LDC2002S56.

*Data*
The complete dialogue-act-annotated corpus is provided as a single XML text file of approximately 16 MB. Dialogue act tagging was performed automatically by pattern matching against human-labeled examples of the utterances used by the nine participating Communicator systems. Named entity tagging followed the same methodology.

The breakdown of dialogues and dialogue acts is as follows:

Dialogues   Dialogue Acts   Tagged Dialogue Acts   Unique Tags
648         22,752          22,701                 61

Each dialogue is segmented into system and user turns. Except for one system, the log files contain no utterance segmentation within turns, so the number of utterances is the same as the number of turns. In this release, utterance segmentation is carried out and is reflected in the dialogue act segmentation.

The distribution of turns and words is as follows:

        System    User     Total
Turns   13,013    11,715   24,728
Words   275,938   38,285   314,223

The release also includes the raw transcripts of the dialogues.

*Samples*
For an example of the data in this corpus, please view this sample (TXT).

*Sponsorship*
This research was conducted using funding from the following agency and contract: DARPA, contract MDA972-99-3-0003.

*Updates*
None at this time.
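The counts in the Data section can be cross-checked with a few lines of arithmetic. The Python sketch below is not part of the corpus distribution: it re-enters the published counts by hand, confirms that the System and User columns sum to the stated totals, and derives a few ratios the description does not state (untagged dialogue acts, tagged share, words per turn, dialogue acts per system turn, turns per dialogue). The acts-per-system-turn figure assumes, as the description says, that dialogue act annotations cover system utterances only.

```python
# Counts copied by hand from the corpus description; nothing is read from the XML file.
DIALOGUES = 648
DIALOGUE_ACTS = 22_752
TAGGED_ACTS = 22_701
TURNS = {"system": 13_013, "user": 11_715}
WORDS = {"system": 275_938, "user": 38_285}

# The published totals should equal the sum of the System and User columns.
total_turns = sum(TURNS.values())
total_words = sum(WORDS.values())
assert total_turns == 24_728, total_turns
assert total_words == 314_223, total_words

# Derived ratios (not stated in the description itself).
untagged_acts = DIALOGUE_ACTS - TAGGED_ACTS
tagged_share = TAGGED_ACTS / DIALOGUE_ACTS
words_per_turn = {side: WORDS[side] / TURNS[side] for side in TURNS}
# Assumes every dialogue act belongs to a system turn, since only system
# utterances carry dialogue act annotations.
acts_per_system_turn = DIALOGUE_ACTS / TURNS["system"]
turns_per_dialogue = total_turns / DIALOGUES

print(f"untagged dialogue acts: {untagged_acts}")
print(f"tagged share: {tagged_share:.1%}")
for side, ratio in words_per_turn.items():
    print(f"{side} words per turn: {ratio:.1f}")
print(f"dialogue acts per system turn: {acts_per_system_turn:.2f}")
print(f"turns per dialogue: {turns_per_dialogue:.1f}")
```

Run as-is, the script shows that 51 dialogue acts carry no tag and that system turns are much longer than user turns (roughly 21 versus 3 words on average).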

Identifier: LDC2004T15

https://catalog.ldc.upenn.edu/LDC2004T15

ISBN: 1-58563-305-4

ISLRN: 451-626-470-363-6

DOI: 10.35111/sp5p-5637

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2004T15

Rights Holder: Portions © 2004 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2004T15

DateStamp: 2022-04-08

GetRecord: OAI-PMH request for simple DC format
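The GetRecord entries above refer to OAI-PMH requests for this record in the OLAC and simple Dublin Core formats. In OAI-PMH, such a request is an HTTP GET carrying the query parameters `verb=GetRecord`, `identifier` and `metadataPrefix`. The sketch below builds those URLs for the OaiIdentifier shown above, using the prefixes `olac` and `oai_dc`. The base URL is a hypothetical placeholder, since this record does not give the endpoint; substitute the base URL of the OAI-PMH provider you are querying.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the OAI-PMH base URL of the provider
# you are querying (the endpoint is not given in this record).
BASE_URL = "https://example.org/oai"

# OaiIdentifier as listed in the record.
IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2004T15"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (query values are percent-encoded)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# "olac" requests the OLAC format; "oai_dc" requests simple Dublin Core.
olac_url = get_record_url(BASE_URL, IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, IDENTIFIER, "oai_dc")
print(olac_url)
print(dc_url)
```

Passing either URL to `urllib.request.urlopen` (or any HTTP client) should return the record as XML, provided the repository at that base URL serves this identifier.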

Search Info
Citation: Prasad, Rashmi; Walker, Marilyn. 2004. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2004T15
Up-to-date as of: Wed Oct 29 7:00:22 EDT 2025