OLAC Record: TDT5 Topics and Annotations

OLAC Record
oai:www.ldc.upenn.edu:LDC2006T19

Metadata

Title: TDT5 Topics and Annotations

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Glenn, Meghan, et al. TDT5 Topics and Annotations LDC2006T19. Web Download. Philadelphia: Linguistic Data Consortium, 2006

Contributor: Glenn, Meghan

Strassel, Stephanie

Kong, Junbo

Maeda, Kazuaki

Date (W3CDTF): 2006

Date Issued (W3CDTF): 2006-12-19

Description: *Introduction* TDT5 Topics and Annotations was developed by the Linguistic Data Consortium (LDC) and includes about 10,000 topic relevance judgments and other associated information for the TDT5 2004 evaluation topics. This release contains complete relevance judgments, including the results of adjudication, in which discrepancies between system submissions and LDC annotations are reviewed and the relevance judgments updated. This release also contains answer keys for the link detection task. The TDT5 corpora were created by the Linguistic Data Consortium with support from the DARPA TIDES (Translingual Information Detection, Extraction and Summarization) Program. The multilingual news text corresponding to this publication can be found in TDT5 Multilingual News Text (LDC2006T18).

*Data* A total of 250 topics, numbered 55001-55250, were annotated by LDC using a search-guided annotation technique. Details of the annotation process are described in the annotation task definition. Approximately 25% of the topics are monolingual English (ENG), 25% are monolingual Mandarin Chinese (MAN), 25% are monolingual Arabic (ARB), and 25% are multilingual:

   63  ENG
   62  MAN
   62  ARB
   35  ARB ENG MAN
   21  ENG MAN
    7  ARB ENG
  250  total

Broken down by language and counting both monolingual and multilingual topics:

  126  ENG
  118  MAN
  104  ARB

*Samples* For an example of the data in this corpus, please review this sample (TXT) from the link detection files.

*Updates* None at this time.

Extent: Corpus size: 80896 KB

Identifier: LDC2006T19

https://catalog.ldc.upenn.edu/LDC2006T19

ISBN: 1-58563-418-2

ISLRN: 396-836-683-088-8

DOI: 10.35111/fjaq-y976

Language: English

Mandarin Chinese

Standard Arabic

Language (ISO639): eng

cmn

arb

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2006T19

Rights Holder: Portions © 2004, 2006 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text
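
The per-language totals given in the Description field follow from the topic counts per language combination. The sketch below reproduces that arithmetic; the counts are copied from the Description, while the names `topic_counts` and `per_language_totals` are illustrative and not part of the corpus or any LDC tooling.

```python
from collections import Counter

# Topic counts per language combination, copied from the Description field.
topic_counts = {
    ("ENG",): 63,
    ("MAN",): 62,
    ("ARB",): 62,
    ("ARB", "ENG", "MAN"): 35,
    ("ENG", "MAN"): 21,
    ("ARB", "ENG"): 7,
}


def per_language_totals(counts):
    """Count each topic once for every language it covers."""
    totals = Counter()
    for languages, n in counts.items():
        for lang in languages:
            totals[lang] += n
    return dict(totals)


total_topics = sum(topic_counts.values())
totals = per_language_totals(topic_counts)
multilingual = sum(n for langs, n in topic_counts.items() if len(langs) > 1)

print(total_topics, totals, multilingual)
```

Under these counts the sum is 250 topics, with 126 ENG, 118 MAN and 104 ARB, and 63 multilingual topics (about 25% of the total), which matches the figures stated in the record.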

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2006T19

DateStamp: 2021-02-18

GetRecord: OAI-PMH request for simple DC format
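
The GetRecord entries above refer to OAI-PMH requests for this record. A minimal sketch of how such a request URL could be assembled is shown below. The base URL is a hypothetical placeholder, since this record does not state the repository endpoint. `oai_dc` is the metadata prefix OAI-PMH defines for simple Dublin Core; `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not give the repository base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2006T19"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL with percent-encoded arguments."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Simple DC request; pass "olac" instead for the (assumed) OLAC format prefix.
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
print(dc_url)
print(olac_url)
```

Only the URL construction is illustrated; sending the request and parsing the XML response are left out because they depend on the actual endpoint.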

Search Info
Citation: Glenn, Meghan; Strassel, Stephanie; Kong, Junbo; Maeda, Kazuaki. 2006. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_CN country_GB country_SA dcmi_Text iso639_arb iso639_cmn iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2006T19
Up-to-date as of: Wed Oct 29 7:00:52 EDT 2025