OLAC Record
oai:www.ldc.upenn.edu:LDC2025T11

Metadata
Title:KAIROS Phase 1 Quizlet
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Chen, Song, et al. KAIROS Phase 1 Quizlet LDC2025T11. Web Download. Philadelphia: Linguistic Data Consortium, 2025
Contributor:Chen, Song
Bies, Ann
Mott, Justin
Caruso, Christopher
Tracey, Jennifer
Strassel, Stephanie
Date (W3CDTF):2025
Date Issued (W3CDTF):2025-08-15
Description:*Introduction* KAIROS Phase 1 Quizlet was developed by the Linguistic Data Consortium (LDC). It contains English and Spanish text, video and image data and annotations used for pre-evaluation research and system development during Phase 1 of the DARPA KAIROS program. KAIROS Quizlets were a series of narrowly defined tasks designed to explore specific evaluation objectives, enabling KAIROS system developers to exercise individual system components on a small data set prior to the full program evaluation. This corpus contains the complete set of Quizlet data used in Phase 1, which focused on two real-world complex events (CEs) within the Improvised Explosive Device bombing scenario: CE1001 (2018 Caracas drone attack) and CE1002 (Utah High School backpack bombing). The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus. *Data* Four quizlets were developed in Phase 1. In addition to the source documents, this release contains the contents of Quizlet 3 (graph G annotation generated with manual annotation) and Quizlet 4 (source documents, manual annotation, updated graph G). Quizlet 1 (evaluation task introduction) did not require data or annotation and is not included in this release. Quizlet 2 (schema generation and instantiation) used source documents but did not include annotation. Source data was collected from the web; 30 root web pages were collected and processed, yielding 29 text data files, 216 image files and 5 video files. 
Annotation steps included labeling scenario-relevant events and relations for each document to develop a structured representation of temporally ordered events, relations and arguments and to generate a reference knowledge graph. Source data is presented in various formats: .gif, .jpg, .ltf, .mp4, .png, .psm, and .svg. Annotations are presented as tab-separated files (.tab) for temporal ordering, relations, events, and arguments. *Samples* Please view these samples: * Argument Annotations (.tab) * Graph G (.json) * PSM (.xml) * LTF (.xml) *Sponsorship* KAIROS was sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-19-S-0014. *Updates* No updates at this time.
Extent:Corpus size: 125 KB
Identifier:LDC2025T11
https://catalog.ldc.upenn.edu/LDC2025T11
ISLRN: 357-044-554-407-1
DOI: 10.35111/rcba-vb61
Language:English
Spanish
Language (ISO639):eng
spa
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2025T11
Rights Holder:Portions © 2019 Boston Globe Media Partners, LLC, © 2019 Critical Threats Project, © 2020 El Comercio Group, © 2020 Europa Press, © 2020 France 24, © 2020 Frandsen Digital Media, LLC, © 2020 Gannett Satellite Information Network, LLC, © 2020 Google LLC, © 2020 Guardian News & Media Limited or its affiliated companies, © 2019 Hearst Magazine Media, Inc., © 2020 Malecon Media Group SL, © 2020 NBCUNIVERSAL MEDIA, LLC, © 2020 Nexstar Media Group, Inc., © 2019 ROHM CO., LTD., © 2020 Sinclair Broadcast Group, © 2020 Spain Export Film & TV, © 2020 The Associated Press, © 2019 The Atlantic Monthly Group, © 2020 The E.W. Scripps Company, © 2019 The New York Times Company, © 2020 The Republic EC, © 2020 Vox Media, LLC, © 2020 Yahoo, © 2020, 2025 Trustees of the University of Pennsylvania
Subject:English language
Subject (ISO639):eng
Type (DCMI):Image
MovingImage
Software
Sound
StillImage
Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2025T11
DateStamp:  2025-08-15
GetRecord:  OAI-PMH request for simple DC format
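
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch of how such a request URL is typically formed, the snippet below assembles the standard OAI-PMH query parameters (verb, identifier, metadataPrefix) for the identifier listed in this record. The base URL is a hypothetical placeholder, not the actual endpoint, and the "olac" metadata prefix for the OLAC format is an assumption; "oai_dc" is the prefix the OAI-PMH specification defines for simple Dublin Core.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder; substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2025T11"


def build_get_record_url(base_url, identifier, metadata_prefix):
    """Return an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Simple Dublin Core request ("oai_dc" is defined by the OAI-PMH specification).
dc_url = build_get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")

# OLAC-format request; the prefix "olac" is an assumption for illustration.
olac_url = build_get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")

# Decode the query string again to confirm the parameters round-trip.
dc_params = parse_qs(urlsplit(dc_url).query)

print(dc_url)
print(olac_url)
```

The identifier is percent-encoded by urlencode, so the colons in the OAI identifier do not need manual escaping.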

Search Info

Citation: Chen, Song; Bies, Ann; Mott, Justin; Caruso, Christopher; Tracey, Jennifer; Strassel, Stephanie. 2025. Linguistic Data Consortium.
Terms: area_Europe country_ES country_GB dcmi_Image dcmi_MovingImage dcmi_Software dcmi_Sound dcmi_StillImage dcmi_Text iso639_eng iso639_spa olac_primary_text

Inferred Metadata

Country: United Kingdom
Area: Europe


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2025T11
Up-to-date as of: Sat Aug 16 1:46:08 EDT 2025