OLAC Record
oai:www.ldc.upenn.edu:LDC2025T13

Metadata
Title:AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Tracey, Jennifer, et al. AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment LDC2025T13. Web Download. Philadelphia: Linguistic Data Consortium, 2025
Contributor:Tracey, Jennifer
Strassel, Stephanie
Getman, Jeremy
Bies, Ann
Griffitt, Kira
Graff, David
Caruso, Christopher
Date (W3CDTF):2025
Date Issued (W3CDTF):2025-09-15
Description:*Introduction* AIDA Scenario 1 Evaluation Topic Source Data, Annotation, and Assessment was developed by the Linguistic Data Consortium (LDC) and comprises English, Russian, and Ukrainian web documents (text, video, image), annotations, and assessments used in the AIDA Phase 1 pilot and final evaluations. Each phase of the AIDA program centered on a specific scenario, or broad topic area, with related subtopics designated as either practice topics or evaluation topics. The Phase 1 scenario focused on political relations between Russia and Ukraine in the 2010s. The documents, annotations, and assessments contained in this corpus include coverage of the following events: Suspicious Deaths and Murders in Ukraine (January-April 2015); Odessa Tragedy (May 2, 2014); and Siege of Sloviansk and Battle of Kramatorsk (April-July 2014). The AIDA (Active Interpretation of Disparate Alternatives) Program was designed to support development of technology to assist in cultivating and maintaining understanding of events when there are conflicting accounts of what happened (e.g., who did what to whom and/or where and when events occurred). AIDA systems must extract entities, events, and relations from individual multimedia documents, aggregate that information across documents and languages, and produce multiple knowledge graph hypotheses that characterize the conflicting accounts that are present in the data. *Data* The corpus contains a multi-media collection of 10,522 documents, annotations for 386 of those documents, and assessment results covering 77,965 responses in 1,525 of those documents. Source material was collected from the web by a combination of automatic and manual processes. HTML content was converted from its original form into XML. To the extent possible, all resources referenced by a given "root" HTML page (style sheets, JavaScript, images, media files, etc.) were stored as separate files of the given data type and assigned separate 9-character file-IDs (the same form of ID used for the "root" HTML page). Annotations were performed in three steps: (1) within-document labels for scenario-related entities, relations, and events; (2) coreference annotation across documents by linking information elements to a knowledge base; and (3) indications of any relationship between labeled events/relations and hypotheses about the scenario. In the assessment phase, LDC annotators reviewed and judged system response files to provide evaluation organizers with a means for scoring submissions. Assessment tasks included zero-hop assessment, class-based assessment, graph assessment, and hypothesis assessment. Further information about annotation and assessment processes is contained in the documentation accompanying this release. Annotations and assessments are presented as tab-separated files. The knowledge base for entity detection and linking annotation for all AIDA Scenario 1 and 2 corpora is available separately as AIDA Scenario 1 and 2 Reference Knowledge Base (LDC2023T10). *Sponsorship* This material is based upon work supported by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-18-C-0013. *Samples* Please view the following samples: * Argument Mentions * Event Mentions * Event Slots * Knowledge Base Linking * Relation Mentions * Relation Slots *Updates* No updates at this time.
Extent:Corpus size: 57671680 KB
Format:Sampling Rate: 44100 Hz (with some variations)
Sampling Format: mpeg
Identifier:LDC2025T13
https://catalog.ldc.upenn.edu/LDC2025T13
ISLRN: 620-348-369-491-1
DOI: 10.35111/n4ac-3012
Language:Ukrainian
Russian
English
Language (ISO639):ukr
rus
eng
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2025T13
Rights Holder:Portions © 2003, 2015 2000.ua, © 2015 Arguments and Facts, © 2014 Associated Newspapers Ltd, © 2017 Belarus Today, © 2017 Belarusian Hour, © 2016-2018 Bessarabia INFORM, © 2017-2018 Bird In Flight, © 2015-2016 Cable News Network. Turner Broadcasting System, Inc., © 2016 Censor.NET, © 2011-2015, 2017 Consortiumnews, © 2014-2016 Digital Venture LLC, © 2012, 2017 DirectPress.ru, © 2015 Elisa Group Ltd., © 2018 Elnews.ru, © 2014 EUROMAIDAN PRESS, © 2015-2016 euronews, © 2017 Facts and Comments, © 2016 FAN, © 2014 Forbes Media LLC, © 2014 From-UA, © 2013 gate @ Crimea – news, comments, © 2017 Gazetadaily.ru, © 2018 GLAVRED.INFO, © 2011 Human Rights Watch, © 2013, 2017-2018 IA REGNUM, © 2014-2015 InfoKava.com, © 2017 Information and Analytical Agency, © 2015 InoSMI.ru, © 2014 Interfax-Ukraine, © 2009-2017 JSC Business News Media, © 2012-2014, 2016 KM Online, LLC, © 2014-2015 Lenta.Ru LLC, © 2017 Liga Information and Analytical Center, © 2015, 2017 Lux Television and Radio Company, © 2014-2017 MIA Russia Today, © 2016-2018 mirnews.su, © 2014 Mirror of the week, © 2017 News Front, © 2015-2016 NEWSru.com, © 2018 Obozrevatel, © 2014-2015 PJSC Today Multimedia, © 2017 Public Television, © 2014-2015, 2017 Radio Liberty, © 2014-2015 RFE/RL, © 2014-2015 The Daily Beast Company LLC, © 2014-2017 The Military Review, © 2011-2012 The Power of Truth, © 2014 The Slate Group, © 2014, 2016-2017 TSN.ua, © 2014-2017 TV-Novosti, © 2015, 2017 Ukrainian Media Holding, © 2014, 2016 Ukrainian Media Systems, © 2014-2015, 2017 Ukrainian Pravda, © 2015-2017 Ukrinform, © 2014, 2017 UNIAN.NET, © 2014-2015 Vice News, © 2017 Western Information Corporation, © 2014-2018 Zhitomir-Online, © 2018, 2025 Trustees of the University of Pennsylvania
Subject:English language
Russian language
Ukrainian language
Subject (ISO639):eng
rus
ukr
Type (DCMI):Image
MovingImage
StillImage
Text
Type (OLAC):primary_text

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2025T13
DateStamp:  2025-09-15
GetRecord:  OAI-PMH request for simple DC format
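
The GetRecord entries in this record refer to OAI-PMH requests. As a rough illustration only, such a request URL can be assembled from the standard OAI-PMH query parameters (verb, identifier, metadataPrefix). The base URL below is a placeholder and the helper name build_getrecord_url is hypothetical; neither comes from this record. The prefix oai_dc is the standard OAI-PMH prefix for simple Dublin Core.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the real service URL.
BASE_URL = "https://example.org/oai"


def build_getrecord_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Assemble an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Identifier taken from the OaiIdentifier field of this record.
url = build_getrecord_url(BASE_URL, "oai:www.ldc.upenn.edu:LDC2025T13", "oai_dc")
print(url)
```

Swapping the metadataPrefix value to olac would request the OLAC format instead, assuming the repository advertises that prefix.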

Search Info

Citation: Tracey, Jennifer; Strassel, Stephanie; Getman, Jeremy; Bies, Ann; Griffitt, Kira; Graff, David; Caruso, Christopher. 2025. Linguistic Data Consortium.
Terms: area_Europe country_GB country_RU country_UA dcmi_Image dcmi_MovingImage dcmi_StillImage dcmi_Text iso639_eng iso639_rus iso639_ukr olac_primary_text

Inferred Metadata

Country: United Kingdom
Russian Federation
Ukraine
Area: Europe


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2025T13
Up-to-date as of: Tue Sep 16 3:24:16 EDT 2025