OLAC Record: Abstract Meaning Representation (AMR) Annotation Release 1.0

OLAC Record
oai:www.ldc.upenn.edu:LDC2014T12

Metadata

Title: Abstract Meaning Representation (AMR) Annotation Release 1.0

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Knight, Kevin, et al. Abstract Meaning Representation (AMR) Annotation Release 1.0 LDC2014T12. Web Download. Philadelphia: Linguistic Data Consortium, 2014

Contributor: Knight, Kevin

Baranescu, Laura

Bonial, Claire

Georgescu, Madalina

Griffitt, Kira

Hermjakob, Ulf

Marcu, Daniel

Palmer, Martha

Schneider, Nathan

Date (W3CDTF): 2014

Date Issued (W3CDTF): 2014-06-16

Description:

*Introduction*
Abstract Meaning Representation (AMR) Annotation Release 1.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado's Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 13,000 English natural language sentences from newswire, weblogs and web discussion forums.

AMR captures “who is doing what to whom” in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.

LDC also released Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10).

*Data*
The source data includes discussion forums collected for the DARPA BOLT program, Wall Street Journal and translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program.

The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset            Training   Dev   Test   Totals
BOLT DF MT             1061   133    133     1327
Weblog and WSJ            0   100    100      200
BOLT DF English        1703   210    229     2142
2009 Open MT            204     0      0      204
Xinhua MT               741    99     86      926
Totals                 3709   542    548     4799

For those interested in using a standard/community partition for AMR research (for instance in development of semantic parsers), data in the "split" directory contains 13,051 AMRs divided roughly 80/10/10 into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries.
The "unsplit" directory contains the same 13,051 AMRs with no train/dev/test partition.

*Samples*
Please view this sample.

*Updates*
None at this time.

*Acknowledgements*
From University of Colorado
We gratefully acknowledge the support of the National Science Foundation Grant NSF: 0910992 IIS:RI: Large: Collaborative Research: Richer Representations for Machine Translation and the support of DARPA BOLT - HR0011-11-C-0145 and DEFT - FA-8750-13-2-0045 via a subcontract from LDC. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, DARPA or the US government.

From Information Sciences Institute
Thanks to NSF (IIS-0908532) for funding the initial design of AMR, and to DARPA MRP (FA-8750-09-C-0179) for supporting a group to construct consensus annotations and the AMR Editor. The initial AMR bank was built under DARPA DEFT FA-8750-13-2-0045 (PI: Stephanie Strassel; co-PIs: Kevin Knight, Daniel Marcu, and Martha Palmer) and DARPA BOLT HR0011-12-C-0014 (PI: Kevin Knight).

From Linguistic Data Consortium
This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.

We gratefully acknowledge the support of Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0184 Subcontract 4400165821. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, or the US government.

From Language Weaver (SDL)
This work was partially sponsored by DARPA contract HR0011-11-C-0150 to Language Weaver, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA or the US government.
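
The per-dataset counts in the partition table of the Description can be checked against its printed totals with a short script. The following is a minimal sketch, not part of the release: the function names are illustrative, and the figures are copied from the rows as listed in this record, so it only verifies that those rows are internally consistent.

```python
# Per-dataset AMR counts as listed in the Description table:
# (training, dev, test). Copied from this record, not read from the corpus.
counts = {
    "BOLT DF MT": (1061, 133, 133),
    "Weblog and WSJ": (0, 100, 100),
    "BOLT DF English": (1703, 210, 229),
    "2009 Open MT": (204, 0, 0),
    "Xinhua MT": (741, 99, 86),
}


def partition_totals(table):
    """Sum each partition column (training, dev, test) across all datasets."""
    train, dev, test = (sum(column) for column in zip(*table.values()))
    return train, dev, test


def dataset_totals(table):
    """Sum the three partitions for each dataset (the 'Totals' column)."""
    return {name: sum(row) for name, row in table.items()}


if __name__ == "__main__":
    train, dev, test = partition_totals(counts)
    grand_total = train + dev + test
    print("partition totals:", train, dev, test, "overall:", grand_total)
    for label, n in (("training", train), ("dev", dev), ("test", test)):
        # Share of the listed rows that falls in each partition.
        print(f"{label}: {100 * n / grand_total:.1f}%")
```

The same column-wise summation would apply to any additional dataset rows added to `counts`.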

Extent: Corpus size: 9376 KB

Identifier: LDC2014T12

https://catalog.ldc.upenn.edu/LDC2014T12

ISBN: 1-58563-681-9

ISLRN: 637-196-362-554-6

DOI: 10.35111/0ync-7404

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2014T12

Rights Holder: Portions © 2007 Agence France Presse, Al-Ahram, Al Hayat, Al-Quds Al-Arabi, Asharq Al-Awsat, An Nahar, Assabah, China Military Online, Chinanews.com, Guangming Daily, © 1987-1989 Dow Jones & Company, Inc., © 1994-1998, 2007 Xinhua News Agency, © 2014 Language Weaver, Inc., © 2014 University of Colorado, © 2014 University of Southern California, © 2004, 2007, 2013, 2014 Trustees of the University of Pennsylvania

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2014T12

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
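
The GetRecord entries above correspond to OAI-PMH requests, which take a `verb`, an `identifier`, and a `metadataPrefix` as query parameters. The sketch below shows how such a request URL might be assembled for this record. The base URL is a placeholder assumption, since this record does not state the repository's OAI-PMH endpoint; `olac` and `oai_dc` are the metadata prefixes conventionally used for the OLAC and simple Dublin Core formats.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the actual OAI-PMH base URL is not given in this record.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
RECORD_ID = "oai:www.ldc.upenn.edu:LDC2014T12"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record and format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # One request per format listed in the record.
    print(get_record_url(RECORD_ID, "olac"))
    print(get_record_url(RECORD_ID, "oai_dc"))
```

Substituting the repository's real base URL for `BASE_URL` is left to the reader; the response to such a request is an XML document.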

Search Info
Citation: Knight, Kevin; Baranescu, Laura; Bonial, Claire; Georgescu, Madalina; Griffitt, Kira; Hermjakob, Ulf; Marcu, Daniel; Palmer, Martha; Schneider, Nathan. 2014. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Text iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2014T12
Up-to-date as of: Wed Oct 29 7:01:27 EDT 2025
