OLAC Record
oai:www.ldc.upenn.edu:LDC2006T10

Metadata
Title:English-Arabic Treebank v 1.0
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Bies, Ann. English-Arabic Treebank v 1.0 LDC2006T10. Web Download. Philadelphia: Linguistic Data Consortium, 2006
Contributor:Bies, Ann
Date (W3CDTF):2006
Date Issued (W3CDTF):2006-05-18
Description:*Introduction*
This file contains documentation on the English-Arabic Parallel Treebank v 1.0, Linguistic Data Consortium (LDC) catalog number LDC2006T10, ISBN 1-58563-387-9. This release of the English-Arabic Treebank consists of 52,238 words in 224 files of individual Agence France Presse (AFP) news stories (corresponding to approximately the first 50K words of the Arabic Treebank: Part 1 v 3.0 -- LDC Catalog No.: LDC2005T02, ISBN: 1-58563-330-5). The English translation was provided by LDC, and was part-of-speech tagged and treebanked for this project.
*Data*
The guidelines followed for both part-of-speech and treebank annotation are essentially Penn Treebank II style, with two notable differences:
* POS: tokenization of hyphenated items ("New York-based" has been replaced by "New York - based", for example), and the addition of HYPH and AFX tags necessitated by this change in tokenization
* TreeBank: the addition of the node label NML for sub-NP nominal constituents (replacing NX and most NP-internal NAC)
*Samples*
For an example of the data in this corpus, please review this text sample.
Extent:Corpus size: 18432 KB
Identifier:LDC2006T10
https://catalog.ldc.upenn.edu/LDC2006T10
ISBN: 1-58563-387-9
ISLRN: 021-421-953-520-4
Language:English
Standard Arabic
Language (ISO639):eng
arb
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2006T10
Rights Holder:Portions © 2000 Agence France Presse, © 2006 Trustees of the University of Pennsylvania
Type (DCMI):Text
Type (OLAC):primary_text
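
The Description field above mentions that hyphenated items are split into separate tokens ("New York-based" becomes "New York - based") and that the hyphen receives a HYPH tag. The following is a minimal sketch of that idea only, not the procedure used to build the corpus. The function name `split_hyphens` and the rule "split any hyphen between two word characters" are assumptions made for illustration; the actual annotation guidelines may treat some cases differently, and the AFX tag is not modeled here.

```python
import re

# Assumed rule (illustrative only): a hyphen counts as token-internal
# when it sits directly between two word characters.
_INTERNAL_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")

def split_hyphens(text):
    """Split whitespace-separated tokens at internal hyphens.

    Returns a list of (token, tag) pairs. The tag is "HYPH" for a hyphen
    produced by splitting and None for everything else (no other
    part-of-speech tagging is attempted).
    """
    result = []
    for token in text.split():
        parts = _INTERNAL_HYPHEN.split(token)
        for index, part in enumerate(parts):
            if index > 0:
                # Re-insert the hyphen that the split consumed.
                result.append(("-", "HYPH"))
            result.append((part, None))
    return result

tagged = split_hyphens("New York-based")
print(" ".join(token for token, _ in tagged))
```

A free-standing hyphen in the input is left untagged under this sketch, since only hyphens created by the split are marked HYPH.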

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2006T10
DateStamp:  2019-01-03
GetRecord:  OAI-PMH request for simple DC format
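
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request is an HTTP GET carrying the `verb`, `identifier`, and `metadataPrefix` arguments defined by the OAI-PMH protocol. The base URL below is a hypothetical placeholder, since this record does not state the repository endpoint; the prefixes `olac` and `oai_dc` are assumed to correspond to the "OLAC format" and "simple DC format" mentioned above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real repository base URL is not given in this record.
BASE_URL = "https://example.org/oai"

def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (the request is not sent)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"

record_id = "oai:www.ldc.upenn.edu:LDC2006T10"
print(get_record_url(record_id, "olac"))    # assumed prefix for OLAC format
print(get_record_url(record_id, "oai_dc"))  # assumed prefix for simple DC format
```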

Search Info

Citation: Bies, Ann. 2006. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_GB country_SA dcmi_Text iso639_arb iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2006T10
Up-to-date as of: Sat Jul 27 9:55:40 EDT 2019