OLAC Record
oai:www.ldc.upenn.edu:LDC2001T55

Metadata
Title:Arabic Newswire Part 1
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Graff, David, and Kevin Walker. Arabic Newswire Part 1 LDC2001T55. CD. Philadelphia: Linguistic Data Consortium, 2001
Contributor:Graff, David
Walker, Kevin
Date (W3CDTF):2001
Description:*Introduction* This publication contains the Arabic Newswire A Corpus, Linguistic Data Consortium (LDC) catalog number LDC2001T55 and ISBN 1-58563-190-6. The Arabic Newswire Corpus is composed of articles from the Agence France Presse (AFP) Arabic Newswire. The source material was tagged using TIPSTER-style SGML and was transcoded to Unicode (UTF-8). The corpus includes articles from May 13, 1994 to December 20, 2000.
*Data* The data is in 2,337 compressed (zipped) Arabic text data files. There are 209 MB of compressed data (869 MB uncompressed) with approximately 383,872 documents containing 76 million tokens over approximately 666,094 unique words. A template of the tagging is presented below.
yyyymmdd_AFP_ARB.dddd Arabic Text Arabic Text One or More Paragraphs of Arabic Text Arabic Text Arabic Text
For a sample file of tagged articles, please see this sample.
*Updates* There are no updates at this time.
Extent:Corpus size: 9728 KB
Identifier:LDC2001T55
https://catalog.ldc.upenn.edu/LDC2001T55
ISBN: 1-58563-190-6
ISLRN: 013-368-610-633-9
OAI: oai:www.ldc.upenn.edu:LDC2001T55
Language:Standard Arabic
Language (ISO639):arb
License:LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2001T55
Rights Holder:Portions © 1994-2000 Agence France Presse
Type (DCMI):Text
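
The Description gives the document identifier template as yyyymmdd_AFP_ARB.dddd and the corpus date range as May 13, 1994 to December 20, 2000. A minimal sketch of how such an identifier might be parsed and checked is shown here. It assumes that "dddd" stands for exactly four digits, which the record does not state explicitly; the function names parse_doc_id and in_corpus_range are hypothetical and are not part of any LDC tooling.

```python
import re
from datetime import date, datetime

# Assumption: "yyyymmdd" is an 8-digit date and "dddd" is a 4-digit number.
DOC_ID_PATTERN = re.compile(r"(\d{8})_AFP_ARB\.(\d{4})")

# Date range stated in the record's Description field.
CORPUS_START = date(1994, 5, 13)
CORPUS_END = date(2000, 12, 20)


def parse_doc_id(doc_id):
    """Split an identifier such as '19940513_AFP_ARB.0001' into (date, number)."""
    match = DOC_ID_PATTERN.fullmatch(doc_id)
    if match is None:
        raise ValueError(f"not a yyyymmdd_AFP_ARB.dddd identifier: {doc_id!r}")
    # strptime raises ValueError for impossible dates such as 19941340.
    day = datetime.strptime(match.group(1), "%Y%m%d").date()
    return day, int(match.group(2))


def in_corpus_range(day):
    """Return True if the date falls inside the range given in the Description."""
    return CORPUS_START <= day <= CORPUS_END
```

For example, parse_doc_id("19940513_AFP_ARB.0001") yields the date 1994-05-13 and the number 1, and that date passes in_corpus_range.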

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2001T55
DateStamp:  2014-07-17
GetRecord:  OAI-PMH request for simple DC format
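
The GetRecord entries above refer to OAI-PMH requests for this record in OLAC and simple DC formats. As a rough sketch, an OAI-PMH GetRecord request is a URL carrying the verb, identifier, and metadataPrefix arguments. The base URL below is a placeholder, not the archive's real endpoint, and the prefix values "olac" and "oai_dc" are assumptions based on common OAI-PMH usage rather than something this record states.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the repository's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2001T55"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple DC.
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
```

The colons in the identifier are percent-encoded by urlencode, so the same function can be reused for other record identifiers without manual escaping.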

Search Info

Citation: Graff, David; Walker, Kevin. 2001. Linguistic Data Consortium.
Terms: area_Asia country_SA dcmi_Text iso639_arb


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2001T55
Up-to-date as of: Mon Nov 24 0:32:09 EST 2014