OLAC Record
oai:www.ldc.upenn.edu:LDC2004L02

Metadata
Title:Buckwalter Arabic Morphological Analyzer Version 2.0
Access Rights:Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining
Bibliographic Citation:Buckwalter, Tim. Buckwalter Arabic Morphological Analyzer Version 2.0 LDC2004L02. Web Download. Philadelphia: Linguistic Data Consortium, 2004
Contributor:Buckwalter, Tim
Date (W3CDTF):2004
Date Issued (W3CDTF):2004-12-15
Description:*Introduction* Buckwalter Arabic Morphological Analyzer Version 2.0 was developed by Tim Buckwalter at the Linguistic Data Consortium (LDC) and contains a Perl script for morphology analysis and part-of-speech (POS) tagging of Arabic text. The release includes lexicons with approximately 83,000 entries of Arabic prefixes, suffixes, and stems as well as compatibility tables that are referenced by the script in the analysis of the text. The analyzer considers each Arabic word token in all possible prefix-stem-suffix segmentations and lists all known/possible annotation solutions, POS labels, and glosses. The generated output may then be reviewed by users, and the most appropriate annotation selected from among several choices. This tool has been used frequently for LDC releases of annotated Arabic text.
*Data* The data consists primarily of the Perl script, lexicons, and compatibility tables. Here are the three Arabic-English lexicon files:
* Prefixes (299 entries)
* Suffixes (618 entries)
* Stems (82,158 entries representing 38,600 lemmas)
The lexicons are supplemented by three morphological compatibility tables used for controlling possible word part combinations:
* Prefix-stem (1,648 entries)
* Stem-suffix (1,285 entries)
* Prefix-suffix (598 entries)
The documentation consists of a readme file with a description of the lexicon files, the morphological compatibility tables, the morphology analysis algorithm, a summary of stem morphological categories, and a table with the author's Arabic transliteration system.
*Samples* To see an example of the analyzer's output, please examine this sample.
*Updates* There are no updates available at this time.
*Additional Licensing Instructions* This 'members-only' corpus is available to current members who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.
Extent:Corpus size: 9216 KB
Identifier:LDC2004L02
https://catalog.ldc.upenn.edu/LDC2004L02
ISBN: 1-58563-324-0
ISLRN: 694-194-540-336-4
DOI: 10.35111/050q-5r95
Language:Standard Arabic
English
Language (ISO639):arb
eng
License:BAMA Agreement: https://catalog.ldc.upenn.edu/license/buckwalter-arabic-morphological-analyzer.pdf
Medium:Distribution: Web Download
Publisher:Linguistic Data Consortium
Publisher (URI):https://www.ldc.upenn.edu
Relation (URI):https://catalog.ldc.upenn.edu/docs/LDC2004L02
Rights Holder:Portions (c) 2002-2004 QAMUS LLC (www.qamus.org), (c) 2002-2004 Trustees of the University of Pennsylvania
Subject:Standard Arabic language
Subject (ISO639):arb
Type (DCMI):Text
Type (OLAC):lexicon
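
The Description above says the analyzer tries every prefix-stem-suffix segmentation of a word token and uses three compatibility tables to control which word parts may combine. The Python sketch below illustrates that general idea only. It is not the distributed Perl script: the lexicon entries, the category names (`P0`, `PW`, `N_toy`, `V_toy`, `X0`, `XH`), the glosses, and the `max_prefix` / `max_suffix` limits are all invented for illustration and are not taken from the release.

```python
from typing import Dict, Iterator, List, Set, Tuple

Entry = Tuple[str, str]  # (morphological category, English gloss)

# Toy lexicons (hypothetical entries, not from the LDC release).
PREFIXES: Dict[str, List[Entry]] = {"": [("P0", "")], "w": [("PW", "and")]}
STEMS: Dict[str, List[Entry]] = {"ktb": [("N_toy", "books"), ("V_toy", "wrote")]}
SUFFIXES: Dict[str, List[Entry]] = {"": [("X0", "")], "h": [("XH", "his")]}

# Toy compatibility tables: pairs of categories allowed to co-occur.
PREFIX_STEM: Set[Tuple[str, str]] = {
    ("P0", "N_toy"), ("PW", "N_toy"), ("P0", "V_toy"), ("PW", "V_toy"),
}
STEM_SUFFIX: Set[Tuple[str, str]] = {
    ("N_toy", "X0"), ("N_toy", "XH"), ("V_toy", "X0"),
}
PREFIX_SUFFIX: Set[Tuple[str, str]] = {
    ("P0", "X0"), ("P0", "XH"), ("PW", "X0"), ("PW", "XH"),
}


def segmentations(
    word: str, max_prefix: int = 4, max_suffix: int = 6
) -> Iterator[Tuple[str, str, str]]:
    """Yield every (prefix, stem, suffix) split with a non-empty stem.

    The length limits are assumed defaults chosen for this sketch.
    """
    n = len(word)
    for p in range(0, min(max_prefix, n - 1) + 1):
        for s in range(0, min(max_suffix, n - p - 1) + 1):
            yield word[:p], word[p:n - s], word[n - s:]


def analyze(word: str) -> List[Tuple[str, str, str, str]]:
    """Return all segmentations whose parts are in the lexicons and
    whose categories pass all three compatibility checks."""
    solutions = []
    for prefix, stem, suffix in segmentations(word):
        for p_cat, p_gloss in PREFIXES.get(prefix, []):
            for s_cat, s_gloss in STEMS.get(stem, []):
                for x_cat, x_gloss in SUFFIXES.get(suffix, []):
                    if (
                        (p_cat, s_cat) in PREFIX_STEM
                        and (s_cat, x_cat) in STEM_SUFFIX
                        and (p_cat, x_cat) in PREFIX_SUFFIX
                    ):
                        gloss = " + ".join(
                            g for g in (p_gloss, s_gloss, x_gloss) if g
                        )
                        solutions.append((prefix, stem, suffix, gloss))
    return solutions


if __name__ == "__main__":
    for token in ("ktb", "wktbh"):
        print(token, analyze(token))
```

With these toy tables, a token can come back with several candidate solutions, which matches the Description's point that the output is meant to be reviewed and the most appropriate annotation selected from among several choices.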

OLAC Info

Archive:  The LDC Corpus Catalog
Description:  http://www.language-archives.org/archive/www.ldc.upenn.edu
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.ldc.upenn.edu:LDC2004L02
DateStamp:  2024-04-02
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Buckwalter, Tim. 2004. Linguistic Data Consortium.
Terms: area_Asia area_Europe country_GB country_SA dcmi_Text iso639_arb iso639_eng olac_lexicon

Inferred Metadata

Country: Saudi Arabia
Area: Asia


http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2004L02
Up-to-date as of: Wed Apr 3 6:37:05 EDT 2024