OLAC Record
oai:catalogue.elra.info:ELRA-W0049

Metadata
Title:"Le Monde Diplomatique" Arabic tagged corpus
Abstract:This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic, see also ELRA-W0036-04). Each text comes with 3 files: raw text in Arabic, vowelised text in Arabic, and one XML file containing the morphological annotation of the text.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2009-03-31
Date Issued (W3CDTF):2009-03-31
Date Modified (W3CDTF):2009-03-31
Description:Written Corpora
This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic, see also ELRA-W0036-04). Each text comes with 3 files:
- raw text in Arabic,
- vowelised text in Arabic,
- one XML file containing the morphological annotation of the text.
Each word of a text carries several pieces of information, such as word size, rank of the word in the text, number of the paragraph where the word was found, etc. Each word corresponds to a node in the XML file. Each node contains the following positional features of the word in the text:
- Paragraph number in the text, i.e. the paragraph where the word can be found,
- Sentence number in the paragraph,
- Sentence number in the text,
- Rank of the word in the text,
- Rank of the first character of the word in the text,
- Word size.
Word annotation information is added as "sub-nodes":
- Word as it appears in the non-vowelised text,
- Vowelised word,
- Word lemma,
- Grammatical category of the word.
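The node layout described above could be read along the following lines. This is only a minimal sketch: the record does not give the actual element or attribute names used in the annotation files, so every name here (`word`, `paragraph`, `sentence_in_paragraph`, `sentence_in_text`, `rank`, `first_char`, `size`, `raw`, `vowelised`, `lemma`, `category`) and the sample data are hypothetical placeholders to be replaced with the names found in the distributed XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample: the tag and attribute names below are assumptions,
# not the corpus's documented schema.
SAMPLE_XML = """<text>
  <word paragraph="1" sentence_in_paragraph="1" sentence_in_text="1"
        rank="1" first_char="1" size="3">
    <raw>كتب</raw>
    <vowelised>كَتَبَ</vowelised>
    <lemma>كَتَبَ</lemma>
    <category>verb</category>
  </word>
</text>"""


@dataclass
class Word:
    # Positional features (stored on the node itself in this sketch)
    paragraph: int
    sentence_in_paragraph: int
    sentence_in_text: int
    rank: int
    first_char: int
    size: int
    # Annotation features (stored as sub-nodes in this sketch)
    raw: str
    vowelised: str
    lemma: str
    category: str


def parse_words(xml_string):
    """Return one Word per <word> node found in the XML string."""
    root = ET.fromstring(xml_string)
    words = []
    for node in root.iter("word"):
        words.append(
            Word(
                paragraph=int(node.get("paragraph")),
                sentence_in_paragraph=int(node.get("sentence_in_paragraph")),
                sentence_in_text=int(node.get("sentence_in_text")),
                rank=int(node.get("rank")),
                first_char=int(node.get("first_char")),
                size=int(node.get("size")),
                raw=node.findtext("raw"),
                vowelised=node.findtext("vowelised"),
                lemma=node.findtext("lemma"),
                category=node.findtext("category"),
            )
        )
    return words


words = parse_words(SAMPLE_XML)
```

If the real files store the positional features as child elements rather than attributes, the `node.get(...)` calls would become `node.findtext(...)` calls.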
Identifier:ELRA-W0049
http://catalog.elra.info/product_info.php?products_id=1096
Language:Arabic
Language (ISO639):ara
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0049
DateStamp:  2009-03-31
GetRecord:  OAI-PMH request for simple DC format
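
A GetRecord request of the kind listed above can be assembled from the OAI identifier and a metadata prefix. The sketch below uses the standard OAI-PMH query parameters (`verb`, `identifier`, `metadataPrefix`) with `oai_dc` for simple DC and `olac` for the OLAC format. The repository base URL is not given in this record, so `BASE_URL` is a placeholder assumption, and `get_record_url` is an illustrative helper name.

```python
from urllib.parse import urlencode

# Placeholder assumption: the real base URL of the repository is not
# stated in this record and must be substituted here.
BASE_URL = "https://example.org/oai"

OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0049"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")
olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
```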

Search Info

Citation: n.a. 2009. ELRA (European Language Resources Association).
Terms: dcmi_Text iso639_ara olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0049
Up-to-date as of: Mon Feb 27 0:31:33 EST 2017