OLAC Record

Title: "Le Monde Diplomatique" Arabic tagged corpus
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF): 2009-03-31
Date Issued (W3CDTF): 2009-03-31
Date Modified (W3CDTF): 2009-03-31
Description: This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic; see also ELRA-W0036-04). Three files are associated with each text:
- the raw text in Arabic,
- the vowelised text in Arabic,
- one XML file containing the morphological annotation of the text.
Each word of a text carries several pieces of information, such as the word size, the rank of the word in the text, and the number of the paragraph in which it occurs. Each word corresponds to one node in the XML file. Each node contains the following positional features of the word in the text:
- paragraph number in the text (the paragraph in which the word occurs),
- sentence number in the paragraph,
- sentence number in the text,
- rank of the word in the text,
- rank of the first character of the word in the text,
- word size.
Word annotations are added as sub-nodes:
- the word as it appears in the non-vowelised text,
- the vowelised word,
- the word lemma,
- the grammatical category of the word.
ISLRN: 124-139-628-259-2
Identifier (URI): http://catalog.elra.info/en-us/repository/browse/ELRA-W0049/
Language (ISO639): ara
Medium: Not specified
Publisher: ELRA (European Language Resources Association)
Type (DCMI): Text
Type (OLAC): primary_text
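
The Description field lists what each word node holds but does not give the XML schema. As a rough illustration only, the sketch below reads a layout of that shape with Python's standard library. Every element and attribute name in it (text, word, paragraph, sentence_in_paragraph, sentence_in_text, rank, char_offset, size, unvowelised, vowelised, lemma, category) is a hypothetical placeholder, as are the sample values; the files distributed with ELRA-W0049 may be organised differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample. All tag names, attribute names and values are
# assumptions made for illustration; they are not taken from the corpus.
SAMPLE = """<text>
  <word paragraph="1" sentence_in_paragraph="1" sentence_in_text="1"
        rank="1" char_offset="0" size="3">
    <unvowelised>كتب</unvowelised>
    <vowelised>كَتَبَ</vowelised>
    <lemma>كَتَبَ</lemma>
    <category>verb</category>
  </word>
  <word paragraph="1" sentence_in_paragraph="1" sentence_in_text="1"
        rank="2" char_offset="4" size="5">
    <unvowelised>الدرس</unvowelised>
    <vowelised>الدَّرْسَ</vowelised>
    <lemma>دَرْس</lemma>
    <category>noun</category>
  </word>
</text>"""


@dataclass
class Word:
    # Positional features (one per item in the Description's first list).
    paragraph: int
    sentence_in_paragraph: int
    sentence_in_text: int
    rank: int
    char_offset: int
    size: int
    # Annotation sub-nodes (one per item in the Description's second list).
    unvowelised: str
    vowelised: str
    lemma: str
    category: str


def parse_words(xml_text):
    """Return one Word per <word> node, in document order."""
    root = ET.fromstring(xml_text)
    words = []
    for node in root.iter("word"):
        words.append(
            Word(
                paragraph=int(node.get("paragraph")),
                sentence_in_paragraph=int(node.get("sentence_in_paragraph")),
                sentence_in_text=int(node.get("sentence_in_text")),
                rank=int(node.get("rank")),
                char_offset=int(node.get("char_offset")),
                size=int(node.get("size")),
                unvowelised=node.findtext("unvowelised", default=""),
                vowelised=node.findtext("vowelised", default=""),
                lemma=node.findtext("lemma", default=""),
                category=node.findtext("category", default=""),
            )
        )
    return words


words = parse_words(SAMPLE)
for w in words:
    print(w.rank, w.unvowelised, w.lemma, w.category)
```

Keeping the positional features as integers makes it straightforward to sort or filter words by paragraph, sentence or rank once the real tag names are substituted.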


Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0049
DateStamp:  2009-03-31
GetRecord:  OAI-PMH request for simple DC format
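
The GetRecord entries above refer to OAI-PMH requests. As a hedged sketch, such a request URL could be assembled as follows. The query parameters verb, identifier and metadataPrefix are defined by OAI-PMH, and oai_dc is the prefix the protocol reserves for simple Dublin Core. The base URL is a placeholder, since this record does not state the repository endpoint, and the olac prefix is assumed here to be the one used for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption, not the real ELRA/OLAC base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0049"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord URL for one record and one format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# "olac" is assumed to be the prefix for the OLAC format.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
# "oai_dc" is the simple Dublin Core prefix defined by OAI-PMH.
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

The sketch only builds the URLs and performs no network request; fetching the record would require the repository's actual base URL.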

Search Info

Citation: n.a. 2009. ELRA (European Language Resources Association).
Terms: dcmi_Text iso639_ara olac_primary_text

Up-to-date as of: Wed Nov 17 9:13:51 EST 2021