OLAC Record: PAROLE Greek Lexicon

OLAC Record
oai:catalogue.elra.info:ELRA-L0032

Metadata

Title: PAROLE Greek Lexicon

Access Rights: Rights available for: nonCommercialUse

Date Available (W3CDTF): 2000-03-09

Date Issued (W3CDTF): 2000-03-09

Date Modified (W3CDTF): 2016-11-15

Description: The PAROLE Greek lexicon has two layers, morphological and syntactic. It includes the most frequent words found in a 9-million-word corpus, coded according to the PAROLE specifications.

The Morphological layer contains a total of 20149 Morphological units, of which 12042 are nouns (common and proper), 3014 verbs, 3405 adjectives, 106 numerals, 45 pronouns, 2 articles, 1396 adverbs, 48 adpositions, 51 conjunctions, 21 interjections and 19 "unique" categories.

The Syntactic layer contains 25092 Syntactic units, of which 14548 are nouns, 5397 verbs, 3558 adjectives, 1410 adverbs, 73 adpositions and 106 numerals.

This lexicon was constructed based on the following resources:
a. the ILSP Morphological Lexicon
b. the ILSP Corpus

***

Introduction to the PAROLE project

The LE-PAROLE project (MLAP/LE2-4017) aims to offer a large-scale harmonised set of "core" corpora and lexica for all European Union languages. Language corpora and lexica were built according to the same design and composition principles, in the period 1996-1998.

PAROLE Corpora:
The harmonisation with respect to corpus composition (selection of corpus texts) was to be achieved by the obligatory application of common parameters for time of production and classification according to publication medium. No texts older than 1970 were allowed. As for publication medium, the corpus had to include specific proportions of texts from the categories “Book”, “Newspaper”, “Periodical” and “Miscellaneous” within a settled range. The harmonisation effort also applied to the textual and linguistic encoding of the language corpora involved. With respect to the markup of text structure and primary data, every single corpus text was to be encoded according to the PAROLE DTD, which is compatible with the DTD of the Text Encoding Initiative (TEI) and with that of the Corpus Encoding Standard (CES). The level of encoding was set to Level 1 of the CES, implying the encoding of text structure and textual features up to Paragraph Level, with the additional constraint, however, that all legacy data was kept. As for linguistic corpus annotation, an equal proportion of the corpus texts (up to 250,000 running words) was to be morphosyntactically annotated according to a common core PAROLE tagset, extended with a set of language-specific features. The checking of the tags was split in two: 50,000 words had to be checked for maximum granularity and 200,000 for part-of-speech (PoS) only. The languages involved in PAROLE corpora are: Belgian French, Catalan, Danish, Dutch, English, French, Finnish, German, Greek, Irish, Italian, Norwegian, Portuguese and Swedish.

PAROLE Lexica:
The lexica (20,000 entries per language) were built to conform to a model based on EAGLES guidelines and GENELEX results, which underlies a common lexical tool adapted from the EUREKA-GENELEX project. This software tool was extended to support the PAROLE model and the conversion and management processes of the resulting resources. The languages involved in PAROLE lexica are: Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish and Swedish.
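
As a quick consistency check on the figures in the Description, the per-category counts can be summed and compared with the stated layer totals. The sketch below uses only numbers quoted in this record; the dictionary and variable names are illustrative and are not part of the lexicon's distribution.

```python
# Per-category counts copied from the Description field of this record.
morphological = {
    "nouns": 12042, "verbs": 3014, "adjectives": 3405, "numerals": 106,
    "pronouns": 45, "articles": 2, "adverbs": 1396, "adpositions": 48,
    "conjunctions": 51, "interjections": 21, "unique": 19,
}
syntactic = {
    "nouns": 14548, "verbs": 5397, "adjectives": 3558,
    "adverbs": 1410, "adpositions": 73, "numerals": 106,
}

# Sum each layer so the result can be compared with the stated totals
# (20149 Morphological units, 25092 Syntactic units).
morph_total = sum(morphological.values())
synt_total = sum(syntactic.values())

print("Morphological units:", morph_total)
print("Syntactic units:", synt_total)
```

Both sums agree with the totals given in the Description.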

Identifier: ELRA-L0032

ISLRN: 343-554-003-168-1

Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-L0032/

Language: Modern Greek (1453-)

Language (ISO639): ell

Medium: Not specified

Publisher: ELRA (European Language Resources Association)

Type (DCMI): Text

Type (OLAC): lexicon

OLAC Info

Archive: ELRA Catalogue of Language Resources

Description: http://www.language-archives.org/archive/catalogue.elra.info

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-L0032

DateStamp: 2000-03-09

GetRecord: OAI-PMH request for simple DC format
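
The GetRecord entries above correspond to standard OAI-PMH requests. The following minimal sketch shows how such a request URL could be assembled. The `verb`, `identifier` and `metadataPrefix` parameters come from the OAI-PMH protocol, and `olac` and `oai_dc` are the usual prefixes for the OLAC and simple DC formats. The base URL is a hypothetical placeholder, since this record does not state the repository endpoint, and the function name is illustrative.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real OAI-PMH endpoint is not given in this record.
BASE_URL = "https://example.org/oai"

def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"

# Identifier taken from the OaiIdentifier field of this record.
record_id = "oai:catalogue.elra.info:ELRA-L0032"
olac_url = get_record_url(record_id, "olac")    # OLAC format
dc_url = get_record_url(record_id, "oai_dc")    # simple DC format
print(olac_url)
print(dc_url)
```

Replacing `BASE_URL` with the archive's actual endpoint would yield requests equivalent to the two GetRecord links listed in this record.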

Search Info
Citation: n.a. 2000. ELRA (European Language Resources Association).
Terms: area_Europe country_GR dcmi_Text iso639_ell olac_lexicon

http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-L0032
Up-to-date as of: Wed Oct 1 0:55:15 EDT 2025
