OLAC Record
oai:catalogue.elra.info:ELRA-L0065

Metadata
Title:KORLEX - Croatian Lexicon
Abstract:The KORLEX - Croatian Lexicon provides a list of 118,252 Croatian lemmas, i.e., words in canonical form, each annotated with a part-of-speech (POS) tag and lexical features. The list includes 52,450 nouns, 8,985 adverbs, 14,937 verbs and 41,161 adjectives, as well as pronouns, determiners, prepositions/postpositions, conjunctions and numerals. The lexicon data was compiled with the objective of covering the majority of text circulating in everyday use, such as in the news, in business, technological documentation, legal documentation, and politics. The resource is a flat text file in which each line contains information about one lemma. The resource is encoded in ISO-8859-2 and sorted according to the standard Croatian lexicographic order.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2006-03-31
Date Created (W3CDTF):2004-10-01
Date Issued (W3CDTF):2006-03-31
Date Modified (W3CDTF):2006-03-31
Description:Monolingual Lexicons
This lexical resource was developed as part of the English-Croatian bilingual lexicon built for the following project: http://www.rjecnik.com. The lexicon data was compiled with the objective of covering the majority of text circulating in everyday use, such as in the news (e.g., newswire articles), in business, technological documentation, legal documentation, and politics. Words that are used primarily in literary and religious contexts, and that are not part of everyday usage, are generally not included in the lexicon.
The KORLEX - Croatian Lexicon provides a list of 118,252 Croatian lemmas, i.e., words in canonical form, each annotated with a part-of-speech (POS) tag and lexical features. Among these 118,252 entries, there are 52,450 nouns, 8,985 adverbs, 14,937 verbs and 41,161 adjectives. The remaining entries are pronouns, determiners, prepositions/postpositions, conjunctions and numerals.
The resource is a flat text file in which each line contains information about one lemma. The format of a line can be captured with the following Perl regular expression: /^(.*\S)\t+(:\w+)(.*)$/, where $1 is the lemma, $2 is the POS tag, and $3 is a concatenated list of features. For example, in the line
    automobil :nn:m
the lemma is "automobil", the POS tag is ":nn", and the lemma is annotated with one feature, ":m".
A lemma field may contain the hash sign (#), in which case the entry records a frequently misspelled form. For example, in the line
    mijesec#mjesec :nn:m:x
"mijesec" is the incorrect form, followed by the correct form "mjesec". Incorrect forms are additionally marked with the feature ":x".
The resource is encoded in ISO-8859-2 and sorted according to the standard Croatian lexicographic order.
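The line format described above can be sketched in Python as follows. This is a minimal illustration, not code shipped with the resource: the names Entry, parse_line and load_lexicon are hypothetical, the file path is whatever the user supplies, and the sketch assumes that the lemma and the tags are separated by one or more tab characters, as the regular expression requires.

```python
import re
from typing import Iterator, List, NamedTuple, Optional

# Same pattern as the Perl expression in the description:
# group 1 = lemma field, group 2 = POS tag, group 3 = concatenated features.
LINE_RE = re.compile(r"^(.*\S)\t+(:\w+)(.*)$")


class Entry(NamedTuple):
    lemma: str                   # form as listed (the misspelling, if marked with '#')
    pos: str                     # POS tag, e.g. ":nn"
    features: List[str]          # feature tags, e.g. [":m", ":x"]
    correct_form: Optional[str]  # correct spelling when the lemma field contains '#'


def parse_line(line: str) -> Optional[Entry]:
    """Parse one lexicon line; return None if it does not match the format."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    lemma, pos, rest = m.groups()
    # Split the concatenated feature string ":m:x" into [":m", ":x"].
    features = [":" + f.strip() for f in rest.split(":") if f.strip()]
    correct_form = None
    if "#" in lemma:
        # "incorrect#correct": keep the listed form, record the correction.
        lemma, correct_form = lemma.split("#", 1)
    return Entry(lemma, pos, features, correct_form)


def load_lexicon(path: str) -> Iterator[Entry]:
    """Yield entries from a lexicon file (hypothetical path), decoding ISO-8859-2."""
    with open(path, encoding="iso-8859-2") as fh:
        for line in fh:
            entry = parse_line(line)
            if entry is not None:
                yield entry
```

With this sketch, parse_line("automobil\t:nn:m") yields the lemma "automobil", the POS tag ":nn" and the single feature ":m", while a line whose lemma field is "mijesec#mjesec" yields "mijesec" as the listed form and "mjesec" as the correct form.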
Identifier:ELRA-L0065
http://catalog.elra.info/product_info.php?products_id=858
Language:Croatian
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-L0065
DateStamp:  2006-03-31
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2006. ELRA (European Language Resources Association).
Terms: dcmi_Text olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-L0065
Up-to-date as of: Wed Mar 29 3:49:57 EDT 2017