OLAC Record: KORLEX – Serbian Lexicon

OLAC Record
oai:catalogue.elra.info:ELRA-L0066

Metadata

Title: KORLEX – Serbian Lexicon

Access Rights: Rights available for: nonCommercialUse, commercialUse

Date Available (W3CDTF): 2006-03-31

Date Issued (W3CDTF): 2006-03-31

Date Modified (W3CDTF): 2006-03-31

Description: This lexical resource was developed as part of the bilingual lexicon for English-Serbian built for the following project: http://www.rjecnik.com.

The lexicon data is compiled with the objective of covering the majority of text circulating in everyday use, such as in the news (e.g., newswire articles), in business, technological documentation, legal documentation, and politics. Words that are primarily used in literary and religious contexts, and which are not part of everyday usage, are generally not included in the lexicon.

The KORLEX-Serbian Lexicon provides a list of 108,491 Serbian lemmas, i.e., words in canonical form, annotated with a part-of-speech (POS) tag and lexical features. Among these 108,491 entries, there are 52,027 nouns, 9,153 adverbs, 15,522 verbs and 31,052 adjectives. The remaining entries are pronouns, determiners, prepositions/postpositions, conjunctions and numerals.

The resource is a flat text file in which each line contains information about one lemma. The format of a line can be captured with the following Perl regular expressions:

    # Characters appearing in a word (ISO-8859-2)
    $c = qr/[-{}|.\/\d\w\xA9\xAE\xB9\xBE\xC6\xC8\xD0\xE6\xE8\xF0]/;
    # A lemma
    $m = qr/$c+(?: $c+)*/;
    # A lemma specification (each line in the resource)
    /^($m(?:#$m)?)\s+:(\w+)([\w:]+)\r?$/

In the last expression, $1 is a lemma, $2 is the POS tag, and $3 is a concatenated list of features. A typical line is:

    vrata:nn:f

in which "vrata" is a lemma, with POS being "nn", and features including "f" gender.

A lemma may contain the hash sign (#), in which case it denotes a frequently misspelled form. For example, in:

    bidem#budem:spec:x

"bidem" is an incorrect form, followed by the correct form "budem". Additionally, the incorrect forms are marked with the feature ":x".

Local linguistic variants are tagged with :ek and :ije for ekavian and ijekavian forms respectively. Ekavian is spoken in Serbia, while ijekavian is spoken in Montenegro and Republika Srpska (Bosnia). For example:

    mesec:nn:m:ek
    mjesec:nn:m:ije

The resource is encoded in ISO-8859-2 and sorted according to the standard Serbian lexicographic order.
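
The line format described above can be read with a short script. The following is a minimal Python sketch, not code distributed with the resource. The names Entry, parse_line and read_lexicon are hypothetical. It makes two assumptions that depart from the Perl expression: whitespace before the first colon is treated as optional, because the sample lines in this record contain none, and features are matched as a sequence of ":name" groups so that they can be returned as a list.

```python
import re
from typing import Iterator, NamedTuple, Optional

# The ten bytes listed in the record's character class, decoded from
# ISO-8859-2. Python's \w already covers them on decoded text; they are
# kept only to mirror the Perl expression.
EXTRA_LETTERS = bytes(
    [0xA9, 0xAE, 0xB9, 0xBE, 0xC6, 0xC8, 0xD0, 0xE6, 0xE8, 0xF0]
).decode("iso-8859-2")

CHAR = r"[-{}|./\d\w" + EXTRA_LETTERS + "]"
LEMMA = CHAR + "+(?: " + CHAR + "+)*"

# Assumption: \s* instead of the record's \s+, and (?::\w+)* for features.
LINE_RE = re.compile(
    "^(" + LEMMA + "(?:#" + LEMMA + r")?)\s*:(\w+)((?::\w+)*)\r?$"
)


class Entry(NamedTuple):
    lemma: str                  # canonical (correct) form
    misspelling: Optional[str]  # form before '#', if the line has one
    pos: str                    # part-of-speech tag, e.g. "nn"
    features: tuple             # e.g. ("m", "ije")


def parse_line(line: str) -> Optional[Entry]:
    """Parse one lexicon line; return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    head, pos, feats = match.groups()
    wrong, sep, right = head.partition("#")
    lemma = right if sep else wrong
    misspelling = wrong if sep else None
    features = tuple(f for f in feats.split(":") if f)
    return Entry(lemma, misspelling, pos, features)


def read_lexicon(path: str) -> Iterator[Entry]:
    """Yield entries from a lexicon file, decoding it as ISO-8859-2."""
    with open(path, encoding="iso-8859-2") as handle:
        for line in handle:
            entry = parse_line(line)
            if entry is not None:
                yield entry
```

With these assumptions, parse_line("bidem#budem:spec:x") yields the lemma "budem" with "bidem" recorded as its misspelling, and the "ek" or "ije" feature can be used to filter entries by dialect variant. The path passed to read_lexicon depends on how the resource is delivered, which this record does not specify.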

Identifier: ELRA-L0066

ISLRN: 514-505-478-814-0

Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-L0066/

Language: Serbian

Language (ISO639): srp

Medium: Not specified

Publisher: ELRA (European Language Resources Association)

Type (DCMI): Text

Type (OLAC): lexicon

OLAC Info

Archive: ELRA Catalogue of Language Resources

Description: http://www.language-archives.org/archive/catalogue.elra.info

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-L0066

DateStamp: 2006-03-31

GetRecord: OAI-PMH request for simple DC format

Search Info
Citation: n.a. 2006. ELRA (European Language Resources Association).
Terms: area_Europe country_RS dcmi_Text iso639_srp olac_lexicon

http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-L0066
Up-to-date as of: Wed Oct 1 0:55:40 EDT 2025
