OLAC Record
oai:catalogue.elra.info:ELRA-L0056

Metadata
Title: STO SprogTeknologisk Ordbase (Danish Lexicon for NLP/HLT Applications)
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF): 2005-08-01
Date Issued (W3CDTF): 2005-08-01
Date Modified (W3CDTF): 2005-08-03
Description: The STO Lexicon is the most comprehensive computational lexicon of Danish, comprising approx. 81,530 entry words. It is well integrated with European activities in lexicon development and builds on experience from the PAROLE and SIMPLE projects. The model and descriptive method of the STO lexicon are kept compatible with the architecture and descriptive language of PAROLE/SIMPLE. A number of refinements, adaptations and language-specific extensions to the basic model are implemented in STO.
Lexical coverage and encoded information by category are distributed as follows:
  Lexical Category                                  Lemmas  Morphology only  Morphology + Syntax  Morphology + Syntax + Semantics
  Noun                                              64,735  47%              41%                  12%
  Adjective                                         9,773   32%              55%                  13%
  Verb                                              5,775   2%               81%                  17%
  Adverb                                            771     81%              0%                   0%
  Interjection                                      158     100%             0%                   0%
  Preposition                                       80      100%             0%                   0%
  Conjunction                                       60      100%             0%                   0%
  Pronoun                                           445     100%             0%                   0%
  Misc.: Determiner, adposition, conjunction, etc.  128     100%             0%                   0%
  Total                                             81,524
A part of this vocabulary (i.e. 12,060 lemmas) is selected from 6 domains, as follows:
  Domain          Nouns   Verbs  Adjectives  Total
  IT              1,957   52     66          2,075
  Environment     2,055   48     285         2,388
  Commerce        1,537   16     57          1,610
  Administration  2,435   25     193         2,653
  Health          1,603   42     350         1,995
  Finance         1,258   24     57          1,339
  Total           10,845  207    1,008       12,060
Linguistic coverage / Main information types:
  Morphology: PoS, inflectional patterns, agreement features, noun compounding, spelling variants, etc.
  Syntax: subcategorisation frames (categorical and functional valency), alternation, diathesis, reflexivity, etc.
  Semantics: encoded at three different levels of specificity. The most specific is Level 3, which contains sense distinction, ontological type, argument structure, selectional restrictions, qualia structure, event structure, domain information, etc. Level 2 is a proper subset of Level 3 representing a leaner semantics (without qualia and event structure, etc.), whereas Level 1 concerns information on source domain only.
The resource was validated internally.
This lexicon is well suited for monolingual NLP/HLT applications, as a lexicon component in taggers, parsers, grammar and spell checkers, summarisation tools, web crawlers and computer-aided language learning, as well as for multilingual applications. It can also be linked to other PAROLE/SIMPLE-compatible resources. The lexicon comes with thorough documentation in English and is distributed on CD-ROM.
Identifier: ELRA-L0056
ISLRN: 050-677-531-676-8
Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-L0056/
Language: Danish
Language (ISO639): dan
Medium: Not specified
Publisher: ELRA (European Language Resources Association)
Type (DCMI): Text
Type (OLAC): lexicon

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-L0056
DateStamp:  2005-08-01
GetRecord:  OAI-PMH request for simple DC format
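
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, such a request URL could be composed as follows. The base URL is a placeholder, since the record does not state the repository endpoint. The helper name `get_record_url` is hypothetical, and the `olac` metadata prefix is an assumption about how the repository names its OLAC format. `oai_dc` is the prefix that OAI-PMH defines for simple Dublin Core.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (assumption): the record does not give the real base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:catalogue.elra.info:ELRA-L0056"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # percent-encoded by urlencode
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


# Simple Dublin Core request; "oai_dc" is defined by the OAI-PMH protocol.
dc_url = get_record_url(OAI_ID, "oai_dc")

# OLAC-format request; the "olac" prefix is an assumption.
olac_url = get_record_url(OAI_ID, "olac")

# Decoding the query string recovers the original identifier.
params = parse_qs(urlsplit(dc_url).query)
print(params["identifier"][0])
```

The repository's actual endpoint and supported prefixes would need to be confirmed, for example with an OAI-PMH `ListMetadataFormats` request, before relying on these URLs.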

Search Info

Citation: n.a. 2005. ELRA (European Language Resources Association).
Terms: area_Europe country_DK dcmi_Text iso639_dan olac_lexicon


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-L0056
Up-to-date as of: Fri Mar 8 7:25:32 EST 2024