OLAC Record
oai:www.clarin.si:11356/1206

Metadata
Title: Word embeddings CLARIN.SI-embed.sr 1.0
Bibliographic Citation: http://hdl.handle.net/11356/1206
Creator: Ljubešić, Nikola
Date (W3CDTF): 2018-12-10T12:49:08Z
Date Available: 2018-12-10T12:49:08Z
Description: CLARIN.SI-embed.sr contains word embeddings induced from the srWaC web corpus. The embeddings are based on the skip-gram model of fastText trained on 554,606,544 tokens of running text for (1) 881,150 lowercased surface forms (e.g., "srbije") and (2) 599,416 lowercased lemmas with added part-of-speech information (e.g., "srbija#Np").
Identifier (URI): http://hdl.handle.net/11356/1206
Language: Serbian
Language (ISO639): srp
Publisher: Jožef Stefan Institute
Rights: Creative Commons - Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
Subject: word embeddings
lemmatisation
tagging
Serbian language
Subject (ISO639): srp
Type: lexicalConceptualResource
Type (DCMI): Text
Type (OLAC): lexicon
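
The Description above gives two kinds of vocabulary entries: lowercased surface forms (e.g., "srbije") and lowercased lemmas with a part-of-speech tag appended after "#" (e.g., "srbija#Np"). The sketch below shows one way lookup keys of both kinds could be built and split again. The function names are hypothetical, and the exact tag set and normalisation used in the resource are an assumption inferred only from the two examples in the record.

```python
def surface_key(token: str) -> str:
    """Build a lookup key for a surface form by lowercasing it."""
    return token.lower()


def lemma_key(lemma: str, pos: str) -> str:
    """Build a lemma key: lowercased lemma, '#', then the POS tag as given.

    Assumption: the tag keeps its original case, as in the record's
    example "srbija#Np".
    """
    return f"{lemma.lower()}#{pos}"


def split_lemma_key(key: str) -> tuple[str, str]:
    """Split a lemma key back into (lemma, pos) at the last '#'."""
    lemma, pos = key.rsplit("#", 1)
    return lemma, pos


print(surface_key("Srbije"))         # srbije
print(lemma_key("Srbija", "Np"))     # srbija#Np
print(split_lemma_key("srbija#Np"))  # ('srbija', 'Np')
```

Splitting at the last "#" rather than the first keeps the function usable for lemmas that themselves contain the character.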

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1206
DateStamp:  2018-12-10
GetRecord:  OAI-PMH request for simple DC format
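
The GetRecord entries above refer to OAI-PMH requests for this record. As a minimal sketch, such a request URL can be assembled from the OaiIdentifier and a metadata prefix. The base URL below is a hypothetical placeholder, not the repository's actual endpoint, and "olac" as the prefix for the OLAC format is an assumption; "oai_dc" is the prefix OAI-PMH defines for simple Dublin Core.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real OAI-PMH endpoint of the archive.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Assemble an OAI-PMH GetRecord request URL with percent-encoded arguments."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Simple DC request for the record described here.
print(get_record_url("oai:www.clarin.si:11356/1206", "oai_dc"))
# OLAC-format request, assuming the prefix is named "olac".
print(get_record_url("oai:www.clarin.si:11356/1206", "olac"))
```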

Search Info

Citation: Ljubešić, Nikola. 2018. Jožef Stefan Institute.
Terms: area_Europe country_RS dcmi_Text iso639_srp olac_lexicon

Inferred Metadata

Country: Serbia
Area: Europe


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1206
Up-to-date as of: Mon Jun 10 9:22:04 EDT 2019