OLAC Record
oai:www.clarin.si:11356/1277

Metadata
Title:ELMo embeddings models for seven languages
Bibliographic Citation:http://hdl.handle.net/11356/1277
Creator:Ulčar, Matej
Date (W3CDTF):2019-11-25T14:34:36Z
Date Available:2019-11-25T14:34:36Z
Description:ELMo language models (https://github.com/allenai/bilm-tf) for producing contextual word embeddings, trained on large monolingual corpora for seven languages: Slovenian, Croatian, Finnish, Estonian, Latvian, Lithuanian and Swedish. Each model was trained for approximately 10 epochs. Training corpus sizes range from over 270 M tokens for Latvian to almost 2 B tokens for Croatian. For each language model, the roughly 1 million most common tokens were provided as the vocabulary during training. The models can also produce embeddings for out-of-vocabulary (OOV) words, since the neural network input is at the character level. Each model is in its own .tar.gz archive, consisting of two files: PyTorch weights (.hdf5) and options (.json). Both are needed for model inference with the allennlp Python library (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md).
Identifier (URI):http://hdl.handle.net/11356/1277
Language:Slovenian
Croatian
Finnish
Estonian
Latvian
Lithuanian
Swedish
Language (ISO639):slv
hrv
fin
est
lav
lit
swe
Publisher:Faculty of Computer and Information Science, University of Ljubljana
Replaces (URI):http://hdl.handle.net/11356/1257
Rights:GNU General Public Licence, version 3
http://opensource.org/licenses/GPL-3.0
Subject:ELMo
contextual embeddings
word embeddings
Slovenian language
Croatian language
Finnish language
Estonian language
Latvian language
Lithuanian language
Swedish language
Subject (ISO639):slv
hrv
fin
est
lav
lit
swe
Type:languageDescription
Type (DCMI):Text
Type (OLAC):language_description
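
The description above says each archive holds a weights file (.hdf5) and an options file (.json), and that both are needed for inference with allennlp. The following is a minimal sketch of that workflow, not the author's own code. The function names and the archive path passed in are hypothetical, and the file names inside the archive are not given in the record, so the files are located by extension only. The allennlp import is deferred so the extraction helper runs without allennlp installed; `Elmo` and `batch_to_ids` are assumed to be available from `allennlp.modules.elmo`, as in the allennlp ELMo tutorial linked above.

```python
import shutil
import tarfile
from pathlib import Path


def extract_elmo_archive(archive_path, target_dir):
    """Unpack one per-language .tar.gz archive.

    Returns (options_json_path, weights_hdf5_path). Files are matched by
    extension because the record does not state their names.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    found = {}
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            suffix = Path(member.name).suffix
            if member.isfile() and suffix in (".json", ".hdf5"):
                # Write under the bare file name to avoid unsafe archive paths.
                destination = target / Path(member.name).name
                with archive.extractfile(member) as src, open(destination, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                found[suffix] = destination
    if ".json" not in found or ".hdf5" not in found:
        raise ValueError("archive must contain one .json and one .hdf5 file")
    return found[".json"], found[".hdf5"]


def embed_sentences(options_file, weight_file, sentences):
    """Return contextual embeddings for a batch of tokenized sentences.

    `sentences` is a list of token lists, e.g. [["Dober", "dan"]].
    Requires allennlp (and PyTorch) to be installed.
    """
    from allennlp.modules.elmo import Elmo, batch_to_ids

    elmo = Elmo(str(options_file), str(weight_file),
                num_output_representations=1, dropout=0.0)
    character_ids = batch_to_ids(sentences)  # character-level input
    return elmo(character_ids)["elmo_representations"][0]
```

A call such as `embed_sentences(*extract_elmo_archive("slovenian.tar.gz", "models/sl"), [["Dober", "dan"]])` would then return one tensor of shape (batch, tokens, dimension); the archive name here is only a placeholder.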

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1277
DateStamp:  2019-11-25
GetRecord:  OAI-PMH request for simple DC format
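
The GetRecord entries above refer to OAI-PMH requests for this record in the OLAC and simple DC formats. As a rough illustration, such a request URL can be assembled from the OAI identifier and a metadata prefix. The base URL below is an assumption (the record does not state the repository's OAI-PMH endpoint), and `get_record_url` is a hypothetical helper name. The `verb`, `identifier` and `metadataPrefix` parameters come from the OAI-PMH protocol; `oai_dc` is its standard prefix for simple Dublin Core and `olac` is the prefix conventionally used for OLAC metadata.

```python
from urllib.parse import urlencode

# Assumption: OAI-PMH endpoint of the repository; not given in this record.
BASE_URL = "https://www.clarin.si/repository/oai/request"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Simple DC and OLAC variants for the record described here.
dc_url = get_record_url("oai:www.clarin.si:11356/1277", "oai_dc")
olac_url = get_record_url("oai:www.clarin.si:11356/1277", "olac")
```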

Search Info

Citation: Ulčar, Matej. 2019. Faculty of Computer and Information Science, University of Ljubljana.
Terms: area_Europe country_FI country_HR country_LT country_SE country_SI dcmi_Text iso639_est iso639_fin iso639_hrv iso639_lav iso639_lit iso639_slv iso639_swe olac_language_description

Inferred Metadata

Country: Finland
Croatia
Lithuania
Sweden
Slovenia
Area: Europe


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1277
Up-to-date as of: Sat Feb 15 9:26:25 EST 2020