OLAC Record
oai:www.clarin.si:11356/1261

Metadata
Title:Multilingual Culture-Independent Word Analogy Datasets
Bibliographic Citation:http://hdl.handle.net/11356/1261
Creator:Ulčar, Matej
Vaik, Kristiina
Lindström, Jessica
Linde, Dace
Dailidėnaitė, Milda
Šumakov, Andrei
Date (W3CDTF):2019-11-25T14:34:07Z
Date Available:2019-11-25T14:34:07Z
Description:The word analogy task evaluates word embeddings using analogous word pairs (e.g. "Paris - France" should be equivalent to "Rome - Italy", and "son - daughter" should be equivalent to "brother - sister"). The dataset was inspired by Mikolov's English analogy test set (http://download.tensorflow.org/data/questions-words.txt). It was first written for Slovenian and then partly translated and partly created from scratch for the other languages (Croatian, Finnish, Estonian, Swedish, Latvian, Lithuanian, Russian and English). The analogy dataset comprises fifteen categories, five semantic and ten syntactic. Each dataset has about 19,000 entries. In addition to the nine monolingual datasets (one for each language), we also composed 72 cross-lingual datasets (one for each language pair), in which one half of each entry (one analogy, e.g. "mother-father") is in one language and the other half (e.g. "sister-brother") is in another language.
Identifier (URI):http://hdl.handle.net/11356/1261
Language:Slovenian
Croatian
English
Finnish
Estonian
Latvian
Lithuanian
Swedish
Russian
Language (ISO639):slv
hrv
eng
fin
est
lav
lit
swe
rus
Publisher:Faculty of Computer and Information Science, University of Ljubljana
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
https://creativecommons.org/licenses/by-sa/4.0/
Subject:analogy
word analogies
multilingual
cross-lingual
Slovenian language
Croatian language
English language
Finnish language
Estonian language
Latvian language
Lithuanian language
Swedish language
Russian language
Subject (ISO639):slv
hrv
eng
fin
est
lav
lit
swe
rus
Type:lexicalConceptualResource
Type (DCMI):Text
Type (OLAC):lexicon
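
The Description field above says that an analogy such as "Paris - France" should be equivalent to "Rome - Italy". A minimal sketch of how such an entry is commonly scored is given here, assuming the usual vector-offset approach (take the offset between the first pair, apply it to the third word, and pick the nearest remaining word by cosine similarity). The record does not state which scoring method the authors used, so this is an illustration only. The toy 2-D vectors and the names `toy_vectors`, `cosine` and `solve_analogy` are hypothetical and are not taken from the dataset. The last lines check one reading of "72 cross-lingual datasets", namely one dataset per ordered pair of the nine listed languages.

```python
import itertools
import math

# Toy 2-D "embeddings", invented for illustration; real embeddings have
# hundreds of dimensions and are not part of this record.
toy_vectors = {
    "Paris": [1.0, 3.0],
    "France": [1.0, 1.0],
    "Rome": [3.0, 3.2],
    "Italy": [3.0, 1.0],
    "Berlin": [5.0, 3.0],
    "Germany": [5.0, 1.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Return the word d that best completes 'a is to b as c is to d'.

    Assumed scoring: target = vec(b) - vec(a) + vec(c), then the nearest
    word by cosine similarity, excluding the three query words.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# ISO 639 codes as listed in the record.
languages = ["slv", "hrv", "eng", "fin", "est", "lav", "lit", "swe", "rus"]

# Assumption: "one for each language pair" means ordered pairs (A-B and B-A
# are separate datasets), which gives 9 * 8 = 72.
cross_lingual_pairs = list(itertools.permutations(languages, 2))

if __name__ == "__main__":
    print(solve_analogy("Paris", "France", "Rome", toy_vectors))
    print(len(cross_lingual_pairs))
```

With these toy vectors the first call picks "Italy", and the pair count comes to 72, which matches the figure in the Description.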

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1261
DateStamp:  2019-11-25
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Ulčar, Matej; Vaik, Kristiina; Lindström, Jessica; Linde, Dace; Dailidėnaitė, Milda; Šumakov, Andrei. 2019. Multilingual Culture-Independent Word Analogy Datasets. Faculty of Computer and Information Science, University of Ljubljana.
Terms: area_Europe country_FI country_GB country_HR country_LT country_RU country_SE country_SI dcmi_Text iso639_eng iso639_est iso639_fin iso639_hrv iso639_lav iso639_lit iso639_rus iso639_slv iso639_swe olac_lexicon

Inferred Metadata

Country: Finland
United Kingdom
Croatia
Lithuania
Russian Federation
Sweden
Slovenia
Area: Europe


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1261
Up-to-date as of: Sat Feb 15 9:26:23 EST 2020