OLAC Record
oai:catalogue.elra.info:ELRA-M0051

Metadata
Title:EnToSSLNE - a Lexicon of Parallel Named Entities from English to South Slavic Languages
Abstract:This lexicon consists of 26,155 parallel named entities in seven languages: English and six South Slavic languages (Bosnian, Bulgarian, Croatian, Macedonian, Serbian and Slovenian). The lexicon contains multiword entries which are not strictly named entities, but contain a word which is. Slovenian, Croatian and Bosnian are written in Latin script, Macedonian and Bulgarian in Cyrillic. Serbian is a special case, since it may be written in two scripts (Cyrillic and Latin) and has two dialects (ekavica and ijekavica); this lexicon uses the Serbian ekavica variant in Cyrillic script. The lexicon comes in two formats: csv and xml.
Access Rights:Rights available for: Commercial Use, Research Use
Date Available (W3CDTF):2019-04-24
Date Issued (W3CDTF):2019-05-07
Date Modified (W3CDTF):2019-05-07
Description:Multilingual Lexicons
This lexicon contains multiword entries which are not strictly named entities, but contain a word which is. For example, German shepherd is an entry in this lexicon: it is not a named entity in the strict sense, since many dogs of this breed exist, but the adjective German makes it a named entity in a broader sense. Accordingly, there are many multiword units in the lexicon which contain ethnonyms. Similarly, the unit Planck's law belongs to this lexicon as well. Certain natural terms like biological species and substances, which are sometimes considered named entities, are not included in the lexicon.
Languages
The lexicon consists of 26,155 parallel named entities in seven languages: English and six South Slavic languages (Bosnian, Bulgarian, Croatian, Macedonian, Serbian and Slovenian). Slovenian, Croatian and Bosnian are written in Latin script, Macedonian and Bulgarian in Cyrillic. Serbian is a special case, since it may be written in two scripts (Cyrillic and Latin) and has two dialects (ekavica and ijekavica); this lexicon uses the Serbian ekavica variant in Cyrillic script.
Classification
The tags used for named entities are: ORGANIZATION, LOCATION, PERSON, PRODUCT and MISC. Each named entity belongs to one of these classes. The classes comprise:
ORGANIZATION: political organizations, companies, schools, rock bands, sport teams
LOCATION: geographical terms, fictional places, cosmic terms
PERSON: humans, gods, saints, fictional characters
PRODUCT: industrial products, software products, weapons, art works, documents, concepts, standards, formats, anthems, algorithms, journals, coats of arms, platforms, websites
MISC: events, languages, peoples, tribes, alliances, orders, scientific discoveries, theories, titles, currencies, holidays, dynasties, positions, projects, historical periods, competitions, diseases, breeds, programs, sets of locations, awards, musical genres, missions, artistic directions, sets of organizations, networks
Each of the 26,155 entries is assigned one tag. The distribution of classes is as follows:
ORGANIZATION: 1,575 entries
LOCATION: 6,327 entries
PERSON: 8,584 entries
PRODUCT: 1,716 entries
MISC: 7,953 entries
Formats
The lexicon comes in two formats: csv and xml. The first row in the csv file is a title row and tab is used as a field separator, e.g.:
German Shepherd	Nemški ovčar	Njemački ovčar	Njemački ovčar	Немачки овчар	Германски овчар	Немска овчарка	MISC
In the xml file, the tag denoting the class is an attribute and languages are elements, e.g.:
German Shepherd Nemški ovčar Njemački ovčar Njemački ovčar Немачки овчар Германски овчар Немска овчарка
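The tab-separated format described under Formats can be read with a few lines of Python. The following is a minimal sketch, not the distributor's tooling: the header names in SAMPLE are hypothetical (the record does not list them, and the relative order of the Croatian and Bosnian columns is a guess), the class tag is assumed to be the last field as in the example row, and a real file is assumed to be UTF-8 encoded. The published class counts are included only to check that they add up to the stated 26,155 entries.

```python
import csv
import io
from collections import Counter

# Hypothetical header row; only the data row below is taken from the record.
SAMPLE = (
    "en\tsl\thr\tbs\tsr\tmk\tbg\tclass\n"
    "German Shepherd\tNemški ovčar\tNjemački ovčar\tNjemački ovčar\t"
    "Немачки овчар\tГермански овчар\tНемска овчарка\tMISC\n"
)

# Class distribution as stated in the record.
PUBLISHED_COUNTS = {
    "ORGANIZATION": 1575,
    "LOCATION": 6327,
    "PERSON": 8584,
    "PRODUCT": 1716,
    "MISC": 7953,
}


def read_lexicon(handle):
    """Return (header, rows) from a tab-separated lexicon file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)  # the first row is a title row
    rows = [row for row in reader if row]  # skip empty lines
    return header, rows


def class_counts(rows):
    """Count entries per class, assuming the class tag is the last field."""
    return Counter(row[-1] for row in rows)


header, rows = read_lexicon(io.StringIO(SAMPLE))
counts = class_counts(rows)
published_total = sum(PUBLISHED_COUNTS.values())
print(rows[0][0], dict(counts), published_total)
```

For the actual file, the same function could be called on open(path, encoding="utf-8", newline=""), where path is wherever the csv was saved.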
Identifier:ELRA-M0051
http://catalog.elra.info/product_info.php?products_id=1331
Language:English
Bosnian
Bulgarian
Croatian
Macedonian
Serbian
Slovenian
Language (ISO639):eng
bos
bul
mkd
slv
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-M0051
DateStamp:  2019-04-24
GetRecord:  OAI-PMH request for simple DC format
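The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch of how such a request URL is put together under the OAI-PMH 2.0 protocol (verb, identifier and metadataPrefix parameters): BASE_URL below is a placeholder, not the archive's real endpoint, and the prefixes "olac" and "oai_dc" are the conventional names for the OLAC and simple DC formats, assumed here rather than confirmed by the record.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org/oai"  # placeholder endpoint (assumption)
IDENTIFIER = "oai:catalogue.elra.info:ELRA-M0051"  # from the record


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, IDENTIFIER, "olac")   # OLAC format
dc_url = get_record_url(BASE_URL, IDENTIFIER, "oai_dc")   # simple DC format
print(olac_url)
print(dc_url)
```

The block only builds the URLs and does not contact any server.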

Search Info

Citation: n.a. 2019. ELRA (European Language Resources Association).
Terms: area_Europe country_BA country_BG country_GB country_MK country_SI dcmi_Text iso639_bos iso639_bul iso639_eng iso639_mkd iso639_slv olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-M0051
Up-to-date as of: Wed Jul 24 11:11:44 EDT 2019