OLAC Record
oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/OPEN-980

Metadata
Title:It-Sr-NER: CLARIN compatible NER and geoparsing web services for parallel texts: case study Italian and Serbian
Bibliographic Citation:http://hdl.handle.net/20.500.11752/OPEN-980
Creator:Perišić, Olja
Stanković, Ranka
Vitas, Duško
Krstev, Cvetana
Moderc, Saša
Date (W3CDTF):2022-09-22T12:55:44Z
Date Available:2022-09-22T12:55:44Z
Description:It-Sr-NER-corp is an Italian/Serbian bilingual corpus of 10,000 aligned sentences, compiled within the It-Sr-project from samples of several Italian novels translated into Serbian and vice versa. Its aim is the development of a CLARIN-compatible NER web service for parallel texts, with Italian and Serbian as the case study. The 10,000 natural language segments are split into 4 files: one of 1,000 segments and three of 3,000 segments each (1*1000+3*3000). The corpus comprises: 1) text versions, Italian and Serbian, with one segment per line; 2) TMX (Translation Memory eXchange) bilingual aligned segments; 3) monolingual text and TMX files with automatically annotated named entities for six NER classes: demonyms (DEMO), works of art (WORK), person names (PERS), places (LOC), events (EVENT) and organizations (ORG). It-Sr-NER annotation uses a powerful Convolutional Neural Network architecture within the spaCy tool: for Italian, WikiNER (Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, James R Curran), and for Serbian, SrpCNNER (Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić, Branislava Šandrih Todorović).
Identifier (URI):http://hdl.handle.net/20.500.11752/OPEN-980
Language:Serbian
Italian
Language (ISO639):srp
ita
Publisher:Università degli studi di Torino
Rights:Creative Commons - Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0
Subject:NER
TMX
Named Entity Recognition
aligned corpus
Serbian
Italian
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
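
Usage sketch: the Description above lists TMX files with bilingual aligned segments. The following is a minimal, unofficial sketch of how such a file might be read into (Italian, Serbian) sentence pairs using only the Python standard library. The function name `read_tmx_pairs`, the language codes "it" and "sr", and the sample sentences are assumptions made for illustration; they are not taken from the corpus, and the actual files may use different language tags. The record does not specify how named entities are marked inside the annotated TMX files, so this sketch simply flattens any inline markup to plain text.

```python
import xml.etree.ElementTree as ET

# Fully qualified name of the xml:lang attribute as ElementTree exposes it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented two-language sample in TMX layout (NOT taken from the corpus).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="it"/>
  <body>
    <tu>
      <tuv xml:lang="it"><seg>Maria vive a Torino.</seg></tuv>
      <tuv xml:lang="sr"><seg>Marija živi u Torinu.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, src_lang="it", tgt_lang="sr"):
    """Return a list of (source, target) segment pairs from TMX text.

    The language codes are assumptions; adjust them to the tags that
    the actual files use.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned segment
        segments = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary subtag ("it-IT" -> "it") and
                # flatten any inline markup inside <seg> to plain text.
                segments[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


if __name__ == "__main__":
    for italian, serbian in read_tmx_pairs(SAMPLE_TMX):
        print(italian, "|", serbian)
```

For a file on disk, the same function could be applied to the file's contents read as UTF-8 text; the file names of the four corpus parts are not given in this record.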

OLAC Info

Archive:  ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa
Description:  http://www.language-archives.org/archive/dspace-clarin-it.ilc.cnr.it
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/OPEN-980
DateStamp:  2022-09-22
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Perišić, Olja; Stanković, Ranka; Vitas, Duško; Krstev, Cvetana; Moderc, Saša. 2022. Università degli studi di Torino.
Terms: area_Europe country_IT country_RS dcmi_Text iso639_ita iso639_srp olac_primary_text


http://www.language-archives.org/item.php/oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/OPEN-980
Up-to-date as of: Tue Sep 19 0:43:06 EDT 2023