OLAC Record
oai:www.clarin.si:11356/1063

Metadata
Title:Serbian web corpus srWaC 1.1
Bibliographic Citation:http://hdl.handle.net/11356/1063
Creator:Ljubešić, Nikola; Klubička, Filip
Date (W3CDTF):2016-05-12T15:32:55Z
Date Available:2016-05-12T15:32:55Z
Description:The Serbian web corpus srWaC was built by crawling the .rs top-level domain in 2014. The corpus was near-deduplicated at the paragraph level, normalised via diacritic restoration, morphosyntactically annotated and lemmatised. The paragraphs of the corpus are shuffled. Each paragraph carries metadata giving its URL, its domain and the result of language identification (Serbian vs. Croatian). Version 1.0 of this corpus is described in http://www.aclweb.org/anthology/W14-0405. Version 1.1 contains newer and better linguistic annotations.
Identifier (URI):http://hdl.handle.net/11356/1063
Language:Serbian
Language (ISO639):srp
Publisher:Jožef Stefan Institute
Rights:Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/
Subject:web corpus; tagging; lemmatisation
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  Slovenian language resource repository CLARIN.SI
Description:  http://www.language-archives.org/archive/clarin.si
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:www.clarin.si:11356/1063
DateStamp:  2018-10-24
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Ljubešić, Nikola; Klubička, Filip. 2016. Jožef Stefan Institute.
Terms: area_Europe country_RS dcmi_Text iso639_srp olac_primary_text


http://www.language-archives.org/item.php/oai:www.clarin.si:11356/1063
Up-to-date as of: Wed Jul 17 9:50:24 EDT 2019