OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-1498

Metadata
Title:SoNaR Corpus
Bibliographic Citation:http://hdl.handle.net/11372/LRT-1498
Creator:Radboud University, CLST; Tilburg University, ILK; University of Twente, HMI; University College Ghent, Faculty of Translation Studies; KU Leuven, CCL; Utrecht University, UiL OTS
Date (W3CDTF):2015-06-29T13:23:32Z
Date Available:2015-06-29T13:23:32Z
Description:The SoNaR corpus is a 500-million-word reference corpus of contemporary written Dutch. It consists of two parts, SoNaR500 and SONAR1. SoNaR500 contains over 500 million words (i.e. word tokens) of full texts from a wide variety of text types. All texts were tokenized, POS-tagged and lemmatized, and the named entities were labelled. All annotations in SoNaR500 were generated automatically. SONAR1 is largely a subset of SoNaR500 and contains 1 million words. SONAR1 was enriched with several types of semantic annotation: named entity labelling, coreference resolution, annotation of spatial and temporal expressions, and annotation of semantic roles. All annotations in SONAR1 were manually verified. The new media texts (tweets, chats and SMS), which were also collected during the STEVIN project SONAR, are not part of the SoNaR corpus. They are distributed separately as the SoNaR New Media Corpus.
Identifier (URI):http://hdl.handle.net/11372/LRT-1498
Language:Dutch
Language (ISO639):nld
Publisher:Dutch-Flemish HLT Agency
Subject:monolingual corpus; annotated corpus; written language
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11372/LRT-1498
DateStamp:  2016-04-06
GetRecord:  OAI-PMH request for simple DC format
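
The GetRecord entries in this record refer to OAI-PMH requests for this item. As a minimal sketch of how such a request URL could be assembled: the base URL below is a hypothetical placeholder, since this record does not state the archive's OAI-PMH endpoint, and the helper name `get_record_url` is invented for illustration. The metadata prefixes `olac` and `oai_dc` are assumed to be the ones the archive exposes for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the record does not give the archive's OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:lindat.mff.cuni.cz:11372/LRT-1498"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Assumed prefixes: "olac" for the OLAC format, "oai_dc" for simple Dublin Core.
olac_url = get_record_url(OAI_ID, "olac")
dc_url = get_record_url(OAI_ID, "oai_dc")

# Decode the query string again to confirm the identifier survives URL encoding.
decoded = parse_qs(urlsplit(olac_url).query)
print(decoded["identifier"][0])
```

Replacing `BASE_URL` with the archive's actual endpoint would be needed before sending a request; the sketch only builds and inspects the URLs.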

Search Info

Citation: Radboud University, CLST; Tilburg University, ILK; University of Twente, HMI; University College Ghent, Faculty of Translation Studies; KU Leuven, CCL; Utrecht University, UiL OTS. 2015. Dutch-Flemish HLT Agency.
Terms: area_Europe country_NL dcmi_Text iso639_nld olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-1498
Up-to-date as of: Sun Oct 22 1:40:52 EDT 2017