OLAC Record
oai:lindat.mff.cuni.cz:11234/1-1593

Metadata
Title:SYN2015: representative corpus of written Czech
Bibliographic Citation:http://hdl.handle.net/11234/1-1593
Creator:Křen, Michal
Cvrček, Václav
Čapka, Tomáš
Čermáková, Anna
Hnátková, Milena
Chlumská, Lucie
Kováříková, Dominika
Jelínek, Tomáš
Petkevič, Vladimír
Procházka, Pavel
Skoumalová, Hana
Škrabal, Michal
Truneček, Petr
Vondřička, Pavel
Zasina, Adrian
Date (W3CDTF):2015-12-23T09:16:12Z
Date Available:2015-12-23T09:16:12Z
Description:Representative corpus of contemporary written Czech containing 100 million words. It was created to represent the printed language of 2010–2014 and contains a wide range of text types (fiction, professional literature, newspapers, etc.). The corpus is lemmatized and morphologically and syntactically annotated by a combination of stochastic and rule-based methods. It is provided in a (semi-XML) vertical format used as input to the Manatee query engine. The data thus correspond to the corpus available to registered users of the CNC via the KonText query interface, with one important exception: they are shuffled, i.e. divided into blocks of at most 100 words (respecting sentence boundaries), with the order of the blocks randomized within each document.
Identifier (URI):http://hdl.handle.net/11234/1-1593
Language:Czech
Language (ISO639):ces
Publisher:Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague
Rights:Czech National Corpus (Shuffled Corpus Data)
https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc
Subject:representative corpus
written language
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
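
The shuffling mentioned in the Description field can be illustrated with a minimal sketch. It is not the procedure actually used by the CNC: the function names, the greedy packing of sentences into blocks, and the handling of a sentence longer than the limit (kept whole as its own block) are all assumptions made for illustration.

```python
import random
from typing import List, Optional

Sentence = List[str]        # one sentence as a list of word tokens
Block = List[Sentence]      # a block is a run of whole sentences


def make_blocks(sentences: List[Sentence], max_words: int = 100) -> List[Block]:
    """Greedily pack whole sentences into blocks of at most max_words words.

    Assumption: a single sentence longer than max_words is never split;
    it simply forms a block of its own.
    """
    blocks: List[Block] = []
    current: Block = []
    count = 0
    for sent in sentences:
        # Close the current block if adding this sentence would exceed the limit.
        if current and count + len(sent) > max_words:
            blocks.append(current)
            current, count = [], 0
        current.append(sent)
        count += len(sent)
    if current:
        blocks.append(current)
    return blocks


def shuffle_document(sentences: List[Sentence], max_words: int = 100,
                     seed: Optional[int] = None) -> List[Block]:
    """Split one document into blocks and randomize the block order."""
    blocks = make_blocks(sentences, max_words)
    random.Random(seed).shuffle(blocks)   # shuffling stays within this document
    return blocks
```

Sentence order inside each block is preserved; only the order of the blocks within a document changes, so local context survives while the full running text cannot be read in sequence.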

OLAC Info

Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-1593
DateStamp:  2018-07-02
GetRecord:  OAI-PMH request for simple DC format
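
The GetRecord requests listed in this record follow the standard OAI-PMH query form (verb, identifier, metadataPrefix). The sketch below shows how such a request URL could be built. The base URL is a hypothetical placeholder, since the repository's actual OAI-PMH endpoint is not given in this record, and the helper name get_record_url is likewise an assumption. The prefix oai_dc is the one the OAI-PMH specification defines for simple Dublin Core; olac is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL with percent-encoded arguments."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    oai_id = "oai:lindat.mff.cuni.cz:11234/1-1593"
    print(get_record_url(BASE_URL, oai_id, "oai_dc"))  # simple DC format
    print(get_record_url(BASE_URL, oai_id, "olac"))    # OLAC format (assumed prefix)
```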

Search Info

Citation: Křen, Michal; Cvrček, Václav; Čapka, Tomáš; Čermáková, Anna; Hnátková, Milena; Chlumská, Lucie; Kováříková, Dominika; Jelínek, Tomáš; Petkevič, Vladimír; Procházka, Pavel; Skoumalová, Hana; Škrabal, Michal; Truneček, Petr; Vondřička, Pavel; Zasina, Adrian. 2015. SYN2015: representative corpus of written Czech. Faculty of Arts, Institute of the Czech National Corpus, Charles University in Prague.
Terms: area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-1593
Up-to-date as of: Sun Jul 28 14:40:51 EDT 2019