OLAC Record

Title:Etalon 1.0
Bibliographic Citation:http://hdl.handle.net/11234/1-3698
Creator:Skoumalová, Hana
Date (W3CDTF):2021-06-02T11:23:22Z
Date Available:2021-06-02T11:23:22Z
Description:Etalon is a manually annotated corpus of contemporary Czech. The corpus contains 1,885,589 words (2,265,722 tokens) and is annotated in the same way as SYN2020 of the Czech National Corpus. The corpus includes fiction (ca. 24%), professional and scientific literature (ca. 40%) and newspapers (ca. 36%). The corpus is provided in a vertical format, where sentence boundaries are marked with a blank line. Every word form is written on a separate line, followed by five tab-separated attributes: syntactic word, lemma, sublemma, tag and verbtag. The texts are shuffled in random chunks of at most 100 words (respecting sentence boundaries).
Identifier (URI):http://hdl.handle.net/11234/1-3698
Language (ISO639):ces
Publisher:Charles University, Faculty of Arts, Institute of Theoretical and Computational Linguistics
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Subject:annotated corpus; morphological annotation
Type (DCMI):Text
Type (OLAC):primary_text
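
The vertical format described in the Description field can be read with a few lines of Python. The following is a minimal sketch, assuming exactly six tab-separated columns per token line (the word form plus the five attributes listed above) and no structural markup other than blank lines. The names `Token`, `read_sentences` and `SAMPLE` are hypothetical, and the tag and verbtag values in the sample are placeholders, not real Etalon annotations.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One token line: the word form followed by its five attributes."""
    word: str
    syntactic_word: str
    lemma: str
    sublemma: str
    tag: str
    verbtag: str


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences (lists of tokens) from vertical-format lines.

    A blank line closes the current sentence. Assumes six tab-separated
    columns per token line; any other column count raises ValueError.
    """
    sentence: List[Token] = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: sentence boundary.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(
                f"line {number}: expected 6 columns, got {len(fields)}"
            )
        sentence.append(Token(*fields))
    # Emit the last sentence if the input does not end with a blank line.
    if sentence:
        yield sentence


# Placeholder sample: "TAG" and "VTAG" stand in for real annotation values.
SAMPLE = (
    "Pes\tPes\tpes\tpes\tTAG\tVTAG\n"
    "spí\tspí\tspát\tspát\tTAG\tVTAG\n"
    ".\t.\t.\t.\tTAG\tVTAG\n"
    "\n"
    "Ahoj\tAhoj\tahoj\tahoj\tTAG\tVTAG\n"
)

sentences = list(read_sentences(SAMPLE.splitlines()))
```

For a file on disk, passing an open text file (for example `open(path, encoding="utf-8")`, where the path and the encoding are assumptions) to `read_sentences` works the same way, since the function only iterates over lines.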


Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-3698
DateStamp:  2021-06-29
GetRecord:  OAI-PMH request for simple DC format
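
The GetRecord requests listed above follow the standard OAI-PMH pattern: a base URL plus the `verb`, `identifier` and `metadataPrefix` arguments. The following is a minimal sketch of how such a request URL can be assembled. The base URL is a placeholder (the repository's real endpoint is not given in this record), `get_record_url` is a hypothetical helper, and the prefixes `olac` and `oai_dc` are the conventional ones for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode

# Identifier taken from the OaiIdentifier field of this record.
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-3698"

# Placeholder only: replace with the repository's actual OAI-PMH endpoint.
BASE_URL = "https://example.org/oai/request"


def get_record_url(base_url: str, metadata_prefix: str,
                   identifier: str = OAI_IDENTIFIER) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, "olac")      # OLAC format
dc_url = get_record_url(BASE_URL, "oai_dc")      # simple DC format
```

The resulting URL can be fetched with any HTTP client; the response is an XML document containing the record in the requested metadata format.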

Search Info

Citation: Skoumalová, Hana. 2021. Etalon 1.0. Charles University, Faculty of Arts, Institute of Theoretical and Computational Linguistics.
Terms: area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text

Up-to-date as of: Thu Oct 5 0:41:21 EDT 2023