OLAC Record
oai:lindat.mff.cuni.cz:11234/1-2517

Metadata
Title:FicTree 1.0
Bibliographic Citation:http://hdl.handle.net/11234/1-2517
Creator:Jelínek, Tomáš
Hnátková, Milena
Skoumalová, Hana
Date (W3CDTF):2017-11-15T19:20:19Z
Date Available:2017-11-15T19:20:19Z
Description:FicTree is a dependency treebank of Czech fiction manually annotated in the format of the analytical layer of the Prague Dependency Treebank (PDT). The treebank consists of 12,760 sentences (166,432 tokens). The texts come from eight literary works published in the Czech Republic between 1991 and 2007. The syntactic annotation of the treebank was first performed by two distinct parsers (MSTParser and MaltParser) trained on the PDT training data, and the output of each parser was then manually corrected. Any differences between the two corrected versions were resolved manually by another annotator. The corpus is provided in a vertical format, where sentence boundaries are marked with a blank line. Every word form is written on a separate line, followed by five tab-separated attributes: lemma, tag, ID (word index in the sentence), head and deprel (analytical function, afun in the PDT formalism). The texts are shuffled in random chunks of at most 100 words (respecting sentence boundaries). Each chunk is provided as a separate file, with the suggested division into train, dev and test sets written as a file prefix.
Identifier (URI):http://hdl.handle.net/11234/1-2517
Language:Czech
Language (ISO639):ces
Publisher:Charles University, Faculty of Arts, Institute of Theoretical and Computational Linguistics
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
http://creativecommons.org/licenses/by-nc-sa/4.0/
Subject:treebank
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text
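
The vertical format described above (one token per line, six tab-separated columns, blank line between sentences) can be read with a short routine. The following is a minimal sketch, not the distributors' tooling: the function name `read_sentences`, the field names, and the assumption that ID and head are integers are illustrative choices, and the sample tokens (including the placeholder tag `TAG`) are invented for the example rather than taken from the corpus.

```python
from typing import Dict, Iterable, Iterator, List, Union

# Column order as given in the description: word form, then five attributes.
FIELDS = ("form", "lemma", "tag", "id", "head", "deprel")

Token = Dict[str, Union[str, int]]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of token dicts from vertical-format lines."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line marks a sentence boundary.
            if sentence:
                yield sentence
                sentence = []
            continue
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(parts)}: {line!r}")
        token: Token = dict(zip(FIELDS, parts))
        # Assumption: ID and head are integer word indices within the sentence.
        token["id"] = int(parts[3])
        token["head"] = int(parts[4])
        sentence.append(token)
    if sentence:
        # Emit the last sentence if the input does not end with a blank line.
        yield sentence


# Invented sample data, for illustration only.
sample = (
    "Pes\tpes\tTAG\t1\t2\tSb\n"
    "štěká\tštěkat\tTAG\t2\t0\tPred\n"
    ".\t.\tTAG\t3\t0\tAuxK\n"
    "\n"
    "Ano\tano\tTAG\t1\t0\tExD\n"
)
sentences = list(read_sentences(sample.splitlines(keepends=True)))
```

To read a chunk file, pass an open file handle instead of the sample lines, for example `read_sentences(open(path, encoding="utf-8"))`, where `path` and the UTF-8 encoding are assumptions about the distributed files.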

OLAC Info

Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-2517
DateStamp:  2018-07-02
GetRecord:  OAI-PMH request for simple DC format
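
The two GetRecord requests listed above (OLAC format and simple DC format) follow the standard OAI-PMH query pattern: `verb=GetRecord`, the record's `identifier`, and a `metadataPrefix`. The sketch below builds such request URLs. The base URL `https://example.org/oai` is a placeholder, since this record does not state the repository endpoint, and the helper name `get_record_url` is hypothetical; `olac` and `oai_dc` are the prefixes conventionally used for the OLAC and simple Dublin Core formats.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint: the real repository base URL is not given in this record.
BASE_URL = "https://example.org/oai"
OAI_IDENTIFIER = "oai:lindat.mff.cuni.cz:11234/1-2517"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record and one format."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")

# Round-trip check: the percent-encoded query decodes back to the identifier.
decoded = parse_qs(urlsplit(olac_url).query)
```

`urlencode` percent-encodes the colons and the slash in the identifier, which keeps the query string unambiguous; the server decodes them back to the original value.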

Search Info

Citation: Jelínek, Tomáš; Hnátková, Milena; Skoumalová, Hana. 2017. FicTree 1.0. Charles University, Faculty of Arts, Institute of Theoretical and Computational Linguistics.
Terms: area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-2517
Up-to-date as of: Sun Sep 1 18:24:44 EDT 2019