OLAC Record
oai:lindat.mff.cuni.cz:11234/1-2579

Metadata
Title:ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio)
Bibliographic Citation:http://hdl.handle.net/11234/1-2579
Creator:Kopřivová, Marie
Komrsková, Zuzana
Lukeš, David
Poukarová, Petra
Škarpová, Marie
Date (W3CDTF):2018-01-02T12:24:15Z
Date Available:2018-01-02T12:24:15Z
Description:ORTOFON v1 is designed to represent authentic spoken Czech as used in informal situations (private environment, spontaneity, lack of preparation, etc.) across the whole Czech Republic. The corpus consists of 332 recordings made in 2012–2017 and contains 1,014,786 orthographic words (i.e. 1,236,508 tokens in total, including punctuation); 624 different speakers appear in the recordings. ORTOFON v1 is fully balanced with respect to the basic sociolinguistic speaker categories (gender, age group, level of education and region of childhood residence). The transcription is linked to the corresponding audio track. Unlike the ORAL-series corpora, the transcription was carried out on two main tiers, orthographic and phonetic, supplemented by an additional metalanguage tier. ORTOFON v1 is lemmatized and morphologically tagged. The (anonymized) transcriptions are provided in the XML-based ELAN Annotation Format (EAF); the audio (with the corresponding anonymization beeps) is uncompressed 16-bit PCM WAV, mono, 16 kHz. The transcriptions are also available in another format under the less restrictive CC BY-NC-SA license at http://hdl.handle.net/11234/1-2580
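The description names two deliverable formats: ELAN EAF transcriptions with several tiers, and 16 kHz mono 16-bit PCM WAV audio. The following is a minimal sketch, using only the Python standard library, of how one might read time-aligned annotations per tier and check that a WAV file matches the stated audio parameters. It assumes the generic ELAN EAF layout (TIME_ORDER/TIME_SLOT, TIER, ALIGNABLE_ANNOTATION/ANNOTATION_VALUE); the record does not list ORTOFON's actual tier IDs, and the helper names read_eaf_tiers and wav_matches_spec are hypothetical.

```python
import wave
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, List, Tuple, Union

Source = Union[str, BinaryIO]  # a file path or an open binary file object


# Hypothetical helpers for illustration; not part of any official ORTOFON tooling.
def read_eaf_tiers(source: Source) -> Dict[str, List[Tuple[int, int, str]]]:
    """Map each TIER_ID to its time-aligned annotations as (start_ms, end_ms, text)."""
    root = ET.parse(source).getroot()
    # TIME_ORDER maps slot IDs to offsets in milliseconds; unaligned slots lack TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers: Dict[str, List[Tuple[int, int, str]]] = {}
    for tier in root.iter("TIER"):
        rows = []
        # Only ALIGNABLE_ANNOTATION carries time slots; REF_ANNOTATION tiers come back empty.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # boundary not anchored to a time value
            rows.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


def wav_matches_spec(source: Source, rate: int = 16000, channels: int = 1,
                     sample_width: int = 2) -> bool:
    """Check the audio parameters stated in the record: 16 kHz, mono, 16-bit (2-byte) PCM."""
    with wave.open(source, "rb") as w:
        return (
            w.getframerate() == rate
            and w.getnchannels() == channels
            and w.getsampwidth() == sample_width
        )
```

Applied to an ORTOFON .eaf file, read_eaf_tiers returns one entry per tier, so segments on the orthographic and phonetic tiers could be compared by their time boundaries; the real TIER_ID values have to be taken from the data itself.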
Identifier (URI):http://hdl.handle.net/11234/1-2579
Language:Czech
Language (ISO639):ces
Publisher:Charles University, Faculty of Arts, Institute of the Czech National Corpus
Rights:License Agreement for Czech National Corpus Data
https://lindat.mff.cuni.cz/repository/xmlui/page/license-cnc-data
Subject:balanced corpus
spoken language
informal language
Czech
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-2579
DateStamp:  2018-07-02
GetRecord:  OAI-PMH request for simple DC format
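The GetRecord entries above refer to standard OAI-PMH requests, which are plain HTTP GETs carrying the parameters verb, identifier and metadataPrefix. Below is a minimal sketch that builds such a request for this record's OaiIdentifier and extracts Dublin Core values from a response. The base URL is an assumption (it is not given in the record) and should be checked against the repository's documentation; the helper names get_record_url and dc_values are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: LINDAT's OAI-PMH endpoint; this URL is not stated in the record, verify before use.
BASE_URL = "https://lindat.mff.cuni.cz/repository/oai/request"
OAI_ID = "oai:lindat.mff.cuni.cz:11234/1-2579"  # OaiIdentifier from the record
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace used inside oai_dc


# Hypothetical helpers for illustration.
def get_record_url(identifier: str, metadata_prefix: str = "oai_dc",
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord URL; verb, identifier and metadataPrefix are all required."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def dc_values(response_xml: bytes, element: str) -> list:
    """Collect the text of every dc:<element> (e.g. 'title', 'creator') in a GetRecord response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(f"{{{DC_NS}}}{element}")]
```

get_record_url(OAI_ID) corresponds to the simple DC request; get_record_url(OAI_ID, "olac") would request the OLAC format, assuming the repository uses the metadataPrefix "olac" conventional for OLAC data providers. The response body (for example from urllib.request.urlopen(url).read()) can then be passed to dc_values.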

Search Info

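Citation: Kopřivová, Marie; Komrsková, Zuzana; Lukeš, David; Poukarová, Petra; Škarpová, Marie. 2018. ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcription (transcriptions & audio). Charles University, Faculty of Arts, Institute of the Czech National Corpus.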
Terms: area_Europe country_CZ dcmi_Text iso639_ces olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-2579
Up-to-date as of: Fri Nov 15 9:33:19 EST 2019