OLAC Record
oai:lindat.mff.cuni.cz:11234/1-3719

Metadata
Title:ESIC 1.0 -- Europarl Simultaneous Interpreting Corpus
Bibliographic Citation:http://hdl.handle.net/11234/1-3719
Creator:Macháček, Dominik
Žilinec, Matúš
Bojar, Ondřej
Date (W3CDTF):2021-06-18T09:30:16Z
Date Available:2021-06-18T09:30:16Z
Description:ESIC (Europarl Simultaneous Interpreting Corpus) is a corpus of 370 speeches (10 hours) in English, with manual transcripts, transcribed simultaneous interpreting into Czech and German, and parallel translations. The corpus contains the source English video and audio. The interpreters' voices are not published within the corpus, but there is a tool that downloads them from the website of the European Parliament, where they are publicly available. The transcripts are punctuated, carry word-level timestamps, and are annotated with metadata (disfluencies, mixing of voices and languages, read or spontaneous speech, etc.). The speeches in the corpus come from the European Parliament plenary sessions in the period 2008-11. Most of the speakers are MEPs, both native and non-native speakers of English. The corpus contains metadata about the speakers (name, surname, id, fraction) and about the speech (date, topic, read or spontaneous). The current version of ESIC is v1.0. It has validation and evaluation parts.
Identifier (URI):http://hdl.handle.net/11234/1-3719
Language:English
Czech
German
Language (ISO639):eng
ces
deu
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
http://creativecommons.org/licenses/by-nc-sa/4.0/
Subject:simultaneous interpreting
interpreting
ASR evaluation
automatic machine translation evaluation
Europarl
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-3719
DateStamp:  2022-08-29
GetRecord:  OAI-PMH request for simple DC format
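
The GetRecord entries in this record correspond to OAI-PMH requests for the identifier above. A minimal sketch of how such a request URL can be assembled is given here. The base URL is a hypothetical placeholder, since the repository's actual OAI-PMH endpoint is not stated in this record, and the helper name build_get_record_url is illustrative only. The metadata prefixes oai_dc (simple DC) and olac (OLAC format) are the conventional names and are assumed to be the ones this archive uses.

```python
from urllib.parse import urlencode


def build_get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Assemble an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical endpoint: replace with the archive's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai/request"
# Identifier taken from the OaiIdentifier field of this record.
OAI_ID = "oai:lindat.mff.cuni.cz:11234/1-3719"

dc_url = build_get_record_url(BASE_URL, OAI_ID, "oai_dc")   # simple DC format
olac_url = build_get_record_url(BASE_URL, OAI_ID, "olac")   # OLAC format (assumed prefix)

print(dc_url)
print(olac_url)
```

The identifier is percent-encoded by urlencode, so the colons and the slash in it are safe to pass as a query parameter.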

Search Info

Citation: Macháček, Dominik; Žilinec, Matúš; Bojar, Ondřej. 2021. ESIC 1.0 -- Europarl Simultaneous Interpreting Corpus. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_CZ country_DE country_GB dcmi_Text iso639_ces iso639_deu iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-3719
Up-to-date as of: Thu Oct 5 0:41:22 EDT 2023