OLAC Record
oai:lindat.mff.cuni.cz:11234/1-4692

Metadata
Title:ELITR Minuting Corpus
Bibliographic Citation:http://hdl.handle.net/11234/1-4692
Creator:Nedoluzhko, Anna
Singh, Muskaan
Hledíková, Marie
Ghosal, Tirthankar
Bojar, Ondřej
Date (W3CDTF):2022-05-26T09:59:11Z
Date Available:2022-05-26T09:59:11Z
Description:ELITR Minuting Corpus consists of transcripts of meetings in Czech and English, their manually created summaries ("minutes") and manual alignments between the two. Czech meetings are in the computer science and public administration domains, and English meetings are in the computer science domain. Each transcript has one or more corresponding minutes files. Alignments are provided for only a portion of the data.
This corpus contains 59 Czech and 120 English meeting transcripts, consisting of 71097 and 87322 dialogue turns respectively. For Czech meetings, we provide 147 total minutes with 55 of them aligned. For English meetings, we provide 256 total minutes with 111 of them aligned. Please find a more detailed description of the data in the included README and stats.tsv files.
If you use this corpus, please cite: Nedoluzhko, A., Singh, M., Hledíková, M., Ghosal, T., and Bojar, O. (2022). ELITR Minuting Corpus: A novel dataset for automatic minuting from multi-party meetings in English and Czech. In Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC-2022), Marseille, France, June. European Language Resources Association (ELRA). In print.
@inproceedings{elitr-minuting-corpus:2022,
  author = {Anna Nedoluzhko and Muskaan Singh and Marie Hled{\'{\i}}kov{\'{a}} and Tirthankar Ghosal and Ond{\v{r}}ej Bojar},
  title = {{ELITR} {M}inuting {C}orpus: {A} Novel Dataset for Automatic Minuting from Multi-Party Meetings in {E}nglish and {C}zech},
  booktitle = {Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC-2022)},
  year = 2022,
  month = {June},
  address = {Marseille, France},
  publisher = {European Language Resources Association (ELRA)},
  note = {In print.}
}
Identifier (URI):http://hdl.handle.net/11234/1-4692
Language:English
Czech
Language (ISO639):eng
ces
Publisher:Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Rights:Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
http://creativecommons.org/licenses/by-nc-sa/4.0/
Subject:summarization
minuting
meeting minutes
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
Description:  http://www.language-archives.org/archive/lindat.mff.cuni.cz
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:lindat.mff.cuni.cz:11234/1-4692
DateStamp:  2022-05-26
GetRecord:  OAI-PMH request for simple DC format
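
The GetRecord entries above refer to OAI-PMH requests whose link targets are not preserved in this record. As a minimal sketch, assuming a placeholder endpoint (BASE_URL below is hypothetical and is not the repository's actual base URL), such a request can be assembled from the standard OAI-PMH arguments verb, identifier and metadataPrefix. The prefix "olac" selects the OLAC format and "oai_dc" selects simple Dublin Core.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: this record does not state the repository's
# OAI-PMH base URL, so a placeholder is used here.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    params = {
        "verb": "GetRecord",                # OAI-PMH verb for a single record
        "identifier": identifier,           # the OaiIdentifier of the record
        "metadataPrefix": metadata_prefix,  # "olac" or "oai_dc"
    }
    # urlencode percent-encodes ":" and "/" inside the identifier.
    return f"{base_url}?{urlencode(params)}"


OAI_ID = "oai:lindat.mff.cuni.cz:11234/1-4692"
olac_url = get_record_url(OAI_ID, "olac")   # OLAC format
dc_url = get_record_url(OAI_ID, "oai_dc")   # simple DC format
```

To use the sketch against a real repository, replace BASE_URL with that repository's documented OAI-PMH endpoint.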

Search Info

Citation: Nedoluzhko, Anna; Singh, Muskaan; Hledíková, Marie; Ghosal, Tirthankar; Bojar, Ondřej. 2022. ELITR Minuting Corpus. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL).
Terms: area_Europe country_CZ country_GB dcmi_Text iso639_ces iso639_eng olac_primary_text


http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11234/1-4692
Up-to-date as of: Thu Oct 5 0:43:10 EDT 2023