OLAC Record: European Parliament Interpretation Corpus (EPIC)

OLAC Record
oai:catalogue.elra.info:ELRA-S0323

Metadata

Title: European Parliament Interpretation Corpus (EPIC)

Access Rights: Rights available for: nonCommercialUse

Date Available (W3CDTF): 2011-11-22

Date Issued (W3CDTF): 2011-11-22

Date Modified (W3CDTF): 2016-11-18

Description: The EPIC corpus is a parallel corpus of European Parliament speeches and their corresponding simultaneous interpretations. This corpus includes source speeches in Italian, English and Spanish and interpreted speeches in all possible combinations and directions (from English into Italian and Spanish; from Italian into English and Spanish; and from Spanish into Italian and English). It contains a total of 357 speeches (177,295 words). The EPIC corpus includes video clips of each source language speaker, audio clips of the corresponding interpreted target speeches and transcripts of all the clips. The corpus has been orthographically transcribed. Annotation includes paralinguistic features (truncated, mispronounced words, ...) and metadata (a header at the beginning of each transcript and information about the speaker and the speech). The transcripts are POS (part-of-speech) tagged and lemmatised. Non-tagged transcripts in text format are also available.

Size of the nine subcorpora in the EPIC corpus:

sub-corpus / number of speeches / total word count / % of EPIC
ORG-EN (source) / 81 / 42,705 / 25
INT-EN-IT (interpretation) / 81 / 35,765 / 20
INT-EN-ES (interpretation) / 81 / 38,066 / 21
ORG-IT (source) / 17 / 6,765 / 4
INT-IT-EN (interpretation) / 17 / 6,708 / 4
INT-IT-ES (interpretation) / 17 / 7,052 / 4
ORG-ES (source) / 21 / 14,406 / 8
INT-ES-IT (interpretation) / 21 / 12,833 / 7
INT-ES-EN (interpretation) / 21 / 12,995 / 7
TOTAL / 357 / 177,295 / 100

The EPIC corpus was developed by a multidisciplinary research group based at the Department of Interdisciplinary Studies in Translation, Languages and Cultures (University of Bologna at Forlì), involving interpreting scholars, corpus linguists and IT technicians: Mariachiara Russo (coordinator), Claudio Bendazzoli, Cristina Monti, Annalisa Sandrelli, Marco Baroni, Silvia Bernardini, Gabriele Mack, Lorenzo Piccioni, Eros Zanchetta, Elio Ballardini, Peter Mead.
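The sub-corpus sizes given in the Description can be cross-checked with a few lines of Python. This is only a minimal sketch: the figures are copied from the record, while the names `SUBCORPORA`, `totals` and `computed_shares` are hypothetical helpers introduced for illustration and are not part of the corpus distribution.

```python
# Figures copied from the sub-corpus table in the Description field.
# Each entry: (sub-corpus label, number of speeches, word count, listed % of EPIC)
SUBCORPORA = [
    ("ORG-EN", 81, 42705, 25),
    ("INT-EN-IT", 81, 35765, 20),
    ("INT-EN-ES", 81, 38066, 21),
    ("ORG-IT", 17, 6765, 4),
    ("INT-IT-EN", 17, 6708, 4),
    ("INT-IT-ES", 17, 7052, 4),
    ("ORG-ES", 21, 14406, 8),
    ("INT-ES-IT", 21, 12833, 7),
    ("INT-ES-EN", 21, 12995, 7),
]


def totals(rows):
    """Return (total speeches, total words, sum of the listed percentages)."""
    speeches = sum(r[1] for r in rows)
    words = sum(r[2] for r in rows)
    listed_pct = sum(r[3] for r in rows)
    return speeches, words, listed_pct


def computed_shares(rows):
    """Recompute each sub-corpus share of the total word count, to one decimal."""
    total_words = sum(r[2] for r in rows)
    return {r[0]: round(100 * r[2] / total_words, 1) for r in rows}


if __name__ == "__main__":
    print(totals(SUBCORPORA))
    for label, share in computed_shares(SUBCORPORA).items():
        print(label, share)
```

The speech and word totals match the TOTAL row of the record (357 speeches, 177,295 words). The recomputed share for ORG-EN comes out slightly lower than the listed 25, which suggests the listed percentages were adjusted so that the column sums to 100; the record itself does not say how they were rounded.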

Identifier: ELRA-S0323

ISLRN: 716-168-855-843-2

Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-S0323/

Language: English

Language: Italian

Language: Spanish; Castilian

Language (ISO639): eng

Language (ISO639): ita

Language (ISO639): spa

Medium: Not specified

Publisher: ELRA (European Language Resources Association)

Type (DCMI): Sound

Type (DCMI): MovingImage

Type (OLAC): primary_text

OLAC Info

Archive: ELRA Catalogue of Language Resources

Description: http://www.language-archives.org/archive/catalogue.elra.info

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-S0323

DateStamp: 2011-11-22

GetRecord: OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2011. ELRA (European Language Resources Association).

Terms: area_Europe country_ES country_GB country_IT dcmi_MovingImage dcmi_Sound iso639_eng iso639_ita iso639_spa olac_primary_text

http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0323
Up-to-date as of: Wed Jul 15 7:05:11 EDT 2026
