OLAC Record
oai:catalogue.elra.info:ELRA-S0366

Metadata
Title:LECTRA (LECture TRAnscriptions in European Portuguese)
Abstract:This corpus is composed of the audio and the manual transcriptions of seven one-semester university courses in Portuguese. It contains a total of 28 hours of speech, manually transcribed by several trained annotators. All of the material consists of technical university lectures.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):2014-07-11
Date Issued (W3CDTF):2014-02-11
Date Modified (W3CDTF):2014-07-11
Description:Desktop/Microphone
This corpus is composed of the audio and the manual transcriptions of the LECTRA Corpus: classroom LECture TRAnscriptions in European Portuguese. The corpus includes seven one-semester university courses. All lectures were taught at the Technical University of Lisbon (IST) and recorded in the presence of students, except IICT, which was recorded at another university in a quiet office environment, targeting an Internet audience. The corpus contains a total of 28 hours of speech, manually transcribed by several trained annotators.
The courses are technical university lectures: Production of Multimedia Contents (PMC), Economic Theory I (ETI), Linear Algebra (LA), Introduction to Informatics and Communication Techniques (IICT), Object Oriented Programming (OOP), Accounting (CONT), Graphical Interfaces (GI).
Two files per lecture are provided: a) a RAW file containing the audio; b) a TRS file containing the manual transcriptions. TRS is an XML-based format that standard transcription software such as Transcriber can open. Annotations in the TRS files are at word level. They are fine-grained transcriptions that include disfluencies. The characters in the text files are encoded in ISO-8859-1 (Latin1). The TRS files have a total of 220K word tokens (Training set: 179K word tokens, Development set: 21K word tokens, Test set: 20K word tokens). The whole resource occupies 3.3 GB.
For a complete description of the corpus and a report of Automatic Speech Recognition results, the reader may refer to:
(Trancoso et al., 2008) Isabel Trancoso, Rui Martins, Helena Moniz, Ana Isabel Mata da Silva, Maria do Céu Guerreiro Viana Ribeiro, The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese, In LREC 2008 - Language Resources and Evaluation Conference, Marrakesh, Morocco, May 2008.
(Pellegrini et al., 2012) Thomas Pellegrini, Helena Moniz, Fernando Batista, Isabel Trancoso, Ramon Fernandez Astudillo, Extension of the LECTRA corpus: classroom LECture TRAnscriptions in European Portuguese, In SPEECH AND CORPORA, Belo Horizonte, March 2012.
Identifier:ELRA-S0366
http://catalog.elra.info/product_info.php?products_id=1221
Language:Portuguese
Language (ISO639):por
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Sound
Type (OLAC):primary_text
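
The description above states that the transcriptions are TRS (XML) files encoded in ISO-8859-1. As a rough illustration of reading such a file, the sketch below assumes the usual Transcriber layout, in which text follows `Sync` time markers inside `Turn` elements. The sample content, the speaker id, and the helper name `read_segments` are hypothetical and not taken from the corpus; real LECTRA files may carry additional elements (for example event or word-level markup) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual Transcriber layout; NOT taken from the corpus.
SAMPLE_TRS = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<Trans><Episode><Section type="report" startTime="0" endTime="4.2">'
    '<Turn speaker="spk1" startTime="0" endTime="4.2">'
    '<Sync time="0"/>bom dia a todos'
    '<Sync time="2.5"/>hoje vamos falar de álgebra'
    '</Turn></Section></Episode></Trans>'
).encode("iso-8859-1")


def read_segments(trs_bytes):
    """Return (start_time, text) pairs, one per non-empty Sync segment.

    Passing bytes lets the parser honour the encoding named in the
    XML declaration (ISO-8859-1 here).
    """
    root = ET.fromstring(trs_bytes)
    segments = []
    for turn in root.iter("Turn"):
        for sync in turn.iter("Sync"):
            # The text of a segment is the "tail" that follows the Sync marker.
            text = (sync.tail or "").strip()
            if text:
                segments.append((float(sync.get("time")), text))
    return segments


for start, text in read_segments(SAMPLE_TRS):
    print(start, text)
```

For a file on disk, reading it in binary mode (`open(path, "rb").read()`) and passing the bytes to `read_segments` keeps the encoding handling in the XML parser.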

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-S0366
DateStamp:  2014-07-11
GetRecord:  OAI-PMH request for simple DC format
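
The GetRecord entries above refer to OAI-PMH requests for this record. A minimal sketch of how such a request URL is formed under the OAI-PMH protocol follows; the base URL `http://example.org/oai` is a placeholder, since the repository's actual endpoint is not given in this record, and the helper name `get_record_url` is hypothetical. `oai_dc` is the protocol's standard prefix for simple Dublin Core; `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint (assumption); the identifier is the one listed in this record.
BASE_URL = "http://example.org/oai"
RECORD_ID = "oai:catalogue.elra.info:ELRA-S0366"

print(get_record_url(BASE_URL, RECORD_ID, "oai_dc"))  # simple DC format
print(get_record_url(BASE_URL, RECORD_ID, "olac"))    # OLAC format (assumed prefix)
```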

Search Info

Citation: n.a. 2014. ELRA (European Language Resources Association).
Terms: area_Europe country_PT dcmi_Sound iso639_por olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-S0366
Up-to-date as of: Thu Aug 3 1:12:30 EDT 2017