OLAC Record

Title: EUROPARL Corpus Parallel Corpora: Portuguese-English
Access Rights: Rights available for: nonCommercialUse, commercialUse
Date Available (W3CDTF): 2016-01-20
Date Issued (W3CDTF): 2016-01-20
Date Modified (W3CDTF): 2016-01-20
Description: The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora) was extracted from the proceedings of the European Parliament. It contains transcriptions of sessions from 1996 to 2011, with a total of approximately 58,324,562 tokens of European Portuguese (L1) and 49,216,896 tokens of English (translation). The EUROPARL Corpus is composed of one text file for the English corpus and two files for the Portuguese version: a text file and an annotated file. The text version contains plain text with no further annotation. The Portuguese annotated file is a four-column file with one token per line, followed by a PoS tag and a lemma. The corpus was automatically PoS-tagged with the MBT tagger (http://ilk.uvt.nl/mbt/) and lemmatized with MBLEM (http://ilk.uvt.nl/mbma/), following the annotation scheme of the Corpus of Reference of Contemporary Portuguese.
ISLRN: 435-502-922-727-2
Identifier (URI): http://catalog.elra.info/en-us/repository/browse/ELRA-W0090/
Language (ISO639): eng
Medium: Not specified
Publisher: ELRA (European Language Resources Association)
Type (DCMI): Text
Type (OLAC): primary_text
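
The Description above says the Portuguese annotated file has one token per line, followed by a PoS tag and a lemma. A minimal sketch of reading such lines follows. It assumes whitespace-separated columns, which the record does not state; the name `parse_annotated_lines`, the sample tags, and the placeholder fourth column are hypothetical. Any columns beyond the first three are kept uninterpreted, because the record does not describe them.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class AnnotatedToken(NamedTuple):
    token: str
    pos: str
    lemma: str
    extra: Tuple[str, ...]  # further columns, not described in the record


def parse_annotated_lines(lines: Iterable[str]) -> Iterator[AnnotatedToken]:
    """Yield one AnnotatedToken per non-empty line (hypothetical helper)."""
    for line in lines:
        # Assumption: columns are separated by whitespace (tabs or spaces).
        fields = line.split()
        if len(fields) < 3:
            # Skip blank or incomplete lines rather than guess their meaning.
            continue
        yield AnnotatedToken(fields[0], fields[1], fields[2], tuple(fields[3:]))


# Made-up sample lines; the tags and the fourth column are placeholders,
# not values taken from the corpus.
sample = ["As\tDA\to\tx", "sessões\tCN\tsessão\tx", ""]
rows = list(parse_annotated_lines(sample))
```

For the real file, the same function could be fed an open file handle, for example `parse_annotated_lines(open(path, encoding="utf-8"))`, where the path and the encoding are again assumptions to check against the distributed corpus.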


Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0090
DateStamp:  2016-01-20
GetRecord:  OAI-PMH request for simple DC format
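
The GetRecord entries above refer to OAI-PMH requests for this record. As a rough sketch, such a request URL can be assembled from the OaiIdentifier and a metadata prefix. The base URL below is a hypothetical placeholder, since the record does not give the endpoint; `oai_dc` is the prefix OAI-PMH defines for simple Dublin Core, and `olac` is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the record does not state the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (illustrative helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Identifier taken from the OaiIdentifier field of this record.
olac_url = get_record_url("oai:catalogue.elra.info:ELRA-W0090", "olac")
dc_url = get_record_url("oai:catalogue.elra.info:ELRA-W0090", "oai_dc")
```

The colons in the identifier are percent-encoded by `urlencode`, so the identifier passes through the query string intact.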

Search Info

Citation: n.a. 2016. ELRA (European Language Resources Association).
Terms: area_Europe country_GB country_PT dcmi_Text iso639_eng iso639_por olac_primary_text

Up-to-date as of: Wed Nov 17 9:08:55 EST 2021