OLAC Record

Title:Spanish-English website parallel corpus (Processed)
Access Rights: Rights available for: other
Date Available (W3CDTF):2020-02-27
Date Issued (W3CDTF):2020-02-27
Date Modified (W3CDTF):2018-10-12
Description:This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project, see http://lr-coordination.eu.
This is a parallel corpus of bilingual texts crawled from multilingual websites, which contains 21,007 translation units (TUs).
Period of crawling: 15/11/2016 - 23/01/2017
A strict validation process has been followed, which resulted in discarding:
- TUs from crawled websites that do not comply with the PSI directive,
- TUs with more than 99% misspelled tokens,
- TUs identified during the manual validation process, and all the TUs from websites whose error rate in the sample extracted for manual validation is strictly above the following thresholds:
  - 50% of TUs with language identification errors,
  - 50% of TUs with alignment errors,
  - 50% of TUs with tokenization errors,
  - 20% of TUs identified as machine-translated content,
  - 50% of TUs with translation errors.
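The automatic discard rules in the description can be sketched as two threshold checks. This is a minimal illustration under stated assumptions, not ELRC's actual validation code: the category keys and function names are hypothetical, and the PSI-directive check and the manual flagging of individual TUs are not modelled. Comparisons use integer arithmetic so that "strictly above" is evaluated exactly.

```python
# Minimal sketch of the discard rules described in this record.
# Category keys and function names are hypothetical; the PSI-directive
# check and manual flagging of individual TUs are not modelled.

# Website-level thresholds as integer percentages: a website's TUs are
# discarded when any error rate in its validation sample is strictly above.
WEBSITE_THRESHOLDS_PCT = {
    "language_identification": 50,
    "alignment": 50,
    "tokenization": 50,
    "machine_translated": 20,
    "translation": 50,
}

# TU-level threshold: more than 99% misspelled tokens.
MISSPELLED_THRESHOLD_PCT = 99


def website_is_discarded(error_counts: dict, sample_size: int) -> bool:
    """True if any error category strictly exceeds its threshold in the sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # count / sample_size > pct / 100, rewritten with integers to avoid rounding.
    return any(
        error_counts.get(category, 0) * 100 > pct * sample_size
        for category, pct in WEBSITE_THRESHOLDS_PCT.items()
    )


def tu_is_discarded(misspelled_tokens: int, total_tokens: int) -> bool:
    """True if strictly more than 99% of a TU's tokens are misspelled."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return misspelled_tokens * 100 > MISSPELLED_THRESHOLD_PCT * total_tokens


# 21 machine-translated TUs in a 100-TU sample is 21%, above the 20% limit.
print(website_is_discarded({"machine_translated": 21}, 100))  # True
# Exactly 50% alignment errors is not strictly above 50%, so the site is kept.
print(website_is_discarded({"alignment": 50}, 100))           # False
```

Because every threshold is "strictly above", a sample sitting exactly on a limit is kept; the integer form makes that boundary unambiguous.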
ISLRN: 664-503-904-200-9
Identifier (URI):http://catalog.elra.info/en-us/repository/browse/ELRA-W0248/
Language:Spanish; Castilian
Language (ISO639):eng
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text


Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0248
DateStamp:  2020-02-27
GetRecord:  OAI-PMH request for simple DC format
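The GetRecord links above correspond to standard OAI-PMH requests for this record's OaiIdentifier. The sketch below shows how such a request URL could be built, assuming the usual metadata prefixes `olac` (OLAC format) and `oai_dc` (simple Dublin Core). The base URL is a hypothetical placeholder, since the record does not state the repository endpoint.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the record does not give the OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# OAI identifier taken from this record.
OAI_IDENTIFIER = "oai:catalogue.elra.info:ELRA-W0248"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL with a percent-encoded query."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


olac_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "olac")   # OLAC format
dc_url = get_record_url(BASE_URL, OAI_IDENTIFIER, "oai_dc")   # simple Dublin Core
print(olac_url)
print(dc_url)
```

`urlencode` percent-encodes the colons in the identifier (`%3A`), which OAI-PMH servers decode back to the original identifier.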

Search Info

Citation: n.a. 2020. ELRA (European Language Resources Association).
Terms: area_Europe country_ES country_GB dcmi_Text iso639_eng iso639_spa olac_primary_text

Up-to-date as of: Wed Nov 17 9:16:00 EST 2021