OLAC Record
oai:catalogue.elra.info:ELRA-W0017

Metadata
Title:MULTEXT JOC Corpus
Abstract:This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains ca. 5 million words in English, French, German, Italian and Spanish (ca. 1 million words per language). About 800,000 words were grammatically tagged and manually checked for English, French, Italian and Spanish, i.e. roughly 200,000 words per language. The same subset for French, German, Italian and Spanish was aligned to English at the sentence level.
Access Rights:Rights available for: Research Use, Commercial Use
Date Available (W3CDTF):1998-11-23
Date Issued (W3CDTF):2004-09-14
Date Modified (W3CDTF):2013-01-24
Description:Written Corpora
This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains approx. 5 million words in English, French, German, Italian and Spanish (approx. 1 million words per language). About 800,000 words were grammatically tagged and manually checked for English, French, Italian and Spanish, i.e. roughly 200,000 words per language. The same subset for French, German, Italian and Spanish was aligned to English at the sentence level. The JOC corpus is delivered in Corpus Encoding Standard conformant format at each level of treatment: paragraph annotation level, conformant to the CESDOC specifications (1 M words * 5 languages); morpho-syntactic annotation level (PoS Tagging), conformant to CESANA specifications (200,000 words * 4 languages); parallel text alignment at sentence level, conformant to CESALIGN specifications (200,000 words * 4 languages). Additional information: http://www.lpl.univ-aix.fr/projects/multext
Identifier:ELRA-W0017
http://catalog.elra.info/product_info.php?products_id=534
Language:English
French
German
Italian
Spanish, Castilian
Language (ISO639):eng
fra
deu
ita
spa
Medium:CD-ROM
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0017
DateStamp:  1998-11-23
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2004. ELRA (European Language Resources Association).
Terms: area_Europe country_DE country_ES country_FR country_GB country_IT dcmi_Text iso639_deu iso639_eng iso639_fra iso639_ita iso639_spa olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0017
Up-to-date as of: Sun Nov 12 1:43:35 EST 2017