OLAC Record
oai:catalogue.elra.info:ELRA-W0031

Metadata
Title:GeFRePaC - German French Reciprocal Parallel Corpus
Abstract:GeFRePaC was produced in the framework of the LRsP&P project. It contains 30 million words (15 million for each language) for the purpose of developing, enhancing and improving translation aids.
Access Rights:Rights available for: Research Use
Date Available (W3CDTF):2002-01-15
Date Issued (W3CDTF):2004-09-14
Date Modified (W3CDTF):2013-01-24
Description:Written Corpora
The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by the Multilinguale Forschung/Multilingual Research Abteilung Lexik, Institut für Deutsche Sprache (Germany) with funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335).
GeFRePaC is a 30 million word corpus (15 million for each language) for the purpose of developing, enhancing and improving translation aids (dictionaries, lexicons, platforms) for French-German and German-French translation.
The database consists of the following parallel corpora:
European Union CELEX Database: Treaties, Foreign relations, Law, Complementary Law and all the published documents of the "European Parliament".
Celex-Database: 22,000,000 words (German+French) (http://www.outlaw-web.com)
Europarl: 8,320,000 words (German+French) (http://www.europarl.eu.int)
The corpus covers natural general language as used in public socio-political discourse, with a focus on multilingual administration and on commercial and legal documentation. GeFRePaC comprises a large variety of text types for which there is a rapidly growing need for translation but which currently defy successful machine translation.
The corpus is encoded according to the PAROLE guidelines. It was aligned at the sentence level and, for single-word translation units, at the lexical level; it was POS-tagged in conformity with the EAGLES recommendations and validated according to the most current version of the ELRA guidelines. The parallel German-French texts were aligned using a program developed at the Equipe Langue et Dialogue, Laboratoire Loria, Nancy.
The text files containing markup for paragraphs and sentences were processed by the TreeTagger developed at the IMS Stuttgart. The text files were automatically converted into TEI-conformant SGML format.
Identifier:ELRA-W0031
http://catalog.elra.info/product_info.php?products_id=633
Language:French
German
Language (ISO639):fra
deu
Medium:CD-ROM
Publisher:ELRA (European Language Resources Association)
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ELRA Catalogue of Language Resources
Description:  http://www.language-archives.org/archive/catalogue.elra.info
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:catalogue.elra.info:ELRA-W0031
DateStamp:  2002-01-15
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: n.a. 2004. ELRA (European Language Resources Association).
Terms: area_Europe country_DE country_FR dcmi_Text iso639_deu iso639_fra olac_primary_text


http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0031
Up-to-date as of: Mon Oct 9 1:51:10 EDT 2017