OLAC Record: GeFRePaC - German French Reciprocal Parallel Corpus

OLAC Record
oai:catalogue.elra.info:ELRA-W0031

Metadata

Title: GeFRePaC - German French Reciprocal Parallel Corpus

Access Rights: Rights available for: nonCommercialUse

Date Available (W3CDTF): 2002-01-15

Date Issued (W3CDTF): 2002-01-15

Date Modified (W3CDTF): 2017-06-26

Description: The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by the Multilinguale Forschung/Multilingual Research Abteilung Lexik, Institut für Deutsche Sprache (Germany), with funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335).

GeFRePaC is a 30-million-word corpus (15 million words for each language) intended for developing, enhancing and improving translation aids (dictionaries, lexicons, platforms) for French-German and German-French translation. The database consists of the following parallel corpora:

European Union CELEX Database: Treaties, Foreign relations, Law, Complementary Law and all the published documents of the "European Parliament".
Celex-Database: 22,000,000 words (German+French)
Europarl: 8,320,000 words (German+French)

It covers natural general language as used in public socio-political discourse, with a focus on multilingual administration and on commercial and legal documentation. GeFRePaC comprises a large variety of text types for which there is a rapidly growing need for translation but which currently defy successful machine translation.

The corpus is encoded according to the PAROLE guidelines. It was aligned at the sentence level and, for single-word translation units, at the lexical level; POS-tagged in conformity with EAGLES recommendations; and validated according to the most current version of the ELRA guidelines. The parallel German-French texts were aligned using a program developed at the Equipe Langue et Dialogue, Laboratoire Loria, Nancy. The text files containing markup for paragraphs and sentences were processed by the Tree Tagger developed at the IMS Stuttgart. The text files are automatically converted into TEI-conformant SGML format.

Identifier: ELRA-W0031

Identifier: ISLRN: 086-761-267-762-3

Identifier (URI): https://catalog.elra.info/en-us/repository/browse/ELRA-W0031/

Language: German

Language: French

Language (ISO639): deu

Language (ISO639): fra

Medium: downloadable

Publisher: ELRA (European Language Resources Association)

Type (DCMI): Text

Type (OLAC): primary_text

OLAC Info

Archive: ELRA Catalogue of Language Resources

Description: http://www.language-archives.org/archive/catalogue.elra.info

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:catalogue.elra.info:ELRA-W0031

DateStamp: 2002-01-15

GetRecord: OAI-PMH request for simple DC format
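
The GetRecord entries above refer to OAI-PMH requests for this record in the OLAC and simple DC formats. As a rough sketch of how such a request URL is formed, the Python snippet below builds one from the record's OaiIdentifier using the standard OAI-PMH arguments (verb, identifier, metadataPrefix). The base URL is a placeholder, since this record does not state the archive's OAI-PMH endpoint, and the function name is hypothetical; the prefixes "olac" and "oai_dc" are assumed to match the two formats listed.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the record does not give the real base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(identifier: str, metadata_prefix: str,
                   base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",                # OAI-PMH verb for fetching one record
        "identifier": identifier,           # the record's OAI identifier
        "metadataPrefix": metadata_prefix,  # requested metadata format
    })
    return f"{base_url}?{query}"


# Request URLs for the two formats mentioned in the record.
olac_url = get_record_url("oai:catalogue.elra.info:ELRA-W0031", "olac")
dc_url = get_record_url("oai:catalogue.elra.info:ELRA-W0031", "oai_dc")
```

Replacing BASE_URL with the archive's actual endpoint would yield requests of the kind the GetRecord entries describe.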

Search Info
Citation: n.a. 2002. ELRA (European Language Resources Association).
Terms: area_Europe country_DE country_FR dcmi_Text iso639_deu iso639_fra olac_primary_text

http://www.language-archives.org/item.php/oai:catalogue.elra.info:ELRA-W0031
Up-to-date as of: Wed Oct 1 0:56:50 EDT 2025
