OLAC Record: Translation Equivalents Extractor

OLAC Record
oai:lindat.mff.cuni.cz:11372/LRT-1271

Metadata

Title: Translation Equivalents Extractor

Bibliographic Citation: http://hdl.handle.net/11372/LRT-1271

Creator: Tufiş, Dan

Ion, Radu

Barbu, Ana-Maria

Date (W3CDTF): 2014-07-30T21:34:03Z

Date Available: 2014-07-30T21:34:03Z

Description: TREQ exploits the knowledge embedded in parallel corpora and produces a set of translation equivalents (a translation lexicon), based on a 1:1 mapping hypothesis. The program uses almost no linguistic knowledge, relying instead on statistical evidence and some simplifying assumptions. The extraction process is based on a testing approach: it first generates a list of translation equivalent candidates and then successively extracts the most likely translation equivalence pairs. It does not require a pre-existing bilingual lexicon for the languages considered. If such a lexicon exists, however, it can be used to eliminate spurious candidate pairs, which speeds up the process and increases its accuracy. The algorithm relies on some pre-processing of the bitext: sentence alignment, tokenisation (using MtSeg, http://www.lpl.univaix.fr/projects/multext/MtSeg), collocation extraction (unaware of translation equivalence), POS tagging and lemmatisation. More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers/):
-- Dan Tufiş and Ana-Maria Barbu (2002). Revealing translators' knowledge: statistical methods in constructing practical translation lexicons for language and speech processing. In International Journal of Speech Technology, volume 5, pp. 199-209. Kluwer Academic Publishers, November 2002. ISSN 1381-2416.
-- Dan Tufiş (2002). A cheap and fast way to build useful translation lexicons. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), pp. 1030-1036, Taipei, Taiwan, August 2002. ISBN 1-55860-894.
-- Dan Tufiş and Ana Maria Barbu (2001). Automatic Construction of Translation Lexicons. In V.V. Kluew, C.E. D'Attellis, and N.E. Mastorakis (eds.), Advances in Automation, Multimedia and Video Systems, and Modern Computer Science, pp. 156-161. WSES Press, December 2001. ISSN 1790-5117.
-- Dan Tufiş and Ana Maria Barbu (2001). Extracting Multilingual Lexicons from Parallel Corpora. In Proceedings of the ACH-ALLC conference (ACH-ALLC 2001), New York, USA, June 2001.
-- Dan Tufiş and Ana Maria Barbu (2001). Accurate Automatic Extraction of Translation Equivalents from Parallel Corpora. In Paul Rayson, Andrew Wilson, Tony McEnery, Andrew Hardie, and Shereen Khoja (eds.), Proceedings of the Corpus Linguistics 2001 Conference (CL 2001), pp. 581-586, Lancaster, UK, March 2001. Lancaster University, Computing Department. ISBN 1-86220-107-2.
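
The general idea of generating candidates and then extracting the best pairs under a 1:1 hypothesis can be illustrated with a minimal sketch. This is not TREQ's code: the function name `extract_equivalents`, the `min_score` threshold, the toy English-French data and the use of the Dice coefficient as the association score are all assumptions made for illustration. The papers listed above describe the statistics TREQ actually uses.

```python
from collections import Counter
from itertools import product


def extract_equivalents(bitext, min_score=0.5):
    """Greedy 1:1 extraction from sentence-aligned, lemmatised token lists.

    bitext: iterable of (source_tokens, target_tokens) pairs.
    Returns a dict mapping each selected source word to one target word.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()

    # Step 1: generate candidates. Every source/target word pair that
    # co-occurs in an aligned sentence pair is a candidate.
    for src_tokens, tgt_tokens in bitext:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        pair_freq.update(product(src_set, tgt_set))

    # Step 2: score candidates (Dice coefficient, an illustrative choice).
    scored = sorted(
        ((2 * n / (src_freq[s] + tgt_freq[t]), s, t)
         for (s, t), n in pair_freq.items()),
        key=lambda item: (-item[0], item[1], item[2]),
    )

    # Step 3: take the best pairs first; under the 1:1 hypothesis a word
    # that is already paired cannot appear in another pair.
    lexicon, used_src, used_tgt = {}, set(), set()
    for score, s, t in scored:
        if score < min_score:
            break
        if s in used_src or t in used_tgt:
            continue
        lexicon[s] = t
        used_src.add(s)
        used_tgt.add(t)
    return lexicon


# Toy bitext, invented for this example.
toy_bitext = [
    (["house", "red"], ["maison", "rouge"]),
    (["house", "big"], ["maison", "grande"]),
    (["car", "red"], ["voiture", "rouge"]),
]
print(extract_equivalents(toy_bitext))
```

A pre-existing bilingual lexicon, where available, could be applied in step 1 to discard candidates it contradicts, which is the filtering role the description assigns to it.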

Identifier (URI): http://hdl.handle.net/11372/LRT-1271

Language: No linguistic content

Language (ISO639): zxx

Publisher: Research Institute for Artificial Intelligence, Romanian Academy of Sciences

Type: toolService

Type (DCMI): Software

OLAC Info

Archive: LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Description: http://www.language-archives.org/archive/lindat.mff.cuni.cz

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:lindat.mff.cuni.cz:11372/LRT-1271

DateStamp: 2021-06-29

GetRecord: OAI-PMH request for simple DC format
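
The GetRecord entries in this record refer to OAI-PMH requests whose link targets are not reproduced here. As a rough sketch, such a request URL can be assembled from the protocol's standard `verb`, `identifier` and `metadataPrefix` parameters. The base URL `https://example.org/oai` is a placeholder, not the archive's real endpoint, and the helper name `get_record_url` is hypothetical; `olac` and `oai_dc` are assumed to be the prefixes for the OLAC and simple DC formats.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (no network access)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint; substitute the archive's actual OAI-PMH base URL.
BASE_URL = "https://example.org/oai"
OAI_ID = "oai:lindat.mff.cuni.cz:11372/LRT-1271"

print(get_record_url(BASE_URL, OAI_ID, "olac"))    # OLAC format
print(get_record_url(BASE_URL, OAI_ID, "oai_dc"))  # simple DC format
```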

Search Info
Citation: Tufiş, Dan; Ion, Radu; Barbu, Ana-Maria. 2014. Research Institute for Artificial Intelligence, Romanian Academy of Sciences.
Terms: dcmi_Software iso639_zxx

http://www.language-archives.org/item.php/oai:lindat.mff.cuni.cz:11372/LRT-1271
Up-to-date as of: Sun May 4 0:10:54 EDT 2025
