Identifying Cross Language Term Equivalents Using Statistical Machine Translation and Distributional Association Measures
2007 (English). In: Proceedings of Nodalida 2007, the 16th Nordic Conference of Computational Linguistics / [ed] Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit, 2007. Conference paper (Refereed)
This article compares the accuracy of several approaches for identifying cross-language term equivalents (translations). The methods investigated are, on the one hand, associative measures commonly used in word-space models or in Information Retrieval and, on the other hand, a Statistical Machine Translation (SMT) approach. I have performed tests on six language pairs, using the JRC-Acquis parallel corpus as training material and Eurovoc as the gold standard. The SMT approach is shown to be more effective than the associative measures. The best results are achieved by taking a weighted average of the scores of the SMT approach and of disparate associative measures.
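The combination step described in the abstract can be illustrated with a small sketch. This is not the author's implementation: the Dice coefficient below is only one example of an associative measure computed over sentence-aligned text, the SMT scores are a hypothetical hand-written table standing in for translation probabilities an SMT system might produce, and the equal weights are an illustrative assumption, not values from the paper.

```python
from collections import Counter
from itertools import product


def dice_scores(aligned_pairs):
    """Dice coefficient for every (source word, target word) pair,
    counted over sentence-aligned text (one count per sentence pair)."""
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in aligned_pairs:
        src_words, tgt_words = set(src_sent.split()), set(tgt_sent.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return {(s, t): 2 * c / (src_freq[s] + tgt_freq[t])
            for (s, t), c in joint.items()}


def combine(score_tables, weights):
    """Weighted average of several score tables keyed by (source, target);
    a pair missing from a table contributes 0 for that table."""
    total = sum(weights)
    keys = set().union(*score_tables)
    return {k: sum(w * table.get(k, 0.0)
                   for table, w in zip(score_tables, weights)) / total
            for k in keys}


# Toy sentence-aligned corpus (illustrative only, not JRC-Acquis).
pairs = [("the house", "la maison"),
         ("a house", "une maison"),
         ("the car", "la voiture")]
dice = dice_scores(pairs)

# Hypothetical SMT translation scores for the source term "house".
smt = {("house", "maison"): 0.9, ("house", "une"): 0.05}

# Hypothetical equal weights for the SMT and associative scores.
combined = combine([smt, dice], [0.5, 0.5])

# Rank candidate translations of "house" by combined score.
best = max((k for k in combined if k[0] == "house"), key=combined.get)
print(best)
```

A weighted average like this assumes the individual scores lie on comparable scales (here both fall in [0, 1]); scores from measures with different ranges would need normalising before they are combined.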
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:su:diva-15930
OAI: oai:DiVA.org:su-15930
DiVA: diva2:182450
Nodalida 2007, the 16th Nordic Conference of Computational Linguistics