Word order typology through multilingual word alignment
2015 (English) In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015, 205-211 p. Conference paper (Refereed)
With massively parallel corpora of hundreds or thousands of translations of the same text, it is possible to automatically perform typological studies of language structure using very large language samples. We investigate the domain of word order using multilingual word alignment and high-precision annotation transfer in a corpus with 1144 translations in 986 languages of the New Testament. Results are encouraging, with 86% to 96% agreement between our method and the manually created WALS database for a range of different word order features. Beyond reproducing the categorical data in WALS and extending it to hundreds of other languages, we also provide quantitative data for the relative frequencies of different word orders, and show the usefulness of this for language comparison. Our method has applications for basic research in linguistic typology, as well as for NLP tasks like transfer learning for dependency parsing, which has been shown to benefit from word order information.
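The two quantities the abstract reports, relative frequencies of word orders per language and agreement with a reference database, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the language codes, the "VO"/"OV" labels, and the toy data are all hypothetical, and the sketch assumes that word alignment and annotation transfer have already produced one order label per observed verb-object pair.

```python
from collections import Counter


def relative_frequencies(observations):
    """Return the share of each word order label among the observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {order: n / total for order, n in counts.items()}


def dominant_order(observations):
    """Return the most frequent order label (ties go to the first one seen)."""
    if not observations:
        raise ValueError("no observations for this language")
    return Counter(observations).most_common(1)[0][0]


def agreement(predicted, reference):
    """Fraction of languages found in both mappings whose labels match."""
    shared = [lang for lang in predicted if lang in reference]
    if not shared:
        raise ValueError("no languages in common")
    matches = sum(predicted[lang] == reference[lang] for lang in shared)
    return matches / len(shared)


# Hypothetical toy data: one verb (V) / object (O) order label per observed pair.
observed = {
    "lang_a": ["VO", "VO", "VO", "OV"],
    "lang_b": ["OV", "OV", "VO", "OV", "OV"],
}

# Categorical label per language, derived from the quantitative data.
predicted = {lang: dominant_order(obs) for lang, obs in observed.items()}

# Hypothetical stand-in for manually assigned database labels.
reference = {"lang_a": "VO", "lang_b": "OV"}

if __name__ == "__main__":
    for lang, obs in observed.items():
        print(lang, relative_frequencies(obs))
    print("agreement:", agreement(predicted, reference))
```

Keeping the per-language frequency distribution, rather than only the dominant label, is what allows the kind of graded language comparison the abstract mentions.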
Place, publisher, year, edition, pages
2015. 205-211 p.
linguistic typology, word order typology, parallel texts, parallel corpora, word alignment, annotation transfer
Language Technology (Computational Linguistics); General Language Studies and Linguistics
Research subject: Linguistics; Computational Linguistics
Identifiers
URN: urn:nbn:se:su:diva-119847
OAI: oai:DiVA.org:su-119847
DiVA: diva2:848836
The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing