Consistency Checking for Treebank Alignment
2010 (English). In: Proceedings of the Fourth Linguistic Annotation Workshop / [ed] Nianwen Xue and Massimo Poesio, Association for Computational Linguistics, 2010, 38-46 p. Conference paper (Refereed)
This paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucial differences in sorting errors from legitimate variations. The second method examines phrase nodes which are predicted to be aligned, based on the alignment of their yields. Both methods are effective in complementary ways.
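The idea behind the first method can be illustrated with a minimal sketch. It assumes alignments are available as (source string, target string) pairs; the function name and the toy data are hypothetical and are not taken from the paper. Source strings that map to more than one target are only candidates for inspection, since, as the abstract notes, variation can be legitimate.

```python
from collections import defaultdict


def find_alignment_variations(aligned_pairs):
    """Return source strings aligned to more than one distinct target.

    aligned_pairs: iterable of (source, target) string pairs (assumed format).
    The result maps each varying source to a dict of target -> count.
    """
    targets_by_source = defaultdict(lambda: defaultdict(int))
    for source, target in aligned_pairs:
        # Treat the target string as the "label" of the source string.
        targets_by_source[source][target] += 1
    return {
        source: dict(counts)
        for source, counts in targets_by_source.items()
        if len(counts) > 1  # more than one label: a candidate inconsistency
    }


# Hypothetical toy data, for illustration only.
pairs = [
    ("the house", "huset"),
    ("the house", "huset"),
    ("the house", "hus"),
    ("a car", "en bil"),
]
variations = find_alignment_variations(pairs)
print(variations)
```

Counting each target per source makes it easy to review rare alignments first, though frequency alone does not separate errors from legitimate variation.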
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2010. 38-46 p.
Language Technology (Computational Linguistics)
Research subject: Computational Linguistics
Identifiers: URN: urn:nbn:se:su:diva-53545; OAI: oai:DiVA.org:su-53545; DiVA: diva2:390905