Inferring the location of authors from words in their texts
2015 (English). In: Proceedings of the 20th Nordic Conference of Computational Linguistics: NODALIDA 2015 / [ed] Beáta Megyesi, Linköping: Linköping University Electronic Press, ACL Anthology, 2015, 211-218 p. Conference paper (Refereed)
For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their own location or that of their authors. Many texts are locatable, but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words, which can then be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.
We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.
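The pipeline summarised in the abstract (per-word Gaussian models, a placeness filter, and a per-text centroid) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single isotropic Gaussian per word rather than several, treats latitude and longitude as planar coordinates, and uses the inverse of a word's spatial variance as a stand-in for placeness, since the abstract does not give the actual definition. The function names, the `min_placeness` threshold, and the toy posts are all hypothetical.

```python
from collections import defaultdict


def fit_word_gaussians(posts):
    """Fit one isotropic Gaussian per word from (tokens, (lat, lon)) posts.

    Returns a dict: word -> (mean_lat, mean_lon, variance, count).
    Assumption: one Gaussian per word; the paper reports better results
    with several Gaussians per word, which this sketch does not attempt.
    """
    coords = defaultdict(list)
    for tokens, (lat, lon) in posts:
        for word in set(tokens):  # count each word once per post
            coords[word].append((lat, lon))

    models = {}
    for word, points in coords.items():
        n = len(points)
        if n < 2:
            continue  # a single observation gives no usable spread estimate
        mean_lat = sum(p[0] for p in points) / n
        mean_lon = sum(p[1] for p in points) / n
        variance = sum(
            (p[0] - mean_lat) ** 2 + (p[1] - mean_lon) ** 2 for p in points
        ) / n
        models[word] = (mean_lat, mean_lon, variance, n)
    return models


def placeness(model, eps=1e-6):
    """Hypothetical placeness score: inverse spatial variance of the word."""
    _, _, variance, _ = model
    return 1.0 / (variance + eps)


def locate_text(tokens, models, min_placeness=1.0):
    """Placeness-weighted centroid of the words that pass the filter.

    Returns (lat, lon), or None when no word in the text is locational enough.
    """
    total = lat_sum = lon_sum = 0.0
    for word in tokens:
        model = models.get(word)
        if model is None:
            continue
        score = placeness(model)
        if score < min_placeness:
            continue  # filter out words with low placeness
        lat_sum += score * model[0]
        lon_sum += score * model[1]
        total += score
    if total == 0.0:
        return None
    return (lat_sum / total, lon_sum / total)


# Toy training data (invented for illustration): tokens plus a (lat, lon) tag.
posts = [
    (["fika", "götet"], (57.70, 12.00)),
    (["götet", "spårvagn"], (57.71, 11.97)),
    (["fika", "tunnelbana"], (59.33, 18.07)),
    (["tunnelbana", "fika"], (59.35, 18.05)),
]
models = fit_word_gaussians(posts)
estimate = locate_text(["fika", "götet"], models)
```

In this toy data, "fika" occurs in two distant areas, so its variance is large and the filter discards it, while "götet" is tightly clustered and determines the estimate. A word with several distinct home regions is exactly the case the abstract addresses by allowing several Gaussian distributions per word.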
Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, ACL Anthology, 2015. 211-218 p.
Linköping Electronic Conference Proceedings, ISSN 1650-3638 ; 109
General Language Studies and Linguistics
Research subject: Computational Linguistics
Identifiers: URN: urn:nbn:se:su:diva-127529; ISBN: 978-91-7519-098-3; OAI: oai:DiVA.org:su-127529; DiVA: diva2:909564
20th Nordic Conference of Computational Linguistics (NODALIDA 2015), Vilnius, Lithuania, May 11–13, 2015
Projects: SINUS (Spridning av innovationer i nutida svenska)
Funder: Swedish Research Council