Stagger: an Open-Source Part of Speech Tagger for Swedish
2013 (English)
In: Northern European Journal of Language Technology (NEJLT), ISSN 2000-1533, Vol. 3, 1-18 p.
Article in journal (Refereed) Published
This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance.
Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2013. Vol. 3, 1-18 p.
PoS tagging, part of speech tagging, Swedish, neural language models
Language Technology (Computational Linguistics)
Research subject
Linguistics; Computational Linguistics
Identifiers
URN: urn:nbn:se:su:diva-94806
DOI: 10.3384/nejlt.2000-1533.1331
OAI: oai:DiVA.org:su-94806
DiVA: diva2:656074