  • 1. Berggren, Max
    Karlgren, Jussi
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Parkvall, Mikael
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Inferring the location of authors from words in their texts (2015). In: Proceedings of the 20th Nordic Conference of Computational Linguistics: NODALIDA 2015 / [ed] Beáta Megyesi, Linköping: Linköping University Electronic Press, ACL Anthology, 2015, pp. 211-218. Conference paper (peer-reviewed)
    Abstract [en]

    For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors' location. Many texts are locatable, but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words, which can then be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.

    We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.
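    The centroid idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it fits a single isotropic Gaussian per word (the abstract reports that several Gaussians per word worked better), treats latitude/longitude as planar coordinates, and replaces the paper's placeness measure with a hypothetical variance threshold (`max_variance`). The names `fit_word_gaussians` and `locate_text` are illustrative only.

```python
from collections import defaultdict

def fit_word_gaussians(posts, min_count=2):
    """Fit an isotropic 2-D Gaussian (mean, variance) to the coordinates of the
    geotagged posts each word occurs in. Coordinates are treated as planar."""
    points_by_word = defaultdict(list)
    for tokens, point in posts:
        for word in set(tokens):
            points_by_word[word].append(point)
    models = {}
    for word, points in points_by_word.items():
        n = len(points)
        if n < min_count:
            continue  # too few observations to estimate a spread
        mean = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
        # maximum-likelihood variance of an isotropic Gaussian (shared by both axes)
        var = sum((p[0] - mean[0]) ** 2 + (p[1] - mean[1]) ** 2 for p in points) / (2 * n)
        models[word] = (mean, var)
    return models

def locate_text(tokens, models, max_variance):
    """Estimate a text's location as the centroid of the means of its
    'placey' words, here hypothetically defined as words whose variance
    does not exceed max_variance. Returns None if no word qualifies."""
    means = [models[w][0] for w in tokens if w in models and models[w][1] <= max_variance]
    if not means:
        return None
    return (sum(m[0] for m in means) / len(means), sum(m[1] for m in means) / len(means))
```

    Words spread over the whole country (large variance) are filtered out, so a text's estimate is driven by its geographically concentrated words.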

  • 2. Bjerva, Johannes
    Grigonyte, Gintare
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Plank, Barbara
    Neural Networks and Spelling Features for Native Language Identification (2017). In: The Twelfth Workshop on Innovative Use of NLP for Building Educational Applications: Proceedings of the Workshop, Association for Computational Linguistics, 2017, pp. 235-239. Conference paper (peer-reviewed)
    Abstract [en]

    We present the RUG-SU team's submission at the Native Language Identification Shared Task 2017. We combine several approaches into an ensemble, based on spelling error features, a simple neural network using word representations, a deep residual network using word and character features, and a system based on a recurrent neural network. Our best system is an ensemble of neural networks, reaching an F1 score of 0.8323. Although our system is not the highest-ranking one, we outperform the baseline by a wide margin.

  • 3. Bjerva, Johannes
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations (2017). In: Proceedings of the 21st Nordic Conference on Computational Linguistics / [ed] Jörg Tiedemann, Linköping: Linköping University Electronic Press, 2017, pp. 211-215, article id 024. Conference paper (peer-reviewed)
    Abstract [en]

    Assessing the semantic similarity between sentences in different languages is challenging. We approach this problem by leveraging multilingual distributional word representations, where similar words in different languages are close to each other. The availability of parallel data allows us to train such representations on a large number of languages. This allows us to leverage semantic similarity data for languages for which no such data exists. We train and evaluate on five language pairs, including English, Spanish, and Arabic. We are able to train well-performing systems for several language pairs, without any labelled data for that language pair.

  • 4. Bjerva, Johannes
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Han Veiga, Maria
    Tiedemann, Jörg
    Augenstein, Isabelle
    What Do Language Representations Really Represent? (2019). In: Computational linguistics - Association for Computational Linguistics (Print), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 45, no. 2, pp. 381-389. Journal article (other academic)
    Abstract [en]

    A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, whereas genetic relationships, a convenient benchmark used for evaluation in previous work, appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.

  • 5.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Hörberg, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Distribution and duration of signs and parts of speech in Swedish Sign Language (2016). In: Sign Language and Linguistics, ISSN 1387-9316, E-ISSN 1569-996X, Vol. 19, no. 2, pp. 143-196. Journal article (peer-reviewed)
    Abstract [en]

    In this paper, we investigate frequency and duration of signs and parts of speech in Swedish Sign Language (SSL) using the SSL Corpus. The duration of signs is correlated with frequency, with high-frequency items having shorter duration than low-frequency items. Similarly, function words (e.g. pronouns) have shorter duration than content words (e.g. nouns). In compounds, forms annotated as reduced display shorter duration. Fingerspelling duration correlates with word length of corresponding Swedish words, and frequency and word length play a role in the lexicalization of fingerspellings. The sign distribution in the SSL Corpus shows a great deal of cross-linguistic similarity with other sign languages in terms of which signs appear as high-frequency items, and which categories of signs are distributed across text types (e.g. conversation vs. narrative). We find a correlation between an increase in age and longer mean sign duration, but see no significant difference in sign duration between genders.

  • 6.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Iconic Locations in Swedish Sign Language: Mapping Form to Meaning with Lexical Databases (2017). In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa / [ed] Jörg Tiedemann, Linköping: Linköping University Electronic Press, 2017, pp. 221-225, article id 026. Conference paper (peer-reviewed)
    Abstract [en]

    In this paper, we describe a method for mapping the phonological feature location of Swedish Sign Language (SSL) signs to the meanings in the Swedish semantic dictionary SALDO. By doing so, we observe clear differences in the distribution of meanings associated with different locations on the body. The prominence of certain locations for specific meanings clearly points to iconic mappings between form and meaning in the lexicon of SSL, which pinpoints modality-specific properties of the visual modality.

  • 7.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Östling, Robert
    University of Helsinki, Finland.
    Visualizing Lects in a Sign Language Corpus: Mining Lexical Variation Data in Lects of Swedish Sign Language (2016). In: Workshop Proceedings: 7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining / [ed] Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, Julie Hochgesang, Jette Kristoffersen, Johanna Mesch, Paris: ELRA, 2016, pp. 13-18. Conference paper (peer-reviewed)
    Abstract [en]

    In this paper, we discuss the possibilities for mining lexical variation data across (potential) lects in Swedish Sign Language (SSL). The data come from the SSL Corpus (SSLC), a continuously expanding corpus of SSL, its latest release containing 43 307 annotated sign tokens, distributed over 42 signers and 75 time-aligned video and annotation files. After extracting the raw data from the SSLC annotation files, we created a database for investigating lexical distribution/variation across three possible lects, by merging the raw data with an external metadata file, containing information about the age, gender, and regional background of each of the 42 signers in the corpus. We go on to present a first version of an easy-to-use graphical user interface (GUI) that can be used as a tool for investigating lexical variation across different lects, and demonstrate a few interesting finds. This tool makes it easier for researchers and non-researchers alike to have the corpus frequencies for individual signs visualized in an instant, and the tool can easily be updated with future expansions of the SSLC.

  • 8. Cap, Fabienne
    Adesam, Yvonne
    Ahrenberg, Lars
    Borin, Lars
    Bouma, Gerlof
    Forsberg, Markus
    Kann, Viggo
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Smith, Aaron
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nivre, Joakim
    SWORD: Towards Cutting-Edge Swedish Word Processing (2016). In: Proceedings of SLTC 2016, 2016. Conference paper (peer-reviewed)
    Abstract [en]

    Despite many years of research on Swedish language technology, there is still no well-documented standard for Swedish word processing covering the whole spectrum from low-level tokenization to morphological analysis and disambiguation. SWORD is a new initiative within the SWE-CLARIN consortium aiming to develop documented standards for Swedish word processing. In this paper, we report on a pilot study of Swedish tokenization, where we compare the output of six different tokenizers on four different text types. For one text type (Wikipedia articles), we also compare to the tokenization produced by six manual annotators.

  • 9.
    Dalianis, Hercules
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Weegar, Rebecka
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Special Issue of Selected Contributions from the Seventh Swedish Language Technology Conference (SLTC 2018) (2019). Conference proceedings (other academic)
    Abstract [en]

    This Special Issue contains three papers that are extended versions of abstracts presented at the Seventh Swedish Language Technology Conference (SLTC 2018), held at Stockholm University 8–9 November 2018. SLTC 2018 received 34 submissions, of which 31 were accepted for presentation. The number of registered participants was 113, including both attendees at SLTC 2018 and two co-located workshops that took place on 7 November. Of the participants, 32 were internationally affiliated, 14 of them from outside the Nordic countries. Overall participation was thus on a par with previous editions of SLTC, but international participation was higher.

  • 10.
    Ek, Adam
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Gustafson Capková, Sofia
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Identifying Speakers and Addressees in Dialogues Extracted from Literary Fiction (2018). In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) / [ed] Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga, European Language Resources Association, 2018, pp. 817-824. Conference paper (peer-reviewed)
    Abstract [en]

    This paper describes an approach to identifying speakers and addressees in dialogues extracted from literary fiction, along with a dataset annotated for speaker and addressee. The overall purpose of this is to provide annotation of dialogue interaction between characters in literary corpora in order to allow for enriched search facilities and construction of social networks from the corpora. To predict speakers and addressees in a dialogue, we use a sequence labeling approach applied to a given set of characters. We use features relating to the current dialogue, the preceding narrative, and the complete preceding context. The results indicate that even with a small amount of training data, it is possible to build a fairly accurate classifier for speaker and addressee identification across different authors, though the identification of addressees is the more difficult task.

  • 11.
    Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Noisy Parallel Corpus Filtering through Projected Word Embeddings (2019). In: Proceedings of the Fourth Conference on Machine Translation (WMT), Association for Computational Linguistics, 2019, Vol. 3, pp. 279-283. Conference paper (peer-reviewed)
    Abstract [en]

    We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.

  • 12.
    Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Zero-shot transfer for implicit discourse relation classification (2019). In: 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue: Proceedings of the Conference, Association for Computational Linguistics, 2019, pp. 226-231. Conference paper (peer-reviewed)
    Abstract [en]

    Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It is made even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TEDMDB parallel corpus, where it obtains good results for all seven languages using only English training data.

  • 13.
    Loftsson, Hrafn
    Reykjavik University, Iceland.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic (2013). In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), Linköping University Electronic Press, Linköpings universitet, 2013, pp. 105-119. Conference paper (peer-reviewed)
    Abstract [en]

    In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Perceptron tagger, to tag Icelandic, a morphologically complex language. By adding language-specific linguistic features and using IceMorphy, an unknown-word guesser, we obtain state-of-the-art tagging accuracy of 92.82%. Furthermore, by adding data from a morphological database, and word embeddings induced from an unannotated corpus, the accuracy increases to 93.84%. This is equivalent to an error reduction of 5.5%, compared to the previous best tagger for Icelandic, consisting of linguistic rules and a Hidden Markov Model.
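    For readers unfamiliar with the technique, the following Python sketch shows a generic averaged perceptron used as a greedy per-token tagger. It is not Stagger's code: the feature set (word, two-character suffix, neighbouring words) is a simplified assumption, and practical taggers also use tag-history features and shuffle the training data between iterations. Averaging uses the common lazy time-stamp trick, so each weight's running total is only updated when that weight changes.

```python
from collections import defaultdict

class AveragedPerceptron:
    """Multiclass perceptron whose final weights are averaged over all steps."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(dict)    # feature -> {class: weight}
        self._totals = defaultdict(float)   # (feature, class) -> sum of weight * steps held
        self._stamps = defaultdict(int)     # (feature, class) -> step of last change
        self.step = 0

    def predict(self, features):
        scores = dict.fromkeys(self.classes, 0.0)
        for f in features:
            for c, w in self.weights.get(f, {}).items():
                scores[c] += w
        # ties are broken deterministically by class name
        return max(self.classes, key=lambda c: (scores[c], c))

    def update(self, truth, guess, features):
        self.step += 1
        if truth == guess:
            return
        for f in features:
            for c, delta in ((truth, 1.0), (guess, -1.0)):
                key = (f, c)
                w = self.weights[f].get(c, 0.0)
                # credit the old weight for every step it was held unchanged
                self._totals[key] += (self.step - self._stamps[key]) * w
                self._stamps[key] = self.step
                self.weights[f][c] = w + delta

    def average(self):
        if self.step == 0:
            return
        for f, per_class in self.weights.items():
            for c, w in per_class.items():
                key = (f, c)
                total = self._totals[key] + (self.step - self._stamps[key]) * w
                per_class[c] = total / self.step

def features(words, i):
    """Hypothetical feature template: word, suffix and neighbouring words."""
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return ["bias", "w=" + w, "suf=" + w[-2:], "prev=" + prev_w, "next=" + next_w]

def train(sentences, iterations=10):
    tags = {t for _, gold in sentences for t in gold}
    model = AveragedPerceptron(tags)
    for _ in range(iterations):
        for words, gold in sentences:  # fixed order here for reproducibility
            for i, truth in enumerate(gold):
                feats = features(words, i)
                model.update(truth, model.predict(feats), feats)
    model.average()
    return model

def tag(model, words):
    return [model.predict(features(words, i)) for i in range(len(words))]
```

    Averaging damps the influence of late, noisy updates, which is the main reason averaged perceptrons generalise better than the plain perceptron.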

  • 14.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Björkstrand, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Gustafson-Capková, Sofia
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Mesch, Johanna
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Schönström, Krister
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för svenska som andraspråk för döva.
    Wallin, Lars
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    SWE-CLARIN partner presentation: Natural Language Processing Resources from the Department of Linguistics, Stockholm University (2014). In: The first Swedish national SWE-CLARIN workshop: LT-based e-HSS in Sweden – taking stock and looking ahead / [ed] Lars Borin, 2014. Conference paper (other academic)
    Abstract [en]

    The aim of the CLARIN Research Infrastructure and SWE-CLARIN is to make it easier for scholars in the humanities and social sciences to access primary data in the form of natural language, and to provide tools for exploring, annotating and analysing these data. This paper gives an overview of the resources and tools developed at the Department of Linguistics at Stockholm University that are planned to be made available within the SWE-CLARIN project. The paper also outlines our collaborations with neighbouring areas in the humanities and social sciences where these resources and tools will be put to use.

  • 15.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Modelling the informativeness and timing of non-verbal cues in parent–child interaction (2016). In: The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning, Stroudsburg, PA, USA: Association for Computational Linguistics, 2016, pp. 82-90. Conference paper (peer-reviewed)
    Abstract [en]

    How do infants learn the meanings of their first words? This study investigates the informativeness and temporal dynamics of non-verbal cues that signal the speaker's referent in a model of early word–referent mapping. To measure the information provided by such cues, a supervised classifier is trained on information extracted from a multimodally annotated corpus of 18 videos of parent–child interaction with three children aged 7 to 33 months. Contradicting previous research, we find that gaze is the single most informative cue, and we show that this finding can be attributed to our fine-grained temporal annotation. We also find that offsetting the timing of the non-verbal cues reduces accuracy, especially if the offset is negative. This is in line with previous research, and suggests that synchrony between verbal and non-verbal cues is important if they are to be perceived as causally related.

  • 16.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Modelling the informativeness of different modalities in parent-child interaction (2015). In: Workshop on Extensive and Intensive Recordings of Children's Language Environment / [ed] Alex Cristia, Melanie Soderstrom, 2015. Conference paper (peer-reviewed)

  • 17.
    Sjons, Johan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Hörberg, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Bjerva, Johannes
    Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for (2017). In: [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, pp. 1794-1798. Conference paper (peer-reviewed)
    Abstract [en]

    In earlier work, we have shown that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper, we show at the utterance level in spontaneous Swedish speech that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in Swedish CDS as a function of the age of the child holds even when surprisal, along with utterance length and differences in articulation rate between speakers, is controlled for. These results indicate that adults adjust their articulation rate to make it fit the linguistic capacity of the child.
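    Surprisal here is the negative log probability of a word given its context. As an illustration only, the sketch below estimates it in Python with an add-one-smoothed bigram model; the language model actually used in the paper is not described in this abstract, and the function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams over utterances (lists of tokens), with <s> as start symbol."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for utterance in corpus:
        tokens = ["<s>"] + utterance
        vocab.update(utterance)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    # +1 reserves probability mass for a single unknown-word type
    return context_counts, bigram_counts, len(vocab) + 1

def surprisal(prev, word, model):
    """Surprisal in bits: -log2 P(word | prev), add-one smoothed."""
    context_counts, bigram_counts, v = model
    p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + v)
    return -math.log2(p)

def utterance_surprisals(utterance, model):
    """Per-word surprisal values for one utterance."""
    tokens = ["<s>"] + utterance
    return [surprisal(prev, word, model) for prev, word in zip(tokens, tokens[1:])]
```

    A word that is frequent after its context ("a dog" in a corpus that often contains it) receives lower surprisal than an unseen continuation, which is the quantity whose correlation with articulation rate the abstract reports.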

  • 18. Tjong Kim Sang, Erik
    Bollmann, Marcel
    Boschker, Remko
    Casacuberta, Francisco
    Dietz, Feike
    Dipper, Stefanie
    Domingo, Miguel
    van der Goot, Robe
    van Koppen, Marjo
    Ljubešić, Nikola
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Petran, Florian
    Pettersson, Eva
    Scherrer, Yves
    Schraagen, Marijn
    Sevens, Leen
    Tiedemann, Jörg
    Vanallemeersch, Tom
    Zervanou, Kalliopi
    The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation (2017). In: Computational Linguistics in the Netherlands Journal, ISSN 2211-4009, Vol. 7, pp. 53-64. Journal article (peer-reviewed)
    Abstract [en]

    The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools applied to the text. We focus on improving part-of-speech tagging analysis of seventeenth-century Dutch. Eight teams took part in the shared task. The best results were obtained by teams employing character-based machine translation. The best system obtained an error reduction of 51% in comparison with the baseline of tagging unmodified text. This is close to the error reduction obtained by human translation (57%).

  • 19.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    N. Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Modelling the Informativeness of Non-Verbal Cues in Parent–Child Interaction (2017). In: Proceedings of Interspeech 2017 / [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, pp. 2203-2207. Conference paper (peer-reviewed)
    Abstract [en]

    Non-verbal cues from speakers, such as eye gaze and hand positions, play an important role in word learning. This is consistent with the notion that for meaning to be reconstructed, acoustic patterns need to be linked to time-synchronous patterns from at least one other modality. In previous studies of a multimodally annotated corpus of parent–child interaction, we have shown that parents interacting with infants at the early word-learning stage (7–9 months) display a large number of time-synchronous patterns, but that this behaviour tails off with increasing age of the children. Furthermore, we have attempted to quantify the informativeness of the different non-verbal cues, that is, to what extent they actually help to discriminate between different possible referents, and how critical the timing of the cues is. The purpose of this paper is to generalise our earlier model by quantifying informativeness resulting from non-verbal cues occurring both before and after their associated verbal references.

  • 20.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik. University of Helsinki, Finland.
    A Bayesian model for joint word alignment and part-of-speech transfer (2016). In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan: Association for Computational Linguistics, 2016, pp. 620-629. Conference paper (peer-reviewed)
    Abstract [en]

    Current methods for word alignment require considerable amounts of parallel text to deliver accurate results, a requirement which is met only for a small minority of the world’s approximately 7,000 languages. We show that by jointly performing word alignment and annotation transfer in a novel Bayesian model, alignment accuracy can be improved for language pairs where annotations are available for only one of the languages—a finding which could facilitate the study and processing of a vast number of low-resource languages. We also present an evaluation where our method is used to perform single-source and multi-source part-of-speech transfer with 22 translations of the same text in four different languages. This allows us to quantify the considerable variation in accuracy depending on the specific source text(s) used, even with different translations into the same language.

  • 21.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    A Construction Grammar Method for Disambiguating Swedish Compounds (2010). In: SLTC 2010 Workshop on Compounds and Multiword Expressions, 2010. Conference paper (peer-reviewed)
    Abstract [en]

    This study discusses the structure of Swedish compounds within the framework of Construction Grammar, and applies the result to Word Sense Disambiguation of compound components. A construction-based approach is shown to achieve significantly better results than a set of baselines.

  • 22.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Bayesian Models for Multilingual Word Alignment (2015). Doctoral thesis, monograph (other academic)
    Abstract [en]

    In this thesis I explore Bayesian models for word alignment, how they can be improved through joint annotation transfer, and how they can be extended to parallel texts in more than two languages. In addition to these general methodological developments, I apply the algorithms to problems from sign language research and linguistic typology.

    In the first part of the thesis, I show how Bayesian alignment models estimated with Gibbs sampling are more accurate than previous methods for a range of different languages, particularly for languages with few digital resources available—which is unfortunately the state of the vast majority of languages today. Furthermore, I explore how different variations to the models and learning algorithms affect alignment accuracy.

    Then, I show how part-of-speech annotation transfer can be performed jointly with word alignment to improve word alignment accuracy. I apply these models to help annotate the Swedish Sign Language Corpus (SSLC) with part-of-speech tags, and to investigate patterns of polysemy across the languages of the world.

    Finally, I present a model for multilingual word alignment which learns an intermediate representation of the text. This model is then used with a massively parallel corpus containing translations of the New Testament, to explore word order features in 1001 languages.

  • 23.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Bayesian Word Alignment for Massively Parallel Texts. 2014. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, Association for Computational Linguistics, 2014, pp. 123-127. Conference paper (peer-reviewed)
    Abstract [en]

    There has been a great amount of work done in the field of bitext alignment, but the problem of aligning words in massively parallel texts with hundreds or thousands of languages is largely unexplored. While the basic task is similar, there are also important differences in purpose, method and evaluation between the problems. In this work, I present a non-parametric Bayesian model that can be used for simultaneous word alignment in massively parallel corpora. This method is evaluated on a corpus containing 1144 translations of the New Testament.

  • 24.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Part of Speech Tagging: Shallow or Deep Learning? 2018. In: Northern European Journal of Language Technology (NEJLT), ISSN 2000-1533, Vol. 5, no. 1, pp. 1-15. Journal article (peer-reviewed)
    Abstract [en]

    Deep neural networks have advanced the state of the art in numerous fields, but they generally suffer from low computational efficiency, and the level of improvement compared to more efficient machine learning models is not always significant. We perform a thorough PoS tagging evaluation on the Universal Dependencies treebanks, pitting a state-of-the-art neural network approach against UDPipe and our sparse structured perceptron-based tagger, efselab. In terms of computational efficiency, efselab is three orders of magnitude faster than the neural network model, while being more accurate than either of the other systems on 47 of 65 treebanks.

  • 25.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Stagger: A modern POS tagger for Swedish. 2012. In: / [ed] Pierre Nugues, 2012. Conference paper (peer-reviewed)

  • 26.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Stagger: an Open-Source Part of Speech Tagger for Swedish. 2013. In: Northern European Journal of Language Technology (NEJLT), ISSN 2000-1533, Vol. 3, pp. 1-18. Journal article (peer-reviewed)
    Abstract [en]

    This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance.
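    The averaged perceptron at the core of this kind of tagger can be sketched as follows. This is a minimal greedy left-to-right version that uses a few hypothetical features: lower-cased word, 3-character suffix, previous tag and previous word. It uses neither SALDO nor Collobert and Weston embeddings, and it is not Stagger's actual feature set, decoder or API. The class name and the training data are hypothetical.

```python
from collections import defaultdict


class AveragedPerceptronTagger:
    """Greedy left-to-right PoS tagger trained with the averaged perceptron."""

    def __init__(self, tags):
        self.tags = sorted(set(tags))
        self.weights = defaultdict(float)  # (feature, tag) -> current weight
        self._totals = defaultdict(float)  # weight summed over past time steps
        self._stamps = defaultdict(int)    # time step of the last change
        self._time = 0

    @staticmethod
    def features(words, i, prev_tag):
        w = words[i].lower()
        prev_w = words[i - 1].lower() if i > 0 else "<s>"
        return ["bias", "w=" + w, "suf3=" + w[-3:],
                "prev_tag=" + prev_tag, "prev_w=" + prev_w]

    def predict_one(self, feats):
        def score(tag):
            return sum(self.weights.get((f, tag), 0.0) for f in feats)
        # Ties are broken deterministically by tag name.
        return max(self.tags, key=lambda t: (score(t), t))

    def _update(self, key, delta):
        # Lazy averaging: credit the old weight for the steps it was in effect.
        self._totals[key] += (self._time - self._stamps[key]) * self.weights[key]
        self._stamps[key] = self._time
        self.weights[key] += delta

    def train(self, sentences, epochs=5):
        for _ in range(epochs):
            for words, gold_tags in sentences:
                prev = "<s>"
                for i, gold in enumerate(gold_tags):
                    self._time += 1
                    feats = self.features(words, i, prev)
                    guess = self.predict_one(feats)
                    if guess != gold:
                        for f in feats:
                            self._update((f, gold), 1.0)
                            self._update((f, guess), -1.0)
                    prev = gold  # condition on the gold history during training
        # Replace each weight by its average over all time steps.
        for key in list(self.weights):
            total = self._totals[key] + (self._time - self._stamps[key]) * self.weights[key]
            self.weights[key] = total / self._time

    def tag(self, words):
        prev, out = "<s>", []
        for i in range(len(words)):
            prev = self.predict_one(self.features(words, i, prev))
            out.append(prev)
        return out


train_data = [
    (["jag", "ser", "en", "katt"], ["PRON", "VERB", "DET", "NOUN"]),
    (["du", "ser", "en", "hund"], ["PRON", "VERB", "DET", "NOUN"]),
]
tagger = AveragedPerceptronTagger(["PRON", "VERB", "DET", "NOUN"])
tagger.train(train_data)
print(tagger.tag(["han", "ser", "en", "katt"]))  # ['PRON', 'VERB', 'DET', 'NOUN']
```

    The averaged perceptron differs from the plain perceptron by averaging the weights over every training step. Here that averaging is done lazily with per-weight timestamps. It usually makes the model less sensitive to the last few training examples it saw.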

  • 27.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Studying colexification through massively parallel corpora. 2016. In: The Lexical Typology of Semantic Shifts / [ed] Päivi Juvonen, Maria Koptjevskaja-Tamm, Berlin: Walter de Gruyter, 2016, pp. 157-176. Chapter in book, part of anthology (peer-reviewed)
    Abstract [en]

    Large-sample studies in lexical typology are limited by whatever lexical information is available or can be obtained for all the languages in the study. Various types of word lists, from simple Swadesh lists to large dictionaries, can be used for this purpose. Unfortunately, these resources often present only a very fragmentary view of a given language’s vocabulary. As a complement, we propose an additional source of lexical information: parallel texts. Books such as the New Testament have been translated into thousands of languages, and it is possible to automatically extract word lists from their vocabulary, which can then be applied to lexical typological studies. In particular, we focus on studying colexification using a sample of 1 001 different languages, based on 1 142 translations of the New Testament. We find that although the automatically extracted word lists contain errors, their quality can be sufficiently good to find real areal patterns, such as the ‘tree’/‘fire’ colexification that is widespread in the Sahul area.
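    Once a word list has been extracted for each language, finding colexifications comes down to checking which concepts share a form. The sketch below is minimal and assumes one form per concept per language. The function name, language names and forms are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def count_colexifications(word_lists: dict[str, dict[str, str]]) -> Counter:
    """Count, for each pair of concepts, the languages using one form for both.

    word_lists maps a language name to a {concept: word form} dictionary.
    """
    counts: Counter = Counter()
    for concepts in word_lists.values():
        # Sorting makes each concept pair appear in a fixed (alphabetical) order.
        for (c1, f1), (c2, f2) in combinations(sorted(concepts.items()), 2):
            if f1 == f2:
                counts[(c1, c2)] += 1
    return counts


# Hypothetical word lists; the language names and forms are invented.
lists = {
    "lang_a": {"fire": "ka", "tree": "ka", "water": "mu"},
    "lang_b": {"fire": "pi", "tree": "pi", "water": "to"},
    "lang_c": {"fire": "se", "tree": "lo", "water": "se"},
}
print(count_colexifications(lists).most_common())
# [(('fire', 'tree'), 2), (('fire', 'water'), 1)]
```

    Automatically extracted lists may contain several candidate forms for a concept. Comparing sets of forms instead of single strings would be a straightforward extension.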

  • 28.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Svenska dialektkartor på sekunden [Swedish dialect maps in seconds]. 2015. In: Språkbruk, ISSN 0358-9293, Vol. 3, pp. 10-13. Journal article (other: popular science, debate, etc.)

  • 29.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Word order typology through multilingual word alignment. 2015. In: The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Proceedings of the Conference, Volume 2: Short Papers, 2015, pp. 205-211. Conference paper (peer-reviewed)
    Abstract [en]

    With massively parallel corpora of hundreds or thousands of translations of the same text, it is possible to automatically perform typological studies of language structure using very large language samples. We investigate the domain of word order using multilingual word alignment and high-precision annotation transfer in a corpus with 1144 translations in 986 languages of the New Testament. Results are encouraging, with 86% to 96% agreement between our method and the manually created WALS database for a range of different word order features. Beyond reproducing the categorical data in WALS and extending it to hundreds of other languages, we also provide quantitative data for the relative frequencies of different word orders, and show the usefulness of this for language comparison. Our method has applications for basic research in linguistic typology, as well as for NLP tasks like transfer learning for dependency parsing, which has been shown to benefit from word order information.
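    This sketch assumes that verb and object positions have already been identified in each translation, for instance through annotation transfer over the word alignments. From those positions it computes relative word order frequencies and agreement with a reference database. The function names, the majority rule for picking a dominant order and the toy data are all hypothetical. The paper's exact criteria may differ.

```python
from collections import Counter


def order_frequencies(pairs: list[tuple[int, int]]) -> dict[str, float]:
    """Relative frequency of VO vs. OV order from (verb_pos, object_pos) pairs."""
    counts = Counter("VO" if v < o else "OV" for v, o in pairs)
    n = len(pairs)
    return {order: counts[order] / n for order in ("VO", "OV")} if n else {}


def dominant_order(freqs: dict[str, float]) -> str:
    # Hypothetical rule: report the more frequent order (ties go to "VO").
    return max(freqs, key=freqs.get)


def agreement(predicted: dict[str, str], reference: dict[str, str]) -> float:
    """Share of languages present in both dicts whose labels match."""
    shared = [lang for lang in predicted if lang in reference]
    if not shared:
        raise ValueError("no languages in common")
    return sum(predicted[lang] == reference[lang] for lang in shared) / len(shared)


freqs = order_frequencies([(1, 3), (2, 5), (4, 2), (0, 1)])
print(freqs, dominant_order(freqs))  # {'VO': 0.75, 'OV': 0.25} VO

predicted = {"lang_a": "VO", "lang_b": "OV", "lang_c": "VO"}
reference = {"lang_a": "VO", "lang_b": "VO", "lang_c": "VO", "lang_d": "OV"}
print(round(agreement(predicted, reference), 2))  # 0.67
```

    The majority rule ignores the WALS "no dominant order" category. A real comparison would need a frequency threshold to handle it.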

  • 30.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Bjerva, Johannes
    SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological inflection with attentional sequence-to-sequence models. 2017. In: Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection / [ed] Mans Hulden, Vancouver, Canada: Association for Computational Linguistics, 2017, pp. 110-113. Conference paper (peer-reviewed)
    Abstract [en]

    This paper describes the Stockholm University/University of Groningen (SU-RUG) system for the SIGMORPHON 2017 shared task on morphological inflection. Our system is based on an attentional sequence-to-sequence neural network model using Long Short-Term Memory (LSTM) cells, with joint training of morphological inflection and the inverse transformation, i.e. lemmatization and morphological analysis. Our system outperforms the baseline by a large margin, and our team ranks 4th in the track we participate in (task 1, high resource).

  • 31.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap. Radboud University, Netherlands.
    Courtaux, Servane
    Visual Iconicity Across Sign Languages: Large-Scale Automated Video Analysis of Iconic Articulators and Locations. 2018. In: Frontiers in Psychology, ISSN 1664-1078, E-ISSN 1664-1078, Vol. 9, article ID 725. Journal article (peer-reviewed)
    Abstract [en]

    We use automatic processing of 120,000 sign videos in 31 different sign languages to show a cross-linguistic pattern for two types of iconic form–meaning relationships in the visual modality. First, we demonstrate that the degree of inherent plurality of concepts, based on individual ratings by non-signers, strongly correlates with the number of hands used in the sign forms encoding the same concepts across sign languages. Second, we show that certain concepts are iconically articulated around specific parts of the body, as predicted by the associational intuitions of non-signers. The implications of our results are both theoretical and methodological. With regard to theoretical implications, we corroborate previous research by demonstrating and quantifying, using much more material than was previously available, the iconic nature of languages in the visual modality. As for the methodological implications, we show how automatic methods are, in fact, useful for performing large-scale analysis of sign language data, to a high level of accuracy, as indicated by our manual error analysis.

  • 32.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Gärdenfors, Moa
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Universal Dependencies for Swedish Sign Language. 2017. In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa / [ed] Jörg Tiedemann, Linköping: Linköping University Electronic Press, 2017, pp. 303-308. Conference paper (peer-reviewed)
    Abstract [en]

    We describe the first effort to annotate a signed language with syntactic dependency structure: the Swedish Sign Language portion of the Universal Dependencies treebanks. The visual modality presents some unique challenges in analysis and annotation, such as the possibility of both hands articulating separate signs simultaneously, which has implications for the concept of projectivity in dependency grammars. Our data is sourced from the Swedish Sign Language Corpus, and when used in conjunction, these resources contain very richly annotated data: dependency structure and parts of speech, video recordings, and signer metadata. Since the whole material is also translated into Swedish, the corpus is also a parallel text.
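    The abstract notes that simultaneous articulation complicates projectivity. The minimal sketch below shows how projectivity can be checked: a tree is projective if, for every arc, the head dominates every token lying between the head and the dependent. The check presupposes a strict linear order of tokens, and two simultaneously articulated signs do not provide one. The function name and the toy trees are hypothetical, and the sketch does not reflect how the treebank encodes such cases.

```python
def is_projective(heads: list[int]) -> bool:
    """Check projectivity of a dependency tree.

    heads[k - 1] is the head of token k (tokens are numbered 1..n in linear
    order) and 0 marks the root. Assumes the input is a well-formed tree;
    a cycle would make the upward walk below loop forever.
    """

    def dominates(h: int, d: int) -> bool:
        # Follow head links upwards from d; h dominates d if we reach it.
        while d != 0:
            if d == h:
                return True
            d = heads[d - 1]
        return h == 0  # the artificial root dominates every token

    for d, h in enumerate(heads, start=1):
        lo, hi = sorted((h, d))
        # An arc is projective if its head dominates every token it spans.
        if not all(dominates(h, k) for k in range(lo + 1, hi)):
            return False
    return True


print(is_projective([2, 3, 0]))     # True: token 1 depends on 2, token 2 on 3 (root)
print(is_projective([3, 4, 0, 3]))  # False: arcs 3->1 and 4->2 cross
```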

  • 33.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Wallin, Lars
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Enriching the Swedish Sign Language Corpus with Part of Speech Tags Using Joint Bayesian Word Alignment and Annotation Transfer. 2015. In: Proceedings of the 20th Nordic Conference of Computational Linguistics: NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania / [ed] Beáta Megyesi, Linköping University Electronic Press, 2015, pp. 263-268. Conference paper (peer-reviewed)
    Abstract [en]

    We have used a novel Bayesian model of joint word alignment and part of speech (PoS) annotation transfer to enrich the Swedish Sign Language Corpus with PoS tags. The annotations were then hand-corrected, both to improve annotation quality for the corpus and to allow the empirical evaluation presented herein.

  • 34.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Grigonyte, Gintare
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Transparent text quality assessment with convolutional neural networks. 2017. In: The Twelfth Workshop on Innovative Use of NLP for Building Educational Applications: Proceedings of the Workshop, Association for Computational Linguistics, 2017, pp. 282-286. Conference paper (peer-reviewed)
    Abstract [en]

    We present a very simple model for text quality assessment based on a deep convolutional neural network, where the only supervision required is one corpus of user-generated text of varying quality, and one contrasting text corpus of consistently high quality. Our model is able to provide local quality assessments in different parts of a text, which allows visual feedback about where potentially problematic parts of the text are located, as well as a way to evaluate which textual features are captured by our model. We evaluate our method on two corpora: a large corpus of manually graded student essays and a longitudinal corpus of language learner written production, and find that the text quality metric learned by our model is a fairly strong predictor of both essay grade and learner proficiency level.

  • 35.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Scherrer, Yves
    Tiedemann, Jörg
    Tang, Gongbo
    Nieminen, Tommi
    The Helsinki Neural Machine Translation System. 2017. In: Proceedings of the Conference on Machine Translation (WMT): Shared Task Papers, Association for Computational Linguistics, 2017, Vol. 2, pp. 338-347. Conference paper (peer-reviewed)
    Abstract [en]

    We introduce the Helsinki Neural Machine Translation system (HNMT) and describe how it was applied in the news translation task at WMT 2017, where it ranked first in both the human and automatic evaluations for English–Finnish. We discuss the success of the English–Finnish translations and the overall advantage of NMT over a strong SMT baseline. We also discuss our submissions for English–Latvian, English–Chinese and Chinese–English.

  • 36.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Smolentzov, André
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Tyrefors Hinnerich, Björn
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Nationalekonomiska institutionen.
    Höglin, Erik
    Automated Essay Scoring for Swedish. 2013. In: Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics, 2013, pp. 42-47. Conference paper (peer-reviewed)
    Abstract [en]

    We present the first system developed for automated grading of high school essays written in Swedish. The system uses standard text quality indicators and is able to compare vocabulary and grammar to large reference corpora of blog posts and newspaper articles. The system is evaluated on a corpus of 1 702 essays, each graded independently by the student’s own teacher and also in a blind re-grading process by another teacher. We show that our system’s performance is fair, given the low agreement between the two human graders, and furthermore show how it could improve efficiency in a practical setting where one seeks to identify incorrectly graded essays.
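    Agreement between two graders is commonly quantified with Cohen's kappa. Kappa corrects raw agreement for the agreement expected by chance. The abstract does not state which measure the paper uses, so the following is only an illustrative minimal sketch. The grade labels and data are hypothetical.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected if both raters labelled independently at their own rates.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single, identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical grades for six essays from the original teacher and a blind re-grader.
teacher = ["C", "B", "C", "E", "A", "C"]
blind = ["C", "C", "C", "E", "B", "B"]
print(round(cohens_kappa(teacher, blind), 3))  # 0.25
```

    Grades are ordinal, so a weighted kappa, which penalizes near-misses less than distant disagreements, is often a better fit. Plain kappa treats every disagreement alike.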

  • 37.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Tiedemann, Jörg
    Continuous multilinguality with language vectors. 2017. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Short Papers / [ed] Mirella Lapata, Phil Blunsom, Alexander Koller, Association for Computational Linguistics, 2017, Vol. 2, pp. 644-649. Conference paper (peer-reviewed)
    Abstract [en]

    Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages.
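    Once language vectors have been learned, one simple way to check whether they reflect relationships between languages is to find each language's nearest neighbour by cosine similarity. The sketch below is minimal. The vectors are invented three-dimensional stand-ins, not vectors from the model, and the function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_language(vectors: dict[str, list[float]], lang: str) -> str:
    """Return the other language whose vector is most cosine-similar to lang's."""
    return max((other for other in vectors if other != lang),
               key=lambda other: cosine(vectors[lang], vectors[other]))


# Invented 3-dimensional vectors; these are not values learned by any model.
vectors = {
    "swe": [0.9, 0.1, 0.0],
    "nob": [0.85, 0.15, 0.05],
    "fin": [0.1, 0.9, 0.2],
    "est": [0.15, 0.8, 0.3],
}
print(nearest_language(vectors, "swe"), nearest_language(vectors, "fin"))  # nob est
```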

  • 38.
    Östling, Robert
    et al.
    University of Helsinki, Finland.
    Tiedemann, Jörg
    University of Helsinki, Finland.
    Efficient word alignment with Markov Chain Monte Carlo. 2016. In: Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, E-ISSN 1804-0462, no. 106, pp. 125-146. Journal article (peer-reviewed)
    Abstract [en]

    We present efmaral, a new system for efficient and accurate word alignment using a Bayesian model with Markov Chain Monte Carlo (MCMC) inference. Through careful selection of data structures and model architecture we are able to surpass the fast_align system, commonly used for performance-critical word alignment, both in computational efficiency and alignment accuracy. Our evaluation shows that a phrase-based statistical machine translation (SMT) system produces translations of higher quality when using word alignments from efmaral than from fast_align, and that translation quality is on par with what is obtained using giza++, a tool requiring orders of magnitude more processing time. More generally we hope to convince the reader that Monte Carlo sampling, rather than being viewed as a slow method of last resort, should actually be the method of choice for the SMT practitioner and others interested in word alignment.
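    Here is a minimal collapsed Gibbs sampler for an IBM Model 1-style lexical model, to illustrate the kind of MCMC inference involved. Each target token links to one source token or to NULL. The translation distributions get a symmetric Dirichlet prior with a hypothetical concentration `alpha` and are integrated out. Each link is then resampled in proportion to (count(s, t) + alpha) / (count(s) + alpha · |V_t|), with the current link removed from the counts. The sketch omits the further model components and the data-structure optimizations that make efmaral fast. The function name and defaults are hypothetical.

```python
import random
from collections import defaultdict

NULL = "<null>"


def gibbs_align(corpus, iterations=50, alpha=0.01, seed=0):
    """Collapsed Gibbs sampling for an IBM Model 1-style aligner.

    corpus: list of (source_tokens, target_tokens). Each target token is linked
    to exactly one source position (index 0 is NULL); alignment positions have
    a uniform prior, and p(t | s) has a symmetric Dirichlet(alpha) prior that
    is integrated out. Returns the final sample: one list of source indices
    per sentence pair.
    """
    rng = random.Random(seed)
    vocab_t = len({t for _, tgt in corpus for t in tgt})
    pair = defaultdict(int)   # (source word, target word) -> link count
    total = defaultdict(int)  # source word -> number of links
    sources = [[NULL] + list(src) for src, _ in corpus]
    links = []
    # Random initialisation of all links.
    for src, (_, tgt) in zip(sources, corpus):
        a = [rng.randrange(len(src)) for _ in tgt]
        for t, i in zip(tgt, a):
            pair[src[i], t] += 1
            total[src[i]] += 1
        links.append(a)
    for _ in range(iterations):
        for src, (_, tgt), a in zip(sources, corpus, links):
            for j, t in enumerate(tgt):
                # Remove the current link from the counts.
                old = src[a[j]]
                pair[old, t] -= 1
                total[old] -= 1
                # Posterior predictive weight of each candidate source position.
                weights = [(pair[s, t] + alpha) / (total[s] + alpha * vocab_t)
                           for s in src]
                a[j] = rng.choices(range(len(src)), weights=weights)[0]
                new = src[a[j]]
                pair[new, t] += 1
                total[new] += 1
    return links


corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
print(gibbs_align(corpus))  # one sampled list of source indices per sentence pair
```

    The sampler is stochastic, so this sketch returns only the last sample. Practical samplers usually aggregate over several samples rather than trusting a single one.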

  • 39.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Compounding in a Swedish Blog Corpus. 2013. In: Computer mediated discourse across languages / [ed] Laura Álvarez López, Charlotta Seiler Brylla & Philip Shaw, Stockholm: Acta Universitatis Stockholmiensis, 2013, pp. 45-63. Chapter in book, part of anthology (peer-reviewed)