  • 1. Andersson, Marta
    Stockholms universitet, Humanistiska fakulteten, Engelska institutionen.
    Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    A sentiment-annotated dataset of English causal connectives. 2020. In: Proceedings of the 14th Linguistic Annotation Workshop / [ed] Stefanie Dipper, Amir Zeldes, 2020, pp. 24-33. Conference paper (Refereed)
    Abstract [en]

    This paper investigates the semantic prosody of three causal connectives: due to, owing to and because of in seven varieties of the English language. While research in the domain of English causality exists, we are not aware of studies that would cover the domain of causal connectives in English. Our claim is that connectives such as because of link two arguments, (at least) one of which will include a phrase that contributes to the interpretation of the relation as positive or negative, and hence define the prosody of the connective used. As our results demonstrate, the majority of the prosodies identified are negative for all three connectives; the proportions are stable across the varieties of English studied, and contrary to our expectations, we find no significant differences between the functions of the connectives and discourse preferences. Further, we investigate whether automatizing the sentiment annotation procedure via a simple language-model based classifier is possible. The initial results highlight the complexity of the task and the need for complicated systems, probably aided with other related datasets to achieve reasonable performance.

  • 2. Basirat, Ali
    de Lhoneux, Miryam
    Kulmizev, Artur
    Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nivre, Joakim
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Polyglot Parsing for One Thousand and One Languages (And Then Some). 2019. Conference paper (Other academic)
  • 3. Berggren, Max
    Karlgren, Jussi
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Parkvall, Mikael
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Inferring the location of authors from words in their texts. 2015. In: Proceedings of the 20th Nordic Conference of Computational Linguistics: NODALIDA 2015 / [ed] Beáta Megyesi, Linköping: Linköping University Electronic Press, ACL Anthology, 2015, pp. 211-218. Conference paper (Refereed)
    Abstract [en]

    For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors' location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.

    We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.

  • 4. Bjerva, Johannes
    Grigonyte, Gintare
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Plank, Barbara
    Neural Networks and Spelling Features for Native Language Identification. 2017. In: The Twelfth Workshop on Innovative Use of NLP for Building Educational Applications: Proceedings of the Workshop, Association for Computational Linguistics, 2017, pp. 235-239. Conference paper (Refereed)
    Abstract [en]

    We present the RUG-SU team's submission at the Native Language Identification Shared Task 2017. We combine several approaches into an ensemble, based on spelling error features, a simple neural network using word representations, a deep residual network using word and character features, and a system based on a recurrent neural network. Our best system is an ensemble of neural networks, reaching an F1 score of 0.8323. Although our system is not the highest ranking one, we do outperform the baseline by far.

  • 5. Bjerva, Johannes
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations. 2017. In: Proceedings of the 21st Nordic Conference on Computational Linguistics / [ed] Jörg Tiedemann, Linköping: Linköping University Electronic Press, 2017, pp. 211-215, article id 024. Conference paper (Refereed)
    Abstract [en]

    Assessing the semantic similarity between sentences in different languages is challenging. We approach this problem by leveraging multilingual distributional word representations, where similar words in different languages are close to each other. The availability of parallel data allows us to train such representations on a large number of languages. This allows us to leverage semantic similarity data for languages for which no such data exists. We train and evaluate on five language pairs, including English, Spanish, and Arabic. We are able to train well-performing systems for several language pairs, without any labelled data for that language pair.

  • 6. Bjerva, Johannes
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Han Veiga, Maria
    Tiedemann, Jörg
    Augenstein, Isabelle
    What Do Language Representations Really Represent? 2019. In: Computational linguistics - Association for Computational Linguistics (Print), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 45, no. 2, pp. 381-389. Article in journal (Other academic)
    Abstract [en]

    A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, whereas genetic relationships, a convenient benchmark used for evaluation in previous work, appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.

  • 7. Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Hörberg, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Distribution and duration of signs and parts of speech in Swedish Sign Language. 2016. In: Sign Language and Linguistics, ISSN 1387-9316, E-ISSN 1569-996X, Vol. 19, no. 2, pp. 143-196. Article in journal (Refereed)
    Abstract [en]

    In this paper, we investigate frequency and duration of signs and parts of speech in Swedish Sign Language (SSL) using the SSL Corpus. The duration of signs is correlated with frequency, with high-frequency items having shorter duration than low-frequency items. Similarly, function words (e.g. pronouns) have shorter duration than content words (e.g. nouns). In compounds, forms annotated as reduced display shorter duration. Fingerspelling duration correlates with word length of corresponding Swedish words, and frequency and word length play a role in the lexicalization of fingerspellings. The sign distribution in the SSL Corpus shows a great deal of cross-linguistic similarity with other sign languages in terms of which signs appear as high-frequency items, and which categories of signs are distributed across text types (e.g. conversation vs. narrative). We find a correlation between an increase in age and longer mean sign duration, but see no significant difference in sign duration between genders.

  • 8. Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Iconic Locations in Swedish Sign Language: Mapping Form to Meaning with Lexical Databases. 2017. In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa / [ed] Jörg Tiedemann, Linköping: Linköping University Electronic Press, 2017, pp. 221-225, article id 026. Conference paper (Refereed)
    Abstract [en]

    In this paper, we describe a method for mapping the phonological feature location of Swedish Sign Language (SSL) signs to the meanings in the Swedish semantic dictionary SALDO. By doing so, we observe clear differences in the distribution of meanings associated with different locations on the body. The prominence of certain locations for specific meanings clearly points to iconic mappings between form and meaning in the lexicon of SSL, which pinpoints modality-specific properties of the visual modality.

  • 9. Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Östling, Robert
    University of Helsinki, Finland.
    Visualizing Lects in a Sign Language Corpus: Mining Lexical Variation Data in Lects of Swedish Sign Language. 2016. In: Workshop Proceedings: 7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining / [ed] Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, Julie Hochgesang, Jette Kristoffersen, Johanna Mesch, Paris: ELRA, 2016, pp. 13-18. Conference paper (Refereed)
    Abstract [en]

    In this paper, we discuss the possibilities for mining lexical variation data across (potential) lects in Swedish Sign Language (SSL). The data come from the SSL Corpus (SSLC), a continuously expanding corpus of SSL, its latest release containing 43 307 annotated sign tokens, distributed over 42 signers and 75 time-aligned video and annotation files. After extracting the raw data from the SSLC annotation files, we created a database for investigating lexical distribution/variation across three possible lects, by merging the raw data with an external metadata file, containing information about the age, gender, and regional background of each of the 42 signers in the corpus. We go on to present a first version of an easy-to-use graphical user interface (GUI) that can be used as a tool for investigating lexical variation across different lects, and demonstrate a few interesting finds. This tool makes it easier for researchers and non-researchers alike to have the corpus frequencies for individual signs visualized in an instant, and the tool can easily be updated with future expansions of the SSLC.

  • 10. Cap, Fabienne
    Adesam, Yvonne
    Ahrenberg, Lars
    Borin, Lars
    Bouma, Gerlof
    Forsberg, Markus
    Kann, Viggo
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Smith, Aaron
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nivre, Joakim
    SWORD: Towards Cutting-Edge Swedish Word Processing. 2016. In: Proceedings of SLTC 2016, 2016. Conference paper (Refereed)
    Abstract [en]

    Despite many years of research on Swedish language technology, there is still no well-documented standard for Swedish word processing covering the whole spectrum from low-level tokenization to morphological analysis and disambiguation. SWORD is a new initiative within the SWE-CLARIN consortium aiming to develop documented standards for Swedish word processing. In this paper, we report on a pilot study of Swedish tokenization, where we compare the output of six different tokenizers on four different text types. For one text type (Wikipedia articles), we also compare to the tokenization produced by six manual annotators.

  • 11. Dalianis, Hercules
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Weegar, Rebecka
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Special Issue of Selected Contributions from the Seventh Swedish Language Technology Conference (SLTC 2018). 2019. Conference proceedings (editor) (Other academic)
    Abstract [en]

    This Special Issue contains three papers that are extended versions of abstracts presented at the Seventh Swedish Language Technology Conference (SLTC 2018), held at Stockholm University 8–9 November 2018. SLTC 2018 received 34 submissions, of which 31 were accepted for presentation. The number of registered participants was 113, including both attendees at SLTC 2018 and two co-located workshops that took place on 7 November. 32 participants were internationally affiliated, of which 14 were from outside the Nordic countries. Overall participation was thus on a par with previous editions of SLTC, but international participation was higher.

  • 12. Ek, Adam
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Gustafson Capková, Sofia
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Identifying Speakers and Addressees in Dialogues Extracted from Literary Fiction. 2018. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) / [ed] Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga, European Language Resources Association, 2018, pp. 817-824. Conference paper (Refereed)
    Abstract [en]

    This paper describes an approach to identifying speakers and addressees in dialogues extracted from literary fiction, along with a dataset annotated for speaker and addressee. The overall purpose of this is to provide annotation of dialogue interaction between characters in literary corpora in order to allow for enriched search facilities and construction of social networks from the corpora. To predict speakers and addressees in a dialogue, we use a sequence labeling approach applied to a given set of characters. We use features relating to the current dialogue, the preceding narrative, and the complete preceding context. The results indicate that even with a small amount of training data, it is possible to build a fairly accurate classifier for speaker and addressee identification across different authors, though the identification of addressees is the more difficult task.

  • 13. Kurfali, Murathan
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    A distantly supervised Grammatical Error Detection/Correction system for Swedish. 2023. In: Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning, 2023, pp. 35-39. Conference paper (Refereed)
    Abstract [en]

    This paper presents our submission to the first Shared Task on Multilingual Grammatical Error Detection (MultiGED-2023). Our method utilizes a transformer-based sequence-to-sequence model, which was trained on a synthetic dataset consisting of 3.2 billion words. We adopt a distantly supervised approach, with the training process relying exclusively on the distribution of language learners' errors extracted from the annotated corpus used to construct the training data. In the Swedish track, our model ranks fourth out of seven submissions in terms of the target F0.5 metric, while achieving the highest precision. These results suggest that our model is conservative yet remarkably precise in its predictions.

  • 14. Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Disambiguation of Potentially Idiomatic Expressions with Contextual Embeddings. 2020. In: Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons Proceedings of the Workshop (MWE-LEX 2020) / [ed] Stella Markantonatou, John Mccrae, Jelena Mitrović, Carole Tiberiu, Carlos Ramisch, Ashwini Vaidya, Petya Osenova, Agata Savary, 2020, pp. 85-94. Conference paper (Refereed)
  • 15. Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Let’s be explicit about that: Distant supervision for implicit discourse relation classification via connective prediction. 2021. Conference paper (Refereed)
    Abstract [en]

    In implicit discourse relation classification, we want to predict the relation between adjacent sentences in the absence of any overt discourse connectives. This is challenging even for humans, leading to shortage of annotated data, a fact that makes the task even more difficult for supervised machine learning approaches. In the current study, we perform implicit discourse relation classification without relying on any labeled implicit relation. We sidestep the lack of data through explicitation of implicit relations to reduce the task to two sub-problems: language modeling and explicit discourse relation classification, a much easier problem. Our experimental results show that this method can even marginally outperform the state-of-the-art, in spite of being much simpler than alternative models of comparable performance. Moreover, we show that the achieved performance is robust across domains as suggested by the zero-shot experiments on a completely different domain. This indicates that recent advances in language modeling have made language models sufficiently good at capturing inter-sentence relations without the help of explicit discourse markers.

  • 16. Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Noisy Parallel Corpus Filtering through Projected Word Embeddings. 2019. In: Proceedings of the Fourth Conference on Machine Translation (WMT), Association for Computational Linguistics, 2019, Vol. 3, pp. 279-283. Conference paper (Refereed)
    Abstract [en]

    We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.

  • 17. Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Probing Multilingual Language Models for Discourse. 2021. Conference paper (Refereed)
    Abstract [en]

    Pre-trained multilingual language models have become an important building block in multilingual natural language processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge across languages. This is done with a systematic evaluation on a broader set of discourse-level tasks than has previously been assembled. We find that the XLM-RoBERTa family of models consistently shows the best performance, by simultaneously being good monolingual models and degrading relatively little in a zero-shot setting. Our results also indicate that model distillation may hurt the ability of cross-lingual transfer of sentence representations, while language dissimilarity at most has a modest effect. We hope that our test suite, covering 5 tasks with a total of 22 languages in 10 distinct families, will serve as a useful evaluation platform for multilingual performance at and beyond the sentence level.

  • 18. Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Zero-shot transfer for implicit discourse relation classification. 2019. In: 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue: Proceedings of the Conference, 2019, pp. 226-231. Conference paper (Refereed)
    Abstract [en]

    Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It becomes even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TEDMDB parallel corpus, where it obtains good results for all seven languages using only English training data.

  • 19. Kurfali, Murathan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Sjons, Johan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    A Multi-Word Expression Dataset for Swedish. 2020. In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille: European Language Resources Association (ELRA), 2020, pp. 4402-4409. Conference paper (Refereed)
    Abstract [en]

    We present a new set of 96 Swedish multi-word expressions annotated with degree of (non-)compositionality. In contrast to most previous compositionality datasets we also consider syntactically complex constructions and publish a formal specification of each expression. This allows evaluation of computational models beyond word bigrams, which have so far been the norm. Finally, we use the annotations to evaluate a system for automatic compositionality estimation based on distributional semantics. Our analysis of the disagreements between human annotators and the distributional model reveals interesting questions related to the perception of compositionality, and should be informative to future work in the area.

  • 20. Loftsson, Hrafn
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic. 2013. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013) / [ed] Stephan Oepen, Kristin Hagen, Janne Bondi Johannessen, Sweden: Linköping University Electronic Press, 2013, pp. 105-119. Conference paper (Refereed)
    Abstract [en]

    In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Perceptron tagger, to tag Icelandic, a morphologically complex language. By adding language-specific linguistic features and using IceMorphy, an unknown word guesser, we obtain state-of-the-art tagging accuracy of 92.82%. Furthermore, by adding data from a morphological database, and word embeddings induced from an unannotated corpus, the accuracy increases to 93.84%. This is equivalent to an error reduction of 5.5%, compared to the previously best tagger for Icelandic, consisting of linguistic rules and a Hidden Markov Model.

  • 21. Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Björkstrand, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Grigonyté, Gintaré
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Gustafson-Capková, Sofia
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Mesch, Johanna
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Schönström, Krister
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för svenska som andraspråk för döva.
    Wallin, Lars
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    SWE-CLARIN partner presentation: Natural Language Processing Resources from the Department of Linguistics, Stockholm University. 2014. In: The first Swedish national SWE-CLARIN workshop: LT-based e-HSS in Sweden – taking stock and looking ahead / [ed] Lars Borin, 2014. Conference paper (Other academic)
    Abstract [en]

    The aim of the CLARIN Research Infrastructure and SWE-CLARIN is to make it easier for scholars in the humanities and social sciences to access primary data in the form of natural language, and to provide tools for exploring, annotating and analysing these data. This paper gives an overview of the resources and tools developed at the Department of Linguistics at Stockholm University that are planned to be made available within the SWE-CLARIN project. The paper also outlines our collaborations with neighbouring areas in the humanities and social sciences where these resources and tools will be put to use.

  • 22. Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Modelling the informativeness and timing of non-verbal cues in parent–child interaction. 2016. In: The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning, Stroudsburg, PA, USA: Association for Computational Linguistics, 2016, pp. 82-90. Conference paper (Refereed)
    Abstract [en]

    How do infants learn the meanings of their first words? This study investigates the informativeness and temporal dynamics of non-verbal cues that signal the speaker's referent in a model of early word–referent mapping. To measure the information provided by such cues, a supervised classifier is trained on information extracted from a multimodally annotated corpus of 18 videos of parent–child interaction with three children aged 7 to 33 months. Contradicting previous research, we find that gaze is the single most informative cue, and we show that this finding can be attributed to our fine-grained temporal annotation. We also find that offsetting the timing of the non-verbal cues reduces accuracy, especially if the offset is negative. This is in line with previous research, and suggests that synchrony between verbal and non-verbal cues is important if they are to be perceived as causally related.

  • 23.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Modelling the informativeness of different modalities in parent-child interaction. 2015. In: Workshop on Extensive and Intensive Recordings of Children's Language Environment / [ed] Alex Cristia, Melanie Soderstrom, 2015. Conference paper (Refereed)
  • 24.
    Sjons, Johan
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Hörberg, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Bjerva, Johannes
    Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for. 2017. In: / [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, pp. 1794-1798. Conference paper (Refereed)
    Abstract [en]

    In earlier work, we have shown that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper we show on utterance level in spontaneous Swedish speech that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in Swedish CDS as a function of the age of the child holds, even when surprisal along with utterance length and differences in articulation rate between speakers are controlled for. These results indicate that adults adjust their articulation rate to make it fit the linguistic capacity of the child.

  • 25.
    Tjong Kim Sang, Erik
    et al.
    Bollmann, Marcel
    Boschker, Remko
    Casacuberta, Francisco
    Dietz, Feike
    Dipper, Stefanie
    Domingo, Miguel
    van der Goot, Rob
    van Koppen, Marjo
    Ljubešić, Nikola
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Petran, Florian
    Pettersson, Eva
    Scherrer, Yves
    Schraagen, Marijn
    Sevens, Leen
    Tiedemann, Jörg
    Vanallemeersch, Tom
    Zervanou, Kalliopi
    The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation. 2017. In: Computational Linguistics in the Netherlands Journal, ISSN 2211-4009, Vol. 7, pp. 53-64. Journal article (Refereed)
    Abstract [en]

    The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools applied to the text. We focus on improving part-of-speech tagging analysis of seventeenth-century Dutch. Eight teams took part in the shared task. The best results were obtained by teams employing character-based machine translation. The best system obtained an error reduction of 51% in comparison with the baseline of tagging unmodified text. This is close to the error reduction obtained by human translation (57%).

  • 26.
    Wirén, Mats
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    N. Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Modelling the Informativeness of Non-Verbal Cues in Parent–Child Interaction. 2017. In: Proceedings of Interspeech 2017 / [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, pp. 2203-2207. Conference paper (Refereed)
    Abstract [en]

    Non-verbal cues from speakers, such as eye gaze and hand positions, play an important role in word learning. This is consistent with the notion that for meaning to be reconstructed, acoustic patterns need to be linked to time-synchronous patterns from at least one other modality. In previous studies of a multimodally annotated corpus of parent–child interaction, we have shown that parents interacting with infants at the early word-learning stage (7–9 months) display a large amount of time-synchronous patterns, but that this behaviour tails off with increasing age of the children. Furthermore, we have attempted to quantify the informativeness of the different nonverbal cues, that is, to what extent they actually help to discriminate between different possible referents, and how critical the timing of the cues is. The purpose of this paper is to generalise our earlier model by quantifying informativeness resulting from non-verbal cues occurring both before and after their associated verbal references.

  • 27.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik. University of Helsinki, Finland.
    A Bayesian model for joint word alignment and part-of-speech transfer. 2016. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan: Association for Computational Linguistics, 2016, pp. 620-629. Conference paper (Refereed)
    Abstract [en]

    Current methods for word alignment require considerable amounts of parallel text to deliver accurate results, a requirement which is met only for a small minority of the world’s approximately 7,000 languages. We show that by jointly performing word alignment and annotation transfer in a novel Bayesian model, alignment accuracy can be improved for language pairs where annotations are available for only one of the languages—a finding which could facilitate the study and processing of a vast number of low-resource languages. We also present an evaluation where our method is used to perform single-source and multi-source part-of-speech transfer with 22 translations of the same text in four different languages. This allows us to quantify the considerable variation in accuracy depending on the specific source text(s) used, even with different translations into the same language.

  • 28.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    A Construction Grammar Method for Disambiguating Swedish Compounds. 2010. In: SLTC 2010 Workshop on Compounds and Multiword Expressions, 2010. Conference paper (Refereed)
    Abstract [en]

    This study discusses the structure of Swedish compounds within the framework of Construction Grammar, and applies the result to Word Sense Disambiguation of compound components. A construction-based approach is shown to achieve significantly better results than a set of baselines.

  • 29.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Bayesian Models for Multilingual Word Alignment. 2015. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    In this thesis I explore Bayesian models for word alignment, how they can be improved through joint annotation transfer, and how they can be extended to parallel texts in more than two languages. In addition to these general methodological developments, I apply the algorithms to problems from sign language research and linguistic typology.

    In the first part of the thesis, I show how Bayesian alignment models estimated with Gibbs sampling are more accurate than previous methods for a range of different languages, particularly for languages with few digital resources available—which is unfortunately the state of the vast majority of languages today. Furthermore, I explore how different variations to the models and learning algorithms affect alignment accuracy.

    Then, I show how part-of-speech annotation transfer can be performed jointly with word alignment to improve word alignment accuracy. I apply these models to help annotate the Swedish Sign Language Corpus (SSLC) with part-of-speech tags, and to investigate patterns of polysemy across the languages of the world.

    Finally, I present a model for multilingual word alignment which learns an intermediate representation of the text. This model is then used with a massively parallel corpus containing translations of the New Testament, to explore word order features in 1001 languages.

  • 30.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Bayesian Word Alignment for Massively Parallel Texts. 2014. In: 14th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of the Conference (Volume 2: Short Papers), Stroudsburg: Association for Computational Linguistics, 2014, pp. 123-127. Conference paper (Refereed)
    Abstract [en]

    There has been a great amount of work done in the field of bitext alignment, but the problem of aligning words in massively parallel texts with hundreds or thousands of languages is largely unexplored. While the basic task is similar, there are also important differences in purpose, method and evaluation between the problems. In this work, I present a non-parametric Bayesian model that can be used for simultaneous word alignment in massively parallel corpora. This method is evaluated on a corpus containing 1144 translations of the New Testament.

  • 31.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Mot en mänskligare maskinöversättning. 2022. In: LIVE and LEARN - Festschrift in honor of Lars Borin / [ed] Volodina, Elena and Dannélls, Dana and Berdicevskis, Aleksandrs and Forsberg, Markus and Virk, Shafqat, Göteborg: Göteborgs universitet, 2022, pp. 171-173. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    Over the lifetime of Lars Borin, machine translation has made a gigantic leap -- from simple rule-based systems residing on vacuum tube computers, to the latest zero-shot translation systems. The amount of text data used by modern systems can reach hundreds of billions of words, but is this really necessary? What is the lower limit on training data for a translation system? Here I suggest a simple experiment, entirely without computers, that could go some way towards answering this question.

  • 32.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Part of Speech Tagging: Shallow or Deep Learning? 2018. In: Northern European Journal of Language Technology (NEJLT), ISSN 2000-1533, Vol. 5, no. 1, pp. 1-15. Journal article (Refereed)
    Abstract [en]

    Deep neural networks have advanced the state of the art in numerous fields, but they generally suffer from low computational efficiency and the level of improvement compared to more efficient machine learning models is not always significant. We perform a thorough PoS tagging evaluation on the Universal Dependencies treebanks, pitting a state-of-the-art neural network approach against UDPipe and our sparse structured perceptron-based tagger, efselab. In terms of computational efficiency, efselab is three orders of magnitude faster than the neural network model, while being more accurate than either of the other systems on 47 of 65 treebanks.

  • 33.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Stagger: A modern POS tagger for Swedish. 2012. In: Proceedings of SLTC 2012: The Fourth Swedish Language Technology Conference, SLTC, 2012, pp. 83-84. Conference paper (Refereed)
    Abstract [en]

    The field of Part of Speech (POS) tagging has made slow but steady progress during the last decade, though many of the new methods developed have not previously been applied to Swedish. I present a new system, based on the Averaged Perceptron algorithm and semi-supervised learning, that is more accurate than previous Swedish POS taggers. Furthermore, a new version of the Stockholm-Umeå Corpus is presented, whose more consistent annotation leads to significantly lower error rates for the POS tagger. Finally, a new, freely available annotated corpus of Swedish blog posts is presented and used to evaluate the tagger’s accuracy on this increasingly important genre. Details of the evaluation are presented throughout, to ensure easy comparison with future results.

  • 34.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Stagger: an Open-Source Part of Speech Tagger for Swedish. 2013. In: Northern European Journal of Language Technology (NEJLT), ISSN 2000-1533, Vol. 3, pp. 1-18. Journal article (Refereed)
    Abstract [en]

    This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance.

  • 35.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Studying colexification through massively parallell corpora. 2016. In: The Lexical Typology of Semantic Shifts / [ed] Päivi Juvonen, Maria Koptjevskaja-Tamm, Berlin: Walter de Gruyter, 2016, pp. 157-176. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    Large-sample studies in lexical typology are limited by whatever lexical information is available or can be obtained for all the languages in the study. Various types of word lists, from simple Swadesh lists to large dictionaries, can be used for this purpose. Unfortunately, these resources often present only a very fragmentary view of a given language’s vocabulary. As a complement, we propose an additional source of lexical information: parallel texts. Books such as the New Testament have been translated into thousands of languages, and it is possible to automatically extract word lists from their vocabulary, which can then be applied to lexical typological studies. In particular, we focus on studying colexification using a sample of 1 001 different languages, based on 1 142 translations of the New Testament. We find that although the automatically extracted word lists contain errors, their quality can be sufficiently good to find real areal patterns, such as the ‘tree’/‘fire’ colexification that is widespread in the Sahul area.

  • 36.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Svenska dialektkartor på sekunden. 2015. In: Språkbruk, ISSN 0358-9293, Vol. 3, pp. 10-13. Journal article (Other (popular science, discussion, etc.))
  • 37.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Word order typology through multilingual word alignment. 2015. In: The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Proceedings of the Conference, Volume 2: Short Papers, 2015, pp. 205-211. Conference paper (Refereed)
    Abstract [en]

    With massively parallel corpora of hundreds or thousands of translations of the same text, it is possible to automatically perform typological studies of language structure using very large language samples. We investigate the domain of word order using multilingual word alignment and high-precision annotation transfer in a corpus with 1144 translations in 986 languages of the New Testament. Results are encouraging, with 86% to 96% agreement between our method and the manually created WALS database for a range of different word order features. Beyond reproducing the categorical data in WALS and extending it to hundreds of other languages, we also provide quantitative data for the relative frequencies of different word orders, and show the usefulness of this for language comparison. Our method has applications for basic research in linguistic typology, as well as for NLP tasks like transfer learning for dependency parsing, which has been shown to benefit from word order information.

  • 38.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Bjerva, Johannes
    SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological inflection with attentional sequence-to-sequence models. 2017. In: Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection / [ed] Mans Hulden, Vancouver, Canada: Association for Computational Linguistics, 2017, pp. 110-113. Conference paper (Refereed)
    Abstract [en]

    This paper describes the Stockholm University/University of Groningen (SU-RUG) system for the SIGMORPHON 2017 shared task on morphological inflection. Our system is based on an attentional sequence-to-sequence neural network model using Long Short-Term Memory (LSTM) cells, with joint training of morphological inflection and the inverse transformation, i.e. lemmatization and morphological analysis. Our system outperforms the baseline with a large margin, and our submission ranks as the 4th best team for the track we participate in (task 1, high resource).

  • 39.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap. Radboud University, Netherlands.
    Courtaux, Servane
    Visual Iconicity Across Sign Languages: Large-Scale Automated Video Analysis of Iconic Articulators and Locations. 2018. In: Frontiers in Psychology, E-ISSN 1664-1078, Vol. 9, article id 725. Journal article (Refereed)
    Abstract [en]

    We use automatic processing of 120,000 sign videos in 31 different sign languages to show a cross-linguistic pattern for two types of iconic form–meaning relationships in the visual modality. First, we demonstrate that the degree of inherent plurality of concepts, based on individual ratings by non-signers, strongly correlates with the number of hands used in the sign forms encoding the same concepts across sign languages. Second, we show that certain concepts are iconically articulated around specific parts of the body, as predicted by the associational intuitions by non-signers. The implications of our results are both theoretical and methodological. With regard to theoretical implications, we corroborate previous research by demonstrating and quantifying, using a much larger material than previously available, the iconic nature of languages in the visual modality. As for the methodological implications, we show how automatic methods are, in fact, useful for performing large-scale analysis of sign language data, to a high level of accuracy, as indicated by our manual error analysis.

  • 40.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Gärdenfors, Moa
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Universal Dependencies for Swedish Sign Language. 2017. In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa / [ed] Jörg Tiedemann, Linköping: Linköping University Electronic Press, 2017, pp. 303-308. Conference paper (Refereed)
    Abstract [en]

    We describe the first effort to annotate a signed language with syntactic dependency structure: the Swedish Sign Language portion of the Universal Dependencies treebanks. The visual modality presents some unique challenges in analysis and annotation, such as the possibility of both hands articulating separate signs simultaneously, which has implications for the concept of projectivity in dependency grammars. Our data is sourced from the Swedish Sign Language Corpus, and if used in conjunction these resources contain very richly annotated data: dependency structure and parts of speech, video recordings, signer metadata, and since the whole material is also translated into Swedish the corpus is also a parallel text.

  • 41.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Wallin, Lars
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Enriching the Swedish Sign Language Corpus with Part of Speech Tags Using Joint Bayesian Word Alignment and Annotation Transfer. 2015. In: Proceedings of the 20th Nordic Conference of Computational Linguistics: NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania / [ed] Beáta Megyesi, Linköping University Electronic Press, 2015, pp. 263-268. Conference paper (Refereed)
    Abstract [en]

    We have used a novel Bayesian model of joint word alignment and part of speech (PoS) annotation transfer to enrich the Swedish Sign Language Corpus with PoS tags. The annotations were then hand-corrected in order to both improve annotation quality for the corpus, and allow the empirical evaluation presented herein.

  • 42.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Grigonyte, Gintare
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Transparent text quality assessment with convolutional neural networks. 2017. In: The Twelfth Workshop on Innovative Use of NLP for Building Educational Applications: Proceedings of the Workshop, Association for Computational Linguistics, 2017, pp. 282-286. Conference paper (Refereed)
    Abstract [en]

    We present a very simple model for text quality assessment based on a deep convolutional neural network, where the only supervision required is one corpus of user-generated text of varying quality, and one contrasting text corpus of consistently high quality. Our model is able to provide local quality assessments in different parts of a text, which allows visual feedback about where potentially problematic parts of the text are located, as well as a way to evaluate which textual features are captured by our model. We evaluate our method on two corpora: a large corpus of manually graded student essays and a longitudinal corpus of language learner written production, and find that the text quality metric learned by our model is a fairly strong predictor of both essay grade and learner proficiency level.

  • 43.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Scherrer, Yves
    Tiedemann, Jörg
    Tang, Gongbo
    Nieminen, Tommi
    The Helsinki Neural Machine Translation System. 2017. In: Proceedings of the Conference on Machine Translation (WMT): Shared Task Papers, Association for Computational Linguistics, 2017, Vol. 2, pp. 338-347. Conference paper (Refereed)
    Abstract [en]

    We introduce the Helsinki Neural Machine Translation system (HNMT) and how it is applied in the news translation task at WMT 2017, where it ranked first in both the human and automatic evaluations for English–Finnish. We discuss the success of English–Finnish translations and the overall advantage of NMT over a strong SMT baseline. We also discuss our submissions for English–Latvian, English–Chinese and Chinese–English.

  • 44.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Smolentzov, André
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Tyrefors Hinnerich, Björn
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Nationalekonomiska institutionen.
    Höglin, Erik
    Automated Essay Scoring for Swedish. 2013. In: Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics, 2013, pp. 42-47. Conference paper (Refereed)
    Abstract [en]

    We present the first system developed for automated grading of high school essays written in Swedish. The system uses standard text quality indicators and is able to compare vocabulary and grammar to large reference corpora of blog posts and newspaper articles. The system is evaluated on a corpus of 1 702 essays, each graded independently by the student’s own teacher and also in a blind re-grading process by another teacher. We show that our system’s performance is fair, given the low agreement between the two human graders, and furthermore show how it could improve efficiency in a practical setting where one seeks to identify incorrectly graded essays.

  • 45.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Tiedemann, Jörg
    Continuous multilinguality with language vectors. 2017. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Short Papers / [ed] Mirella Lapata, Phil Blunsom, Alexander Koller, Association for Computational Linguistics, 2017, Vol. 2, pp. 644-649. Conference paper (Refereed)
    Abstract [en]

    Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages.

  • 46.
    Östling, Robert
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik.
    Compounding in a Swedish Blog Corpus. 2013. In: Computer mediated discourse across languages / [ed] Laura Álvarez López; Charlotta Seiler Brylla; Philip Shaw, Stockholm: Acta Universitatis Stockholmiensis, 2013, pp. 45-63. Chapter in book, part of anthology (Refereed)