Results 51-100 of 120
  • 51.
    Nilsson Björkenstam, Kristina
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    SUC-CORE: A Balanced Corpus Annotated with Noun Phrase Coreference. 2013. In: Northern European Journal of Language Technology (NEJLT), ISSN 2000-1533, Vol. 3, no. 2, pp. 19-39. Article in journal (Refereed)
    Abstract [en]

    This paper describes SUC-CORE, a subset of the Stockholm Umeå Corpus and the Swedish Treebank annotated with noun phrase coreference. While most coreference annotated corpora consist of texts of similar types within related domains, SUC-CORE consists of both informative and imaginative prose and covers a wide range of literary genres and domains. This allows for exploration of coreference across different text types, but it also means that there are limited amounts of data within each type. Future work on coreference resolution for Swedish should include making more annotated data available for the research community.

  • 52.
    Nilsson Björkenstam, Kristina
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    The MINGLE annotation scheme: Multimodal annotation of parent-child interaction in a free play setting (version 1.0). 2012. Report (Other academic)
    Abstract [en]

    A cognitive model of language learning must be dialogue-driven and multimodal to reflect how parent and child interact, using words, eye gaze, and object manipulation. We present a scheme for multimodal annotation of parent-child interaction. The purpose is to add verbal and non-verbal annotation to a corpus of longitudinal video and sound recordings of parent-child dyads. In this guideline, we describe the transcription of adult and child speech and vocalizations, and the annotation of both empty-hand gestures and object-related actions by both adults and children.

  • 53.
    Nilsson Björkenstam, Kristina
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    What is a corpus and why are corpora important tools? 2013. Conference paper (Other academic)
  • 54.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Björkstrand, Thomas
    Stockholm University, Faculty of Humanities, Department of Linguistics, Sign Language.
    Grigonyté, Gintaré
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Gustafson-Capková, Sofia
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Mesch, Johanna
    Stockholm University, Faculty of Humanities, Department of Linguistics, Sign Language.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Schönström, Krister
    Stockholm University, Faculty of Humanities, Department of Linguistics, Swedish as a Second Language for the Deaf.
    Wallin, Lars
    Stockholm University, Faculty of Humanities, Department of Linguistics, Sign Language.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    SWE-CLARIN partner presentation: Natural Language Processing Resources from the Department of Linguistics, Stockholm University. 2014. In: The first Swedish national SWE-CLARIN workshop: LT-based e-HSS in Sweden – taking stock and looking ahead / [ed] Lars Borin, 2014. Conference paper (Other academic)
    Abstract [en]

    The aim of the CLARIN Research Infrastructure and SWE-CLARIN is to make it easier for scholars in the humanities and social sciences to access primary data in the form of natural language, and to provide tools for exploring, annotating and analysing these data. This paper gives an overview of the resources and tools developed at the Department of Linguistics at Stockholm University that are planned to be made available within the SWE-CLARIN project. The paper also outlines our collaborations with neighbouring areas in the humanities and social sciences where these resources and tools will be put to use.

  • 55.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Byström, Emil
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    SUC-CORE: SUC 2.0 Annotated with NP Coreference. 2012. In: Proceedings of the Fourth Swedish Language Technology Conference (SLTC), October 24-26, 2012, Lund / [ed] Pierre Nugues, 2012. Conference paper (Refereed)
    Abstract [en]

    SUC-CORE is a subset of the Stockholm Umeå Corpus 2.0 and the Swedish Treebank, annotated with noun phrase coreference. While most coreference annotated corpora consist of texts of similar types within related domains, SUC-CORE consists of both informative and imaginative prose and covers a wide range of literary genres and domains.

  • 56.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Grigonyté, Gintaré
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Trump säger det igen, igen och igen [Trump says it again, again and again]. 2017. In: Språktidningen, ISSN 1654-5028, no. 2, pp. 24-27. Article in journal (Other (popular science, discussion, etc.))
  • 57.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Gustafson Capková, Sofia
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    The Stockholm University Strindberg Corpus: Content and Possibilities. 2014. In: Strindberg on International Stages/Strindberg in Translation / [ed] Roland Lysell, Cambridge: Cambridge Scholars Publishing, 2014. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    We have approached the works of August Strindberg from a computational linguistic point of view, resulting in The Stockholm University Strindberg Corpus, consisting of seven of Strindberg's autobiographical works with linguistic annotation. The corpus is freely available for research. We use this corpus for three quantitative studies of Strindberg’s work: in the first, we describe the novels included in the corpus by keywords; in the second, we compare Strindberg’s use of emotionally charged words with selected prose of both his contemporaries and present-day authors; in the third, we explore the semantic prosody of KVINNA (“woman”) and MAN (“man”).

  • 58.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Gustafson-Capková, Sofia
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Stockholm University Strindberg Corpus: Contents and possibilities. 2012. Conference paper (Other academic)
    Abstract [en]

    The Stockholm University Strindberg Corpus (SUSC) consists of seven novels by August Strindberg annotated for part of speech, with morphological analysis and lemmas. The corpus is freely available.

    SUSC consists of approximately 400 000 tokens annotated for part of speech, including morphological analysis and lemmas, using the Stockholm-Umeå Corpus tag set in PAROLE format. The annotated texts have been converted to XML, which makes the corpus searchable with corpus analysis tools such as Xaira. This allows, for example, searching for concordances of a specific word form, part of speech and/or lemma, as well as pattern matching and collocation extraction.

    The current version of the corpus includes seven works which can be classified as autobiographical:

    • Tjänstekvinnans son (The son of a servant, 1886-87)
    • Han och hon (He and she, 1919)
    • Inferno (Inferno, 1897)
    • Legender and Jakob brottas (Legends and Jacob wrestles, 1898)
    • Fagervik och Skamsund (Fair haven and Foulstrand, 1902)
    • Ensam (Alone, 1903)

    We are aware of three other electronic collections of Strindberg’s works: Projekt Runeberg, Litteraturbanken and Språkbanken. While these are valuable resources, SUSC is an important addition because, unlike the first two, it is linguistically annotated, and unlike the third, the data is available for download and thus can be fully inspected and processed using the researcher’s software of choice. Even more importantly, researchers can add their analyses as new layers of annotation of the corpus.
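    The concordance search described above (by word form, part of speech and/or lemma) can be illustrated with a minimal keyword-in-context (KWIC) sketch. It assumes a hypothetical in-memory list of (word form, POS tag, lemma) tokens; it does not read the actual SUSC XML files or use Xaira, and the tags in the sample are illustrative SUC-style tags only.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str
    pos: str
    lemma: str


def concordance(tokens: list[Token], *, form: Optional[str] = None,
                pos: Optional[str] = None, lemma: Optional[str] = None,
                width: int = 3) -> list[str]:
    """Return KWIC lines for tokens matching every criterion that is given."""
    lines = []
    for i, tok in enumerate(tokens):
        # Skip tokens that fail any of the requested filters.
        if form is not None and tok.form.lower() != form.lower():
            continue
        if pos is not None and tok.pos != pos:
            continue
        if lemma is not None and tok.lemma != lemma:
            continue
        left = " ".join(t.form for t in tokens[max(0, i - width):i])
        right = " ".join(t.form for t in tokens[i + 1:i + 1 + width])
        lines.append(f"{left} [{tok.form}] {right}".strip())
    return lines


# Hypothetical sample sentence with SUC-style tags (illustrative only).
sample = [
    Token("Han", "PN", "han"),
    Token("såg", "VB", "se"),
    Token("en", "DT", "en"),
    Token("kvinna", "NN", "kvinna"),
    Token("och", "KN", "och"),
    Token("två", "RG", "två"),
    Token("kvinnor", "NN", "kvinna"),
]

for line in concordance(sample, lemma="kvinna", pos="NN"):
    print(line)
```

    Searching by lemma groups inflected forms (kvinna, kvinnor) under one query, which is the practical benefit of lemma annotation over plain string search.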

  • 59.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Gustavsson, Lisa
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    En korpusstudie om multimodal synkroni i tidig ordinlärning [A corpus study of multimodal synchrony in early word learning]. 2013. Conference paper (Other academic)
    Abstract (translated from Swedish)

    In this study, we investigate synchrony in early multimodal interaction between parents and children. By synchrony we mean recurring patterns or structural regularities (in words, prosody, gaze direction, gestures and actions) that can reduce complexity in language learning.

    The data consist of recordings of five longitudinal dyads involving two children (aged 0;7-2;7) and their parents. The recordings are transcribed and annotated for fundamental frequency, gaze direction, gestures and object handling. We investigate synchrony by studying all mentions of two selected objects (two dolls). For each mention, we examine the fundamental frequency and whether the mention is combined with the adult or child looking at, pointing to, or touching the object.

    The child is assumed to use basic perceptual processes to pick up on patterns and regularities in the interaction with the adult, both in the acoustic signal and in the physical environment (Gogate & Hollich, 2010). Moreover, the adult tends to highlight linguistic structure when interacting with the child, for example by talking about an object while showing it to the child or letting the child touch it. This synchronized multimodal input helps the child to structure and sort the speech signal and to make connections between words and objects. In this study, we try to capture this kind of multimodal synchrony by studying two specific target words and what the interaction looks like around these words. We hypothesize that regularities in prosody, gaze direction and gestures will be more synchronized when the child is younger and the target words are new than when the children are older and the target words are familiar.

    The study is part of a project in which we try to explain early language learning in terms of general social and cognitive abilities. By studying early parent-child interaction, we want to investigate how linguistic constructions emerge, what functions they have, and how they correlate with other stimuli in the child's environment.

    Gogate, L., & Hollich, G. (2010). Invariance detection within an interactive system: A perceptual gateway to language development. Psychological Review, 117(2), 496-516.

  • 60.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Multimodal annotation of parent-child interaction in a free-play setting. 2013. In: Multimodal Corpora 2013: Beyond Audio and Video / [ed] J. Edlund, D. Heylen, P. Paggio, 2013. Conference paper (Refereed)
    Abstract [en]

    This paper describes the verbal, non-verbal, and discourse annotation of a longitudinal corpus of parent-child interaction. The verbal annotation includes transcription of child-directed speech and child vocalizations. The non-verbal annotation describes gestures and object-related actions by both parent and child. The verbal and non-verbal annotation is combined in discourse annotation that distinguishes initial from subsequent mentions, and further categorizes initial mentions depending on initiative.
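    The distinction between initial and subsequent mentions can be sketched as one pass over time-ordered mention annotations. This is a simplified illustration, not the paper's annotation scheme: the `Mention` record and its fields are hypothetical, and "initiative" is approximated as whoever (parent or child) produced the first mention of a referent.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    time: float     # onset in seconds (hypothetical field)
    speaker: str    # "parent" or "child"
    referent: str   # identifier of the object referred to


def label_mentions(mentions: list[Mention]) -> list[tuple[Mention, str]]:
    """Label each mention as initial (first mention of its referent) or subsequent.

    Initial mentions are also tagged with who took the initiative,
    approximated here as the speaker of that first mention.
    """
    seen: set[str] = set()
    labelled = []
    for m in sorted(mentions, key=lambda x: x.time):
        if m.referent in seen:
            labelled.append((m, "subsequent"))
        else:
            seen.add(m.referent)
            labelled.append((m, f"initial/{m.speaker}-initiated"))
    return labelled


mentions = [
    Mention(1.2, "parent", "doll_1"),
    Mention(3.5, "child", "doll_1"),
    Mention(4.0, "child", "ball"),
]
for m, label in label_mentions(mentions):
    print(m.time, m.referent, label)
```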

  • 61.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Multimodal Annotation of Synchrony in Longitudinal Parent–Child Interaction. 2014. In: MMC 2014 Multimodal Corpora: Combining applied and basic research targets: Workshop at LREC 2014, European Language Resources Association, 2014. Conference paper (Refereed)
    Abstract [en]

    This paper describes the multimodal annotation of speech, gaze and hand movement in a corpus of longitudinal parent–child interaction, and reports results on synchrony, structural regularities which appear to be a key means by which parents facilitate children's learning of new concepts. The results provide additional support for our previous finding that parents display decreasing synchrony as a function of the age of the child.

  • 62.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Reference to Objects in Longitudinal Parent-Child Interaction. 2012. In: Workshop on Language, Action and Perception (APL), 2012. Conference paper (Refereed)
    Abstract [en]

    A cognitive model of language learning needs to be dialogue-driven and multimodal to reflect how parent and child interact, using words, eye gaze, and object manipulation.

    In this paper, we present a scheme for multimodal annotation of parent-child interaction. We use this annotation for studying invariance across modalities. Our basic hypothesis is that perception of invariance (or synchrony) in multimodal patterns in auditory-visual speech is the device primarily used to reduce complexity in language learning.

    To this end, we have added verbal and non-verbal annotation to a corpus of longitudinal video and sound recordings of parent-child dyads. We use this data to try to determine if the amount of synchrony across modalities of parent-child interaction decreases as the child grows older and learns more language and gestures.

  • 63.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Variation sets in child-directed speech. 2015. In: / [ed] Ellen Marklund, Iris-Corinna Schwarz, 2015. Conference paper (Refereed)
  • 64.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Eklund, Robert
    Linköping University.
    Disfluency in Child-Directed Speech. 2013. In: Proceedings of Fonetik 2013: The XXVIth Annual Phonetics Meeting 12–13 June 2013, Linköping University, Linköping, Sweden / [ed] Robert Eklund, Linköping: Department of Culture and Communication, Linköping University, Sweden, 2013, pp. 57-60. Conference paper (Refereed)
    Abstract [en]

    We report results from a longitudinal study of the rate and location of disfluencies in child-directed speech, using data for children between 0;6 and 2;9 years. We compare these results to adult-directed speech by the same speakers.

  • 65.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Modelling the informativeness and timing of non-verbal cues in parent–child interaction. 2016. In: The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning, Stroudsburg, PA, USA: Association for Computational Linguistics, 2016, pp. 82-90. Conference paper (Refereed)
    Abstract [en]

    How do infants learn the meanings of their first words? This study investigates the informativeness and temporal dynamics of non-verbal cues that signal the speaker's referent in a model of early word–referent mapping. To measure the information provided by such cues, a supervised classifier is trained on information extracted from a multimodally annotated corpus of 18 videos of parent–child interaction with three children aged 7 to 33 months. Contradicting previous research, we find that gaze is the single most informative cue, and we show that this finding can be attributed to our fine-grained temporal annotation. We also find that offsetting the timing of the non-verbal cues reduces accuracy, especially if the offset is negative. This is in line with previous research, and suggests that synchrony between verbal and non-verbal cues is important if they are to be perceived as causally related.

  • 66.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Modelling the informativeness of different modalities in parent-child interaction. 2015. In: Workshop on Extensive and Intensive Recordings of Children's Language Environment / [ed] Alex Cristia, Melanie Soderstrom, 2015. Conference paper (Refereed)
  • 67.
    Nilsson, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Borin, Lars
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Living off the land: The Web as a source of practice texts for learners of less prevalent languages. 2002. In: Third International Conference on Language Resources and Evaluation. Proceedings. Vol. II. Las Palmas, Spain: ELRA, 2002, pp. 411-418. Conference paper (Refereed)
    Abstract [en]

    This study focuses on how to automatically locate text sources published on the World Wide Web in order to produce adequate and up-to-date learning materials for second language learners of Nordic languages. The Web is an excellent source of authentic text materials. However, the large amount of information available on the Web makes search services necessary. Hence, we are developing Squirrel, a prototype Web meta-search service, described in this paper, which collects text material in the Nordic languages according to language, topic and difficulty level. Our primary target group consists of exchange students at Nordic institutions of higher education and their language teachers, although in the longer perspective, we would also like to be able to do something for minority language communities. We describe the basic implementation of Squirrel and present preliminary results from trying it out. Finally, we discuss the (lack of) Web resources in less prevalent languages, and how we imagine that applications like Squirrel could fit into a second or foreign language learning situation.

  • 68.
    Nilsson, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Hjelm, Hans
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Using Semantic Features Derived from Word-Space Models for Swedish Coreference Resolution. 2009. In: Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. NEALT Proceedings Series, Vol. 4 / [ed] Kristiina Jokinen and Eckhard Bick, Tartu, Estonia: Northern European Association for Language Technology (NEALT), 2009, pp. 134-141. Conference paper (Refereed)
  • 69.
    Nilsson, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Hjelm, Hans
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Oxhammar, Henrik
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    SUiS - Cross-language Ontology-driven Information Retrieval in a Restricted Domain. 2006. In: Proceedings of the 15th NODALIDA conference, Joensuu 2005, 2006. Conference paper (Refereed)
  • 70.
    Nilsson, Kristina
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Malmgren, Aisha
    Towards automatic recognition of product names: an exploratory study of brand names in economic texts. 2006. In: Proceedings of the 15th NODALIDA conference, Joensuu 2005 / [ed] Stefan Werner, Joensuu: Ling@JoY: University of Joensuu electronic publications in linguistics and language technology 1, 2006. Conference paper (Refereed)
  • 71.
    Parkvall, Mikael
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, General Linguistics.
    Källgren, Gunnel
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Kreolspråk över alla gränser [Creole languages across all borders]. 1997. In: Forskning och framsteg, ISSN 0015-7937, no. 2, pp. 38-43. Article in journal (Other (popular science, discussion, etc.))
  • 72.
    Parkvall, Mikael
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, General Linguistics.
    Källgren, Gunnel
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Pidgin pelastaa ummikot, sitten kehitty kreolikieli [Pidgin rescues monolinguals, then a creole language develops]. 1997. In: Tiede 2000, no. 4, pp. 56-60. Article in journal (Other (popular science, discussion, etc.))
  • 73.
    Persson, Peter
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Starved neural learning: Morpheme segmentation using low amounts of data. 2018. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    Automatic morpheme segmentation as a field has been dominated by unsupervised methods since its inception, partly for theoretical reasons but also because of resource constraints. Given the success neural network methods have shown in a wide variety of fields in recent years, it seems compelling to apply these methods to morpheme segmentation. This study explores the efficacy of modern neural networks, specifically convolutional neural networks and bidirectional LSTM (BLSTM) networks, on the morpheme segmentation task in a low-resource setting, to determine whether they are viable contenders against previous unsupervised, minimally supervised and semi-supervised systems in the field. One architecture of each type is implemented and trained on a new gold-standard data set, and the results are compared to previously established methods. A qualitative error analysis of the architectures’ segmentations is also performed. The study demonstrates that a BLSTM system can be trained with minimal effort to produce a proof-of-concept solution at low levels of training data, and suggests that BLSTM methods may be a fruitful direction for further research in this field.
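    Neural segmenters such as the BLSTM above are commonly framed as character-level sequence labelling, where each character is tagged as beginning a morpheme or not. The sketch below shows only that encoding and its inverse, assuming a gold segmentation marked with hyphens; the thesis's actual label scheme and data format are not stated in the abstract, so both are assumptions, and no network is included.

```python
def to_labels(segmented: str, sep: str = "-") -> tuple[str, list[str]]:
    """Turn a segmented word into (surface word, per-character labels).

    'B' marks the first character of a morpheme, 'I' any other character.
    """
    chars, labels = [], []
    for morph in segmented.split(sep):
        for j, ch in enumerate(morph):
            chars.append(ch)
            labels.append("B" if j == 0 else "I")
    return "".join(chars), labels


def from_labels(word: str, labels: list[str], sep: str = "-") -> str:
    """Invert to_labels: insert a separator before every non-initial 'B'."""
    out = []
    for i, (ch, lab) in enumerate(zip(word, labels)):
        if lab == "B" and i > 0:
            out.append(sep)
        out.append(ch)
    return "".join(out)


word, labels = to_labels("un-break-able")
print(word, "".join(labels))      # unbreakable BIBIIIIBIII
print(from_labels(word, labels))  # un-break-able
```

    A sequence model then only has to predict one label per character, which suits small training sets better than predicting whole segmentations.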

  • 74.
    Rinaldi, Fabio
    et al.
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Clematide, Simon
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Hafner, Simon
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Schneider, Gerold
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Grigonyte, Gintare
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Romacker, Martin
    Novartis Pharma AG, NIBR-IT, Text Mining Services, Basel, Switzerland.
    Vachon, Therese
    Novartis Pharma AG, NIBR-IT, Text Mining Services, Basel, Switzerland.
    Using the OntoGene pipeline for the triage task of BioCreative 2012. 2013. In: Database: The Journal of Biological Databases and Curation, ISSN 1758-0463, E-ISSN 1758-0463. Article in journal (Refereed)
    Abstract [en]

    In this article, we describe the architecture of the OntoGene Relation mining pipeline and its application in the triage task of BioCreative 2012. The aim of the task is to support the triage of abstracts relevant to the process of curation of the Comparative Toxicogenomics Database. We use a conventional information retrieval system (Lucene) to provide a baseline ranking, which we then combine with information provided by our relation mining system, in order to achieve an optimized ranking. Our approach additionally delivers domain entities mentioned in each input document as well as candidate relationships, both ranked according to a confidence score computed by the system. This information is presented to the user through an advanced interface aimed at supporting the process of interactive curation. Thanks, in particular, to the high-quality entity recognition, the OntoGene system achieved the best overall results in the task.

  • 75.
    Rosén, Dan
    et al.
    Wirén, Mats
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Volodina, Elena
    Error Coding of Second-Language Learner Texts Based on Mostly Automatic Alignment of Parallel Corpora. 2018. In: CLARIN Annual Conference 2018: Proceedings / [ed] Inguna Skadina, Maria Eskevich, 2018, pp. 181-184. Conference paper (Refereed)
    Abstract [en]

    Error coding of second-language learner text, that is, detecting, correcting and annotating errors, is a cumbersome task which in turn requires interpretation of the text to decide what the errors are. This paper describes a system with which the annotator corrects the learner text by editing it prior to the actual error annotation. During the editing, the system automatically generates a parallel corpus of the learner and corrected texts. Based on this, the work of the annotator consists of three independent tasks that are otherwise often conflated in error coding: correcting the learner text, repairing inconsistent alignments, and performing the actual error annotation.
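    The abstract does not specify how the learner and corrected texts are aligned. As an assumed stand-in, not the system described, a token-level alignment can be approximated with Python's standard `difflib.SequenceMatcher`, whose opcodes separate unchanged tokens from replacements, insertions and deletions, i.e. the candidate edits an annotator would then code.

```python
from difflib import SequenceMatcher


def align_edits(learner: str, corrected: str) -> list[tuple[str, list[str], list[str]]]:
    """Align whitespace-separated tokens and return only the non-equal operations."""
    src, tgt = learner.split(), corrected.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "insert" or "delete".
            edits.append((tag, src[i1:i2], tgt[j1:j2]))
    return edits


for edit in align_edits("I has a apple today", "I have an apple today"):
    print(edit)
```

    Note that the two separate corrections ("has" to "have", "a" to "an") come out as one merged replacement. Splitting such merged or inconsistent links is the kind of alignment repair the abstract assigns to the annotator as a separate task.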

  • 76.
    Samuelsson, Yvonne
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Dickinson, Markus
    Department of Linguistics, Indiana University.
    Consistency Checking for Treebank Alignment. 2010. In: Proceedings of the Fourth Linguistic Annotation Workshop / [ed] Nianwen Xue and Massimo Poesio, Association for Computational Linguistics, 2010, pp. 38-46. Conference paper (Refereed)
    Abstract [en]

    This paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucial differences in sorting errors from legitimate variations. The second method examines phrase nodes which are predicted to be aligned, based on the alignment of their yields. Both methods are effective in complementary ways.
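    The first method (alignment as a string-to-string mapping, with the target string treated as a label) can be sketched as grouping all occurrences of each source string and flagging those aligned to more than one target. This is a minimal sketch under that reading only; the input, a hypothetical list of (source, target) pairs, is an assumption, and as the abstract stresses, flagged cases still need inspection because many are legitimate variation.

```python
from collections import Counter, defaultdict


def inconsistent_alignments(pairs: list[tuple[str, str]]) -> dict[str, Counter]:
    """Map each source string aligned to more than one target to its target counts."""
    targets: dict[str, Counter] = defaultdict(Counter)
    for source, target in pairs:
        targets[source][target] += 1
    # Keep only sources whose "label" (target string) varies.
    return {src: counts for src, counts in targets.items() if len(counts) > 1}


# Hypothetical Swedish-English phrase alignments (source, target).
pairs = [
    ("huset", "the house"),
    ("huset", "the house"),
    ("huset", "house"),
    ("en bok", "a book"),
]
for src, counts in inconsistent_alignments(pairs).items():
    print(src, dict(counts))
```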

  • 77.
    Samuelsson, Yvonne
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Volk, Martin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Alignment Tools for Parallel Treebanks. 2007. In: Data Structures for Linguistic Resources and Applications: Proceedings of the Biennial GLDV Conference 2007, 2007. Conference paper (Refereed)
    Abstract [en]

    This paper reports on our efforts in creating a trilingual parallel treebank. The focal points are consistency checking and all aspects of sub-sentential alignment. We discuss the alignment guidelines, the importance of quality checks, and special alignment problems. We then look at alignment algorithms and alignment visualization tools, and compare our own TreeAligner with other alignment tools. Our constituent structure treebanks contain just over 1,000 sentences and around 18,000 tokens in each language.

  • 78.
    Samuelsson, Yvonne
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Volk, Martin
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Automatic Phrase Alignment: Using Statistical N-Gram Alignment for Syntactic Phrase Alignment. 2007. In: Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT 2007) / [ed] Koenraad De Smedt, Jan Hajič and Sandra Kübler, Northern European Association for Language Technology (NEALT), 2007, pp. 139-150. Conference paper (Refereed)
    Abstract [en]

    A parallel treebank consists of syntactically annotated sentences in two or more languages, taken from translated documents. These parallel sentences are linked through alignment. This paper explores the use of word n-gram alignment, computed for statistical machine translation, to create syntactic phrase alignment. We achieve a weighted F0.5-score of over 65%.
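    The F0.5-score is the F-beta measure with β = 0.5, F_β = (1 + β²)PR / (β²P + R), which weights precision more heavily than recall. The sketch below shows only this unweighted formula; the abstract does not state how individual alignments are weighted to obtain the reported weighted score, and the precision and recall values used here are hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score: beta < 1 favours precision, beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical alignment quality: high precision, modest recall.
    print(f_beta(0.8, 0.4))            # F0.5
    print(f_beta(0.8, 0.4, beta=1.0))  # F1, for comparison
```

    With P = 0.8 and R = 0.4, F0.5 = 0.4 / 0.6 ≈ 0.667 while F1 ≈ 0.533, showing how β = 0.5 rewards a precision-oriented aligner.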

  • 79.
    Schneider, Gerold
    et al.
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Clematide, Simon
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Ellendorf, Tilia
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Tuggener, Don
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Rinaldi, Fabio
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    UZH in the BioNLP 2013 GENIA Shared Task. 2013. In: Proceedings of the BioNLP Shared Task 2013 Workshop, 2013, pp. 116-120. Conference paper (Refereed)
    Abstract [en]

    We describe a biological event detection method implemented for the Genia Event Extraction task of BioNLP 2013. The method relies on syntactic dependency relations provided by a general NLP pipeline, supported by statistics derived from Maximum Entropy models for candidate trigger words, for potential arguments, and for argument frames.

  • 80.
    Schneider, Gerold
    et al.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    From surprisal to tagging and syntactic parsing: measuring the idiom and syntax principle. 2014. Conference paper (Refereed)
    Abstract [en]

    We introduce surprisal as an abstraction from lexical bundles to lexical bundleness. There are forces beyond lexical bundles: on the one hand, word-sequence abstractions to word classes; on the other hand, the syntax principle (SSP) in contradistinction to the idiom principle (SIP). We ultimately aim for a model of their mutual influence (Sinclair 1991). We motivate the use of models, then abstract to word-class models using a part-of-speech tagger, and to syntactic models using a large-scale parser. Part-of-speech taggers assign word classes based on sequences and typically achieve high accuracy. Areas of low accuracy and low tagger confidence in word-class assignment indicate low model fit, and thus often high entropy and a lack of formulaic sequences. Tagger model fit can be used as a measure of morphosyntactic bundleness. Although creative language (SSP) is rarer, it needs to be taken into account. We therefore also use a syntactic parser language model (Schneider 2008) that combines SSP, in the form of a hand-written competence grammar, with SIP, as probabilistic performance disambiguation, paying tribute to Hoey's (2005) insights on lexical priming. We show that parser model fit is lower on low-level L2 texts, as expected according to Pawley and Syder (1983). Finally, we introduce measures of syntactic surprisal.

  • 81.
    Schneider, Gerold
    et al.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Statistical sequence and parsing models for descriptive linguistics and psycholinguistics. 2016. In: New Approaches to English Linguistics: Building bridges / [ed] Olga Timofeeva, Anne-Christine Gardner, Alpo Honkapohja, Sarah Chevalier, John Benjamins Publishing Company, 2016, pp. 281-320. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    This study shows that using computational linguistic models is beneficial for descriptive linguistics and psycholinguistics. It applies two models to various English genres and to learner language: 1) surprisal and 2) a syntactic parser, allowing us to investigate the role of ambiguity and the interplay between the idiom and syntax principles. We find that surprisal and ambiguity are higher for learner language, while parser scores and model fit are lower. In addition, the random application of alternations leads to more ambiguous sentences. Failures to generate optimal orderings in the sense of relevance theory, such as those exhibited in the non-native-like utterances of language learners, increase processing load for both human and automatic processors. As human and automatic parsing difficulties correlate, we suggest syntactic parsers as psycholinguistic processing models.

  • 82.
    Schneider, Gerold
    et al.
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Using an automatic parser as a language learner model. 2013. Conference paper (Refereed)
  • 83.
    Sjons, Johan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Automatic Induction of Word Classes in Swedish Sign Language. 2013. Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    Identifying word classes is an important part of describing a language. Research on sign languages often lacks distinctions crucial for identifying word classes, e.g. the difference between sign and gesture. Additionally, sign languages typically lack a written form, which often constrains quantitative research on sign language to the use of glosses translated into the spoken language of the area. In this thesis, such glosses have been extracted from the Swedish Sign Language Corpus. The glosses were mapped to utterances based on the Swedish translations in the corpus, and these utterances served as input data to a word space model, producing a co-occurrence matrix. This matrix was clustered with the K-means algorithm. The extracted utterances were also clustered with the Brown algorithm. Using V-measure, the clusters were compared to a gold standard manually annotated with word classes. The Brown algorithm performs significantly better than a random baseline at inducing word classes. This work shows that unsupervised learning is a feasible approach to research on word classes in Swedish Sign Language. However, future studies of this kind should include a deeper linguistic analysis of the language when choosing the algorithms.
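    V-measure, used above to compare induced clusters with the gold word classes, is the harmonic mean of homogeneity (each cluster contains members of a single gold class) and completeness (all members of a gold class end up in the same cluster), both defined through conditional entropies. A from-scratch sketch follows; the thesis does not say which implementation was used, and the toy labels are hypothetical. scikit-learn's `v_measure_score` is a library alternative.

```python
import math
from collections import Counter


def _entropy(counts, n):
    """Shannon entropy (in nats) of a distribution given as raw counts."""
    return -sum((c / n) * math.log(c / n) for c in counts if c)


def v_measure(gold, predicted):
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    n = len(gold)
    classes = Counter(gold)        # gold word-class sizes
    clusters = Counter(predicted)  # induced cluster sizes
    joint = Counter(zip(gold, predicted))
    h_c = _entropy(classes.values(), n)
    h_k = _entropy(clusters.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the contingency counts.
    h_c_given_k = -sum((v / n) * math.log(v / clusters[k]) for (c, k), v in joint.items())
    h_k_given_c = -sum((v / n) * math.log(v / classes[c]) for (c, k), v in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


if __name__ == "__main__":
    gold = ["N", "N", "V", "V"]             # hypothetical gold word classes
    print(v_measure(gold, [1, 1, 0, 0]))    # same partition, different names
    print(v_measure(gold, [0, 0, 0, 0]))    # everything in one cluster
```

    Cluster names do not matter: a clustering that reproduces the gold partition under different labels scores 1.0, whereas putting every sign in a single cluster is complete but not homogeneous and scores 0.0.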

  • 84.
    Sjons, Johan
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Hörberg, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Articulation rate in child-directed speech increases as a function of child age. 2016. Conference paper (Other academic)
    Abstract [en]

    It has been shown that articulation rate (AR), the number of linguistic units produced per unit of time with pauses excluded, is lower in child-directed speech (CDS) than in adult-directed speech (ADS). The present study is the first corpus-based longitudinal study to investigate AR in Swedish CDS as a function of child age while also controlling for utterance length in number of syllables and for individual differences between speakers. AR in transcribed utterances of 7 parents directed at their respective children at different ages was analyzed with mixed-effects modeling. Results show a significantly higher AR in longer than in shorter utterances and a significant increase in AR as a function of infant age. Future studies will include a comparison with entropy-based measures.
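    The definition of articulation rate given above (units produced per unit of time, pauses excluded) amounts to a one-line computation. A minimal sketch, assuming syllables as the unit and pause durations already extracted from the transcription; the function name and the numbers are hypothetical, and the mixed-effects modelling of the study is not shown.

```python
def articulation_rate(n_syllables, utterance_duration, pauses):
    """Syllables per second of speaking time.

    utterance_duration and each entry of pauses are in seconds; pause time
    is subtracted so that only actual articulation is counted.
    """
    speaking_time = utterance_duration - sum(pauses)
    if speaking_time <= 0:
        raise ValueError("pauses must be shorter than the utterance")
    return n_syllables / speaking_time


if __name__ == "__main__":
    # Hypothetical utterance: 12 syllables, 3.5 s long, two pauses.
    print(articulation_rate(12, 3.5, [0.3, 0.2]))
```

    For 12 syllables in a 3.5 s utterance containing 0.5 s of pauses, the rate is 12 / 3.0 = 4.0 syllables per second; counting the pauses would instead give about 3.4.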

  • 85.
    Sjons, Johan
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Hörberg, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Bjerva, Johannes
    Articulation rate in Swedish child-directed speech increases as a function of the age of the child even when surprisal is controlled for. 2017. In: / [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, pp. 1794-1798. Conference paper (Refereed)
    Abstract [en]

    In earlier work, we showed that articulation rate in Swedish child-directed speech (CDS) increases as a function of the age of the child, even when utterance length and differences in articulation rate between subjects are controlled for. In this paper, we show at the utterance level in spontaneous Swedish speech that i) for the youngest children, articulation rate in CDS is lower than in adult-directed speech (ADS), ii) there is a significant negative correlation between articulation rate and surprisal (the negative log probability) in ADS, and iii) the increase in articulation rate in Swedish CDS as a function of the age of the child holds even when surprisal, along with utterance length and differences in articulation rate between speakers, is controlled for. These results indicate that adults adjust their articulation rate to fit the linguistic capacity of the child.
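    Surprisal, as defined above, is the negative log probability of an utterance under a language model. The abstract does not say which model was used, so the sketch assumes a simple add-one smoothed word-bigram model over whitespace-tokenised utterances with boundary markers; the functions and the toy Swedish utterances are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams over whitespace-tokenised utterances with boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for utterance in corpus:
        tokens = ["<s>"] + utterance.split() + ["</s>"]
        contexts.update(tokens[:-1])           # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])               # tokens that can be predicted
    return contexts, bigrams, vocab


def surprisal(utterance, model):
    """Surprisal in bits, -log2 P(utterance), under add-one smoothing."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + utterance.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
        total -= math.log2(p)
    return total


if __name__ == "__main__":
    model = train_bigram(["titta en hund", "titta en katt", "en hund"])
    for u in ["titta en hund", "hund en titta"]:
        print(u, surprisal(u, model))
```

    A familiar word order receives lower surprisal than a scrambled one; per-utterance values of this kind are what a regression on articulation rate would then control for.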

  • 86.
    Smith, Kelly
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    The Rumble in the Disambiguation Jungle: Towards the comparison of a traditional word sense disambiguation system with a novel paraphrasing system. 2011. Independent thesis, Basic level (degree of Bachelor), 15 credits / 22.5 HE credits. Student thesis
    Abstract [en]

    Word sense disambiguation (WSD) is the process of computationally identifying polysemous words in context and labeling them with their correct meaning, known as a sense. WSD faces various obstacles that must be overcome for it to reach its full potential. One of these is the representation of word meaning. Traditional WSD algorithms assume that a word in a given context has only one meaning and can therefore return only one discrete sense. A novel approach, by contrast, holds that a given word can have multiple senses. Studies on graded word sense assignment (Erk et al., 2009) as well as in cognitive science (Hampton, 2007; Murphy, 2002) support this view. It has therefore been adopted in a novel paraphrasing system, which performs word sense disambiguation by returning a probability distribution over potential paraphrases (in this case synonyms) of a given word. However, it is unknown how well this type of algorithm fares against the traditional one. The current study thus examines whether and how the two can be compared. A method of comparison is evaluated and subsequently rejected. The reasons for this, as well as suggestions for a fair and accurate comparison, are presented.

  • 87.
    Smolentzov, Andre
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Automated Essay Scoring: Scoring Essays in Swedish. 2013. Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    Good writing skills are essential in the education system at all levels. However, the evaluation of essays is labor intensive and can entail a subjective bias. Automated Essay Scoring (AES) is a tool that may be able to save teacher time and provide more objective evaluations. There are several successful AES systems for essays in English that are used in large scale tests. Supervised machine learning algorithms are the core component in developing these systems.

    In this project, four AES systems were developed and evaluated. The AES systems were based on standard supervised machine learning software: LDAC, SVMs with an RBF kernel and with a polynomial kernel, and Extremely Randomized Trees. The training data consisted of 1,500 high school essays that had been scored by the students' teachers and by blind raters. To evaluate the AES systems, the agreement between the blind raters' scores and the AES scores was compared to the agreement between the blind raters' and the teachers' scores. On average, the agreement between blind raters and the AES systems was better than that between blind raters and teachers. The AES system based on the LDAC software had the best agreement, with a quadratic weighted kappa of 0.475. In comparison, the teachers and blind raters had a value of 0.391. However, the AES results do not meet the required minimum agreement of a quadratic weighted kappa of 0.7 as defined by the US-based nonprofit organization Educational Testing Service.
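    Quadratic weighted kappa, the agreement measure reported above, penalises each disagreement by the squared distance between the two scores and corrects for agreement expected by chance from the raters' score distributions. A minimal sketch, assuming integer grades on a contiguous scale; the function name and example scale are hypothetical, and the thesis does not say which implementation it used. scikit-learn's `cohen_kappa_score(..., weights="quadratic")` is a library alternative.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa for two lists of integer scores in [min_score, max_score]."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    if any(not min_score <= s <= max_score for s in list(rater_a) + list(rater_b)):
        raise ValueError("score outside [min_score, max_score]")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("need at least two score levels")
    # Observed agreement matrix: rows are rater A's scores, columns rater B's.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    numerator = denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2       # squared-distance penalty
            expected = hist_a[i] * hist_b[j] / total   # chance agreement from marginals
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0:
        # Both raters used one identical score: kappa is undefined;
        # this sketch returns 1.0 by convention.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    teacher = [1, 1, 2, 2]     # hypothetical grades on a 1-2 scale
    blind_rater = [1, 2, 2, 2]
    print(quadratic_weighted_kappa(teacher, blind_rater, 1, 2))
```

    In the example, one disagreement against 2.0 expected weighted disagreements gives κ = 1 - 1/2 = 0.5; identical ratings give 1.0 and fully reversed ratings on a 1-3 scale give -1.0.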

  • 88.
    Strömbergsson, Sofia
    et al.
    Edlund, Jens
    Götze, Jana
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Approximating phonotactic input in children’s linguistic environments from orthographic transcripts. 2017. In: Proceedings of Interspeech 2017 / [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, pp. 2214-2217. Conference paper (Refereed)
    Abstract [en]

    Child-directed spoken data is the ideal source of support for claims about children’s linguistic environments. However, phonological transcriptions of child-directed speech are scarce compared to sources like adult-directed speech or text data. Acquiring reliable descriptions of children’s phonological environments from more readily accessible sources would mean considerable savings of time and money. The first step towards this goal is to quantify the reliability of descriptions derived from such secondary sources. We investigate how phonological distributions vary across modalities (spoken vs. written) and across the age of the intended audience (children vs. adults). Using a previously unseen collection of Swedish adult- and child-directed spoken and written data, we combine lexicon look-up and grapheme-to-phoneme conversion to approximate phonological characteristics. The analysis shows distributional differences across datasets, both for single phonemes and for longer phoneme sequences. Some of these are predictably attributable to lexical and contextual characteristics of text vs. speech. The generated phonological transcriptions are remarkably reliable. The differences in phonological distributions between child-directed speech and secondary sources highlight a need for compensatory measures when relying on written data or on adult-directed spoken data, and/or for continued collection of actual child-directed speech in research on children’s language environments.

  • 89.
    Strömbergsson, Sofia
    et al.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Götze, Jana
    Edlund, Jens
    Simulating Speech Errors in Swedish, Norwegian and English. 2018. Conference paper (Other academic)
  • 90.
    Tjong Kim Sang, Erik
    et al.
    Bollmann, Marcel
    Boschker, Remko
    Casacuberta, Francisco
    Dietz, Feike
    Dipper, Stefanie
    Domingo, Miguel
    van der Goot, Robe
    van Koppen, Marjo
    Ljubešić, Nikola
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Petran, Florian
    Pettersson, Eva
    Scherrer, Yves
    Schraagen, Marijn
    Sevens, Leen
    Tiedemann, Jörg
    Vanallemeersch, Tom
    Zervanou, Kalliopi
    The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation. 2017. In: Computational Linguistics in the Netherlands Journal, ISSN 2211-4009, Vol. 7, pp. 53-64. Article in journal (Refereed)
    Abstract [en]

    The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools applied to the text. We focus on improving part-of-speech tagging analysis of seventeenth-century Dutch. Eight teams took part in the shared task. The best results were obtained by teams employing character-based machine translation. The best system obtained an error reduction of 51% in comparison with the baseline of tagging unmodified text. This is close to the error reduction obtained by human translation (57%).
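    The 51% figure is presumably a relative error reduction with respect to the baseline. A worked form, assuming the usual definition (the abstract does not spell it out) and purely hypothetical error rates of 20% for the baseline and 9.8% for the system:

```latex
\mathrm{ER} = \frac{E_{\text{baseline}} - E_{\text{system}}}{E_{\text{baseline}}},
\qquad \text{e.g.}\quad \frac{0.20 - 0.098}{0.20} = \frac{0.102}{0.20} = 0.51
```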

  • 91.
    Utka, A.
    et al.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Kapočiūtė-Dzikienė, J.
    Vaičenonienė, J.
    Human Language Technologies – The Baltic Perspective: Proceedings of the Sixth International Conference Baltic HLT 2014. 2014. Conference proceedings (Refereed)
    Abstract [en]

    This book contains papers from the Fourth International Conference on Human Language Technologies - the Baltic Perspective (Baltic HLT 2010), held in Riga in October 2010. This conference is the latest in a series which provides a forum for sharing recent advances in human language processing, and promotes cooperation between the computer science and linguistics communities of the Baltic countries and the rest of the world. Bringing together scientists, developers, providers and users, the conference is an opportunity to exchange information, discuss problems, find new synergies and promote initiatives for international cooperation.

    The 32 papers collected were submitted by 77 authors from 11 countries and reviewed by an international program committee. They cover a wide range of research topics in corpus linguistics, machine translation, speech technologies, semantics and other areas of HLT research. These proceedings reflect the current state of HLT in the Baltic countries and the work towards creating a Baltic linguistic infrastructure. Human Language Technologies – The Baltic Perspective is a useful and comprehensive repository of information and will facilitate further research and development of HLT in the Baltic region, as well as the creation of a pan-European research infrastructure for language resources and technology.

  • 92.
    Volodina, Elena
    et al.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Pilán, Ildikó
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Borin, Lars
    Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016. 2016. Conference proceedings (Refereed)
  • 93.
    Volodina, Elena
    et al.
    Megyesi, Beata
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Granstedt, Lena
    Prentice, Julia
    Reichenberg, Monica
    Sundberg, Gunlög
    Stockholms universitet, Humanistiska fakulteten, Institutionen för svenska och flerspråkighet, Svenska/Nordiska språk.
    A Friend in Need? Research agenda for electronic Second Language infrastructure. 2016. Conference paper (Refereed)
    Abstract [en]

    In this article, we describe the research and societal needs, as well as ongoing efforts, to shape a Swedish as a Second Language (L2) infrastructure. Our aim is to develop an electronic research infrastructure that would stimulate empirical research into learners' language development by preparing data and developing language technology methods and algorithms that can successfully deal with deviations in learner language.

  • 94.
    Volodina, Elena
    et al.
    Pilán, Ildikó
    Borin, Lars
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Proceedings of the Joint 6th Workshop on NLP for Computer Assisted Language Learning and 2nd Workshop on NLP for Research on Language Acquisition. 2017. Conference proceedings (Refereed)
    Abstract [en]

    For the second year in a row, we brought together the two related themes of NLP for Computer-Assisted Language Learning and NLP for Language Acquisition. The goal of organizing joint workshops is to provide a meeting place for researchers working on language learning issues, including both empirical and experimental studies and NLP-based applications. The resulting volume covers a variety of topics from the two fields and, we hope, showcases the challenges and achievements in the field.

    The seven papers in this volume cover native language identification in learner writing; using the development of syntactic complexity in learner language to identify reading comprehension texts of an appropriate level; exploring the potential of parallel corpora to predict problem areas specific to a learner's mother tongue when learning another language; tools for learning languages, both well-resourced ones such as English and endangered or under-resourced ones such as Yakut and Võro; and the potential of automatically identifying and correcting word-level errors in Swedish learner writing.

  • 95.
    Wikse Barrow, Carla
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik. Karolinska Institutet, Sweden.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Strömbergsson, Sofia
    Subjective ratings of age-of-acquisition: exploring issues of validity and rater reliability. 2019. In: Journal of Child Language, ISSN 0305-0009, E-ISSN 1469-7602, Vol. 46, no. 2, pp. 199-213. Article in journal (Refereed)
    Abstract [en]

    This study aimed to investigate concerns about validity and reliability in subjective ratings of age-of-acquisition (AoA) by exploring characteristics of the individual rater. An additional aim was to validate the obtained AoA ratings against two corpora, one of child speech and one of adult speech, specifically exploring whether words over-represented in the child-speech corpus are rated with lower AoA than words characteristic of the adult-speech corpus. The results show that fewer than one-third of the participating informants’ ratings are valid and reliable. However, individuals with high familiarity with preschool-aged children provide more valid and reliable ratings than individuals who neither work with children nor have children of their own. The results further show a significant, age-adjacent difference in rated AoA for words from the two corpora, thus strengthening their validity. The study provides AoA data of high specificity for 100 child-specific and 100 adult-specific Swedish words.

  • 96.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Language and Computers, Markus Dickinson, Chris Brew, Detmar Meurers, Wiley-Blackwell, 2013. 2013. In: Computational linguistics - Association for Computational Linguistics (Print), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 39, no. 3, pp. 777-780. Article, book review (Other academic)
  • 97.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Review of "Web Corpus Construction" by Schäfer & Bildhauer. 2014. In: Nordic Journal of Linguistics, ISSN 0332-5865, E-ISSN 1502-4717, Vol. 37, no. 3, pp. 457-463. Article, book review (Other academic)
  • 98.
    Wirén, Mats
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Matsson, Arild
    Rosén, Dan
    Volodina, Elena
    SVALA: Annotation of Second-Language Learner Text Based on Mostly Automatic Alignment of Parallel Corpora. 2019. In: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018 / [ed] Inguna Skadina, Maria Eskevich, Linköping: Linköping University Electronic Press, 2019, pp. 222-234, article id 023. Conference paper (Refereed)
    Abstract [en]

    Annotation of second-language learner text is a cumbersome manual task which in turn requires interpretation to postulate the intended meaning of the learner’s language. This paper describes SVALA, a tool which separates the logical steps in this process while providing rich visual support for each of them. The first step is to pseudonymize the learner text to fulfil the legal and ethical requirements for a distributable learner corpus. The second step is to correct the text, which is carried out in the simplest possible way by text editing. During the editing, SVALA automatically maintains a parallel corpus with alignments between words in the learner source text and corrected text, while the annotator may repair inconsistent word alignments. Finally, the actual labelling of the corrections (the postulated errors) is performed. We describe the objectives, design and workflow of SVALA, and our plans for further development.

  • 99.
    Wirén, Mats
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Modelling the Informativeness of Non-Verbal Cues in Parent–Child Interaction. 2017. In: Proceedings of Interspeech 2017 / [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, pp. 2203-2207. Conference paper (Refereed)
    Abstract [en]

    Non-verbal cues from speakers, such as eye gaze and hand positions, play an important role in word learning. This is consistent with the notion that for meaning to be reconstructed, acoustic patterns need to be linked to time-synchronous patterns from at least one other modality. In previous studies of a multimodally annotated corpus of parent–child interaction, we have shown that parents interacting with infants at the early word-learning stage (7–9 months) display a large amount of time-synchronous patterns, but that this behaviour tails off with increasing age of the children. Furthermore, we have attempted to quantify the informativeness of the different nonverbal cues, that is, to what extent they actually help to discriminate between different possible referents, and how critical the timing of the cues is. The purpose of this paper is to generalise our earlier model by quantifying informativeness resulting from non-verbal cues occurring both before and after their associated verbal references.

  • 100.
    Wirén, Mats
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Cortes, Elisabet Eir
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Longitudinal Studies of Variation Sets in Child-directed Speech. 2016. In: The 54th Annual Meeting of the Association for Computational Linguistics: Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning, Stroudsburg, PA, USA: Association for Computational Linguistics, 2016, pp. 44-52. Conference paper (Refereed)
    Abstract [en]

    One of the characteristics of child-directed speech is its high degree of repetitiousness. Sequences of repetitious utterances with a constant intention, variation sets, have been shown to be correlated with children’s language acquisition. To obtain a baseline for the occurrences of variation sets in Swedish, we annotate 18 parent–child dyads using a generalised definition according to which the varying form may pertain not just to the wording but also to prosody and/or non-verbal cues. To facilitate further empirical investigation, we introduce a surface algorithm for automatic extraction of variation sets which is easily replicable and language-independent. We evaluate the algorithm on the Swedish gold standard, and use it for extracting variation sets in Croatian, English and Russian. We show that the proportion of variation sets in child-directed speech decreases consistently as a function of children's age across Swedish, Croatian, English and Russian.
