  • 1. Berggren, Max
    et al.
    Karlgren, Jussi
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Parkvall, Mikael
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Inferring the location of authors from words in their texts. 2015. In: Proceedings of the 20th Nordic Conference of Computational Linguistics: NODALIDA 2015 / [ed] Beáta Megyesi, Linköping: Linköping University Electronic Press, ACL Anthology, 2015, pp. 211-218. Conference paper (Peer-reviewed)
    Abstract [en]

    For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors' location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.

    We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.

  • 2. Bielinskiene, Agne
    et al.
    Boizou, Loic
    Grigonyte, Gintare
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Kovalevskaite, Jolanta
    Markievicz, Irena
    Rimkute, Erika
    Utka, Andrius
    Viliunas, Giedrius
    Švietimo ir mokslo terminų žodynas (Dictionary of Terms of Science and Education). 2013. Other (Other academic)
  • 3. Bielinskiene, Agne
    et al.
    Boizou, Loic
    Grigonyté, Gintaré
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Kovalevskaite, Jolanta
    Rimkute, Erika
    Utka, Andrius
    Lietuvių kalbos terminų automatinis atpažinimas ir apibrėžimas. 2015 (1st ed.). Book (Peer-reviewed)
    Abstract [en, translated from lt]

    The monograph presents the latest research on the automated identification and definition of Lithuanian terms. This research is based on the principles of descriptive terminology and corpus linguistics. The book describes how a specialized corpus of education and science was compiled, which methods were used to automatically identify candidate terms, how the terms of the analysed domain were selected from them, what structure is characteristic of them, and what problems were encountered when attempting to automatically determine the headword forms of terms. Considerable attention is devoted to discussing the methodology for semi-automatically extracting from the corpus subject-specific information about terms that could be used to compose definitions. The monograph presents the practical results of the whole study: Švietimo ir mokslo terminų žodynas (Dictionary of Terms of Science and Education) and Švietimo ir mokslo terminų ontologija (Ontology of Terms of Science and Education).

  • 4.
    Bjerva, Johannes
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Genetic Algorithms in the Brill Tagger: Moving towards language independence. 2013. Independent thesis Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis
    Abstract [en, translated from no]

    When Brill (1992) presented his simple rule-based part-of-speech tagger, rule-based systems for part-of-speech tagging became relevant again. The tagger is built on an algorithm that automatically learns transformation rules from a corpus. In addition to performing as well as modern stochastic methods for part-of-speech tagging, the Brill tagger has the advantage that the rules it learns can be presented in a format that is easy for humans to understand.

    Despite its strengths, the Brill tagger is relatively language-dependent, as it works much better for languages that resemble English than for languages with richer morphology. This thesis attempts to solve this problem by defining rule templates automatically with a search optimized by genetic algorithms. This lets the Brill GA tagger search a much larger space than it otherwise could for templates, which in turn generate rules adapted to the target language; it also has the advantage that researchers do not need to define rule templates manually.

    The Brill GA tagger performs significantly better (p<0.001) than the Brill tagger on all 9 target languages (Chinese, Japanese, Turkish, Slovene, Portuguese, English, Dutch, Swedish and Icelandic), with an error rate that is between 2% and 15% lower in all languages.

  • 5.
    Bjerva, Johannes
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Predicting the N400 Component in Manipulated and Unchanged Texts with a Semantic Probability Model. 2012. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en, translated from no]

    Within computational linguistics, earlier research has made progress in combining word space models and n-gram models. This is of particular interest when a model that captures both semantic and syntactic information is desired. A potential application for such a model is found in psycholinguistics, where a neural response called the N400 has been shown to occur in contexts of semantic incongruence. Earlier research has found a strong correlation between cloze probabilities and the N400, and recent research has found correlations between cloze probabilities and probability models from computational linguistics.

    This thesis aims to investigate whether a more direct link between such combined models and the N400 exists, with the hypothesis that low probabilities lead to large N400 responses and vice versa. A number of participants read a text manipulated with the help of such a model, and a natural text, in an EEG experiment. The analysis of the results shows that the manipulations to some extent gave results that support the hypothesis. Corresponding results were found in the analysis of the responses to the natural text. No significant correlations were found between the N400 and the combined model. Improvements for further research include, among other things, improving the experimental paradigm so that a large-scale EEG recording can be carried out in order to construct an EEG corpus.

  • 6.
    Bjerva, Johannes
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik. University of Groningen.
    Börstell, Carl
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Morphological complexity influences Verb–Object order in Swedish Sign Language. 2016. In: Proceedings of the 1st Workshop on Computational Linguistics for Linguistic Complexity (CL4LC) / [ed] Dominique Brunato, Felice Dell'Orletta, Giulia Venturi, Thomas François & Philippe Blache, Osaka: International Committee on Computational Linguistics (ICCL), 2016, pp. 137-141. Conference paper (Peer-reviewed)
    Abstract [en]

    Computational linguistic approaches to sign languages could benefit from investigating how complexity influences structure. We investigate whether morphological complexity has an effect on the order of Verb (V) and Object (O) in Swedish Sign Language (SSL), on the basis of elicited data from five Deaf signers. We find a significant difference in the distribution of the orderings OV vs. VO, based on an analysis of morphological weight. While morphologically heavy verbs exhibit a general preference for OV, humanness seems to affect the ordering in the opposite direction, with [+human] Objects pushing towards a preference for VO.

  • 7.
    Bjerva, Johannes
    et al.
    University of Groningen.
    Grigonyte, Gintare
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Plank, Barbara
    University of Groningen.
    Neural Networks and Spelling Features for Native Language Identification. 2017. In: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics, 2017, pp. 235-239. Conference paper (Peer-reviewed)
    Abstract [en]

    We present the RUG-SU team's submission at the Native Language Identification Shared Task 2017. We combine several approaches into an ensemble, based on spelling error features, a simple neural network using word representations, a deep residual network using word and character features, and a system based on a recurrent neural network. Our best system is an ensemble of neural networks, reaching an F1 score of 0.8323. Although our system is not the highest ranking one, we do outperform the baseline by far.

  • 8. Bjerva, Johannes
    et al.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations. 2017. In: Proceedings of the 21st Nordic Conference on Computational Linguistics / [ed] Jörg Tiedemann, Linköping: Linköping University Electronic Press, 2017, pp. 211-215, article ID 024. Conference paper (Peer-reviewed)
    Abstract [en]

    Assessing the semantic similarity between sentences in different languages is challenging. We approach this problem by leveraging multilingual distributional word representations, where similar words in different languages are close to each other. The availability of parallel data allows us to train such representations on a large number of languages. This allows us to leverage semantic similarity data for languages for which no such data exists. We train and evaluate on five language pairs, including English, Spanish, and Arabic. We are able to train well-performing systems for several language pairs, without any labelled data for that language pair.

  • 9.
    Byström, Emil
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Knowledge-based Coreference Resolution in Swedish. 2012. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    Automatic coreference resolution is the automatic identification of expressions with the same referents. State-of-the-art systems are data-driven and based on machine learning algorithms. Data-driven approaches to coreference resolution require large amounts of annotated data, which is time-consuming and expensive to obtain. Haghigi and Klein [1] present a knowledge-based approach where coreference is resolved with heuristics using rich syntactic and semantic features. Haghigi and Klein’s system is interesting because its performance is in line with data-driven systems and its requirements for annotated data are low. In the present study a knowledge-based system for coreference resolution in Swedish was implemented and its performance evaluated. The system is based on the system of Haghigi and Klein. To be able to evaluate and implement the algorithm, a database annotated with coreferential chains is needed. As there is no freely available resource with data annotated with coreference in Swedish, the annotation of the gold standard part of SUC 2.0 is also described. Results from the evaluation of the implementation show that the syntactic and semantic filters implemented did not improve baseline results. The filters falsely allow or constrain coreference as insufficient linguistic information is available. It is argued that focusing on rich syntactic and semantic features improves future work on knowledge-based coreference resolution in Swedish.

  • 10.
    Börstell, Carl
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Hörberg, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Distribution and duration of signs and parts of speech in Swedish Sign Language. 2016. In: Sign Language and Linguistics, ISSN 1387-9316, E-ISSN 1569-996X, Vol. 19, no. 2, pp. 143-196. Journal article (Peer-reviewed)
    Abstract [en]

    In this paper, we investigate frequency and duration of signs and parts of speech in Swedish Sign Language (SSL) using the SSL Corpus. The duration of signs is correlated with frequency, with high-frequency items having shorter duration than low-frequency items. Similarly, function words (e.g. pronouns) have shorter duration than content words (e.g. nouns). In compounds, forms annotated as reduced display shorter duration. Fingerspelling duration correlates with word length of corresponding Swedish words, and frequency and word length play a role in the lexicalization of fingerspellings. The sign distribution in the SSL Corpus shows a great deal of cross-linguistic similarity with other sign languages in terms of which signs appear as high-frequency items, and which categories of signs are distributed across text types (e.g. conversation vs. narrative). We find a correlation between an increase in age and longer mean sign duration, but see no significant difference in sign duration between genders.

  • 11.
    Börstell, Carl
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Mesch, Johanna
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Gärdenfors, Moa
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Towards an Annotation of Syntactic Structure in the Swedish Sign Language Corpus. 2016. In: Workshop Proceedings: 7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining / [ed] Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, Julie Hochgesang, Jette Kristoffersen, Johanna Mesch, Paris: ELRA, 2016, pp. 19-24. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper describes on-going work on extending the annotation of the Swedish Sign Language Corpus (SSLC) with a level of syntactic structure. The basic annotation of SSLC in ELAN consists of six tiers: four for sign glosses (two tiers for each signer; one for each of a signer’s hands), and two for written Swedish translations (one for each signer). In an additional step by Östling et al. (2015), all glosses of the corpus have been further annotated for parts of speech. Building on the previous steps, we are now developing annotation of clause structure for the corpus, based on meaning and form. We define a clause as a unit in which a predicate asserts something about one or more elements (the arguments). The predicate can be a (possibly serial) verbal or nominal. In addition to predicates and their arguments, criteria for delineating clauses include non-manual features such as body posture, head movement and eye gaze. The goal of this work is to arrive at two additional annotation tier types in the SSLC: one in which the sign language texts are segmented into clauses, and the other in which the individual signs are annotated for their argument types.

  • 12.
    Börstell, Carl
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Iconic Locations in Swedish Sign Language: Mapping Form to Meaning with Lexical Databases. 2017. In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa / [ed] Jörg Tiedemann, Linköping: Linköping University Electronic Press, 2017, pp. 221-225, article ID 026. Conference paper (Peer-reviewed)
    Abstract [en]

    In this paper, we describe a method for mapping the phonological feature location of Swedish Sign Language (SSL) signs to the meanings in the Swedish semantic dictionary SALDO. By doing so, we observe clear differences in the distribution of meanings associated with different locations on the body. The prominence of certain locations for specific meanings clearly points to iconic mappings between form and meaning in the lexicon of SSL, which pinpoints modality-specific properties of the visual modality.

  • 13. Cap, Fabienne
    et al.
    Adesam, Yvonne
    Ahrenberg, Lars
    Borin, Lars
    Bouma, Gerlof
    Forsberg, Markus
    Kann, Viggo
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Smith, Aaron
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nivre, Joakim
    SWORD: Towards Cutting-Edge Swedish Word Processing. 2016. In: Proceedings of SLTC 2016, 2016. Conference paper (Peer-reviewed)
    Abstract [en]

    Despite many years of research on Swedish language technology, there is still no well-documented standard for Swedish word processing covering the whole spectrum from low-level tokenization to morphological analysis and disambiguation. SWORD is a new initiative within the SWE-CLARIN consortium aiming to develop documented standards for Swedish word processing. In this paper, we report on a pilot study of Swedish tokenization, where we compare the output of six different tokenizers on four different text types. For one text type (Wikipedia articles), we also compare to the tokenization produced by six manual annotators.

  • 14.
    Cortes, Elisabet Eir
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Gerholm, Tove
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Marklund, Ellen
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Marklund, Ulrika
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Molnar, Monika
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Schwarz, Iris-Corinna
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Sjons, Johan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    WILD 2015: Book of Abstracts. 2015. Conference proceedings (Other academic)
    Abstract [en]

    WILD 2015 is the second Workshop on Infant Language Development, held June 10-12 2015 in Stockholm, Sweden. WILD 2015 was organized by Stockholm Babylab and the Department of Linguistics, Stockholm University. About 150 delegates met over three conference days, convening on infant speech perception, social factors of language acquisition, bilingual language development in infancy, early language comprehension and lexical development, neurodevelopmental aspects of language acquisition, methodological issues in infant language research, modeling infant language development, early speech production, and infant-directed speech. Keynote speakers were Alejandrina Cristia, Linda Polka, Ghislaine Dehaene-Lambertz, Angela D. Friederici and Paula Fikkert.

    Organizing this conference would of course not have been possible without our funding agencies Vetenskapsrådet and Riksbankens Jubiléumsfond. We would like to thank Francisco Lacerda, Head of the Department of Linguistics, and the Departmental Board for agreeing to host WILD this year. We would also like to thank the administrative staff for their help and support in this undertaking, especially Ann Lorentz-Baarman and Linda Habermann.

    The WILD 2015 Organizing Committee: Ellen Marklund, Iris-Corinna Schwarz, Elísabet Eir Cortes, Johan Sjons, Ulrika Marklund, Tove Gerholm, Kristina Nilsson Björkenstam and Monika Molnar.

  • 15.
    Drangert, Lisette
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Longitudinella förändringar av yttranden inom variationsmängder i barnriktat tal: En korpusstudie av yttrandetyper och verb. 2016. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en, translated from sv]

    Variation sets are a feature of child-directed speech characterized by successive utterances in which the adult speaker repeats and rephrases their message with a constant intention. The aim of the study was to investigate variation sets over time in speech directed to children aged 7-33 months. The goal was to study which types of utterances dominate the variation sets at different ages, and which utterance types tended to co-occur within the variation sets. In addition, intention and tense changes of verbs in these variation sets were examined. A script was written to categorize utterance types using data from a corpus of child-directed speech. The results were then examined quantitatively across four age groups.

    The complexity of the utterances within variation sets was found to increase with the age of the children. Furthermore, a difference was seen in the adult's intention as the children grew older, as well as a decrease in the use of interjections in combination with yes/no questions and complex clauses the older the children became. One proposed interpretation of the results was that adults tend to hold both sides of the conversation themselves when speaking with younger children, unlike when they speak with older, more verbal children who can contribute the answer themselves.

  • 16.
    Ek, Adam
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Grigonytė, Gintarė
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Gustafson Capková, Sofia
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Identifying Speakers and Addressees in Dialogues Extracted from Literary Fiction. 2018. In: 11th edition of the Language Resources and Evaluation Conference, European Language Resources Association, 2018. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper describes an approach to identifying speakers and addressees in dialogues extracted from literary fiction, along with a dataset annotated for speaker and addressee. The overall purpose of this is to provide annotation of dialogue interaction between characters in literary corpora in order to allow for enriched search facilities and construction of social networks from the corpora. To predict speakers and addressees in a dialogue, we use a sequence labeling approach applied to a given set of characters. We use features relating to the current dialogue, the preceding narrative, and the complete preceding context. The results indicate that even with a small amount of training data, it is possible to build a fairly accurate classifier for speaker and addressee identification across different authors, though the identification of addressees is the more difficult task.

  • 17. Eklund, Robert
    et al.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Effects of open and directed prompts on filled pauses and utterance production. 2010. In: Proceedings from Fonetik 2010, Lund, June 2–4, 2010 / [ed] Susanne Schötz and Gilbert Ambrazaitis, Lund: Mediatryck, 2010, pp. 23-28. Conference paper (Other academic)
    Abstract [en]

    This paper describes an experiment where open and directed prompts were alternated when collecting speech data for the deployment of a call-routing application. The experiment tested whether open and directed prompts resulted in any differences with respect to the filled pauses exhibited by the callers, which is interesting in the light of the “many-options” hypothesis of filled pause production. The experiment also investigated the effects of the prompts on utterance form and meaning of the callers.

  • 18.
    Eklås Tejman, Claudia
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Automatisk citatidentifiering för nyhetstext på svenska. 2015. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en, translated from sv]

    Swedish strategies for marking quotations differ from those of many other European languages. Since most systems for automatic quotation identification are developed for English, it was important to develop a system specifically adapted to Swedish text. A manually annotated gold standard consisting of 100 quotations from SUC 3.0 and 206 quotations from raw web news text was compiled in order to analyse the syntactic structure and marking patterns of the quotations. The marking patterns were then used to develop a rule-based system for quotation extraction. The system achieved an F-score of 0.79 for partial matches on the unedited news text containing the gold standard quotations. 13 of 19 marking patterns were fully or partially identified by the rules. However, the system could not determine whether quotations continued after the reporting clause, since paragraph breaks were not marked in the raw text data.

  • 19.
    Engdahl, Johan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Tremänning eller syssling: Automatisk sökning i bloggar efter ordisoglosser i Sverige. 2012. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en, translated from sv]

    Sometimes two dialects use different words for the same thing. The aim of this study is to show what can be automated in the search for word isoglosses. This is investigated by writing and evaluating a program that searches for word isoglosses in Sweden by analysing blog text. An isogloss is a geographical boundary between two different linguistic features, for example prosody or stress, or, as in this case, words. The program maps the writer's municipality to the words from the blog texts in a database. In addition, the program lets the user search either for how common a word is in Sweden's municipalities compared with the national average, or for which of two different words is more common within each municipality, according to a two-sided proportion test. The results of the searches were written to a file and then plotted manually. The evaluation shows that the program can find some word isoglosses between municipalities, and that the maps to some extent agree with the results reported by Parkvall (Parkvall, 2011; Parkvall, 2012). This indicates that the program is a good starting point for similar studies. A possible improvement to the program is to allow the user to use regular expressions to remove ambiguity.

  • 20.
    Grigonyte, Gintare
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Baldwin, Timothy
    University of Melbourne.
    Automatic Detection of Multilingual Dictionaries on the Web. 2014. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014, pp. 93-98. Conference paper (Peer-reviewed)
  • 21.
    Grigonyte, Gintare
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Clematide, Simon
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Rinaldi, Fabio
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    How preferred are preferred terms? 2013. In: eLex 2013 / [ed] Kosem, I., Kallas, J., Gantar, P., Krek, S., Langemets, M., Tuulik, M., 2013, pp. 452-459. Conference paper (Peer-reviewed)
  • 22.
    Grigonyte, Gintare
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Clematide, Simon
    University of Zurich.
    Volk, Martin
    University of Zurich.
    Utka, Andrius
    Vytautas Magnus University.
    Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools, NODALIDA 2015. 2015. Conference proceedings (Peer-reviewed)
    Abstract [en]

    Recent years have seen an increased interest in and availability of many different kinds of corpora. These range from small, but carefully annotated treebanks to large parallel corpora and very large monolingual corpora for big data research.

    It remains a challenge to offer flexible and powerful query tools for multilayer annotations of small corpora. When dealing with large corpora, query tools also need to scale in terms of processing speed and reporting through statistical information and visualization options. This becomes evident, for example, when dealing with very large corpora (such as complete Wikipedia corpora) or multi-parallel corpora (such as Europarl or JRC Acquis).

    The QueryVis workshop has gathered researchers who develop and evaluate new corpus query and visualization tools for linguistics, language technology and related disciplines. The papers focus on the design of query languages, and on various new visualization options for monolingual and parallel corpora, both for written and spoken language.

    We hope that QueryVis will stimulate discussions and trigger new ideas for the workshop participants and any reader of the proceedings. The preparation of the workshop and the reviewing of the submissions has already been an inspiring experience.

    All papers were peer-reviewed by three program committee members. We would like to thank all reviewers and contributors for their work and for sharing their thoughts and experiences with us.

    Let us all join our forces to make corpus exploration a rewarding, entertaining, and exciting experience which will grant us ever new insights into language and thought.

  • 23.
    Grigonyte, Gintare
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Hammarberg, Björn
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Pronunciation and Spelling: the Case of Misspellings in Swedish L2 Written Essays. 2014. In: Human Language Technologies - The Baltic Perspective, Baltic HLT 2014 / [ed] Andrius Utka, Gintarė Grigonytė, Jurgita Kapočiūtė-Dzikienė, Jurgita Vaičenonienė, Amsterdam: IOS Press, 2014, pp. 95-98. Conference paper (Refereed)
    Abstract [en]

    This research presents an investigation performed on the ASU corpus. We analyse to what extent the pronunciation of intended words is reflected in spelling errors made by L2 Swedish learners. We also propose a method that helps to automatically discriminate misspellings affected by pronunciation from other types of misspellings.

  • 24.
    Grigonyte, Gintare
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Kvist, Maria
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Velupillai, Sumithra
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Henriksson, Aron
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Swedification patterns of Latin and Greek affixes in clinical text. 2016. In: Nordic Journal of Linguistics, ISSN 0332-5865, E-ISSN 1502-4717, Vol. 39, no. 1, pp. 5-37. Article in journal (Refereed)
    Abstract [en]

    Swedish medical language is rich with Latin and Greek terminology which has undergone a Swedification since the 1980s. However, many original expressions are still used by clinical professionals. The goal of this study is to obtain precise quantitative measures of how the foreign terminology is manifested in Swedish clinical text. To this end, we explore the use of Latin and Greek affixes in Swedish medical texts in three genres: clinical text, scientific medical text and online medical information for laypersons. More specifically, we use frequency lists derived from tokenised Swedish medical corpora in the three domains, and extract word pairs belonging to types that display both the original and Swedified spellings. We describe six distinct patterns explaining the variation in the usage of Latin and Greek affixes in clinical text. The results show that to a large extent affixes in clinical text are Swedified and that prefixes are used more conservatively than suffixes.

  • 25.
    Grigonyte, Gintare
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Schneider, Gerold
    From lexical bundles to surprisal: Measuring the idiom principle. 2014. In: Lexical bundles in English non-fiction writing: forms and functions, 2014. Conference paper (Refereed)
    Abstract [en]

    Lexical bundles (LB) testify to Sinclair's idiom principle (SIP), and measure formulaicity, complexity and (non-)creativity (FCN). We exploit the information-theoretic measure of surprisal to analyze these.

    Frequency as a measure of LB has been criticized (McEnery et al., 2006:208–220); collocation measures were suggested instead, until Biber (2009:286–290) raised three criticisms.

    First, MI ranks rare collocations, which often include idioms, highest. We answer that idioms are also formulaic, and that there are collocation measures which have a bias towards frequent collocations.

    Second, MI doesn't respect word order. We thus use directed word transition probabilities like surprisal (Levy and Jaeger 2007): 3-gram surprisal =

    Third, formulaic sequences are often discontinuous. We thus sum over sequences, use 3-grams as atoms, and address syntactic surprisal.

    We argue that abstracting to surprisal as a measure of LB and FCN is appropriate, as it expresses reader expectations and text entropy. We use surprisal to analyse differences between:

    1. spoken and written learner language (L2);
    2. L2 across proficiency levels;
    3. L2 compared with L1

    We test the hypothesis of Pawley and Syder (1983) and Levy and Jaeger (2007) that native speakers play the tug-of-war between formulaicity and expressiveness best, thus minimizing comprehension difficulty, according to the uniform information density principle.
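
    As an illustration of the quantity discussed in this abstract, the sketch below computes trigram surprisal with the textbook definition, surprisal(w3 | w1, w2) = -log2 P(w3 | w1, w2), and sums it over a sequence. It is not the authors' implementation: the function names, the toy corpus and the add-alpha smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def train_trigram_model(tokens):
    """Count trigrams and their two-word contexts in a token list."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    contexts = Counter()
    for (w1, w2, _w3), n in trigrams.items():
        contexts[(w1, w2)] += n
    vocab_size = len(set(tokens))
    return trigrams, contexts, vocab_size


def trigram_surprisal(model, w1, w2, w3, alpha=1.0):
    """Surprisal in bits of w3 given the context (w1, w2).

    Add-alpha smoothing is an assumption of this sketch, chosen so that
    unseen trigrams still receive a finite value.
    """
    trigrams, contexts, vocab_size = model
    prob = (trigrams[(w1, w2, w3)] + alpha) / (
        contexts[(w1, w2)] + alpha * vocab_size
    )
    return -math.log2(prob)


def sequence_surprisal(model, tokens, alpha=1.0):
    """Sum of trigram surprisals over a token sequence."""
    return sum(
        trigram_surprisal(model, w1, w2, w3, alpha)
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:])
    )


# Hypothetical toy corpus: "c" follows "a b" more often than "d" does.
corpus = "a b c a b c a b d".split()
model = train_trigram_model(corpus)
frequent = trigram_surprisal(model, "a", "b", "c")
rare = trigram_surprisal(model, "a", "b", "d")
```

    In this toy setting the rarer continuation receives the higher surprisal, which is the sense in which formulaic (frequent) sequences count as less surprising than creative ones.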

  • 26.
    Grigonyte, Gintare
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Schneider, Gerold
    English Department, University of Zurich, Switzerland.
    Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production. 2017. In: The 18th Annual Conference of the International Speech Communication Association Interspeech 2017 / [ed] Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, article id 337. Conference paper (Refereed)
    Abstract [en]

    We use n-gram language models to investigate how far language approximates an optimal code for human communication in terms of Information Theory [1], and what differences there are between learner proficiency levels. Although the language of lower-level learners is simpler, it is less optimal in terms of information theory and, as a consequence, more difficult to process.

  • 27.
    Grigonyté, Gintaré
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Kvist, Maria
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Karolinska Institutet, Sweden.
    Velupillai, Sumithra
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results. 2014. In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), Stroudsburg, USA: Association for Computational Linguistics, 2014, pp. 74-83. Conference paper (Refereed)
    Abstract [en]

    This paper describes part of an ongoing effort to improve the readability of Swedish electronic health records (EHRs). An EHR contains systematic documentation of a single patient’s medical history across time, entered by healthcare professionals with the purpose of enabling safe and informed care. Linguistically, medical records exemplify a highly specialised domain, which can be superficially characterised as having telegraphic sentences involving displaced or missing words, abundant abbreviations, spelling variations including misspellings, and terminology. We report results on lexical simplification of Swedish EHRs, by which we mean detecting the unknown, out-of-dictionary words and trying to resolve them either as compounded known words, abbreviations or misspellings.

  • 28.
    Grigonyté, Gintaré
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Kvist, Maria
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap. Karolinska Institute, Sweden.
    Velupillai, Sumithra
    Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Spelling Variation of Latin and Greek words in Swedish Medical Text. 2014. Conference paper (Refereed)
  • 29.
    Grigonyté, Gintaré
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Language-independent exploration of repetition and variation in longitudinal child-directed speech: A tool and resources. 2016. In: Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016 / [ed] Elena Volodina, Gintarė Grigonytė, Ildikó Pilán, Kristina Nilsson Björkenstam, Lars Borin, Linköping: Linköping University Electronic Press, 2016, pp. 41-50. Conference paper (Refereed)
    Abstract [en]

    We present a language-independent tool, called Varseta, for extracting variation sets in child-directed speech. This tool is evaluated against a gold standard corpus annotated with variation sets, MINGLE-3-VS, and used to explore variation sets in 26 languages in CHILDES-26-VS, a comparable corpus derived from the CHILDES database. The tool and the resources are freely available for research.

  • 30.
    Hammarberg, Björn
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för allmän språkvetenskap.
    Grigonyté, Gintaré
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Non-Native Writers’ Errors – a Challenge to a Spell-Checker. 2014. In: 1st Nordic workshop on evaluation of spellchecking and proofing tools (NorWEST2014), 2014, p. 3. Conference paper (Refereed)
    Abstract [en]

    Spell checkers are widely used and, if they do their job properly, are also highly useful. Usually they are built on the assumption that the text to be corrected is written by a mature native speaker. However, non-native speakers are in even greater need of spell checkers than native speakers. Current spell checkers do not take the linguistic problems of learners into account, and thus they are poor at identifying errors and supplying adequate corrections. There are a number of linguistic complexities specific to non-native learners that a spell checker would need to handle in order to be successful.

  • 31.
    Hjelm, Hans
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Extraction of Cross Language Term Correspondences. 2006. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006), 2006. Conference paper (Refereed)
  • 32.
    Hjelm, Hans
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Identifying Cross Language Term Equivalents Using Statistical Machine Translation and Distributional Association Measures. 2007. In: Proceedings of Nodalida 2007, the 16th Nordic Conference of Computational Linguistics / [ed] Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit, 2007. Conference paper (Refereed)
    Abstract [en]

    This article presents a comparison of the accuracy of a number of different approaches for identifying cross language term equivalents (translations). The methods investigated are on the one hand associative measures, commonly used in word-space models or in Information Retrieval and on the other hand a Statistical Machine Translation (SMT) approach. I have performed tests on six language pairs, using the JRC-Acquis parallel corpus as training material and Eurovoc as a gold standard. The SMT approach is shown to be more effective than the associative measures. The best results are achieved by taking a weighted average of the scores of the SMT approach and disparate associative measures.

  • 33.
    Hjelm, Hans
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Buitelaar, Paul
    Multilingual Evidence Improves Clustering-based Taxonomy Extraction. 2008. In: Proceedings of the 18th European Conference on Artificial Intelligence (ECAI 2008), 2008. Conference paper (Refereed)
  • 34.
    Hjelm, Hans
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Schwarz, Christoph
    LiSa - Morphological Analysis for Information Retrieval. 2006. In: Proceedings of the 15th NODALIDA conference, Joensuu 2005, 2006. Conference paper (Refereed)
  • 35.
    Hultin, Felix
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Understanding Context-free Grammars through Data Visualization. 2016. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    Ever since the late 1950s, context-free grammars have played an important role within the field of linguistics, been a part of introductory courses and expanded into other fields of study. Meanwhile, data visualization in modern web development has made it possible to do feature-rich visualization in the browser. In this thesis, these two developments are united by developing a browser-based app to write context-free grammars, parse sentences and visualize the output. A user experience study with usability tests and user interviews is conducted in order to investigate the possible benefits and disadvantages of said visualization when writing context-free grammars. The results show that participants made limited use of the data visualization: it helped them to see whether sentences were parsed and, if a sentence was not parsed, at which position parsing went wrong. Future improvements to the software and studies of them are proposed, as well as the expansion of the field of data visualization within linguistics.

  • 36.
    Hägglöf, Hillevi
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Tengstrand, Lisa
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    A Random Indexing Approach to Unsupervised Selectional Preference Induction. 2011. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    A selectional preference is the relation between a head-word and plausible arguments of that head-word. Estimation of the association feature between these words is important to natural language processing applications such as Word Sense Disambiguation. This study presents a novel approach to selectional preference induction within a Random Indexing word space. This is a spatial representation of meaning where distributional patterns enable estimation of the similarity between words. Using only frequency statistics about words to estimate how strongly one word selects another, the aim of this study is to develop a flexible method that is not language dependent and does not require any annotated resources, which is in contrast to methods from previous research. In order to optimize the performance of the selectional preference model, experiments including parameter tuning and variation of corpus size were conducted. The selectional preference model was evaluated in a pseudo-word evaluation which lets the selectional preference model decide which of two arguments has a stronger correlation to a given verb. Results show that varying parameters and corpus size does not affect the performance of the selectional preference model in a notable way. The conclusion of the study is that the language model used does not provide the adequate tools to model selectional preferences. This might be due to a noisy representation of head-words and their arguments.
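
    The word space mentioned in this abstract can be illustrated with a minimal Random Indexing sketch: each word gets a sparse random index vector, a word's context vector is the sum of the index vectors of its neighbours, and similarity is measured with the cosine. This is a generic illustration, not the thesis implementation; the dimensionality, number of non-zero elements, window size and function names are assumptions chosen for the example.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzero, rng):
    """Sparse ternary vector: a few random positions set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def build_context_vectors(sentences, dim=300, nonzero=6, window=2, seed=0):
    """Accumulate, for each word, the index vectors of its neighbours."""
    rng = random.Random(seed)
    index = {}
    for sent in sentences:
        for word in sent:
            if word not in index:
                index[word] = index_vector(dim, nonzero, rng)
    context = defaultdict(lambda: [0] * dim)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                neighbour = index[sent[j]]
                target = context[word]
                for k in range(dim):
                    target[k] += neighbour[k]
    return dict(context)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: "water" and "juice" share the neighbour "drink".
sentences = [["drink", "water"], ["drink", "juice"], ["eat", "bread"]]
vectors = build_context_vectors(sentences, window=1)
same_context = cosine(vectors["water"], vectors["juice"])
different_context = cosine(vectors["water"], vectors["bread"])
```

    Words that occur with the same neighbours end up with similar context vectors, so `same_context` is higher than `different_context` here; how such similarities are turned into selectional preference scores is left open in this sketch.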

  • 37.
    Ibbotson, Paul
    et al.
    School of Childhood, Youth and Sport, Open University, Walton Hall, Bedfordshire, UK.
    Hartman, Rose M.
    Department of Psychology, University of Oregon, Oregon, USA.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Frequency filter: an open access tool for analysing language development. 2018. In: Language, Cognition and Neuroscience, ISSN 2327-3798, E-ISSN 2327-3801, Vol. 33, no. 6, pp. 1-15. Article in journal (Refereed)
    Abstract [en]

    We present an open-access analytic tool, which allows researchers to simultaneously control for and combine language data from the child, the caregiver, multiple languages, and across multiple time points to make inferences about the social and cognitive factors driving the shape of language development. We demonstrate how the tool works in three domains of language learning and across six languages. The results demonstrate the usefulness of this approach as well as providing deeper insight into three areas of language production and acquisition: egocentric language use, the learnability of nouns versus verbs, and imageability. We have made the Frequency Filter tool freely available as an R-package for other researchers to use at https://github.com/rosemm/FrequencyFilter.

  • 38.
    Kasaty, Anna
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Koponen, Eeva
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Klintfors, Eeva
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Swedish Nominal Morphophonology Implemented within the Two-level Model in PC-Kimmo. 1998. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    This paper presents a description of Swedish morphophonology and an attempt to create a Swedish pronunciation morpheme lexicon as a part of a text-to-speech system at Telia Research AB.

  • 39.
    Koponen, Eeva
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik. Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Klintfors, Eeva
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Effects of Target-Word Frequency Rate on Sound-Meaning-Connection in Five to Fifteen Month-Old Swedish Infants. 1999. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    The purpose of this study was to examine the effects of manipulating target-word frequency rate and target-word phrase position on sound-meaning-connection in five to fifteen month old Swedish infants. Three different test conditions, each one of them a film showing objects and corresponding phrases made of randomly generated artificial words, were designed. The structure of the first, high variability test condition included context-dependent information and the structures of the second and the third, low variability test conditions were characterised by frequent nonsense target-word rate, target-words occurring in phrase final position. The aim of the artificial input language was to ensure the novelty of test material, and to simulate the type of learning situation - when the semantic content of words is arbitrary - facing young infants in the beginning of language learning. Analyses of informants' looking behaviour, prior to and after exposure to the objects and the corresponding audio input, were performed. Results showed that the structure of high variability test condition and the structure of low variability test conditions were associated with significant between-group differences. This finding indicates that the nonsense phrases in low variability test conditions managed to 'explain' the objects just like semantically meaningful phrases do. When compared with past research, these findings seem to suggest that experience-dependent mechanisms may support, besides word segmentation, even more complicated aspects of language learning, such as acquisition of syntax.

  • 40.
    Lindström, Mathias
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Automatic Segmentation of Swedish Medical Words with Greek and Latin Morphemes: A Computational Morphological Analysis. 2015. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    Raw text data online has increased the need for designing artificial systems capable of processing raw data efficiently and at a low cost in the field of natural language processing (NLP). A well-developed morphological analysis is an important cornerstone of NLP, in particular when word look-up is an important stage of processing. Morphological analysis has many advantages, including reducing the number of word forms to be stored computationally, as well as being cost-efficient and time-efficient. NLP is relevant in the field of medicine, especially in automatic text analysis, which is a relatively young field in Swedish medical texts. Much of the stored information is highly unstructured and disorganized.

    Using raw corpora, this paper aims to contribute to automatic morphological segmentation by experimenting with state-of-the-art tools for unsupervised and semi-supervised word segmentation of Swedish words in medical texts. The results show that a reasonable segmentation depends more on a high number of word types than on a special type of corpus. The results also show that semi-supervised word segmentation in the form of annotated training data greatly increases the performance.

  • 41.
    Ljunglöf, Peter
    et al.
    Göteborgs universitet.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Syntactic parsing. 2010. In: Handbook of Natural Language Processing / [ed] Nitin Indurkhya & Fred J. Damerau, Boca Raton, Florida: Chapman & Hall/CRC, 2010, 2, pp. 59-91. Chapter in book (Other (popular science, discussion, etc.))
    Abstract [en]

    This chapter presents basic techniques for grammar-driven natural language parsing, that is, analyzing a string of words (typically a sentence) to determine its structural description according to a formal grammar. In most circumstances, this is not a goal in itself but rather an intermediary step for the purpose of further processing, such as the assignment of a meaning to the sentence. To this end, the desired output of grammar-driven parsing is typically a hierarchical, syntactic structure suitable for semantic interpretation (the topic of Chapter 5). The string of words constituting the input will usually have been processed in separate phases of tokenization (Chapter 2) and lexical analysis (Chapter 3), which is hence not part of parsing proper.

  • 42.
    Loftsson, Hrafn
    et al.
    Reykjaviks universitet, Island.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic. 2013. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), Linköping University Electronic Press, Linköpings universitet, 2013, pp. 105-119. Conference paper (Refereed)
    Abstract [en]

    In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Perceptron tagger, to tag Icelandic, a morphologically complex language. By adding language-specific linguistic features and using IceMorphy, an unknown word guesser, we obtain state-of-the-art tagging accuracy of 92.82%. Furthermore, by adding data from a morphological database, and word embeddings induced from an unannotated corpus, the accuracy increases to 93.84%. This is equivalent to an error reduction of 5.5%, compared to the previously best tagger for Icelandic, consisting of linguistic rules and a Hidden Markov Model.

  • 43.
    Marklund, Ellen
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Cortes, Elísabet Eir
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för fonetik.
    Sjons, Johan
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    MMN responses in adults after exposure to bimodal and unimodal frequency distributions of rotated speech. 2017. In: Proceedings of Interspeech 2017, The International Speech Communication Association (ISCA), 2017, pp. 1804-1808. Conference paper (Refereed)
    Abstract [en]

    The aim of the present study is to further the understanding of the relationship between perceptual categorization and exposure to different frequency distributions of sounds. Previous studies have shown that speech sound discrimination proficiency is influenced by exposure to different distributions of speech sound continua varying along one or several acoustic dimensions, both in adults and in infants. In the current study, adults were presented with either a bimodal or a unimodal frequency distribution of spectrally rotated sounds along a continuum (a vowel continuum before rotation). Categorization of the sounds, quantified as amplitude of the event-related potential (ERP) component mismatch negativity (MMN) in response to two of the sounds, was measured before and after exposure. It was expected that the bimodal group would have a larger MMN amplitude after exposure whereas the unimodal group would have a smaller MMN amplitude after exposure. Contrary to expectations, the MMN amplitude was smaller overall after exposure, and no difference was found between groups. This suggests that either the previously reported sensitivity to frequency distributions of speech sounds is not present for non-speech sounds, or the MMN amplitude is not a sensitive enough measure of categorization to detect an influence from passive exposure, or both.

  • 44.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    SUC-CORE: A Balanced Corpus Annotated with Noun Phrase Coreference. 2013. In: Northern European Journal of Language Technology (NEJLT), ISSN 2000-1533, Vol. 3, no. 2, pp. 19-39. Article in journal (Refereed)
    Abstract [en]

    This paper describes SUC-CORE, a subset of the Stockholm Umeå Corpus and the Swedish Treebank annotated with noun phrase coreference. While most coreference annotated corpora consist of texts of similar types within related domains, SUC-CORE consists of both informative and imaginative prose and covers a wide range of literary genres and domains. This allows for exploration of coreference across different text types, but it also means that there are limited amounts of data within each type. Future work on coreference resolution for Swedish should include making more annotated data available for the research community.

  • 45.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    The MINGLE annotation scheme: Multimodal annotation of parent-child interaction in a free play setting (version 1.0). 2012. Report (Other academic)
    Abstract [en]

    A cognitive model of language learning must be dialogue-driven and multimodal to reflect how parent and child interact, using words, eye gaze, and object manipulation. We present a scheme for multimodal annotation of parent-child interaction. The purpose is to add verbal and non-verbal annotation to a corpus of longitudinal video and sound recordings of parent-child dyads. In this guideline, we describe the transcription of adult and child speech and vocalizations, and the annotation of both empty-hand gestures and object-related actions by both adults and children.

  • 46.
    Nilsson Björkenstam, Kristina
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    What is a corpus and why are corpora important tools? 2013. Conference paper (Other academic)
  • 47.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Björkstrand, Thomas
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Grigonyté, Gintaré
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Gustafson-Capková, Sofia
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Mesch, Johanna
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Östling, Robert
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Schönström, Krister
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för svenska som andraspråk för döva.
    Wallin, Lars
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för teckenspråk.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    SWE-CLARIN partner presentation: Natural Language Processing Resources from the Department of Linguistics, Stockholm University. 2014. In: The first Swedish national SWE-CLARIN workshop: LT-based e-HSS in Sweden – taking stock and looking ahead / [ed] Lars Borin, 2014. Conference paper (Other academic)
    Abstract [en]

    The aim of the CLARIN Research Infrastructure and SWE-CLARIN is to make it easier for scholars in the humanities and social sciences to access primary data in the form of natural language, and to provide tools for exploring, annotating and analysing these data. This paper gives an overview of the resources and tools developed at the Department of Linguistics at Stockholm University that are planned to be made available within the SWE-CLARIN project. The paper also outlines our collaborations with neighbouring areas in the humanities and social sciences where these resources and tools will be put to use.

  • 48.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Byström, Emil
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    SUC-CORE: SUC 2.0 Annotated with NP Coreference. 2012. In: Proceedings of the Fourth Swedish Language Technology Conference (SLTC), October 24-26, 2012, Lund / [ed] Pierre Nugues, 2012. Conference paper (Refereed)
    Abstract [en]

    SUC-CORE is a subset of Stockholm Umeå Corpus 2.0 and Swedish Treebank, annotated with noun phrase coreference. While most coreference annotated corpora consist of texts of similar types within related domains, SUC-CORE consists of both informative and imaginative prose and covers a wide range of literary genres and domains.

  • 49.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Grigonyté, Gintaré
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Trump säger det igen, igen och igen. 2017. In: Språktidningen, ISSN 1654-5028, no. 2, pp. 24-27. Journal article (Other (popular science, debate, etc.))
  • 50.
    Nilsson Björkenstam, Kristina
    et al.
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Gustafson Capková, Sofia
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    Wirén, Mats
    Stockholms universitet, Humanistiska fakulteten, Institutionen för lingvistik, Avdelningen för datorlingvistik.
    The Stockholm University Strindberg Corpus: Content and Possibilities. 2014. In: Strindberg on International Stages/Strindberg in Translation / [ed] Roland Lysell, Cambridge: Cambridge Scholars Publishing, 2014. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    We have approached the works of August Strindberg from a computational linguistic point of view, resulting in The Stockholm University Strindberg Corpus, consisting of seven of Strindberg’s autobiographical works with linguistic annotation. The corpus is freely available for research. We use this corpus for three quantitative studies of Strindberg’s work: in the first, we describe the novels included in the corpus by keywords; in the second, we compare Strindberg’s use of emotionally charged words with selected prose of both his contemporaries and present-day authors; in the third, we explore the semantic prosody of KVINNA (“woman”) and MAN (“man”).
