  • 1.
    Beskow, Jonas
    et al.
    KTH Speech, Music and Hearing.
    Carlson, Rolf
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Granström, Björn
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Skantze, Gabriel
    KTH Speech, Music and Hearing.
    Multimodal Interaction Control (2009). In: Computers in the Human Interaction Loop / [ed] Waibel, Alex and Stiefelhagen, Rainer, Berlin/Heidelberg: Springer Berlin/Heidelberg, 2009, p. 143-158. Chapter in book (Refereed)
  • 2.
    Beskow, Jonas
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    House, David
    KTH Speech, Music and Hearing.
    Research focus: Interactional aspects of spoken face-to-face communication (2010). In: Proceedings from Fonetik 2010, Lund: Lund University, 2010, p. 7-10. Conference paper (Other academic)
  • 3.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Towards human-like spoken dialogue systems (2008). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 50, no 8-9, p. 630-645. Article in journal (Refereed)
    Abstract [en]

    This paper presents an overview of methods that can be used to collect and analyse data on user responses to spoken dialogue system components intended to increase human-likeness, and to evaluate how well the components succeed in reaching that goal. Wizard-of-Oz variations, human-human data manipulation, and micro-domains are discussed in this context, as is the use of third-party reviewers to get a measure of the degree of human-likeness. We also present the two-way mimicry target, a model for measuring how well a human-computer dialogue mimics or replicates some aspect of human-human dialogue, including human flaws and inconsistencies. Although we have added a measure of innovation, none of the techniques is new in its entirety. Taken together and described from a human-likeness perspective, however, they form a set of tools that may widen the path towards human-like spoken dialogue systems.

  • 4.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Exploring prosody in interaction control (2005). In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 62, no 2-4, p. 215-226. Article in journal (Refereed)
    Abstract [en]

    This paper investigates prosodic aspects of turn-taking in conversation with a view to improving the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor. It examines the relationship between interaction control, the communicative function of which is to regulate the flow of information between interlocutors, and its phonetic manifestation. Specifically, the listener's perception of such interaction control phenomena is modelled. Algorithms for automatic online extraction of prosodic phenomena liable to be relevant for interaction control, such as silent pauses and intonation patterns, are presented and evaluated in experiments using Swedish map task data. We show that the automatically extracted prosodic features can be used to avoid many of the places where current dialogue systems run the risk of interrupting their users, as well as to identify suitable places to take the turn.
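
    A minimal sketch of one of the prosodic cues the abstract mentions: online detection of silent pauses from frame energy. The frame size, the silence threshold and the minimum pause duration below are illustrative assumptions, not the parameters used in the paper.

```python
import numpy as np

def silent_pauses(samples, sr, frame_ms=10, thresh_db=-40.0, min_pause_ms=200):
    """Return (start, end) times in seconds of silences lasting >= min_pause_ms."""
    n = int(sr * frame_ms / 1000)
    frames = samples[: len(samples) // n * n].reshape(-1, n)
    rms = np.sqrt((frames.astype(float) ** 2).mean(axis=1)) + 1e-12
    peak = np.abs(samples).max() + 1e-12
    silent = 20 * np.log10(rms / peak) < thresh_db   # frame-level silence flags
    pauses, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i                                # a silent run begins
        elif not s and start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                pauses.append((start * n / sr, i * n / sr))
            start = None                             # the silent run ends
    if start is not None and (len(silent) - start) * frame_ms >= min_pause_ms:
        pauses.append((start * n / sr, len(silent) * n / sr))
    return pauses

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr * 2) / sr
    sig = np.sin(2 * np.pi * 220 * t)
    sig[sr // 2 : sr] = 0.0                          # insert a 0.5 s silence
    print(silent_pauses(sig, sr))                    # ~[(0.5, 1.0)]
```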

  • 5.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    /nailon/ – software for online analysis of prosody (2006). In: Proceedings Interspeech 2006, Pittsburgh, PA, USA: ISCA, 2006, p. 2022-2025. Conference paper (Refereed)
  • 6.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Underpinning /nailon/: automatic estimation of pitch range and speaker relative pitch (2007). In: Speaker Classification II / [ed] Müller, Christian, Berlin/Heidelberg, Germany: Springer Berlin/Heidelberg, 2007, p. 229-242. Chapter in book (Refereed)
  • 7.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Al Moubayed, Samer
    KTH Speech, Music and Hearing.
    Gravano, Agustín
    Hirschberg, Julia
    Columbia University Computer Science.
    Very short utterances in conversation (2010). In: Proceedings from Fonetik 2010, Lund: Lund University, 2010, p. 11-16. Conference paper (Other academic)
  • 8.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Utterance segmentation and turn-taking in spoken dialogue systems (2005). In: Sprachtechnologie, mobile kommunikation und linguistische ressourcen, Frankfurt am Main: Peter Lang Publishing Group, 2005, p. 576-587. Chapter in book (Refereed)
  • 9.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hirschberg, Julia
    Columbia University Computer Science.
    Pause and gap length in face-to-face interaction (2009). In: Proceedings of Interspeech 2009, Brighton, UK: ISCA, 2009, p. 2779-2782. Conference paper (Refereed)
  • 10.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Pelcé, Antoine
    KTH Speech, Music and Hearing.
    Prosodic features of very short utterances in dialogue (2009). In: Nordic Prosody: Proceedings of the Xth Conference, Helsinki 2008, Frankfurt am Main: Peter Lang Publishing Group, 2009, p. 57-68. Conference paper (Refereed)
  • 11.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Włodarczak, Marcin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Catching wind of multiparty conversation (2014). In: Proceedings of Multimodal Corpora: Combining applied and basic research targets (MMC 2014) / [ed] Jens Edlund, Dirk Heylen, Patrizia Paggio, Reykjavik, Iceland: European Language Resources Association, 2014, p. 35-36. Chapter in book (Other academic)
    Abstract [en]

    The paper describes the design of a novel multimodal corpus of spontaneous multiparty conversations in Swedish. The corpus is collected with the primary goal of investigating the role of breathing and its perceptual cues for interactive control of interaction. Physiological correlates of breathing are captured by means of respiratory belts, which measure changes in the cross-sectional area of the rib cage and the abdomen. Additionally, auditory and visual correlates of breathing are recorded in parallel to the actual conversations. The corpus allows studying respiratory mechanisms underlying the organisation of spontaneous conversation, especially in connection with turn management. As such, it is a valuable resource both for fundamental research and for speech technology applications.

  • 12.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Włodarczak, Marcin
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Is breathing prosody? (2014). In: International Symposium on Prosody to Commemorate Gösta Bruce, Lund: Lund University, 2014. Conference paper (Other academic)
    Abstract [en]

    Even though we may not be aware of it, much breathing in face-to-face conversation is both clearly audible and visible. Consequently, it has been suggested that respiratory activity is used in the joint coordination of conversational flow. For instance, it has been claimed that inhalation is an interactionally salient cue to speech initiation, that exhalation is a turn-yielding device, and that breath holding is a marker of turn incompleteness (e.g. Local & Kelly, 1986; Schegloff, 1996). So far, however, few studies have addressed the interactional aspects of breathing (one notable exception is McFarland, 2001). In this poster, we will describe our ongoing efforts to fill this gap. We will present the design of a novel corpus of respiratory activity in spontaneous multiparty face-to-face conversations in Swedish. The corpus will contain physiological measurements relevant to breathing, high-quality audio, and video. Minimally, the corpus will be annotated with interactional events derived from voice activity detection and (semi-)automatically detected inhalation and exhalation events in the respiratory data. We will also present initial analyses of the material collected. The question is whether breathing is prosody, and thus relevant to this symposium. What we do know is that the turn-taking phenomena of particular interest to us are closely (almost by definition) related to several prosodic phenomena, in particular those associated with prosodic phrasing, grouping and boundaries. Thus, we will learn more about respiratory activity in phrasing (and the like) through analyses of breathing in conversation.

    References

    Local, John K., & Kelly, John. (1986). Projection and 'silences': Notes on phonetic and conversational structure. Human Studies, 9, 185-204.

    McFarland, David H. (2001). Respiratory markers of conversational interaction. Journal of Speech, Language, and Hearing Research, 44, 128-143.

    Schegloff, E. A. (1996). Turn organization: One intersection of grammar and interaction. In E. Ochs, E. A. Schegloff & S. A. Thompson (Eds.), Interaction and Grammar (pp. 52-133). Cambridge: Cambridge University Press.

  • 13.
    Gustafson, Joakim
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Potential benefits of human-like dialogue behaviour in the call routing domain (2008). In: Perception in Multimodal Dialogue Systems, Berlin/Heidelberg, Germany: Springer Berlin/Heidelberg, 2008, p. 240-251. Chapter in book (Refereed)
    Abstract [en]

    This paper presents a Wizard-of-Oz (Woz) experiment in the call routing domain that took place during the development of a call routing system for the TeliaSonera residential customer care in Sweden. A corpus of 42,000 calls was used as a basis for identifying problematic dialogues and the strategies used by operators to overcome the problems. A new Woz recording was made, implementing some of these strategies. The collected data is described and discussed with a view to explore the possible benefits of more human-like dialogue behaviour in call routing applications.

  • 14.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Pauses, gaps and overlaps in conversations (2010). In: Journal of Phonetics, ISSN 0095-4470, E-ISSN 1095-8576, Vol. 38, no 4, p. 555-568. Article in journal (Refereed)
    Abstract [en]

    This paper explores durational aspects of pauses, gaps and overlaps in three different conversational corpora with a view to challenge claims about precision timing in turn-taking. Distributions of pause, gap and overlap durations in conversations are presented, and methodological issues regarding the statistical treatment of such distributions are discussed. The results are related to published minimal response times for spoken utterances and thresholds for detection of acoustic silences in speech. It is shown that turn-taking is generally less precise than is often claimed by researchers in the field of conversation analysis or interactional linguistics. These results are discussed in the light of their implications for models of timing in turn-taking and for interaction control models in speech technology. In particular, it is argued that the proportion of speaker changes that could potentially be triggered by information immediately preceding the speaker change is large enough for reactive interaction control models to be viable in speech technology.
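
    To make the pause/gap/overlap taxonomy above concrete, here is a minimal sketch that classifies the intervals between talkspurts in a two-party conversation: a silence bounded by the same speaker is a pause, a silence at a speaker change is a gap, and a negative silence at a speaker change is an overlap. The data structures and the simplification of pairing consecutive talkspurts are illustrative assumptions; the paper's own measurements are based on full voice activity annotations.

```python
from dataclasses import dataclass

@dataclass
class Talkspurt:
    speaker: str   # e.g. "A" or "B"
    start: float   # seconds
    end: float     # seconds

def classify_transitions(talkspurts):
    """Yield (kind, duration) for each between-talkspurt interval."""
    spurts = sorted(talkspurts, key=lambda t: t.start)
    for prev, curr in zip(spurts, spurts[1:]):
        silence = curr.start - prev.end
        if prev.speaker == curr.speaker:
            if silence > 0:
                yield ("pause", silence)        # within-speaker silence
        else:
            if silence > 0:
                yield ("gap", silence)          # silence at a speaker change
            else:
                yield ("overlap", -silence)     # speaker change in overlap

if __name__ == "__main__":
    demo = [Talkspurt("A", 0.0, 1.2), Talkspurt("B", 1.4, 2.0),
            Talkspurt("B", 2.3, 3.1), Talkspurt("A", 3.0, 4.0)]
    for kind, dur in classify_transitions(demo):
        print(f"{kind}: {dur:.2f} s")           # gap 0.20, pause 0.30, overlap 0.10
```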

  • 15.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    What turns speech into conversation?: A project description (2007). In: Quarterly progress and status report: proceedings from Fonetik 2007, May 30-June 1, 2007, Stockholm: Department of Speech, Music and Hearing, KTH, 2007, p. 45-48. Conference paper (Other academic)
  • 16.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Carlson, Rolf
    KTH Speech, Music and Hearing.
    Interruption impossible (2006). In: Nordic Prosody: Proceedings of the IXth Conference, Lund 2004, Frankfurt am Main: Peter Lang Publishing Group, 2006, p. 97-105. Conference paper (Refereed)
  • 17.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Hirschberg, Julia
    Columbia University Computer Science.
    Pitch similarity in the vicinity of backchannels (2010). In: Proceedings Interspeech 2010, Makuhari, Japan: ISCA, 2010, p. 3054-3057. Conference paper (Refereed)
  • 18.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Laskowski, Kornel
    KTH Speech, Music and Hearing.
    Very short utterances and timing in turn-taking (2011). In: Proceedings Interspeech 2011, Florence, Italy: ISCA, 2011, p. 2837-2840. Conference paper (Refereed)
  • 19.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    A single-port non-parametric model of turn-taking in multi-party conversation (2011). In: Proceedings ICASSP 2011, Prague, Czech Republic: ICASSP, 2011, p. 5600-5603. Conference paper (Refereed)
  • 20.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Machine learning of prosodic sequences using the fundamental frequency variation spectrum (2008). In: Proceedings of the Speech Prosody 2008 Conference, Campinas, Brazil: Editora RG/CNPq, 2008, p. 151-154. Conference paper (Refereed)
  • 21.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    The fundamental frequency variation spectrum (2008). In: Proceedings FONETIK 2008, Gothenburg: Göteborg University, 2008, p. 29-32. Conference paper (Other academic)
  • 22.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    A general-purpose 32 ms prosodic vector for hidden Markov modeling (2009). In: Proceedings Interspeech 2009, Brighton, UK: ISCA, 2009, p. 724-727. Conference paper (Refereed)
  • 23.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Exploring the prosody of floor mechanisms in English using the fundamental frequency variation spectrum (2009). In: Proceedings of EUSIPCO 2009, Glasgow, Scotland: ISCA, 2009, p. 2539-2543. Conference paper (Refereed)
    Abstract [en]

    A basic requirement for participation in conversation is the ability to jointly manage interaction, and to recognize the attempts of interlocutors to do the same. Examples of management activity include efforts to acquire, re-acquire, hold, release, and acknowledge floor ownership, and they are often implemented using dedicated dialog act types. In this work, we explore the prosody of one class of such dialog acts, known as floor mechanisms, using a methodology based on a recently proposed representation of fundamental frequency variation. Models over the representation illustrate significant differences between floor mechanisms and other dialog act types, lead to automatic detection accuracies in equal-prior test data of up to 75%, and offer a first quantitative description of floor mechanism prosody. We note that this work is also the first attempt to compute and model FFV spectra for multiparty rather than two-party conversation, as well as the first attempt to infer dialogue structure from non-anechoic-chamber recordings.

  • 24.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Wölfel, Matthias
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Computing the fundamental frequency variation spectrum in conversational spoken dialogue systems (2008). In: Proceedings of the 155th Meeting of the Acoustical Society of America, 5th EAA Forum Acusticum, and 9th SFA Congrès Français d'Acoustique (Acoustics2008), Paris, France: ASA, 2008, p. 3305-3310. Conference paper (Refereed)
    Abstract [en]

    Continuous modeling of intonation in natural speech has long been hampered by a focus on modeling pitch, of which several normative aspects are particularly problematic. The latter include, among others, the fact that pitch is undefined in unvoiced segments, that its absolute magnitude is speaker-specific, and that its robust estimation and modeling, at a particular point in time, rely on a patchwork of long-time stability heuristics. In the present work, we continue our analysis of the fundamental frequency variation (FFV) spectrum, a recently proposed instantaneous, continuous, vector-valued representation of pitch variation, which is obtained by comparing the harmonic structure of the frequency magnitude spectra of the left and right half of an analysis frame. We analyze the sensitivity of a task-specific error rate in a conversational spoken dialogue system to the specific definition of the left and right halves of a frame, resulting in operational recommendations regarding the framing policy and window shape.
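
    A heavily simplified sketch of the core FFV idea described above: the magnitude spectrum of the left half of an analysis frame is compared with frequency-dilated versions of the right half's spectrum, yielding a similarity curve over dilation factors. The true FFV computation uses specific asymmetric windows and a dedicated filterbank; the window shape, normalisation and dilation range below are illustrative assumptions.

```python
import numpy as np

def ffv_like(frame, n_dilations=33, max_octaves=0.25):
    """Similarity of left/right half-frame spectra as a function of dilation."""
    half = len(frame) // 2
    win = np.hanning(half)
    fl = np.abs(np.fft.rfft(frame[:half] * win))           # left-half spectrum
    fr = np.abs(np.fft.rfft(frame[half:2 * half] * win))   # right-half spectrum
    bins = np.arange(len(fl), dtype=float)
    scores = []
    for rho in np.linspace(-max_octaves, max_octaves, n_dilations):
        factor = 2.0 ** rho                                # dilation in octaves
        # resample the right-half spectrum on a dilated frequency axis
        fr_dilated = np.interp(bins * factor, bins, fr, left=0.0, right=0.0)
        den = np.linalg.norm(fl) * np.linalg.norm(fr_dilated)
        scores.append(float(np.dot(fl, fr_dilated) / den) if den else 0.0)
    return np.array(scores)
```

    A peak away from zero dilation indicates pitch rising or falling across the frame, which is what allows the representation to capture pitch variation directly, without an explicit pitch track.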

  • 25.
    Oertel, Catharine
    et al.
    Salvi, Giampiero
    Götze, Jana
    Edlund, Jens
    Gustafson, Joakim
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    The KTH Games Corpora: How to Catch a Werewolf (2013). Conference paper (Refereed)
  • 26.
    Strömbergsson, Sofia
    et al.
    Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology (CLINTEC), Karolinska Institutet (KI), Stockholm, Sweden.
    Edlund, Jens
    Department of Speech, Music and Hearing, KTH, Stockholm, Sweden.
    Götze, Jana
    Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology (CLINTEC), Karolinska Institutet (KI), Stockholm, Sweden.
    Nilsson Björkenstam, Kristina
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Approximating phonotactic input in children’s linguistic environments from orthographic transcripts (2017). In: Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Stockholm: The International Speech Communication Association (ISCA), 2017, p. 2214-2217. Conference paper (Refereed)
    Abstract [en]

    Child-directed spoken data is the ideal source of support for claims about children’s linguistic environments. However, phonological transcriptions of child-directed speech are scarce, compared to sources like adult-directed speech or text data. Acquiring reliable descriptions of children’s phonological environments from more readily accessible sources would mean considerable savings of time and money. The first step towards this goal is to quantify the reliability of descriptions derived from such secondary sources. We investigate how phonological distributions vary across different modalities (spoken vs. written), and across the age of the intended audience (children vs. adults). Using a previously unseen collection of Swedish adult- and child-directed spoken and written data, we combine lexicon look-up and grapheme-to-phoneme conversion to approximate phonological characteristics. The analysis shows distributional differences across datasets both for single phonemes and for longer phoneme sequences. Some of these are predictably attributed to lexical and contextual characteristics of text vs. speech. The generated phonological transcriptions are remarkably reliable. The differences in phonological distributions between child-directed speech and secondary sources highlight a need for compensatory measures when relying on written data or on adult-directed spoken data, and/or for continued collection of actual child-directed speech in research on children’s language environments.
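
    A minimal sketch of the general pipeline described above: orthographic tokens are transcribed via lexicon look-up with a grapheme-to-phoneme fallback, and phoneme n-grams are counted to approximate phonotactic distributions. The toy lexicon, the naive one-letter-per-phoneme fallback and the token format are illustrative assumptions, not the authors' resources.

```python
from collections import Counter

LEXICON = {"katt": "k a t", "hund": "h u n d"}   # toy pronunciation lexicon

def g2p_fallback(word):
    """Naive letter-to-phoneme fallback; real systems use a trained G2P model."""
    return " ".join(word)

def transcribe(tokens):
    """Phoneme lists per token: lexicon look-up, G2P for out-of-vocabulary words."""
    return [LEXICON.get(t, g2p_fallback(t)).split() for t in tokens]

def phoneme_ngrams(tokens, n=2):
    """Count phoneme n-grams within words to approximate phonotactics."""
    counts = Counter()
    for phones in transcribe(tokens):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i : i + n])] += 1
    return counts

if __name__ == "__main__":
    print(phoneme_ngrams(["katt", "hund", "mus"], n=2))
```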

  • 27.
    Strömbergsson, Sofia
    et al.
    Nilsson Björkenstam, Kristina
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Götze, Jana
    Edlund, Jens
    Simulating Speech Errors in Swedish, Norwegian and English (2018). Conference paper (Refereed)
  • 28.
    Włodarczak, Marcin
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Edlund, Jens
    Breathing in Conversation: An Unwritten History (2015). In: Proceedings of the 2nd European and the 5th Nordic Symposium on Multimodal Communication / [ed] Kristiina Jokinen, Martin Vels, Linköping, 2015, p. 107-112. Conference paper (Refereed)
    Abstract [en]

    This paper attempts to draw the attention of the multimodal communication research community to what we consider a long overdue topic, namely respiratory activity in conversation. We submit that a turn towards spontaneous interaction is a natural extension of the recent interest in speech breathing, and is likely to offer valuable insights into the mechanisms underlying the organisation of interaction and collaborative human action in general, as well as to advance existing speech technology applications. Particular focus is placed on the role of breathing as a perceptually and interactionally salient turn-taking cue. We also present the recording setup developed in the Phonetics Laboratory at Stockholm University with the aim of studying communicative functions of physiological and audio-visual breathing correlates in spontaneous multiparty interactions.

  • 29.
    Włodarczak, Marcin
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Edlund, Jens
    Communicative needs and respiratory constraints (2015). In: 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015): Speech Beyond Speech Towards a Better Understanding of the Most Important Biosignal, 2015, p. 3051-3055. Conference paper (Refereed)
    Abstract [en]

    This study investigates the timing of communicative behaviour with respect to the speaker’s respiratory cycle. The data is drawn from a corpus of multiparty conversations in Swedish. We find that while longer utterances (> 1 s) are tied, predictably, primarily to exhalation onset, shorter vocalisations are spread more uniformly across the respiratory cycle. In addition, nods, which are free from any respiratory constraints, are most frequently found around exhalation offsets, where respiratory requirements for even a short utterance are not satisfied. We interpret the results to reflect the economy principle in speech production, whereby respiratory effort, associated primarily with starting a new respiratory cycle, is minimised within the scope of the speaker’s communicative goals.
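
    A minimal sketch of the kind of analysis the abstract describes: each event onset (an utterance or a nod) is expressed as a normalised phase within its enclosing respiratory cycle, with 0.0 at one inhalation onset and 1.0 at the next. Cycle segmentation from the respiratory belt signals is assumed to be done already; the input format is an illustrative assumption.

```python
import bisect

def cycle_phase(event_times, inhalation_onsets):
    """Map each event time to a phase in [0, 1) within its respiratory cycle."""
    onsets = sorted(inhalation_onsets)
    phases = []
    for t in event_times:
        i = bisect.bisect_right(onsets, t) - 1   # cycle containing the event
        if 0 <= i < len(onsets) - 1:             # skip events outside full cycles
            start, end = onsets[i], onsets[i + 1]
            phases.append((t - start) / (end - start))
    return phases

if __name__ == "__main__":
    print(cycle_phase([1.0, 2.5, 4.2], inhalation_onsets=[0.0, 2.0, 3.5, 5.0]))
    # -> [0.5, 0.33, 0.47]: phases near 0 sit at exhalation onset,
    #    phases near 1 sit at exhalation offset
```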
