1 - 24 of 24
  • 1.
    Beskow, Jonas
    et al.
    KTH Speech, Music and Hearing.
    Carlson, Rolf
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Granström, Björn
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Skantze, Gabriel
    KTH Speech, Music and Hearing.
    Multimodal Interaction Control. 2009. In: Computers in the Human Interaction Loop / [ed] Waibel, Alex and Stiefelhagen, Rainer, Berlin/Heidelberg: Springer Berlin/Heidelberg, 2009, p. 143-158. Chapter in book (Refereed)
  • 2.
    Beskow, Jonas
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    House, David
    KTH Speech, Music and Hearing.
    Research focus: Interactional aspects of spoken face-to-face communication. 2010. In: Proceedings from Fonetik 2010, Lund: Lund University, 2010, p. 7-10. Conference paper (Other academic)
  • 3.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Towards human-like spoken dialogue systems. 2008. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 50, no 8-9, p. 630-645. Article in journal (Refereed)
    Abstract [en]

    This paper presents an overview of methods that can be used to collect and analyse data on user responses to spoken dialogue system components intended to increase human-likeness, and to evaluate how well the components succeed in reaching that goal. Wizard-of-Oz variations, human-human data manipulation, and micro-domains are discussed in this context, as is the use of third-party reviewers to get a measure of the degree of human-likeness. We also present the two-way mimicry target, a model for measuring how well a human-computer dialogue mimics or replicates some aspect of human-human dialogue, including human flaws and inconsistencies. Although we have added a measure of innovation, none of the techniques is new in its entirety. Taken together and described from a human-likeness perspective, however, they form a set of tools that may widen the path towards human-like spoken dialogue systems.

  • 4.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Exploring prosody in interaction control. 2005. In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 62, no 2-4, p. 215-226. Article in journal (Refereed)
    Abstract [en]

    This paper investigates prosodic aspects of turn-taking in conversation with a view to improving the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor. It examines the relationship between interaction control, the communicative function of which is to regulate the flow of information between interlocutors, and its phonetic manifestation. Specifically, the listener's perception of such interaction control phenomena is modelled. Algorithms for automatic online extraction of prosodic phenomena liable to be relevant for interaction control, such as silent pauses and intonation patterns, are presented and evaluated in experiments using Swedish map task data. We show that the automatically extracted prosodic features can be used to avoid many of the places where current dialogue systems run the risk of interrupting their users, as well as to identify suitable places to take the turn.
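
    The online pause-detection side of the approach described above can be illustrated with a minimal sketch. This is not the authors' implementation; the frame size, energy threshold, and minimum pause duration below are illustrative assumptions only:

    ```python
    import numpy as np

    def frame_energies(signal, frame_len=160):
        """Mean energy per non-overlapping frame (e.g. 10 ms at 16 kHz)."""
        n = len(signal) // frame_len
        frames = signal[:n * frame_len].reshape(n, frame_len)
        return (frames ** 2).mean(axis=1)

    def silent_pause_frames(energies, threshold=1e-4):
        """Boolean mask of frames classified as silent."""
        return energies < threshold

    def candidate_turn_ends(silence_mask, min_pause_frames=20):
        """Indices where a silent pause has lasted at least min_pause_frames
        (e.g. 200 ms at 10 ms frames) -- candidate places to take the turn."""
        run = 0
        ends = []
        for i, silent in enumerate(silence_mask):
            run = run + 1 if silent else 0
            if run == min_pause_frames:
                ends.append(i)
        return ends
    ```

    A real system would combine such silence runs with the intonation-pattern features the paper evaluates, rather than relying on silence duration alone.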

  • 5.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Underpinning /nailon/: automatic estimation of pitch range and speaker relative pitch. 2007. In: Speaker Classification II / [ed] Müller, Christian, Berlin/Heidelberg, Germany: Springer Berlin/Heidelberg, 2007, p. 229-242. Chapter in book (Refereed)
  • 6.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Al Moubayed, Samer
    KTH Speech, Music and Hearing.
    Gravano, Agustìn
    Hirschberg, Julia
    Columbia University Computer Science.
    Very short utterances in conversation. 2010. In: Proceedings from Fonetik 2010, Lund: Lund University, 2010, p. 11-16. Conference paper (Other academic)
  • 7.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hirschberg, Julia
    Columbia University Computer Science.
    Pause and gap length in face-to-face interaction. 2009. In: Proceedings of Interspeech 2009, Brighton, UK: ISCA, 2009, p. 2779-2782. Conference paper (Refereed)
  • 8.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Pelcé, Antoine
    KTH Speech, Music and Hearing.
    Prosodic features of very short utterances in dialogue. 2009. In: Nordic Prosody: Proceedings of the Xth Conference, Helsinki 2008, Frankfurt am Main: Peter Lang Publishing Group, 2009, p. 57-68. Conference paper (Refereed)
  • 9.
    Gustafson, Joakim
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Potential benefits of human-like dialogue behaviour in the call routing domain. 2008. In: Perception in Multimodal Dialogue Systems, Berlin/Heidelberg, Germany: Springer Berlin/Heidelberg, 2008, p. 240-251. Chapter in book (Refereed)
    Abstract [en]

    This paper presents a Wizard-of-Oz (Woz) experiment in the call routing domain that took place during the development of a call routing system for the TeliaSonera residential customer care in Sweden. A corpus of 42,000 calls was used as a basis for identifying problematic dialogues and the strategies used by operators to overcome the problems. A new Woz recording was made, implementing some of these strategies. The collected data is described and discussed with a view to explore the possible benefits of more human-like dialogue behaviour in call routing applications.

  • 10.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Detection thresholds for gaps, overlaps and no-gap-no-overlaps. 2011. In: Journal of the Acoustical Society of America, ISSN 0001-4966, Vol. 130, no 1, p. 508-513. Article in journal (Refereed)
    Abstract [en]

    Detection thresholds for gaps and overlaps, that is acoustic and perceived silences and stretches of overlapping speech in speaker changes, were determined. Subliminal gaps and overlaps were categorized as no-gap-no-overlaps. The established gap and overlap detection thresholds both corresponded to the duration of a long vowel, or about 120 ms. These detection thresholds are valuable for mapping the perceptual speaker change categories gaps, overlaps, and no-gap-no-overlaps into the acoustic domain. Furthermore, the detection thresholds allow generation and understanding of gaps, overlaps, and no-gap-no-overlaps in human-like spoken dialogue systems.
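
    The mapping from acoustic intervals to the three perceptual categories can be sketched directly: a single threshold of roughly 120 ms separates perceivable gaps and overlaps from subliminal no-gap-no-overlaps. The sign convention below (positive = silence at the speaker change, negative = overlapping speech) is an assumption for illustration:

    ```python
    def classify_speaker_change(interval_ms):
        """Classify the between-speaker interval at a speaker change.

        Intervals shorter in magnitude than the ~120 ms detection threshold
        reported in the paper (about the duration of a long vowel) fall
        below the perceptual threshold and count as no-gap-no-overlaps.
        """
        THRESHOLD_MS = 120
        if abs(interval_ms) < THRESHOLD_MS:
            return "no-gap-no-overlap"
        return "gap" if interval_ms > 0 else "overlap"
    ```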

  • 11.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Pauses, gaps and overlaps in conversations. 2010. In: Journal of Phonetics, ISSN 0095-4470, E-ISSN 1095-8576, Vol. 38, no 4, p. 555-568. Article in journal (Refereed)
    Abstract [en]

    This paper explores durational aspects of pauses, gaps and overlaps in three different conversational corpora, with a view to challenge claims about precision timing in turn-taking. Distributions of pause, gap and overlap durations in conversations are presented, and methodological issues regarding the statistical treatment of such distributions are discussed. The results are related to published minimal response times for spoken utterances and thresholds for detection of acoustic silences in speech. It is shown that turn-taking is generally less precise than is often claimed by researchers in the field of conversation analysis or interactional linguistics. These results are discussed in the light of their implications for models of timing in turn-taking and for interaction control models in speech technology. In particular, it is argued that the proportion of speaker changes that could potentially be triggered by information immediately preceding the speaker change is large enough for reactive interaction control models to be viable in speech technology.
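
    Extracting the three duration distributions discussed above from a transcribed conversation reduces to inspecting consecutive talkspurts. The (speaker, start, end) input format used here is a hypothetical simplification for illustration, not the annotation format of the corpora studied in the paper:

    ```python
    def interval_durations(spurts):
        """Given talkspurts as (speaker, start_ms, end_ms) tuples sorted by
        start time, return durations of pauses (silence within the same
        speaker), gaps (silence at a speaker change) and overlaps
        (simultaneous speech at a speaker change)."""
        pauses, gaps, overlaps = [], [], []
        for (spk1, _, end1), (spk2, start2, _) in zip(spurts, spurts[1:]):
            d = start2 - end1  # positive: silence; negative: overlap
            if spk1 == spk2:
                if d > 0:
                    pauses.append(d)
            elif d >= 0:
                gaps.append(d)
            else:
                overlaps.append(-d)
        return pauses, gaps, overlaps
    ```

    Histograms over the three returned lists would then approximate the distributions whose statistical treatment the paper discusses.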

  • 12.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Carlson, Rolf
    KTH Speech, Music and Hearing.
    Interruption impossible. 2006. In: Nordic Prosody: Proceedings of the IXth Conference, Lund 2004, Frankfurt am Main: Peter Lang Publishing Group, 2006, p. 97-105. Conference paper (Refereed)
  • 13.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Hirschberg, Julia
    Columbia University Computer Science.
    Pitch similarity in the vicinity of backchannels. 2010. In: Proceedings Interspeech 2010, Makuhari, Japan: ISCA, 2010, p. 3054-3057. Conference paper (Refereed)
  • 14.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Laskowski, Kornel
    KTH Speech, Music and Hearing.
    Very short utterances and timing in turn-taking. 2011. In: Proceedings Interspeech 2011, Florence, Italy: ISCA, 2011, p. 2837-2840. Conference paper (Refereed)
  • 15.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Laskowski, Kornel
    KTH Speech, Music and Hearing.
    Pelcé, Antoine
    KTH Speech, Music and Hearing.
    Prosodic features in the vicinity of silences and overlaps. 2009. In: Nordic Prosody: Proceedings of the Xth Conference, Helsinki 2008, Frankfurt am Main: Peter Lang, 2009, p. 95-105. Conference paper (Refereed)
  • 16.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    A single-port non-parametric model of turn-taking in multi-party conversation. 2011. In: Proceedings ICASSP 2011, Prague, Czech Republic: ICASSP, 2011, p. 5600-5603. Conference paper (Refereed)
  • 17.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems. 2008. In: Proceedings of ICASSP 2008, Las Vegas, Nevada, USA: ICASSP, 2008, p. 5041-5044. Conference paper (Refereed)
  • 18.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Incremental learning and forgetting in stochastic turn-taking models. 2011. In: Proceedings Interspeech 2011, Florence, Italy: ISCA, 2011, p. 2069-2072. Conference paper (Refereed)
  • 19.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Machine learning of prosodic sequences using the fundamental frequency variation spectrum. 2008. In: Proceedings of the Speech Prosody 2008 Conference, Campinas, Brazil: Editora RG/CNPq, 2008, p. 151-154. Conference paper (Refereed)
  • 20.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    The fundamental frequency variation spectrum. 2008. In: Proceedings FONETIK 2008, Gothenburg: Göteborg University, 2008, p. 29-32. Conference paper (Other academic)
  • 21.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    A general-purpose 32 ms prosodic vector for hidden Markov modeling. 2009. In: Proceedings Interspeech 2009, Brighton, UK: ISCA, 2009, p. 724-727. Conference paper (Refereed)
  • 22.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Exploring the prosody of floor mechanisms in English using the fundamental frequency variation spectrum. 2009. In: Proceedings of EUSIPCO 2009, Glasgow, Scotland: ISCA, 2009, p. 2539-2543. Conference paper (Refereed)
    Abstract [en]

    A basic requirement for participation in conversation is the ability to jointly manage interaction, and to recognize the attempts of interlocutors to do the same. Examples of management activity include efforts to acquire, re-acquire, hold, release, and acknowledge floor ownership, and they are often implemented using dedicated dialog act types. In this work, we explore the prosody of one class of such dialog acts, known as floor mechanisms, using a methodology based on a recently proposed representation of fundamental frequency variation. Models over the representation illustrate significant differences between floor mechanisms and other dialog act types, and lead to automatic detection accuracies in equal-prior test data of up to 75%. We note that this work is also the first attempt to compute and model FFV spectra for multiparty rather than two-party conversation, as well as the first attempt to infer dialogue structure from non-anechoic-chamber recordings.

  • 23.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Wölfel, Matthias
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Computing the fundamental frequency variation spectrum in conversational spoken dialogue systems. 2008. In: Proceedings of the 155th Meeting of the Acoustical Society of America, 5th EAA Forum Acusticum, and 9th SFA Congrès Français d'Acoustique (Acoustics2008), Paris, France: ASA, 2008, p. 3305-3310. Conference paper (Refereed)
    Abstract [en]

    Continuous modeling of intonation in natural speech has long been hampered by a focus on modeling pitch, of which several normative aspects are particularly problematic. The latter include, among others, the fact that pitch is undefined in unvoiced segments, that its absolute magnitude is speaker-specific, and that its robust estimation and modeling, at a particular point in time, rely on a patchwork of long-time stability heuristics. In the present work, we continue our analysis of the fundamental frequency variation (FFV) spectrum, a recently proposed instantaneous, continuous, vector-valued representation of pitch variation, which is obtained by comparing the harmonic structure of the frequency magnitude spectra of the left and right half of an analysis frame. We analyze the sensitivity of a task-specific error rate in a conversational spoken dialogue system to the specific definition of the left and right halves of a frame, resulting in operational recommendations regarding the framing policy and window shape.
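
    The core idea above — comparing the harmonic structure of the magnitude spectra of the left and right halves of one analysis frame — can be caricatured in a few lines. This is a loose sketch under simplifying assumptions, not the FFV estimator defined in the paper: here the right half-spectrum is simply resampled on a dilated frequency axis and correlated against the left, with a plain Hann window on both halves.

    ```python
    import numpy as np

    def ffv_spectrum(frame, dilations=None):
        """FFV-like sketch: correlate the magnitude spectrum of the left
        half-frame against frequency-dilated versions of the right
        half-frame. The argmax over dilation factors tracks local pitch
        change; a peak at 1.0 means the pitch is locally flat."""
        if dilations is None:
            dilations = np.linspace(0.9, 1.1, 21)
        n = len(frame) // 2
        win = np.hanning(n)
        spec_l = np.abs(np.fft.rfft(frame[:n] * win))
        spec_r = np.abs(np.fft.rfft(frame[n:] * win))
        bins = np.arange(len(spec_l))
        out = []
        for rho in dilations:
            # resample the right spectrum on a dilated frequency axis
            dilated = np.interp(bins * rho, bins, spec_r, left=0.0, right=0.0)
            denom = np.linalg.norm(spec_l) * np.linalg.norm(dilated)
            out.append(float(spec_l @ dilated) / denom if denom else 0.0)
        return np.array(out)
    ```

    Because the comparison is between dilated spectra rather than pitch estimates, the representation stays defined in unvoiced segments and is independent of the speaker's absolute pitch — the properties motivating the approach in the abstract.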

  • 24.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Stagger: A modern POS tagger for Swedish. 2012. In: / [ed] Pierre Nugues, 2012. Conference paper (Refereed)