Search results 1 - 50 of 68
  • 1.
    Aare, Kätlin
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Włodarczak, Marcin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Backchannels and breathing (2014). In: Proceedings from FONETIK 2014: Stockholm, June 9-11, 2014 / [ed] Mattias Heldner. Stockholm: Department of Linguistics, Stockholm University, 2014, pp. 47-52. Conference paper (Other academic)
    Abstract [en]

    The present study investigated the timing of backchannel onsets within the speaker’s own and the dialogue partner’s breathing cycles in two spontaneous conversations in Estonian. Results indicate that backchannels are mainly produced near the beginning, but also in the second half, of the speaker’s exhalation phase. A similar tendency was observed in short non-backchannel utterances, indicating that the timing of backchannels might be determined by their duration rather than their pragmatic function. By contrast, longer non-backchannel utterances were initiated almost exclusively right at the beginning of the exhalation. As expected, backchannels in the conversation partner’s breathing cycle occurred predominantly towards the end of the exhalation or at the beginning of the inhalation.

  • 2.
    Aare, Kätlin
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Włodarczak, Marcin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Inhalation amplitude and turn-taking in spontaneous Estonian conversations (2015). In: Proceedings from Fonetik 2015, Lund, June 8-10, 2015 / [ed] Malin Svensson Lundmark, Gilbert Ambrazaitis, Joost van de Weijer. Lund: Lund University, 2015, pp. 1-5. Conference paper (Other academic)
    Abstract [en]

    This study explores the relationship between inhalation amplitude and turn management in four approximately 20-minute-long spontaneous multiparty conversations in Estonian. The main focus of interest is whether inhalation amplitude is greater before turn onset than in the following inhalations within the same speaking turn. The results show that inhalations directly before turn onset are greater in amplitude than those later in the turn. The difference seems to be realized by ending the inhalation at a greater lung volume value, whereas the initial lung volume before inhalation onset remains roughly the same across a single turn. The findings suggest that the increased inhalation amplitude could function as a cue for claiming the conversational floor.

  • 3.
    Bell, Linda
    et al.
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Prosodic adaptation in human-computer interaction (2003). In: Proceedings ICPhS 2003. Barcelona, Spain: ISCA, 2003, pp. 2453-2456. Conference paper (Refereed)
    Abstract [en]

    State-of-the-art speech recognizers are trained on predominantly normal speech and have difficulties handling either exceedingly slow and hyperarticulated or fast and sloppy speech. Explicitly instructing users on how to speak, however, can make the human–computer interaction stilted and unnatural. If it is possible to affect users’ speaking rate while maintaining the naturalness of the dialogue, this could prove useful in the development of future human–computer interfaces. Users could thus be subtly influenced to adapt their speech to better match the current capabilities of the system, so that errors can be reduced and the overall quality of the human–computer interaction is improved. At the same time, speakers are allowed to express themselves freely and naturally. In this article, we investigate whether people adapt their speech as they interact with an animated character in a simulated spoken dialogue system. A user experiment involving 16 subjects was performed to examine whether people who speak with a simulated dialogue system adapt their speaking rate to that of the system. The experiment confirmed that the users adapted to the speaking rate of the system, and no subjects afterwards seemed to be aware they had been affected in this way. Another finding was that speakers varied their speaking rate substantially in the course of the dialogue. In particular, problematic sequences where subjects had to repeat or rephrase the same utterance several times elicited slower speech.

  • 4.
    Berger, Alexandra
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics. Karolinska Institutet, Sweden.
    Hedström Lindenhäll, Rosanna
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics. Karolinska Institutet, Sweden.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Karlsson, Sofia
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics. Karolinska Institutet, Sweden.
    Nyberg Pergament, Sarah
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics. Karolinska Institutet, Sweden.
    Vojnovic, Ivan
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics. Karolinska Institutet, Sweden.
    Voices after midnight: How a night out affects voice quality (2014). In: Proceedings from FONETIK 2014: Stockholm, June 9-11, 2014 / [ed] Mattias Heldner. Stockholm: Department of Linguistics, Stockholm University, 2014, pp. 1-4. Conference paper (Other academic)
    Abstract [en]

    This study aimed to investigate how different parameters of the voice (jitter, shimmer, LTAS and mean pitch) are affected by a late night out. Three recordings were made: one early evening before the night out, one after midnight, and one on the next day. Each recording consisted of a one minute reading and prolonged vowels. Five students took part in the experiment. Results varied among the participants, but some patterns were noticeable in all parameters. A trend towards increased mean pitch during the second recording was observed among four of the subjects. Somewhat unexpectedly, jitter and shimmer decreased between the first and second recordings and increased in the third one. Due to the lack of ethical testing, only a small number of participants were included. A larger sample is suggested for future research in order to generalize results.

  • 5.
    Beskow, Jonas
    et al.
    KTH Speech, Music and Hearing.
    Carlson, Rolf
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Granström, Björn
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Skantze, Gabriel
    KTH Speech, Music and Hearing.
    Multimodal Interaction Control (2009). In: Computers in the Human Interaction Loop / [ed] Waibel, Alex and Stiefelhagen, Rainer. Berlin/Heidelberg: Springer, 2009, pp. 143-158. Chapter in book (Refereed)
  • 6.
    Beskow, Jonas
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    House, David
    KTH Speech, Music and Hearing.
    Research focus: Interactional aspects of spoken face-to-face communication (2010). In: Proceedings from Fonetik 2010. Lund: Lund University, 2010, pp. 7-10. Conference paper (Other academic)
  • 7.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Alexandersson, Simon
    Beskow, Jonas
    KTH Speech, Music and Hearing.
    Gustavsson, Lisa
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Kallioinen, Petter
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Marklund, Ellen
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    3rd party observer gaze as a continuous measure of dialogue flow (2012). Conference paper (Refereed)
    Abstract [en]

    We present an attempt at using 3rd party observer gaze to get a measure of how appropriate each segment in a dialogue is for a speaker change. The method is a step away from the current dependence on speaker turns or talkspurts, towards a more general view of speaker changes. We show that 3rd party observers do indeed largely look at the same thing (the speaker), and how this can be captured and utilized to provide insights into human communication. In addition, the results also suggest that there might be differences in the distribution of 3rd party observer gaze depending on how information-rich an utterance is.

  • 8.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Towards human-like spoken dialogue systems (2008). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 50, no. 8-9, pp. 630-645. Article in journal (Refereed)
    Abstract [en]

    This paper presents an overview of methods that can be used to collect and analyse data on user responses to spoken dialogue system components intended to increase human-likeness, and to evaluate how well the components succeed in reaching that goal. Wizard-of-Oz variations, human-human data manipulation, and micro-domains are discussed in this context, as is the use of third-party reviewers to get a measure of the degree of human-likeness. We also present the two-way mimicry target, a model for measuring how well a human-computer dialogue mimics or replicates some aspect of human-human dialogue, including human flaws and inconsistencies. Although we have added a measure of innovation, none of the techniques is new in its entirety. Taken together and described from a human-likeness perspective, however, they form a set of tools that may widen the path towards human-like spoken dialogue systems.

  • 9.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Exploring prosody in interaction control (2005). In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 62, no. 2-4, pp. 215-226. Article in journal (Refereed)
    Abstract [en]

    This paper investigates prosodic aspects of turn-taking in conversation with a view to improving the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor. It examines the relationship between interaction control, the communicative function of which is to regulate the flow of information between interlocutors, and its phonetic manifestation. Specifically, the listener's perception of such interaction control phenomena is modelled. Algorithms for automatic online extraction of prosodic phenomena liable to be relevant for interaction control, such as silent pauses and intonation patterns, are presented and evaluated in experiments using Swedish map task data. We show that the automatically extracted prosodic features can be used to avoid many of the places where current dialogue systems run the risk of interrupting their users, as well as to identify suitable places to take the turn.
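    The "automatic online extraction of prosodic phenomena" mentioned above includes detecting silent pauses. As a toy illustration only (this is not the paper's algorithm; the frame-energy representation, threshold and minimum-length parameters are assumptions), a pause can be flagged once a stretch of low-energy frames exceeds a minimum duration:

    ```python
    def silent_pauses(frame_energies, energy_threshold, min_pause_frames):
        """Return (start, end) frame-index pairs for stretches of low energy
        lasting at least min_pause_frames frames. A simplified offline
        stand-in for the online pause detection described in the abstract."""
        pauses, start = [], None
        for i, energy in enumerate(frame_energies):
            if energy < energy_threshold:
                if start is None:
                    start = i  # a candidate pause begins here
            else:
                if start is not None and i - start >= min_pause_frames:
                    pauses.append((start, i))
                start = None
        # close a pause that runs to the end of the signal
        if start is not None and len(frame_energies) - start >= min_pause_frames:
            pauses.append((start, len(frame_energies)))
        return pauses
    ```

    With 10 ms frames, for instance, min_pause_frames=20 would correspond to a 200 ms silence criterion.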

  • 10.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    /nailon/ – software for online analysis of prosody (2006). In: Proceedings Interspeech 2006. Pittsburgh, PA, USA: ISCA, 2006, pp. 2022-2025. Conference paper (Refereed)
  • 11.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Underpinning /nailon/: automatic estimation of pitch range and speaker relative pitch (2007). In: Speaker Classification II / [ed] Müller, Christian. Berlin/Heidelberg, Germany: Springer, 2007, pp. 229-242. Chapter in book (Refereed)
  • 12.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Al Moubayed, Samer
    KTH Speech, Music and Hearing.
    Gravano, Agustìn
    Hirschberg, Julia
    Columbia University Computer Science.
    Very short utterances in conversation (2010). In: Proceedings from Fonetik 2010. Lund: Lund University, 2010, pp. 11-16. Conference paper (Other academic)
  • 13.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone (2012). In: INTERSPEECH 2012: vol. 2. Portland, USA: Curran Associates, Inc., 2012, pp. 1482-1485. Conference paper (Refereed)
    Abstract [en]

    The ability of people, and of machines, to determine the position of a sound source in a room is well studied. The related ability to determine the orientation of a directed sound source, on the other hand, is not, but the few studies there are show people to be surprisingly skilled at it. This has bearing for studies of face-to-face interaction and of embodied spoken dialogue systems, as the sound source orientation of a speaker is connected to the head pose of the speaker, which is meaningful in a number of ways. The feature most often implicated for detection of sound source orientation is the inter-aural level difference - a feature that is assumed to be more easily exploited in anechoic chambers than in everyday surroundings. We expand here on our previous studies and compare detection of speaker orientation within and outside of the anechoic chamber. Our results show that listeners find the task easier, rather than harder, in everyday surroundings, which suggests that inter-aural level differences are not the only feature at play.

  • 14.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Utterance segmentation and turn-taking in spoken dialogue systems (2005). In: Sprachtechnologie, mobile kommunikation und linguistische ressourcen. Frankfurt am Main: Peter Lang Publishing Group, 2005, pp. 576-587. Chapter in book (Refereed)
  • 15.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Gustafson, Joakim
    KTH Speech, Music and Hearing.
    Who am I speaking at? Perceiving the head orientation of speakers from acoustic cues alone (2012). In: LREC Workshop on Multimodal Corpora for Machine Learning. Istanbul, Turkey: LREC, 2012. Chapter in book (Refereed)
    Abstract [en]

    The ability of people, and of machines, to determine the position of a sound source in a room is well studied. The related ability to determine the orientation of a directed sound source, on the other hand, is not, but the few studies there are show people to be surprisingly skilled at it. This has bearing for studies of face-to-face interaction and of embodied spoken dialogue systems, as sound source orientation of a speaker is connected to the head pose of the speaker, which is meaningful in a number of ways. We describe in passing some preliminary findings that led us onto this line of investigation, and in detail a study in which we extend an experiment design intended to measure perception of gaze direction to test instead for perception of sound source orientation. The results corroborate those of previous studies, and further show that people are very good at performing this skill outside of studio conditions as well. 

  • 16.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Hirschberg, Julia
    Columbia University Computer Science.
    Pause and gap length in face-to-face interaction (2009). In: Proceedings of Interspeech 2009. Brighton, UK: ISCA, 2009, pp. 2779-2782. Conference paper (Refereed)
  • 17.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Pelcé, Antoine
    KTH Speech, Music and Hearing.
    Prosodic features of very short utterances in dialogue (2009). In: Nordic Prosody: Proceedings of the Xth Conference, Helsinki 2008. Frankfurt am Main: Peter Lang Publishing Group, 2009, pp. 57-68. Conference paper (Refereed)
  • 18.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Włodarczak, Marcin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Catching wind of multiparty conversation (2014). In: Proceedings of Multimodal Corpora: Combining applied and basic research targets (MMC 2014) / [ed] Jens Edlund, Dirk Heylen, Patrizia Paggio. Reykjavik, Iceland: European Language Resources Association, 2014, pp. 35-36. Chapter in book (Other academic)
    Abstract [en]

    The paper describes the design of a novel multimodal corpus of spontaneous multiparty conversations in Swedish. The corpus is collected with the primary goal of investigating the role of breathing and its perceptual cues in the interactive control of conversation. Physiological correlates of breathing are captured by means of respiratory belts, which measure changes in the cross-sectional area of the rib cage and the abdomen. Additionally, auditory and visual correlates of breathing are recorded in parallel to the actual conversations. The corpus allows studying respiratory mechanisms underlying the organisation of spontaneous conversation, especially in connection with turn management. As such, it is a valuable resource both for fundamental research and speech technology applications.

  • 19.
    Edlund, Jens
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Włodarczak, Marcin
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Is breathing prosody? (2014). In: International Symposium on Prosody to Commemorate Gösta Bruce. Lund: Lund University, 2014. Conference paper (Other academic)
    Abstract [en]

    Even though we may not be aware of it, much breathing in face-to-face conversation is both clearly audible and visible. Consequently, it has been suggested that respiratory activity is used in the joint coordination of conversational flow. For instance, it has been claimed that inhalation is an interactionally salient cue to speech initiation, that exhalation is a turn-yielding device, and that breath holding is a marker of turn incompleteness (e.g. Local & Kelly, 1986; Schegloff, 1996). So far, however, few studies have addressed the interactional aspects of breathing (one notable exception is McFarland, 2001). In this poster, we will describe our ongoing efforts to fill this gap. We will present the design of a novel corpus of respiratory activity in spontaneous multiparty face-to-face conversations in Swedish. The corpus will contain physiological measurements relevant to breathing, high-quality audio, and video. Minimally, the corpus will be annotated with interactional events derived from voice activity detection and (semi-)automatically detected inhalation and exhalation events in the respiratory data. We will also present initial analyses of the material collected. Is breathing prosody, and thus relevant to this symposium? What we do know is that the turn-taking phenomena of particular interest to us are closely (almost by definition) related to several prosodic phenomena, in particular those associated with prosodic phrasing, grouping and boundaries. Thus, we will learn more about respiratory activity in phrasing (and the like) through analyses of breathing in conversation.
    References
    Local, John K., & Kelly, John. (1986). Projection and 'silences': Notes on phonetic and conversational structure. Human Studies, 9, 185-204.
    McFarland, David H. (2001). Respiratory markers of conversational interaction. Journal of Speech, Language, and Hearing Research, 44, 128-143.
    Schegloff, E. A. (1996). Turn organization: One intersection of grammar and interaction. In E. Ochs, E. A. Schegloff & S. A. Thompson (Eds.), Interaction and Grammar (pp. 52-133). Cambridge: Cambridge University Press.

  • 20.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Bertinetto, Pier Marco
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Nodari, Rosalba
    Lenoci, Giovanna
    The Acoustics of Lexical Stress in Italian as a Function of Stress Level and Speaking Style (2016). In: Proceedings Interspeech 2016. International Speech Communication Association, 2016, pp. 1059-1063. Conference paper (Refereed)
    Abstract [en]

    The study is part of a series of studies describing the acoustics of lexical stress in a way that should be applicable to any language. The present database of recordings includes Brazilian Portuguese, English, Estonian, German, French, Italian and Swedish. The acoustic parameters examined are F0-level, F0-variation, Duration, and Spectral Emphasis. Values for these parameters, computed for all vowels (a little over 24000 vowels for Italian), are the data upon which the analyses are based. All parameters are examined with respect to their correlation with Stress (primary, secondary, unstressed), speaking Style (wordlist reading, phrase reading, spontaneous speech) and Sex of the speaker (female, male). For Italian, Duration was found to be the dominant factor by a wide margin, in agreement with previous studies. Spectral Emphasis was the second most important factor. Spectral Emphasis has not been studied previously for Italian, but intensity, a related parameter, has been shown to correlate with stress. F0-level was also significantly correlated, but not to the same degree. Speaker Sex turned out as significant in many comparisons. The differences were, however, mainly a function of the degree to which a given parameter was used, not how it was used to signal lexical stress contrasts.

  • 21.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    The acoustics of word stress in English as a function of stress level and speaking style (2015). In: 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015): Speech Beyond Speech Towards a Better Understanding of the Most Important Biosignal, 2015, pp. 41-45. Conference paper (Refereed)
    Abstract [en]

    This study of lexical stress in English is part of a series of studies, the goal of which is to describe the acoustics of lexical stress for a number of typologically different languages. When fully developed the methodology should be applicable to any language. The database of recordings so far includes Brazilian Portuguese, English (U.K.), Estonian, German, French, Italian and Swedish. The acoustic parameters examined are f0-level, f0-variation, Duration, and Spectral Emphasis. Values for these parameters, computed for all vowels, are the data upon which the analyses are based. All parameters are tested with respect to their correlation with stress level (primary, secondary, unstressed) and speaking style (wordlist reading, phrase reading, spontaneous speech). For the English data, the most robust results concerning stress level are found for Duration and Spectral Emphasis. f0-level is also significantly correlated but not quite to the same degree. The acoustic effect of phonological secondary stress was significantly different from primary stress only for Duration. In the statistical tests, speaker sex turned out as significant in most cases. Detailed examination showed, however, that the difference was mainly in the degree to which a given parameter was used, not how it was used to signal lexical stress contrasts. 

  • 22.
    Gustafson, Joakim
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Potential benefits of human-like dialogue behaviour in the call routing domain (2008). In: Perception in Multimodal Dialogue Systems. Berlin/Heidelberg, Germany: Springer, 2008, pp. 240-251. Chapter in book (Refereed)
    Abstract [en]

    This paper presents a Wizard-of-Oz (Woz) experiment in the call routing domain that took place during the development of a call routing system for the TeliaSonera residential customer care in Sweden. A corpus of 42,000 calls was used as a basis for identifying problematic dialogues and the strategies used by operators to overcome the problems. A new Woz recording was made, implementing some of these strategies. The collected data is described and discussed with a view to explore the possible benefits of more human-like dialogue behaviour in call routing applications.

  • 23.
    Hammarsten, Jonna
    et al.
    Harris, Roxanne
    Henriksson, Nilla
    Pano, Isabelle
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Włodarczak, Marcin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Temporal aspects of breathing and turn-taking in Swedish multiparty conversations (2015). In: Proceedings from Fonetik 2015 / [ed] Malin Svensson Lundmark, Gilbert Ambrazaitis, Joost van de Weijer. Lund: Centre for Languages and Literature, 2015, pp. 47-50. Conference paper (Other academic)
    Abstract [en]

    Interlocutors use various signals to make conversations flow smoothly. Recent research has shown that respiration is one of the signals used to indicate the intention to start speaking. In this study, we investigate whether inhalation duration and speech onset delay within one’s own turn differ from when a new turn is initiated. Respiratory activity was recorded in two three-party conversations using Respiratory Inductance Plethysmography. Inhalations were categorised depending on whether they coincided with within-speaker silences or with between-speaker silences. Results showed that within-turn inhalation durations were shorter than inhalations preceding new turns. Similarly, speech onset delays were shorter within turns than before new turns. Both these results suggest that speakers ‘speed up’ preparation for speech inside turns, probably to indicate that they intend to continue.

  • 24.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Detection thresholds for gaps, overlaps and no-gap-no-overlaps (2011). In: Journal of the Acoustical Society of America, ISSN 0001-4966, Vol. 130, no. 1, pp. 508-513. Article in journal (Refereed)
    Abstract [en]

    Detection thresholds for gaps and overlaps, that is, acoustic and perceived silences and stretches of overlapping speech in speaker changes, were determined. Subliminal gaps and overlaps were categorized as no-gap-no-overlaps. The established gap and overlap detection thresholds both corresponded to the duration of a long vowel, or about 120 ms. These detection thresholds are valuable for mapping the perceptual speaker change categories gaps, overlaps, and no-gap-no-overlaps into the acoustic domain. Furthermore, the detection thresholds allow generation and understanding of gaps, overlaps, and no-gap-no-overlaps in human-like spoken dialogue systems.
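    The reported ~120 ms thresholds suggest a direct mapping from the signed duration of a speaker-change interval to the three perceptual categories. A minimal sketch of that mapping (the function name and the convention that overlaps are negative offsets are illustrative assumptions, not the paper's notation):

    ```python
    def classify_speaker_change(offset_ms, threshold_ms=120.0):
        """Categorize a speaker-change interval by its signed duration:
        positive offsets are silences, negative offsets overlapping speech.
        Intervals below the detection threshold count as no-gap-no-overlap."""
        if offset_ms >= threshold_ms:
            return "gap"
        if offset_ms <= -threshold_ms:
            return "overlap"
        return "no-gap-no-overlap"
    ```

    Under this reading, a 50 ms silence at a speaker change would be subliminal and hence a no-gap-no-overlap.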

  • 25.
    Heldner, Mattias
    Umeå University, Department of Philosophy and Linguistics.
    Is an F0-rise a necessary or a sufficient cue to perceived focus in Swedish? (1998). In: Nordic prosody: proceedings of the VIIth conference, Joensuu 1996 / [ed] Stefan Werner. Frankfurt am Main: Peter Lang Publishing Group, 1998, pp. 109-125. Conference paper (Refereed)
  • 26.
    Heldner, Mattias
    Umeå University, Department of Philosophy and Linguistics.
    On the non-linear lengthening of focally accented Swedish words (2001). In: Nordic Prosody: proceedings of the VIIIth Conference, Trondheim 2000 / [ed] Wim A. van Dommelen, Thorstein Fretheim. Frankfurt am Main: Peter Lang Publishing Group, 2001, pp. 103-112. Conference paper (Refereed)
  • 27.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    On the reliability of overall intensity and spectral emphasis as acoustic correlates of focal accents in Swedish (2003). In: Journal of Phonetics, ISSN 0095-4470, E-ISSN 1095-8576, Vol. 31, no. 1, pp. 39-62. Article in journal (Refereed)
    Abstract [en]

    This study shows that increases in overall intensity and spectral emphasis are reliable acoustic correlates of focal accents in Swedish. They are both reliable in the sense that there are statistically significant differences between focally accented words and nonfocal ones for a variety of words, in any position of the phrase and for all speakers in the analyzed materials, and in the sense of their being useful for automatic detection of focal accents. Moreover, spectral emphasis turns out to be the more reliable correlate, as the influence on it of position in the phrase, word accent and vowel height was less pronounced and as it proved a better predictor of focal accents in general and for a majority of the speakers. Finally, the study has resulted in data for overall intensity and spectral emphasis that might prove important in modeling for speech synthesis.
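    Spectral emphasis is, roughly, the proportion of energy in the higher part of the spectrum, which increases with vocal effort. A crude single-cutoff sketch of the idea (the 1 kHz cutoff and the dB formulation are illustrative assumptions; the paper uses more careful emphasis measures):

    ```python
    import numpy as np

    def spectral_emphasis_db(signal, sample_rate, cutoff_hz=1000.0):
        """Energy above cutoff_hz relative to total energy, in dB.
        A simplified proxy for spectral emphasis."""
        power = np.abs(np.fft.rfft(signal)) ** 2   # power spectrum
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
        high = power[freqs >= cutoff_hz].sum()     # energy in the high band
        total = power.sum()
        return 10.0 * np.log10((high + 1e-12) / (total + 1e-12))
    ```

    A vowel produced with more high-frequency energy (as under a focal accent) would then score higher than the same vowel produced laxly.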

  • 28.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Proceedings from FONETIK 2014: Stockholm, June 9-11, 2014 (2014). Conference proceedings (editor) (Other academic)
  • 29.
    Heldner, Mattias
    Umeå University, Department of Philosophy and Linguistics.
    Spectral emphasis as a perceptual cue to prominence (2001). In: TMH-QPSR 42. Stockholm: KTH, 2001, pp. 51-57. Chapter in book (Other academic)
  • 30.
    Heldner, Mattias
    Umeå University, Department of Philosophy and Linguistics.
    Spectral emphasis as an additional source of information in accent detection (2001). In: Prosody 2001: ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding. Red Bank, NJ, USA: ISCA, 2001, pp. 57-60. Chapter in book (Refereed)
  • 31.
    Heldner, Mattias
    Umeå University, Department of Philosophy and Linguistics.
    To what extent is perceived focus determined by F0-cues?1997In: Eurospeech 97, Proceedings / [ed] G. Kokkinakis, N. Fakotakis, E. Dermatas, Rhodes, Greece: ESCA , 1997, 875-877 p.Conference paper (Refereed)
  • 32.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Pauses, gaps and overlaps in conversations2010In: Journal of Phonetics, ISSN 0095-4470, E-ISSN 1095-8576, Vol. 38, no 4, 555-568 p.Article in journal (Refereed)
    Abstract [en]

    This paper explores durational aspects of pauses, gaps and overlaps in three different conversational corpora, with a view to challenging claims about precision timing in turn-taking. Distributions of pause, gap and overlap durations in conversations are presented, and methodological issues regarding the statistical treatment of such distributions are discussed. The results are related to published minimal response times for spoken utterances and thresholds for detection of acoustic silences in speech. It is shown that turn-taking is generally less precise than is often claimed by researchers in the field of conversation analysis or interactional linguistics. These results are discussed in the light of their implications for models of timing in turn-taking and for interaction control models in speech technology. In particular, it is argued that the proportion of speaker changes that could potentially be triggered by information immediately preceding the speaker change is large enough for reactive interaction control models to be viable in speech technology.

  • 33.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    What turns speech into conversation?: A project description2007In: Quarterly progress and status report: proceedings from Fonetik 2007, May 30-June 1, 2007, Stockholm: Department of Speech, Music and Hearing, KTH , 2007, 45-48 p.Conference paper (Other academic)
  • 34.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Carlson, Rolf
    KTH Speech, Music and Hearing.
    Interruption impossible2006In: Nordic Prosody: Proceedings of the IXth Conference, Lund 2004, Frankfurt am Main: Peter Lang Publishing Group, 2006, 97-105 p.Conference paper (Refereed)
  • 35.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Hirschberg, Julia
    Columbia University Computer Science.
    Pitch similarity in the vicinity of backchannels2010In: Proceedings Interspeech 2010, Makuhari, Japan: ISCA , 2010, 3054-3057 p.Conference paper (Refereed)
  • 36.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Hjalmarsson, Anna
    KTH Speech, Music and Hearing.
    Laskowski, Kornel
    KTH Speech, Music and Hearing.
    Very short utterances and timing in turn-taking2011In: Proceedings Interspeech 2011, Florence, Italy: ISCA , 2011, 2837-2840 p.Conference paper (Refereed)
  • 37.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Laskowski, Kornel
    KTH Speech, Music and Hearing.
    Pelcé, Antoine
    KTH Speech, Music and Hearing.
    Prosodic features in the vicinity of silences and overlaps2009In: Nordic Prosody: Proceedings of the Xth Conference, Helsinki 2008, Frankfurt am Main: Peter Lang , 2009, 95-105 p.Conference paper (Refereed)
  • 38.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Megyesi, Beáta
    KTH Speech, Music and Hearing.
    Exploring the prosody-syntax interface in conversations2003In: Proceedings ICPhS 2003, Barcelona, Spain: ICPhS , 2003, 2501-2504 p.Conference paper (Refereed)
  • 39.
    Heldner, Mattias
    et al.
    KTH Speech, Music and Hearing.
    Strangert, Eva
    Umeå University.
    Temporal effects of focus in Swedish2001In: Journal of Phonetics, ISSN 0095-4470, E-ISSN 1095-8576, Vol. 29, no 3, 329-361 p.Article in journal (Refereed)
    Abstract [en]

    The four experiments reported concern the amount and domain of lengthening associated with focal accents in Swedish. Word, syllable and segment durations were measured in read sentences with focus in different positions. As expected, words with focal accents were longer than nonfocal words in general, but the amount of lengthening varied greatly, primarily due to speaker differences but also to position in the phrase and the word accent distinction. Most of the lengthening occurred within the stressed syllable. An analysis of the internal structure of stressed syllables showed that the phonologically long segments-whether vowels or consonants-were lengthened most, while the phonologically short vowels were hardly affected at all. Through this nonlinear lengthening, the contrast between long and short vowels in stressed syllables was sharpened in focus. Thus, the domain of focal accent lengthening includes at least the stressed syllable. Also, an unstressed syllable immediately to the right of the stressed one was lengthened in focus, while initial unstressed syllables, as well as unstressed syllables to the right of the first unstressed one, were not lengthened. Thus, we assume the domain of focal accent lengthening in Swedish to be restricted to the stressed syllable and the immediately following unstressed one.

  • 40.
    Heldner, Mattias
    et al.
    Umeå University, Department of Philosophy and Linguistics.
    Strangert, Eva
    Umeå University, Department of Philosophy and Linguistics.
    Deschamps, Thierry
    Umeå University, Department of Philosophy and Linguistics.
    A focus detector using overall intensity and high frequency emphasis1999In: Proceedings of the XIVth International Congress of Phonetic Sciences: San Francisco, 1-7 August 1999 / [ed] John J. Ohala, Berkeley, Calif: Linguistics department, Univ. of California , 1999, 1491-1493 p.Conference paper (Refereed)
  • 41.
    Heldner, Mattias
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Włodarczak, Marcin
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Pitch Slope and End Point as Turn-Taking Cues in Swedish2015In: Proceedings of the 18th International Congress of Phonetic Sciences / [ed] Maria Wolters, Judy Livingstone, Bernie Beattie, Rachel Smith, Mike MacMahon, Jane Stuart-Smith, Jim Scobbie, Glasgow: University of Glasgow , 2015Conference paper (Refereed)
    Abstract [en]

    This paper examines the relevance of parameters related to slope and end point of pitch segments for indicating turn-taking intentions in Swedish. Perceptually motivated stylization in Prosogram was used to characterize the last pitch segment in talkspurts involved in floor-keeping and turn-yielding events. The results suggest a limited contribution of pitch pattern direction and position of its end point in the speaker’s pitch range to signaling turn-taking intentions in Swedish.

  • 42.
    Horne, Merle
    et al.
    Lund University, Department of Linguistics and Phonetics.
    Strangert, Eva
    Umeå University, Department of Philosophy and Linguistics.
    Heldner, Mattias
    Umeå University, Department of Philosophy and Linguistics.
    Prosodic boundary strength in Swedish: Final lengthening and silent interval duration1995In: Proceedings ICPhS 95, Stockholm, Sweden: Stockholm University , 1995, 170-173 p.Conference paper (Refereed)
  • 43.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    A single-port non-parametric model of turn-taking in multi-party conversation2011In: Proceedings ICASSP 2011, Prague, Czech Republic: ICASSP , 2011, 5600-5603 p.Conference paper (Refereed)
  • 44.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems2008In: Proceedings of ICASSP 2008, Las Vegas, Nevada, USA: ICASSP , 2008, 5041-5044 p.Conference paper (Refereed)
  • 45.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Incremental learning and forgetting in stochastic turn-taking models2011In: Proceedings Interspeech 2011, Florence, Italy: ISCA , 2011, 2069-2072 p.Conference paper (Refereed)
  • 46.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Machine learning of prosodic sequences using the fundamental frequency variation spectrum2008In: Proceedings of the Speech Prosody 2008 Conference, Campinas, Brazil: Editora RG/CNPq , 2008, 151-154 p.Conference paper (Refereed)
  • 47.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    The fundamental frequency variation spectrum2008In: Proceedings FONETIK 2008, Gothenburg: Göteborg University , 2008, 29-32 p.Conference paper (Other academic)
  • 48.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    A general-purpose 32 ms prosodic vector for hidden Markov modeling2009In: Proceedings Interspeech 2009, Brighton, UK: ISCA , 2009, 724-727 p.Conference paper (Refereed)
  • 49.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    KTH Speech, Music and Hearing.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    Exploring the prosody of floor mechanisms in English using the fundamental frequency variation spectrum2009In: Proceedings of EUSIPCO 2009, Glasgow, Scotland: ISCA , 2009, 2539-2543 p.Conference paper (Refereed)
    Abstract [en]

    A basic requirement for participation in conversation is the ability to jointly manage interaction, and to recognize the attempts of interlocutors to do the same. Examples of management activity include efforts to acquire, re-acquire, hold, release, and acknowledge floor ownership, and they are often implemented using dedicated dialog act types. In this work, we explore the prosody of one class of such dialog acts, known as floor mechanisms, using a methodology based on a recently proposed representation of fundamental frequency variation. Models over the representation illustrate significant differences between floor mechanisms and other dialog act types, and lead to automatic detection accuracies in equal-prior test data of up to 75%, as well as a description of floor mechanism prosody. We note that this work is also the first attempt to compute and model FFV spectra for multiparty rather than two-party conversation, as well as the first attempt to infer dialogue structure from non-anechoic-chamber recordings.

  • 50.
    Laskowski, Kornel
    et al.
    KTH Speech, Music and Hearing.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Edlund, Jens
    KTH Speech, Music and Hearing.
    On the dynamics of overlap in multi-party conversation2012In: INTERSPEECH 2012: vol.1, Portland, USA: Curran Associates, Inc , 2012, 846-849 p.Conference paper (Refereed)
    Abstract [en]

    Overlap, although short in duration, occurs frequently in multi-party conversation. We show that its duration is approximately log-normal, and inversely proportional to the number of simultaneously speaking parties. Using a simple model, we demonstrate that simultaneous talk tends to end simultaneously less frequently than it begins simultaneously, leading to an arrow of time in chronograms constructed from speech activity alone. The asymmetry is significant and discriminative. It appears to be due to dialog acts which do not carry propositional content, and those which are not brought to completion.
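The log-normal claim in this abstract means that the logarithms of overlap durations should themselves be normally distributed, so their median is recovered as exp of the mean log-duration. The sketch below checks this on synthetic, hypothetical durations (the parameters are assumptions, not values from the paper's corpus).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical overlap durations (seconds): if they are log-normal,
# their logarithms are normally distributed.
durations = rng.lognormal(mean=-1.5, sigma=0.8, size=5000)

log_d = np.log(durations)
mu, sigma = log_d.mean(), log_d.std()

# The median of a log-normal is exp(mu); it should track the
# sample median closely when the log-normal assumption holds.
print(round(np.exp(mu), 3), round(np.median(durations), 3))
```

The same moment check applied to real overlap durations is one quick way to probe how well the log-normal description fits a given corpus.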
