Publications (10 of 52)
Bruggeman, A., Włodarczak, M. & Wagner, P. (2025). A comparison of discrete and continuous prominence perception methods in German. Speech Communication, 168, Article ID 103165.
2025 (English). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 168, article id 103165. Article in journal (Refereed) Published
Abstract [en]

In this paper we report on three methods to investigate syllable-based prominence identification on a set of German read sentences: prosodic expert annotations of pitch accentuation, as well as a Rapid Prosody Transcription (RPT) style task and a finger-tapping task performed by naive listeners. In the present study, audio recordings of the speech materials used to elicit prominence judgments were supplemented by signals collected with miniature accelerometers placed on the throat skin below the glottis, allowing for a more reliable investigation of the contribution of voice quality. Various other signal-based parameters are correlated with prominence judgments to confirm if findings from previous work on word-based prominence judgments also hold for judgments at the syllable level.

Results replicate past findings for German and other languages: the presence and type of pitch accentuation are reliable predictors of prominence judgments by naive listeners, on both the RPT and tapping task. Various individual acoustic parameters such as f0, duration and intensity were once again found to systematically covary with greater perceived prominence. A direct comparison of tapping and RPT results moreover indicated that beyond pitch accent related factors, listeners may employ different strategies to judge prominence as a function of task. In the RPT, they rely more on their knowledge of whether a given syllable carries lexical stress, whereas in the tapping task, they attend relatively strongly to acoustic duration. It is also shown that voice quality varies along with prominence ratings, but less strongly than other features such as duration.

Keywords
German, Pitch accentuation, Prominence perception, RPT, Tapping task, Voice quality
National Category
Comparative Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-241530 (URN); 10.1016/j.specom.2024.103165 (DOI); 001411452400001 (); 2-s2.0-85214339158 (Scopus ID)
Available from: 2025-04-09 Created: 2025-04-09 Last updated: 2025-04-09 Bibliographically approved
Włodarczak, M., Ludusan, B., Sundberg, J. & Heldner, M. (2025). Classification of voice quality using neck-surface acceleration: Comparison with glottal flow and radiated sound. Journal of Voice, 39(1), 10-24
2025 (English). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 39, no 1, p. 10-24. Article in journal (Refereed) Published
Abstract [en]

Objectives: The aim of the present study is to investigate the usefulness of features extracted from miniature accelerometers attached to the speaker's tracheal wall below the glottis for classification of phonation type. The performance of the accelerometer features is evaluated relative to features obtained from inverse filtered and radiated sound. While the former is a good proxy for the voice source, obtaining robust voice source features from the latter is considered difficult since it also contains information about the vocal tract filter. By contrast, the accelerometer signal is largely unaffected by the vocal tract and although it is shaped by subglottal resonances and the transfer properties of the neck tissue, these properties remain constant within a speaker. For this reason, we expect it to provide a better approximation of the voice source than the raw audio. We also investigate which aspects of the voice source are derivable from the accelerometer and microphone signals.

Methods: Five trained singers (two females and three males) were recorded producing the syllable [pæ:] in three voice qualities (neutral, breathy and pressed) and at three pitch levels as determined by the participants’ personal preference. Features extracted from the three signals were used for classification of phonation type using a random forest classifier. In addition, accelerometer and microphone features with highest correlation with the voice source features were identified.

Results: The three signals showed comparable classification error rates, with considerable differences across speakers both with respect to the overall performance and the importance of individual features. The speaker-specific differences notwithstanding, variation of phonation type had consistent effects on the voice source, accelerometer and audio signals. With regard to the voice source, AQ, NAQ, L1L2 and CQ all showed a monotonic variation along the breathy – neutral – pressed continuum. Several features were also found to vary systematically in the accelerometer and audio signals: HRF, L1L2 and CPPS (both the accelerometer and the audio), as well as the sound level (for the audio). The random forest analysis revealed that all of these features were also among the most important for the classification of voice quality.

Conclusion: Both the accelerometer and the audio signals were found to discriminate between phonation types with an accuracy approaching that of the voice source. Thus, the accelerometer signal, which is largely uncontaminated by vocal tract resonances, offered no advantage over the signal collected with a normal microphone.

Keywords
accelerometer, audio, phonation type classification, voice source
National Category
Natural Language Processing
Identifiers
urn:nbn:se:su:diva-212725 (URN); 10.1016/j.jvoice.2022.06.034 (DOI); 001414592600001 (); 36028369 (PubMedID); 2-s2.0-85136510333 (Scopus ID)
Available from: 2023-01-11 Created: 2023-01-11 Last updated: 2025-02-20 Bibliographically approved
Wikse Barrow, C., Strömbergsson, S., Włodarczak, M. & Heldner, M. (2024). Individual variation in the realisation and contrast of Swedish children’s word-initial voiceless fricatives. Journal of Phonetics, 106, Article ID 101351.
2024 (English). In: Journal of Phonetics, ISSN 0095-4470, E-ISSN 1095-8576, Vol. 106, article id 101351. Article in journal (Refereed) Published
Abstract [en]

In this study, we explore individual variation and contrast in Swedish children’s voiceless fricatives. Thirty-one children between three and eight years of age participated in a picture-prompted word repetition task, wherein they repeated fricative-initial words in a variety of vowel contexts. The fricatives were transcribed and acoustically analysed, using spectral moments 1–4, spectral peak and spectral balance measures. Random forests were used to estimate the relative importance of each spectral feature in the classification of correct fricative productions, as well as to measure robustness of the late-emerging contrast between sibilants [s] and [ɕ] in individual children. Transcription analysis revealed that substitutions involving a more anterior place of articulation were common. Acoustic analysis showed individual differences in variability and contrast in the children’s fricative systems across and within age groups. Cue weighting of spectral characteristics in classification was similar in all age groups for correct productions, while the magnitude of the acoustic contrast between sibilants increased with age. This paper provides a description of individual variation in Swedish children’s acquisition of fricatives which can inform future large-scale speech-acquisition research.

 

Keywords
Speech acquisition, Fricatives, Acoustic analysis, Speech-language development, Phonological development, Swedish
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-231894 (URN); 10.1016/j.wocn.2024.101351 (DOI); 001288755000001 (); 2-s2.0-85200234851 (Scopus ID)
Available from: 2024-07-03 Created: 2024-07-03 Last updated: 2025-01-15 Bibliographically approved
Bruggeman, A., Schade, L., Włodarczak, M. & Wagner, P. (2022). Beware of the individual: Evaluating prominence perception in spontaneous speech. In: Proceedings of Speech Prosody 2022. Paper presented at Speech Prosody 2022, Lisbon, Portugal, May 23-26, 2022.
2022 (English). In: Proceedings of Speech Prosody 2022, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Much of the existing research on prominence perception has focused on read speech in American English and German. The present paper presents two experiments that build on and extend insights from these studies in two ways. Firstly, we elicit prominence judgments on spontaneous speech. Secondly, we investigate gradient rather than binary prominence judgments by introducing a finger tapping task. We additionally provide a within-participant comparison of gradient prominence results with binary prominence judgments to evaluate their correspondence. Our results show that participants exhibit different success rates in tapping the prominence pattern of spontaneous data, but generally tapping results correlate well with binary prominence judgments within individuals. Random forest analysis of the acoustic parameters involved shows that pitch accentuation and duration play important roles in both binary judgments and prominence tapping patterns. We can also confirm earlier findings from read speech that differences exist between participants in the relative importance rankings of various signal and systematic properties.

National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-206632 (URN); 10.21437/SpeechProsody.2022-55 (DOI)
Conference
Speech Prosody 2022, Lisbon, Portugal, May 23-26, 2022
Projects
Prosodic functions of voice quality dynamics
Funder
Swedish Research Council, 2019-02932
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2022-06-21 Bibliographically approved
Włodarczak, M. & Heldner, M. (2022). Contribution of voice quality to prediction of turn-taking events. In: S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022. Paper presented at Speech Prosody 2022, Lisbon, Portugal (pp. 485-489).
2022 (English). In: Proceedings of Speech Prosody 2022 / [ed] S. Frota, M. Cruz, & M. Vigário, 2022, p. 485-489. Conference paper, Published paper (Refereed)
Abstract [en]

This paper evaluates the contribution of acoustic voice quality measures to prediction of upcoming floor change and retention. In order to minimize the influence of vocal tract resonances, the measures were calculated from miniature accelerometers attached to the tracheal wall. Overall, speaker changes accompanied by silence were characterized by lower periodicity and steeper spectral slope than turn-holds and speaker changes involving overlapping speech. When used on their own, voice quality features contributed to prediction of turn-taking category; this was particularly true of smoothed cepstral peak prominence (CPPS). At the same time, their importance was limited when used in combination with fundamental frequency and intensity, especially compared to the joint effect of these two predictors.

Keywords
spontaneous conversation, turn-taking, voice quality, accelerometer
National Category
General Language Studies and Linguistics
Research subject
Phonetics
Identifiers
urn:nbn:se:su:diva-205271 (URN); 10.21437/SpeechProsody.2022-99 (DOI)
Conference
Speech Prosody 2022, Lisbon, Portugal
Projects
Prosodic functions of voice quality dynamics
Funder
Swedish Research Council, 2019-02932
Available from: 2022-05-31 Created: 2022-05-31 Last updated: 2022-06-15 Bibliographically approved
Wikse Barrow, C., Włodarczak, M., Thörn, L. & Heldner, M. (2022). Static and dynamic spectral characteristics of Swedish voiceless fricatives. Journal of the Acoustical Society of America, 152(5), 2588-2600
2022 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 152, no 5, p. 2588-2600. Article in journal (Refereed) Published
Abstract [en]

Descriptions of the acoustic characteristics of Swedish voiceless fricatives are scarce and are limited to static measures derived from the speech of a small number of speakers. The current study provides an updated acoustic description of the static (spectral, temporal, and intensity) characteristics of word-initial voiceless fricatives in Central Standard Swedish. In addition, temporal variation of spectral centre of gravity is modelled using a generalized additive mixed model. Results show that fricatives were differentiated in terms of spectral properties, duration, and intensity level, such that sibilant fricatives were generally longer and more intense than non-sibilant fricatives. Spectral centre of gravity differentiated between all places of articulation apart from labio-dental /f/. Gender differences were found for centre of gravity in /s/ but overall, sex/gender differences were small. Dynamic analyses revealed differences in curvature as well as overall level of spectral centre of gravity across the duration of the fricative, associated with place of articulation and mediated by vowel context, fricative duration, and speaker specific patterns. The results from the present study are valuable for future cross-linguistic research, and as reference for investigations concerning children's acquisition of Swedish voiceless fricatives.

Keywords
fricatives, Swedish, acoustic analysis
National Category
General Language Studies and Linguistics
Research subject
Phonetics
Identifiers
urn:nbn:se:su:diva-213790 (URN); 10.1121/10.0014947 (DOI); 36456287 (PubMedID); 2-s2.0-85143184693 (Scopus ID)
Note

For erratum, see: Wikse Barrow, C., Włodarczak, M., Thörn, L., and Heldner, M. Erratum: Static and dynamic spectral characteristics of Swedish voiceless fricatives. J. Acoust. Soc. Am. 153, 1933 (2023). https://doi.org/10.1121/10.0017651

Available from: 2023-01-17 Created: 2023-01-17 Last updated: 2024-12-06 Bibliographically approved
Ward, N., Kirkland, A., Włodarczak, M. & Székely, É. (2022). Two pragmatic functions of breathy voice in American English conversation. In: Proceedings of Speech Prosody 2022. Paper presented at Speech Prosody 2022, Lisbon, Portugal, May 23-26, 2022.
2022 (English). In: Proceedings of Speech Prosody 2022, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Although the paralinguistic and phonological significance of breathy voice is well known, its pragmatic roles have been little studied. We report a systematic exploration of the pragmatic functions of breathy voice in American English, using a small corpus of casual conversations, using the Cepstral Peak Prominence Smoothed measure as an indicator of breathy voice, and using a common workflow to find prosodic constructions and identify their meanings. We found two prosodic constructions involving breathy voice. The first involves a short region of breathy voice in the midst of a region of low pitch, functioning to mark self-directed speech. The second involves breathy voice over several seconds, combined with a moment of wider pitch range leading to a high pitch over about a second, functioning to mark an attempt to establish common ground. These interpretations were confirmed by a perception experiment.

National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-206630 (URN); 10.21437/SpeechProsody.2022-17 (DOI)
Conference
Speech Prosody 2022, Lisbon, Portugal, May 23-26, 2022
Projects
Prosodic functions of voice quality dynamics
Funder
Swedish Research Council, 2019-02932
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2022-06-21 Bibliographically approved
Ludusan, B., Wagner, P. & Włodarczak, M. (2021). Cue interaction in the perception of prosodic prominence: The role of voice quality. In: Proceedings of Interspeech 2021. Paper presented at Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021.
2021 (English). In: Proceedings of Interspeech 2021, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Voice quality is an important dimension in human communication, used to mark a variety of phenomena in speech, including prosodic prominence. Even though numerous studies have shown that speakers modify their voice quality parameters for marking prosodic prominence, the impact of these modifications on perceived prominence is less studied. Our investigation looks at the effect of a well-known measure of voice quality, cepstral peak prominence (CPP), on syllabic prominence ratings given by both naive and expert listeners. Employing read speech materials in German, we quantify the role of CPP alone and in combination with other acoustic cues marking prominence, namely intensity, duration and fundamental frequency. While CPP, by itself, had a significant effect on the perceived prominence for most of the listeners, when used in conjunction with the other cues, its impact was reduced. Moreover, when assessing the importance of each of these four cues for determining the perceived prominence score we found important individual variation, as well as differences between naive and expert listeners.

National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-206628 (URN); 10.21437/Interspeech.2021-1357 (DOI); 2-s2.0-85119201336 (Scopus ID)
Conference
Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021
Projects
Prosodic functions of voice quality dynamics
Funder
Swedish Research Council
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2022-06-21 Bibliographically approved
Gilmartin, E. & Włodarczak, M. (2021). Getting from A to B: Exploring floor state transitions in conversation. In: Proceedings of SemDial 2021. Paper presented at SemDial 2021, Potsdam, Germany, September 20-22, 2021.
2021 (English). In: Proceedings of SemDial 2021, 2021. Conference paper, Poster (with or without abstract) (Other academic)
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-206634 (URN)
Conference
SemDial 2021, Potsdam, Germany, September 20-22, 2021
Projects
Prosodic functions of voice quality dynamics
Funder
Swedish Research Council, 2019-02932
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2022-06-21 Bibliographically approved
Kirkland, A., Włodarczak, M., Gustafson, J. & Szekely, E. (2021). Perception of smiling voice in spontaneous speech synthesis. In: Proceedings of Speech Synthesis Workshop (SSW11). Paper presented at Speech Synthesis Workshop (SSW11), Budapest, Hungary, August 26-28, 2021.
2021 (English). In: Proceedings of Speech Synthesis Workshop (SSW11), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Smiling during speech production has been shown to result in perceptible acoustic differences compared to non-smiling speech. However, there is a scarcity of research on the perception of “smiling voice” in synthesized spontaneous speech. In this study, we used a sequence-to-sequence neural text-to-speech system built on conversational data to produce utterances with the characteristics of spontaneous speech. Segments of speech following laughter, and the same utterances not preceded by laughter, were compared in a perceptual experiment after removing laughter and/or breaths from the beginning of the utterance to determine whether participants perceive the utterances preceded by laughter as sounding as if they were produced while smiling. The results showed that participants identified the post-laughter speech as smiling at a rate significantly greater than chance. Furthermore, the effect of content (positive/neutral/negative) was investigated. These results show that laughter, a spontaneous, non-elicited phenomenon in our model’s training data, can be used to synthesize expressive speech with the perceptual characteristics of smiling.

National Category
Natural Language Processing
Identifiers
urn:nbn:se:su:diva-206627 (URN); 10.21437/SSW.2021-19 (DOI)
Conference
Speech Synthesis Workshop (SSW11), Budapest, Hungary, August 26-28, 2021
Projects
Prosodic functions of voice quality dynamics
Perception of speaker stance – using spontaneous speech synthesis to explore the contribution of prosody, context and speaker
Connected: context-aware speech synthesis for conversational AI
CAPTivating – Comparative Analysis of Public speaking with Text-to-speech
Funder
Swedish Research Council, 2019-02932
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2025-02-07 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3824-2980
