Perception of smiling voice in spontaneous speech synthesis
Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics. ORCID iD: 0000-0003-3824-2980
2021 (English)
In: Proceedings of Speech Synthesis Workshop (SSW11), 2021
Conference paper, Published paper (Refereed)
Abstract [en]

Smiling during speech production has been shown to produce perceptible acoustic differences compared with non-smiling speech. However, research on the perception of “smiling voice” in synthesized spontaneous speech is scarce. In this study, we used a sequence-to-sequence neural text-to-speech system trained on conversational data to produce utterances with the characteristics of spontaneous speech. In a perceptual experiment, segments of speech following laughter were compared with the same utterances not preceded by laughter, after laughter and/or breaths had been removed from the beginning of each utterance. The aim was to determine whether participants perceive the utterances preceded by laughter as sounding as if they were produced while smiling. The results showed that participants identified the post-laughter speech as smiling at a rate significantly greater than chance. The effect of content (positive/neutral/negative) was also investigated. Overall, these results show that laughter, a spontaneous, non-elicited phenomenon in our model’s training data, can be used to synthesize expressive speech with the perceptual characteristics of smiling.

Place, publisher, year, edition, pages
2021.
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:su:diva-206627
DOI: 10.21437/SSW.2021-19
OAI: oai:DiVA.org:su-206627
DiVA, id: diva2:1672997
Conference
Speech Synthesis Workshop (SSW11), Budapest, Hungary, August 26-28, 2021
Projects
Prosodic functions of voice quality dynamics
Perception of speaker stance – using spontaneous speech synthesis to explore the contribution of prosody, context and speaker
Connected: context-aware speech synthesis for conversational AI
CAPTivating – Comparative Analysis of Public speaking with Text-to-speech
Funder
Swedish Research Council, 2019-02932
Available from: 2022-06-20 Created: 2022-06-20 Last updated: 2025-02-07
Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Włodarczak, Marcin
