Classification of affect in speech using normalized time-frequency cepstra
2010 (English). In: Speech Prosody 2010, 2010, 100071-1-4 p. Conference paper (Refereed)
Subtle temporal and spectral differences between categorical realizations of para-linguistic phenomena (e.g., affective vocal expressions) are hard to capture and describe. In this paper we present a signal representation based on Time Varying Constant-Q Cepstral Coefficients (TVCQCC) derived for this purpose. A method that utilizes the special properties of the constant-Q transform for mean F0 estimation and normalization is described. The coefficients are invariant to segment length, and as a special case, a representation for prosody is considered. Speaker-independent classification results using ν-SVM are reported for the Berlin EMO-DB and for two closed sets of basic (anger, disgust, fear, happiness, sadness, neutral) and social/interpersonal (affection, pride, shame) emotions recorded by forty professional actors from two English dialect areas. The accuracy for the Berlin EMO-DB is 71.2%; for the first set, comprising the basic emotions, it is 44.6%; and for the second set, comprising both basic and social emotions, it is 31.7%. F0 normalization was found to boost performance, and a combined feature set shows the best performance.
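The abstract does not define TVCQCC, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation. It assumes that a log-magnitude constant-Q spectrogram is already available as an array of shape (frequency bins, frames), that F0 normalization can be approximated by shifting the log-frequency axis (on a constant-Q axis, a change in F0 is a translation by a fixed number of bins), and that segment-length invariance can be obtained by projecting onto cosine bases sampled over normalized time. All function names and parameter values are hypothetical.

```python
import numpy as np

def shift_bins_for_f0(mean_f0_hz, ref_f0_hz, bins_per_octave):
    """Number of constant-Q bins separating the mean F0 from a reference F0."""
    return int(round(bins_per_octave * np.log2(mean_f0_hz / ref_f0_hz)))

def normalize_f0(log_cq, shift):
    """Shift the log-frequency axis down by `shift` bins (assumed normalization).

    Bins that move in from outside the analyzed range are filled with the
    minimum value of the input.
    """
    n_bins = log_cq.shape[0]
    out = np.full_like(log_cq, log_cq.min())
    if abs(shift) >= n_bins:
        return out
    if shift >= 0:
        out[: n_bins - shift] = log_cq[shift:]
    else:
        out[-shift:] = log_cq[: n_bins + shift]
    return out

def cosine_basis(n_points, n_coeffs):
    """DCT-II style cosine basis sampled at n_points positions in (0, 1)."""
    k = np.arange(n_coeffs)[:, None]
    x = (np.arange(n_points)[None, :] + 0.5) / n_points
    return np.cos(np.pi * k * x)

def time_frequency_cepstra(log_cq, n_freq_coeffs=4, n_time_coeffs=3):
    """Project a log-magnitude constant-Q spectrogram onto a 2-D cosine basis.

    The time basis is sampled over normalized time and the result is divided
    by the number of frames, so the output shape and scale do not depend on
    the segment length.
    """
    n_bins, n_frames = log_cq.shape
    freq_basis = cosine_basis(n_bins, n_freq_coeffs)    # (Kf, n_bins)
    time_basis = cosine_basis(n_frames, n_time_coeffs)  # (Kt, n_frames)
    return freq_basis @ log_cq @ time_basis.T / (n_bins * n_frames)

# Usage with placeholder data standing in for a real constant-Q analysis.
rng = np.random.default_rng(0)
log_cq = rng.standard_normal((48, 120))               # 48 bins, 120 frames
shift = shift_bins_for_f0(220.0, 110.0, bins_per_octave=12)
features = time_frequency_cepstra(normalize_f0(log_cq, shift))
```

Under these assumptions, a fixed-size coefficient matrix is produced for a segment of any duration, which is what allows segments of different lengths to be passed to a classifier such as a ν-SVM.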
Place, publisher, year, edition, pages
2010. 100071-1-4 p.
Research subject: Psychology
Identifiers: URN: urn:nbn:se:su:diva-47381; ISBN: 978-0-557-51931-6; OAI: oai:DiVA.org:su-47381; DiVA: diva2:373844
Speech Prosody 2010
Funder: Swedish Research Council, 2006-1360
This work was partly funded by the Swedish Research Council under contract 2006-1360.