1 - 31 of 31
  • 1.
    Cavalcanti, Julio Cesar
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    da Silva, Ronaldo Rodrigues
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Barbosa, Plinio A.
    Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks. 2024. In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 7, article id 1287877. Article in journal (Refereed)
    Abstract [en]

    This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system built with the SpeechBrain toolkit. The dataset comprised recordings of 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance was evaluated in three conditions: comparisons between identical twins, among all speakers in the dataset (including twin pairs), and among all speakers excluding twin pairs. Speech samples ranging from 5 to 30 s were assessed using equal error rates (EER) and log-likelihood-ratio costs (Cllr). The results highlight the substantial challenge that identical twins pose to the ASR system, reducing overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those based on shorter samples. As sample size increased, the standard deviations of both intra- and inter-speaker similarity scores decreased, indicating less variability in estimating speaker similarity or dissimilarity from longer speech stretches than from shorter ones. The study also uncovered varying degrees of likeness among identical twins, with certain pairs posing a greater challenge to ASR systems. These outcomes align with prior research and are discussed in the context of the relevant literature.
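
The EER mentioned above is the operating point at which the false-acceptance and false-rejection rates are equal. The Python sketch below is only an illustration, not the study's SpeechBrain pipeline: it assumes plain lists of similarity scores where a higher score means "more likely the same speaker", and the function name `equal_error_rate` is hypothetical. It tries every observed score as a decision threshold and keeps the point where the two error rates are closest.

```python
def equal_error_rate(same_scores, diff_scores):
    """Approximate EER: the error rate at which false acceptances equal false rejections.

    Assumes higher scores mean "more likely the same speaker".
    """
    best_gap, eer = float("inf"), None
    # Try every observed score as a candidate decision threshold.
    for thr in sorted(set(same_scores) | set(diff_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < thr for s in same_scores) / len(same_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= thr for s in diff_scores) / len(diff_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Hypothetical scores, not data from the study.
print(equal_error_rate([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7]))
```

A plain threshold sweep like this is coarse for small trial sets; other implementations may interpolate between thresholds, so their values can differ slightly.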

  • 2.
    Cavalcanti, Julio Cesar
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics. Campinas State University, Brazil.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Barbosa, Plinio A.
    Multiparametric Analysis of Speaking Fundamental Frequency in Genetically Related Speakers Using Different Speech Materials: Some Forensic Implications. 2024. In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 38, no 1, p. 243.e11-243.e29. Article in journal (Refereed)
    Abstract [en]

    Objective: To assess the speaker-discriminatory potential of a set of fundamental frequency (f0) estimates in comparisons within identical twin pairs and in cross-pair comparisons (i.e., among all speakers). Participants: A total of 20 Brazilian Portuguese speakers of the same dialect, namely 10 male identical twin pairs aged between 19 and 35, were recruited. Method: The participants were recorded directly through professional microphones while taking part in a spontaneous dialogue over mobile phones. Acoustic measurements were performed on connected speech samples and on lengthened vowels at least 160 ms long produced during spontaneous speech. Results: f0 baseline, central tendency, and extreme values were found to be the most discriminatory in intra-twin-pair and cross-pair comparisons. These were also the estimates displaying the largest effect sizes. Overall, only three identical twins were found to be statistically different regarding their f0 patterns in connected speech, but not in the lengthened-vowel-based f0 metrics. Estimates of f0 variation and modulation were found to be the least discriminatory across speakers, which may signal that speaking style and dialect control the dynamic patterns of f0. Concerning system performance, the base value of f0 (f0 baseline) was found to be the most reliable metric, displaying the lowest equal error rate (EER). Conclusions: The outcomes suggest that, although identical twins were very closely related regarding their f0 patterns, some pairs could still be differentiated acoustically, though only in connected speech. Such findings reinforce the relevance of analyzing long-term f0 metrics for speaker comparison purposes, with particular consideration of the f0 baseline. Furthermore, f0 differences across subjects appeared to be more pronounced in connected speech than in lengthened vowels.

  • 3.
    Crochiquia, Alice
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Barbosa, Plinio A
    Madureira, Sandra
    Animated character profiles: The role of voice and lexical content. 2023. In: Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023 / [ed] Radek Skarnitzl; Jan Volín, Prague, 2023, p. 466-470. Conference paper (Refereed)
    Abstract [en]

    Our study examines the attribution of physical and psychosocial features to the characters of a dubbed animated film by comparing the results of a listening test experiment with the results of a reading test experiment. The aim of this study was to investigate the potential influence of lexical content on the perceptual evaluation of speaker characteristics. In the listening test, respondents were asked to listen to audio samples of dubbed dialogues in the film produced by voice actors and to rate the characters on fourteen bipolar continuous scales. The task in the reading test was the same, except for the stimuli, which were transcriptions of the audio files. Results from both tests are congruent: the character profiles based on the perception scores from each test are nearly identical for all characters. Differences are mainly due to the degree to which judgement scores are amplified by the characters' voice quality profiles.

  • 4.
    Cavalcanti, Julio Cesar
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Barbosa, Plinio A.
    University of Campinas.
    On the speaker discriminatory power asymmetry regarding acoustic-phonetic parameters and the impact of speaking style. 2023. In: Frontiers in Psychology, E-ISSN 1664-1078, Vol. 14. Article in journal (Refereed)
    Abstract [en]

    This study aimed to assess what we refer to as the speaker discriminatory power asymmetry and its forensic implications in comparisons performed across different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on the speaker-discriminatory performance of different acoustic-phonetic estimates. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area. The speech material consisted of spontaneous telephone conversations between familiar individuals and interviews conducted between each participant and the researcher. Nine acoustic-phonetic parameters were chosen for the comparisons, spanning temporal, melodic, and spectral acoustic-phonetic estimates. In addition, an analysis based on the combination of different parameters was conducted. Two speaker-discriminatory metrics were examined: the log-likelihood-ratio cost (Cllr) and the equal error rate (EER). A general speaker-discriminatory trend emerged when the parameters were assessed individually. Parameters in the temporal acoustic-phonetic class showed the weakest speaker-contrasting power, as evidenced by their relatively higher Cllr and EER values. Moreover, among the acoustic parameters assessed, spectral parameters, mainly the high formant frequencies F3 and F4, performed best in terms of speaker discrimination, showing the lowest EER and Cllr scores. The results appear to suggest a speaker discriminatory power asymmetry across parameters from different acoustic-phonetic classes, in which temporal parameters tended to present lower discriminatory power. The speaking style mismatch also seemed to considerably impact the speaker comparison task by undermining the overall discriminatory performance. A statistical model based on the combination of different acoustic-phonetic estimates performed best in this case. Finally, data sampling proved to be of crucial relevance for the reliability of discriminatory power assessment.
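
Cllr, used above alongside EER, penalises likelihood ratios (LRs) that point the wrong way and rewards strong, correct ones. A minimal Python sketch of the standard formula, assuming the comparison system already outputs LRs (not log-LRs) for same-speaker and different-speaker trials; the function name `cllr` is illustrative and not taken from the study:

```python
import math


def cllr(same_lrs, diff_lrs):
    """Log-likelihood-ratio cost from likelihood ratios (not log-LRs).

    Cllr = 0.5 * (mean(log2(1 + 1/LR_same)) + mean(log2(1 + LR_diff)))
    """
    # Same-speaker trials are penalised when their LR is small.
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_lrs) / len(same_lrs)
    # Different-speaker trials are penalised when their LR is large.
    diff_cost = sum(math.log2(1 + lr) for lr in diff_lrs) / len(diff_lrs)
    return 0.5 * (same_cost + diff_cost)


print(cllr([1.0, 1.0], [1.0, 1.0]))  # → 1.0
```

A system that always outputs LR = 1 (no information) scores exactly 1.0, so values below 1 indicate that the LRs carry useful evidence.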

  • 5.
    Crochiquia, Alice
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Barbosa, Plinio A
    Madureira, Sandra
    A perceptual and acoustic study of dubbed voices in an animated film. 2022. In: Proceedings of the 11th International Conference on Speech Prosody / [ed] Sónia Frota; Marisa Cruz; Marina Vigário, The International Speech Communication Association (ISCA), 2022, p. 565-569. Conference paper (Refereed)
    Abstract [en]

    Listeners rely on vocal cues to judge speakers’ age, size, personality, and other paralinguistic and extralinguistic features. These judgements are often based on vocal stereotypes, which may be universally or culturally determined. This study examines how physical, psychological, social, and vocal features are perceived by listeners and which acoustic features may influence their judgements. An experiment integrating a perceptual test and acoustic measurements was performed. The corpus consisted of speech utterances produced by five animated film characters, dubbed in Brazilian Portuguese. The stimuli were judged by 77 native Brazilian Portuguese speakers (46 women and 31 men) aged 20 to 50. The acoustic analysis was performed automatically. Acoustic measures included mean f0, f0 baseline, spectral emphasis, and H1-H2. Cronbach's alpha was chosen for the inter-rater agreement analysis. The results indicated close agreement among judges for all characters. Overall scores obtained for all characters were above .90. In interpreting the results, the possible influence of sound symbolism codes on listeners’ judgments and the factors shaping vocal stereotypes were considered. The discussion of the acoustic and perceptual results considers whether voice actors adapt their voices to fit the characters or are instead cast because of their natural voice characteristics.
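
Cronbach's alpha, chosen above for inter-rater agreement, compares the variance of each rater's scores with the variance of the summed scores: alpha = k / (k - 1) * (1 - sum of per-rater variances / variance of totals). A minimal sketch, assuming a ratings table with one row per stimulus and one column per judge (judges treated as the "items"); this is the textbook formula, not necessarily the exact routine used in the study, and `cronbach_alpha` is an illustrative name:

```python
from statistics import variance


def cronbach_alpha(ratings):
    """ratings[s][r] = score given to stimulus s by rater r (raters act as 'items')."""
    k = len(ratings[0])  # number of raters
    # Variance of each rater's scores across all stimuli.
    rater_vars = [variance([row[r] for row in ratings]) for r in range(k)]
    # Variance of the per-stimulus total scores.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


# Hypothetical ratings: two judges, four stimuli.
print(cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]]))
```

Raters who rank the stimuli identically push alpha toward 1; disagreement lowers it.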

  • 6.
    Cavalcanti, Julio Cesar
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Barbosa, Plinio A.
    University of Campinas - UNICAMP.
    Assessing the speaker discriminatory power asymmetry of different acoustic-phonetic parameters. 2022. In: ISAPh 2022, 4th International Symposium on Applied Phonetics, Lund, Sweden, 2022. Conference paper (Refereed)
    Abstract [en]

    This pilot study set out to assess the speaker discriminatory power asymmetry among parameters from different phonetic dimensions in spontaneous speech, i.e., spectral, melodic, and temporal. The speech material consisted of spontaneous telephone conversations between siblings. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area. Six acoustic-phonetic parameters were chosen for the comparison: f0 median, f0 baseline, speech rate, articulation rate, F3, and F4. Overall, acoustic parameters in the speech tempo category showed the worst speaker-discriminatory performance when assessed in isolation, as indicated by their relatively higher median and mean Cllr and EER values. Moreover, from the set of parameters assessed, the high formant frequencies F3 and F4 were the best-performing estimates in terms of discriminability, showing the lowest EER and Cllr values. The results suggested a speaker discriminatory power asymmetry across acoustic-phonetic parameters, in which speech tempo estimates presented lower discriminatory power than melodic and spectral parameters. The findings also suggest that data sampling is crucial for the reliability of Cllr and EER calculations.

  • 7.
    Cavalcanti, Julio Cesar
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Barbosa, Plinio Almeida
    University of Campinas.
    Measuring the impact of data size on the speaker discriminatory performance: a spontaneous speech-based study. 2022. In: 13th Nordic Prosody Conference: Applied and Multimodal Prosody Research / [ed] Oliver Niebuhr, Sonderborg, Denmark, 2022. Conference paper (Refereed)
    Abstract [en]

    This study aimed to analyze the impact of the amount of data on the discriminatory performance of acoustic-phonetic parameters, some of which are frequently assessed in forensic speaker comparisons. Parameters from three distinct phonetic domains were considered, namely, spectral, melodic, and temporal, which were assessed separately within the same phonetic domain and in combination.

    The speech material consisted of spontaneous telephone conversations between two subjects. During the recording sessions, the participants were placed in different rooms, not directly seeing, hearing, or interacting with each other. The speakers were encouraged to start a conversation using a mobile phone while being simultaneously recorded.

    All recordings were made at high resolution (44.1 kHz, 16-bit). Data segmentation and transcription were performed in the Praat software [1]. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area, aged 19 to 35 years (mean 26.4 years). Although the subjects (10 identical twin pairs) were recruited from a twin research project, cf. [2, 3, 4], the focus here was on comparisons among all speakers (i.e., 190 inter-speaker comparisons) rather than on individual twin pairs.
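
The figure of 190 inter-speaker comparisons follows from counting unordered pairs of 20 speakers: C(20, 2) = (20 × 19) / 2 = 190. A short Python check, using hypothetical speaker labels S01 to S20 (the study's own speaker IDs are not given):

```python
from itertools import combinations
from math import comb

# Hypothetical labels for the 20 speakers.
speakers = [f"S{i:02d}" for i in range(1, 21)]

# Each unordered pair of distinct speakers is compared exactly once.
pairs = list(combinations(speakers, 2))

print(len(pairs))   # → 190
print(comb(20, 2))  # → 190
```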

  • 8.
    Cavalcanti, Julio Cesar
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics. Campinas State University, Brazil.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Barbosa, Plinio A.
    Multi-parametric analysis of speech timing in inter-talker identical twin pairs and cross-pair comparisons: Some forensic implications. 2022. In: PLOS ONE, E-ISSN 1932-6203, Vol. 17, no 1, article id e0262800. Article in journal (Refereed)
    Abstract [en]

    The purpose of this study was to assess the speaker-discriminatory potential of a set of speech timing parameters while probing their suitability for forensic speaker comparison applications. The recordings comprised spontaneous dialogues between twin pairs over mobile phones, directly recorded with professional headset microphones. Speaker comparisons were performed between twin speakers engaged in a dialogue (i.e., intra-twin pairs) and among all subjects (i.e., cross-twin pairs). The participants were 20 Brazilian Portuguese speakers, ten male identical twin pairs from the same dialectal area. A set of 11 speech timing parameters was extracted and analyzed, including speech rate, articulation rate, syllable duration (V-V unit), vowel duration, and pause duration. Three system performance estimates were considered for assessing the suitability of the parameters for speaker comparison purposes, namely global Cllr, EER, and AUC values. These were interpreted alongside an analysis of effect sizes. Overall, speech rate and articulation rate were found to be the most reliable parameters, displaying the largest effect sizes for the factor “speaker” and the best system performance outcomes, namely the lowest Cllr and EER values and the highest AUC values. Conversely, smaller effect sizes were found for the other parameters, which is compatible with a lower explanatory potential of speaker identity on the duration of such units and possibly stronger linguistic control over their temporal variation. In addition, speech timing estimates based on larger temporal intervals tended to present larger effect sizes and better speaker-discriminatory performance. Finally, identical twin pairs were found to be remarkably similar in their speech temporal patterns at the macro and micro levels while engaging in a dialogue, resulting in poor system discriminatory performance. Possible underlying factors for such a striking convergence in identical twins’ speech timing patterns are presented and discussed.
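
Of the three performance estimates above, AUC has a convenient rank interpretation: it is the probability that a randomly chosen same-speaker score exceeds a randomly chosen different-speaker score. A minimal Python sketch of that Mann-Whitney formulation, assuming higher scores mean greater similarity; the function name `auc` is illustrative and this is not the paper's own code:

```python
def auc(same_scores, diff_scores):
    """Probability that a random same-speaker score beats a random different-speaker score.

    Ties count as half a win (Mann-Whitney formulation of the ROC AUC).
    """
    wins = 0.0
    # Compare every same-speaker score with every different-speaker score.
    for s in same_scores:
        for d in diff_scores:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same_scores) * len(diff_scores))


# Hypothetical scores, not data from the study.
print(auc([3, 4, 5], [1, 2]))  # → 1.0
```

An AUC of 0.5 means the scores separate speakers no better than chance; 1.0 means perfect separation.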

  • 9.
    Cavalcanti, Julio Cesar
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics. Campinas State University, Brazil.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Barbosa, Plinio A.
    Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison. 2021. In: PLOS ONE, E-ISSN 1932-6203, Vol. 16, no 2, article id e0246645. Article in journal (Refereed)
    Abstract [en]

    The purpose of this study was to explore the speaker-discriminatory potential of mean vowel formant frequencies in comparisons of identical twin pairs and of non-genetically related speakers. The influence of lexical stress and of the vowels’ acoustic distances on the discriminatory patterns of formant frequencies was also assessed. The first four formants (F1-F4) were extracted and analyzed from spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs, directly recorded through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. For the comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 estimates were automatically extracted from the midpoint of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant under a psychoacoustic criterion. The results revealed consistent patterns when comparing low-frequency and high-frequency formants in twin pairs and in non-genetically related speakers, with high-frequency formants displaying greater speaker-discriminatory power than low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. For non-genetically related speakers, F3 and F4 displayed similarly high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by front vowels. Moreover, stressed vowels displayed higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found to be even more explanatory of the observed differences. Although identical twins displayed greater phonetic similarity, they were not found to be phonetically identical.
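
The abstract does not state which Hz-to-Bark conversion was used. As an assumption, the sketch below uses Traunmüller's (1990) approximation, z = 26.81 f / (1960 + f) - 0.53, without its end-of-scale corrections, which affect only very low and very high frequencies. The function names are illustrative.

```python
def hz_to_bark(f_hz):
    """Convert a frequency in Hz to Bark (Traunmüller 1990, uncorrected form; an assumption)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_difference(f_a_hz, f_b_hz):
    """Absolute Bark distance between two formant values, e.g. the F3 of two twins."""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz))


print(round(hz_to_bark(1000), 2))  # → 8.53
# Hypothetical F3 values (Hz) for the two members of a twin pair.
print(bark_difference(2500, 2600))
```

Whether a given Bark difference counts as perceptually relevant depends on the criterion adopted, which the abstract does not specify.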

  • 10.
    Crochiquia, Alice
    et al.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Fontes, Mario A. S.
    Madureira, Sandra
    A phonetic study of Zootopia characters’ voices in Brazilian Portuguese dubbing: the role of stereotypes. 2020. In: DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada, ISSN 0102-4450, E-ISSN 1678-460X, Vol. 36, no 3. Article in journal (Refereed)
    Abstract [en]

    This work presents an experimental investigation of expressive speech that integrates perceptual and acoustic analysis procedures. Its focus is voice quality and vocal dynamics. Speech samples from the four main, personality-distinct characters in the animated feature film “Zootopia”, dubbed by Brazilian voice actors, were analysed. Given the expressive function of voice quality, we posed the following question: which voice quality and vocal dynamics settings did the voice actors use in the Brazilian dubbing of “Zootopia” to compose the characters' vocal profiles? Perceptual evaluation of the 54 speech stimuli was performed using the Vocal Profile Analysis protocol (Laver & Mackenzie Beck, 2007). Acoustic measures were automatically extracted using the Expression Evaluator script (Barbosa, 2008) for PRAAT. The profiles for each of the four characters were composed based on the psychological traits described in the film script. The results of the acoustic analysis and of the perceptual analysis of voice quality and vocal dynamics settings were correlated using Multiple Factor Analysis (MFA) in the R environment, based on 40 variables (quantitative and qualitative); the speech stimuli were found to be distributed in 6 clusters according to the variables analysed. The quantitative variables with the highest correlation percentages were Standard Deviation of f0 Derivative, Standard Deviation of Spectral Tilt, and f0 Median. The qualitative variables with the highest correlation percentages were Lowered Larynx, Lip Rounding, Breathy Voice, and Minimised Pitch Range. The research presents evidence in favor of the symbolic use of phonic matter and contributes to the understanding of how vocal stereotypes are established.

  • 11.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Šimko, Juraj
    University of Helsinki, Finland.
    Suni, Antti
    University of Helsinki, Finland.
    Vainio, Martti
    University of Helsinki, Finland.
    Nodari, Rosalba
    Scuola Normale Superiore di Pisa, Italy.
    Lexical stress perception as a function of acoustic properties and the native language of the listener. 2020. In: Speech prosody, ISSN 2333-2042, p. 449-453. Article in journal (Refereed)
    Abstract [en]

    The study is part of a series investigating production and perception of lexical stress in a number of languages including Brazilian Portuguese, English, Estonian, French, Italian and Swedish. The production database contains data representing male and female speakers in the above languages in three speaking styles: spontaneous speech, phrase reading, and wordlist reading. Keywords from these recordings, representing male and female speakers and all speaking styles, are used. The participants’ task is to judge the relative syllable prominences of the keywords presented one by one. In a previous study, subjects were native Swedish speakers. In the present study, subjects are native speakers of Italian. In the analyses, perception results are correlated with acoustic variables shown to be important in the production studies. From the previous perception study, we know that acoustic syllable prominence affects perceived syllable prominence. But there is also a possibility that listeners’ perception may be biased by expectations based on the listeners’ native language. The main result is that there are great similarities between the Swedish and Italian listeners in the way acoustic prominence affects perceived prominence, but we are also able to demonstrate a case of native language bias.

  • 12.
    Arantes, Pablo
    et al.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Quantifying Fundamental Frequency Modulation as a Function of Language, Speaking Style and Speaker. 2019. In: Interspeech 2019, Graz: The International Speech Communication Association (ISCA), 2019. Conference paper (Refereed)
    Abstract [en]

    In this study, we outline a methodology to quantify the degree of similarity between pairs of f0 distributions based on the Anderson-Darling measure that underlies its namesake goodness-of-fit test. The procedure emphasizes differences due to finer-grained f0 modulations rather than differences in measures of central tendency, such as the mean and median. To assess the procedure’s usefulness for speaker comparison, we applied it to a multilingual corpus in which participants contributed speech in three speaking styles. The similarity measure was calculated separately as a function of speaking style and speaker. Between-speaker variability (different speakers, same style) in distribution similarity varied significantly between styles: in five languages (English, French, Italian, Portuguese and Swedish), spontaneous interviews show greater variability than read sentences and word lists, whereas in Estonian and German, read sentences yield more variability. Within-speaker variability (same speaker, different styles) is lower than between-speaker variability in the style that exhibits the greatest variability. The results point to the potential use of the proposed methodology as a way to identify possible idiosyncratic traits in f0 distributions. They also further demonstrate the effect of speaking style on intonation patterns.
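
The similarity measure above builds on the Anderson-Darling statistic, but the abstract does not spell out the transformation applied to it. The sketch below computes only the raw two-sample statistic A²kN of Scholz and Stephens (1987), with k = 2, for two lists of f0 values, assuming continuous data with no tied values. It illustrates the underlying measure, not the authors' full procedure; larger values mean the two distributions differ more. The function name is illustrative.

```python
def ad_two_sample(x, y):
    """Raw two-sample Anderson-Darling statistic A2_kN (Scholz & Stephens 1987, k = 2).

    A2_kN = (1/N) * sum_i (1/n_i) * sum_{j=1}^{N-1} (N*M_ij - j*n_i)**2 / (j*(N - j)),
    where M_ij counts values of sample i that are <= the j-th smallest pooled value.
    Assumes continuous data with no ties in the pooled sample.
    """
    samples = [sorted(x), sorted(y)]
    pooled = sorted(list(x) + list(y))
    n_total = len(pooled)
    total = 0.0
    for s in samples:
        n_i = len(s)
        inner = 0.0
        m = 0  # running M_ij for this sample
        for j in range(1, n_total):  # j = 1 .. N-1
            # Advance the count of this sample's values <= the j-th pooled value.
            while m < n_i and s[m] <= pooled[j - 1]:
                m += 1
            inner += (n_total * m - j * n_i) ** 2 / (j * (n_total - j))
        total += inner / n_i
    return total / n_total


# Hypothetical f0 samples in Hz, not data from the study.
f0_a = [98.0, 102.5, 105.1, 110.3, 115.8, 120.2]
f0_b = [99.1, 103.7, 107.9, 111.4, 116.6, 121.9]
print(ad_two_sample(f0_a, f0_b))
```

Because the statistic is computed from the whole empirical distribution functions, two f0 samples with similar means can still receive a large value if their shapes differ.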

  • 13.
    Arantes, Pablo
    et al.
    São Carlos Federal University, Brazil.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Lima, Verônica
    Stockholm University, Faculty of Humanities, Department of Linguistics. São Carlos Federal University, Brazil.
    Minimum Sample Length for the Estimation of Long-term Speaking Rate. 2018. In: Speech prosody, ISSN 2333-2042, p. 661-665. Article in journal (Refereed)
    Abstract [en]

    In this study, we expand on previous experiments designed with the aim of determining the minimum length that an audio sample should have in order for the speaking rate derived from it to be representative of the sample as a whole. We compare two different approaches to establishing that the time series of the cumulative speaking rate calculated over the audio sample has reached stability. We also compare the effect on stabilization time of four other factors that may affect the way speaking rate is calculated. The results show that all factors tested have significant effects, although of limited practical concern. Overall, average stability time is 12.1 seconds, with the bulk of the distribution lying between 7.9 and 16.2 s.
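
The two stability criteria compared above are not detailed in the abstract. As a hypothetical illustration only, the sketch below tracks a cumulative speaking rate (syllables per second, assuming one timestamp per syllable) and reports the earliest time after which the rate stays within a relative tolerance band around its final value. Both the band criterion and the parameter `tol` are assumptions, not the paper's definitions.

```python
def stabilization_time(syllable_times, tol=0.05):
    """Earliest time after which the cumulative rate stays within +/- tol of its final value.

    syllable_times: increasing times (s) of successive syllables, measured from sample onset.
    """
    # Cumulative rate after the k-th syllable: k syllables / elapsed time.
    rates = [(k + 1) / t for k, t in enumerate(syllable_times)]
    final = rates[-1]
    # Walk backwards to the last point outside the tolerance band.
    stable_from = 0
    for k in range(len(rates) - 1, -1, -1):
        if abs(rates[k] - final) > tol * final:
            stable_from = k + 1
            break
    return syllable_times[stable_from]


# Hypothetical timing: 10 fast syllables (0.1 s apart), then 190 slower ones (0.2 s apart).
times = [0.1 * k for k in range(1, 11)] + [1.0 + 0.2 * j for j in range(1, 191)]
print(stabilization_time(times, tol=0.08))
```

A tighter tolerance yields a later stabilisation time, so any reported value depends on the criterion chosen.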

  • 14.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Suni, Antti
    University of Helsinki, Finland.
    Vainio, Martti
    University of Helsinki, Finland.
    Šimko, Juraj
    University of Helsinki, Finland.
    The acoustic basis of lexical stress perception. 2018. In: Speech prosody, ISSN 2333-2042, p. 70-74. Article in journal (Refereed)
    Abstract [en]

    The present study is the first in a series of studies exploring the perception of lexical stress in a number of languages. As stimuli, keywords extracted from recordings in Brazilian Portuguese, English, Estonian, French, Italian and Swedish are used. The data represent male and female speakers in all languages and three different speaking styles: spontaneous speech, phrase reading, and wordlist reading. The ultimate goal of the perception studies is to explore the perception of prominence as a function of the acoustic properties of the stimuli and the native language of the listeners. In this paper, we compare the prominence scores assigned to syllables by 44 native Swedish speakers with two automatic methods: acoustic feature analysis using acoustic properties of syllables, and continuous wavelet transform. Both methods use duration, F0, and spectral emphasis characteristics of the speech signal, or a subset thereof. Our results demonstrate a strong language dependency in the way acoustic characteristics correlate with prominence. Correlations between prominence scores and phonological word stress patterns show that the human raters resolve this language dependency better than the automatic signal-based methods. Also, the signal feature combinations for which the raters’ judgements correlate best with the automatically assigned prominence scores depend on stimulus language to a larger extent than on the signal-based method used.

  • 15.
    Arantes, Pablo
    et al.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Gutzeit, Suska
    Effect of Language, Speaking Style and Speaker on Long-Term F0 Estimation. 2017. In: Interspeech, ISSN 2308-457X, p. 3897-3901. Article in journal (Refereed)
    Abstract [en]

    In this study, we compared three long-term fundamental frequency estimates (mean, median, and base value) with respect to how fast they approach a stable value, as a function of language, speaking style, and speaker. The base value concept was developed in search of an f0 value that should be invariant under prosodic variation. It has since also been tested in forensic phonetics as a possible speaker-specific f0 value. The data used in this study (speech by male and female speakers in seven languages and three speaking styles: spontaneous speech, phrase reading, and word list reading) had been recorded for a previous project. Average stabilisation times for the mean, median, and base value are 9.76, 9.67, and 8.01 s, respectively. Base values stabilise significantly faster. Languages differ in both the average and the variability of the stabilisation times. Values range from 7.14 to 11.41 s (mean), 7.5 to 11.33 s (median), and 6.74 to 9.34 s (base value). Spontaneous speech yields the most variable stabilisation times for all three estimators in Italian and Swedish, for the median in French and Portuguese, and for the base value in German. Speakers within each language do not differ significantly in stabilisation time variability for any of the three estimators.

     

  • 16.
    Skarnitzl, Radek
    et al.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    The Acoustics of Word Stress in Czech as a Function of Speaking Style. 2017. In: Proceedings of Interspeech 2017 / [ed] Francisco Lacerda, David House, Mattias Heldner, Joakim Gustafson, Sofia Strömbergsson, Marcin Włodarczak, The International Speech Communication Association (ISCA), 2017, p. 3221-3225. Conference paper (Refereed)
    Abstract [en]

    The study is part of a series of studies which examine the acoustic correlates of lexical stress in several typologically different languages, in three speech styles: spontaneous speech, phrase reading, and wordlist reading. This study focuses on Czech, a language with stress fixed on the first syllable of a prosodic word, with no contrastive function at the level of individual words. The acoustic parameters examined here are F0-level, F0-variation, Duration, Sound Pressure Level, and Spectral Emphasis. Values for over 6,000 vowels were analyzed.

    Unlike the other languages examined so far, lexical stress in Czech is not manifested by clear prominence markings on the first, stressed syllable: the stressed syllable is not higher in F0, not realized with greater F0 variation and not longer, nor does it have a higher SPL or higher Spectral Emphasis. There are slight but non-significant tendencies pointing to a delayed rise, that is, to higher values of some of the acoustic parameters on the second, post-stressed syllable. Since lexical stress does not serve a contrastive function in Czech, the absence of acoustic marking on the stressed syllable is not surprising.
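    Spectral Emphasis appears as a parameter in several of the stress studies listed here. The sketch below assumes one common definition: the difference in dB between the overall level of a vowel and the level of the same signal low-pass filtered at 1.5 times its mean F0. Check the individual papers for the exact filter and analysis window. The sketch replaces a real low-pass filter with an idealized cut in the DFT power spectrum and uses a naive DFT, so it is only suitable for short illustrative frames; the test signals are made up.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(x: Sequence[float]) -> List[float]:
    """Power of DFT bins 0..N/2 computed with a naive O(N^2) DFT."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_emphasis_db(x: Sequence[float], fs: float, f0_mean: float) -> float:
    """Overall level minus the level below 1.5 * mean F0, in dB (assumed definition)."""
    p = power_spectrum(x)
    n = len(x)
    cutoff_hz = 1.5 * f0_mean
    low = sum(pk for k, pk in enumerate(p) if k * fs / n <= cutoff_hz)
    return 10 * math.log10(sum(p) / low)


fs = 8000
t = range(400)  # 50 ms frame
fundamental = [math.sin(2 * math.pi * 100 * i / fs) for i in t]
# Add an equally strong 1000 Hz component (the 10th harmonic of 100 Hz).
with_harmonic = [s + math.sin(2 * math.pi * 1000 * i / fs) for s, i in zip(fundamental, t)]
print(round(spectral_emphasis_db(fundamental, fs, 100.0), 2))    # 0.0
print(round(spectral_emphasis_db(with_harmonic, fs, 100.0), 2))  # 3.01
```

    A pure fundamental has no energy above the cutoff and so no emphasis; adding high-frequency energy of equal power doubles the total and yields about 3 dB, which is the sense in which greater vocal effort or stress can raise this measure.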

  • 17.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Bertinetto, Pier Marco
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Nodari, Rosalba
    Lenoci, Giovanna
    The Acoustics of Lexical Stress in Italian as a Function of Stress Level and Speaking Style2016In: Proceedings of Interspeech 2016 / [ed] Nelson Morgan, The International Speech Communication Association (ISCA), 2016, p. 1059-1063Conference paper (Refereed)
    Abstract [en]

    The study is part of a series of studies describing the acoustics of lexical stress in a way that should be applicable to any language. The present database of recordings includes Brazilian Portuguese, English, Estonian, German, French, Italian and Swedish. The acoustic parameters examined are F0-level, F0-variation, Duration, and Spectral Emphasis. Values for these parameters, computed for all vowels (a little over 24,000 vowels for Italian), are the data upon which the analyses are based. All parameters are examined with respect to their correlation with Stress (primary, secondary, unstressed), speaking Style (wordlist reading, phrase reading, spontaneous speech) and Sex of the speaker (female, male). For Italian, Duration was found to be the dominant factor by a wide margin, in agreement with previous studies. Spectral Emphasis was the second most important factor. Spectral Emphasis has not been studied previously for Italian, but intensity, a related parameter, has been shown to correlate with stress. F0-level was also significantly correlated, but not to the same degree. Speaker Sex turned out to be significant in many comparisons. The differences were, however, mainly a function of the degree to which a given parameter was used, not of how it was used to signal lexical stress contrasts.

  • 18.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Syllable prominence: An experimental study2015In: Lingue e linguaggio, ISSN 1720-9331, Vol. XIV, no 1, p. 43-60Article in journal (Refereed)
    Abstract [en]

    There are many studies of word stress (or lexical stress) in different languages. One problem when comparing the acoustics of word stress across languages is that the studies are often designed in such a way that the results are not directly comparable. One goal of the project described here is to develop a framework for analysing the acoustics of word stress that can be applied in the same way to any language. A second goal is to examine the perception of syllable prominence as a cue to lexical stress perception. The acoustic properties are obviously a factor to be considered, but we have reasons to believe, based on results from a previous experiment (Eriksson et al. 2002), that the native language of the listener may also influence perceived prominence and thus lexical stress perception. The languages included in the study so far are Brazilian Portuguese, English, Estonian, French, German, Italian and Swedish. At present, only the Swedish material has been analysed using the complete set of recordings. In this paper I will therefore give a full presentation only of the Swedish results. Results based on subsets of the data from the other languages (usually 10 speakers) will be referred to as “preliminary results”. Some of these results have been presented in more detail in conference proceedings (see references).

  • 19.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    The acoustics of word stress in English as a function of stress level and speaking style2015In: 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015): Speech Beyond Speech Towards a Better Understanding of the Most Important Biosignal, 2015, p. 41-45Conference paper (Refereed)
    Abstract [en]

    This study of lexical stress in English is part of a series of studies whose goal is to describe the acoustics of lexical stress for a number of typologically different languages. When fully developed, the methodology should be applicable to any language. The database of recordings so far includes Brazilian Portuguese, English (U.K.), Estonian, German, French, Italian and Swedish. The acoustic parameters examined are f0-level, f0-variation, Duration, and Spectral Emphasis. Values for these parameters, computed for all vowels, are the data upon which the analyses are based. All parameters are tested with respect to their correlation with stress level (primary, secondary, unstressed) and speaking style (wordlist reading, phrase reading, spontaneous speech). For the English data, the most robust results concerning stress level are found for Duration and Spectral Emphasis. f0-level is also significantly correlated, but not quite to the same degree. The acoustic effect of phonological secondary stress was significantly different from that of primary stress only for Duration. In the statistical tests, speaker sex turned out to be significant in most cases. Detailed examination showed, however, that the difference was mainly in the degree to which a given parameter was used, not in how it was used to signal lexical stress contrasts.

  • 20. Arantes, Pablo
    et al.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Temporal stability of long-term measures of fundamental frequency2014In: Social and Linguistic Speech Prosody: Proceedings of the 7th international conference on Speech Prosody / [ed] Nick Campbell; Daniel Hirst; Dafydd Gibbon, Speech Prosody Special Interest Group (SProSIG) , 2014, p. 1149-1152Conference paper (Refereed)
    Abstract [en]

    We investigated the long-term mean, median and base value of F0 to estimate how long it takes for their variability to stabilize. Change point analysis was used to locate stabilization points. In one experiment, stabilization points were calculated in recordings of the same text spoken in 26 languages. Average stabilization points are 5 seconds for the base value and 10 seconds for the mean and median. Variance after the stabilization point was reduced by a factor of around 40 for the mean and median and by a factor of more than 100 for the base value. In another experiment, four speakers each read two different texts. Stabilization points for the same speaker across the two texts do not exactly coincide, as would ideally be expected. The average change point dislocation is 2.5 seconds for the base value, 3.4 for the median and 9.5 for the mean. After stabilization, individual differences in the three measures obtained from the two texts are 2% on average. The present results show that stabilization points in long-term measures of F0 occur earlier than suggested in the previous literature.
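    The abstract does not say which change point method was used. As a generic illustration only, the sketch below finds a single change point by least squares: it tries every split of a series into two segments and keeps the split that minimizes the total within-segment sum of squared deviations from each segment's mean. Applied to the frame-to-frame changes of a running F0 estimate, such a split marks roughly where the estimate stops moving much. The helper names, the `min_seg` parameter and the toy input are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence


def sse(seg: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


def single_change_point(series: Sequence[float], min_seg: int = 2) -> Optional[int]:
    """Index k minimizing sse(series[:k]) + sse(series[k:]) (one mean shift)."""
    best_k: Optional[int] = None
    best_cost = math.inf
    for k in range(min_seg, len(series) - min_seg + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def running_mean_changes(f0_hz: Sequence[float]) -> List[float]:
    """Absolute change of the cumulative mean from one frame to the next."""
    out: List[float] = []
    total, prev = 0.0, None
    for n, f in enumerate(f0_hz, start=1):
        total += f
        cur = total / n
        if prev is not None:
            out.append(abs(cur - prev))
        prev = cur
    return out


print(single_change_point([5.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]))  # 4
# Hypothetical use on an F0 track (values in Hz, one per frame):
changes = running_mean_changes([180.0, 220.0, 150.0, 210.0, 190.0, 200.0, 195.0, 198.0])
print(single_change_point(changes))
```

    A single mean-shift split is the simplest case; detecting a drop in variance directly, or allowing several change points, would be natural refinements.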

  • 21.
    Lacerda, Francisco
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Eriksson, Anders
    Department of Linguistics, University of Gothenburg.
    Reportage om lögndetektorer [Report on lie detectors] (SVT, Vetenskapsmagasinet)2009Other (Other (popular science, discussion, etc.))
    Abstract [en]

    In the UK, the social insurance authorities have begun using a new lie detector. Vetenskapsmagasinet meets two Swedish researchers who claim that the technique is pure nonsense. The company that developed the new lie detector has responded by suing the researchers for defamation.

  • 22. Ibsén, Maria
    et al.
    Gustavsson, Lisa
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Forensisk lingvistik: Lite CSI i verkligheten [Forensic linguistics: A little CSI in real life]2008Other (Other (popular science, discussion, etc.))
  • 23.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Lacerda, Francisco
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Charlatanry in forensic speech science: A problem to be taken seriously2007In: International Journal of Speech, Language and the Law: (formerly Forensic Linguistics: ISSN 1350-1771), ISSN 1748-8885, Vol. 14, no 2, p. 169-193Article in journal (Refereed)
    Abstract [en]

    A lie detector that can reveal lies and deception in some automatic and perfectly reliable way is an old idea we have often met in science fiction books and comic strips. This is all very well. It is when machines claimed to be lie detectors appear in the context of criminal investigations or security applications that we need to be concerned. In the present paper we describe two types of ‘deception’ or ‘stress detectors’ (euphemisms for what are quite clearly ‘lie detectors’). Both types of detector are claimed to be based on voice analysis, but we found no scientific evidence to support the manufacturers’ claims. Indeed, our review of scientific studies shows that these machines perform at chance level when tested for reliability. Given such results and the absence of scientific support for the underlying principles, it is justified to view the use of these machines as charlatanry, and we argue that there are serious ethical and security reasons to demand that responsible authorities and institutions not get involved in such practices.

  • 24.
    Traunmüller, Hartmut
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Ménard, Lucie
    Perception of speaker age, sex and vowel quality investigated using stimuli produced with an articulatory model2003In: Proceedings of the XVth ICPhS, 2003, p. 1739-1742Conference paper (Other academic)
    Abstract [en]

    This paper deals with the perception of linguistic and paralinguistic qualities conveyed by synthetic vowels produced with an articulatory model in which transfer functions of the French vowels /i y e ø ɛ œ/, characteristic of five growth stages, were each combined with five different F0 values. Listeners had to judge the speaker's age and sex in addition to vowel quality. Four subgroups of listeners were distinguished, according to sex and frequency of contact with children. The results were subjected to regression analysis based on critical band rate (z) and logarithmic values of F0, F1 to F5 and calculated values of F2'. This showed (Z1 - 0.6 Z0) to correlate highly with vowel openness, and 0.8 (Z4 - Z3), in addition to Z2', to correlate highly with roundedness. F0 and the formants above F1 contributed equally to age perception. There were slight but significant differences between listener groups, and there was a tendency to perceive vowels as produced by a younger speaker when they were perceived as rounded, and by an older speaker when they were not. This can be understood as due to a choice listeners have in interpreting lower formants either as due to lip rounding or as due to a permanently longer vocal tract indicative of a higher age.
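    The regression terms above are expressed in critical-band rate (Bark). A minimal sketch, assuming Traunmüller's analytical approximation z = 26.81 f / (1960 + f) - 0.53 without its low- and high-frequency end corrections, shows how the two reported combinations could be computed from F0 and formant frequencies in Hz. The function names are hypothetical, and the example frequencies are made up for illustration.

```python
def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark (Traunmüller's approximation, no end corrections)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def openness_index(f0_hz: float, f1_hz: float) -> float:
    """Z1 - 0.6 * Z0, reported to correlate highly with vowel openness."""
    return hz_to_bark(f1_hz) - 0.6 * hz_to_bark(f0_hz)


def roundedness_term(f3_hz: float, f4_hz: float) -> float:
    """0.8 * (Z4 - Z3), one of the terms reported to correlate with roundedness."""
    return 0.8 * (hz_to_bark(f4_hz) - hz_to_bark(f3_hz))


# Made-up values: a close vowel (low F1) versus an open-mid vowel (higher F1).
print(round(hz_to_bark(1000.0), 2))  # 8.53
print(openness_index(120.0, 300.0) < openness_index(120.0, 550.0))  # True
```

    Subtracting a fraction of Z0 from Z1 partly normalizes F1 for the speaker's F0, which is one way to read why the combination tracks openness across speakers of different sizes.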

  • 25.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Grabe, Esther
    Traunmüller, Hartmut
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Perception of syllable prominence by listeners with and without competence in the tested language2002In: Proceedings of the Speech Prosody 2002 Conference, 2002, p. 275-278Conference paper (Other academic)
    Abstract [en]

    In an experiment reported previously, subjects rated perceived syllable prominence in a Swedish utterance produced by ten speakers at various levels of vocal effort. The analysis showed that about half of the variance could be accounted for by acoustic factors and slightly more than half by linguistic factors. Here, we report two additional experiments. In the first, we attempted to eliminate the linguistic factors by repeating the Swedish listening experiment with English listeners who had no knowledge of Swedish. In the second, we investigated the prominence pattern Swedish subjects expect by presenting the utterance only in written form. The results from these subjects and from the Swedish listeners were very similar, except for two of the syllables, where the prominence pattern did not coincide with the expectations of the readers. Swedish and English listeners perceived the prominence of the syllables to be almost identical in most cases, but where there was a conflict between expected and produced prominence, the Swedish listeners appeared to be influenced by their expectations. There was also a difference in the weights the Swedish and English listeners attached to different acoustic cues in the listening experiments.

  • 26.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Traunmüller, Hartmut
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Perception of vocal effort and distance from the speaker on the basis of vowel utterances.2002In: Percept Psychophys, ISSN 0031-5117, Vol. 64, no 1, p. 131-9Article in journal (Refereed)
    Abstract [en]

    The sound pressure level of vowels reflects several non-linguistic and linguistic factors: distance from the speaker, vocal effort, and vowel quality. Increased vocal effort also involves an emphasis of higher frequency components and increases in F0 and F1. This should allow listeners to distinguish it from decreased distance, which does not have these additional effects. It is shown that listeners succeed in doing so on the basis of single vowels if phonated, but not if whispered. The results agree with a theory according to which listeners demodulate speech signals and evaluate the properties of the carrier signal, which reflects most of the para- and extra-linguistic information, apart from those of its linguistic modulation. It is observed that listeners allow for between-vowel variation, but tend to substantially underestimate changes in both kinds of distance.

  • 27.
    Eriksson, Anders
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Thunberg, Gunilla C.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Traunmüller, Hartmut
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Syllable prominence: A matter of vocal effort, phonetic distinctness and top-down processing2001In: Proceedings of EuroSpeech-2001, 2001, p. 399-402Conference paper (Other academic)
    Abstract [en]

    In this experiment, subjects had to rate the "prominence" of each of the syllables of 20 versions of the same utterance produced by men, women and children at various levels of vocal effort. The ratings were correlated with measurements of the SPL of the fundamental, spectral emphasis, vowel duration, F0max and F0 rise from the previous syllable. Together with ratings of the perceived vocal effort at which the utterances had been produced, these measurements were used to obtain the possible contributions of vocal effort, prosodic distinctness, and vowel duration to the perceived prominence. Together, these accounted for half of the variance. This was compared with the possible contribution of the linguistic structure of the utterance, which accounted for slightly more of the variance. The predictions of a model based on this analysis came closer to the mean ratings than the ratings of the average subject did.

  • 28.
    Traunmüller, Hartmut
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Acoustic effects of variation in vocal effort by men, women, and children.2000In: J Acoust Soc Am, ISSN 0001-4966, Vol. 107, no 6, p. 3438-51Article in journal (Refereed)
    Abstract [en]

    The acoustic effects of the adjustment in vocal effort that is required when the distance between speaker and addressee is varied over a large range (0.3-187.5 m) were investigated in phonated and, at shorter distances, also in whispered speech. Several characteristics were studied in the same sentence produced by men, women, and 7-year-old boys and girls: duration of vowels and consonants, pausing and occurrence of creaky voice, mean and range of F0, certain formant frequencies (F1 in [a] and F3), SPL of voiced segments and [s], and spectral emphasis. In addition to levels and emphasis, vowel duration, F0, and F1 were substantially affected. “Vocal effort” was defined as the communicational distance estimated by a group of listeners for each utterance. Most of the observed effects correlated better with this measure than with the actual distance, since some additional factors affected the speakers’ choice. Differences between speaker groups emerged in segment durations, pausing behavior, and in the extent to which the SPL of [s] was affected. The whispered versions are compared with the phonated versions produced by the same speakers at the same distance. Several effects of whispering are found to be similar to those of increasing vocal effort.

  • 29.
    Engstrand, Olle
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Björsten, Sven
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Lindblom, Björn
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Bruce, Gösta
    Eriksson, Anders
    Hur udda är Viby-i? Experimentella och typologiska observationer [How odd is Viby-i? Experimental and typological observations]2000In: Folkmålsstudier, Vol. 39, p. 83-95Article in journal (Other academic)
  • 30.
    Traunmüller, Hartmut
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Eriksson, Anders
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    The perceptual evaluation of F0 excursions in speech as evidenced in liveliness estimations.1995In: J Acoust Soc Am, ISSN 0001-4966, Vol. 97, no 3, p. 1905-15Article in journal (Refereed)
    Abstract [en]

    In order to learn how listeners evaluate F0 excursions, a set of experiments was performed in which subjects had to estimate the liveliness of utterances. The stimuli were obtained by LPC analysis of one natural utterance that was modified by resynthesizing F0, the formant frequencies, and the time scale in order to simulate some of the natural extra- and paralinguistic variations that affect F0 and/or liveliness, namely the speaker's age, sex, articulation rate, and voice register. In each case, the extent of the F0 excursions was varied in seven steps. The results showed that, as long as the stimuli appeared to have been produced in the modal register (of men, women, and children), listeners judged F0 intervals to be equivalent if they were equal in semitones. When the voice register was shifted without adjustment in articulation, listeners appeared to judge the F0 excursions in relation to the spectral space available below F1. The liveliness ratings were found to be strongly dependent on articulation rate and to be affected by the perceived age of the speaker which, with the manipulated stimuli used here, turned out to be significantly affected by the sex of the listener.
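    The finding that listeners treat F0 excursions as equivalent when they are equal in semitones can be illustrated with the standard conversion from a frequency ratio to semitones, 12 log2(f2/f1). In the sketch below, the same 50 Hz rise is about 7 semitones from a 100 Hz starting point but only about 4 semitones from 200 Hz, while a 200 to 300 Hz rise matches the 100 to 150 Hz rise exactly. The frequencies are illustrative values, not data from the study.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Size of an F0 interval in semitones (12 per octave)."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


print(round(semitones(100.0, 150.0), 2))  # 7.02
print(round(semitones(200.0, 250.0), 2))  # 3.86
print(round(semitones(200.0, 300.0), 2))  # 7.02
```

    Because the scale is logarithmic, only frequency ratios matter, so an excursion judged equally lively for a male and a female voice would span roughly twice as many Hz in the higher voice.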

  • 31.
    Traunmüller, Hartmut
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics.
    Eriksson, Anders
    The frequency range of the voice fundamental in the speech of male and female adults1993Other (Other academic)
    Abstract [en]

    Published data on the frequency of the voice fundamental (F0) in speech show its range of variation, often expressed in terms of two standard deviations (SD) of the F0 distribution, to be approximately the same for men and women if expressed in semitones, but the observed SD varies substantially between different investigations. Most of the differences can be attributed to the following factors: SD is increased in tone languages, and it varies with the type of discourse; the more ‘lively’ the type of discourse, the larger the SD. The dependence of SD on the type of discourse tends to be more pronounced in the speech of women than in that of men. Based on an analysis of various production data, it is shown that speakers normally achieve an increased SD by increasing the excursions of F0 from a ‘base-value’ that lies about 1.5 SD below their mean F0. This is relevant to applications in speech technology as well as to general theories of speech communication such as the ‘modulation theory’, in which the base-value of F0 is seen as a carrier frequency.
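    The relationship between mean, SD and base value can be sketched directly. The code below assumes that the mean and SD are computed on a semitone scale (the abstract notes that the SD is comparable across sexes in semitones) and that the base value lies k = 1.5 SD below the mean. The 100 Hz semitone reference is arbitrary and cancels out when converting back to Hz. Using the population SD rather than the sample SD, and the function names, are further assumptions.

```python
import math
import statistics
from typing import Sequence, Tuple

REF_HZ = 100.0  # arbitrary semitone reference; it cancels out of the base value


def f0_stats_semitones(f0_hz: Sequence[float]) -> Tuple[float, float]:
    """Mean and population SD of F0 on a semitone scale (re REF_HZ)."""
    st = [12.0 * math.log2(f / REF_HZ) for f in f0_hz]
    return statistics.mean(st), statistics.pstdev(st)


def base_value_hz(f0_hz: Sequence[float], k: float = 1.5) -> float:
    """Base value: k SDs below the mean F0 (semitone scale), returned in Hz."""
    mean_st, sd_st = f0_stats_semitones(f0_hz)
    return REF_HZ * 2.0 ** ((mean_st - k * sd_st) / 12.0)


f0 = [100.0, 200.0] * 10  # toy data: mean 6 st, SD 6 st re 100 Hz
print(round(base_value_hz(f0), 1))  # 84.1
```

    In this view a livelier speaking style raises the SD and moves the mean upward away from the base value, while the base value itself is expected to stay comparatively stable for a given speaker.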
