Eriksson, Anders, Professor. ORCID iD: orcid.org/0000-0002-6844-4834
Publications (10 of 32)
Cavalcanti, J. C., da Silva, R. R., Eriksson, A. & Barbosa, P. A. (2024). Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks. Frontiers in Artificial Intelligence, 7, Article ID 1287877.
2024 (English). In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 7, article id 1287877. Article in journal (Refereed), Published
Abstract [en]

This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system built with the SpeechBrain toolkit. The dataset comprised recordings of 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance was evaluated in three conditions: comparisons within identical twin pairs, comparisons among all speakers in the dataset (including twin pairs), and comparisons among all speakers excluding twin pairs. Speech samples ranging from 5 to 30 s were assessed using equal error rates (EER) and the log-likelihood-ratio cost (Cllr). The results highlight the substantial challenge that identical twins pose to the ASR system, lowering overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those using shorter samples. As sample length increased, the standard deviations of both intra- and inter-speaker similarity scores decreased, indicating less variability in estimating speaker similarity or dissimilarity in longer speech stretches than in shorter ones. The study also revealed varying degrees of likeness among identical twins, with certain pairs presenting a greater challenge for ASR systems. These outcomes align with prior research and are discussed in the context of the relevant literature.
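Cllr and EER recur throughout these abstracts. As an illustrative sketch only (not the authors' code and not part of the SpeechBrain API), the commonly used definitions can be computed as below. It assumes the system's comparison scores have already been converted into likelihood ratios for Cllr (the calibration step is not shown), and it approximates the EER with a simple threshold sweep; the function names and the toy numbers are hypothetical.

```python
import math


def cllr(same_speaker_lrs, diff_speaker_lrs):
    """Log-likelihood-ratio cost (Cllr), in bits.

    Cllr = 0.5 * [ mean over same-speaker LRs of log2(1 + 1/LR)
                 + mean over different-speaker LRs of log2(1 + LR) ]
    A system that always outputs LR = 1 scores exactly 1.0; lower is better.
    """
    ss = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    ds = sum(math.log2(1 + lr) for lr in diff_speaker_lrs) / len(diff_speaker_lrs)
    return 0.5 * (ss + ds)


def eer(same_scores, diff_scores):
    """Approximate equal error rate by sweeping thresholds over the observed scores.

    A comparison is accepted as 'same speaker' when score >= threshold.
    Returns the mean of the false-rejection and false-acceptance rates at the
    threshold where the two rates are closest.
    """
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(same_scores) | set(diff_scores)):
        frr = sum(s < t for s in same_scores) / len(same_scores)
        far = sum(d >= t for d in diff_scores) / len(diff_scores)
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer


# Toy example with made-up numbers (not data from the study)
print(cllr([8.0, 20.0, 3.0], [0.1, 0.5, 0.02]))
print(eer([1, 2, 3], [0, 1, 2]))
```

A real evaluation would interpolate the EER between thresholds (or use a ROC convex hull); the sweep above is kept deliberately simple.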

Keywords
speech analysis, phonetics, acoustic-phonetics, forensic phonetics, automatic speaker recognition
National Category
Natural Language Processing
Research subject
Phonetics; Computer Science
Identifiers
urn:nbn:se:su:diva-226393 (URN); 10.3389/frai.2024.1287877 (DOI); 001169098900001; 2-s2.0-85185478728 (Scopus ID)
Available from: 2024-02-08; Created: 2024-02-08; Last updated: 2025-02-07; Bibliographically approved
Cavalcanti, J. C., Eriksson, A. & Barbosa, P. A. (2024). Multiparametric Analysis of Speaking Fundamental Frequency in Genetically Related Speakers Using Different Speech Materials: Some Forensic Implications. Journal of Voice, 38(1), 243.e11-243.e29
2024 (English). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 38, no 1, p. 243.e11-243.e29. Article in journal (Refereed), Published
Abstract [en]

Objective: To assess the speaker-discriminatory potential of a set of fundamental frequency (f0) estimates in comparisons within identical twin pairs and in cross-pair comparisons (i.e., among all speakers).
Participants: Twenty Brazilian Portuguese speakers of the same dialect, namely 10 male identical twin pairs aged between 19 and 35, were recruited.
Method: The participants were recorded directly through professional microphones while taking part in a spontaneous dialogue over mobile phones. Acoustic measurements were performed on connected speech samples and on lengthened vowels (at least 160 ms long) produced during spontaneous speech.
Results: f0 baseline, central tendency, and extreme values were found to be the most discriminatory in both intra-twin-pair and cross-pair comparisons. These were also the estimates displaying the largest effect sizes. Overall, only three identical twins were found to differ statistically in their f0 patterns in connected speech, but not in lengthened-vowel-based f0 metrics. Estimates of f0 variation and modulation were found to be the least discriminatory across speakers, which may signal the control exerted by speaking style and dialect over dynamic f0 patterns. Concerning system performance, the base value of f0 (f0 baseline) was found to be the most reliable metric, displaying the lowest equal error rate (EER).
Conclusions: The outcomes suggest that, although identical twins were very closely related in their f0 patterns, some pairs could still be differentiated acoustically, though only in connected speech. These findings reinforce the relevance of analyzing long-term f0 metrics for speaker comparison purposes, with particular attention to f0 baseline. Furthermore, f0 differences across subjects appeared to be more pronounced in connected speech than in lengthened vowels.

Keywords
Speaking fundamental frequency, Acoustic phonetics, Forensic phonetics, Identical twins
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-197634 (URN); 10.1016/j.jvoice.2021.08.013 (DOI); 001164381200001; 34629229 (PubMedID); 2-s2.0-85116724047 (Scopus ID)
Available from: 2021-10-12; Created: 2021-10-12; Last updated: 2024-03-08; Bibliographically approved
Cavalcanti, J. C., Eriksson, A., Barbosa, P. A. & Madureira, S. (2024). Revisiting the speaker discriminatory power of vowel formant frequencies under a likelihood ratio-based paradigm: The case of mismatched speaking styles. PLOS ONE, 19(12), Article ID e0311363.
2024 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 19, no 12, article id e0311363. Article in journal (Refereed), Published
Abstract [en]

Differentiating subjects by comparing their recorded speech is a common endeavor in speaker characterization. With an acoustic-based approach, this task typically involves scrutinizing specific acoustic parameters and assessing their discriminatory capacity. This experimental study aimed to evaluate the speaker-discriminatory power of vowel formants (resonance peaks of the vocal tract) in two speaking styles: Dialogue and Interview. Several testing procedures were applied, specifically metrics compatible with the likelihood ratio paradigm. Only high-quality recordings were analyzed. The participants were 20 male Brazilian Portuguese (BP) speakers from the same dialectal area. Two speaker-discriminatory power estimates were examined through Multivariate Kernel Density analysis: the log-likelihood-ratio cost (Cllr) and the equal error rate (EER). As expected, discriminatory performance was stronger for style-matched than for style-mismatched analyses. In order of relevance, F3, F4, and F1 performed best in style-matched comparisons, as indicated by lower Cllr and EER values. F2 performed worst within style in both Dialogue and Interview. The discriminatory power of every individual formant (F1-F4) appeared to be affected in the mismatched condition, demonstrating that discriminatory power is sensitive to style-driven changes in speech production. The combination of the higher formants ‘F3 + F4’ outperformed the combination of the lower formants ‘F1 + F2’. However, in mismatched-style analyses, the magnitude of improvement in Cllr and EER scores increased as more formants were incorporated into the model. The best discriminatory performance was achieved when most formants were combined. Applying multivariate analysis not only reduced average Cllr and EER scores but also shifted the overall probability density distribution towards lower Cllr and EER values.
In general, front and central vowels were found to be more speaker-discriminatory than back vowels with respect to the ‘F1 + F2’ relation.

Keywords
Acoustic Phonetics, Formants, Vowels, Speech
National Category
General Language Studies and Linguistics
Research subject
Phonetics
Identifiers
urn:nbn:se:su:diva-237133 (URN); 10.1371/journal.pone.0311363 (DOI); 001375443900014; 39656685 (PubMedID); 2-s2.0-85212298703 (Scopus ID)
Available from: 2024-12-11; Created: 2024-12-11; Last updated: 2025-02-25
Crochiquia, A., Eriksson, A., Barbosa, P. A. & Madureira, S. (2023). Animated character profiles: The role of voice and lexical content. In: Radek Skarnitzl; Jan Volín (Ed.), Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023. Paper presented at the 20th International Congress of Phonetic Sciences, Prague, Czech Republic, 2023 (pp. 466-470). Prague
2023 (English). In: Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023 / [ed] Radek Skarnitzl; Jan Volín, Prague, 2023, p. 466-470. Conference paper, Published paper (Refereed)
Abstract [en]

Our study examines the attribution of physical and psychosocial features to the characters of a dubbed animated film by comparing the results of a listening test with those of a reading test. The aim was to investigate the potential influence of lexical content on the perceptual evaluation of speaker characteristics. In the listening test, respondents listened to audio samples of dubbed dialogues from the film, produced by voice actors, and rated the characters on fourteen bipolar continuous scales. The reading test used the same task, except that the stimuli were transcriptions of the audio files. The results from both tests are congruent: the character profiles based on the perception scores from each test are nearly identical for all characters. The differences are mainly due to the degree to which judgement scores are amplified by the characters' voice quality profiles.

Place, publisher, year, edition, pages
Prague, 2023
National Category
General Language Studies and Linguistics
Research subject
Linguistics
Identifiers
urn:nbn:se:su:diva-220545 (URN)
Conference
The 20th International Congress of Phonetic Sciences, Prague, Czech Republic, 2023
Available from: 2023-08-30; Created: 2023-08-30; Last updated: 2023-09-18; Bibliographically approved
Cavalcanti, J. C., Eriksson, A. & Barbosa, P. A. (2023). On the speaker discriminatory power asymmetry regarding acoustic-phonetic parameters and the impact of speaking style. Frontiers in Psychology, 14
2023 (English). In: Frontiers in Psychology, E-ISSN 1664-1078, Vol. 14. Article in journal (Refereed), Published
Abstract [en]

This study aimed to assess what we refer to as the speaker discriminatory power asymmetry, and its forensic implications, in comparisons performed across different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on speaker-discriminatory performance for different acoustic-phonetic estimates. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area. The speech material consisted of spontaneous telephone conversations between familiar individuals and interviews conducted between each participant and the researcher. Nine acoustic-phonetic parameters were chosen for the comparisons, spanning temporal, melodic, and spectral acoustic-phonetic estimates. In addition, an analysis based on the combination of different parameters was conducted. Two speaker-discriminatory metrics were examined: the log-likelihood-ratio cost (Cllr) and the equal error rate (EER). A general speaker-discriminatory trend emerged when the parameters were assessed individually. Parameters from the temporal acoustic-phonetic class showed the weakest speaker-contrasting power, as evidenced by their relatively higher Cllr and EER values. Of the acoustic parameters assessed, spectral parameters, mainly the high formant frequencies F3 and F4, performed best in terms of speaker discrimination, showing the lowest EER and Cllr scores. The results appear to suggest a speaker discriminatory power asymmetry among parameters from different acoustic-phonetic classes, in which temporal parameters tended to have lower discriminatory power. The speaking style mismatch also seemed to considerably impact the speaker comparison task by undermining overall discriminatory performance. In this case, a statistical model based on the combination of different acoustic-phonetic estimates performed best.
Finally, data sampling proved to be crucial for the reliability of discriminatory power assessment.

Keywords
Speech analysis, phonetics, acoustic phonetics, forensic phonetics, speaker comparison
National Category
General Language Studies and Linguistics
Research subject
Phonetics
Identifiers
urn:nbn:se:su:diva-216555 (URN); 10.3389/fpsyg.2023.1101187 (DOI); 000977779300001; 2-s2.0-85159925236 (Scopus ID)
Available from: 2023-04-19; Created: 2023-04-19; Last updated: 2024-10-15
Crochiquia, A., Eriksson, A., Barbosa, P. A. & Madureira, S. (2022). A perceptual and acoustic study of dubbed voices in an animated film. In: Sónia Frota; Marisa Cruz; Marina Vigário (Ed.), Proceedings of the 11th International Conference on Speech Prosody. Paper presented at Speech Prosody 2022, Lisbon, Portugal, May 23-26, 2022 (pp. 565-569). The International Speech Communication Association (ISCA)
2022 (English). In: Proceedings of the 11th International Conference on Speech Prosody / [ed] Sónia Frota; Marisa Cruz; Marina Vigário, The International Speech Communication Association (ISCA), 2022, p. 565-569. Conference paper, Published paper (Refereed)
Abstract [en]

Listeners rely on vocal cues in speech to judge speakers’ age, size, personality, and other paralinguistic and extralinguistic features. These judgements are often based on vocal stereotypes, which may be universally or culturally determined. This study examines how listeners perceive physical, psychological, social, and vocal features and which acoustic features may influence their judgements. An experiment combining a perceptual test and acoustic measurements was performed. The corpus consisted of speech utterances produced by five animated film characters dubbed in Brazilian Portuguese. The stimuli were judged by 77 native speakers of Brazilian Portuguese (46 women and 31 men) aged 20 to 50. The acoustic analysis was performed automatically; the measures included mean f0, f0 baseline, spectral emphasis, and H1-H2. Cronbach's alpha was chosen for the inter-rater agreement analysis. The results indicated close agreement among judges for all characters, with overall scores above .90 for every character. In interpreting the results, the influence that sound symbolism codes may have on listeners’ judgements and the factors shaping vocal stereotypes were considered. The discussion of the acoustic and perceptual results considers whether voice actors adapt their voices to fit the characters or are instead cast because of their natural voice characteristics.
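Cronbach's alpha, used above for inter-rater agreement, can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes (hypothetically) that ratings are arranged with one row per rated item and one column per judge, so that the judges play the role of the scale's "items". The abstract does not state how the data were arranged, and the toy numbers are made up.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a ratings table.

    ratings[i][j] is the rating judge j gave to rated item i
    (e.g., one character on one scale). Judges act as the 'items'
    whose internal consistency is measured:
        alpha = k/(k-1) * (1 - sum(judge variances) / variance(row totals))
    """
    k = len(ratings[0])  # number of judges
    judge_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(judge_vars) / total_var)


# Toy example: 3 rated items x 2 judges (made-up numbers)
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 3))  # 0.667
```

With many judges and strongly correlated ratings, alpha approaches 1, which is the sense in which values above .90 indicate close agreement.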

Place, publisher, year, edition, pages
The International Speech Communication Association (ISCA), 2022
Keywords
animation dubbing, vocal stereotypes, voice analysis, personality traits, voice quality
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-213620 (URN); 10.21437/SpeechProsody.2022-115 (DOI)
Conference
Speech Prosody 2022, Lisbon, Portugal, May 23-26, 2022
Available from: 2023-01-11; Created: 2023-01-11; Last updated: 2023-01-26; Bibliographically approved
Cavalcanti, J. C., Eriksson, A. & Barbosa, P. A. (2022). Assessing the speaker discriminatory power asymmetry of different acoustic-phonetic parameters. In: ISAPh 2022, 4th International Symposium on Applied Phonetics. Paper presented at the 4th International Symposium on Applied Phonetics, Lund, Sweden
2022 (English). In: ISAPh 2022, 4th International Symposium on Applied Phonetics, Lund, Sweden, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

This pilot study set out to assess the speaker-discriminatory power asymmetry of parameters from different phonetic dimensions (spectral, melodic, and temporal) in spontaneous speech. The speech material consisted of spontaneous telephone conversations between siblings. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area. Six acoustic-phonetic parameters were chosen for the comparison: f0 median, f0 baseline, speech rate, articulation rate, F3, and F4. Overall, the acoustic parameters in the speech tempo category showed the worst speaker-discriminatory performance when assessed in isolation, as indicated by their relatively higher median and mean Cllr and EER values. Moreover, of the parameters assessed, the high formant frequencies F3 and F4 were the best-performing estimates in terms of discriminability, showing the lowest EER and Cllr values. The results suggest a speaker-discriminatory power asymmetry among acoustic-phonetic parameters, in which speech tempo estimates had lower discriminatory power than melodic and spectral parameters. The findings also suggest that data sampling is crucial for the reliability of Cllr and EER calculations.

Place, publisher, year, edition, pages
Lund, Sweden, 2022
Keywords
Acoustic phonetics, forensic phonetics, speech tempo, melodic parameters, spectral parameters
National Category
General Language Studies and Linguistics
Research subject
Phonetics; Linguistics
Identifiers
urn:nbn:se:su:diva-216067 (URN); 10.21437/ISAPh.2022-2 (DOI)
Conference
4th International Symposium on Applied Phonetics
Available from: 2023-03-31; Created: 2023-03-31; Last updated: 2023-04-05; Bibliographically approved
Cavalcanti, J. C., Eriksson, A. & Barbosa, P. A. (2022). Measuring the impact of data size on the speaker discriminatory performance: a spontaneous speech-based study. In: Oliver Niebuhr (Ed.), 13th Nordic Prosody Conference: Applied and Multimodal Prosody Research. Paper presented at the 13th Nordic Prosody Conference, Sonderborg, Denmark
2022 (English). In: 13th Nordic Prosody Conference: Applied and Multimodal Prosody Research / [ed] Oliver Niebuhr, Sonderborg, Denmark, 2022. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

This study aimed to analyze the impact of the amount of data on the discriminatory performance of acoustic-phonetic parameters, some of which are frequently assessed in forensic speaker comparisons. Parameters from three distinct phonetic domains (spectral, melodic, and temporal) were considered; they were assessed separately within each phonetic domain and in combination.

The speech material consisted of spontaneous telephone conversations between two subjects. During the recording sessions, the participants were placed in different rooms, not directly seeing, hearing, or interacting with each other. The speakers were encouraged to start a conversation using a mobile phone while being simultaneously recorded.

All recordings were made at high resolution (44.1 kHz, 16-bit). Data segmentation and transcription were performed in the Praat software [1]. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area, aged 19 to 35 years (mean 26.4 years). Although the subjects (10 identical twin pairs) were recruited from a twin research project, cf. [2, 3, 4], the focus here was on comparisons among all speakers (i.e., 190 inter-speaker comparisons) rather than on individual twin pairs.
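The figure of 190 inter-speaker comparisons follows from pairing 20 speakers: 20 × 19 / 2 = 190. The sketch below uses hypothetical speaker labels (S01 to S20) and assumes, purely for illustration, that consecutive labels form a twin pair; it shows how those comparisons split into within-twin-pair and cross-pair comparisons.

```python
from itertools import combinations
from math import comb

# Hypothetical labels: 20 speakers, where S01/S02, S03/S04, ... are twin pairs.
speakers = [f"S{i:02d}" for i in range(1, 21)]
pair_id = {s: idx // 2 for idx, s in enumerate(speakers)}

all_pairs = list(combinations(speakers, 2))  # every unordered speaker pair
twin_pairs = [p for p in all_pairs if pair_id[p[0]] == pair_id[p[1]]]
non_twin_pairs = [p for p in all_pairs if pair_id[p[0]] != pair_id[p[1]]]

print(len(all_pairs), comb(20, 2))  # 190 190
print(len(twin_pairs))              # 10
print(len(non_twin_pairs))          # 180
```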

Place, publisher, year, edition, pages
Sonderborg, Denmark, 2022
Keywords
Phonetics, Acoustic Phonetics, Data size
National Category
General Language Studies and Linguistics
Research subject
Phonetics; Linguistics
Identifiers
urn:nbn:se:su:diva-208194 (URN); 10.6084/m9.figshare.20509167.v1 (DOI)
Conference
13th Nordic Prosody Conference
Available from: 2022-08-23; Created: 2022-08-23; Last updated: 2022-09-08; Bibliographically approved
Cavalcanti, J. C., Eriksson, A. & Barbosa, P. A. (2022). Multi-parametric analysis of speech timing in inter-talker identical twin pairs and cross-pair comparisons: Some forensic implications. PLOS ONE, 17(1), Article ID e0262800.
2022 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 17, no 1, article id e0262800. Article in journal (Refereed), Published
Abstract [en]

The purpose of this study was to assess the speaker-discriminatory potential of a set of speech timing parameters while probing their suitability for forensic speaker comparison applications. The recordings comprised spontaneous dialogues between twin pairs over mobile phones, directly recorded with professional headset microphones. Speaker comparisons were performed between the twin speakers engaged in each dialogue (i.e., intra-twin pairs) and among all subjects (i.e., cross-twin pairs). The participants were 20 Brazilian Portuguese speakers, ten male identical twin pairs from the same dialectal area. A set of 11 speech timing parameters was extracted and analyzed, including speech rate, articulation rate, syllable duration (V-V unit), vowel duration, and pause duration. Three system performance estimates, namely global Cllr, EER, and AUC values, were considered for assessing the suitability of the parameters for speaker comparison purposes. These were interpreted together with an analysis of effect sizes. Overall, speech rate and articulation rate were found to be the most reliable parameters, displaying the largest effect sizes for the factor “speaker” and the best system performance outcomes, namely the lowest Cllr and EER and the highest AUC values. Conversely, smaller effect sizes were found for the other parameters, which is compatible with a lower explanatory potential of speaker identity for the duration of such units and possibly greater linguistic control over their temporal variation. In addition, speech timing estimates based on larger temporal intervals tended to present larger effect sizes and better speaker-discriminatory performance. Finally, identical twin pairs were found to be remarkably similar in their speech temporal patterns at the macro and micro levels while engaging in a dialogue, resulting in poor system discriminatory performance.
Possible underlying factors for such a striking convergence in identical twins’ speech timing patterns are presented and discussed.
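The abstract does not define speech rate and articulation rate, but a common convention is that speech rate counts syllables per second over the whole stretch (pauses included), while articulation rate divides by speaking time only (pauses excluded). A minimal sketch under that assumption follows; the `Interval` structure and the example values are hypothetical, standing in for labelled intervals such as those exported from a Praat TextGrid.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One labelled stretch of a speech chunk (hypothetical structure)."""
    start: float        # seconds
    end: float          # seconds
    is_pause: bool
    syllables: int = 0  # number of syllables (0 for pauses)

    @property
    def duration(self) -> float:
        return self.end - self.start


def speech_rate(intervals):
    """Syllables per second over the whole stretch, pauses included."""
    total_time = sum(iv.duration for iv in intervals)
    return sum(iv.syllables for iv in intervals) / total_time


def articulation_rate(intervals):
    """Syllables per second over speaking time only, pauses excluded."""
    speaking_time = sum(iv.duration for iv in intervals if not iv.is_pause)
    return sum(iv.syllables for iv in intervals) / speaking_time


# Made-up example: 18 syllables in 4.0 s, including a 0.5 s pause
example_stretch = [
    Interval(0.0, 2.0, False, 10),
    Interval(2.0, 2.5, True),
    Interval(2.5, 4.0, False, 8),
]
print(speech_rate(example_stretch))        # 4.5
print(articulation_rate(example_stretch))  # about 5.14
```

Because pauses only enter the denominator of speech rate, the two measures diverge most for speakers who pause frequently.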

Keywords
Speech, Vowels, Syllables, Forensics, Monozygotic twins, Twins, Phonology, Language
National Category
General Language Studies and Linguistics
Research subject
Linguistics; Phonetics
Identifiers
urn:nbn:se:su:diva-201223 (URN); 10.1371/journal.pone.0262800 (DOI); 000791072800171; 35061853 (PubMedID); 2-s2.0-85123374385 (Scopus ID)
Available from: 2022-01-21; Created: 2022-01-21; Last updated: 2022-06-28; Bibliographically approved
Cavalcanti, J. C., Eriksson, A. & Barbosa, P. A. (2021). Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison. PLOS ONE, 16(2), Article ID e0246645.
2021 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 16, no 2, article id e0246645. Article in journal (Refereed), Published
Abstract [en]

The purpose of this study was to explore the speaker-discriminatory potential of mean vowel formant frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and of the vowels’ acoustic distances on the discriminatory patterns of formant frequencies were also assessed. The first four formants (F1-F4) were extracted and analyzed acoustically using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs, directly recorded through high-quality microphones. The subjects were 20 adult male speakers of Brazilian Portuguese (BP), aged between 19 and 35. For the comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 formant estimates were automatically extracted from the midpoint of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant under a psychoacoustic criterion. The results revealed consistent patterns in the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying greater speaker-discriminatory power than low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. For non-genetically related speakers, F3 and F4 displayed a similarly high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by front vowels.
Moreover, stressed vowels displayed higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found to be even more explanatory of the observed differences. Although identical twins displayed greater phonetic similarity, they were not found to be phonetically identical.
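The abstract does not say which Hz-to-Bark conversion was used. As one common option (an assumption, not necessarily the authors' choice), Traunmüller's (1990) formula, z = 26.81 f / (1960 + f) - 0.53 with corrections below 2 Bark and above 20.1 Bark, can be sketched as follows; the F4 values in the example are made up.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to Bark using Traunmüller's (1990) formula,
    including the usual corrections at the low (< 2 Bark) and high (> 20.1 Bark) ends."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    if z < 2.0:
        z += 0.15 * (2.0 - z)
    elif z > 20.1:
        z += 0.22 * (z - 20.1)
    return z


# Hypothetical F4 values for the two members of a twin pair (made-up numbers)
f4_a, f4_b = 3500.0, 3600.0
diff_bark = abs(hz_to_bark(f4_a) - hz_to_bark(f4_b))
print(f"{diff_bark:.3f} Bark")
```

Expressing differences in Bark rather than Hz puts them on a perceptually motivated scale, so the same 100 Hz gap counts for less at high frequencies than at low ones.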

Keywords
twins
National Category
General Language Studies and Linguistics
Research subject
Phonetics
Identifiers
urn:nbn:se:su:diva-190481 (URN); 10.1371/journal.pone.0246645 (DOI); 000620625100045
Available from: 2021-02-19; Created: 2021-02-19; Last updated: 2022-02-25; Bibliographically approved