Publications (9 of 9)
Persson, A., Barreda, S. & Jaeger, T. F. (2025). Comparing accounts of formant normalization against US English listeners' vowel perception. Journal of the Acoustical Society of America
2025 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524. Journal article (peer-reviewed). Published.
Keywords
speech perception, normalization, vowels, formants, ideal observers
Identifiers
urn:nbn:se:su:diva-234281 (URN), 10.1121/10.0035476 (DOI), 001434099500001 (ISI), 39998127 (PubMed ID), 2-s2.0-85218621262 (Scopus ID)
Available from: 2024-10-14. Created: 2024-10-14. Last updated: 2025-04-08.
Persson, A. (2025). The acoustic characteristics of Swedish vowels. Phonetica, 81(6), 599-643
2025 (English). In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 81, no. 6, pp. 599-643. Journal article (peer-reviewed). Published.
Abstract [en]

The Swedish vowel space is relatively densely populated, with 21 categories that differ in quality and quantity. Existing descriptions of the entire space rest on recordings made in the late 1990s or earlier, while recent work has generally focused on subsets of the space. The present paper reports on static and dynamic acoustic analyses of the entire vowel space using a recently released database of h-VOWEL-d words (SwehVd). The results highlight the importance of static and dynamic spectral and temporal cues for Swedish vowel category distinction. The first two formants and vowel duration are the primary acoustic cues to vowel identity; however, the third formant contributes to increased category separability for neighboring contrasts presumed to differ in lip rounding. In addition, even though all long-short vowel pairs differ systematically in duration, they also display considerable spectral differences, suggesting that quantity distinctions are not separate from quality distinctions in Swedish. The dynamic analysis further suggests formant movements in both long and short vowels, with [e:] and [o:] displaying clearer patterns of diphthongization.

Keywords
vowels, category separability, formant dynamics
Research programme
Nordic languages
Identifiers
urn:nbn:se:su:diva-234280 (URN), 10.1515/phon-2024-0011 (DOI), 001339318600001 (ISI), 39443329 (PubMed ID), 2-s2.0-85208373200 (Scopus ID)
Available from: 2024-10-14. Created: 2024-10-14. Last updated: 2025-02-24. Bibliographically approved.
Persson, A. (2024). Comparing theories of pre-linguistic normalization for vowel perception. (Doctoral dissertation). Stockholm: Department of Swedish Language and Multilingualism, Stockholm University
2024 (English). Doctoral thesis, comprising articles (Other academic).
Abstract [en]

The present thesis compares competing theories of pre-linguistic normalization for the perception of Swedish and English vowels. Specifically, the overall aim is to investigate whether normalization might be key to understanding the mechanisms supporting robust cross-talker perception, and to gain more insights into the specific computations involved. The thesis is based on three articles that employ acoustic analysis, behavioral experiments and computational modeling to address the question of vowel normalization.

Article I uses a novel phonetically annotated database of Swedish vowel recordings, the SwehVd, to provide an updated acoustic description of the Central Swedish vowel system and to evaluate claims about cue-to-category mappings made in previous work. Replicating previous studies, the results of Article I suggest that F1, F2 and vowel duration are the most important cues to vowel identity in Central Swedish. In addition, the results highlight the importance of formant dynamics for reliable category distinctions. The acoustic measurements from Article I further constitute the input to the computational modeling presented in Article II.

Article II evaluates 15 competing normalization accounts in terms of how well they predict the intended vowel category of Central Swedish, as represented by the talkers in SwehVd. Specifically, a computational model of vowel perception, a Bayesian ideal observer, is used to assess the predicted consequences of normalization. The results indicate that normalization accounts that assume the learning and storing of talker-specific acoustics (i.e., extrinsic accounts) achieve the best fit against vowel production data. The evaluation against the SwehVd database further contributes to the insight that languages with dense vowel spaces do not necessarily require more complex normalization mechanisms.

Article III evaluates 20 different normalization accounts on how well they predict listeners' categorization behavior in two vowel categorization experiments on US English vowels. Paralleling the results from Article II, the results indicate that more complex extrinsic normalization is needed for robust cross-talker perception. However, it is a computationally minimalist extrinsic account, uniform scaling, that provides the best fit when evaluated against listeners' responses. This would seem to suggest that more complex computations (as in, e.g., Lobanov normalization) are not required for human speech perception.

The thesis aimed for a broad-scale evaluation of competing theories of pre-linguistic normalization, assessing the predictions of different accounts using different types of experiment stimuli, different vowel spaces, and different sets of acoustic cues. This broad-scale evaluation was made possible through the implementation of a holistic and stringent computational framework, for an unbiased comparison of accounts. The main contributions of this thesis include the open-access publication of the framework and the vowel database, to facilitate replication and future studies.
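Uniform scaling, the minimalist extrinsic account mentioned above, can be sketched as follows. This is an illustration under assumptions, not the thesis's implementation: it uses the common log-mean formulation associated with Nearey (1978), in which each talker's log formants are shifted by one talker-specific constant (equivalently, all of that talker's formants are divided by a single scale factor). Lobanov normalization, by contrast, estimates a separate mean and standard deviation for every formant. The helper name `uniform_scale` and all formant values are hypothetical.

```python
import math
from statistics import mean


def uniform_scale(tokens):
    """Uniform-scaling normalization for one talker (hypothetical helper).

    tokens: list of (F1, F2, F3) tuples in Hz, all from the same talker.
    Returns log formants minus the talker's grand mean log formant, so a
    single talker-specific parameter is estimated for all formants.
    """
    log_tokens = [[math.log(f) for f in tok] for tok in tokens]
    grand_mean = mean(v for tok in log_tokens for v in tok)
    return [[v - grand_mean for v in tok] for tok in log_tokens]


# Hypothetical talker A, and a talker B whose formants are all 15% higher
talker_a = [(300, 2300, 3000), (700, 1200, 2600)]
talker_b = [tuple(1.15 * f for f in tok) for tok in talker_a]

norm_a = uniform_scale(talker_a)
norm_b = uniform_scale(talker_b)
# Multiplying every formant by one factor only shifts the log formants by a
# constant, which the grand-mean subtraction removes: norm_a and norm_b match.
```

Because only one parameter is removed per talker, this sketch cannot undo talker differences that are not proportional across formants; per-formant accounts such as Lobanov can.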

Place, publisher, year, edition, pages
Stockholm: Department of Swedish Language and Multilingualism, Stockholm University, 2024. 82 pages
Series
Stockholm studies in Scandinavian philology, ISSN 0562-1097 ; 73
Keywords
vowels, speech perception, formants, normalization, ideal observers, spectral acoustics
Research programme
Nordic languages
Identifiers
urn:nbn:se:su:diva-234282 (URN), 978-91-8014-973-0 (ISBN), 978-91-8014-974-7 (ISBN)
Public defence
2024-11-29, 13:00, hörsal 3, hus B, Universitetsvägen 10, Stockholm (in English)
Available from: 2024-11-06. Created: 2024-10-14. Last updated: 2024-10-28. Bibliographically approved.
Persson, A. & Jaeger, T. F. (2024). Measuring the informativity of F3 for rounded and unrounded high-front vowels in Central Swedish. In: Mattias Heldner, Marcin Włodarczak, Christine Ericsdotter Nordgren & Carla Wikse Barrow (Eds.), Proceedings from FONETIK 2024 (pp. 13-18). Department of Linguistics, Stockholm University. Paper presented at Fonetik 2024, Stockholm, Sweden, June 3-6, 2024.
2024 (English). In: Proceedings from FONETIK 2024 / [ed] Mattias Heldner, Marcin Włodarczak, Christine Ericsdotter Nordgren & Carla Wikse Barrow, Department of Linguistics, Stockholm University, 2024, pp. 13-18. Conference paper, published paper (peer-reviewed).
Place, publisher, year, edition, pages
Department of Linguistics, Stockholm University, 2024
Research programme
Nordic languages
Identifiers
urn:nbn:se:su:diva-232676 (URN), 10.5281/zenodo.11396050 (DOI)
Conference
Fonetik 2024, Stockholm, Sweden, June 3-6, 2024
Available from: 2024-08-21. Created: 2024-08-21. Last updated: 2024-08-29. Bibliographically approved.
Persson, A. & Jaeger, T. F. (2023). Comparing pre-linguistic normalization models against US English listeners' perception of natural and resynthesized vowels. In: 184th Meeting of the Acoustical Society of America: Abstracts (Vol. 153, Article ID A77). Paper presented at the 184th Meeting of the Acoustical Society of America, Chicago, Illinois, May 8-12, 2023.
2023 (English). In: 184th Meeting of the Acoustical Society of America: Abstracts, 2023, Vol. 153, article ID A77. Conference paper, poster (with or without abstract) (Other academic).
Abstract [en]

Talkers vary in their vowel pronunciation. One hypothesis holds that listeners achieve robust speech perception through pre-linguistic normalization. In recent work (also submitted to ASA), we modeled listeners' perception of naturally produced /h-VOWEL-d/ words. The best-performing normalization models accounted for ∼90% of the explainable variance in listeners' responses. Here, we investigate whether the remaining 10% (1) follow from other mechanisms or (2) reflect listeners' ability to use more cues than are available to the models. We constructed a new set of *synthesized* /h-VOWEL-d/ stimuli that varied only in F1 and F2. Unsurprisingly, listeners (N = 24) performed worse on these synthesized stimuli than on the natural stimuli (performance estimated as inter-listener agreement in categorization). Critically, though, we find (1) that the same normalization accounts that best explained listeners' responses to natural stimuli also best explain responses to synthesized stimuli, and (2) that the best-performing model again accounted for ∼90% of the explainable variance. This suggests that the 'failure' of normalization accounts to fully explain listeners' categorization behavior is *not* due to restrictions in the ability to feed our models all available cues. Rather, normalization alone, while critical to perception, seems insufficient to fully explain listeners' ability to adapt based on recent input.

Research programme
Nordic languages
Identifiers
urn:nbn:se:su:diva-233436 (URN), 10.1121/10.0018219 (DOI)
Conference
184th Meeting of the Acoustical Society of America, Chicago, Illinois, May 8-12, 2023
Available from: 2024-09-13. Created: 2024-09-13. Last updated: 2024-09-13. Bibliographically approved.
Persson, A. & Jaeger, T. F. (2023). Comparing pre-linguistic normalization models against US English listeners' vowel perception. In: 184th Meeting of the Acoustical Society of America: Abstracts (Vol. 153, Article ID A77). Paper presented at the 184th Meeting of the Acoustical Society of America, Chicago, Illinois, May 8-12, 2023.
2023 (English). In: 184th Meeting of the Acoustical Society of America: Abstracts, 2023, Vol. 153, article ID A77. Conference paper, poster (with or without abstract) (Other academic).
Abstract [en]

One of the central computational challenges for speech perception is that talkers differ in pronunciation, that is, in how they map linguistic categories and meanings onto the acoustic signal. Yet listeners typically overcome these difficulties within minutes (Clarke & Garrett, 2004; Xie et al., 2018). The mechanisms that underlie these adaptive abilities remain unclear. One influential hypothesis holds that listeners achieve robust speech perception across talkers through low-level pre-linguistic normalization. We investigate the role of normalization in the perception of L1 US English vowels. We train ideal observers (IOs) on unnormalized or normalized acoustic cues using a phonetic database of 8 /h-VOWEL-d/ words of US English (N = 1240 recordings from 16 talkers; Xie & Jaeger, 2020). All IOs had zero degrees of freedom (DFs) in predicting perception; that is, their predictions are completely determined by pronunciation statistics. We compare the IOs' predictions against L1 US English listeners' 8-way categorization responses for /h-VOWEL-d/ words in a web-based experiment. We find that (1) pre-linguistic normalization substantially improves the fit to human responses, from 74% to 90% of best-possible performance (chance = 12.5%); (2) the best-performing normalization accounts centered and/or scaled formants by talker; and (3) general-purpose normalization (C-CuRE; McMurray & Jongman, 2011) performed as well as vowel-specific normalization. © 2023 Acoustical Society of America.
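The ideal-observer logic described above can be illustrated with a minimal sketch. It assumes, as a simplification, that each vowel category is a Gaussian over cues with independent dimensions (diagonal covariance) and that all categories have equal prior probability; the published models may use full covariance matrices and different cue sets. The function names and the toy F1/F2 values are hypothetical. Because category parameters are estimated only from production data, nothing in the sketch is fitted to listeners' responses, in the spirit of the zero-DF setup. Training it on normalized rather than raw cues only changes the input values.

```python
import math
from statistics import mean, stdev


def fit_category(tokens):
    """Fit a diagonal-covariance Gaussian (mean, SD per cue) to cue vectors."""
    return [(mean(dim), stdev(dim)) for dim in zip(*tokens)]


def log_likelihood(x, params):
    """Log density of cue vector x under independent Gaussians per cue."""
    total = 0.0
    for value, (mu, sd) in zip(x, params):
        total += -0.5 * math.log(2 * math.pi * sd ** 2) - (value - mu) ** 2 / (2 * sd ** 2)
    return total


def posterior(x, categories, priors=None):
    """P(category | x) via Bayes' rule; uniform priors unless given."""
    names = list(categories)
    if priors is None:
        priors = {c: 1 / len(names) for c in names}
    logs = {c: math.log(priors[c]) + log_likelihood(x, categories[c]) for c in names}
    top = max(logs.values())  # subtract the max before exp for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in logs.items()}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}


# Hypothetical (F1, F2) training tokens in Hz for two /h-VOWEL-d/ words
training = {
    "heed": [(300, 2300), (280, 2250), (320, 2400), (310, 2350)],
    "had": [(700, 1800), (750, 1700), (680, 1750), (720, 1850)],
}
cats = {word: fit_category(toks) for word, toks in training.items()}
post = posterior((310, 2300), cats)  # a test token close to the "heed" category
```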

Research programme
Nordic languages
Identifiers
urn:nbn:se:su:diva-233435 (URN), 10.1121/10.0018218 (DOI)
Conference
184th Meeting of the Acoustical Society of America, Chicago, Illinois, May 8-12, 2023
Available from: 2024-09-13. Created: 2024-09-13. Last updated: 2024-09-13. Bibliographically approved.
Persson, A. & Jaeger, T. F. (2023). Evaluating normalization accounts against the dense vowel space of Central Swedish. Frontiers in Psychology, 14, Article ID 1165742.
2023 (English). In: Frontiers in Psychology, E-ISSN 1664-1078, Vol. 14, article ID 1165742. Journal article (peer-reviewed). Published.
Abstract [en]

Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both accounts specific to vowel perception and general-purpose accounts that can be applied to any type of cue. We add to the cross-linguistic literature on this matter by comparing normalization accounts against a new phonetically annotated vowel database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We evaluate normalization accounts on how they differ in their predicted consequences for perception. The results indicate that the best-performing accounts either center or standardize formants by talker. The study also suggests that general-purpose accounts perform as well as vowel-specific accounts, and that vowel normalization operates in both temporal and spectral domains.

Keywords
vowel normalization, ideal observers, speech production, speech perception, category separability
Identifiers
urn:nbn:se:su:diva-221128 (URN), 10.3389/fpsyg.2023.1165742 (DOI), 001022324900001 (ISI), 37416548 (PubMed ID), 2-s2.0-85164513909 (Scopus ID)
Available from: 2023-09-15. Created: 2023-09-15. Last updated: 2024-10-14. Bibliographically approved.
Persson, A. & Jaeger, T. F. (2023). Evaluating normalization accounts against the dense vowel space of Stockholm Swedish. In: 184th Meeting of the Acoustical Society of America: Abstracts (Vol. 153, Article ID A370). Paper presented at the 184th Meeting of the Acoustical Society of America, Chicago, Illinois, May 8-12, 2023.
2023 (English). In: 184th Meeting of the Acoustical Society of America: Abstracts, 2023, Vol. 153, article ID A370. Conference paper, poster (with or without abstract) (Other academic).
Abstract [en]

Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both vowel-specific accounts (e.g., Lobanov, 1971; Nearey, 1978; Syrdal and Gopal, 1986) and general-purpose accounts applicable to any type of phonetic cue (McMurray and Jongman, 2011). We add to the cross-linguistic literature by comparing normalization accounts against a new database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We train Bayesian ideal observers (IOs) on unnormalized or normalized vowel data under different assumptions about the relevant cues to vowel identity (F0-F3, vowel duration), and evaluate their performance in predicting the category intended by the talker. The results indicate that the best-performing normalization accounts centered and/or scaled formants by talker (e.g., Lobanov), replicating previous findings for other languages with less dense vowel spaces. The relative advantage of Lobanov decreased when additional cues were included, suggesting that simple centering relative to the talker's mean (e.g., C-CuRE) might be sufficient to achieve robust inter-talker perception.
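The two kinds of accounts contrasted above, centering a cue by talker (as in C-CuRE) versus centering and scaling it (as in Lobanov), can be sketched for a single cue as follows. This is an illustration under assumptions: C-CuRE is reduced here to subtracting the talker's mean (McMurray and Jongman, 2011, frame it more generally as cues relative to expectations), each function operates on one talker's values at a time, and the function names and F1 values are hypothetical.

```python
from statistics import mean, stdev


def center_by_talker(values):
    """C-CuRE-style centering (simplified): subtract the talker's mean cue value."""
    mu = mean(values)
    return [v - mu for v in values]


def lobanov(values):
    """Lobanov (1971) z-scoring: center on the talker's mean, divide by the talker's SD."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical F1 values (Hz) for the same five vowels from two talkers;
# talker B's values are 20% higher and therefore also more spread out.
f1_a = [300, 400, 500, 600, 700]
f1_b = [360, 480, 600, 720, 840]

centered_a, centered_b = center_by_talker(f1_a), center_by_talker(f1_b)
z_a, z_b = lobanov(f1_a), lobanov(f1_b)
```

Centering removes a talker's overall shift in a cue but leaves differences in spread, so `centered_a` and `centered_b` still differ; z-scoring removes both, so `z_a` and `z_b` coincide.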

Research programme
Nordic languages
Identifiers
urn:nbn:se:su:diva-233434 (URN), 10.1121/10.0019201 (DOI)
Conference
184th Meeting of the Acoustical Society of America, Chicago, Illinois, May 8-12, 2023
Available from: 2024-09-13. Created: 2024-09-13. Last updated: 2024-09-13. Bibliographically approved.
Persson, A. & Jaeger, T. F. (2022). Comparing pre-linguistic normalization models against US English listeners' vowel perception. Paper presented at the 13th International Conference of Experimental Linguistics (ExLing 2022), Paris, France, October 17-19, 2022.
2022 (English). Conference paper, oral presentation only (Other academic).
Abstract [en]

We investigate the role of pre-linguistic normalization in the perception of US English vowels. We train Bayesian ideal observer (IO) models on unnormalized or normalized acoustic cues to vowel identity using a phonetic database of 8 /h-VOWEL-d/ words of US English. We then compare the IOs’ predictions for vowel categorization against L1 US English listeners’ 8-way categorization responses for recordings of /h-VOWEL-d/ words in a web-based experiment. Results indicate that pre-linguistic normalization substantially improves the fit to human responses from 74% to 90% of best-possible performance.

Research programme
Nordic languages
Identifiers
urn:nbn:se:su:diva-214406 (URN)
Conference
13th International Conference of Experimental Linguistics (ExLing 2022), Paris, France, October 17-19, 2022
Available from: 2023-02-02. Created: 2023-02-02. Last updated: 2023-02-03. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-5226-8568