Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields
Stockholm University, Faculty of Humanities, Department of Linguistics, Phonetics. Stockholm University, Faculty of Humanities, Department of Linguistics, SUBIC - Stockholm University Brain Imaging Centre. Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden. ORCID iD: 0000-0002-1495-7773
2018 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 144, no. 3, p. 1467-1483. Article in journal (Refereed). Published.
Abstract [en]

Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence, has been modeled from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by two experienced phoneticians using a detailed articulatory description scheme. A separate set of audio features was developed specifically for each category using both time-domain and spectral methods. For all time-frequency transformations, and for some secondary processing, the recently developed Auditory Receptive Fields Toolbox was used. Three different machine learning methods were applied for predicting the final articulatory categories. The result with the best generalization was found using an ensemble of multilayer perceptrons. The cross-validated classification accuracy was 96.8% for phonation, 90.8% for supraglottal myoelastic vibrations, and 89.0% for turbulence using all 84 developed features. A final feature reduction to 22 features yielded similar results.
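
The abstract describes a feature-based classification pipeline: 84 audio features, three machine learning methods, and the best generalization from an ensemble of multilayer perceptrons evaluated with cross-validation. As a rough illustration only, the Python (scikit-learn) sketch below shows one way such an ensemble could be trained and scored. The Auditory Receptive Fields Toolbox feature extraction is not reproduced here; the feature matrix X, the labels y, the bagging scheme, and all hyperparameters are hypothetical assumptions, not the authors' implementation.

# Hypothetical sketch of the classification stage described in the abstract.
# X stands in for an (n_examples, 84) matrix of the developed audio features;
# y stands in for binary labels of one articulatory category (e.g. phonation).
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 84))   # placeholder for the 84 features
y = rng.integers(0, 2, size=500)     # placeholder category labels

# One base learner: a small multilayer perceptron on standardized features.
base = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)

# The paper does not specify the ensembling scheme; bagging is one common
# choice for combining multiple perceptrons.
ensemble = BaggingClassifier(estimator=base, n_estimators=10, random_state=0)

# Cross-validated classification accuracy, the metric reported in the abstract.
scores = cross_val_score(ensemble, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")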

Place, publisher, year, edition, pages
2018. Vol. 144, no. 3, p. 1467-1483
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:su:diva-221037
DOI: 10.1121/1.5052438
ISI: 000457802200049
PubMedID: 30424637
Scopus ID: 2-s2.0-85053873907
OAI: oai:DiVA.org:su-221037
DiVA id: diva2:1796899
Funder
EU, FP7, Seventh Framework Programme, 618067
Swedish Research Council, 2012-4685
Available from: 2023-09-13. Created: 2023-09-13. Last updated: 2023-09-17. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
PubMed
Scopus

Authority records

Salomão, Gláucia Laís
