Identification of taxon through classification with partial reject options
Stockholm University, Faculty of Science, Department of Mathematics. ORCID iD: 0000-0001-9662-507x
Stockholm University, Faculty of Science, Department of Mathematics. ORCID iD: 0000-0003-2767-8818
2023 (English). In: The Journal of the Royal Statistical Society, Series C: Applied Statistics, ISSN 0035-9254, E-ISSN 1467-9876, Vol. 72, no. 4, pp. 937-975. Journal article (peer-reviewed), published.
Abstract

Identification of taxa can be significantly assisted by statistical classification based on trait measurements, either individually or by phylogenetic (clustering) methods. In this article, we present a general Bayesian approach for classifying species individually based on measurements of a mixture of continuous and ordinal traits, together with any type of covariates. The trait vector is derived from a latent variable with a multivariate Gaussian distribution. We present decision rules based on supervised learning, with model parameters estimated through blocked Gibbs sampling. The resulting decision regions allow for uncertainty (partial rejection): when a new subject is classified, the output is not necessarily one specific category (taxon), but rather a set of categories that includes the most probable taxa. This type of discriminant analysis employs reward functions with a set-valued input argument, so that an optimal Bayes classifier can be defined. We also present a way of safeguarding against outlying new observations, using an analogue of a p-value within our Bayesian setting. We refer to our Bayesian set-valued classifier as the Karlsson–Hössjer method, and illustrate it on an original ornithological data set. We also incorporate model selection through cross-validation, exemplified on another original data set of birds.
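The abstract does not state the reward function explicitly, so the following is only a minimal sketch under an assumed reward, not the Karlsson–Hössjer classifier itself. Assume a reward of the form R(S, k) = 1{k ∈ S} · v(|S|), where S is the reported set of taxa, k is the true taxon, and v is a hypothetical positive, decreasing function of the set size. The expected reward of S under posterior probabilities p_k is then v(|S|) · Σ_{k∈S} p_k. For each fixed size m this is maximized by the m most probable taxa, so the Bayes-optimal set can be found by scanning m. The posterior probabilities, which in the article would come from the fitted latent Gaussian model, are simply taken as inputs here, and the function name `bayes_set_classifier` is hypothetical.

```python
from typing import Callable, List, Sequence


def bayes_set_classifier(
    posterior: Sequence[float],
    v: Callable[[int], float],
) -> List[int]:
    """Return the index set S maximizing the expected reward v(|S|) * sum_{k in S} p_k.

    Illustrative assumption: reward R(S, k) = 1{k in S} * v(|S|) with v(m) > 0.
    """
    if not posterior:
        raise ValueError("posterior must contain at least one category")
    # Rank categories from most to least probable.
    order = sorted(range(len(posterior)), key=lambda k: posterior[k], reverse=True)
    best_set: List[int] = []
    best_value = float("-inf")
    cumulative = 0.0
    # For a fixed size m, the top-m categories maximize the summed probability,
    # so only the candidate sets order[:m] need to be compared.
    for m, k in enumerate(order, start=1):
        cumulative += posterior[k]
        value = v(m) * cumulative
        if value > best_value:
            best_value, best_set = value, order[:m]
    return sorted(best_set)


# Penalizing larger sets mildly with v(m) = m ** -0.5 (an arbitrary choice):
print(bayes_set_classifier([0.5, 0.45, 0.05], lambda m: m ** -0.5))  # [0, 1]
print(bayes_set_classifier([0.9, 0.07, 0.03], lambda m: m ** -0.5))  # [0]
```

When the two most probable taxa are close, the sketch returns both (a partial reject); when one taxon dominates, it returns a singleton. A faster-decreasing v penalizes larger sets more and makes singletons more likely.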

Place, publisher, year, edition, pages
2023. Vol. 72, no. 4, pp. 937-975
Keywords
Bayesian classification, classification with covariates, partial observations, set-valued classifiers, species identification, statistical ornithology
Research programme
Mathematical statistics
Identifiers
URN: urn:nbn:se:su:diva-203752
DOI: 10.1093/jrsssc/qlad036
ISI: 001019502200001
OAI: oai:DiVA.org:su-203752
DiVA id: diva2:1653234
Available from: 2022-04-21. Created: 2022-04-21. Last updated: 2023-12-19. Bibliographically checked.
Part of thesis
1. Statistical Methods for Taxon Classification and Bird Migration Phenology
2022 (English). Doctoral thesis, comprising papers (Other academic)
Abstract

The connection between ecology and statistics is deep. Methodological advances in statistics open up new possibilities to understand the distribution of life on earth, and research questions in ecology drive the development of new statistical methods. The four papers of this thesis exemplify this exchange by providing a statistical approach to taxon classification and by developing novel measures of distributional properties, motivated by the application area of phenology.

Paper I contains a comprehensive Bayesian approach to phenotypical taxon classification with covariates. We formulate a multivariate regression model for a collection of phenotypical traits, which are assumed to be partial observations of latent variables with a Gaussian distribution. Through blocked Gibbs sampling we estimate the parameters of these distributions for a real data set, and derive decision regions for new observations in terms of set-valued classifiers, called Karlsson-Hössjer (K-H) classifiers, which are analogous to classifiers with partial reject options. We introduce model selection through cross-validation and compare the performance of the K-H classifier with other existing methods on real data.

Paper II introduces a general Bayesian framework for K-H classification. This is achieved by using a reward function with a set-valued argument. In this context we derive the optimal Bayes classifier both for a homogeneous block of hypotheses and for scenarios where the hypotheses are divided into blocks, and where misclassification or ambiguity within blocks is less or more serious than between blocks. These reward functions include tuning parameters, which we choose using cross-validation, and we apply the method to a real data set with block structure.

In Paper III a large class of L-functionals is studied for the response variable in regression models. These L-functionals are given order numbers through an orthogonal series expansion of the quantile function of the response variable. We apply the framework to quantile regression models with and without transformations of the outcome variable, and present a unified asymptotic theory for estimates of L-functionals. The derived estimators are applied to a quantile regression model for phenological analysis, and in this context a novel version of the coefficient of determination is introduced.
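As an illustration of L-functionals indexed by order numbers, consider L-moments, a well-known family of this type (not necessarily the family studied in Paper III). The r-th L-moment is λ_r = ∫_0^1 Q(u) P*_{r-1}(u) du, where Q is the quantile function and P*_{r-1} is the shifted Legendre polynomial of degree r-1. The sketch below computes Hosking's unbiased sample estimates from probability-weighted moments b_r. It covers only the unconditional case, without the regression and transformation setting the abstract describes, and the function name `sample_l_moments` is hypothetical.

```python
from math import comb
from typing import List, Sequence


def sample_l_moments(data: Sequence[float], max_order: int = 4) -> List[float]:
    """Unbiased sample L-moments l_1, ..., l_max_order via probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    if not 1 <= max_order <= n:
        raise ValueError("need 1 <= max_order <= len(data)")
    # b_r = n^{-1} * sum_{i=1..n} [C(i-1, r) / C(n-1, r)] * x_(i);
    # enumerate yields the zero-based index j = i - 1, so C(j, r) is used directly.
    b = [
        sum(comb(j, r) * xj for j, xj in enumerate(x)) / (n * comb(n - 1, r))
        for r in range(max_order)
    ]
    # l_{r+1} = sum_k p*_{r,k} * b_k, where p*_{r,k} = (-1)^(r-k) C(r, k) C(r+k, k)
    # are the coefficients of the shifted Legendre polynomial P*_r.
    return [
        sum((-1) ** (r - k) * comb(r, k) * comb(r + k, k) * b[k] for k in range(r + 1))
        for r in range(max_order)
    ]


l1, l2, l3, l4 = sample_l_moments([4, 1, 3, 2])
# l1 is the sample mean; l2 is half the mean absolute difference between pairs.
```

For symmetric data such as this sample, l3 is zero up to rounding, while a right-skewed sample such as [0, 0, 0, 10] gives a positive l3.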

In Paper IV two statistical approaches to phenological analysis are compared, for single-species as well as multi-species models. For single species, we show that linear models fitted to empirical quantiles of the response distribution give less detailed results on the effects of covariates than non-parametric quantile regression does. For multi-species models, we highlight an identifiability issue in quantile regression with random effects, and conclude that a mixed effects linear model for empirical quantiles and a quantile regression model with species as one of the covariates perform similarly.
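For readers less familiar with the comparison above, the following sketch (not taken from the thesis) shows the objective that linear quantile regression minimizes, the check loss ρ_τ(u) = u(τ - 1{u < 0}), in the simplest intercept-only case. There the fitted constant coincides with an empirical τ-quantile, which is what links quantile regression to models fitted to empirical quantiles. The function names are hypothetical, and the brute-force search over observed values is chosen for clarity rather than efficiency.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def intercept_only_quantile_fit(y: Sequence[float], tau: float) -> float:
    """Minimize sum_i rho_tau(y_i - c) over the constant c.

    The objective is convex and piecewise linear in c with kinks at the
    observations, so a minimizer can be found among the observed values.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if not y:
        raise ValueError("y must be non-empty")
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


# The fitted constant is an empirical tau-quantile of the sample.
print(intercept_only_quantile_fit(list(range(1, 11)), 0.75))  # 8
print(intercept_only_quantile_fit([3, 1, 4, 1, 5, 9, 2, 6, 5], 0.5))  # 4
```

With covariates, the constant c is replaced by a linear predictor x'β, and the same objective is usually minimized by linear programming.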

Place, publisher, year, edition, pages
Stockholm: Department of Mathematics, Stockholm University, 2022. 39 pp.
Keywords
Classification, quantile regression, phenology, statistical ornithology, L-functionals, set-valued classification, species identification, statistical ecology, multispecies modelling
Research programme
Mathematical statistics
Identifiers
URN: urn:nbn:se:su:diva-204128
ISBN: 978-91-7911-892-1
ISBN: 978-91-7911-893-8
Public defence
2022-06-07, 09:00, Room 15, Building 5, Kräftriket, Roslagsvägen 101, Stockholm, and online via Zoom (public link available at the department website). Language: English.
Available from: 2022-05-13. Created: 2022-04-21. Last updated: 2022-05-06. Bibliographically checked.

Authors

Karlsson, Måns; Hössjer, Ola
