Statistical Methods for Taxon Classification and Bird Migration Phenology
Karlsson, Måns
Stockholm University, Faculty of Science, Department of Mathematics. ORCID iD: 0000-0001-9662-507x
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The connection between ecology and statistics is deep. Methodological advances in statistics open up new possibilities for understanding the distribution of life on Earth, and research questions in ecology drive the development of new statistical methods. The four papers of this thesis exemplify this exchange by providing a statistical approach to taxon classification and by developing novel measures of distributional properties, driven by the application area of phenology.

Paper I contains a comprehensive Bayesian approach to phenotypical taxon classification with covariates. We formulate a multivariate regression model for a collection of phenotypical traits, which are assumed to be partial observations of latent variables with a Gaussian distribution. Through blocked Gibbs sampling we estimate the parameters of these distributions for a real data set, and derive decision regions of new observations in terms of set-valued classifiers, called Karlsson-Hössjer (K-H) classifiers, analogous to partial reject options. We introduce model selection through cross-validation and compare the K-H classifier’s performance with other existing methods on real data.

Paper II introduces a general Bayesian framework for K-H classification. This is achieved by using a reward function with a set-valued argument, and in this context we derive the optimal Bayes classifier, for a homogeneous block of hypotheses as well as for scenarios where the hypotheses are divided into blocks, and where misclassification or ambiguity within blocks is less or more serious than between. These reward functions include tuning parameters which we choose using cross-validation, and we apply the method to a real data set with block structure.

In Paper III a large class of L-functionals is studied for the response variable in regression models. These L-functionals are given order numbers through an orthogonal series expansion of the quantile function of the response variable. We apply the framework to quantile regression models with and without transformations of the outcome variable, and present a unified asymptotic theory for estimates of L-functionals. The derived estimators are applied to a quantile regression model for phenological analysis, and in this context a novel version of the coefficient of determination is introduced.

In Paper IV two statistical approaches for phenological analysis are compared, for single-species as well as multiple-species models. For single species, we show that the estimates from linear models fitted to empirical quantiles of the response distribution give less detailed results on the effects of covariates than non-parametric quantile regression. For multiple-species models, we highlight an identifiability issue in quantile regression with random effects, and deduce that a mixed effects linear model for empirical quantiles and a quantile regression model with species as one of the covariates perform similarly.

Place, publisher, year, edition, pages
Stockholm: Department of Mathematics, Stockholm University, 2022, p. 39
Keywords [en]
Classification, quantile regression, phenology, statistical ornithology, L-functionals, set-valued classification, species identification, statistical ecology, multispecies modelling
National Category
Probability Theory and Statistics; Ecology
Research subject
Mathematical Statistics
Identifiers
URN: urn:nbn:se:su:diva-204128
ISBN: 978-91-7911-892-1 (print)
ISBN: 978-91-7911-893-8 (electronic)
OAI: oai:DiVA.org:su-204128
DiVA, id: diva2:1653368
Public defence
2022-06-07, 09:00, room 15, building 5 (sal 15, hus 5), Kräftriket, Roslagsvägen 101, Stockholm; also online via Zoom, public link available at the department website (English)
Available from: 2022-05-13 Created: 2022-04-21 Last updated: 2022-05-06 Bibliographically approved
List of papers
1. Identification of taxon through classification with partial reject options
2023 (English). In: The Journal of the Royal Statistical Society, Series C: Applied Statistics, ISSN 0035-9254, E-ISSN 1467-9876, Vol. 72, no. 4, p. 937-975. Article in journal (Refereed). Published
Abstract [en]

Identification of taxa can be significantly assisted by statistical classification based on trait measurements, either individually or by phylogenetic (clustering) methods. In this article, we present a general Bayesian approach for classifying species individually based on measurements of a mixture of continuous and ordinal traits, and any type of covariates. The trait vector is derived from a latent variable with a multivariate Gaussian distribution. Decision rules based on supervised learning are presented, with model parameters estimated through blocked Gibbs sampling. The resulting decision regions allow for uncertainty (partial rejection), so that the output for a new subject is not necessarily one specific category (taxon), but rather a set of categories including the most probable taxa. This type of discriminant analysis employs reward functions with a set-valued input argument, so that an optimal Bayes classifier can be defined. We also present a way of safeguarding against outlying new observations, using an analogue of a p-value within our Bayesian setting. We refer to our Bayesian set-valued classifier as the Karlsson–Hössjer method, and it is illustrated on an original ornithological data set of birds. We also incorporate model selection through cross-validation, exemplified on another original data set of birds.
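
As a generic illustration of Gibbs sampling only (this is not the blocked sampler or the latent multivariate model of the article), the following minimal sketch alternates between the two full conditional distributions of a univariate Gaussian with unknown mean and variance. The priors (flat on the mean, proportional to 1/sigma2 on the variance), the function name `gibbs_normal`, and the toy measurements are all assumptions made for this example.

```python
import random
import statistics


def gibbs_normal(y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for (mu, sigma2) of i.i.d. Gaussian data.

    Assumed priors (illustrative only): flat on mu, p(sigma2) proportional to 1/sigma2.
    Returns the list of (mu, sigma2) draws kept after burn-in.
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    mu, sigma2 = ybar, statistics.variance(y)  # starting values
    draws = []
    for it in range(n_iter):
        # Full conditional 1: mu | sigma2, y ~ Normal(ybar, sigma2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # Full conditional 2: sigma2 | mu, y ~ Inverse-Gamma(n / 2, s / 2),
        # where s is the sum of squared residuals around the current mu.
        s = sum((yi - mu) ** 2 for yi in y)
        # gammavariate takes (shape, scale); the reciprocal of a
        # Gamma(shape=n/2, scale=2/s) draw is Inverse-Gamma(n/2, s/2).
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / s)
        if it >= burn_in:
            draws.append((mu, sigma2))
    return draws


# Hypothetical trait measurements, invented for this sketch.
y = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2]
draws = gibbs_normal(y)
post_mean_mu = statistics.fmean(d[0] for d in draws)
post_mean_sigma2 = statistics.fmean(d[1] for d in draws)
```

A blocked sampler in the sense of the abstract would update groups of parameters jointly rather than one scalar at a time; the alternating structure is the same.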

Keywords
Bayesian classification, classification with covariates, partial observations, set-valued classifiers, species identification, statistical ornithology
National Category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-203752 (URN)
10.1093/jrsssc/qlad036 (DOI)
001019502200001 ()
2-s2.0-85178275269 (Scopus ID)
Available from: 2022-04-21 Created: 2022-04-21 Last updated: 2024-10-16 Bibliographically approved
2. Classification under partial reject options
2024 (English). In: Journal of Classification, ISSN 0176-4268, E-ISSN 1432-1343, Vol. 41, no. 1, p. 2-37. Article in journal (Refereed). Published
Abstract [en]

In many applications there is ambiguity about which (if any) of a finite number N of hypotheses best fits an observation. It is then of interest to output a whole set of categories, that is, to consider a scenario where the size of the classified set of categories ranges from 0 to N. Empty sets correspond to an outlier, sets of size 1 represent a firm decision that singles out one hypothesis, sets of size N correspond to a rejection to classify, whereas sets of sizes 2,…,N−1 represent a partial rejection to classify, where some hypotheses are excluded from further analysis. In this paper, we review and unify several proposed methods of Bayesian set-valued classification, where the objective is to find the optimal Bayesian classifier that maximizes the expected reward. We study a large class of reward functions with rewards for sets that include the true category, whereas additive or multiplicative penalties are incurred for sets depending on their size. For models with one homogeneous block of hypotheses, we provide general expressions for the accompanying Bayesian classifier, several of which extend previous results in the literature. Then, we derive novel results for the more general setting when hypotheses are partitioned into blocks, where ambiguities within and between blocks are of different severity. We also discuss how well-known methods of classification, such as conformal prediction, indifference zones, and hierarchical classification, fit into our framework. Finally, set-valued classification is illustrated using an ornithological data set, with taxa partitioned into blocks and parameters estimated using MCMC. The associated reward function’s tuning parameters are chosen through cross-validation.
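
To make the idea of a reward function with a set-valued argument concrete, here is a minimal sketch for one simple reward with an additive size penalty, R(S, i) = 1{i in S} - c|S|. This particular reward, the penalty `c`, and the function names are assumptions chosen for illustration and are not necessarily the form used in the paper. Under this reward the expected reward of a set S is the sum over i in S of (p_i - c), where p_i is the posterior probability of category i, so the maximizing set keeps exactly the categories with p_i > c. The brute-force function checks that claim by enumerating all subsets.

```python
from itertools import combinations


def expected_reward(subset, posterior, c):
    """Expected reward of outputting `subset` under the assumed reward
    R(S, i) = 1{i in S} - c * |S| (illustrative additive size penalty)."""
    return sum(posterior[i] for i in subset) - c * len(subset)


def set_valued_classifier(posterior, c):
    """Optimal set for the assumed reward: categories with posterior above c."""
    return {i for i, p in enumerate(posterior) if p > c}


def brute_force_classifier(posterior, c):
    """Same optimum found by enumerating all 2^N subsets (small N only)."""
    n = len(posterior)
    subsets = (set(s) for k in range(n + 1) for s in combinations(range(n), k))
    return max(subsets, key=lambda s: expected_reward(s, posterior, c))


# Hypothetical posterior probabilities over N = 4 categories.
ambiguous = [0.55, 0.40, 0.04, 0.01]
confident = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]

partial_reject = set_valued_classifier(ambiguous, c=0.2)   # two candidates kept
firm_decision = set_valued_classifier(confident, c=0.2)    # a single category
full_reject = set_valued_classifier(uniform, c=0.2)        # all N categories
empty_output = set_valued_classifier(uniform, c=0.3)       # no category kept
```

In this sketch the set sizes 1, 2 to N−1, and N correspond to a firm decision, a partial rejection, and a full rejection. The empty set arises here only when no posterior probability exceeds `c`, which is a simplification of the outlier handling described in the abstract.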

Keywords
Blockwise cross-validation, Bayesian classification, Conformal prediction, Classes of hypotheses, Indifference zones, Markov Chain Monte Carlo, Reward functions with set-valued inputs, Set-valued classifiers
National Category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-203754 (URN)
10.1007/s00357-023-09455-x (DOI)
001113203500001 ()
2-s2.0-85178310510 (Scopus ID)
Note

 J Classif 41, 38 (2024). DOI: 10.1007/s00357-023-09459-7

Available from: 2022-04-21 Created: 2022-04-21 Last updated: 2024-10-21 Bibliographically approved
3. On the use of L-functionals in regression models
2023 (English). In: Open Mathematics, ISSN 2391-5455, Vol. 21, no. 1, article id 20220597. Article, review/survey (Refereed). Published
Abstract [en]

In this article, we survey and unify a large class of L-functionals of the conditional distribution of the response variable in regression models. This includes robust measures of location, scale, skewness, and heavy-tailedness of the response, conditionally on covariates. We generalize the concepts of L-moments (G. Sillito, Derivation of approximants to the inverse distribution function of a continuous univariate population from the order statistics of a sample, Biometrika 56 (1969), no. 3, 641–650), L-skewness, and L-kurtosis (J. R. M. Hosking, L-moments: analysis and estimation of distributions using linear combinations of order statistics, J. R. Stat. Soc. Ser. B Stat. Methodol. 52 (1990), no. 1, 105–124) and introduce order numbers for a large class of L-functionals through orthogonal series expansions of quantile functions. In particular, we motivate why location, scale, skewness, and heavy-tailedness have order numbers 1, 2, (3,2), and (4,2), respectively, and describe how a family of L-functionals, with different order numbers, is constructed from Legendre, Hermite, Laguerre, or other types of polynomials. Our framework is applied to models where the relationship between quantiles of the response and the covariates follows a transformed linear model, with a link function that determines the appropriate class of L-functionals. In this setting, the distribution of the response is treated parametrically or nonparametrically, and the response variable is either censored/truncated or not. We also provide a framework for asymptotic theory of estimates of L-functionals and illustrate our approach by analyzing the arrival time distribution of migrating birds. In this context, a novel version of the coefficient of determination is introduced, which makes use of the abovementioned orthogonal series expansion.
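
For readers unfamiliar with the classical L-moments that the article generalizes, the sketch below computes the first four sample L-moments of an unconditional sample, using the standard probability-weighted-moment formulas from Hosking (1990). It covers only the classical Legendre-based case without covariates, not the regression framework or the wider class of L-functionals of the article; the function name and the toy samples are assumptions for illustration. The ratios l3/l2 (L-skewness) and l4/l2 (L-kurtosis) appear to be the quantities to which the abstract assigns order numbers (3,2) and (4,2).

```python
from math import comb


def sample_l_moments(x):
    """First four sample L-moments via unbiased probability-weighted moments.

    b_r = (1/n) * sum_i C(i, r) / C(n-1, r) * x_(i), with 0-based ranks i of
    the sorted sample; the L-moments are fixed linear combinations of the b_r
    (coefficients of the shifted Legendre polynomials).
    """
    xs = sorted(x)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b = [
        sum(comb(i, r) * xi for i, xi in enumerate(xs)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    l1 = b[0]                                      # location
    l2 = 2 * b[1] - b[0]                           # scale
    l3 = 6 * b[2] - 6 * b[1] + b[0]                # numerator of L-skewness
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]  # numerator of L-kurtosis
    return l1, l2, l3, l4


# Hypothetical samples: one symmetric, one right-skewed.
sym = sample_l_moments([1, 2, 3, 4, 5])
skew = sample_l_moments([1, 1, 2, 3, 10])
l_skewness_sym = sym[2] / sym[1]
l_skewness_skew = skew[2] / skew[1]
```

The symmetric sample gives an L-skewness of essentially zero and the right-skewed sample a positive one, which is the qualitative behaviour expected of a skewness measure.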

Keywords
bird phenology, coefficient of determination, L-functionals, L-statistics, order numbers, orthogonal series expansion, quantile function, quantile regression
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:su:diva-203755 (URN)
10.1515/math-2022-0597 (DOI)
001053084400001 ()
2-s2.0-85170428452 (Scopus ID)
Available from: 2022-04-21 Created: 2022-04-21 Last updated: 2023-09-21 Bibliographically approved
4. A comparison between quantile regression and linear regression on empirical quantiles for phenological analysis in migratory response to climate change
(English). Manuscript (preprint) (Other (popular science, discussion, etc.))
Abstract [en]

It is well established that migratory birds in general have advanced their arrival times in spring, and in this paper we investigate potential ways of enhancing the level of detail in future phenological analyses. We perform single as well as multiple species analyses, using linear models on empirical quantiles, non-parametric quantile regression and likelihood-based parametric quantile regression with asymmetric Laplace distributed error terms. We conclude that non-parametric quantile regression appears most suited for single as well as multiple species analyses.
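
The two single-species approaches named above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the estimation procedure of the manuscript: the arrival days are invented, the helper names are hypothetical, and the quantile regression fit uses a brute-force search over lines through pairs of observations (for one covariate, some minimizer of the check loss passes through two data points), whereas practical implementations use linear programming.

```python
from itertools import combinations
from math import ceil


def pinball_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (residual < 0))


def quantile_regression(x, y, tau):
    """Fit y = a + b * x at quantile level tau by minimizing the total check
    loss over all lines through two observations (small samples only)."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(pinball_loss(yi - a - b * xi, tau) for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def empirical_quantile(values, tau):
    """Order-statistic quantile: the ceil(tau * n)-th smallest value."""
    xs = sorted(values)
    return xs[max(ceil(tau * len(xs)) - 1, 0)]


def least_squares(x, y):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


# Invented arrival days: five birds per year over ten years; the spread
# grows over time, so early arrivals advance faster than the median.
years = list(range(10))
by_year = {t: [120 - 0.5 * t + k * (1 + 0.1 * t) for k in (-2, -1, 0, 1, 2)]
           for t in years}
x = [t for t in years for _ in by_year[t]]
y = [d for t in years for d in by_year[t]]

tau = 0.1
qr_intercept, qr_slope = quantile_regression(x, y, tau)
lm_intercept, lm_slope = least_squares(
    years, [empirical_quantile(by_year[t], tau) for t in years]
)
```

The toy data are constructed so that the two approaches agree exactly, which makes the sketch easy to check; it says nothing about their relative merits on real observatory data.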

Keywords
Phenology, quantile regression, mixed effects, arrival times, linear regression, bird observatory
National Category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-203757 (URN)
arXiv.2202.02206 (DOI)
Available from: 2022-04-21 Created: 2022-04-21 Last updated: 2022-04-21

Open Access in DiVA

Statistical Methods for Taxon Classification and Bird Migration Phenology (1368 kB), 389 downloads
File information
File name: FULLTEXT01.pdf
File size: 1368 kB
Checksum (SHA-512): 58d84695db39f2e1924ec53a91273effa26101a9c340f13ed20f44e7e566de21e5c3817810ed8065b89cd0e33dcfe37d98606324129335ac30ccef081c29c086
Type: fulltext
Mimetype: application/pdf

Authority records

Karlsson, Måns
