2024 (English). In: Journal of Classification, ISSN 0176-4268, E-ISSN 1432-1343, Vol. 41, no 1, p. 2-37. Article in journal (Refereed). Published
Abstract [en]
In many applications there is ambiguity about which (if any) of a finite number N of hypotheses best fits an observation. It is then of interest to output a whole set of categories, that is, to consider a scenario where the size of the classified set of categories ranges from 0 to N. Empty sets correspond to an outlier, sets of size 1 represent a firm decision that singles out one hypothesis, sets of size N correspond to a rejection to classify, whereas sets of sizes 2,…,N−1 represent a partial rejection to classify, where some hypotheses are excluded from further analysis. In this paper, we review and unify several proposed methods of Bayesian set-valued classification, where the objective is to find the optimal Bayesian classifier that maximizes the expected reward. We study a large class of reward functions with rewards for sets that include the true category, whereas additive or multiplicative penalties are incurred for sets depending on their size. For models with one homogeneous block of hypotheses, we provide general expressions for the accompanying Bayesian classifier, several of which extend previous results in the literature. Then, we derive novel results for the more general setting when hypotheses are partitioned into blocks, where ambiguities within and between blocks are of different severity. We also discuss how well-known methods of classification, such as conformal prediction, indifference zones, and hierarchical classification, fit into our framework. Finally, set-valued classification is illustrated using an ornithological data set, with taxa partitioned into blocks and parameters estimated using MCMC. The associated reward function's tuning parameters are chosen through cross-validation.
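To make the idea of a reward-maximizing set-valued classifier concrete, here is a minimal sketch in Python. It assumes the simplest additive reward, R(S, i) = 1{i ∈ S} − c·|S|, where c is a hypothetical per-category cost. The abstract does not give the paper's exact parameterization, and the class of rewards it studies is broader, so this is an illustration and not the authors' method. Under this assumed reward the expected reward of a set S is the sum over i in S of (p_i − c), so the best set keeps every category whose posterior probability p_i exceeds c. The function names are illustrative only.

```python
from itertools import combinations
from typing import Sequence, Set


def expected_reward(posterior: Sequence[float], subset: Set[int], c: float) -> float:
    """Expected reward of `subset` under the ASSUMED reward 1{i in S} - c*|S|."""
    return sum(posterior[i] for i in subset) - c * len(subset)


def optimal_set(posterior: Sequence[float], c: float) -> Set[int]:
    """Each category contributes p_i - c, so keep those with p_i > c."""
    return {i for i, p in enumerate(posterior) if p > c}


def brute_force_set(posterior: Sequence[float], c: float) -> Set[int]:
    """Check by enumerating all 2^N subsets (feasible only for small N)."""
    n = len(posterior)
    candidates = (
        set(combo) for k in range(n + 1) for combo in combinations(range(n), k)
    )
    return max(candidates, key=lambda s: expected_reward(posterior, s, c))


if __name__ == "__main__":
    p = [0.6, 0.3, 0.1]            # hypothetical posterior over N = 3 categories
    print(optimal_set(p, 0.2))     # partial rejection: two categories retained
    print(optimal_set(p, 0.05))    # all N categories retained
    print(optimal_set(p, 0.7))     # empty set
```

With this assumed reward, the cost c alone determines whether the output is an empty set, a single category, a partial rejection, or the full set of N categories.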
Keywords
Blockwise cross-validation, Bayesian classification, Conformal prediction, Classes of hypotheses, Indifference zones, Markov Chain Monte Carlo, Reward functions with set-valued inputs, Set-valued classifiers
National Category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-203754 (URN)
10.1007/s00357-023-09455-x (DOI)
001113203500001 ()
2-s2.0-85178310510 (Scopus ID)
Note
J Classif 41, 38 (2024). DOI: 10.1007/s00357-023-09459-7
2022-04-21 2022-04-21 2024-10-21 Bibliographically approved