Publications (10 of 76)
Hössjer, O., Laikre, L. & Ryman, N. (2023). Assessment of the Global Variance Effective Size of Subdivided Populations, and Its Relation to Other Effective Sizes. Acta Biotheoretica, 71(3), Article ID 19.
2023 (English). In: Acta Biotheoretica, ISSN 0001-5342, E-ISSN 1572-8358, Vol. 71, no 3, article id 19. Article in journal (Refereed), Published
Abstract [en]

The variance effective population size (N_eV) is frequently used to quantify the expected rate at which a population's allele frequencies change over time. The purpose of this paper is to find expressions for the global N_eV of a spatially structured population that are of interest for conservation of species. Since N_eV depends on allele frequency change, we start by dividing the cause of allele frequency change into genetic drift within subpopulations (I) and a second component mainly due to migration between subpopulations (II). We investigate in detail how these two components depend on the way in which subpopulations are weighted, as well as on parameters of the model such as migration rates and local effective and census sizes. It is shown that under certain conditions the impact of II is eliminated, and N_eV of the metapopulation is maximized, when subpopulations are weighted proportionally to their long-term reproductive contributions. This maximal N_eV is the sought-after global effective size, since it approximates the gene diversity effective size N_eGD, a quantifier of the rate of loss of genetic diversity that is relevant for conservation of species and populations. We also propose two novel versions of N_eV, one of which (the backward version of N_eV) is most stable, exists for most populations, and is closer to N_eGD than the classical notion of N_eV. Expressions are developed for the optimal length of the time interval for measuring genetic change, which make it possible to estimate any version of N_eV with maximal accuracy.

Keywords
Genetic diversity, Length of time interval, Matrix analytic recursions, Metapopulation, Migration-drift equilibrium, Perturbation theory of matrices, Variance effective size
National Category
Evolutionary Biology
Identifiers
urn:nbn:se:su:diva-221119 (URN); 10.1007/s10441-023-09470-w (DOI); 001032489500001 (); 37458852 (PubMedID); 2-s2.0-85158004417 (Scopus ID)
Available from: 2023-09-19 Created: 2023-09-19 Last updated: 2023-09-19. Bibliographically approved
Karlsson, M. & Hössjer, O. (2023). Classification Under Partial Reject Options. Journal of Classification, Article ID s00357-023-09455-x.
2023 (English). In: Journal of Classification, ISSN 0176-4268, E-ISSN 1432-1343, article id s00357-023-09455-x. Article in journal (Refereed), Epub ahead of print
Abstract [en]

In many applications there is ambiguity about which (if any) of a finite number N of hypotheses best fits an observation. It is then of interest to possibly output a whole set of categories, that is, a scenario where the size of the classified set of categories ranges from 0 to N. Empty sets correspond to an outlier, sets of size 1 represent a firm decision that singles out one hypothesis, sets of size N correspond to a rejection to classify, whereas sets of sizes 2,..., N - 1 represent a partial rejection to classify, where some hypotheses are excluded from further analysis. In this paper, we review and unify several proposed methods of Bayesian set-valued classification, where the objective is to find the optimal Bayesian classifier that maximizes the expected reward. We study a large class of reward functions with rewards for sets that include the true category, whereas additive or multiplicative penalties are incurred for sets depending on their size. For models with one homogeneous block of hypotheses, we provide general expressions for the accompanying Bayesian classifier, several of which extend previous results in the literature. Then, we derive novel results for the more general setting when hypotheses are partitioned into blocks, where ambiguity within blocks and ambiguity between blocks are of different severity. We also discuss how well-known methods of classification, such as conformal prediction, indifference zones, and hierarchical classification, fit into our framework. Finally, set-valued classification is illustrated using an ornithological data set, with taxa partitioned into blocks and parameters estimated using MCMC. The associated reward function's tuning parameters are chosen through cross-validation.

Keywords
Blockwise cross-validation, Bayesian classification, Conformal prediction, Classes of hypotheses, Indifference zones, Markov Chain Monte Carlo, Reward functions with set-valued inputs, Set-valued classifiers
National Category
Mathematics; Psychology
Identifiers
urn:nbn:se:su:diva-225421 (URN); 10.1007/s00357-023-09455-x (DOI); 001113203500001 ()
Available from: 2024-01-16 Created: 2024-01-16 Last updated: 2024-01-16
Zhou, L., Díaz-Pachón, D. A., Zhao, C., Rao, J. S. & Hössjer, O. (2023). Correcting prevalence estimation for biased sampling with testing errors. Statistics in Medicine, 42(26), 4713-4737
2023 (English). In: Statistics in Medicine, ISSN 0277-6715, E-ISSN 1097-0258, Vol. 42, no 26, p. 4713-4737. Article in journal (Refereed), Published
Abstract [en]

Sampling for prevalence estimation of infection is subject to bias by both oversampling of symptomatic individuals and error-prone tests. This results in naïve estimators of prevalence (ie, proportion of observed infected individuals in the sample) that can be very far from the true proportion of infected. In this work, we present a method of prevalence estimation that reduces both the effect of bias due to testing errors and oversampling of symptomatic individuals, eliminating it altogether in some scenarios. Moreover, this procedure considers stratified errors in which tests have different error rate profiles for symptomatic and asymptomatic individuals. This results in easily implementable algorithms, for which code is provided, that produce better prevalence estimates than other methods (in terms of reducing and/or removing bias), as demonstrated by formal results, simulations, and on COVID-19 data from the Israeli Ministry of Health.

Keywords
active information, bias correction, COVID-19, maximum entropy, prevalence, sampling, sampling bias, testing errors
National Category
Probability Theory and Statistics; Public Health, Global Health, Social Medicine and Epidemiology
Identifiers
urn:nbn:se:su:diva-225644 (URN); 10.1002/sim.9885 (DOI); 001122028600001 (); 37655557 (PubMedID); 2-s2.0-85169446081 (Scopus ID)
Available from: 2024-01-31 Created: 2024-01-31 Last updated: 2024-02-22. Bibliographically approved
Kurland, S., Ryman, N., Hössjer, O. & Laikre, L. (2023). Effects of subpopulation extinction on effective size (Ne) of metapopulations. Conservation Genetics, 24(4), 417-433
2023 (English). In: Conservation Genetics, ISSN 1566-0621, E-ISSN 1572-9737, Vol. 24, no 4, p. 417-433. Article in journal (Refereed), Published
Abstract [en]

Population extinction is ubiquitous in all taxa. Such extirpations can reduce intraspecific diversity, but the extent to which the genetic diversity of surviving populations is affected remains largely unclear. A key concept in this context is the effective population size (Ne), which quantifies the rate at which genetic diversity within populations is lost. Ne was developed for single, isolated populations, while many natural populations are instead connected to other populations via gene flow. Recent analytical approaches and software permit modelling of Ne of interconnected populations (metapopulations). Here, we apply such tools to investigate how extinction of subpopulations affects Ne of the metapopulation (NeMeta) and of separate surviving subpopulations (NeRx) under different rates and patterns of genetic exchange between subpopulations. We assess extinction effects before and at migration-drift equilibrium. We find that the effect of extinction on NeMeta increases with reduced connectivity, suggesting that stepping stone models of migration are more impacted than island-migration models when the same number of subpopulations are lost. Furthermore, in stepping stone models, after extinction and before a new equilibrium has been reached, NeRx can vary drastically among surviving subpopulations and depends on their initial spatial position relative to extinct ones. Our results demonstrate that extinctions can have far more complex effects on the retention of intraspecific diversity than typically recognized. Metapopulation dynamics need heightened consideration in sustainable management and conservation, e.g., in monitoring genetic diversity, and are relevant to a wide range of species in the ongoing extinction crisis.

Keywords
Inbreeding effective population size, Eigenvalue effective size, Realized effective size, Substructured populations, Conservation genetics
National Category
Genetics; Ecology
Identifiers
urn:nbn:se:su:diva-216315 (URN); 10.1007/s10592-023-01510-9 (DOI); 000953077900002 (); 2-s2.0-85150289396 (Scopus ID)
Available from: 2023-04-12 Created: 2023-04-12 Last updated: 2023-10-04. Bibliographically approved
Thorvaldsen, S. & Hössjer, O. (2023). Estimating the information content of genetic sequence data. The Journal of the Royal Statistical Society, Series C: Applied Statistics, 72(5), 1310-1338
2023 (English). In: The Journal of the Royal Statistical Society, Series C: Applied Statistics, ISSN 0035-9254, E-ISSN 1467-9876, Vol. 72, no 5, p. 1310-1338. Article in journal (Refereed), Published
Abstract [en]

A prominent problem in analysing genetic information has been a lack of mathematical frameworks for doing so. This article offers some new statistical methods to model and analyse information content in proteins, protein families, and their sequences. We discuss how to understand the qualitative aspects of genetic information and how to estimate its quantitative aspects, and we implement a statistical model where the qualitative genetic function is represented jointly with its probabilistic metric of self-information. The functional information of protein families in the Cath and Pfam databases is estimated using a method inspired by rejection sampling. Scientific work may place these components of information as one of the fundamental aspects of molecular biology.

Keywords
functional information, mutual information, rejection sampling, self-information
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:su:diva-221410 (URN); 10.1093/jrsssc/qlad062 (DOI); 001032671000001 (); 2-s2.0-85183119634 (Scopus ID)
Available from: 2023-09-20 Created: 2023-09-20 Last updated: 2024-03-04. Bibliographically approved
Karlsson, M. & Hössjer, O. (2023). Identification of taxon through classification with partial reject options. The Journal of the Royal Statistical Society, Series C: Applied Statistics, 72(4), 937-975
2023 (English). In: The Journal of the Royal Statistical Society, Series C: Applied Statistics, ISSN 0035-9254, E-ISSN 1467-9876, Vol. 72, no 4, p. 937-975. Article in journal (Refereed), Published
Abstract [en]

Identification of taxa can be significantly assisted by statistical classification based on trait measurements, either individually or by phylogenetic (clustering) methods. In this article, we present a general Bayesian approach for classifying species individually based on measurements of a mixture of continuous and ordinal traits, and any type of covariates. The trait vector is derived from a latent variable with a multivariate Gaussian distribution. Decision rules based on supervised learning are presented that estimate model parameters through blocked Gibbs sampling. These decision regions allow for uncertainty (partial rejection), so that not necessarily one specific category (taxon) is output when new subjects are classified, but rather a set of categories including the most probable taxa. This type of discriminant analysis employs reward functions with a set-valued input argument, so that an optimal Bayes classifier can be defined. We also present a way of safeguarding against outlying new observations, using an analogue of a p-value within our Bayesian setting. We refer to our Bayesian set-valued classifier as the Karlsson–Hössjer method, and it is illustrated on an original ornithological data set of birds. We also incorporate model selection through cross-validation, exemplified on another original data set of birds.

Keywords
Bayesian classification, classification with covariates, partial observations, set-valued classifiers, species identification, statistical ornithology
National Category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-203752 (URN); 10.1093/jrsssc/qlad036 (DOI); 001019502200001 ()
Available from: 2022-04-21 Created: 2022-04-21 Last updated: 2023-12-19. Bibliographically approved
Hössjer, O. & Karlsson, M. (2023). On the use of L-functionals in regression models. Open Mathematics, 21(1), Article ID 20220597.
2023 (English). In: Open Mathematics, ISSN 2391-5455, Vol. 21, no 1, article id 20220597. Article, review/survey (Refereed), Published
Abstract [en]

In this article, we survey and unify a large class of L-functionals of the conditional distribution of the response variable in regression models. This includes robust measures of location, scale, skewness, and heavytailedness of the response, conditionally on covariates. We generalize the concepts of L-moments (G. Sillito, Derivation of approximants to the inverse distribution function of a continuous univariate population from the order statistics of a sample, Biometrika 56 (1969), no. 3, 641–650), L-skewness, and L-kurtosis (J. R. M. Hosking, L-moments: analysis and estimation of distributions using linear combinations of order statistics, J. R. Stat. Soc. Ser. B Stat. Methodol. 52 (1990), no. 1, 105–124) and introduce order numbers for a large class of L-functionals through orthogonal series expansions of quantile functions. In particular, we motivate why location, scale, skewness, and heavytailedness have order numbers 1, 2, (3,2), and (4,2), respectively, and describe how a family of L-functionals, with different order numbers, is constructed from Legendre, Hermite, Laguerre, or other types of polynomials. Our framework is applied to models where the relationship between quantiles of the response and the covariates follows a transformed linear model, with a link function that determines the appropriate class of L-functionals. In this setting, the distribution of the response is treated parametrically or nonparametrically, and the response variable is either censored/truncated or not. We also provide a framework for asymptotic theory of estimates of L-functionals and illustrate our approach by analyzing the arrival time distribution of migrating birds. In this context, a novel version of the coefficient of determination is introduced, which makes use of the abovementioned orthogonal series expansion.

Keywords
bird phenology, coefficient of determination, L-functionals, L-statistics, order numbers, orthogonal series expansion, quantile function, quantile regression
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:su:diva-203755 (URN); 10.1515/math-2022-0597 (DOI); 001053084400001 (); 2-s2.0-85170428452 (Scopus ID)
Available from: 2022-04-21 Created: 2022-04-21 Last updated: 2023-09-21. Bibliographically approved
Díaz-Pachón, D. A., Hössjer, O. & Marks II, R. J. (2023). Sometimes Size Does Not Matter. Foundations of Physics, 53(1), Article ID 1.
2023 (English). In: Foundations of Physics, ISSN 0015-9018, E-ISSN 1572-9516, Vol. 53, no 1, article id 1. Article in journal (Refereed), Published
Abstract [en]

Recently Díaz, Hössjer and Marks (DHM) presented a Bayesian framework to measure cosmological tuning (either fine or coarse) that uses maximum entropy (maxent) distributions on unbounded sample spaces as priors for the parameters of the physical models (https://doi.org/10.1088/1475-7516/2021/07/020). The DHM framework stands in contrast to previous attempts to measure tuning that rely on a uniform prior assumption. However, since the parameters of the models often take values in spaces of infinite size, the uniformity assumption is unwarranted. This is known as the normalization problem. In this paper we explain why and how the DHM framework not only evades the normalization problem but also circumvents other objections to the tuning measurement, such as the so-called weak anthropic principle, the selection of a single maxent distribution and, importantly, the lack of invariance of maxent distributions with respect to data transformations. We also propose to treat fine-tuning as an emergence problem to avoid infinite loops in the prior distribution of hyperparameters (common to all Bayesian analysis), and explain that previous attempts to measure tuning using uniform priors are particular cases of the DHM framework. Finally, we prove a theorem explaining when tuning is fine or coarse for different families of distributions. The theorem is summarized in a table for ease of reference, and the tuning of three physical parameters is analyzed using the conclusions of the theorem.

Keywords
Bayesian statistics, Constants of nature, Emergence, Fine-tuning, Fundamental constants, Infinites, Maximum entropy, Standard models, Weak anthropic principle
National Category
Mathematics
Identifiers
urn:nbn:se:su:diva-213533 (URN); 10.1007/s10701-022-00650-1 (DOI); 000887803700001 (); 2-s2.0-85142246341 (Scopus ID)
Available from: 2023-01-09 Created: 2023-01-09 Last updated: 2023-01-09. Bibliographically approved
Hössjer, O., Díaz-Pachón, D. A. & Rao, J. S. (2022). A Formal Framework for Knowledge Acquisition: Going beyond Machine Learning. Entropy, 24(10), Article ID 1469.
2022 (English). In: Entropy, E-ISSN 1099-4300, Vol. 24, no 10, article id 1469. Article in journal (Refereed), Published
Abstract [en]

Philosophers frequently define knowledge as justified, true belief. We built a mathematical framework that makes it possible to define learning (increasing number of true beliefs) and knowledge of an agent in precise ways, by phrasing belief in terms of epistemic probabilities, defined from Bayes’ rule. The degree of true belief is quantified by means of active information I+: a comparison between the degree of belief of the agent and a completely ignorant person. Learning has occurred when either the agent’s strength of belief in a true proposition has increased in comparison with the ignorant person (I+>0), or the strength of belief in a false proposition has decreased (I+<0). Knowledge additionally requires that learning occurs for the right reason, and in this context we introduce a framework of parallel worlds that correspond to parameters of a statistical model. This makes it possible to interpret learning as a hypothesis test for such a model, whereas knowledge acquisition additionally requires estimation of a true world parameter. Our framework of learning and knowledge acquisition is a hybrid between frequentism and Bayesianism. It can be generalized to a sequential setting, where information and data are updated over time. The theory is illustrated using examples of coin tossing, historical and future events, replication of studies, and causal inference. It can also be used to pinpoint shortcomings of machine learning, where typically learning rather than knowledge acquisition is in focus.
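The learning criterion described in this abstract can be illustrated with a minimal Python sketch. It assumes that active information takes the log-ratio form I+ = log(P / P0), where P is the agent's degree of belief in a proposition and P0 is that of a completely ignorant person; the abstract itself does not state the formula, and the function names below are hypothetical.

```python
import math


def active_information(p_agent: float, p_ignorant: float) -> float:
    """Assumed form of active information: I+ = log(P / P0).

    p_agent    -- agent's degree of belief in the proposition (hypothetical name)
    p_ignorant -- belief of a completely ignorant person (hypothetical name)
    """
    if not (0.0 < p_agent <= 1.0 and 0.0 < p_ignorant <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log(p_agent / p_ignorant)


def has_learned(p_agent: float, p_ignorant: float, proposition_is_true: bool) -> bool:
    """Learning criterion as worded in the abstract:
    I+ > 0 for a true proposition, I+ < 0 for a false one."""
    i_plus = active_information(p_agent, p_ignorant)
    return i_plus > 0 if proposition_is_true else i_plus < 0
```

For example, an agent who assigns probability 0.8 to a true proposition that an ignorant person rates at 0.5 has learned in this sense; whether the agent has acquired knowledge additionally depends on the reasons for the belief, which this sketch does not model.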

Keywords
active information, Bayes' rule, counterfactuals, epistemic probability, learning, justified true belief, knowledge acquisition, replication studies
National Category
Computer and Information Sciences; Mathematics; Philosophy, Ethics and Religion
Identifiers
urn:nbn:se:su:diva-211039 (URN); 10.3390/e24101469 (DOI); 000872645400001 (); 2-s2.0-85140609692 (Scopus ID)
Available from: 2022-11-09 Created: 2022-11-09 Last updated: 2023-03-28. Bibliographically approved
Díaz-Pachón, D. A. & Hössjer, O. (2022). Assessing, Testing and Estimating the Amount of Fine-Tuning by Means of Active Information. Entropy, 24(10), Article ID 1323.
2022 (English). In: Entropy, E-ISSN 1099-4300, Vol. 24, no 10, article id 1323. Article in journal (Refereed), Published
Abstract [en]

A general framework is introduced to estimate how much external information has been infused into a search algorithm, the so-called active information. This is rephrased as a test of fine-tuning, where tuning corresponds to the amount of pre-specified knowledge that the algorithm makes use of in order to reach a certain target. A function f quantifies specificity for each possible outcome x of a search, so that the target of the algorithm is a set of highly specified states, whereas fine-tuning occurs if it is much more likely for the algorithm to reach the target as intended than by chance. The distribution of a random outcome X of the algorithm involves a parameter θ that quantifies how much background information has been infused. A simple choice of this parameter is to use θf in order to exponentially tilt the distribution of the outcome of the search algorithm under the null distribution of no tuning, so that an exponential family of distributions is obtained. Such algorithms are obtained by iterating a Metropolis–Hastings type of Markov chain, which makes it possible to compute their active information under the equilibrium and non-equilibrium of the Markov chain, with or without stopping when the targeted set of fine-tuned states has been reached. Other choices of tuning parameters θ are discussed as well. Nonparametric and parametric estimators of active information and tests of fine-tuning are developed when repeated and independent outcomes of the algorithm are available. The theory is illustrated with examples from cosmology, student learning, reinforcement learning, a Moran type model of population genetics, and evolutionary programming.
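The exponential tilting mentioned in this abstract can be sketched on a small finite outcome space. The sketch assumes the tilted distribution is p_θ(x) ∝ p_0(x)·exp(θ·f(x)), takes the target to be the outcomes with f(x) at or above a threshold, and measures active information as the log-ratio of target probabilities with and without tuning. The toy distribution, the threshold, and all function names are illustrative assumptions, not taken from the paper.

```python
import math


def tilt(p0, f, theta):
    """Exponentially tilt a null distribution p0 by theta * f(x) and renormalize."""
    weights = [p * math.exp(theta * fx) for p, fx in zip(p0, f)]
    total = sum(weights)
    return [w / total for w in weights]


def target_probability(p, f, threshold):
    """Probability of the target set, assumed here to be {x : f(x) >= threshold}."""
    return sum(px for px, fx in zip(p, f) if fx >= threshold)


# Toy example: four outcomes, uniform null distribution, increasing specificity.
p0 = [0.25, 0.25, 0.25, 0.25]
f = [0.0, 1.0, 2.0, 3.0]

p_theta = tilt(p0, f, theta=1.0)

# Active information of the target under the assumed log-ratio form.
active_info = math.log(
    target_probability(p_theta, f, 3.0) / target_probability(p0, f, 3.0)
)
```

With θ = 0 the tilted distribution reduces to the null distribution, and a positive θ shifts probability mass toward highly specified outcomes, so the active information of the target is positive in this toy setting.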

Keywords
active information, exponential tilting, fine-tuning, functional information, large deviations, Markov chains, Metropolis-Hastings, Moran model, statistical estimation and testing
National Category
Mathematics
Identifiers
urn:nbn:se:su:diva-211098 (URN); 10.3390/e24101323 (DOI); 000872415400001 (); 2-s2.0-85140643541 (Scopus ID)
Available from: 2022-11-09 Created: 2022-11-09 Last updated: 2023-03-28. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-2767-8818